Business Challenge
-
The client needed a real-time solution to monitor load on the Kubernetes cluster, from a high-level overview down to in-depth detail.
-
The client also wanted a real-time alerting platform that notifies users as soon as data is ingested into our platform.
-
The client wanted a centralized dashboard for defining rules on the metrics we receive into our platform. The alerting platform had to apply those rules dynamically and integrate with Slack, email, mobile devices, and our web dashboard.
-
Log aggregation was also an important requirement, so that all correlated logs for a particular timestamp could be viewed in a single place.
-
An anomaly detection engine was required to surface real-time fluctuations in the monitoring data and to detect anomalies in cluster health and performance.
-
A predictive analysis engine was needed to forecast when cluster usage would rise or fall, so that the cluster could be scaled up or down before its nodes crashed.
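
To make the rule-driven alerting requirement from the list above more concrete, here is a minimal Python sketch of how dashboard-defined rules might be evaluated against each ingested metric sample. The rule schema (`metric`, `op`, `threshold`, `channels`), the metric names, and the channel labels are all assumptions made for illustration; the source does not describe the actual rule format or delivery mechanism.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Comparison operators a rule may reference by name (hypothetical rule schema).
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class Rule:
    metric: str          # name of the metric the rule watches
    op: str              # key into OPS
    threshold: float     # value the metric is compared against
    channels: List[str]  # where to send the alert, e.g. "slack", "email"


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (channel, message) pairs for every rule the sample violates."""
    alerts: List[Tuple[str, str]] = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # the sample does not carry this metric
        if OPS[rule.op](value, rule.threshold):
            msg = f"{rule.metric} {rule.op} {rule.threshold} (observed {value})"
            alerts.extend((channel, msg) for channel in rule.channels)
    return alerts


# Rules can be reloaded at any time, since evaluate() takes them as plain data.
rules = [
    Rule("node_cpu_percent", ">", 90.0, ["slack", "email"]),
    Rule("node_memory_free_mb", "<", 512.0, ["slack"]),
]
sample = {"node_cpu_percent": 95.5, "node_memory_free_mb": 2048.0}
alerts = evaluate(rules, sample)
```

Because the rules are passed in as data rather than hard-coded, a dashboard could change them without redeploying the evaluator, which is one way to read the "dynamic" requirement. Actual delivery to Slack, email, or mobile devices is left out of this sketch.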
Solution Offered
To overcome these challenges, we built a platform with a data collection layer that gathers metrics and statistics for the clusters, and for the pipelines and applications running on them, using REST APIs, agent-based collection over SNMP, and other methods.
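
One way to picture such a collection layer is a set of collectors that all emit the same normalized record, whatever the transport. The Python sketch below is illustrative only: `MetricSample`, `RestCollector`, `SnmpCollector`, and `collect_all` are hypothetical names, the REST endpoint is assumed to return a flat JSON object of metric names to numbers, and the SNMP lookup is injected as a plain callable because the source does not say which SNMP library or agent was used.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol
from urllib.request import urlopen


@dataclass
class MetricSample:
    source: str       # cluster, pipeline, or application that produced the value
    name: str         # metric name
    value: float
    timestamp: float  # seconds since the epoch, set at collection time


class Collector(Protocol):
    def collect(self) -> List[MetricSample]: ...


def http_get_json(url: str) -> Dict[str, float]:
    """Fetch a flat JSON object of metric name -> value (assumed payload shape)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


class RestCollector:
    def __init__(self, source: str, url: str,
                 fetch: Callable[[str], Dict[str, float]] = http_get_json):
        self.source, self.url, self.fetch = source, url, fetch

    def collect(self) -> List[MetricSample]:
        now = time.time()
        payload = self.fetch(self.url)
        return [MetricSample(self.source, k, float(v), now) for k, v in payload.items()]


class SnmpCollector:
    def __init__(self, source: str, oids: Dict[str, str], get: Callable[[str], float]):
        # `get` stands in for an SNMP GET performed by whatever agent or library is used.
        self.source, self.oids, self.get = source, oids, get

    def collect(self) -> List[MetricSample]:
        now = time.time()
        return [MetricSample(self.source, name, float(self.get(oid)), now)
                for name, oid in self.oids.items()]


def collect_all(collectors: Iterable[Collector]) -> List[MetricSample]:
    """Gather samples from every collector into one normalized list."""
    samples: List[MetricSample] = []
    for collector in collectors:
        samples.extend(collector.collect())
    return samples


# Usage with stubbed transports, so the sketch runs without a network or SNMP agent.
rest = RestCollector("pipeline-api", "http://example.invalid/metrics",
                     fetch=lambda url: {"jobs_running": 3})
snmp = SnmpCollector("node-1", {"if_in_octets": "1.3.6.1.2.1.2.2.1.10.1"},
                     get=lambda oid: 1024)
samples = collect_all([rest, snmp])
```

Normalizing at the edge like this means downstream components such as alerting or anomaly detection only need to understand one record shape, regardless of how a metric was collected.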