Prometheus: Solving monitoring in the cloud

Hundreds of companies now run the open-source Prometheus monitoring solution in production, in fields ranging from telecommunications and cloud providers to video streaming and databases.

In the run-up to CloudNativeCon + KubeCon Europe 2017, which takes place in Berlin on March 29 and 30, we spoke with Brian Brazil, founder of Robust Perception and one of the core developers of the Prometheus project, who will give a keynote on Prometheus at CloudNativeCon. Be sure to catch the full Prometheus track at the conference.

In a traditional setup, there was a relatively small number of services, each with its own machine. Monitoring was based on machine metrics such as CPU usage and free memory, because they were the best available way to alert on user-facing problems. In a cloud native world, where many different services not only share machines but the way in which they share them is in constant flux, that approach does not scale.

Just as the industry has moved from managing machines and services by hand to tools such as Chef, and now Kubernetes, we need to make a similar transition in the monitoring space.

Prometheus client libraries let you instrument your applications for the metrics and KPIs that matter in your system. For third-party applications such as Cassandra, HAProxy, or MySQL, there is a variety of exporters that expose their useful metrics.
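To make the idea of instrumentation concrete, here is a minimal sketch in plain Python. It does not use the official Prometheus client library; `ToyCounter`, `shop_orders_total`, and `process_order` are hypothetical names invented for illustration. It only shows the general shape: the application increments a counter where the interesting event happens, and the current value is rendered in the Prometheus text exposition format for scraping.

```python
import threading


class ToyCounter:
    """A value that only goes up, similar in spirit to a Prometheus counter."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        with self._lock:
            self._value += amount

    def render(self):
        """Render the metric in the Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical application metric: orders handled by a checkout service.
orders_total = ToyCounter("shop_orders_total", "Orders processed.")


def process_order(order_id):
    # Business logic would go here; the counter records that it happened.
    orders_total.inc()


for order_id in range(3):
    process_order(order_id)

print(orders_total.render(), end="")
```

In a real service the official client library for your language would take the place of `ToyCounter` and would also serve the rendered text over HTTP.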

The data Prometheus collects is enriched with labels. Labels are arbitrary key-value pairs that can be used, for example, to distinguish a cluster's development environment from its production environment, or to indicate which HTTP endpoint a metric is broken out by.
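As a rough illustration of how labels work, the sketch below models each time series as a set of labels plus a value and then sums the values per label, loosely mirroring what an aggregation by label does. The label names (`env`, `handler`, `instance`), the sample values, and the `sum_by` helper are all made up for this example.

```python
from collections import defaultdict

# Each sample is (label set, value). All names and numbers are invented.
samples = [
    ({"env": "production", "handler": "/api/orders", "instance": "a"}, 120.0),
    ({"env": "production", "handler": "/api/orders", "instance": "b"}, 80.0),
    ({"env": "production", "handler": "/healthz", "instance": "a"}, 30.0),
    ({"env": "development", "handler": "/api/orders", "instance": "c"}, 5.0),
]


def sum_by(samples, *label_names):
    """Sum values per distinct combination of the given label names."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in label_names)
        totals[key] += value
    return dict(totals)


# Separate development from production.
per_env = sum_by(samples, "env")

# Break production traffic out by HTTP endpoint, ignoring the instance.
production = [s for s in samples if s[0]["env"] == "production"]
per_handler_in_prod = sum_by(production, "handler")

print(per_env)
print(per_handler_in_prod)
```

Because the labels travel with the data, the same samples can be sliced by environment, by endpoint, or by any other dimension without changing the instrumented application.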

The PromQL query language allows aggregation based on these labels, calculation of 95th percentile latencies per container, service, or data center, forecasting, and any other math you want to do. What is more, if you can graph it, you can alert on it. This gives you the power to alert on what really matters to you and your users, and helps eliminate those late-night pages for non-problems.
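A percentile latency is typically estimated from histogram buckets rather than from raw observations. The sketch below shows one common way to do that in plain Python: find the bucket that contains the target rank and interpolate linearly within it. It is a simplified stand-in for what a query engine does, not Prometheus's own code; the function name `quantile_from_buckets` and the bucket data are assumptions made for this example.

```python
import math


def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, where the last upper bound is math.inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(upper):
                # The open-ended bucket has no width to interpolate over,
                # so report the highest finite bound instead.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Invented request latencies in seconds: (upper bound, cumulative count).
latency_buckets = [
    (0.1, 50),
    (0.25, 80),
    (0.5, 95),
    (1.0, 99),
    (math.inf, 100),
]

p95 = quantile_from_buckets(0.95, latency_buckets)
print(f"estimated 95th percentile latency: {p95:.3f}s")
```

An alert built on this kind of value, for instance "95th percentile latency above some threshold for several minutes", speaks directly to what users experience, unlike an alert on CPU usage of a single machine.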