Tuesday, April 06, 2021

Prometheus: its components, architecture and uses.

According to its official site, Prometheus is an open-source monitoring solution. It can monitor your applications and system metrics, and it can also raise alerts when certain criteria are met.

So what is Prometheus?
It is a tool for monitoring applications that run not only on a bare application server or web server (along with the host O/S), but also applications that run in containers under an orchestrator such as Kubernetes (K8s).

Why is Prometheus needed?

In current infrastructure, applications run inside containers such as Docker containers, the containers run inside K8s pods, and the pods finally run on the nodes of a cluster. So besides the application itself, we also need to watch what happens at the system, pod, container and node levels. To make things more complex, the cluster runs replicas of each pod, all interconnected in a distributed environment. With CI/CD shipping changes continuously, it becomes essential to make sure the applications keep running smoothly, and for that we need continuous monitoring plus alerts when something goes wrong. This is where Prometheus is used.

So, in short, we need Prometheus to check for hardware or application errors, resource availability, crashes, overload, response latency, etc.

So we are working in a distributed cluster environment, layered as:

Apps –> Pods –> Cluster –> Node machines

If anything goes wrong, we need to know where the error comes from and where to look; this is crucial for a quick recovery time.

In this case it is better to have a system that continuously monitors all our microservices and alerts us when they go down, or even before that as preventive maintenance, for example by regularly checking memory usage, CPU usage, lag (slow response) periods and storage issues (disk running low as more logs are generated). This is exactly what Prometheus does.

Now let's talk about how Prometheus works, i.e. its architecture.

It contains the following parts:
1- Prometheus Server
This is divided into three parts:

1- Storage:- a time series database that stores the metrics data, such as CPU usage or the number of exceptions.
2- Data retrieval worker:- responsible for pulling data from applications, services, servers and other target resources and pushing it into the time series database.
3- HTTP server (API server):- exposes the data in storage to the outside world through an API. It supports the Prometheus query language, PromQL: it takes a PromQL query from the Prometheus UI or Grafana, runs it against the time series DB and returns the result as an API response.
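Any HTTP client can talk to the API server from item 3. Below is a minimal Python sketch, assuming a Prometheus server on its default port 9090: it builds an instant-query URL for the /api/v1/query endpoint and turns the JSON reply into an {instance: value} dict. The helper names are hypothetical; only the endpoint path and the response shape come from the Prometheus HTTP API.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Assumption: Prometheus listens on its default port 9090 on this host.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    query_string = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{query_string}"


def parse_instant_vector(body: str) -> dict[str, float]:
    """Turn an instant-vector JSON response into {instance label: value}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    values = {}
    for series in payload["data"]["result"]:
        # Each sample is [unix_timestamp, "value as a string"].
        _timestamp, value = series["value"]
        values[series["metric"].get("instance", "")] = float(value)
    return values
```

Against a live server you would fetch `build_query_url(PROMETHEUS_URL, "up")` with `urllib.request.urlopen` and pass the decoded body to `parse_instant_vector`; the built-in `up` metric is 1 for every target whose last scrape succeeded.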

Resource:-
https://www.booleanworld.com/install-use-prometheus-monitoring/

What is Prometheus capable of monitoring?

Frankly speaking, it can monitor almost anything, for example:

Systems:- Linux, Windows
a standalone Apache server, or a single application running on that server and its O/S
databases, etc.

Everything that Prometheus monitors is called a target. Each target has units to monitor: for a Linux OS the units are CPU usage, storage usage, memory usage, etc., and for an application they are request count, request time, exception count, etc.

Targets:- machines, servers or applications that Prometheus monitors.
Units:- what it monitors in those targets.

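The units we monitor are called metrics, and they are stored in the time series database. When you open the /metrics endpoint of a target (Prometheus exposes its own at /metrics as well), you get a human-readable text listing of these metrics. Each metric also carries HELP and TYPE attributes: HELP says what the metric contains, and TYPE says what kind of metric it is. Prometheus defines four metric types (counter, gauge, histogram and summary); the three you will meet most often are: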

1- counter:- a value that only goes up (it resets to zero when the process restarts), for example how many requests have hit an endpoint.
2- gauge:- a value that can go up and down over time, for example the current CPU usage or the currently free disk space.
3- histogram:- counts observations in configurable buckets, for example how long requests take to get a response or how big the requested data is.
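To see what HELP, TYPE and these metric types look like on a /metrics page, here is a small stdlib-only Python sketch that renders them in the Prometheus text exposition format. The metric names are illustrative; a real application would normally use an official client library rather than hand-rolling this.

```python
from __future__ import annotations

import bisect


def render_simple(name: str, help_text: str, metric_type: str, value: float) -> str:
    """Render a counter or gauge with its HELP and TYPE lines."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


class Histogram:
    """Minimal histogram: per-bucket counts plus _sum and _count."""

    def __init__(self, name: str, help_text: str, buckets: list[float]) -> None:
        self.name = name
        self.help_text = help_text
        self.bounds = sorted(buckets)
        self.bucket_counts = [0] * len(self.bounds)
        self.total = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound ("le") is >= value; larger values
        # only land in the implicit +Inf bucket.
        idx = bisect.bisect_left(self.bounds, value)
        if idx < len(self.bucket_counts):
            self.bucket_counts[idx] += 1
        self.total += value
        self.count += 1

    def render(self) -> str:
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, n in zip(self.bounds, self.bucket_counts):
            cumulative += n  # exposed buckets are cumulative
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self.count}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.count}")
        return "\n".join(lines) + "\n"
```

For example, `render_simple("http_requests_total", "Total HTTP requests.", "counter", 1027)` produces the three lines you would see for a counter on a /metrics page.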

Now let's check how Prometheus collects these metrics from its clients, i.e. the targets.

For this, every target must expose its data on a /metrics endpoint, in a format that Prometheus understands.

Some services/targets do this by default, but for others we need to do it ourselves with an extra component. This extra component is called an exporter.

Exporter:- fetches the data from the target, converts it into the format Prometheus understands and exposes it on its own /metrics endpoint, so that Prometheus can scrape it and store it as metrics in its time series DB.
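Conceptually an exporter is just "read the target, convert, serve /metrics". The stdlib-only Python sketch below does that for a single value, the free disk space of the local root filesystem. The metric name `demo_disk_free_bytes` and the watched path are assumptions for illustration; real exporters such as node_exporter are separate, full-featured programs.

```python
from __future__ import annotations

import shutil
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric name and path, chosen only for this sketch.
METRIC = "demo_disk_free_bytes"
WATCHED_PATH = "/"


def collect() -> str:
    """Read the target (local disk usage) and convert it to the text format."""
    free = shutil.disk_usage(WATCHED_PATH).free
    return (
        f"# HELP {METRIC} Free disk space in bytes.\n"
        f"# TYPE {METRIC} gauge\n"
        f'{METRIC}{{path="{WATCHED_PATH}"}} {free}\n'
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def start_exporter(host: str, port: int) -> HTTPServer:
    """Serve /metrics in a background (daemon) thread and return the server."""
    server = HTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_exporter("0.0.0.0", 9999)` (port 9999 is an arbitrary choice) from a long-running process and adding `host:9999` as a scrape target would let Prometheus pull this value on every scrape.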

A large number of exporters are available for Prometheus. The guide below explains the multi-target exporter pattern:

https://prometheus.io/docs/guides/multi-target-exporter/#understanding-and-using-the-multi-target-exporter-pattern

For example, to probe microservice endpoints over HTTP(S) we can use blackbox_exporter, and to monitor Linux machines we can use node_exporter. These exporters expose the metrics over HTTP (node_exporter on /metrics, blackbox_exporter on /probe), and we then configure the Prometheus server to scrape (pull) the metrics from those URLs and add the data to its time series DB.

Better still, these exporters are available as Docker images. For example, if MySQL is running in a container inside a K8s cluster, we can deploy its exporter (mysqld_exporter) as a sidecar container in the same pod as MySQL, connect it to MySQL and expose the data to the Prometheus server from the K8s cluster.

Now let's talk about monitoring our own application. Prometheus offers client libraries in many languages, e.g. C, C++, Java and Python, which we can add to our application to expose a /metrics endpoint for Prometheus to scrape. This gives us data such as how many requests and exceptions occurred and how much server resource is used.

https://prometheus.io/docs/instrumenting/clientlibs/#client-libraries
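Client libraries work by wrapping your code so that every request updates in-process metrics, which are then served on /metrics. The official Python library (prometheus_client) provides ready-made Counter, Gauge, Histogram and Summary classes for this; the stdlib-only sketch below merely mimics the idea with a decorator so the mechanics are visible. The class and metric names (`RequestMetrics`, `app_requests_total`, ...) are hypothetical.

```python
from __future__ import annotations

import functools
import threading
import time


class RequestMetrics:
    """Hypothetical stand-in for the metrics a client library keeps in memory."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.requests_total = 0
        self.exceptions_total = 0
        self.latency_seconds_sum = 0.0

    def instrument(self, func):
        """Decorator: count calls and exceptions, and accumulate latency."""

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                with self._lock:
                    self.exceptions_total += 1
                raise
            finally:
                elapsed = time.perf_counter() - start
                with self._lock:
                    self.requests_total += 1
                    self.latency_seconds_sum += elapsed

        return wrapper

    def render(self) -> str:
        """Expose the current values in the Prometheus text format."""
        with self._lock:
            return (
                "# TYPE app_requests_total counter\n"
                f"app_requests_total {self.requests_total}\n"
                "# TYPE app_exceptions_total counter\n"
                f"app_exceptions_total {self.exceptions_total}\n"
                "# TYPE app_request_latency_seconds summary\n"
                f"app_request_latency_seconds_sum {self.latency_seconds_sum}\n"
                f"app_request_latency_seconds_count {self.requests_total}\n"
            )
```

Decorating a request handler with `@metrics.instrument` and serving `metrics.render()` on /metrics (for example with a small HTTP handler) is the whole pattern; a real client library adds labels, thread-safe primitives and the HTTP server for you.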

One of the best aspects of Prometheus is that it uses a pull mechanism rather than push. With push, the application can become a bottleneck: it has the additional task of pushing its data, which uses its resources and has its own cost, and we also need to deploy a pushing daemon alongside each service. With pull, applications simply keep doing their job and Prometheus pulls the data according to its configuration.
Additionally, pulling lets Prometheus confirm that the application it is pulling from is up and running.
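The pull model and the "is it up?" check can be sketched in a few lines of Python. This is a simplified model, not how the Prometheus server is implemented: `scrape()` fetches a target's /metrics page, parses the sample lines, and records `up` = 1 on success or 0 when the target cannot be reached, mirroring the `up` metric Prometheus generates for every scrape. The parser is deliberately minimal (it skips comments and does not handle label values containing `}`).

```python
from __future__ import annotations

import urllib.request


def parse_exposition(text: str) -> dict[str, float]:
    """Parse 'name{labels} value [timestamp]' lines; comments are skipped."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, rest = line.split(None, 1)
        samples[series] = float(rest.split()[0])  # ignore an optional timestamp
    return samples


def scrape(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Pull one target once, like a single Prometheus scrape."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            samples = parse_exposition(resp.read().decode("utf-8"))
    except (OSError, ValueError, IndexError):
        # Unreachable target, HTTP error or malformed page: record it as down.
        return {"up": 0.0}
    samples["up"] = 1.0  # the scrape succeeded, so the target is up
    return samples
```

Prometheus repeats this for every configured target at each scrape interval and stores the results, so a target that stops answering shows up immediately as `up == 0`.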

However, some short-lived jobs, such as batch jobs, are better suited to push, and for these Prometheus provides the Pushgateway component. The job pushes its metrics to the Pushgateway, which holds them and exposes them for Prometheus to scrape, so Prometheus does not have to catch the short-lived job while it is still running.
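A batch job pushes to the Pushgateway over plain HTTP: the Pushgateway accepts metrics in the text format at `/metrics/job/<job_name>` (its default port is 9091). The sketch below is a minimal stdlib-only Python illustration, assuming a Pushgateway at the given URL; the job and metric names are hypothetical. The official Python client also ships a `push_to_gateway` helper that does this for you.

```python
from __future__ import annotations

import urllib.parse
import urllib.request


def push_metrics(gateway_url: str, job: str, metrics_text: str, timeout: float = 5.0) -> int:
    """POST text-format metrics to <gateway_url>/metrics/job/<job>; return the HTTP status."""
    if not metrics_text.endswith("\n"):
        metrics_text += "\n"  # the text format requires a final newline
    url = f"{gateway_url}/metrics/job/{urllib.parse.quote(job, safe='')}"
    request = urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

At the end of a batch run you might call `push_metrics("http://localhost:9091", "nightly_backup", "backup_last_success_timestamp_seconds 1617700000")`. With POST, only metrics with the same names in that group are replaced; PUT replaces the whole group.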

How to configure Prometheus.

First, we need to tell Prometheus which targets to pull from.
Second, at what interval it should pull.

Both are configured in the prometheus.yml file. To find the targets, Prometheus uses target discovery, either a static list in the file or a service discovery mechanism (such as Kubernetes service discovery), and then starts pulling from them.

The default prometheus.yml already contains some default values, organized into these sections:

1- global:- sets how often Prometheus fires its pull (scrape) requests, via scrape_interval. It also holds evaluation_interval, which controls how often rules are evaluated.
2- rule_files:- rule files are used either to aggregate (precompute) results or to fire an alert to the Alertmanager when something triggers, e.g. CPU or storage usage above 80%. The evaluation_interval setting tells Prometheus how often to execute these rules and generate alerts if required.
3- scrape_configs:- lists the targets Prometheus monitors, each with a job name and host:port. By default the list contains Prometheus itself, because Prometheus exposes its own metrics on /metrics and can therefore be monitored for health. We can add more targets here, e.g. a MySQL exporter or node_exporter.
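To make the three sections concrete, the Python helper below writes a minimal prometheus.yml with global, rule_files and scrape_configs. It is only a sketch: the rule file name `alert_rules.yml` and the `node` job are assumptions (9090 is Prometheus's default port and 9100 is node_exporter's default port), and in practice you would usually edit the YAML file by hand.

```python
from __future__ import annotations


def render_prometheus_config(
    scrape_interval: str,
    evaluation_interval: str,
    rule_files: list[str],
    jobs: dict[str, list[str]],
) -> str:
    """Return the YAML text of a minimal prometheus.yml."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",  # how often targets are pulled
        f"  evaluation_interval: {evaluation_interval}",  # how often rules run
        "",
        "rule_files:",
    ]
    lines += [f'  - "{path}"' for path in rule_files]
    lines += ["", "scrape_configs:"]
    for job_name, targets in jobs.items():
        quoted = ", ".join(f'"{target}"' for target in targets)
        lines += [
            f'  - job_name: "{job_name}"',
            "    static_configs:",
            f"      - targets: [{quoted}]",
        ]
    return "\n".join(lines) + "\n"
```

For example, `render_prometheus_config("15s", "15s", ["alert_rules.yml"], {"prometheus": ["localhost:9090"], "node": ["localhost:9100"]})` yields a file with Prometheus monitoring itself plus one node_exporter target.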

As mentioned, Prometheus sends alerts. But which component actually does this? Prometheus has the Alertmanager for this purpose: Prometheus pushes an alert to the Alertmanager when some criteria are met (as defined in rule_files), and the Alertmanager is responsible for delivering it to end users by email, SMS, pager, Slack, etc.
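How a rule turns into an alert can be modelled in a few lines. In Prometheus an alerting rule has a PromQL condition (`expr`) and an optional `for` duration: while the condition holds but the duration has not yet passed, the alert is pending; after that it is firing and is sent to the Alertmanager. The Python sketch below simulates this for a hypothetical HighCpuUsage rule (CPU above 80% for 30 seconds, evaluated every 15 seconds). It models the state machine only and does not talk to a real Alertmanager.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Simplified alerting rule: `value > threshold` must hold for `for_seconds`."""

    name: str
    threshold: float
    for_seconds: float
    state: str = "inactive"
    pending_since: Optional[float] = None

    def evaluate(self, now: float, value: float) -> str:
        if value > self.threshold:
            if self.pending_since is None:
                self.pending_since = now  # condition just started to hold
            held_for = now - self.pending_since
            self.state = "firing" if held_for >= self.for_seconds else "pending"
        else:
            self.pending_since = None  # condition cleared: reset
            self.state = "inactive"
        return self.state


def run_rule(rule: ThresholdRule, samples: list[float], evaluation_interval: float) -> list[tuple[float, str, float]]:
    """Evaluate the rule once per interval; collect what would go to the Alertmanager."""
    sent = []
    for i, value in enumerate(samples):
        now = i * evaluation_interval
        if rule.evaluate(now, value) == "firing":
            sent.append((now, rule.name, value))
    return sent
```

With CPU samples 50, 85, 90, 95, 70 taken every 15 seconds, the rule goes inactive, pending, pending, firing, inactive: the short spike at 85% alone never reaches the Alertmanager, which is exactly what the `for` duration is meant to filter out.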

Additionally, Prometheus stores the metrics data in its time series database on the local disk (HDD or SSD) of the machine where it is installed, or it can send the data to a remote database, e.g. in the cloud. Since this is time series data, you cannot run RDBMS (SQL) queries on it; instead Prometheus provides PromQL, served through its HTTP server.

PromQL queries are sent from the Prometheus UI to the HTTP (API) server to query the time series data and fetch the result.
Admittedly, learning PromQL is a topic of its own, but you can avoid most of it by using Grafana as a data visualization tool on top of Prometheus. Grafana shows you nice dashboards and, under the hood, executes PromQL to fetch the results. We will talk about Grafana in another lecture.
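Because counters only ever go up, PromQL queries on them usually ask for a rate, e.g. `rate(http_requests_total[5m])`. The Python sketch below shows the core idea on a list of (timestamp, value) samples: sum the increases, treating any drop as a counter reset, and divide by the elapsed time. It is a simplification; the real `rate()` also extrapolates to the edges of the range window.

```python
from __future__ import annotations


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (the process restarted and
    the counter started again from zero). No extrapolation is applied.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    duration = samples[-1][0] - samples[0][0]
    return increase / duration
```

For samples (0, 100), (15, 130), (30, 160), (45, 10), (60, 40), the counter reset at t=45 is counted as an increase of 10, giving a total increase of 100 over 60 seconds, i.e. about 1.67 requests per second.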

So, in short, Prometheus is a good, reliable monitoring and alerting tool that works independently: it does not depend on any third-party tool, it keeps working even when other services break down, and it is easy to set up. Prometheus is fully compatible with Docker and K8s: its components are available as Docker images and are easy to deploy on K8s. Once Prometheus is deployed on K8s, it can start monitoring each K8s node with little additional configuration.

It also has limitations: it is difficult to scale. With a large number of nodes in the environment, it becomes hard for one server to monitor them all, so we either need to increase the Prometheus server's capacity or reduce the amount of metrics data collected.
