Thursday, May 13, 2021

Fluent Bit log collector and processor

As per the official site, Fluent Bit is a logs and metrics processor. It can collect logs and process or transform them to match the needs of the output system, in a highly resource- and memory-efficient way. So in short, Fluent Bit is an open source log processor and forwarder which allows you to collect any data like metrics and logs from different sources, enrich them with filters and send them to multiple destinations. It is the preferred choice for containerized environments like Kubernetes.

Fluent Bit is designed with performance in mind: high throughput with low CPU and memory usage. It is written in the C language and has a pluggable architecture supporting more than 70 extensions for inputs, filters and outputs. Logs can come from many places, i.e. application logs, system logs, database logs, mobile logs, metrics logs, access logs, etc.

Due to these characteristics, Fluent Bit is the most preferred log collection, processing and forwarding tool for Kubernetes (K8s) environments. Thanks to its architecture it is easily installed on standalone physical machines, virtual machines, the cloud, Kubernetes and other environments.

It also supports metrics collection, e.g. Linux system data such as CPU usage, memory usage and storage usage.

Its three main qualities are:

1- Lightweight
It has been designed as a lightweight solution with high performance in mind. From a design perspective it is fully asynchronous (event-driven) and takes full advantage of the operating system's APIs for performance and reliability.

2- Extensible
All inputs, filters and outputs features are implemented through the plugins interface. Extend the features with C, Lua (filters) or Golang (outputs).

3- Metrics
Measuring is important. Fluent Bit comes with native plugins to gather metrics from your CPU, memory, disk I/O and network usage on Linux systems. In addition, it can receive metrics from external services like StatsD and collectd.

Features:-

Event Driven
Fluent Bit as a service is fully event-driven; it only uses asynchronous operations to collect and deliver data. In other words, it reacts to events and performs operations asynchronously, so there is no need to wait for an operation to complete; if needed, a callback confirms that the operation is done.

Flexible Routing
The data that comes into the pipeline can be routed to multiple places using custom routing rules, shipping your data to multiple destinations with a zero-copy strategy. In practice, Fluent Bit routes data from different systems to different output systems using tags and matching: you tag the records at the input stage and, at the output stage, match those tags to route them, e.g. Node.js logs go to Elasticsearch while Spring Boot logs go to Splunk, as in the sketch below.
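As a rough sketch of this (paths, hostnames and the token are placeholders, not from any real setup), tag-based routing could look like this:

[INPUT]
    Name tail
    Path /var/log/nodejs/*.log
    Tag  nodejs.app

[INPUT]
    Name tail
    Path /var/log/springboot/*.log
    Tag  springboot.app

# Node.js logs go to Elasticsearch...
[OUTPUT]
    Name  es
    Match nodejs.*
    Host  elasticsearch.example.com
    Port  9200

# ...while Spring Boot logs go to Splunk
[OUTPUT]
    Name         splunk
    Match        springboot.*
    Host         splunk.example.com
    Port         8088
    Splunk_Token YOUR-HEC-TOKEN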

Configuration
Its configuration is very simple and human-readable; it specifies how Fluent Bit will behave, which features to enable and how routing is performed. Optionally, Fluent Bit can run from the command line without a configuration file. In Kubernetes, for example, we can tell Fluent Bit through the pod's YAML that we want to use fluentbit.io/parser: apache, which parses that pod's logs with the Apache parser (parsing means converting unstructured data into structured data). A minimal configuration example follows.
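For illustration, a minimal, human-readable configuration could look like this (the file path and tag are just examples):

# Read a log file and print the records to stdout
[SERVICE]
    Flush     5
    Log_Level info

[INPUT]
    Name tail
    Path /var/log/syslog
    Tag  sys

[OUTPUT]
    Name  stdout
    Match sys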

I/O Handler
The input/output layer provides an abstraction that allows read/write operations to be performed asynchronously.

Upstream Manager
When dealing with upstream backends like remote network services, handling TCP connections can be challenging. The upstream manager simplifies the connection process and takes care of timeouts, network exceptions and keepalive states.

Security & TLS
When delivering data to destinations, output connectors inherit full TLS capabilities in an abstracted way. Add your certificates as required.
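As a minimal sketch (host and port are placeholders), turning on TLS for an output is typically just a few extra keys:

[OUTPUT]
    Name       es
    Match      *
    Host       elasticsearch.example.com
    Port       9243
    tls        On
    tls.verify On
    # Add your certificate if required, e.g.:
    # tls.ca_file /etc/ssl/certs/my-ca.pem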

So now we understand the following:

1- Fluent Bit is a log collector and processor tool that collects logs from sources and converts them into the format required by the output/destination system.
2- While doing this it consumes few resources (less CPU, memory and storage), i.e. it is resource-efficient and performance-oriented.
3- It is especially suitable for Kubernetes environments, where applications run in containers with many replica sets, across different nodes and clusters, so it is essential to understand where a log is coming from (application –> pod –> replica –> node, etc.). Fluent Bit makes this easy through configuration in YAML files.

Now let's look into the architecture of Fluent Bit and how it works technically. The whole process is divided into the following stages:

https://docs.fluentbit.io/manual/concepts/data-pipeline

Input –> Parser –> Filter –> Buffer –> Routing –> Output system, e.g. Elasticsearch, Splunk, etc.

Let's discuss each section in detail.

1- Input:- These are the input plugins that gather information from different sources. There are many input plugins available for Fluent Bit: if you have to read data from a log file you need the log file reader (tail) plugin, and likewise if you want to read data from TCP you need the TCP plugin, and so on. Fluent Bit also has input plugins for metrics data collection; for example, it supports StatsD.
2- Parser:- After the logs are collected and read, they are taken to the next step, the parser. This is where the log is processed: depending on the type of log, it is parsed differently using a different parser. In short, parsers are used to convert unstructured data into structured data/messages. The plugin needs a parser definition which describes how to parse each field, so in this step the parser plugin reads the message and, per that definition, turns it into a structured format (see the sketch after the links below).

Refer to this for more documentation:
https://docs.fluentbit.io/manual/pipeline/filters/parser
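Putting input and parser together, a minimal sketch (the log path is illustrative, and apache2 is one of the parsers shipped in Fluent Bit's default parsers.conf, which is assumed to be loaded):

# Read an Apache access log...
[INPUT]
    Name tail
    Path /var/log/apache2/access.log
    Tag  apache.access

# ...and parse the raw "log" field into structured fields
[FILTER]
    Name     parser
    Match    apache.access
    Key_Name log
    Parser   apache2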

3- Filter :- There are tons of filters available. They are basically used for the following purposes (two sketches follow the list):

a:- Modify the record :- They are used to modify records, e.g. change the format of a specific field.
b:- Enrich the record :- Filters can be used to change the log record or even add some additional metadata to it, like the pod ID or the namespace the log is coming from, and so on.
c:- Drop the record :- You can also use filters to drop or ignore some records.
d:- Custom Lua script :- To make the filtering even more flexible, Fluent Bit lets you use custom Lua scripts as filters to modify and process records.
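Two sketches of this (the tag, field names and values are illustrative):

# Drop the record: exclude records whose "log" field contains DEBUG
[FILTER]
    Name    grep
    Match   app.*
    Exclude log DEBUG

# Enrich the record: add a hostname key to every record
[FILTER]
    Name   record_modifier
    Match  app.*
    Record hostname web01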

Refer to these for more documentation:


https://docs.fluentbit.io/manual/concepts/data-pipeline/filter


https://docs.fluentbit.io/manual/pipeline/filters

4- Buffer :- This is one of the best parts of Fluent Bit. Buffering gives it reliability: records are buffered in memory and, optionally, on the filesystem so that data is not lost on backpressure or restarts. On top of this, Fluent Bit has a unique advanced feature called SQL stream processing, which allows users to write SQL queries on the logs or metrics to do aggregations, calculations and even time series predictions. This is useful if you need to calculate an average, max or min before sending the data to storage, count the number of times a message appears, or aggregate data. The best part about SQL stream processing is that no database and no indices are required; everything runs in memory within the same lightweight, high-performance process.
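As a sketch of both sides (paths and names are illustrative, and it assumes a cpu input tagged my_cpu as in the routing example below):

# Main configuration: buffer on the filesystem as well as in memory,
# and load the stream processing tasks
[SERVICE]
    storage.path /var/log/flb-storage/
    Streams_File streams.conf

[INPUT]
    Name         cpu
    Tag          my_cpu
    storage.type filesystem

# streams.conf: average CPU usage over a five-second tumbling window
[STREAM_TASK]
    Name avg_cpu
    Exec CREATE STREAM avg_cpu WITH (tag='cpu.avg') AS SELECT AVG(cpu_p) FROM TAG:'my_cpu' WINDOW TUMBLING (5 SECOND);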

5- Routing :- This routes the data as per the configuration. Routing is a core feature that allows you to route your data through filters and finally to one or multiple destinations. The router relies on the concept of tags and matching rules.

For example:

[INPUT]
    Name cpu
    Tag  my_cpu

[INPUT]
    Name mem
    Tag  my_mem

[OUTPUT]
    Name  es
    Match my_cpu

[OUTPUT]
    Name  stdout
    Match my_mem

Routing works automatically by reading the input Tags and the output Match rules. If some data has a tag that doesn't match at routing time, the data is deleted. In the example above, any record whose tag matches 'Match my_cpu' is forwarded to the output named es. We can also use wildcards in Match rules, as below.
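For instance, a single wildcard rule can capture both tags above:

# Matches both my_cpu and my_mem
[OUTPUT]
    Name  stdout
    Match my_*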

6- Output:- The output interface allows us to define destinations for the data. Common destinations are remote services, the local file system or standard interfaces. Outputs are implemented as plugins and there are many available; e.g. we can use the Splunk output plugin to send the data to Splunk in the format it requires and display it on its UI. A sketch of a local file destination follows.
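As a sketch of a local file system destination (the directory is illustrative):

# Write all records to files in a local directory
[OUTPUT]
    Name  file
    Match *
    Path  /var/log/flb-output/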

https://docs.fluentbit.io/manual/pipeline/outputs

In short…

Input:- The way to gather data from your sources
Parser:- Convert Unstructured to Structured messages
Filter:- Modify, Enrich or Drop your records
Buffer:- Data processing with reliability
Router:- Create flexible routing rules
Output:- Destinations for your data: databases, cloud services and more!

Input plugin ————> Fluent Bit ———–> Output plugin
The input plugin converts the input log and gives it to Fluent Bit; after performing the operations above, Fluent Bit hands it to the output plugin, which finally delivers it to the external system.

One of the biggest advantages of Fluent Bit is its use in Kubernetes. It is deployed as a DaemonSet, i.e. it starts as soon as a node starts, begins reading and collecting logs from the containers on that node, and also gathers data from the Kubernetes API server (pod ID, container ID, namespace, etc.). On the application side this can be steered just by adding one line to the deployment YAML, the annotation fluentbit.io/parser: apache.
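On the Fluent Bit side this relies on the Kubernetes filter; a minimal sketch (the match tag is illustrative), where K8S-Logging.Parser On is what makes the fluentbit.io/parser annotation work:

# Enrich records with pod, namespace and container metadata from the
# Kubernetes API server, and honor the fluentbit.io/parser pod annotation
[FILTER]
    Name               kubernetes
    Match              kube.*
    Merge_Log          On
    K8S-Logging.Parser On

With this in place, annotating a pod with fluentbit.io/parser: apache tells Fluent Bit to parse that pod's logs with the apache parser.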

Due to its pluggable architecture it can easily integrate with, rather than replace, the base system: it uses its plugins to take the data in and produce the output required by the target system.

Visit the site below for more information on the input, filter and output plugins of Fluent Bit.

https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file

Additionally, Fluent Bit supports secure transfer of the log data to the external system through its output plugins, e.g. transferring logs to a destination such as Splunk over TLS/HTTPS. It also scales easily since, as a DaemonSet, it runs on each and every node as soon as the node starts.

Before closing, let's do a quick comparison between Fluent Bit and Fluentd. Both belong to the same family, but Fluent Bit is smaller in size (around 560 KB versus around 40 MB for Fluentd). Fluent Bit also has no dependencies, whereas Fluentd is built in Ruby and hence needs some gems to run. Fluent Bit's performance is much better than Fluentd's, with fewer resources.

In the next article we will build a running example of the EFK stack, i.e. Elasticsearch, Fluent Bit and Kibana, using a Spring Boot microservice.
