Avoiding Telemetry Data Loss With Fluent Bit - DZone - Uplaza

Are you ready to get started with cloud-native observability with telemetry pipelines?

This article is part of a series exploring a workshop guiding you through the open source project Fluent Bit: what it is, a basic installation, and setting up the first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases covering collection, aggregation, transformation, and forwarding from any source to any destination.

In the previous article in this series, we explored what backpressure is, how it manifests in telemetry pipelines, and took the first steps to mitigate it with Fluent Bit. In this article, we look at how to enable Fluent Bit features that help avoid the telemetry data loss we saw in the previous article.

You can find more details in the accompanying workshop lab.

Before we get started, it's important to review the phases of a telemetry pipeline. In the diagram below we see them laid out again. Each incoming event goes from input to parser to filter to buffer to routing before it is sent to its final output destination(s).

For clarity in this article, we’ll split up the configuration into files that are imported into a main Fluent Bit configuration file we’ll name workshop-fb.conf.

Tackling Data Loss

Previously, we explored how input plugins can hit their ingestion limits when our telemetry pipelines scale beyond memory limits while using the default in-memory buffering of our events. We also saw that we can limit the size of our input plugin buffers to prevent our pipeline from failing on out-of-memory errors, but that pausing ingestion can also lead to data loss if clearing the input buffers takes too long.

To rectify this problem, we’ll explore another buffering solution that Fluent Bit offers, ensuring data and memory safety at scale by configuring filesystem buffering.

To that end, let’s explore how the Fluent Bit engine processes data that input plugins emit. When an input plugin emits events, the engine groups them into a Chunk. The chunk size is around 2MB. The default is for the engine to place this Chunk only in memory.

We saw that limiting the in-memory buffer size didn’t solve the problem, so we are modifying this default behavior of only placing chunks into memory. This is done by changing the property storage.type from the default memory to filesystem.

It's important to understand that the memory and filesystem buffering mechanisms are not mutually exclusive. By enabling filesystem buffering for our input plugin we automatically get both performance and data safety.

Filesystem Buffering Tips

When changing our buffering from memory to filesystem with the property storage.type filesystem, the settings for mem_buf_limit are ignored.

Instead, we need to use the property storage.max_chunks_up to control the size of our memory buffer. Surprisingly, with the default settings the property storage.pause_on_chunks_overlimit is set to off, so the input plugins do not pause. Instead, input plugins switch to buffering only in the filesystem. We can control the amount of disk space used with storage.total_limit_size.

If the property storage.pause_on_chunks_overlimit is set to on, then buffering to the filesystem behaves just like our mem_buf_limit scenario demonstrated previously.
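
As a sketch of where these properties live, the snippet below writes a small configuration that keeps filesystem buffering but turns pausing on and caps disk usage. The helper name write_overlimit_example, the file name overlimit-example.conf, and the 50M cap are assumptions made for illustration and are not part of the workshop. Placing storage.pause_on_chunks_overlimit in the INPUT section and storage.total_limit_size in the OUTPUT section reflects our reading of the Fluent Bit documentation, so check it against the version you run.

```shell
#!/bin/sh
# Hypothetical helper: writes an example config into the directory given as $1.
write_overlimit_example() {
  cat > "$1/overlimit-example.conf" <<'EOF'
[INPUT]
  Name   dummy
  Tag    big.data
  # Buffer chunks on disk as well as in memory.
  storage.type filesystem
  # Pause ingestion once storage.max_chunks_up chunks are in memory
  # (the default, off, keeps ingesting and buffers on disk only).
  storage.pause_on_chunks_overlimit on

[OUTPUT]
  Name  stdout
  Match *
  # Assumed cap on the disk space used by chunks queued for this output.
  storage.total_limit_size 50M
EOF
}

# Write the example into the current directory and show it.
write_overlimit_example .
cat ./overlimit-example.conf
```

The workshop itself leaves both properties at their defaults; this variant only shows how the trade-off between pausing ingestion and filling the disk could be expressed.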

Configuring Stressed Telemetry Pipeline

In this example, we’re going to use the same stressed Fluent Bit pipeline to simulate a need for enabling filesystem buffering. All examples are shown using containers (Podman) and it's assumed you are familiar with container tooling such as Podman or Docker.

We begin the configuration of our telemetry pipeline in the INPUT phase with a simple dummy plugin generating a large number of entries to flood our pipeline, as follows in our configuration file inputs.conf (note that the mem_buf_limit fix is commented out):

# This entry generates a large amount of success messages for the workshop.
[INPUT]
  Name   dummy
  Tag    big.data
  Copies 15000
  Dummy  {"message":"true 200 success", "big_data": "blah blah blah blah blah blah blah blah blah"}
  #Mem_Buf_Limit 2MB

Now ensure the output configuration file outputs.conf has the following configuration:

# This entry directs all tags (it matches any we encounter)
# to print to standard output, which is our console.
[OUTPUT]
  Name  stdout
  Match *

With our inputs and outputs configured, we can now bring them together in a single main configuration file. Using a file called workshop-fb.conf in our favorite editor, ensure the following configuration is created. For now, just import two files:

# Fluent Bit main configuration file.
#
# Imports section.
@INCLUDE inputs.conf
@INCLUDE outputs.conf

Let’s now try testing our configuration by running it using a container image. The first thing that’s needed is to ensure a file called Buildfile is created. This is going to be used to build a new container image and insert our configuration files. Note this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we’ll build a new container image, naming it with a version tag as follows using the Buildfile and assuming you are in the same directory:

$ podman build -t workshop-fb:v8 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v8
--> e74b2f228729
Successfully tagged localhost/workshop-fb:v8
e74b2f22872958a79c0e056efce66a811c93f43da641a2efaa30cacceb94a195

If we run our pipeline in a container configured with constricted memory (in our case, we need to give it around a 6.5MB limit), then we’ll see the pipeline run for a bit and then fail due to overloading (OOM):

$ podman run --memory 6.5MB --name fbv8 workshop-fb:v8

The console output shows that the pipeline ran for a bit; in our case, shown below, it reached event number 862 before hitting the OOM limits of our container environment (6.5MB):

...
[860] big.data: [[1716551898.202389716, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[861] big.data: [[1716551898.202389925, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[862] big.data: [[1716551898.202390133, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[863] big.data: [[1

We can validate that the stressed telemetry pipeline actually failed with an OOM status by inspecting our container for an OOM failure:

# Use the container name to inspect for reason it failed
$ podman inspect fbv8 | grep OOM

 "OOMKilled": true,

Already having tried in a previous lab to manage this with mem_buf_limit settings, we’ve seen that this also is not the real fix. To prevent data loss we need to enable filesystem buffering so that overloading the memory buffer means that events will be buffered in the filesystem until there is memory free to process them.

Using Filesystem Buffering

The configuration of our telemetry pipeline in the INPUT phase needs a slight adjustment: we add storage.type, set to filesystem, as shown. Note that mem_buf_limit has been removed:

# This entry generates a large amount of success messages for the workshop.
[INPUT]
  Name   dummy
  Tag    big.data
  Copies 15000
  Dummy  {"message":"true 200 success", "big_data": "blah blah blah blah blah blah blah blah blah"}
  storage.type filesystem

We can now bring it all together in the main configuration file. Using the file called workshop-fb.conf in our favorite editor, update it so that a SERVICE section is added with settings for managing the filesystem buffering:

# Fluent Bit main configuration file.
[SERVICE]
  flush 1
  log_level info
  storage.path /tmp/fluentbit-storage
  storage.sync normal
  storage.checksum off
  storage.max_chunks_up 5

# Imports section
@INCLUDE inputs.conf
@INCLUDE outputs.conf

A few words on the SERVICE section properties might be needed to explain their function:

storage.path – Putting filesystem buffering in the tmp filesystem
storage.sync – Using normal and turning off checksum processing
storage.max_chunks_up – Set to 5 chunks (~10MB at around 2MB per chunk), the amount of memory allowed for events
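
The ~10MB figure follows from the chunk size mentioned earlier: the number of chunks allowed in memory multiplied by roughly 2MB per chunk. The sketch below does that arithmetic; the function name estimate_chunk_memory_mb is hypothetical, and the result is only a rough upper bound since real chunks vary in size.

```shell
#!/bin/sh
# Rough upper bound, in MB, on memory held by chunks that are "up" in memory.
# $1 = value of storage.max_chunks_up
# $2 = assumed chunk size in MB (defaults to 2, the approximate size noted above)
estimate_chunk_memory_mb() {
  chunks_up="$1"
  chunk_mb="${2:-2}"
  echo $(( $chunks_up * $chunk_mb ))
}

estimate_chunk_memory_mb 5     # the workshop setting, prints 10
estimate_chunk_memory_mb 128   # Fluent Bit's documented default, prints 256
```

With the default left in place, the in-memory portion alone could far exceed the 9MB container limit used in this example, which is why the value is lowered to 5 here.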

Now it's time to test our configuration by running it using a container image. The first thing that’s needed is to ensure a file called Buildfile is created. This is going to be used to build a new container image and insert our configuration files. Note this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we’ll build a new container image, naming it with a version tag, as follows using the Buildfile and assuming you are in the same directory:

$ podman build -t workshop-fb:v9 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v9
--> e74b2f228729
Successfully tagged localhost/workshop-fb:v9
e74b2f22872958a79c0e056efce66a811c93f43da641a2efaa30cacceb94a195

If we run our pipeline in a container configured with constricted memory, in our case around a 9MB limit (a slightly larger value due to the memory needed for mounting the filesystem), then we’ll see the pipeline running without failure:

$ podman run -v ./:/tmp --memory 9MB --name fbv9 workshop-fb:v9

The console output shows that the pipeline runs until we stop it with CTRL-C, with events rolling by as shown below.

...
[14991] big.data: [[1716559655.213181639, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
[14992] big.data: [[1716559655.213182181, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
[14993] big.data: [[1716559655.213182681, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
...

We can now validate the filesystem buffering by looking at the filesystem storage. Check the filesystem from the directory where you started your container. While the pipeline is running with memory restrictions, it will be using the filesystem to store events until memory is free to process them. If you view the contents of the file before stopping your pipeline, you'll see a messy message format stored inside (cleaned up for you here):

$ ls -l ./fluentbit-storage/dummy.0/1-1716558042.211576161.flb

-rw-------  1 username  groupname   1.4M May 24 15:40 1-1716558042.211576161.flb

$ cat fluentbit-storage/dummy.0/1-1716558042.211576161.flb

??wbig.data???fP??
?????message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?p???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
߲???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?F???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?d???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
...
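
To keep an eye on how much is being buffered while the pipeline runs, a small script like the following could be used. It is a minimal sketch: the function names and the STORAGE_DIR variable are hypothetical, and the default path assumes the ./fluentbit-storage directory produced by the volume mount used above.

```shell
#!/bin/sh
# Count the chunk files (*.flb) under a storage directory.
count_chunk_files() {
  find "$1" -type f -name '*.flb' | wc -l | tr -d ' '
}

# Report the disk usage of a storage directory in kilobytes.
storage_usage_kb() {
  du -sk "$1" | cut -f1
}

# Path assumed from the volume mount used above; override with STORAGE_DIR.
storage_dir="${STORAGE_DIR:-./fluentbit-storage}"
if [ -d "$storage_dir" ]; then
  echo "chunk files: $(count_chunk_files "$storage_dir")"
  echo "disk usage:  $(storage_usage_kb "$storage_dir") KB"
else
  echo "no storage directory at $storage_dir"
fi
```

Running it repeatedly from the directory where the container was started gives a rough picture of whether the filesystem buffer is growing or draining.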

Last Thoughts on Filesystem Buffering

This solution is the way to deal with backpressure and other issues that might flood your telemetry pipeline and cause it to crash. It's worth noting that using a filesystem to buffer the events also introduces the limits of the filesystem being used.

It's important to understand that just as memory can run out, so too can the filesystem storage reach its limits. It's best to have a plan to address any potential filesystem challenges when using this solution, but that is outside the scope of this article.

This completes our use cases for this article. Be sure to explore this hands-on experience with the accompanying workshop lab.

What’s Next?

This article walked us through how Fluent Bit filesystem buffering provides a data- and memory-safe solution to the problems of backpressure and data loss.

Stay tuned for more hands-on material to help you with your cloud-native observability journey.
