Log aggregation and the journey to optimized logs

CTO at Rookout

July 10, 2020

Ever experienced bad logging, whether it’s the wrong log, the wrong information, or any of a multitude of other logging woes? We’ve lost count of the times we’ve happily added log lines, only to find out that it was all for naught. What is meant to be magic for your code, the ultimate savior when debugging, has become the ultimate frustration. The pitfalls of logging are many, but there are ways to avoid them. Let’s take a look at what makes logging so challenging and how you can optimize your logs to your maximum benefit.

The logging rabbit hole

It seems that log aggregation systems aren’t the be-all and end-all that we make them out to be. Yes, they’re awesome, but as developers we often find ourselves dealing with a phenomenon known as “Logging FOMO”: the fear of missing out on logs that might be needed at some point in the future, which drives devs to add more and more logs. Many of these log lines are unnecessary, driving up costs and wasting time that developers could be spending on other projects. So how do we make sure we’re not logging needlessly?

The 3 stages of logging

Developers go through three stages when logging. They begin at the ‘no logging’ stage. Here, as you may have accurately guessed, they write no log lines. The common mentality is that “logging is for losers” and unnecessary because they know their code best. Unfortunately (or not, depending on how much you enjoy watching people learn life lessons), these devs soon find out just how wrong they were. At some point they wake up in the middle of the night to find that something broke and, worse, that they have no log lines to give them the data to fix it.

After that painful experience, the dev reaches the ‘bad logging’ stage of their career. While the quality of their logs is nothing to write home about, it’s definitely better than writing no logs at all.

Once the dev has learned to write logs, albeit not great ones, comes the final ‘good logging’ stage. Here, they have finally learned that quality logs not only boost productivity and efficiency but, more importantly, serve as a source of data for understanding their code.

Maintaining lessons learned

Staying in the ‘good logging’ stage is no easy feat, so here are a few tricks that can help. First, remember to write your logs not for yourself, but for the person who has never encountered your code before. This lets them understand it, even out of context. Next, keep the surprises in your code to a bare minimum. Log the unexpected so you don’t have to be THAT person, pretending you like surprises when everyone knows you almost ran out of the room at your last surprise party.

After surprises, try to include as much relevant data as possible. There’s no such thing as too many relevant details; every piece of data can be critical when it comes to problem solving. The last trick is to remember that it’s okay to add and remove logs. Logging is dynamic, and if you want to minimize the noise and increase visibility, it’s often necessary to do just this. To make it easier to adjust what and when to log, application logging frameworks such as log4j have been developed. These frameworks let you assign a severity level to each log line, so that you can turn logging verbosity down during normal execution of your application and turn it up when you need more information or have to debug an issue.
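To make the idea of severity levels concrete, here is a minimal sketch using Python’s standard logging module rather than log4j. The logger name, the process_order function, and the messages are made up for illustration; the point is only that the same code emits more or less detail depending on the configured level.

```python
import io
import logging

# Capture output in memory so the effect of each level is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

# "checkout" is a hypothetical logger name used only for this sketch.
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.propagate = False


def process_order(order_id, items):
    # Detailed data goes to DEBUG, the summary goes to INFO.
    logger.debug("order %s items=%r", order_id, items)
    logger.info("order %s processed (%d items)", order_id, len(items))


# Normal operation: only INFO and above are emitted.
logger.setLevel(logging.INFO)
process_order(1, ["book"])

# While investigating an issue: lower the threshold to DEBUG.
logger.setLevel(logging.DEBUG)
process_order(2, ["pen", "ink"])

output = stream.getvalue()
print(output)
```

The first call produces only the INFO summary, while the second also includes the DEBUG line with the full item list.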

Although keeping as much relevant data as possible is extra helpful when debugging challenging issues, the sheer volume of logging data collected can grow very quickly. This increases management overhead as well as the costs of storing and handling that data. We’ll address this later in this post.

Logging money pits

Despite the importance and value of good logging, the actual cost can be prohibitive at times. There are a variety of ways in which you can aggregate your logs to make them easier to manage. If you use a SaaS offering, such as Logz.io, Splunk, or Datadog, you can often find yourself shelling out anywhere from hundreds of dollars to hundreds of thousands, and that’s just per month.

If you’re running a self-hosted log aggregation solution instead, whether in your data center or the cloud, you might find that while your licensing costs are lower, your Total Cost of Ownership ends up higher.

Logs require a lot of storage, as they bring in a large amount of data about your applications. Not only that, but we need fast storage media such as SSDs to run fast queries, plus additional storage capacity for the indices. Then there is the ingestion work of building those indices, since developers want the data to be available as fast as possible. Processing queries and holding some of these indices also requires a significant amount of memory and CPU. Tally all of these up and you get a black hole of logs into which valuable resources are thrown.

Manage your logs, manage the money

You now find yourself at a fork in the road. How do you get a grasp on the sheer number of logs and the money they’re draining without getting rid of important data? One way is to reduce logging volume: figure out which logs are taking up the most space, then trim or remove them.
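As a rough sketch of how you might find the logs that take up the most space, the Python snippet below groups log lines by message template and sums their size in bytes. The sample lines and the volume_by_template helper are hypothetical, and collapsing every number into a placeholder is a simplifying assumption; real log formats usually need a more careful way to recover the template.

```python
import re
from collections import Counter


def volume_by_template(lines):
    """Return (template, total_bytes) pairs, largest first."""
    totals = Counter()
    for line in lines:
        # Collapse numbers so "user 17" and "user 42" share one template.
        template = re.sub(r"\d+", "<n>", line)
        totals[template] += len(line.encode("utf-8"))
    return totals.most_common()


# Made-up sample lines, for illustration only.
sample = [
    "INFO cache hit for user 17",
    "INFO cache hit for user 42",
    "INFO cache hit for user 99",
    "ERROR payment 5 failed",
]

report = volume_by_template(sample)
for template, size in report:
    print(size, template)
```

Templates at the top of the report are the first candidates for trimming, demoting to a lower level, or removing.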

Choosing your path

Once you pinpoint the logs that are taking up too much space, it’s time to decide what to do with them. For logs created by your own applications, there are a few options. Large variables, such as buffers and collections, don’t need to be logged in their entirety; log just the information that is most relevant to your debugging workflow. Alternatively, a log that is too detailed can be replaced with one that is shorter and more to the point. Take advantage of logging verbosity levels (INFO, ERROR, DEBUG, etc.) and ensure you are logging at the right level. For logs created by third-party applications, check their logging configuration (whether you can define verbosity or select the events you need), and use drop filters in your log aggregator or log processor to remove excess logs.
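Two of these options can be sketched in a few lines of Python. The summarize helper logs the size and first few items of a large collection instead of the whole thing, and the DropNoisy filter is an in-process stand-in for a drop filter (a real aggregator or log processor has its own configuration syntax for this). The logger names, including "demo.chatty_lib", are invented for the example.

```python
import io
import logging


def summarize(value, limit=3):
    """Describe a large collection by its size and first few items."""
    items = list(value)
    if len(items) <= limit:
        return repr(items)
    return f"{len(items)} items, first {limit}: {items[:limit]!r}"


class DropNoisy(logging.Filter):
    """Drop records coming from a hypothetical chatty third-party logger."""

    def filter(self, record):
        return not record.name.startswith("demo.chatty_lib")


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
handler.addFilter(DropNoisy())

# Both child loggers below propagate to this parent's handler.
base = logging.getLogger("demo")
base.setLevel(logging.INFO)
base.addHandler(handler)
base.propagate = False

logging.getLogger("demo.app").info("batch=%s", summarize(range(1000)))
logging.getLogger("demo.chatty_lib").info("heartbeat")

output = stream.getvalue()
print(output)
```

Only the application’s line reaches the output, and it carries a short description of the batch rather than a thousand values.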

Moving forward

If you are using structured logging, keep in mind that it will increase your overall logging volume. Enriched logs are going to make your life so much better, but if you aren’t careful they can easily balloon every single record by a factor of ten or more. A surprising amount of volume can be saved by culling insignificant or rarely used fields from your metadata.

Once you’ve done this, check that the important metadata fields are efficiently represented in whichever format you’re using. Use short forms of values, keep field names brief, and avoid unnecessary padding. All of these will help optimize your logs moving forward.
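Here is a small Python sketch of what that trimming could look like for a JSON log record. Every field name, the short names they map to, and the one-letter level codes are assumptions chosen for illustration; which fields are worth keeping depends entirely on your own logs.

```python
import json

# Hypothetical mapping from verbose field names to short ones.
# Fields that are not listed here are culled from the record.
KEEP = {"timestamp": "ts", "level": "lvl", "message": "msg", "request_id": "rid"}

# Hypothetical short forms for level values.
LEVEL_CODES = {"DEBUG": "D", "INFO": "I", "WARNING": "W", "ERROR": "E"}


def compact(record):
    """Keep mapped fields, shorten names and values, serialize without padding."""
    slim = {short: record[full] for full, short in KEEP.items() if full in record}
    if "lvl" in slim:
        slim["lvl"] = LEVEL_CODES.get(slim["lvl"], slim["lvl"])
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    return json.dumps(slim, separators=(",", ":"))


# Made-up record with a few rarely used metadata fields.
record = {
    "timestamp": "2020-07-10T12:00:00Z",
    "level": "WARNING",
    "message": "disk usage high",
    "request_id": "a1b2",
    "hostname": "web-01",
    "thread_name": "worker-7",
    "sdk_version": "1.4.2",
}

verbose = json.dumps(record, indent=2)
small = compact(record)
print(len(verbose), len(small))
print(small)
```

The trade-off is readability: short names and codes save bytes on every record, so it helps to document the mapping somewhere your team can find it.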

Take one step further down your logging path, and you’ll find the ability to archive. While many logs become less useful as time passes (older data simply tends to be less relevant) you may need to keep them for security, compliance, or regulatory purposes. If so, rather than throwing your money away with high-cost log aggregation services, archive your logs into cold storage services like Amazon S3 Glacier that allow you to load them back up when you need them.

Logging Optimized

Simply put, logging can be challenging. What happens if you don’t have any logs in an area of code that breaks? What if you discover a defect in your application, but your logs don’t give you any useful information? Luckily, there are tools that help you avoid these possibilities by giving you the ability to augment your existing logging data with additional data collected on demand from any place in your application without stopping or restarting it.

Rookout lets you set Non-Breaking Breakpoints in your production code and lets you get any data you need, all with just one click, in seconds. There’s no need to write extra code or to restart or redeploy your app, and it works across all cloud providers and platforms. It also lets you send your data to any third-party logging or monitoring system you might be using. In addition, any personally identifiable information (PII) in your data can be easily redacted to ensure no sensitive data leaves your network.

This enables developers to change the way they think about and implement logging. It allows developers to take a second look at their logs and decide which logs are necessary as well as adjust what they are logging.

Don’t let your log be the straw that broke the camel’s back. Optimize your logs for maximum efficiency and see how much time you can save yourself. On that note, did someone say it’s time for another relaxing cup of coffee?