Logging: What, Why and When

There are multiple PostgreSQL configuration parameters that determine what exactly gets logged. In Postgres 16, there are 30 of them if you exclude the ones related to log file naming and location. We rarely change them when we create a new instance. In the best-case scenario, your organization has a default set of configuration parameters that is applied whenever a new instance is created.

Why are we usually so ignorant of these parameters? My hypothesis is that it’s because we rarely use Postgres logs for anything. There are many extensions and third-party tools to monitor different Postgres activities and to alert on high CPU or I/O, too many connections, growing disk usage, table bloat, and so on. None of them use information from Postgres logs. Even when we want to monitor query performance changes, we use pg_stat_statements, not logs.

If anybody reading this blog uses Postgres logs “as is” on a regular basis, please let me know! pgBadger fans, please hold off; we will get there!

In all my Postgres years, every case in which I, or anybody I know, used Postgres logs was an investigation: trying to find something specific that happened at a particular moment or period of time, never regular monitoring. That’s why, in the majority of cases, we do not care. But if we truly “do not care,” why do any logging at all? One commonly applied default for these log-related settings is log_min_duration_statement=10, which means that Postgres will record every statement that takes 10 or more milliseconds.

On the surface, this makes sense: we want to record only those SQL statements that exceed the response-time threshold, so that when an application experiences slowness we can investigate what exactly causes the problem. However, this kind of logging does not help us determine whether we are looking at a sudden spike, whether the execution time was always just below 10 ms and finally reached the magic number, or even what portion of similar queries is slow.
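
For illustration, here is a minimal sketch of how such a threshold is typically applied (the ALTER SYSTEM approach is just one way to do it; the same line can live in postgresql.conf):

-- Sketch: log only statements that run for 10 ms or longer.
ALTER SYSTEM SET log_min_duration_statement = '10ms';
SELECT pg_reload_conf();  -- apply without a restart
-- A 12 ms statement is now logged with its text and duration;
-- a 9 ms one leaves no trace at all, so the log alone cannot tell us
-- how many similar queries stay just below the threshold.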

Anybody who has ever been on support for PostgreSQL databases has had a moment when they were paged because Postgres was running out of space for its logs. The default reaction is to increase the space, but I believe the first thing to do is to check what is being logged. Quite often, we would see that the reason for the log growth is that the number of queries with execution time over 10 ms has significantly increased. At this point, I would ask whether anybody cared about this logging, or more specifically, whether anybody would ever look into these logs at all.
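
A quick way to answer that question is to look at the current values of the logging parameters before reaching for more disk space; a minimal sketch:

-- Sketch: what is being logged right now, and where each value comes from.
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'log%'
ORDER BY name;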

When we use Postgres logs for an investigation and want to know what happened during a particular session, we need to see all issued statements, not just the ones that “exceed 10 ms.” With many other logging parameters, the situation is similar. We might want to turn log_connections on to record which users connected to the database and when, but unless we also record what was done during the session, there is not much value in this information.

Of course, there are cases when you need to investigate a specific problem, such as autovacuum not doing its job or replication issues, and then you will turn on the corresponding parameters. But in the general case of “let’s figure out what’s going on inside my database,” only full logging will be helpful. However, when we turn all logging on, the size of the log might skyrocket, and the problem will not be how much space you need but how you are going to process these massive volumes and extract valuable information.

At this point, we all should collectively thank Gilles Darold for pgBadger. For many years, pgBadger has held the honor of being the most widely used tool for investigating performance problems and the “weapon of choice” for many. I am not an exception: I love it, and I use it often. It has proved to be instrumental not only for investigations but also for demonstrating to application developers what they are doing wrong. However, to get the results you need, you have to enable full logging. The pgBadger README lists all the logging that should be turned on to start collecting the information:

log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h'
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default
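
These lines can go straight into postgresql.conf, but if you prefer not to edit the file by hand, here is a sketch of applying the same settings with ALTER SYSTEM (parameters such as log_connections only affect sessions started after the reload):

-- Sketch: enable the pgBadger-recommended logging without editing postgresql.conf.
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h';
ALTER SYSTEM SET log_min_duration_statement = 0;
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();  -- no restart needed for these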

My current logging philosophy is:

Log almost nothing by default; turn on almost everything when you need to collect data for pgBadger.

And if you can set up prompt log removal (and the log size in general is manageable), just have full logging on all the time.
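
And when the collection window is over, reverting to the quieter configuration is just as simple; a sketch, assuming the settings were applied with ALTER SYSTEM as above:

-- Sketch: remove the temporary full-logging overrides once the data is collected.
ALTER SYSTEM RESET log_min_duration_statement;
ALTER SYSTEM RESET log_connections;
ALTER SYSTEM RESET log_disconnections;
-- ...and so on, or drop every ALTER SYSTEM override at once:
-- ALTER SYSTEM RESET ALL;
SELECT pg_reload_conf();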

Best thing for last

A couple of months ago, some of my peers noticed that my name had appeared in pgBadger commits. No, I didn’t contribute anything, and yes, I still can’t program, but I had a dream!

With all of the great features of pgBadger, there were two that I really wanted and that were missing: the ability to analyze logs my own way (group, aggregate, slice, and dice however I want) and the ability to trace individual user sessions. Being a database person, I thought that the only way to accomplish this was to load the logs into a database, and for that I needed “raw” logs, ones that had already been parsed but not analyzed.

Long story short, the option --dump-raw-csv does exactly that. To take full advantage of this new option, I process each log twice: first I generate a regular pgBadger report, and then I create a raw log in CSV format, which I can load into a database table. In reality, it’s not as easy and straightforward as it might sound, but I am working on automating this process, and as soon as it is done, you will hear more!
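
To give an idea of that last step, here is a minimal sketch of loading such a file into a table. The table name and the column list are purely illustrative, not the actual layout produced by --dump-raw-csv, so adjust them to match the header of the file you actually get:

-- Hypothetical staging table; the columns are illustrative and must match the CSV.
CREATE TABLE pgbadger_raw_log (
    log_time    timestamptz,
    pid         integer,
    db_user     text,
    db_name     text,
    app_name    text,
    client_addr text,
    duration_ms numeric,
    query       text
);

-- Server-side load; use psql's \copy instead if the file lives on the client.
COPY pgbadger_raw_log FROM '/path/to/raw_log.csv' WITH (FORMAT csv, HEADER true);

From there, grouping, aggregating, and tracing individual sessions becomes plain SQL over that table.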
