Plot Files

One of collectl's main features is its ability to generate files in a ready-to-plot format that is compatible with what gnuplot expects. There are actually 2 main types of files that it generates. The first, which has an extension of tab, represents a table of all the summary data. What makes this file unique is that all data elements are in a fixed set of columns: some columns may get added over time, but for all intents and purposes the set of data for, say, CPUs does not change regardless of how many CPUs are in the system. The second type of file deals with detail data, the amount of which changes with the number of instances, so a 4 CPU system will have 1/2 the data an 8 CPU system has. There is one file for each type of detail data.

Plot files can be generated in 2 ways, either on-the-fly as the data is collected or later by playing back RAW files, and each has its own advantages as well as disadvantages.

At first glance, it sounds like you'd always want to generate plot files directly since you avoid the need for the conversion step, but you should also realize a few things about this methodology:

Generating Plot Files On-The-Fly

While generating files this way is as easy as appending -P to the collectl command either when run interactively or in /etc/collectl.conf, there are a couple of things to keep in mind:

Generating Plot Files from RAW Files

Collectl can play back a single file or multiple ones, but in either case the first thing it does is examine the raw file header to get the source host name and creation date. There will always be a new set of data generated for each unique combination of host and creation date. Note that depending on the subsystems chosen there may be multiple output files generated. This also means a single raw file that spans multiple dates will result in a single set of data.

By default, the name of the plot file contains only the date, and a test is made to see if a file with that name already exists. If not, it is created in append mode. This means that multiple raw data files for the same host on the same date will result in a single set of data. However, if that file already exists, collectl will NOT process any data and will instead ask you to specify -oc, which tells it to perform the first open in create mode so that subsequent files can be appended. If you specify -oa, all files will be appended to the original one, which may not be what you want. Collectl cannot read your mind, so to be safe, be explicit. If you want to generate a unique set of data files for each raw file, use -ou, which causes the time to be included in file names, resulting in a unique output file name for each raw file.
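As a rough illustration of the three choices above, the sketch below builds one playback command per option. The directories, host name and date are hypothetical, and it assumes -p (playback), -P (plot format) and -f (output location) are combined with -oc, -oa or -ou as just described. Commands are only printed unless RUN=1 is set, so they can be reviewed before anything is overwritten.

```shell
#!/bin/sh
# Sketch only: directories and raw file names below are hypothetical.
RAW_DIR=${RAW_DIR:-/data/raw}
PLOT_DIR=${PLOT_DIR:-/data/plot}

# Print the command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# $1 is the -o variant, $2 the raw file pattern. The pattern stays quoted
# so it is handed to collectl unexpanded, as in the document's own examples.
playback() {
    run collectl -p "$RAW_DIR/$2" -P -f "$PLOT_DIR" "$1"
}

playback -oc 'host1-20240115*'   # first open in create mode, later files appended
playback -oa 'host1-20240115*'   # append everything to the existing plot file
playback -ou 'host1-20240115*'   # time in the name, one set of files per raw file
```

Only one of the three calls would normally be used for a given run; they are shown together purely for comparison.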

Generating plot files from raw files certainly maximizes your flexibility for all the reasons listed earlier. However, this now puts the responsibility of managing your data more squarely on your shoulders. Some of the questions you need to answer include:

Having answered these questions and perhaps others, it now just becomes a matter of executing the appropriate copy and/or collectl commands, which are relatively easy to script.

TIP - If you rsync raw files to another server and then process them using a wildcard in your playback command, you will probably end up processing some of today's files too! If you then later copy over the rest of today's file(s), you will need to recreate today's plot file since collectl will not overwrite an existing file by default. But if you specify the -oc switch with a wildcard, you will end up recreating all the plot files, which will result in a lot more processing than you were planning on. Collectl supports a special syntax that allows you to play back just the files from yesterday: include the string YESTERDAY in the playback file name and collectl replaces it with yesterday's date, as in the following:

collectl -p "YESTERDAY*" etc...
Note that YESTERDAY must be written in all uppercase characters, and you can include other characters in the string, such as a host name, if need be.
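A nightly job built around this tip might look something like the sketch below. The server name, directories and host name are all hypothetical, and it assumes that the YESTERDAY string may be preceded by a directory and a host name within the playback pattern. As written it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch of a nightly job; server name, directories and host are hypothetical.
SRC=${SRC:-server1:/var/log/collectl/}
RAW_DIR=${RAW_DIR:-/data/raw}
PLOT_DIR=${PLOT_DIR:-/data/plot}
NODE=${NODE:-host1}

# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

nightly() {
    # Copy whatever raw files exist so far, including today's partial ones.
    run rsync -a "$SRC" "$RAW_DIR/" || return 1
    # Play back only yesterday's files. The pattern is quoted so the shell
    # leaves both YESTERDAY and the wildcard for collectl to interpret.
    run collectl -p "$RAW_DIR/$NODE-YESTERDAY*" -P -f "$PLOT_DIR"
}

nightly
```

Because today's partial files are never played back, there is no need for -oc and no plot files get recreated.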

TIP - If you want to create multiple sets of plot files from the same raw file, you can always include a unique qualifier along with the directory name with the -f switch to give each set a different prefix.
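For example, the following sketch prints two playback commands for the same raw file, each writing to the same directory under a different prefix. The raw file name, the directory, and the cpu and disk qualifiers are hypothetical, and the -sc and -sd subsystem selections are only an assumption about what each set should contain.

```shell
#!/bin/sh
# Sketch only: the raw file, directory and qualifiers are hypothetical.
RAW_FILE=${RAW_FILE:-/data/raw/host1-20240115-000000.raw.gz}
PLOT_DIR=${PLOT_DIR:-/data/plot}

# Emit one playback command per qualifier:subsystem pair.
plot_sets() {
    for pair in cpu:c disk:d; do
        prefix=${pair%%:*}   # text before the colon, used as the file prefix
        subsys=${pair##*:}   # text after the colon, passed to -s
        echo "collectl -p $RAW_FILE -P -f $PLOT_DIR/$prefix -s$subsys"
    done
}

plot_sets            # review the commands first
# plot_sets | sh     # then run them
```

Since each set gets its own prefix, the second playback does not collide with the plot files written by the first.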