4. Caching and named samples

This example demonstrates caching and named samples. Caching ensures that monitoring never overloads a service. Named samples allow logical grouping of related monitoring data from different targets.

Configuration file

As before, copy the configuration file below as the file /etc/monamid.d/example.conf, overwriting any existing file.

 ##
 ##  MonAMI by Example, Section 4
 ##

# Our root filesystem
[filesystem]
 name = root-fs
 location = /
 cache = 2 ❶

# Our /home filesystem
[filesystem]
 name = home-fs
 location = /home
 cache = 2s

# Bring together information about the two partitions
[sample]
 name = partitions  ❷
 read = root-fs, home-fs
 cache = 10

# Update our snapshot every ten seconds
[sample]
 read = partitions  ❸
 write = snapshot
 interval = 10

# Once a minute, send data to a log file.
[sample]
 read = partitions.root-fs.capacity.available, \ ❹
        partitions.home-fs.capacity.available
 write = filelog
 interval = 1m  ❺

# A file containing current filesystem information
[snapshot]
 filename = /tmp/monami-fs-current

# A permanent log of a few important metrics
[filelog] ❻
 filename = /tmp/monami-fs-log

Some points of interest:

❶ The cache attribute specifies a guaranteed minimum delay between successive requests for information. Here, there will always be at least two seconds between consecutive requests.

  The value is a time-period: one or more words that specify how long the period should be. This is the same format as the sample interval attribute, so “5m” is five minutes and “1h 30m” is an hour and a half.

❷ As with all targets, this name must be unique.

❸ This sample reads all available metrics from the partitions target. To gather this information, the partitions target will read from the two filesystem targets.

❹ Sometimes attribute lines can get quite long. To make them easier to read and edit, a long line can be broken into multiple shorter lines, provided the last character of each line to be continued is a backslash (\).

❺ This interval is deliberately short to allow quick gathering of information. For normal use a much longer interval would be more appropriate.

❻ The filelog plugin creates a file, if it does not already exist, and appends a new line for each datatree it receives. It is a simple method of archiving monitoring information.
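The time-period format described above can be sketched in a few lines of Python. This is only an illustration of the notation, not MonAMI's own parser: the function name period_to_seconds is hypothetical, and the unit table is limited to the s, m and h suffixes that appear in this section. A bare number is treated as seconds, in line with "cache = 2" being described as two seconds.

```python
import re

# Assumed unit table: only "s", "m" and "h" appear in this section.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_WORD = re.compile(r"(\d+)([smh]?)")


def period_to_seconds(text):
    """Convert a time-period such as "1h 30m" into a number of seconds."""
    words = text.split()
    if not words:
        raise ValueError("empty time-period")
    total = 0
    for word in words:
        match = _WORD.fullmatch(word)
        if match is None:
            raise ValueError(f"unrecognised time-period word: {word!r}")
        count, unit = match.groups()
        # A bare number is treated as seconds, as in "cache = 2".
        total += int(count) * _UNIT_SECONDS[unit or "s"]
    return total


if __name__ == "__main__":
    for value in ("2", "2s", "10", "1m", "5m", "1h 30m"):
        print(f"{value!r:10} -> {period_to_seconds(value)} s")
```

Under these assumptions, "cache = 2" and "cache = 2s" in the configuration above describe the same two-second period.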

Running MonAMI

With this example, you should leave MonAMI running for a few minutes. Whilst it is running, you can check that data is being appended to the log file (/tmp/monami-fs-log) correctly using, for example, the cat program.

The exact contents depend on which version of MonAMI you are using and on the current state of your partitions, but the file /tmp/monami-fs-current should look similar to:

"partitions.root-fs.fragment size"      "1024" (B) [every 10s]
"partitions.root-fs.blocks.size"        "1024" (B) [every 10s]
"partitions.root-fs.blocks.total"       "264445" (blocks) [every 10s]
"partitions.root-fs.blocks.free"        "142771" (blocks) [every 10s]
"partitions.root-fs.blocks.available"   "129118" (blocks) [every 10s]
"partitions.root-fs.capacity.total"     "258.24707" (MiB) [every 10s]
"partitions.root-fs.capacity.free"      "139.424805" (MiB) [every 10s]
"partitions.root-fs.capacity.available" "126.091797" (MiB) [every 10s]
"partitions.root-fs.capacity.used"      "118.822266" (MiB) [every 10s]
"partitions.root-fs.files.used" "68272" (files) [every 10s]
"partitions.root-fs.files.free" "56294" (files) [every 10s]
"partitions.root-fs.files.available"    "56294" (files) [every 10s]
"partitions.root-fs.flag"       "0" () [every 10s]
"partitions.root-fs.namemax"    "255" () [every 10s]
"partitions.home-fs.fragment size"      "4096" (B) [every 10s]
"partitions.home-fs.blocks.size"        "4096" (B) [every 10s]
"partitions.home-fs.blocks.total"       "16490546" (blocks) [every 10s]
"partitions.home-fs.blocks.free"        "3699442" (blocks) [every 10s]
"partitions.home-fs.blocks.available"   "2861754" (blocks) [every 10s]
"partitions.home-fs.capacity.total"     "64416.195312" (MiB) [every 10s]
"partitions.home-fs.capacity.free"      "14450.945312" (MiB) [every 10s]
"partitions.home-fs.capacity.available" "11178.726562" (MiB) [every 10s]
"partitions.home-fs.capacity.used"      "49965.25" (MiB) [every 10s]
"partitions.home-fs.files.used" "8388608" (files) [every 10s]
"partitions.home-fs.files.free" "8008117" (files) [every 10s]
"partitions.home-fs.files.available"    "8008117" (files) [every 10s]
"partitions.home-fs.flag"       "0" () [every 10s]
"partitions.home-fs.namemax"    "255" () [every 10s]

The file /tmp/monami-fs-log should look like:

#       time            partitions.root-fs.capacity.available   partitions.home-fs.capacity.available
2007-10-03 11:12:59     126.091797      11178.707031
2007-10-03 11:13:59     126.091797      11178.703125
2007-10-03 11:14:59     126.091797      11178.703125
2007-10-03 11:15:59     126.091797      11178.710938
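As a rough way of checking the log beyond looking at it with cat, the following Python sketch reads lines in the layout shown above. It is not part of MonAMI; the function name parse_filelog is hypothetical, and the code assumes that columns are separated by whitespace and that the logged metric names contain no spaces (true for the two metrics logged here, but not for every metric in the snapshot file).

```python
def parse_filelog(lines):
    """Return (metric_names, rows) from filelog-style text lines."""
    names = []
    rows = []
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header line: "time" followed by one column per metric.
            names = line[1:].split()[1:]
            continue
        fields = line.split()
        # The first two fields are the date and the time of day.
        timestamp = f"{fields[0]} {fields[1]}"
        values = [float(field) for field in fields[2:]]
        rows.append((timestamp, values))
    return names, rows


if __name__ == "__main__":
    sample = [
        "#       time            partitions.root-fs.capacity.available"
        "   partitions.home-fs.capacity.available",
        "2007-10-03 11:12:59     126.091797      11178.707031",
        "2007-10-03 11:13:59     126.091797      11178.703125",
    ]
    names, rows = parse_filelog(sample)
    for timestamp, values in rows:
        for name, value in zip(names, values):
            print(timestamp, name, value)
```

To run it against the real log, pass an open file object for /tmp/monami-fs-log in place of the sample list.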

Named sample targets

A named sample target is simply a sample target that has a name attribute specified. In contrast, a sample without any specified name attribute is an anonymous sample. All the samples in previous sections are anonymous.

The main use for named samples is to allow grouping of monitoring data. Suppose you wanted to monitor several aspects of a service: for example, count active TCP connections, watch the application's use of the database, and count the number of daemons running. You may, for ease of handling, want to build a single datatree containing the combined set of metrics. A named sample allows you to do this.

Naming a sample also allows other targets (such as anonymous samples) to request monitoring data from it. In effect, a named sample can be used like a simple monitoring target (such as the root-fs target above).
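The grouping performed by a named sample can be pictured with nested dictionaries. The Python sketch below is only a model of the idea, not MonAMI's internal representation: the function names named_sample and flatten are hypothetical, and only a few of the filesystem metrics are included. It shows how placing the root-fs and home-fs datatrees under a sample called partitions yields metric paths such as partitions.root-fs.capacity.available.

```python
def flatten(tree, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def named_sample(name, sources):
    """Group the datatrees of several targets under one named branch."""
    return {name: dict(sources)}


if __name__ == "__main__":
    # A small subset of the metrics each filesystem target reports.
    root_fs = {"capacity": {"available": 126.091797, "used": 118.822266}}
    home_fs = {"capacity": {"available": 11178.726562, "used": 49965.25}}

    tree = named_sample("partitions", {"root-fs": root_fs, "home-fs": home_fs})
    for path, value in flatten(tree):
        print(path, value)
```

A target that reads from partitions sees the whole combined tree, whilst a target that reads partitions.root-fs.capacity.available selects a single leaf of it.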

What's in a name?

In fact, anonymous sample sections do have a name: their name is assigned automatically when MonAMI starts. However, you should never use this name or need to know it. If you find you need to collect data from an anonymous sample, simply give the target a name.

Note that, although not illustrated in the above example, named samples will honour the interval attribute. This allows them to provide periodic monitoring information (in common with anonymous samples) whilst simultaneously allowing other targets to request information at other times.

Caching

Monitoring will always incur some cost (computational, memory and sometimes storage and network bandwidth usage). Sometimes this cost is sufficiently high that we might want to rate-limit any queries so, for example, a service is never monitored more than once every minute.

Within MonAMI, this is achieved with the cache attribute. You can configure any target to cache gathered metrics for a period. In the above example, metrics from the partitions named sample are cached for ten seconds. If one of the anonymous samples had its interval attribute set to less than 10 seconds, its requests would not each trigger a gathering of fresh data. Instead, it would receive the previous (cached) result until the ten-second cache had expired.
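The effect of the cache attribute can be modelled with a small Python class. This is a sketch of the behaviour described above rather than MonAMI's implementation: the class name CachedTarget is hypothetical, time is passed in explicitly so the example runs instantly, and it assumes that a cached result counts as expired once exactly the cache period has elapsed.

```python
class CachedTarget:
    """A data source that is gathered at most once per cache period."""

    def __init__(self, gather, cache):
        self._gather = gather   # callable that does the expensive work
        self._cache = cache     # minimum delay between gathers, in seconds
        self._stamp = None      # time of the last real gather
        self._value = None      # result of the last real gather

    def read(self, now):
        """Return data for a request made at time `now` (in seconds)."""
        if self._stamp is None or now - self._stamp >= self._cache:
            # Cache is empty or has expired: gather fresh data.
            self._value = self._gather()
            self._stamp = now
        return self._value


if __name__ == "__main__":
    count = 0

    def gather():
        global count
        count += 1
        return {"gather": count}

    # Ten-second cache, as on the partitions sample above.
    partitions = CachedTarget(gather, cache=10)

    # A hypothetical reader with "interval = 5" asks every five seconds.
    for now in range(0, 30, 5):
        print(f"t={now:2d}s", partitions.read(now))
```

In this model the reader makes six requests over the half minute, but the underlying data is gathered only three times; the other three requests are answered from the cache.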

Default caching policy

By default MonAMI will cache all results for one second. Since MonAMI monitoring frequency (the interval attribute) has a granularity of one second, this default cache will not be noticed when a target obtains data. However, if multiple targets request data from the same target at almost the same time (to within a second), the default cache ensures all the requests receive data from a single datatree.

Note that the cache attribute works for sample targets, as demonstrated in the above example. Caching targets with different cache-intervals allows a conservative level of caching for the bulk of the monitoring activity whilst retaining the possibility of adding more frequent monitoring.

Some monitoring plugins will report a different set of metrics over time; the structure of the datatree then changes as the number of reported metrics varies. Most often this happens when the service being monitored changes availability (when a service “goes down” or “comes up”), although some services report additional metrics once they have stabilised. The Apache HTTP server is an example; after an initial delay, it provides a measure of bandwidth usage.

When a change in a datatree structure is detected, MonAMI will invalidate all its internal caches that use this datatree; independent caches are left unaffected. Subsequent requests to a target for fresh data will gather new data, either freshly cached or direct from the monitoring target. This allows the new structure to propagate independently of the cache attributes.
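One way to picture this invalidation is to compare the set of metric paths in successive datatrees. The Python sketch below is an assumption about how such a check could work, not a description of MonAMI's internals: the names structure and StructureWatcher, the use of plain dictionaries as caches, and the metric names in the example are all hypothetical.

```python
def structure(tree, prefix=""):
    """Return the set of dotted metric paths in a nested-dict datatree."""
    paths = set()
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= structure(value, path)
        else:
            paths.add(path)
    return paths


class StructureWatcher:
    """Clear dependent caches when a datatree's set of metrics changes."""

    def __init__(self, caches):
        self._caches = caches   # dicts standing in for dependent caches
        self._last = None       # metric paths seen in the previous datatree

    def observe(self, tree):
        """Record a new datatree; return True if its structure changed."""
        shape = structure(tree)
        changed = self._last is not None and shape != self._last
        self._last = shape
        if changed:
            for cache in self._caches:
                cache.clear()
        return changed


if __name__ == "__main__":
    dependent = {"latest": "stale result"}
    unrelated = {"latest": "other service"}   # not registered, so untouched
    watcher = StructureWatcher([dependent])

    # Values change but the set of metrics does not: caches are kept.
    print(watcher.observe({"workers": {"busy": 3}}))
    print(watcher.observe({"workers": {"busy": 4}}))
    # A new metric appears: the dependent cache is cleared.
    print(watcher.observe({"workers": {"busy": 4}, "bandwidth": 1.5}))
    print(dependent, unrelated)
```

Only the caches registered with the watcher are cleared, which mirrors the statement that independent caches are left unaffected.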