3.3. Features common across plugins

Some features are common to all of the plugins. Rather than repeat the same information under each plugin's description, the information is presented here.

3.3.1. The name attribute

Each distinct service has a separate stanza within the configuration file, using the plugin name. Considering the apache monitoring plugin (which monitors an Apache HTTP webserver) as an example, one can monitor multiple Apache webservers with several separate [apache] stanzas: one for each monitoring target. To illustrate this, the following configuration describes how to monitor an intranet web server and an external web server.

[apache]
 name = external-webserver
 host = www.example.org

[apache]
 name = internal-webserver
 host = www.intranet.example.org

Each target must have a unique name. It is possible to specify the name a target will adopt with the name attribute (as in the above example). If no name attribute is given, the target takes the name of the plugin by default. However, since all names must be unique, only one target can adopt the default name: all subsequent targets (from this plugin) must have their name specified explicitly using the name attribute.

Although specifying a name is optional, it is often useful to set a name explicitly (preferably to something meaningful). Simple configuration files will work fine without explicitly specifying target names, whilst configuration files describing more complex monitoring requirements will likely fail unless they have explicitly named targets.

If there is an ambiguity (due to different targets having the same name) MonAMI will attempt to monitor as much as possible (to “degrade gracefully”) but some loss of functionality is inevitable.
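The naming rule can be sketched in Python as follows. This is a minimal illustration, not MonAMI's code: it assumes each stanza has already been parsed into a hypothetical (plugin, attributes) pair, and it only reports clashing names without modelling how MonAMI degrades gracefully.

```python
from collections import Counter

def resolve_target_names(stanzas):
    """Return each target's effective name plus any names used more than once.

    `stanzas` is a list of (plugin, attributes) pairs, where `attributes`
    is a dict built from one configuration stanza (hypothetical format).
    """
    names = []
    for plugin, attributes in stanzas:
        # Without an explicit name attribute, the target adopts the plugin name.
        names.append(attributes.get("name", plugin))
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    return names, duplicates

config = [
    ("apache", {"name": "external-webserver", "host": "www.example.org"}),
    ("apache", {"host": "www.intranet.example.org"}),  # default name: "apache"
    ("apache", {"host": "www.other.example.org"}),     # also "apache": a clash
]
names, clashes = resolve_target_names(config)
print(names)    # ['external-webserver', 'apache', 'apache']
print(clashes)  # ['apache']
```

Only the second and third stanzas clash; giving either of them a name attribute removes the ambiguity.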

3.3.2. The cache attribute

Acquiring the current status of a service will inevitably take resources (such as CPU time and perhaps disk space) away from the service. For some services this effort is minimal, for others it is more substantial. Whatever the burden, there will be some monitoring frequency above which monitoring will impact strongly on service provision.

To prevent overloading a service, the results from querying a service are stored within MonAMI for a period. If there is a subsequent request for the current state of the target within that period then the stored results are used rather than directly querying the underlying service: the results are cached.

The cache retention period is adjustable for each target and can be set with the cache attribute. The cache attribute value is the time for which data is retained, or (equivalently) the guaranteed minimum time between successive queries to the underlying service.

The value is specified using the standard time-interval notation: one or more numbers, each followed by a single-letter modifier. The modifiers are s, m, h and d for seconds, minutes, hours and days respectively. If a modifier is omitted, seconds are assumed. The total cache retention period is the sum of the individual values. For example, 5m 10s is five minutes and ten seconds and is equivalent to specifying 310.
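As a rough sketch of the notation, the hypothetical function below converts a time-interval string into seconds. It is not taken from MonAMI's source, and it assumes that components are separated by whitespace (as in 5m 10s).

```python
import re

# Seconds per single-letter modifier; a bare number means seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(text):
    """Convert time-interval notation such as "5m 10s" into seconds."""
    tokens = text.split()  # assumption: components are whitespace-separated
    if not tokens:
        raise ValueError("empty time interval")
    total = 0
    for token in tokens:
        match = re.fullmatch(r"(\d+)([smhd]?)", token)
        if match is None:
            raise ValueError(f"bad time-interval component: {token!r}")
        number, unit = match.groups()
        total += int(number) * UNIT_SECONDS[unit or "s"]
    return total

print(parse_interval("5m 10s"))  # 310
print(parse_interval("310"))     # 310
print(parse_interval("1m"))      # 60
```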

In the following example configuration file, the MySQL queries are cached for a minute whilst the Apache queries are cached for 2 seconds:

[apache]
 host = www.example.org
 cache = 2

[mysql]
 host = mysql-serv.example.org
 user = monami
 password = monami-secret
 cache = 1m

If no cache retention period is specified, a default value of one second is used. Since MonAMI operates at a granularity of one second, this default has no apparent effect on individual monitoring activity, yet it ensures that targets are queried no more often than once a second.
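The caching behaviour can be sketched with a small Python class. It is illustrative only: CachedTarget, its query callable and the injectable clock are hypothetical names, and treating a new query as due once exactly the retention period has elapsed is an assumption.

```python
import time

class CachedTarget:
    """Serve repeated requests from a cache for `retention` seconds.

    `query` is any zero-argument callable that contacts the real service;
    `clock` defaults to time.monotonic but can be replaced for testing.
    """

    def __init__(self, query, retention=1.0, clock=time.monotonic):
        self.query = query
        self.retention = retention  # one second, mirroring MonAMI's default
        self.clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self.retention)
        if expired:
            # Only here is the underlying service actually queried.
            self._value = self.query()
            self._fetched_at = now
        return self._value

query_count = 0

def query_service():
    """Stand-in for an expensive query to the monitored service."""
    global query_count
    query_count += 1
    return {"busy_workers": 3}

now = [0.0]
apache = CachedTarget(query_service, retention=60, clock=lambda: now[0])
apache.get()          # first request: queries the service
now[0] = 30.0
apache.get()          # within 60 s: served from the cache
now[0] = 60.0
apache.get()          # retention period elapsed: queries again
print(query_count)    # 2
```

The retention period therefore acts as a floor on the time between real queries, however often the cached value is requested.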

For many services, a one second cache retention time is too short and the cached data should be retained for longer; yet if the cache retention time is set for too long, transitory behaviour will not be detectable. A balance must be struck, which (most likely) will need some experimentation.

3.3.3. The map attribute

The map attribute describes how additional information is to be added to an incoming datatree. When a datatree is sent to a target that has one or more map attributes, the datatree is first extended with the mapped metrics. To the target, the additional metrics provided by map attributes are indistinguishable from those of the original datatree.

The map attribute values take the following form:

map = target metric : source

The value of target metric determines the name of the new metric and where it is to be stored. Any periods (.) within target metric will be interpreted as a path within the datatree. If the elements of the path do not exist, they are created as necessary, unless there is already a metric with the same name as a path element.

The source describes where the information for this new metric is to come from. The two possibilities are string-literals and specials.

A string-literal is a string metric that never changes: it has a fixed value independent of any monitoring activity. A string-literal starts and ends with a double-quote symbol (") and can have any content in between. Since MonAMI aims at providing monitoring information, the use of string-literals is discouraged.

A special is something that provides some very basic information about the computer: sufficiently basic that providing the information via a plugin is unnecessary. A special is represented by its name contained in angle-brackets (< and >). The following specials are available:

FQDN

the Fully Qualified Domain Name of the machine. This is the full DNS name of the computer; for example, www.example.org.

The following simple, stand-alone MonAMI configuration illustrates map attributes.

[null]

[sample]
 read = null
 write = snapshot
 interval = 1

[snapshot]
 filename = /tmp/monami-snapshot
 map = tests.string-literal.first : "this is a string-literal"
 map = tests.special.fqdn : <FQDN>
 map = tests.string-literal.second : "this is also a \
                   string-literal"

The null plugin (see Section 3.4.9, “null”) produces datatrees with no data. Without the map attributes, the snapshot plugin would produce an empty file at /tmp/monami-snapshot. The map attributes add metrics to these otherwise-empty datatrees, and this is reflected in the contents of /tmp/monami-snapshot.
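A minimal Python sketch of how a map value might be applied, modelling a datatree as nested dictionaries. The function names are hypothetical and only the <FQDN> special is handled; Python's socket.getfqdn() stands in for MonAMI's own FQDN lookup. The text does not say what happens when a path element is already a metric, so skipping the map entry in that case is an assumption. Continuation lines (as in the second string-literal above) are assumed to have been joined already.

```python
import socket

def resolve_source(source):
    """Turn the source part of a map value into a metric value."""
    source = source.strip()
    if len(source) >= 2 and source.startswith('"') and source.endswith('"'):
        return source[1:-1]          # string-literal: fixed value
    if source == "<FQDN>":
        return socket.getfqdn()      # the only special listed in this section
    raise ValueError(f"unknown map source: {source!r}")

def apply_map(datatree, map_value):
    """Add one mapped metric ("target metric : source") to a nested-dict datatree."""
    target, _, source = map_value.partition(":")
    *branches, leaf = target.strip().split(".")
    node = datatree
    for branch in branches:
        # Missing path elements are created as necessary.
        child = node.setdefault(branch, {})
        if not isinstance(child, dict):
            # A metric already occupies this path element: skip (assumption).
            return datatree
        node = child
    node[leaf] = resolve_source(source)
    return datatree

tree = {}  # an empty datatree, as produced by the null plugin
apply_map(tree, 'tests.string-literal.first : "this is a string-literal"')
apply_map(tree, "tests.special.fqdn : <FQDN>")
print(tree["tests"]["string-literal"])  # {'first': 'this is a string-literal'}
```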

3.3.4. Estimating future data-gathering delays

The process of gathering monitoring data from a service is not instantaneous. In general, there will be a delay between MonAMI requesting the data and it receiving that data. The length of this delay may depend on several factors, but is likely to depend strongly on the software being monitored and how busy the server is.

Whenever MonAMI receives data, it makes a note of how long this data-gathering took. MonAMI uses this information to maintain an estimate for the time needed for the next request for data from this monitoring target.

This estimate is available to all plugins, but currently only two use it: ganglia and sample. The ganglia plugin passes this information on to Ganglia as the dmax value (see Section 3.5.3, “dmax”) and the sample plugin uses this information to achieve adaptive monitoring (see Section 3.6.4, “Adaptive monitoring”).

When maintaining an estimate of the next data-gathering delay, MonAMI takes a somewhat pessimistic view. It assumes that data-gathering will take as long as the longest observed delay, unless there is strong evidence that the situation has improved. If gathering data takes longer than the current estimate, the estimate is increased correspondingly. If a service becomes sufficiently loaded (e.g., due to increased user activity) that the observed data-gathering delay increases, MonAMI will adjust its estimate to match.

If data-gathering takes less time than the current estimated value, the current estimate is not automatically decreased. Instead, MonAMI waits to see whether the lower value is reliable, that is, whether the delay has stabilised at the lower value. Once it is reasonably sure of this, MonAMI will reduce its estimate for future data-gathering delays.

To determine when the delay has stabilised, MonAMI keeps a history of previous data-gathering delay values. The history is stored as several discrete intervals, each with the same minimum duration. By default, there are ten history intervals each with a one minute minimum duration, giving MonAMI a view of recent history going back at least ten minutes.

Each interval has only one associated value: the maximum observed delay during that interval. At all times, there is an interval called the current interval. Only the current interval is updated; the other intervals provide historical context. As data is gathered, the maximum observed delay for the current interval is updated.

When the current interval has existed for more than the minimum duration (one minute, by default), all the intervals are shifted: the current interval becomes the first non-current history interval, what was the first non-current interval becomes the second, and so on. The information in the last history interval is dropped and a new current interval is created. Future data-gathering delays are recorded in this new current interval until the minimum duration has elapsed and the intervals are shifted again.
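The interval bookkeeping can be sketched in Python as follows. This is a hypothetical illustration rather than MonAMI's implementation: it assumes the current interval counts as one of the md_intervals history intervals, and that the check for shifting happens when a new delay observation arrives (which also guarantees that each interval holds at least one data point).

```python
from collections import deque

class DelayHistory:
    """A fixed number of history intervals, each holding its maximum delay."""

    def __init__(self, intervals=10, min_duration=60.0):
        self.min_duration = min_duration
        # Completed intervals, most recent first; the oldest drops off the end.
        self.history = deque(maxlen=intervals - 1)
        self.current_max = None
        self.current_start = None

    def record(self, now, delay):
        """Record one observed data-gathering delay, made at time `now`."""
        if self.current_start is None:
            self.current_start = now
        elif now - self.current_start > self.min_duration:
            # Shift: the current interval becomes the first historic interval.
            self.history.appendleft(self.current_max)
            self.current_max = None
            self.current_start = now
        # Only the current interval is updated, keeping its maximum delay.
        if self.current_max is None or delay > self.current_max:
            self.current_max = delay

    def maxima(self):
        """Per-interval maxima, current interval first."""
        return [self.current_max] + list(self.history)

h = DelayHistory(intervals=3, min_duration=60)
for t, d in [(0, 2.0), (30, 5.0), (70, 1.0), (140, 1.5), (210, 0.5)]:
    h.record(t, d)
print(h.maxima())  # [0.5, 1.5, 1.0]
```

The 5.0 s delay seen in the first interval has already dropped out of this three-interval history.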

MonAMI takes two statistical measures of the history intervals: the maximum value and the average absolute deviation (or average deviation for short). The maximum value is the proposed new value for the estimated delay (if it is lower than the current estimate), and the average deviation is used to determine whether the change is significant.

Broadly speaking, the average deviation describes how settled the data stored in the historic intervals are over the recent history: a low number implies data-gathering delays are more predictable, a high number indicates they are less predictable. MonAMI only reduces the estimate for future delays if the difference (between the current estimate and the maximum over all historic intervals) is significant. It is significant if the ratio between the proposed drop in delay and the average deviation exceeds a certain threshold value.

In summary, to reduce the estimate of future delays, the observed delay must be persistently low over the recorded history (minimum of 10 minutes, by default). If the delay is temporarily low, is decreasing over time or fluctuates, the estimate is not reduced.

There are two attributes that affect how MonAMI determines its estimate. The default values should be sufficient under most circumstances. Moreover, there are separate attributes for adjusting the behaviour of both adaptive monitoring (see Section 3.6.5, “Sample attributes”) and Ganglia's dmax value (see Section 3.5.3, “Attributes”). Adjusting these attributes may be more appropriate.

Attributes

md_intervals integer, optional

the number of historic intervals to consider. The default is 10 and the value must be between 2 and 30. Increasing the number of intervals makes the requirement for reducing the estimate more stringent. It also increases the accuracy of the average deviation measurements.

Having a small number of intervals (fewer than 5, say) is not recommended, as the statistics become less reliable.

A large number of intervals gives more reliable statistical results, but the system will take longer to react (to reduce the delay estimate) to changing situations. Perhaps this is most noticeable if there is a single data-gathering delay that is unusually long. If this happens, MonAMI will take at least md_intervals times the minimum interval duration (md_duration) to reduce the delay estimate.

md_duration integer, optional

The minimum duration, in seconds, for an interval. The default is 60 seconds and the value must be between 1 second and 1200 seconds (20 minutes).

Each interval must have at least one data point: an observation of the data-gathering delay. To ensure this, the value of md_duration is implemented as a minimum duration and, in practice, the intervals can be longer. For example, with the default configuration (md_duration of one minute, md_intervals of 10), if only a single monitoring flow is established that gathers data from a monitoring target every 90 seconds, each interval will have a 90 second duration and the complete history will span 15 minutes.
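The effect of md_duration, md_intervals and the sampling period on the length of the history can be estimated with the sketch below. history_span is a hypothetical helper, not part of MonAMI. It assumes that an interval closes only when an observation arrives after the interval has existed for more than md_duration, so each interval lasts a whole number of sampling periods.

```python
def history_span(md_duration=60, md_intervals=10, sample_period=None):
    """Approximate time, in seconds, covered by the delay history."""
    if not 2 <= md_intervals <= 30:
        raise ValueError("md_intervals must be between 2 and 30")
    if not 1 <= md_duration <= 1200:
        raise ValueError("md_duration must be between 1 and 1200 seconds")
    if sample_period is None:
        # Frequent observations: each interval lasts roughly md_duration.
        return md_intervals * md_duration
    # An interval closes at the first observation made *more than*
    # md_duration after it started (assumption), so round up strictly.
    interval = (md_duration // sample_period + 1) * sample_period
    return md_intervals * interval

print(history_span(sample_period=90) / 60)  # 15.0 (minutes)
print(history_span())                       # 600
```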