There are some features that are common to each of the plugins. Rather than repeat the same information under each plugin's description, the information is presented here.
Each distinct service has a separate stanza within the configuration file, using the plugin name. Considering the apache monitoring plugin (which monitors an Apache HTTP webserver) as an example, one can monitor multiple Apache webservers with several separate [apache] stanzas: one for each monitoring target. To illustrate this, the following configuration describes how to monitor an intranet web server and an external web server.
    [apache]
    name = external-webserver
    host = www.example.org

    [apache]
    name = internal-webserver
    host = www.intranet.example.org
Each target must have a unique name. It is possible to specify the name a target will adopt with the name attribute (as in the above example). If no name attribute is given, the target takes the name of the plugin by default. However, since all names must be unique, only one target can adopt the default name: all subsequent targets (from this plugin) must have their name specified explicitly using the name attribute.
Although specifying a name is optional, it is often useful to set a name explicitly (preferably to something meaningful). Simple configuration files will work fine without explicitly specifying target names, whilst configuration files describing more complex monitoring requirements will likely fail unless they have explicitly named targets.
If there is an ambiguity (due to different targets having the same name) MonAMI will attempt to monitor as much as possible (to “degrade gracefully”) but some loss of functionality is inevitable.
Acquiring the current status of a service will inevitably take resources (such as CPU time and perhaps disk space) away from the service. For some services this effort is minimal; for others it is more substantial. Whatever the burden, there will be some monitoring frequency above which monitoring will impact strongly on service provision.
To prevent overloading a service, the results from querying a service are stored within MonAMI for a period. If there is a subsequent request for the current state of the target within that period then the stored results are used rather than directly querying the underlying service: the results are cached.
The cache retention period is adjustable for each target and can be set with the cache attribute. The cache attribute value is the time for which data is retained, or (equivalently) the guaranteed minimum time between successive queries to the underlying service.
The value is specified using the standard time-interval notation: one or more numbers, each followed by a single-letter modifier. The modifiers are s, m, h and d for seconds, minutes, hours and days respectively. If a modifier is omitted, seconds is assumed. The total cache retention period is the sum of the individual times. For example, 5m 10s is five minutes and ten seconds and is equivalent to specifying 310.
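The time-interval notation can be illustrated with a short Python sketch. This is not MonAMI's own parser: the function name parse_interval is hypothetical, and the sketch assumes that modifiers are lower-case and follow their number without intervening whitespace.

```python
import re

# Seconds per unit for the modifiers described in the text.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# One number, optionally followed by a single-letter modifier.
_TERM = re.compile(r"\s*(\d+)([smhd]?)")


def parse_interval(text):
    """Return the total number of seconds described by text."""
    text = text.strip()
    if not text:
        raise ValueError("empty time interval")
    total = 0
    pos = 0
    while pos < len(text):
        match = _TERM.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse time interval: {text!r}")
        number, unit = match.groups()
        # A missing modifier means seconds.
        total += int(number) * _UNIT_SECONDS[unit or "s"]
        pos = match.end()
    return total
```

With this sketch, parse_interval("5m 10s") and parse_interval("310") both give 310 seconds, matching the example above.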
In the following example configuration file, the MySQL queries are cached for a minute whilst the Apache queries are cached for 2 seconds:
    [apache]
    host = www.example.org
    cache = 2

    [mysql]
    host = mysql-serv.example.org
    user = monami
    password = monami-secret
    cache = 1m
If no cache retention period is specified, a default value of one second is used. Since MonAMI operates at the granularity of one second, there is apparently no effect on individual monitoring activity, yet we ensure that targets are queried no more often than once a second.
For many services, a one second cache retention time is too short and the cached data should be retained for longer; yet if the cache retention time is set for too long, transitory behaviour will not be detectable. A balance must be struck, which (most likely) will need some experimentation.
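The caching behaviour described in this section can be sketched in a few lines of Python. This is an illustration only, not MonAMI's implementation: the class name CachedTarget, the query callable and the injectable clock are all hypothetical, and the one second default mirrors the default retention period stated above.

```python
import time


class CachedTarget:
    """Wrap a query so the underlying service is asked at most once
    per retention period (in seconds)."""

    def __init__(self, query, retention=1.0, clock=time.monotonic):
        self._query = query          # callable that contacts the service
        self._retention = retention  # cache retention period, in seconds
        self._clock = clock          # injectable, to make testing easy
        self._value = None
        self._fetched_at = None

    def read(self):
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._retention)
        if expired:
            # Only here is the underlying service actually queried.
            self._value = self._query()
            self._fetched_at = now
        return self._value
```

However many times read() is called, the wrapped query runs no more often than once per retention period, which is the guarantee the cache attribute provides.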
The map attribute describes how additional information is to be added to an incoming datatree. When a datatree is sent to a target that has one or more map attributes it is first processed to alter the incoming datatree. To the target, the additional metrics provided by map attributes are indistinguishable from those of the original datatree.
The map attribute values take the following form:

    map = target metric : source

The value of target metric determines the name of the new metric and where it is to be stored. Any periods (.) within target metric will be interpreted as a path within the datatree. If the elements of the path do not exist, they are created as necessary, unless there is already a metric with the same name as a path element.
The source describes where the information for this new metric is to come from. The two possibilities are string-literals and specials.
A string-literal is a string metric that never changes: it has a fixed value independent of any monitoring activity. A string-literal starts and ends with a double-quote symbol (") and can have any content in between. Since MonAMI aims at providing monitoring information, the use of string-literals is discouraged.
A special is something that provides some very basic information about the computer: sufficiently basic that providing the information via a plugin is unnecessary. A special is represented by its name contained in angle-brackets (< and >). The following specials are available:
FQDN
the Fully Qualified Domain Name of the machine. This is the full DNS name of the computer; for example, www.example.org.
The following simple, stand-alone MonAMI configuration illustrates map attributes.
    [null]

    [sample]
    read = null
    write = snapshot
    interval = 1

    [snapshot]
    filename = /tmp/monami-snapshot
    map = tests.string-literal.first : "this is a string-literal"
    map = tests.special.fqdn : <FQDN>
    map = tests.string-literal.second : "this is also a \
    string-literal"
The null plugin (see Section 3.4.9, “null”) produces datatrees with no data. Without the map attributes, the snapshot would produce an empty file at /tmp/monami-snapshot. The map attributes add additional metrics to otherwise-empty datatrees. This is reflected in the contents of /tmp/monami-snapshot.
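To make the effect of map attributes concrete, the following Python sketch applies map values to a datatree modelled as nested dictionaries. It is only an approximation of the behaviour described above: the names apply_map, resolve_source and SPECIALS are hypothetical, the FQDN special is emulated with Python's socket.getfqdn(), and the choice to leave the datatree's existing metrics untouched when a path element clashes with one is an assumption (the text does not say what MonAMI does in that case).

```python
import socket

# Specials known to this sketch; the text lists only FQDN.
SPECIALS = {"FQDN": socket.getfqdn}


def resolve_source(source):
    """Turn the source part of a map value into the metric's value."""
    source = source.strip()
    if len(source) >= 2 and source.startswith('"') and source.endswith('"'):
        return source[1:-1]                 # string-literal
    if source.startswith("<") and source.endswith(">"):
        name = source[1:-1]
        if name not in SPECIALS:
            raise ValueError(f"unknown special: {name}")
        return SPECIALS[name]()             # special
    raise ValueError(f"unrecognised source: {source!r}")


def apply_map(datatree, value):
    """Add one mapped metric to datatree; return False if blocked."""
    target, _, source = value.partition(":")
    metric_value = resolve_source(source)
    path = target.strip().split(".")
    node = datatree
    for element in path[:-1]:
        child = node.setdefault(element, {})  # create missing branches
        if not isinstance(child, dict):
            return False  # an existing metric has this path element's name
        node = child
    if path[-1] in node:
        return False      # assumption: never overwrite existing data
    node[path[-1]] = metric_value
    return True
```

Applying the three map values from the example configuration to an empty dictionary yields a tests branch containing string-literal and special sub-branches, which corresponds to what the snapshot target would write out.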
The process of gathering monitoring data from a service is not instantaneous. In general, there will be a delay between MonAMI requesting the data and it receiving that data. The length of this delay may depend on several factors, but is likely to depend strongly on the software being monitored and how busy the server is.
Whenever MonAMI receives data, it makes a note of how long this data-gathering took. MonAMI uses this information to maintain an estimate for the time needed for the next request for data from this monitoring target.
This estimate is available to all plugins, but currently only two use it: ganglia and sample. The ganglia plugin passes this information on to Ganglia as the dmax value (see Section 3.5.3, “dmax”) and the sample plugin uses this information to achieve adaptive monitoring (see Section 3.6.4, “Adaptive monitoring”).
When maintaining an estimate of the next data-gathering delay, MonAMI takes a somewhat pessimistic view. It assumes that data-gathering will take as long as the longest observed delay, unless there is strong evidence that the situation has improved. If gathering data took longer than the current estimate, the estimate is increased correspondingly. If a service becomes sufficiently loaded (e.g., due to increased user activity) so that the observed data-gathering delay increases, MonAMI will adjust its estimate to match.
If data-gathering takes less time than the current estimated value, the current estimate is not automatically decreased. Instead, MonAMI waits to see whether the lower value is reliable and the delay has stabilised at the lower value. Once it is reasonably sure of this, MonAMI will reduce its estimate for future data-gathering delays.
To determine when the delay has stabilised, MonAMI keeps a history of previous data-gathering delay values. The history is stored as several discrete intervals, each with the same minimum duration. By default, there are ten history intervals each with a one minute minimum duration, giving MonAMI a view of recent history going back at least ten minutes.
Each interval has only one associated value: the maximum observed delay during that interval. At all times, there is an interval called the current interval. Only the current interval is updated; the other intervals provide historical context. As data is gathered, the maximum observed delay for the current interval is updated.
When the current interval has existed for more than the minimum duration (one minute, by default), all the intervals are moved: the current history interval becomes the first non-current history interval, what was the first non-current interval becomes the second, and so on. The information in the last history interval is dropped and a new current interval is created. Future data-gathering delays are recorded in this new current interval until the minimum duration has elapsed and the intervals are moved again.
MonAMI takes two statistical measures of the history intervals: the maximum value and the average absolute deviation (or average deviation for short). The maximum value is the proposed new value for the estimated delay, if it is lower than the current estimate, and the average deviation is used to determine if the change is significant.
Broadly speaking, the average deviation describes how settled the data stored in the historic intervals are over the recent history: a low number implies data-taking delays are more predictable, a high number indicates they are less predictable. MonAMI only reduces the estimate for future delays if the difference (between the current estimate and the maximum over all historic intervals) is significant. It is significant if the ratio between the proposed drop in delay and the average deviation exceeds a certain threshold value.
In summary, to reduce the estimate of future delays, the observed delay must be persistently low over the recorded history (minimum of 10 minutes, by default). If the delay is temporarily low, is decreasing over time or fluctuates, the estimate is not reduced.
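The estimation procedure described above can be sketched in Python as follows. This is a simplified model rather than MonAMI's actual code, and it rests on several assumptions that the text leaves open: the class name DelayEstimator is hypothetical; the threshold value of 2.0 is invented for illustration (the text says only "a certain threshold value"); the current interval is included in the statistics; the estimate is only reduced once the history is full; and an interval is moved once it has existed for at least the minimum duration.

```python
from collections import deque


class DelayEstimator:
    def __init__(self, md_intervals=10, md_duration=60.0, threshold=2.0):
        self.md_duration = md_duration
        self.threshold = threshold  # hypothetical value, not from the text
        # history[0] is the current interval; each entry holds the
        # maximum delay observed during that interval.
        self.history = deque(maxlen=md_intervals)
        self.interval_started = None
        self.estimate = 0.0

    def observe(self, delay, now):
        """Record one data-gathering delay observed at time now (seconds)."""
        if not self.history or now - self.interval_started >= self.md_duration:
            # Move the intervals: the oldest is dropped automatically.
            self.history.appendleft(delay)
            self.interval_started = now
        else:
            self.history[0] = max(self.history[0], delay)

        if delay > self.estimate:
            self.estimate = delay      # pessimistic: increase at once
        else:
            self._maybe_reduce()
        return self.estimate

    def _maybe_reduce(self):
        if len(self.history) < self.history.maxlen:
            return                     # not enough history yet
        peak = max(self.history)
        drop = self.estimate - peak
        if drop <= 0:
            return
        mean = sum(self.history) / len(self.history)
        avg_dev = sum(abs(v - mean) for v in self.history) / len(self.history)
        # Reduce only if the drop is large compared with the spread.
        if avg_dev == 0 or drop / avg_dev > self.threshold:
            self.estimate = peak
```

With three intervals of 60 seconds, observing delays of 5, 1, 1 and 1 seconds one minute apart leaves the estimate at 5 until the 5 second value has dropped out of the history, at which point it falls to 1. A steadily decreasing sequence such as 10, 7, 4 and 1 leaves the estimate at 10, because the average deviation is large relative to the proposed drop.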
There are two attributes that affect how MonAMI determines its estimate. The default values should be sufficient under most circumstances. Moreover, there are separate attributes for adjusting the behaviour both of adaptive monitoring (see Section 3.6.5, “Sample attributes”), and the dmax value of Ganglia (see Section 3.5.3, “Attributes”). Adjusting these attributes may be more appropriate.
md_intervals
the number of historic intervals to consider. The default is 10 and the value must be between 2 and 30. Increasing the number of intervals makes the requirement for reducing the estimate more stringent. It also increases the accuracy of the average deviation measurements.
Having a small number of intervals (fewer than 5, say) is not recommended as the statistics become less reliable.
A large number of intervals gives more reliable statistical results, but the system will take longer to react (to reduce the delay estimate) to changing situations. Perhaps this is most noticeable if there is a single data-gathering delay that is unusually long. If this happens, MonAMI will take at least md_intervals times the minimum interval duration (md_duration) to reduce the delay estimate.
md_duration
the minimum duration, in seconds, for an interval. The default is 60 seconds and the value must be between 1 second and 1200 seconds (20 minutes).
Each interval must have at least one data point: an observation of the data-gathering delay. To ensure this, the value of md_duration is implemented as a minimum duration and, in practice, the intervals can be longer. For example, with the default configuration (md_duration of one minute, md_intervals of 10), if only a single monitoring flow is established that gathers data from a monitoring target every 90 seconds, each interval will have a 90 second duration and the complete history will span 15 minutes.
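The arithmetic in this example can be checked with a small Python sketch. The function names are hypothetical, and the sketch assumes a single monitoring flow with a fixed sampling period and that an interval ends at the first observation made once at least md_duration seconds have passed.

```python
import math


def effective_interval(md_duration, sample_period):
    """Actual interval length, in seconds, for a fixed sampling period."""
    # The interval ends at the first observation at or after md_duration.
    return math.ceil(md_duration / sample_period) * sample_period


def history_span(md_intervals, md_duration, sample_period):
    """Total time, in seconds, covered by a full set of intervals."""
    return md_intervals * effective_interval(md_duration, sample_period)
```

For the default configuration and a 90 second sampling period, history_span(10, 60, 90) gives 900 seconds, which is the 15 minutes quoted above.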