This section describes the different services that can be monitored (for example, a MySQL database or an Apache webserver). It gives brief introductions to which services the plugins can monitor and how they can be configured. Wherever possible, sensible defaults are available so often little or no configuration is required for common deployment scenarios.
The available monitoring plugins depend on which plugins have been built and installed. If you have received this document as part of a binary distribution, it is possible that the distribution does not include all the plugins described here. It might also contain other plugins provided independently from the main MonAMI release.
AMGA (ARDA Metadata Catalogue Project) is a metadata server provided by the ARDA/EGEE project as part of their gLite software releases. It provides additional metadata functionality by wrapping an underlying database storage. More information about AMGA is available from the AMGA project page.
The amga monitoring plugin will monitor the server's database connection usage and the number of incoming connections. For both, the current value and configured maximum permitted are monitored.
the host on which the AMGA server is running. The default value is localhost.
the port on which the AMGA server listens. The default value is 8822.
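As an illustrative sketch only (not taken from the MonAMI distribution), a minimal amga target might look like the following; the attribute names host and port are assumed from the descriptions above and should be checked against your installed plugin:

[amga]
# attribute names below are assumed; check the plugin's reference documentation
host = amga.example.org
port = 8822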
The Apache HTTP (or web) server is perhaps the most well known project from the Apache Software Foundation. Since April 1996, the Netcraft web survey has shown it to be the most popular web server on the Internet. More details can be found at the Apache home page.
The apache plugin monitors the current status of an Apache HTTP server. It can also provide event-based monitoring, based on various log files.
The Apache server monitoring is achieved by downloading the
server-status page (provided by the mod_status Apache plugin) and
parsing the output. Usually, this option is available within the
Apache configuration, but commented-out by default (depending on
the distribution). The location of the Apache configuration is Apache-version and OS specific, but it is usually found in either the /etc/apache, /etc/apache2 or /etc/httpd directory. To enable the server-status page, uncomment the section or add lines within the Apache configuration that look like:
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from .example.com
</Location>
Here .example.com is an illustration of how to limit access to this page. You should change this to either your DNS domain or explicitly to the machine on which you are to run MonAMI.
There is an ExtendedStatus option that configures Apache to include some additional information. This is controlled within the Apache configuration by lines similar to:
<IfModule mod_status.c>
    ExtendedStatus On
</IfModule>
Switching on the extended status should not greatly affect the server's load and provides some additional information. MonAMI can understand this extra information, so it is recommended to switch on this ExtendedStatus option.
Event-based monitoring is made available by watching log files. Any time the Apache server writes to a watched log file, an event is generated. The plugin supports multiple event channels, allowing support for multi-homed servers that log events to different log files.
Event channels are specified by log attributes. The log attribute can be repeated to configure multiple event channels. Each log attribute has a value of the form:
name: path [type]
where:
name
is an arbitrary name given to this channel. It cannot contain a colon (:) and should not contain a dot (.), but most names are valid.
path
is the path to the log file. Log rotations (where a log file is archived and a new one created) are supported.
type
is either combined or error.
The following example configures the access channel to read the log file /var/log/apache2/access.log, which is in the Apache standard “combined” format.
[apache]
log = access: /var/log/apache2/access.log [combined]
the hostname of the webserver to monitor. The default value is localhost.
the port on which the webserver listens. The default value is 80.
specifies an event monitoring channel. Each log attribute has a value of the form:
name: path [type]
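Putting these attributes together, a hypothetical apache target that monitors a remote webserver and watches its access log might look like the sketch below; the host and port attribute names are assumed from the descriptions above, and the hostname and log path are placeholders:

[apache]
# host and port attribute names are assumed; log is described above
host = www.example.org
port = 80
log = access: /var/log/apache2/access.log [combined]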
dCache (see dCache home page) is a system jointly developed by Deutsches Elektronen-Synchrotron (DESY) and Fermilab that aims to provide a mechanism for storing and retrieving huge amounts of data among a large number of heterogeneous server nodes, which can be of varying architectures (x86, ia32, ia64). It provides a single namespace view of all of the files that it manages and allows access to these files using a variety of protocols, including SRM, GridFTP, dCap and xroot. By connecting dCache to a tape storage backend, it becomes a hierarchical storage manager (HSM).
The dCache monitoring plugin works by connecting to the underlying PostgreSQL database that dCache uses to store the current system state. To achieve this, MonAMI must have the credentials (a username and password) to log into the database and perform read queries.
If you do not already have a read-only account, you will need to create such an account. It is strongly recommended not to use an account with any write privileges as the password will be stored plain-text within the MonAMI configuration file (see Section 4.2.2, “Passwords being stored insecurely”).
To configure PostgreSQL, SQL commands need to be sent to the database server. To achieve this, you will need to use the psql command, connecting to the dcache database. On many systems you must log in as the database user “postgres”, which often has no password when connecting from the same machine on which the database server is running. A suitable command is:
psql -U postgres -d dcache
The following SQL commands will create an account monami with password monami-secret that has read-only access to the tables that MonAMI will read. Please ensure you change the example password (monami-secret).
CREATE USER monami;
ALTER USER monami PASSWORD 'monami-secret';
GRANT SELECT ON TABLE copyfilerequests_b TO monami;
GRANT SELECT ON TABLE getfilerequests_b TO monami;
GRANT SELECT ON TABLE putfilerequests_b TO monami;
If you intend to monitor the database remotely, you may need to add an extra entry in PostgreSQL's remote access file: pg_hba.conf. On some distributions, this file is located in the directory /var/lib/pgsql/data.
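For example, a pg_hba.conf entry allowing the monami database user to connect to the dcache database from a remote monitoring host might look like the sketch below. The address 192.0.2.10 is only a placeholder; adjust the address and the authentication method to match your site's policy.

host    dcache    monami    192.0.2.10/32    md5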
Currently, the information gathered is limited to the rate of SRM GET, PUT and COPY requests received. This information is gathered from the copyfilerequests_b, getfilerequests_b and putfilerequests_b tables. Future versions of MonAMI may read other tables, requiring additional GRANT statements.
the host on which the PostgreSQL database is running. The default is localhost.
the IP address of the host on which the database is running. This is useful when the host is on multiple IP subnets and a specific one must be used. The default is to look up the IP address from the host.
the TCP port to use when connecting to the database. The default is port 5432 (the standard PostgreSQL port).
the username to use when connecting to the database. The default is the username of the system account MonAMI is running under. When running as a daemon from a standard RPM-based installation, the default user will be monami.
the password to use when authenticating. The default is to attempt password-less login to the database.
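A sketch of a dcache target using the read-only account created above might look like the following; the host, user and password attribute names are assumed from the descriptions above, and the hostname is a placeholder:

[dcache]
# attribute names are assumed; check the plugin's reference documentation
host = dcache-db.example.org
user = monami
password = monami-secret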
Disk Pool Manager (DPM) is a service that implements the SRM protocol (mainly for remote access) and rfio protocol (for site-local access). It is an easy-to-deploy solution that can support multiple disk servers but has no support for tape/mass-storage systems. More information on DPM can be found at the DPM home page.
The dpm plugin connects to the MySQL server DPM uses. By querying this database, information is extracted such as the status of the filesystems and the used and available space. The space statistics are available as a summary, broken down by group, and broken down by filesystem. The daemon activity on the head node can also be monitored.
This plugin requires read-only privileges for the database DPM uses. The following set of SQL statements creates login credentials with username monamiuser and password monamipass, suitable for local access:
GRANT SELECT ON cns_db.* TO 'monamiuser'@'localhost' IDENTIFIED BY 'monamipass';
GRANT SELECT ON dpm_db.* TO 'monamiuser'@'localhost' IDENTIFIED BY 'monamipass';
If MonAMI is to monitor the MySQL database remotely, the following SQL can be used to create login credentials:
GRANT SELECT ON cns_db.* TO 'monamiuser'@'%' IDENTIFIED BY 'monamipass';
GRANT SELECT ON dpm_db.* TO 'monamiuser'@'%' IDENTIFIED BY 'monamipass';
If both local and remote access to the database is needed, all four of the above SQL commands should be combined.
the host on which the MySQL server is running. The default is localhost.
the username with which to log into the server.
the password with which to log into the server.
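For illustration, a dpm target using the credentials created above might be written as the following sketch; the host, user and password attribute names are assumed from the descriptions above, and the hostname is a placeholder:

[dpm]
# attribute names are assumed; check the plugin's reference documentation
host = dpm-head.example.org
user = monamiuser
password = monamipass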
The filesystem plugin monitors generic (i.e., non-filesystem specific) features of a mounted filesystem. It reports both capacity and “file” statistics. The “file” statistics correspond to inode usage for filesystems that use inodes (such as ext2).
With both reported resources (blocks and files), there are similar-sounding metrics: “free” and “available”. “free” refers to total resource potentially available and “available” refers to the resource available to general (non-root) users.
The difference between the two comes about because it is common to reserve some capacity for the root user. This allows core system services to continue when a partition is full: normal users cannot create files but root (and processes running as root) can.
the absolute path to any file on the filesystem.
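As a sketch only, a filesystem target for the root filesystem might look like the following; the attribute name path is a guess based on the description above, so verify it against the plugin's reference documentation before use:

[filesystem]
# "path" is a guessed attribute name; any file on the filesystem will do
path = /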
The Globus Alliance distributes a modified version of the WU-FTP daemon that has been patched to allow GSI-based authentication and multiple parallel streams. This is often referred to as “GridFTP”.
Various grid components use GridFTP as an underlying transfer mechanism. Often, these have the same log-file format for recording transfers, so parsing this log-file is a common requirement.
The gridftp plugin monitors GridFTP log files, providing an event for each transfer. This is under the transfers channel.
the absolute path to the GridFTP log file.
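A hypothetical gridftp target might look like the sketch below; the attribute name file is a guess based on the description above, and the log file location will vary between installations:

[gridftp]
# "file" is a guessed attribute name; the path is a placeholder
file = /var/log/gridftp.log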
On their website, Cluster Resources describe Maui as “an advanced batch scheduler with a large feature set well suited for high performance computing (HPC) platforms”. Within a cluster it is used to decide which job (of many that are available) should be run next. Maui provides sophisticated scheduling features such as advanced fair-share definitions and “allocation bank”. More details are available within the Maui homepage.
The MonAMI maui plugin will need sufficient access rights to query the Maui server. If MonAMI is running on the same machine as the Maui server, then (most likely) no additional host will need to be added. If MonAMI is running on a remote machine, then access rights must be granted for that machine: append the remote host's hostname to the space-separated ADMINHOST list.
The plugin will also need to use a valid username. By default it will use the name of the user it is running as (monami), but the plugin can use an alternative username (see the user attribute). To add an additional username, append the username to the space-separated ADMIN3 list.
The following example configuration shows how to configure Maui to allow monitoring from host monami.example.org as user monami.
SERVERHOST   maui-server.example.org
ADMIN1       root
ADMIN3       monami
ADMINHOST    maui-server.example.org monami.example.org
RMCFG[base]  TYPE=PBS
SERVERPORT   40559
SERVERMODE   NORMAL
Maui authenticates by having the client and server keep a shared secret: a password. Currently this password must be an integer. Unfortunately, the password is decided as part of the Maui build process. If one is not explicitly specified, a random number is selected as the password. The password is then embedded within the Maui client programs and used when they communicate with the Maui server. Currently, it is not possible to configure the Maui server to use an alternative password without rebuilding the Maui clients and server.
To communicate with the Maui server, the maui plugin must know the password. Unfortunately, as the password is only stored within the executables, it is difficult to discover. The maui plugin has heuristics that allow it to scan a Maui client program and, in most cases, discover the password. This requires a Maui client program to be present on the computer on which MonAMI is running. If the Maui client is in a non-standard location, its absolute path can be specified with the exec attribute.
If the password is known (for example, its value was specified when compiling Maui) then it can be specified using the password attribute. Specifying the password attribute will stop MonAMI from scanning Maui client programs.
Once the password is known, it can be stored in the MonAMI configuration using the password attribute. This removes the need for a Maui client program. However, should the Maui binaries change (for example, upgrading an installed Maui package), it is likely that the password will also change. This would stop the MonAMI plugin from working until the new password was supplied.
The recommended deployment strategy is to install MonAMI on the Maui server and allow the maui plugin to scan the Maui client programs for the required password.
When the maui plugin and the Maui server communicate, both parties want to know that the messages really come from the other party. The shared secret is one part of this process; another is to check the time within each message. This is to prevent a malicious third party from re-sending messages that have already been sent: a “replay attack”.
To prevent these replay attacks, the clocks on the Maui server and the machine on which MonAMI is running must agree. If both machines are well configured, their clocks will agree to within ~10 milliseconds. Since the network may introduce a slight delay, some tolerance is needed.
By default, the maui plugin requires agreement to within one second. This should be easy to satisfy on modern networks. If, for whatever reason, this is not possible, the tolerance can be made more lax by specifying the max_time_delta attribute.
Should there be a systematic difference between the clocks on the two servers, effort should be made to synchronise those clocks. Increasing max_time_delta makes MonAMI more vulnerable to replay attacks.
the hostname of the Maui server. If not specified, localhost will be used.
the TCP port to which the plugin will connect. If not specified, the default value is 40559.
the user name to present to the Maui server when communicating. The default value is the name of the account under which MonAMI is running.
the maximum allowed time difference, in seconds, between the server and client. The default value is one second.
the shared-secret between this plugin and the Maui server. The default policy is to attempt to discover the password automatically. Specifying the password will prevent attempts at discovering it automatically.
the time MonAMI should wait for a reply. The string is in time-interval format (e.g., “5m 10s” is five minutes and ten seconds; “310” would be equivalent). The default behaviour is to wait indefinitely.
the absolute path to the mclient (or similar) Maui client program. If the plugin is unsuccessful in scanning the program given by exec, it will also try standard locations.
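Bringing these attributes together, a maui target for a remote Maui server might look like the sketch below. The user, exec and max_time_delta attribute names are mentioned in the text above; host is assumed, and the hostname, path and tolerance values are placeholders:

[maui]
# host attribute name is assumed; other names are described above
host = maui-server.example.org
user = monami
max_time_delta = 2
exec = /usr/local/maui/bin/mclient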
This plugin monitors the performance of a MySQL database. MySQL is a commonly used Free (GPLed) database. The parent company (MySQL AB) describe it as “the world's most popular open source database”. For more information, please see the MySQL home page
The statistics monitored are taken from the status variables.
They are acquired by executing the SQL statement SHOW STATUS;. The raw variables are described in the MySQL manual, section 5.2.5: Status Variables.
The metric names provided by MySQL are in a flat namespace. These names are not used by MonAMI; instead, the metrics are mapped into a tree structure, allowing easier navigation of, and selection from, the available metrics.
To function, this plugin requires an account with which to access the database. Please note: this database account requires no database access privileges, only that the username and password allow MonAMI to connect to the MySQL database. For security reasons, you should not employ login credentials used elsewhere (and never root or a similar power-user). The following is a suitable SQL statement for creating an account with username monami and password monamipass.
CREATE USER 'monami'@'localhost' IDENTIFIED BY "monamipass";
Sharing login credentials is not recommended. If you decide to share credentials, make sure the MonAMI configuration file is readable only by the monami user (see Section 3.2.2, “Dropping root privileges”).
In addition to monitoring a MySQL database, the mysql plugin can also store information MonAMI has gathered within MySQL. This is described in Section 3.5.8, “MySQL”.
the username with which to log into the server.
the password with which to log into the server
the host on which the MySQL server is running. If no host is specified, the default localhost is used.
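For illustration, a mysql monitoring target using the account created above might be written as the following sketch; the host, user and password attribute names are assumed from the descriptions above:

[mysql]
# attribute names are assumed; check the plugin's reference documentation
host = localhost
user = monami
password = monamipass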
The null plugin is perhaps the simplest to understand. As a monitoring plugin, it provides an empty datatree when asked for data. The main use for null as a monitoring target is to demonstrate aspects of MonAMI without the distraction of real-life effects from other monitoring plugins.
The null plugin will supply an empty datatree. In conjunction with a reporting plugin (e.g., the snapshot), this can be used to demonstrate the map attribute for adding static content. This attribute is described in Section 3.3.3, “The map attribute”.
Another use for a null target is to investigate the effect of a service taking a variable length of time to respond with monitoring data. This is emulated by specifying a delay file. If the delayfile attribute is set, then the corresponding file is read. It should contain a single integer number. This number dictates how long (in seconds) a null target should wait when requested for data. The file can be changed at any time and the change will affect the next time the null target is read from. This is particularly useful for demonstrating how MonAMI estimates future delays (see Section 3.3.4, “Estimating future data-gathering delays”) and undertakes adaptive monitoring (see Section 3.6.4, “Adaptive monitoring”).
The following example will demonstrate this usage:
[null]
delayfile = /tmp/monami-delay

[sample]
read = null
write = null
interval = 1s
Then, by changing the number stored in /tmp/monami-delay, the delay can be adjusted dynamically. To set the delay to three seconds, do:
$ echo 3 > /tmp/monami-delay
To remove the delay, simply set the delay to zero:
$ echo 0 > /tmp/monami-delay
the filename of the delay file, the contents of which are parsed as an integer. This number is the number of seconds the null target will delay before replying with an empty datatree.
Network UPS Tools (NUT) provides a standard method through which an Uninterruptable Power Supply (UPS) can be monitored. Part of this framework allows for signalling, so that machines can undergo a controlled shutdown in the event of a power failure. Further details of NUT are available from the NUT home page.
The MonAMI nut plugin connects to the NUT data-aggregator daemon (upsd) and queries the status of all known, attached UPS devices. The ups.conf file must be configured for the available hardware and the startup scripts must be configured to start the required UPS-specific monitoring daemons.
By default, localhost will be allowed access to the upsd daemon, but access for external hosts must be added explicitly in the upsd.conf file. See the NUT documentation on how best to achieve this.
the host on which the NUT upsd daemon is running. The default value is localhost.
the port on which the NUT upsd daemon listens. The default value is 3493.
The process plugin monitors Unix processes. It can count the number of processes that match search criteria and can give detailed information on a specific process.
The information process gives should not be confused with any process, memory or thread statistics other monitoring plugins provide. Some services report their current thread, process or memory usage, which may duplicate some of the information this plugin reports (see, for example, Section 3.4.2, “Apache” and Section 3.4.8, “MySQL”). However, process reports information from the kernel and should work with any application.
The process plugin has two main types of monitors: counting processes and detailed information about a single process. A single process target can be configured to do any number of either type of monitoring and the results are combined in the resulting datatree.
To count the number of processes, a count attribute must be specified. In its simplest form, the count attribute value is simply the name of the process to count. The following example reports the number of imapd processes currently in existence.
[process]
count = imapd
The format of the count attribute allows for more sophisticated queries of the form:
reported name : proc name [cond1, cond2, ...]
All of the parts are optional: the part up to and including the colon (reported name :), the part after the colon but before the square brackets (proc name) and the part in square brackets ([cond1, cond2, ...]) can be omitted, but at least one of the first two parts must be specified. The examples below may help clarify this!
To be included in the count, a process' name must match proc name (if specified). The statistics will be reported as reported name. If no reporting name is specified, then proc name will be used.
The part in square brackets, if present, specifies some additional constraints. The comma-separated list of key-value pairs defines additional predicates; for example, [uid=root, state=R] means only processes that are running as root and are in the running state will be counted. The valid conditions are:
uid = uid
to be considered, the process must be running with a user ID of uid. The value may be the numerical uid or the username.
gid = gid
the process must be running with a group ID of gid. The value may be the numerical gid or the group name.
state = statelist
the process must have one of the states listed in statelist. Each acceptable process state is represented by a single capital letter and these are concatenated together. Valid process state letters are:
R
process is running (or ready to be run),
S
sleeping, awaiting some external event,
D
in uninterruptable sleep (typically waiting for disk IO to complete),
T
stopped (due to being traced),
W
paging,
X
dead,
Z
defunct (or "zombie" state).
The following example illustrates count used to count the number of processes. The different attributes show how the different criteria are represented.
[process]
count = imapd                                  ❶
count = io_imapd : imapd [state=D]             ❷
count = all_java : java                        ❸
count = tomcat_java : java [uid=tomcat5]       ❹
count = zombies : [state=Z]                    ❺
count = tcat_z : java [uid=tomcat4, state=Z]   ❻
count = run_as_root : [uid=0]                  ❼
❶ Count the number of imapd processes.
❷ Count the number of imapd processes that are in uninterruptable sleep (state D). Store the number as a metric called io_imapd.
❸ Count the number of java processes that are running. Store the number as a metric called all_java.
❹ Count the number of java processes that are running as user tomcat5. Store the number as a metric called tomcat_java.
❺ Count the total number of zombie processes. Store the number as a metric called zombies.
❻ Count the number of zombie tomcat processes. Store the number as a metric called tcat_z.
❼ Count the number of processes running as root.
The watch attribute specifies a process to monitor in detail. The process to watch is identified using the same format as with count statements; however, the expectation is that only a single process will match the criteria.
If there is more than one process matching the search criteria then one is chosen and that process is reported. In principle, the selected process might change from one time to the next, which would lead to confusing results. In practice, the process with the lowest pid is chosen, so it is both likely to be the oldest process and unlikely to change over time. However, this behaviour is not guaranteed.
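For illustration, a watch attribute uses the same format as count described above; a sketch that follows a single httpd process running as root might look like the following (the httpd process name and the master_httpd reporting name are purely illustrative):

[process]
# illustrative names only; adjust to the process you wish to watch
watch = master_httpd : httpd [uid=root]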
Much information is gathered with a watch attribute. This information is documented in the stat and status sections of the proc(5) manual page. Some of the more useful entries are described below:
the process ID of the process being monitored.
the process ID of the parent process.
a single character, with the same semantics as the different process states listed above.
number of minor memory page faults (no disk swap activity was required).
number of major memory page faults (those requiring disk swap activity).
number of jiffies[1] of time spent with this process scheduled in user-mode.
number of jiffies[1] of time spent with this process scheduled in kernel-mode.
number of threads in use by this process.
An accurate value is provided by the 2.6-series kernels. Under 2.4-series kernels with LinuxThreads, heuristics are used to derive a value. This value should be correct under most circumstances, but it may be confused if multiple instances of the same multi-threaded process are running concurrently.
virtual memory size: total memory used by the process.
Resident Set Size: number of pages of physical memory a process is using (less 3 for administrative bookkeeping).
either the name of the process(es) to count, or the conditions processes must satisfy to be included in the count. This attribute may be repeated for multiple process counting.
count attributes have the form:
reported name : proc name [cond1, cond2, ...]
either the name of the process to obtain detailed information, or the conditions a process must satisfy to be watched. This attribute may be repeated to obtain detailed information about multiple processes.
watch attributes have the form:
reported name : proc name [cond1, cond2, ...]
The stocks plugin uses one of the web-services provided by XMethods to obtain a near real-time quote (delayed by 20 minutes) for one or more stocks on the United States Stock market. Further details of this service are available from the Stocks service summary page.
In addition to providing financial information, stocks is a pedagogical example that demonstrates the use of SOAP within MonAMI.
The authors of MonAMI expressly disclaim the accuracy, adequacy, or completeness of any data and shall not be liable for any errors, omissions or other defects in, delays or interruptions in such data, or for any actions taken in reliance thereon.
Please do not send too many requests. A request every couple of minutes should be sufficient.
a comma- (or space-) separated list of ticker symbols to monitor. For example, GOOG is the symbol for Google Inc. and RHT is the symbol for Red Hat Inc.
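As a sketch only, a stocks target watching two ticker symbols might look like the following; the attribute name symbol is a guess based on the description above, so verify it against the plugin's reference documentation:

[stocks]
# "symbol" is a guessed attribute name
symbol = GOOG, RHT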
The tcp monitoring plugin provides information about the number of TCP sockets in a particular state. Here, a socket is either a TCP connection to some machine or the ability to receive a particular connection (i.e., the local machine is “listening” for incoming connections).
A tcp monitoring target takes an arbitrary number of count attributes. The value of a count attribute describes how to report the number of matching sockets and the criteria for including a socket within that count. These attributes take values of the form:
name [cond1, cond2, ...]
where name is the name used to report the number of matching TCP sockets. The conditions (cond1, cond2, etc.) are comma-separated keyword-value pairs (e.g., state=ESTABLISHED). A socket must match all conditions to be included in the count.
The condition keywords may be any of the following:
local_addr
The local IP
address to which the socket is bound. This
may be useful on multi-homed machines for sockets bound to a
single interface.
remote_addr
The remote IP
address of the socket, if connected.
local_port
The port on the local machine. This can be the numerical
value or a common name for the port, as defined in
/etc/service
.
remote_port
The port on the remote machine, if connected. This can be the numerical value or a common name for the port.
port
A socket's local or remote port must match. This can be the numerical value or a common name for the port.
state
The current state of the socket. Each local socket will be in one of a number of states and changes state during the lifetime of a connection. All the states listed below are valid and may occur naturally on a working system; however, under normal circumstances some states are transitory: one would not expect a socket to stay in a transitory state for long. A large and/or increasing number of sockets in one of these transitory states might indicate a networking problem somewhere.
The valid states are listed below. For each state, a brief description is given and the possible subsequent states are listed.
LISTEN
A program has indicated it will receive connections from remote sites.
Next: SYN_RECV, SYN_SENT
SYN_SENT
Either a program on the local machine is the client and is attempting to connect to a remote machine, or the local machine is sending data from a LISTENing socket (less likely).
Next: ESTABLISHED, SYN_RECV or CLOSED
SYN_RECV
Either a LISTENing socket has received an incoming request to establish a connection, or both the local and remote machines are attempting to connect at the same time (less likely).
Next: ESTABLISHED, FIN_WAIT_1 or CLOSED
ESTABLISHED
Data can be sent to/from the local and remote site.
Next: FIN_WAIT_1 or CLOSE_WAIT
FIN_WAIT_1
Start of an active close. The application on the local machine has closed the connection. Indication of this has been sent to the remote machine.
Next: FIN_WAIT_2, CLOSING or TIME_WAIT
FIN_WAIT_2
The remote machine has acknowledged that the local application has closed the connection.
Next: TIME_WAIT
CLOSING
Both local and remote applications have closed their connections “simultaneously”, but the remote machine has not yet acknowledged that the local application has closed the local connection.
Next: TIME_WAIT
TIME_WAIT
The local connection is closed and we know the remote site knows this. We know the remote site's connection is closed, but we don't know whether the remote site knows that we know this. (It is possible that the last ACK packet was lost and, after a timeout, the remote site will retransmit the final FIN packet.)
To prevent this potential packet loss (of the local machine's final ACK) from accidentally closing a fresh connection, the socket will stay in this state for twice the MSL timeout (depending on implementation, a minute or so).
Next: CLOSED
CLOSE_WAIT
The start of a passive close. The application on the remote machine has closed its end of the connection. The local application has not yet closed this end of the connection.
Next: LAST_ACK
LAST_ACK
The local application has closed its end of the connection. This has been sent to the remote machine but the remote machine has not yet acknowledged this.
Next: CLOSED
CLOSED
The socket is not in use.
Next: LISTEN or SYN_SENT
CONNECTING
A pseudo state. The transitory states when starting a connection match: specifically, either SYN_SENT or SYN_RECV.
DISCONNECTING
A pseudo state. The transitory states when shutting down a connection match: specifically, any of FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT or LAST_ACK.
The states ESTABLISHED and LISTEN are long-lived states. It is natural to find sockets that are in these states for extended periods.
For applications that use “half-closed” connections, the FIN_WAIT_2 and CLOSE_WAIT states are less transitory. As the name suggests, half-closed connections allow data to flow in one direction only. This is achieved by the application that no longer wishes to send data closing its end of the connection (see FIN_WAIT_1 above), whilst the application that wishes to continue sending data does nothing (and so undergoes a passive close). Once the half-closed connection is established, the actively closed socket (which can no longer send data) will be in FIN_WAIT_2, whilst the passively closed socket (which can still send data) will be in CLOSE_WAIT.
There are two pseudo states for the normal transitory states: CONNECTING and DISCONNECTING. They are intended to help catch networking or software problems.
The following example checks whether an application is listening on three well-known port numbers. This might be used as a check whether services are running as expected.
[tcp]
name = listening
count = ssh [local_port=ssh, state=LISTEN]
count = ftp [port=ftp, state=LISTEN]
count = mysql [local_port=mysql, state=LISTEN]
The following example records the number of connections to a
webserver. The established
metric records the
connections where data may flow in either direction. The other
two metrics record connections in the two pseudo states. Normal
traffic should not stay long in these pseudo states; connections
that persist in these states may be symptomatic of some problem.
[tcp]
name = incoming_web_con
count = established [local_port=80, state=ESTABLISHED]
count = connecting [local_port=80, state=CONNECTING]
count = disconnecting [local_port=80, state=DISCONNECTING]
the name to report for this metric followed by square brackets containing a comma-separated list of conditions a socket must satisfy to be included in the count. This option may be repeated for multiple TCP connection counts.
The conditions are keyword-value pairs, separated by =, with the following valid keywords: local_addr, remote_addr, local_port, remote_port, port, state.
The state keyword can have one of the following TCP states: LISTEN, SYN_RECV, SYN_SENT, ESTABLISHED, CLOSED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING, TIME_WAIT, LAST_ACK; or one of the following two pseudo states: CONNECTING, DISCONNECTING.
Apache Tomcat is one of the projects from the Apache Software
Foundation. It is a Java-based application server (or servlet
container) based on Java Servlet and JavaServer Pages
technologies. Servlets and JSP
are defined under Sun's Java
Community Process. More information about Tomcat can be found
at the Apache
Tomcat home page.
Also under development of the Java Community Process is the Java
Monitoring eXtensions (JMX
). JMX
provides a standard method
of instrumenting servlets and JSP
s, allowing remote monitoring
and control of Java applications and servlets.
The tomcat plugin uses the JMX-proxy servlet to monitor (potentially) arbitrary aspects of servlets and JSPs. This provides structured plain-text output from Tomcat's JMX MBean interface. Applications that require monitoring should connect to that interface for MonAMI to discover their data.
To monitor a custom servlet, the required instrumentation within
the servlet/JSP
must be written. Currently, there is an
additional light-weight conversion needed within MonAMI, adding
some extra information about the monitored data. Sample code
exists that monitors aspects of the Tomcat server itself.
Any tomcat monitoring target will need a username and
password that matches a valid account within the Tomcat server
that has the manager
role. This is normally configured in the file
$CATALINA_HOME/conf/tomcat-users.xml
.
Including the following line within this file adds a new user monami, with password monami-secret and the manager role, to Tomcat.
<user username="monami" password="monami-secret" roles="manager"/>
This line should be added within the
<tomcat-users>
context.
Be aware that Basic authentication sends the username and password unencrypted over the network. These values are at risk if packets can be captured. If you are not sure, you should run MonAMI on the same server as Tomcat.
In addition to connecting to Tomcat, you also need to specify which classes of information you wish to monitor. The following are available: ThreadPool and Connector. To monitor some aspect, you must specify the object type along with the identifier for that object within the monitoring definition. For example:
[tomcat]
name = local-tomcat
ThreadPool = http-8080
Connector = 8080
ThreadPool monitors a named thread pool (e.g., http-8080), monitoring the following quantities:
the minimum number of threads the server will maintain.
the number of threads that are either actively processing a request or waiting for input.
total number of threads within this ThreadPool.
if the number of spare threads exceeds this value, the excess are deleted.
an absolute maximum number of threads.
the priority at which the threads run.
The Connector monitors a ConnectorMBean and is identified by which port it listens on. It monitors the following quantities:
Can we trace the output?
Did the client authenticate?
Is the connection compressed?
Is the upload timeout disabled?
Is there no session?
Are lookups enabled?
Is the TCP
SO_NODELAY
flag set?
does the URI
contain body information?
are the connections secure?
number of pending connections this Connector will accept before rejecting incoming connections.
size of the input buffer.
how long the connection lingers, waiting for other connections.
the timeout for this connection.
the timeout for uploads.
the maximum size for HTTP
header.
how many keep-alives before the connection is considered dead.
maximum size of the information POST
ed.
c.f. ThreadPool
c.f. ThreadPool
c.f. ThreadPool
c.f. ThreadPool
the port on which this connector listens.
the proxy port associated with this connector.
the port to which this connector will redirect.
which protocol the connector uses
(e.g., HTTP/1.1
)
the SSL
protocol the connector uses (e.g.,
TLS
)
which scheme the URI
will use (e.g.,
http
, https
)
The tomcat monitoring target accepts the following options:
the hostname of the machine to monitor. The default value is localhost.
the TCP port on which Tomcat listens. The default value is 8080.
the path to the JMX-proxy servlet within the application server URI namespace. The default path is /manager/jmxproxy/.
the username to use when completing Basic authentication.
the password to use when completing Basic authentication.
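Combining these options with the ThreadPool and Connector selections described earlier, a tomcat target might look like the sketch below; the host, port, username and password attribute names are assumed from the descriptions above:

[tomcat]
# attribute names are assumed; check the plugin's reference documentation
name = local-tomcat
host = localhost
port = 8080
username = monami
password = monami-secret
ThreadPool = http-8080
Connector = 8080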
The Torque homepage describes Torque as “an open source resource manager providing control over batch jobs and distributed compute nodes.” Torque was based on the original PBS/Open-PBS project, but incorporates many new features. It is now a widely used batch control system.
Torque is heavily influenced by the IEEE 1003.1 specification, in particular Section 3 (Batch Environment Services) of the Shell & Utilities volume. However, it also includes some additional features, such as support for jobs in the suspended state.
Torque uses username-and-host based authorisation. Users may query the status of their own jobs, but may require special privileges to view the status of all jobs. Because of this, the MonAMI torque plugin may require authorisation to gather monitoring information.
To grant torque sufficient privileges to conduct its monitoring, the Torque server must either have query_other_jobs set to True (allowing all users to see other users' job information) or have the MonAMI user (typically monami) and host added as one of the operators. Setting either option is sufficient and both can be achieved using the qmgr command.
The command qmgr -ac "list server
query_other_jobs"
will display the current value of
query_other_jobs
. To allow all users to see
other user's job status, run the command: qmgr -ac "set
server query_other_jobs = True"
.
The command qmgr -ac "list server operators"
will display the current list of operators. To add user
monami
running on host mon-hq.example.org
as another
operator, use the command qmgr -ac "set server operators
+=
.
monami
@mon-hq.example.org
"
It is often useful to group together multiple execution queues when generating statistics. The group may represent queues with a similar purpose, or the group represents a set of queues that support a wider community. MonAMI supports this by allowing the definition of queue-groups and will report statistics for each of these groups.
A queue-group is defined by including a group attribute in the torque target. Multiple groups can be defined by repeating the group attributes, one attribute for each group.
A group attribute's value defines the group like:
name : queue1, queue2, ...
where name is the name of the queue-group, queue1 is the first queue to be included, queue2 the second, and so on. The group statistics are generated based on all jobs that are in any of the listed execution queues.
As an example, the following torque stanza defines four groups: HEP, LHC, Grid OPS, and Local.
[torque]
group = HEP : alice, atlas, babar, dzero, lhcb, cms, zeus
group = LHC : atlas, lhcb, cms, alice
group = Grid OPS : dteam, ops
group = Local : biomed, carmont, glbio, glee
the hostname of the Torque server. If not specified, a default value will be used, which is specified externally to MonAMI. This default may be localhost or may be configured to whatever is the most appropriate Torque server.
defines a new queue-group against which statistics are collected. The group value is of the form:
name : queue1, queue2, ...
Each Torque queue may appear in any number (zero or more) of queue-group definitions.
The Varnish home
page describes Varnish as a
“state-of-the-art, high-performance HTTP
accelerator. Varnish is targeted primarily at the FreeBSD
6/7
and Linux
2.6 platforms, and takes full advantage of the virtual
memory system and advanced I/O features offered by these operating
systems.”
Varnish offers a management interface. The MonAMI varnish plugin connects to this interface and requests the server's current set of statistics.
the host on which Varnish is running. The default is localhost.
the TCP port on which the Varnish management interface is listening. The default value is 6082.
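A minimal varnish target might look like the following sketch; the host and port attribute names are assumed from the descriptions above, and the hostname is a placeholder:

[varnish]
# attribute names are assumed; check the plugin's reference documentation
host = varnish.example.org
port = 6082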
[1] a jiffy is a hard-coded period of time. On most Linux machines, it is 10 ms (1/100 s). It can be altered to a different value, but it remains constant whilst the kernel is running. In practice, the number of jiffies since the machine booted is held as a counter, which is incremented when the timer interrupt occurs.