I have my own collection of plugins that feed interesting information to Ganglia, a popular distributed cluster monitoring application. It supplements the list at SourceForge, which has been closed for a long time; I don't really like most of the metrics there anyway. All of these metrics are aimed at Linux, though some of them will work on (or could be ported to) other operating systems. I call each of these scripts once every two minutes from /etc/cron.d/ganglia (don't forget the user field, root, which entries in /etc/cron.d files require).
Without further ado...
2007-05-31 ganglia_ramdisk.tgz
Ganglia ramdisk is a cronjob plus a start script that help you keep the gmetad rrds repository on a ramdisk instead of on physical disk. This helps as your cluster grows. Two pieces are not included: the grub.conf change that specifies the size of the ramdisk you wish to create, and the step of turning /var/lib/ganglia/rrds (or wherever your default storage is) into a symlink that points to the ramdisk.
2007-03-08 nfsd_gmetric.pl (local mirror)
nfsd_gmetric.pl is a script written by Vladimir Vuksan that collects information about the NFSd server, including access, getattr, setattr, remove, etc. It reads the info straight out of /proc, which makes it nice and lightweight. All his Ganglia gmetric scripts can be found at his Ganglia gmetric repository.
2006-11-15 mcd_gmetric.sh
mcd_gmetric.sh collects metrics from the excellent memcached package. The metrics it collects include the total number of connections and items, the number of gets, sets, and hits per second, and the overall hit percentage. Knowing these numbers greatly helps you tune your memcached installation. Please note: this version of the script only works with memcached version 1.1.13. To make it work with a different version, you must change the indexes into the array holding the stats, because the package reports different stats with each version. Really, the fault lies in my inflexible code, which I will fix one of these days, but until then you must tailor this script to your installed version. (p.s. versions prior to 1.1.13 can't take more than 3GB RAM due to a bug. That's why I use 1.1.13... ;)
Jan Miczaika has written a more stable version (one that actually keys off the memcached stat name rather than its position) that is available on the HitFlip Open Source page. His doesn't report the hit percentage yet, but he is planning on adding some of my stats to his script. So our two scripts are on a convergent path towards perfection... :)
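To illustrate why keying off the stat name is sturdier than indexing by position, here is a minimal Python sketch. It is not taken from either script: the function names and the SAMPLE reply are made up for illustration, and it assumes the usual "STAT name value" lines that memcached returns for its stats command.

```python
# Illustrative sketch only; not the code from mcd_gmetric.sh or Jan's script.

def parse_memcached_stats(raw):
    """Turn the text reply to memcached's `stats` command into a dict keyed by stat name."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(None, 2)
        # Data lines look like "STAT <name> <value>"; the reply ends with "END".
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_percentage(stats):
    """Overall hit percentage from the cumulative get_hits / get_misses counters."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


# Made-up sample reply, for demonstration only.
SAMPLE = (
    "STAT curr_connections 12\r\n"
    "STAT curr_items 3400\r\n"
    "STAT cmd_get 1000\r\n"
    "STAT cmd_set 250\r\n"
    "STAT get_hits 900\r\n"
    "STAT get_misses 100\r\n"
    "END\r\n"
)

if __name__ == "__main__":
    stats = parse_memcached_stats(SAMPLE)
    print(stats["curr_items"], hit_percentage(stats))
```

Because each value is looked up by name, a memcached release that adds or reorders stats does not shift anything, which is exactly the weakness of the positional approach described above.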
2006-06-27 ganglia_apache.pl
Credit for this script goes to Nicolas Marchildon. ganglia_apache.pl interfaces with Apache to report the number of queries per second, sorted by return code (200 = ok, 300 = redirect, etc.). This script is not called from cron, but instead hooks into Apache via the following two lines in the httpd.conf file:
LogFormat "%>s" status_only
CustomLog "|/path/to/apache-logs-to-ganglia.pl -d 10" status_only
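With that LogFormat, the piped program receives one line per request containing only the final status code. The Python sketch below shows roughly how such a stream could be turned into per-second rates. It is not Nicolas's Perl script: the function names are hypothetical, and grouping codes into classes such as 2xx and 3xx is an assumption about what "sorted by return code" means.

```python
# Illustrative sketch only; not the logic of the actual Perl script.
from collections import Counter


def count_status_classes(lines):
    """Count requests per status class (2xx, 3xx, ...) from lines holding only a status code."""
    counts = Counter()
    for line in lines:
        code = line.strip()
        # Skip anything that is not a three-digit HTTP status code.
        if len(code) == 3 and code.isdigit():
            counts[code[0] + "xx"] += 1
    return counts


def per_second(counts, interval_seconds):
    """Convert the counts gathered over one reporting interval into requests per second."""
    return {status: n / interval_seconds for status, n in counts.items()}


if __name__ == "__main__":
    sample = ["200\n", "200\n", "304\n", "404\n", "-\n"]
    print(per_second(count_status_classes(sample), 10))
```

A real piped logger would read sys.stdin continuously and report at a fixed interval; the sketch leaves that loop out to stay short.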
2006-02-07 disk_wait_gmetric.sh
disk_wait_gmetric.sh collects the '%utilization' and 'await' statistics for each device. This script runs for 30s to collect the needed information. It also requires iostat from the sysstat package.
2006-01-31 mysql_gmetric.sh
mysql_gmetric.sh collects the following metrics from MySQL:
- number of threads
- number of queries per second
- number of slow queries per second (though this metric is kinda meaningless)
- seconds a slave is behind the master (if it is a slave)
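The per-second numbers in this list have to be derived from cumulative counters, since the server only reports totals. Here is a minimal Python sketch of that arithmetic and of handing the result to gmetric. It is not the logic of mysql_gmetric.sh: the function names and the metric name mysql_queries_per_sec are hypothetical, and returning 0 after a counter reset is a design assumption.

```python
# Illustrative sketch only; not taken from mysql_gmetric.sh.

def per_second_rate(previous, current, elapsed_seconds):
    """Rate between two samples of a cumulative counter.

    Returns 0.0 when the counter went backwards (for example after a
    server restart) or when no time has elapsed.
    """
    if elapsed_seconds <= 0 or current < previous:
        return 0.0
    return (current - previous) / elapsed_seconds


def gmetric_command(name, value, units, metric_type="float"):
    """Build an argument list for the gmetric command-line tool."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value:.2f}",
        f"--type={metric_type}",
        f"--units={units}",
    ]


if __name__ == "__main__":
    # Two samples of a query counter taken 120 seconds apart (made-up numbers).
    rate = per_second_rate(1000, 4000, 120)
    print(" ".join(gmetric_command("mysql_queries_per_sec", rate, "queries/sec")))
```

With a two-minute cron interval the elapsed time is roughly 120 seconds, but a careful script would store the timestamp of the previous sample and measure the gap rather than assume it.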
2006-01-31 network_gmetric.sh
network_gmetric.sh collects network usage statistics on a per-interface level. This script collects its information directly from /proc/net/dev and looks for devices that are named 'eth', so it will only work on Linux.
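For readers unfamiliar with /proc/net/dev, here is a rough Python sketch of the kind of parsing involved. It is not the shell script itself: the function name and sample text are made up, and it assumes the usual layout of two header lines followed by one "name: counters" line per interface, with received bytes in the first column and transmitted bytes in the ninth.

```python
# Illustrative sketch only; not the code from network_gmetric.sh.

def parse_net_dev(text, prefix="eth"):
    """Return cumulative rx/tx byte counters for interfaces whose name starts with prefix."""
    usage = {}
    for line in text.splitlines():
        # The two header lines contain no colon; interface lines do.
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        name = name.strip()
        fields = data.split()
        if name.startswith(prefix) and len(fields) >= 16:
            usage[name] = {
                "rx_bytes": int(fields[0]),  # first receive column
                "tx_bytes": int(fields[8]),  # first transmit column
            }
    return usage


# Made-up sample in the /proc/net/dev layout, for demonstration only.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    5000      50    0    0    0     0          0         0     5000      50    0    0    0     0       0          0
  eth0: 1234567    8901    0    0    0     0          0         0  7654321    1098    0    0    0     0       0          0
"""

if __name__ == "__main__":
    # On a real Linux host: parse_net_dev(open("/proc/net/dev").read())
    print(parse_net_dev(SAMPLE))
```

The counters are cumulative, so a usage figure comes from the difference between two readings divided by the time between them.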
2006-01-31 sensors_gmetric.sh
sensors_gmetric.sh collects CPU temperature and voltage on a Tyan motherboard. This is my least happy script.
2006-01-31 disk_gmetric.sh
disk_gmetric.sh collects the number of writes and reads per disk. It requires the 'iostat' program, from the sysstat package.



