2005-11-16
Jason Schoonover

Unnoc RRD Support

1. Introduction
2. Details of the graphs
3. Configuration
4. Performance
5. Limitations


1. Introduction

As of unnoc 1.0.4, RRD graphing support has been added. RRDTool is the
successor to MRTG: you can graph more than one variable, choose line or
area graphs, and use all sorts of different colors and shadings.

It's a bit different than MRTG. MRTG is fairly application specific: it
expects two variables, in and out, plots them on a graph and supplies
historical graphs. It was originally designed for routers, but
sysadmins have graphed everything from CPU usage to temperature to disk
usage with it. RRDTool is more generic: it will just take a value and
plot it.

So if you want to mimic MRTG, you would have to set up a graph of
in/out bytes with one green AREA and one blue LINE. However, if you
wanted to graph something like all of the disks on your server, you
could just put 4 lines on one graph and that's it. Or if you wanted to
graph all of the apache2 processes that a cluster is running, you could
put all of those on one graph. Or you could graph the bytes used in
/tmp/ on all of your servers, for instance. It's much more powerful and
has very few limitations.

Currently I've implemented the following in unnoc, for servers and
plugins:

- load averages
- number of processes running
- disk usage
- open TCP connections
- CPU usage
- memory usage / available
- network interface monitoring
- OpenLDAP specific RRD plugin graphing
- MySQL specific RRD plugin graphing
- UPS load percentage
- Temperature

As of 1.0.5, MRTG is no longer required for servers. All servers use
RRD for all graphing. This should reduce load on the server (MRTG is no
longer running on them all the time) and should also make for much more
powerful analysis of what your server is doing.

As of 1.0.6, MRTG is no longer required at all.
All plugins and hosts use RRD for graphing, which allows for much more
powerful analysis of what is going on with your network devices.

Configuration is trivial: you basically just enable it and it goes. See
the Configuration and Performance sections below.

To view the graphs, click on any of the processes, load averages or
disks on the server view. It will bring up a 24 hour graph, a weekly
graph, a monthly graph and a yearly graph. You can also click on each
of those graphs again, and it will zoom in and show you a much more
useful graph (in most cases).


2. Details of the graphs

There is generally one RRD file per object. An object is defined as any
one of:

- one load average
- one process
- one disk
- one network interface
- one aspect (memory or CPU, for instance)

There are 8 graphs per RRD file: daily, weekly, monthly and yearly, and
then zoomed daily, zoomed weekly, zoomed monthly and zoomed yearly.
This keeps each RRD file completely separate from the others.

The really cool thing about RRDTool is that you can have more than one
RRD file in a graph, so in the future there can be all sorts of
combination graphs, clusters of graphs, and disk graphs with all sorts
of configurations. It's extremely flexible.

i. Processes

Each graph has limits drawn on it. The limits are pulled from SNMP. So,
for instance, if you are monitoring apache2 and have the max and min
set to 15 and 5, there will be a line drawn at 15 and a line drawn at
5. Each limit has its own color, which is configured in the
graph_colors.conf configuration file.

ii. Load Averages

Load averages have three zones: good, warning and high. The warning
level is defined by $warn_load_avg in unnoc.conf, the high level is
pulled from SNMP off of the server, and good is anything below warning.
Each zone has its own configurable color in the graph_colors.conf file.

iii. Disks

Disks also have three zones: good, warning and high. The warning level
is defined by $warn_disk_space (a percentage), and the high level is
defined in the SNMP configuration.
Each zone has its own configurable color in the graph_colors.conf file.

iv. Network Interfaces

Network interfaces just record the in bytes and out bytes, so they have
two colors. There are no warning colors or "threshold" colors. Each
color is defined in the graph_colors.conf file.

v. Memory

Memory usage has two zones: real and swap. Swap is stacked on top of
real, so you can see how much swap is being used (or reportedly being
used, according to SNMP). Each color is defined in the
graph_colors.conf file.

vi. CPU Usage

There are two graphs for CPU usage so far. One is System usage vs. User
usage, and the other is Active Load %, which is the combination of
System+User. Each color is defined in the graph_colors.conf file and is
configurable.

vii. TCP Connections

TCP connections have two colors: an outline color and a solid color.
This is useful if you have a jagged graph but would like to see a more
defined outline of the graph itself. Each color is defined in the
graph_colors.conf file.

viii. Plugin specific graphs

Each plugin specific graph (MySQL, LDAP, etc.) comes with its own
documentation and is explained there. Again, each of the colors for
every graph is specified in the graph_colors.conf file.


3. Configuration

First you need to install rrdtool. You can download it from:

    http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/download.en.html

Or you can find your OS's package for it. On Debian it's as simple as
'apt-get install rrdtool'.

The RRDTool version that I've developed against is:

    1.2.11

I haven't tested it with any other versions, so my recommendation is to
just upgrade to that version.
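As a rough aid for the version note above, here is a minimal Python
sketch that pulls a version number out of a banner line and compares it
with 1.2.11. It is not part of unnoc. The function names are
hypothetical, and the banner format ("RRDtool <version> ...", as
rrdtool is assumed to print when run with no arguments) is an
assumption that may not hold for every build.

```python
import re

TESTED_VERSION = (1, 2, 11)  # the RRDTool version named in the text


def parse_rrdtool_version(banner):
    """Extract (major, minor, patch) from text such as 'RRDtool 1.2.11 ...'.

    The banner layout is an assumption. Returns None if no version
    number of that shape is found.
    """
    match = re.search(r"RRDtool\s+(\d+)\.(\d+)\.(\d+)", banner, re.IGNORECASE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_tested_version(banner):
    """Return True only if the banner reports exactly the tested version."""
    return parse_rrdtool_version(banner) == TESTED_VERSION
```

Pass it the first line of whatever your rrdtool binary prints. A result
of False only means the installed version differs from the one unnoc
was developed against, not that it will fail.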
To enable it, simply set this variable in unnoc.conf:

    rrd_enable = 1

Tell unnoc where the rrdtool executable is:

    rrdtool_bin = /usr/bin/rrdtool

You can safely leave the other two rrd_* variables alone. Those are
used internally by unnoc.

Now create this directory (if it doesn't exist already):

    mkdir /location/of/unnoc/rrd

And you're done. It will start collecting data every time
check-stats.pl is run (whatever cron_stats is set to).

Graphs will show up after about 10 or 15 pieces of data have been
collected. So if you're monitoring every 2 minutes, they will show up
within about 30 minutes.

If you get an error like "Disk not being monitored" or "Process not
being monitored" when you click on an item, wait 10 or 15 minutes for
the monitoring to kick in. If you still can't see the graph then,
double check that the rrd/ directory exists in unnoc/. If you -still-
can't see it, check that the rrdtool executable exists, and if that
doesn't help, check the apache2 logs and the permissions on the images/
directory.


4. Performance

In my CPU tests, the actual rrdtool create and rrdtool update calls use
very little CPU, I'd say maybe a 2% increase. rrdtool is fast and very
optimized.

The slow part, however, is actually creating the graphs. I've added
code to create daily, weekly, monthly and yearly graphs for every load
average, every process, every disk and every network interface, plus 4
other graphs per server.

So say you have 10 servers, each monitoring 3 load averages, 3 disks,
10 processes, 2 network interfaces, memory, TCP connections and CPU
usage (there are two graphs for CPU). That's 22 daily graphs per server
* 4 (for daily, weekly, monthly and yearly graphs) = 88 graphs PER
server, * 2 (for zoomed graphs) = 176 graphs per server. So if you have
10 servers, that's 1760 graphs!

And if these are generated from cron, then they all need to be updated
every 2 minutes.
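The graph count above can be double checked with a few lines of Python.
This is only a worked version of the arithmetic in the text. The
dictionary and function names are made up for the illustration and do
not exist in unnoc.

```python
# Objects monitored on one server, using the example figures from the text.
OBJECTS_PER_SERVER = {
    "load averages": 3,
    "disks": 3,
    "processes": 10,
    "network interfaces": 2,
    "memory": 1,
    "TCP connections": 1,
    "CPU usage": 2,  # System vs. User, and Active Load %
}

PERIODS = 4      # daily, weekly, monthly, yearly
ZOOM_LEVELS = 2  # normal and zoomed


def graphs_per_server(objects=OBJECTS_PER_SERVER):
    """Return the number of graph images needed for one server."""
    return sum(objects.values()) * PERIODS * ZOOM_LEVELS


def graphs_total(servers, objects=OBJECTS_PER_SERVER):
    """Return the number of graph images for `servers` identical servers."""
    return servers * graphs_per_server(objects)


if __name__ == "__main__":
    print(graphs_per_server())  # 22 objects * 4 periods * 2 zoom levels
    print(graphs_total(10))     # ten such servers
```

With the figures from the text this gives 176 graphs per server and
1760 graphs for 10 servers.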
This is where I ran into trouble: you can't possibly update all of
these graphs every two minutes. Successive runs of rrd-graph.pl will
start overlapping with each other and things will get bad.

So the solution is to just create the graphs on the fly. The graphs
will only be viewed 4 at a time (daily, weekly, monthly, yearly), and
generating 4 graphs is very cheap, maybe under 0.5 seconds for all 4
(depending on the load of the server).

You might ask: if it's so cheap, why do it on the fly? Well, if you're
trying to collect data from SNMP AND create graphs, that's where it
just takes a really long time. So I figure that it makes much more
sense to do it on the fly. Mainly, we only care about a 2 or 3 minute
window, so why generate ALL of the graphs every time if we only want to
see one of them at a given moment? It makes more sense to me to
generate a graph when you actually want to view it.

The only downside of this approach is that there is a tiny delay
(again, depending on how busy the server is at the time you ask for the
graph) between clicking on the link and the graphs actually showing up.
I think that the delay is minimal and acceptable. Comments definitely
welcome.


5. Limitations

The only limitation I've found so far with RRDTool is that you can't
make your graphs grow from the left, like MRTG (where all the new data
shows up on the left). In current implementations, graphs can only grow
from the right, so all the new data shows up on the right. Hopefully
Tobias will add a GrowLeft feature soon.

vim:tw=72:wm=1