2006-10-31
Jason Schoonover

Unnoc config file explanation

This file explains the variables in the unnoc.conf file, as well as
its syntax.

   1. Syntax
   2. unnoc.conf variables
      a. normal variables
      b. host directive variables
   3. graph_colors.conf variables

This should be a complete list of all variables available in
unnoc.conf.


1. Proper syntax for variables

Currently there are two types of variables in the unnoc.conf file:

   1. normal variables
   2. arrays

a. normal variables

   I don't know what else to call these; they are one-line
   variables, like a scalar variable in Perl. Once you set a normal
   variable, you can use it anywhere in the unnoc.conf file by
   placing a '$' in front of it.

   example:

      webroot = /var/www/unnoc
      ignore_hosts = $webroot/etc/ignore_hosts

b. arrays

   Arrays are just a group of lines of text surrounded by braces {}.
   The definition is very picky right now: the opening brace must be
   on the same line as the definition, the values start on the NEXT
   line, and the closing brace is on a line of its own. Any other
   layout will confuse the config parsers. You cannot reuse arrays
   anywhere else in the config file.

   proper examples:

      page_group {
         group = defaultgroup
         recipient = email@domain.com, email2@domain.com
      }

      host {
         hostname = host
         community = public
         updown = ping
      }

   improper examples:

   *****************************
   These will *not* work!!:

      host { hostname = host; community = comm; }

      ss_valid_emails = { email@domain.com email2@domain.com };

      email_group = { group = email@domain.com }

      host { hostname = host1 community = comm }

      nomrtg { hostname = host community = comm }

   Those do NOT work
   *****************************


2. Explanation of all of the variables that are available in unnoc.conf

a. normal / global variables

webroot - (normal variable)
     webroot is the file system location of the install base of
     unnoc. I use /var/www/unnoc, although you may use whatever you
     want.
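Other settings usually build on webroot through the '$' prefix from section 1 (for example ignore_hosts = $webroot/etc/ignore_hosts). The Python sketch below shows one way such references could be expanded. It is not unnoc's own parser: the function name parse_normal_variables, treating lines that start with '#' as comments, skipping array blocks, and leaving an undefined $name untouched are all assumptions made for illustration.

```python
import re

# Matches "$name" references; the allowed name characters are an assumption.
VAR_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def parse_normal_variables(text):
    """Hypothetical sketch: collect normal variables and expand $name.

    Assumptions (not taken from unnoc's code): lines whose first
    non-blank character is '#' are comments, everything between a line
    ending in '{' and a line that is just '}' is an array and is
    skipped, and a reference to an undefined variable is left as is.
    """
    variables = {}
    depth = 0  # greater than 0 while inside an array block
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            depth += 1  # e.g. "host {" or "page_group {"
            continue
        if line == "}":
            depth -= 1
            continue
        if depth > 0 or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        # Only variables defined earlier in the file can be referenced.
        variables[name] = VAR_REF.sub(
            lambda m: variables.get(m.group(1), m.group(0)), value
        )
    return variables
```

Fed the two example lines from section 1 plus a page_group block, ignore_hosts comes back as /var/www/unnoc/etc/ignore_hosts, while the lines inside the page_group block are skipped because they belong to an array.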
wwwroot - (normal variable)
     wwwroot is the URL path at which webroot is served (i.e. the
     webroot_path in unnoc.domain.com/webroot_path). It is used as a
     relative path in HREF links. If you create a VirtualHost in your
     Apache config and set $webroot as the DocumentRoot, then you
     would set this to '/'.

rrd_enable - (normal variable)
     Value is an integer, 0 or 1. A value of 1 enables RRD data
     collection in check-stats.pl and also lets you view the graphs
     on the status page of any server.

rrdtool_bin - (normal variable)
     Value is a string; it should be the full path to the rrdtool
     binary.

mail_host - (normal variable)
     mail_host is the SMTP relay machine used to send emails out. It
     currently cannot authenticate with a username and password; it
     only supports plain SMTP.

ignore - (normal variable)
     Value is a number, 1 or 0. Set this to 1 to ignore a host
     completely: it will not alert, monitor, or graph.

alert - (normal variable)
     Value is a number, 1 or 0. Set this to 1 to enable alerts. Set
     this to 0 to disable all outgoing emails.

alert_blackout
     Value is a string with the following syntax (the "+" is
     optional):

        HH:MM - +HH:MM

     Times use the 24-hour clock. Valid hours are 0-23, valid
     minutes are 0-59.

     This is both a per host directive and a global directive. If
     both exist, then both are checked (neither overrides the other).

     This string tells unnoc not to send any alerts out (either pages
     or emails) during a specific window. This is very handy for a
     few reasons:

        a. if you have a server that has a hard time coping with lots
           of disk activity and performs some routine maintenance
           every day that makes the load average spike at the same
           time every day.
        b. if you have a server that has a hard time coping with some
           sort of backup procedure that happens at the same time
           every day.
        c. if you have any server or device that will always go
           down/up at the same time every day.

     If any of the above circumstances apply and you configure this
     option, unnoc will not alert you at any time during this window.

     The syntax is a time range separated by a dash "-"; if the end
     time is on the next day, then place a plus sign "+" in front of
     it. Examples:

        ## this will silence all alerts from 9:05PM to 10:05PM
        alert_blackout = 21:05 - 22:05

        ## this will silence all alerts from 11:50PM to 2:00AM
        ## the next day, note the "+" sign indicating that it
        ## should go to the next day
        alert_blackout = 23:50 - +02:00

        ## this will silence all alerts from 12:05PM all the way
        ## through to 11:05AM the next day. That means that the
        ## only time that unnoc is allowed to send alerts out is
        ## from 11:05AM to 12:05PM. Why you would want to do
        ## this is beyond me, but if you did, that's how you
        ## would do it
        alert_blackout = 12:05 - +11:05

     As noted above, this is also a global setting. If there is a
     global setting, *both* windows are honored. For instance, if you
     have a host "server1" with a per host setting of 01:00 - 02:00,
     and you also have a global setting of 23:00 - 0:00, then during
     the time window of 11:00PM to midnight you would receive no
     alerts, and in addition server1 would silence all alerts from
     1:00AM to 2:00AM.

     WARNING: all alerts will be silenced during this time window.
     That means that if a host has some serious problem, or something
     happens that's not supposed to happen during this window, unnoc
     will NOT notify anyone of it. USE THIS WITH CAUTION. Try to keep
     the window as small as possible.

alert_with_description - (normal variable)
     Value is a number, 1 or 0. Set this to 1 to use the description
     as the identifier when sending alerts. To disable, set it to 0.
     Example:

        host {
           hostname = server1
           description = Main DB Server ($hostname)
           type = ms-server
           ..
        }

     For all alerts, it will look like this:

        "Alert! Main DB Server (server1) is DOWN!"
     If alert_with_description is disabled, it will look like this:

        "Alert! server1 is DOWN!"

     alert_with_description is enabled by default.

email_from - (normal variable)
     email_from is the From: address that all alerts are sent from.

email_group - (array)
     email_group defines the groups that alerts are sent to. It is
     associated with the host directive "alert_group". If you want
     paging enabled, then there should be a corresponding page_group
     group as well.

     It has two subvariables: group and recipient
        group should be one word or number
        recipient can be a comma separated list of email addresses

daemon_user - (normal variable)
     The user that the unnoc daemon will run as.

daemon_group - (normal variable)
     The group that the unnoc daemon will run as.

daemon_concurrent - (normal variable)
     The number of hosts that will be checked concurrently. This is
     the maximum number of forks that unnocd will do at any given
     time, and it should be tuned for each installation. Default is
     10.

daemon_alert_on_die - (normal variable)
     If this variable is set, then an alert will be sent out to the
     $daemon_alert_group stating that the unnoc daemon has died.

     NOTE: this is NOT the same as starting/stopping the unnocd
     daemon. These alerts are raised only if something goes badly
     wrong with the daemon and it dies unexpectedly. It is not common
     for it to die, but if it does, then unnoc is no longer running.

     The default is 0.

daemon_alert_group - (normal variable)
     The alert group that will receive daemon alerts (only when the
     daemon goes down) if daemon_alert_on_die is set.

email_to - (array) [ deprecated ]
     See email_group. email_to is a list of email addresses that are
     sent unnoc alerts; if this is specified, then it will be the
     default email_group.

page - (normal variable)
     Set this variable if you wish to receive pages; if you do not
     wish to receive pages, set this to 0.
     If you're temporarily working on a server or a device and don't
     mind the emails, but don't want to be charged for the SMS
     alerts, then set this to 0.

     This started out as a text file; as of 1.0.1 it moved to the
     variable 'nopage', and then in 1.0.6 it was changed to 'page'.

page_group - (array)
     This variable has two subvariables: group and recipient
        group should be a name or a number that is associated with an
           email_group. A host uses the "alert_group" directive to
           determine whether or not it notifies a particular
           individual. group can be only one word.
        recipient should be a comma separated list of email addresses

page_to - (array) [ deprecated ]
     See the page_group variable. This variable is a list of email
     addresses, one per line, that determines who gets paged for all
     ups/downs.

red, green, yellow, grey, white - (normal variables)
     These are the links to the images that are used on the main
     page. The defaults are fine unless you're using your own
     pictures.

db_user, db_pass, db_host - (normal variables)
     Database information: the username, password and hostname used
     to connect to the MySQL database. You can safely leave all of
     the database variables alone, even if you're not using a
     particular plugin.

description - (normal variable)
     Valid value is a string; quotes are not required. This is the
     title description of the unnoc installation on this host. It is
     the title displayed on the main index.php page. This can also be
     set on a per host basis.

fuzz - (normal variable)
     This setting is the number of times that a host with an alert is
     checked before an alert is raised. This is usually set on a per
     host basis; however, if you set it globally, then it will apply
     to all servers (unless overridden in the host definition). An
     example:

        fuzz = 3

     If unnoc connects to a server and gets no response, it will try
     3 more times, and if it still doesn't receive a response on the
     4th try, then it will notify you.
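Read literally, the fuzz = 3 example means one initial check plus 3 retries, with the alert going out on the 4th consecutive failure. The Python sketch below models that counting rule for a single host. The class name FuzzCounter, resetting the count after any successful check, and raising the alert only once per outage are assumptions for illustration, not unnoc's actual implementation.

```python
class FuzzCounter:
    """Hypothetical model of the fuzz rule for a single host.

    With fuzz = N, one initial check plus N retries must all fail
    (N + 1 consecutive failures) before an alert is raised.
    """

    def __init__(self, fuzz):
        self.fuzz = fuzz
        self.failures = 0  # consecutive failed checks so far

    def record(self, responded):
        """Record one check result; return True if an alert is due now."""
        if responded:
            self.failures = 0  # assumption: any response resets the count
            return False
        self.failures += 1
        # Alert exactly once, when the threshold is first reached.
        return self.failures == self.fuzz + 1
```

Under this reading, with fuzz = 2 the alert would go out on the third consecutive failed check.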
     Setting fuzz can be useful in cutting down false alarms, because
     a bogged-down server (depending on its hardware) might not
     respond in the particular second that you checked it. A good
     value of fuzz is 2, depending on the latency to the server. This
     can also be set on a per host basis.

warn_disk_space - (normal variable)
     A percentage of disk usage (not disk availability) that is
     considered a warning. unnoc will not send alerts for these; it
     will color the dot yellow on the main page. Actual alert
     thresholds are configured on each host in its snmpd.conf file.
     This can also be set on a per host basis.

warn_load_avg - (normal variable)
     A load average that is considered a warning. Red alerts are set
     on each host in its snmpd.conf file. This can also be set on a
     per host basis.

load_alert - (normal variable)
     One of three values: Load-1, Load-5, or Load-15.
     This determines which load averages you will get warnings on.
     1 and 5 minute load averages spike often, but if you have a high
     15 minute load average, then chances are something is wrong.
        If you set this to Load-1, then you will get notified of all
           load average spikes.
        If you set this to Load-5, then you will get notified of both
           Load-5 and Load-15.
        If you set this to Load-15, then you will get notified of
           only Load-15.
     This can also be set on a per host basis.

uptime - (normal variable)
     This is for the Uptime plugin. It determines how the uptime
     displayed on the main index.php is calculated. Valid values are
     mean, max and min.

temp_warn - (normal variable)
     Global temperature warning setting for all temperature-enabled
     devices. This is the temperature at which you will receive an
     alert saying that it's too hot. This can also be set on a per
     host basis.

apc_load_warn - (normal variable)
     For all UPSes. If you're using an APC, you can receive load
     alerts if the load becomes too high.
     I generally set the load alert on the APC itself to a number a
     bit higher than this, so it's kind of like two lines of defense.
     Or, if you have an APC that doesn't have alert access, you can
     use this as the only line of defense.

interval - (normal variable)
     The global interval, in minutes, at which each host is checked
     by unnoc-cron.pl. If this value is set to 2, for instance, then
     each host will be checked every 2 minutes. This can also be set
     on a per host basis.

interface_reset - (normal variable)
     For the cron wrapper, unnoc-cron.pl; see README.cron_wrapper.
     Value is a time, in the format hour:minute. Commenting it out
     disables the reset. Examples:

        interface_reset = 03:30
        interface_reset = 23:59

     This is for network interfaces: it is the time of day at which
     the peak in/out values of all network interfaces in unnoc are
     reset. The plugin was initially designed to keep peak in/out
     values and reset them every 24 hours.

ss_interval - (normal variable)
     This is for the System Status plugin; it sets how often the
     System Status plugin is run from unnoc-cron.pl. If you want
     status updates quickly, a value of 1 is recommended here.

ss_username - (normal variable)
     Part of the System Status plugin. This is the POP3 username that
     unnoc will use to check for system status messages. It must be a
     valid account. See README.system-status for more information.

ss_password - (normal variable)
     Part of the System Status plugin. This is the password of the
     POP3 account; it goes with ss_username.

ss_pop_server - (normal variable)
     Part of the System Status plugin. This is the POP3 server that
     is used to check for system status messages.

ss_valid_emails - (array)
     Part of the System Status plugin. List all of the email
     addresses that are allowed to check the status of unnoc.


b. host directive variables

host - (array)
     This is how you specify which hosts unnoc should look at. There
     are common variables, and there are variables that are specific
     to various plugins.
     The syntax is of the following format:

        variable = value

     just like normal configuration variables, except that they are
     within the host {} braces:

        host {
           variable = value
        }

     The following keywords are common to all device types:

     hostname
        The hostname of the server. Can be a FQDN, a short name or an
        IP address. Multiple values should be separated by a comma;
        if you specify multiple hostnames, then they all share the
        same settings.

     community
        The SNMP community of the device.

     snmp_port
        Alternate SNMP port.
        The default value is 161.

     snmp_version
        Alternate SNMP version.
        Valid values are 1 and 2.
        The default value is 1.

     type
        The type of device, one of the following:
           server, ms-server, generic, apc, apc-pdu, netapp, aironet,
           airport, em01, vcms, esx, cisco-generic

     description
        The description of the device. If you write out a string
        here, it will be displayed that way on the front index page.
        There are two special values for description:
           snmp      will pull the description from SNMP
           hostname  will display the hostname
        If description is not specified, then it defaults to the
        hostname.

     fuzz
        The fuzz value for the host: unnoc will try this many times
        to contact a server before it considers it down.
        The default value is the global value of fuzz.

     interval
        The interval, in minutes, at which to check the host: unnoc
        will check the host every x minutes.
        The default value is the global setting of interval.

     updown
        The up/down checker: the method used to determine whether or
        not a host is up or down.
        Valid values are: ping, snmp, none.
        If you specify ping, then the host will be pinged to see
           whether it is up or down.
        If you specify snmp, then the host will be checked via SNMP
           to see whether it is up or down.
        If you specify none, then unnoc will try once to collect the
           host's uptime; if that fails, it simply leaves the host
           untouched, and if it succeeds, it continues to poll the
           rest of the device.
        This was added because some flaky hardware cannot respond to
        SNMP when it is under heavy load, even though its network
        interface works just fine. On those servers, ping is
        generally a better indicator of whether the server is up or
        down.
        The default value is snmp.

     alert
        Whether or not an alert will be raised for any part of this
        host. If this value is set to 0, then no alerts will be
        raised at all (the host is strictly observed).
        Valid values are 0 and 1.
        The default value is 1.

     alert_group
        This variable is associated with an email_group and a
        page_group. It lets you assign different servers to different
        administrators, so that not everybody gets paged for
        everything. If no alert_group is specified, then it defaults
        to the email_group and page_group "default"; if there is no
        email_group or page_group defined, then it falls back to the
        email_to and page_to variables.
        Valid values are any name/number corresponding to the
        email_group and page_group groups listed globally.
        Multiple alert_groups should be separated by a comma.
        Please note that if a group is specified here, then it will
        ONLY notify that group; it will not also notify the default
        administrator. If you want to notify the default group as
        well, then include the default group in the list.

     graph
        Whether or not RRD should be run against the host. If this
        value is set to 0, then there will be no graphing of any
        kind.
        Valid values are 0 and 1.
        The default value is 1.

     group
        This is for displaying on the main index.php page.
        You can put hosts into groups and use them as bundles to
        print. All hosts that have the same group value are
        considered part of that group.
        Valid value is a positive integer.
        There is no default value.

Type specific options for Netapps

     disk
        This directive is new in 1.0.7. It allows you to set
        per-partition limits, which is very handy if your version of
        ONTAP does not support this. Inside the disk directive there
        are three possible values: partition, warn_usage and
        high_usage.
           partition - the partition name as found by SNMP (note
              whether yours includes a trailing slash)
           warn_usage - either a percentage or a kilobyte value at
              which a warning is set. For example, if this is set to
              75%, then a warning is set when disk free space drops
              below 25%.
           high_usage - either a percentage or a kilobyte value at
              which an alert is set. For example, if this is set to
              95%, then an alert is set when disk free space is 5% or
              less.
        An example:

           host {
              hostname = netapp
              community = ...
              disk {
                 partition = /vol/nfs/
                 warn_usage = 75%
                 high_usage = 85%
              }
              disk {
                 partition = /vol/root/
                 warn_usage = 800000
                 high_usage = 950000
              }
           }

        This is only valid for Netapps.

Type specific options for Servers

     warn_disk_space
        The percentage of disk usage (not disk availability) at which
        a particular server's disk is considered full enough to raise
        a warning. This should be a few steps below the hard value
        specified in the snmpd.conf file for a disk that is
        considered full.
        Valid values are integers between 1 and 100.
        The default value is the global warn_disk_space setting.

     load_alert
        The load average level that alerts are raised on. There are
        three different load averages: 1 minute, 5 minute and 15
        minute. If you specify the 1 minute load average, then alerts
        will be raised on all load averages.
        If you specify the 5 minute load average, then alerts will be
        raised on both 5 and 15 minute loads. If you specify 15, then
        alerts will be raised only on the 15 minute load average.
        This really should be a per server setting, since each server
        is different and each will behave differently when something
        is wrong.
        Valid values are: Load-1, Load-5, and Load-15.
        The default value is the global load_alert setting.

     interfaces
        This tells unnoc which network interfaces to pay attention
        to. By default unnoc grabs all network interfaces on the
        server. However, if you have a lot of interfaces that you
        don't care about, you can specify interface numbers here, and
        all others will be ignored.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to read in all interfaces.
        The opposite of this option is interfaces_ignore.

     interfaces_ignore
        This tells unnoc which network interfaces not to pay
        attention to. Any interface number specified here will be
        ignored entirely and will not be put into the database. This
        is useful for ignoring only a few interfaces instead of
        explicitly listing just the ones that you want.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to ignore none.
        The opposite of this option is interfaces.

     interfaces_reverse
        This tells unnoc to reverse the in/out octets on the
        particular interface. This is handy if, for instance, you
        have a simple router on which you can only monitor the
        internal interface, so it graphs what it SENDS to the network
        and RECEIVES from the internet, which is usually the reverse
        of what you would logically expect. This option reverses the
        values read from SNMP. Usually you shouldn't have to specify
        this option.
        The default is to not reverse the interfaces.

     interface_columns
        The number of columns in which the interface graphs are
        displayed on the network interfaces page.
        Valid value is an integer.
        The default value is 1.

     show_down_interfaces
        If this is set, unnoc will include interfaces (ports) that
        are operationally or administratively down.
        Valid value is 1 for enabled, 0 for disabled.
        The default value is 0.

Type specific values for Firewalls/Routers:

     interfaces
        This tells unnoc which network interfaces to pay attention
        to. By default unnoc grabs all network interfaces on the
        device. However, routers and firewalls often have interfaces
        that you don't care about. You can specify interfaces here to
        pay attention only to these; all others will be ignored.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to read in all interfaces.

     interfaces_ignore
        This tells unnoc which network interfaces not to pay
        attention to. Any interface number specified here will be
        ignored entirely and will not be put into the database. This
        is useful for ignoring only a few interfaces instead of
        explicitly listing just the ones that you want.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to ignore none.
        The opposite of this option is interfaces.

     interfaces_reverse
        This tells unnoc to reverse the in/out octets on the
        particular interface. This is handy if, for instance, you
        have a simple router on which you can only monitor the
        internal interface, so it graphs what it SENDS to the network
        and RECEIVES from the internet, which is usually the reverse
        of what you would logically expect. This option reverses the
        values read from SNMP. Usually you shouldn't have to specify
        this option.
        The default is to not reverse the interfaces.

     interface_columns
        The number of columns in which the interface graphs are
        displayed on the network interfaces page.
        Valid value is an integer.
        The default value is 1.

     show_down_interfaces
        If this is set, unnoc will include interfaces (ports) that
        are operationally or administratively down.
        Valid value is 1 for enabled, 0 for disabled.
        The default value is 0.

     primary_interface
        This tells unnoc which interface to display on the host view
        page. Interface statistics (peak values, current values) are
        kept for every monitored interface; however, if you specify a
        value here, then that interface is the one displayed on the
        main page.
        Valid value is a single integer.
        There is no default for this.

Type specific values for Switches

     interfaces
        This tells unnoc which network interfaces to pay attention
        to. By default unnoc grabs all network interfaces on the
        switch.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to read in all interfaces.

     interfaces_reverse
        This tells unnoc to reverse the in/out octets on the
        particular interface. This is handy if, for instance, you
        have a simple router on which you can only monitor the
        internal interface, so it graphs what it SENDS to the network
        and RECEIVES from the internet, which is usually the reverse
        of what you would logically expect. This option reverses the
        values read from SNMP. Usually you shouldn't have to specify
        this option.
        The default is to not reverse the interfaces.

     interfaces_ignore
        This tells unnoc which network interfaces not to pay
        attention to. Any interface number specified here will be
        ignored entirely and will not be put into the database. This
        is useful for ignoring only a few interfaces instead of
        explicitly listing just the ones that you want.
        Multiple values should be separated by a comma.
        Valid values are integers.
        The default value is to ignore none.
        The opposite of this option is interfaces.

     interface_columns
        The number of columns in which the interface graphs are
        displayed on the network interfaces page.
        Valid value is an integer.
        The default value is 1.

     show_down_interfaces
        If this is set, unnoc will include interfaces (ports) that
        are operationally or administratively down.
        Valid value is 1 for enabled, 0 for disabled.
        The default value is 0.

Type specific values for an APC Smart-UPS

     ups_load_warn
        The maximum load percentage the UPS can reach before an alert
        is raised. APC Smart-UPS units with Web/SNMP cards also have
        configurable alerts; however, those alerts are quite
        annoying. For one, they send out multiple alerts, and they
        send them out in bulk. You can set this value a few notches
        below that alert level so that you have two lines of notice.
        Valid values are numbers between 1 and 100 (a percentage of
        load).
        The default value is the global ups_load_warn setting.

     temp_warn
        Temperature warning alerts on a per UPS basis. This is the
        temperature threshold that is considered too hot.
        Valid values are integers.
        The default value is the global temp_warn setting.

Type specific values for an EM01 Environment Sensor

     temp_warn
        Temperature warning alerts on a per sensor basis. This is the
        temperature threshold that is considered too hot.
        Valid values are integers.
        The default value is the global temp_warn setting.

Debugging variables

     print_email_alerts
        Prints all email alerts to STDOUT instead of emailing them.
        Good for debugging.
        Valid value is an integer, 0 or 1.


3. Explanation of the variables in graph_colors.conf

All of the variables in graph_colors.conf are normal variables. The
exact same syntax applies to graph_colors.conf as to unnoc.conf.

The variables in here are all for RRD graph colors. You can reuse a
graph color variable by placing a $color_ prefix in front of it. For
instance, to reuse disk_over_used, type:

   $color_disk_over_used

The color syntax is a 6 digit hex format; you can either include the
leading '#' or omit it, it does not matter.
example:

   disk_over_used = #FF0000
   disk_warning_used = 00FF00

Load Average graphs

   load_high
      Color for values graphed over the limit set in the SNMP
      configuration
   load_warning
      Color for values greater than $warn_load_avg set in the
      unnoc.conf file
   load_normal
      Color for values below the warning level (normal)
   load_high_line
      Color for the line drawn at the load_high threshold
   load_warning_line
      Color for the line drawn at the load_warning threshold

Processes graphs

   process_too_many
      Color for values graphed over the limit set in the SNMP
      configuration
   process_normal
      Color for values less than process_too_many (normal)
   process_too_few
      Color for values graphed under the limit set in the SNMP
      configuration
   process_over_line
      Color for the line drawn at the process_too_many threshold
      (the max)
   process_under_line
      Color for the line drawn at the process_too_few threshold
      (the min)

Disk graphs

   disk_over_used
      Color for values graphed over the limit set in the SNMP
      configuration
   disk_warning_used
      Color for values greater than $disk_warning set in the
      unnoc.conf file
   disk_normal_used
      Color for values less than the warning level (normal)
   disk_upper_limit_line
      Color for the line drawn at the very top of the graph, which
      marks the actual size of the whole disk
   disk_over_line
      Color for the line drawn at the disk_over_used threshold
   disk_warning_line
      Color for the line drawn at the disk_warning_used threshold

Network interface graphs

   interface_in_bytes
      Color for the incoming bytes (bytes received)
   interface_out_bytes
      Color for the outgoing bytes (bytes sent)
   interface_out_bytes_outline
      Color for the outline around the outgoing AREA graph (if you
      comment this out, no outline is drawn)

CPU graphs

   cpu_user
      Color for the User portion of the CPU graph
   cpu_system
      Color for the System portion of the CPU graph
   cpu_active
      Color for the active CPU load percentage
   cpu_active_trace
      Color for the outline around the cpu_active AREA graph

TCP connections graph

   tcp_connections
      Color for the TCP Connections AREA graph
   tcp_connections_trace
      Color for the outline around the tcp_connections AREA graph

Memory graphs

   memory_real_used
      Color for the memory usage of physical RAM
   memory_swap_used
      Color for the memory usage of swap space

OpenLDAP graphs (not implemented yet)

   ldap_current_connections
      Color for the current connections on the LDAP server
   ldap_total_connections
      Color for the total connections to the server
   ldap_total_bytes
      Color for the total bytes processed by the server
   ldap_pdu
      Color for the total Protocol Data Units processed
   ldap_total_operations
      Color for the total operations performed by the LDAP server
   ldap_total_add
      Color for the total add operations performed by the LDAP server
   ldap_total_bind
      Color for the total binds performed by the server
   ldap_total_compare
      Color for the total compares performed by the LDAP server
   ldap_total_delete
      Color for the total deletes performed by the LDAP server
   ldap_total_modify
      Color for the total modifies performed by the LDAP server
   ldap_total_unbind
      Color for the total unbinds performed by the LDAP server
   ldap_total_search
      Color for the total searches performed by the server

MySQL graphs (not implemented yet)

   mysql_bytes_received
      Color for the total bytes received (incoming)
   mysql_bytes_sent
      Color for the total bytes sent (outgoing)
   mysql_connections
      Color for the total number of connections
   mysql_max_used_conns
      Color for the maximum number of connections used concurrently
      by the server
   mysql_questions
      Color for the total number of queries received by the server
   mysql_com_delete
      Color for the total delete commands performed
   mysql_com_insert
      Color for the total insert commands performed
   mysql_com_select
      Color for the total select commands performed
   mysql_com_update
      Color for the total update commands performed
   mysql_key_blocks_used
      Color for the number of used blocks in the key cache
   mysql_key_read_requests
      Color for the number of requests to read a key from the cache
   mysql_key_write_requests
      Color for the number of requests to write a key to the cache
   mysql_key_writes
      Color for the number of physical writes of a key block to disk
   mysql_key_reads
      Color for the number of physical reads of a key block from disk

PostgreSQL graphs (not completely implemented)

   psql_numbackends
      Color for the number of concurrent server processes connected
   psql_numbackends_trace
      Color for the outline (trace) around the psql_numbackends graph
   psql_xact_commit
      Color for the number of transactions committed
   psql_xact_rollback
      Color for the number of transactions rolled back
   psql_blks_read
      Color for the number of disk blocks read
   psql_blks_hit
      Color for the number of cache hits (block reads that were
      served from cache instead of disk)

vim:tw=72:wm=1