2007-04-11 Jason Schoonover

This document is another explanation of the host syntax and the valid
variables for it. As of Unnoc 1.0.6, the configuration is on a per-host
basis.

1. Configuration

To add a host to unnoc, use the host directive in the unnoc.conf file.
The syntax is as follows:

  host {
    hostname = host-name
    type = host-type
    community = community-name
    snmp_version = version-number
    snmp_port = alternate-port-number
    option = value
    option2 = value2
    ..
  }

These are the common variables for all host definitions:

hostname
    hostname can be an FQDN, a short name or an IP address.

type
    type should be one of the device types that unnoc knows how to
    handle. Valid values are:

    server
        For Net-SNMP/UCD-SNMP devices that report load averages, number
        of processes, disk usage and standard memory usage; these are
        basically UNIX or Linux servers (not Microsoft Windows).
    ms-server
        For all Microsoft server/workstation products.
    aironet
        For Cisco Aironets; the SSID, VLAN, number of clients and each
        network interface will be monitored.
    airport
        For Apple AirPorts; the SSID, wireless clients and each network
        interface will be monitored.
    apc
        For APC UPSs. Currently only the Smart-UPS is supported; it will
        monitor load, voltage, temperature and battery status/capacity
        (with graphs for each). Both single- and three-phase units are
        supported.
    apc-pdu
        For APC Power Distribution Units; both single- and three-phase
        units are supported.
    em01
        For the EM01 websensor; it monitors temperature and humidity,
        with graphs for each.
    netapp
        For Network Appliance Filers.
    generic
        A generic type that monitors network interfaces only. This is
        the default if no type is specified.
    cisco-generic
        A generic type specific to Cisco products; it is identical to
        'generic' except that it also checks a few extra MIBs for CPU
        percent usage.
    vcms
        VMware VirtualCenter Management Server.
    esx
        VMware VI3 ESX (ESX 3.0 and higher).

    ---------------------------------------
    A note about network interface speeds:
    ---------------------------------------
    Some versions of snmpd will report the wrong speed for a particular
    interface, especially if you are using VLANs or bonding interfaces
    together. For Net-SNMP, you can correct this with the "interface"
    directive in snmpd.conf. For instance:

        interface eth0 6 100000000

    The above example tells snmpd that eth0 is of type
    ethernetCsmacd(6) (see IANAifType-MIB.txt for more types) and that
    its speed is 100 Mbps (100000000 bits per second); see the snmpd
    documentation for more information.

community
    The SNMP community. If this is not specified, unnoc will use 'ping'
    instead of SNMP. Also accepted as snmp_community.

snmp_port
    If specified, SNMP will be accessed on this alternate port instead
    of the default, 161.

snmp_version
    If specified, SNMP will use this version instead of the default,
    which is 1.

interval
    The interval, in minutes, at which this host is checked. The default
    is whatever the global interval value is set to.

updown
    The method by which the host is checked for being up or down. It is
    usually a safe bet to leave this option alone, but at times it might
    be best to specify a value of 'ping' here. This is how unnoc will
    initially try to contact the server: if it gets no response with
    this method, it will report that the host is 'not responding' and,
    depending on the fuzz value, will notify you. The default is to use
    SNMP.

fuzz
    The number of times that unnoc will tolerate a host being down
    before it tries to contact you. Under normal circumstances, this
    should be left at 1. A value of 1 means that unnoc will try once to
    contact the server; if that fails, it will notify you and raise an
    alert.
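    The fuzz behavior can be modelled as a counter of consecutive failed
    checks, one check every 'interval' minutes. The Python below is only
    a sketch under that assumption, not unnoc's own code: the FuzzTracker
    class and its record() method are hypothetical names, and the sketch
    alerts once, on the check that first reaches the fuzz count (whether
    unnoc repeats the notification for a host that stays down is not
    covered here).

```python
class FuzzTracker:
    """Hypothetical model of the per-host 'fuzz' setting.

    A host is checked once every 'interval' minutes; an alert is raised
    only after `fuzz` consecutive checks get no response.
    """

    def __init__(self, fuzz=1):
        if fuzz < 1:
            raise ValueError("fuzz must be at least 1")
        self.fuzz = fuzz
        self.failures = 0  # consecutive failed checks so far

    def record(self, responded):
        """Record one check result; return True if it should raise an alert."""
        if responded:
            self.failures = 0  # any response resets the count
            return False
        self.failures += 1
        # Assumption: alert once, on the check that reaches the threshold.
        return self.failures == self.fuzz


# fuzz = 2: the first missed check is tolerated, the second one alerts.
tracker = FuzzTracker(fuzz=2)
print([tracker.record(ok) for ok in (True, False, False, False)])
# [False, False, True, False]
```

    With fuzz = 1 the very first missed check alerts, which matches the
    recommended setting for reliable servers.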
    If you specify a value of 2 here, unnoc will try the first time; if
    that doesn't succeed, it will wait 'interval' minutes and try again,
    and if it fails again, it will notify you and raise an alert.

    Some flaky hardware or low-end servers can't handle a heavy load. If
    a server is under heavy load (as some Linux distributions are at
    6:10AM while they run updatedb), it sometimes won't respond right
    away. Setting a fuzz value makes unnoc a little more lenient. For
    high-end production servers, fuzz can usually be left at 1. The
    default value per server is whatever the global fuzz value is.

group
    The display group, used by print_href_group(). All hosts with the
    same 'group' value are considered to be in the same group.

proc
    MS servers only. This option allows processes to be monitored: unnoc
    will monitor the given processes against the criteria below and
    will alert if there is a problem. CPU, memory and number of
    processes are all graphed (if RRD graphs are enabled).

    Syntax:

      proc {
        name = process-name
        description = optional-description
        min = optional-minimum-number-of-processes
        max = optional-maximum-number-of-processes
      }

    If a description is given, it will be displayed instead of the
    actual process name.

    name
        The actual process name that will be running (just the
        executable name, without the path). Comma-separated values are
        accepted, and spaces are OK. NOTE: if you are using the optional
        description field, then the name is case-sensitive; otherwise,
        it will not match.

        Examples:
          name = explorer.exe
          name = services.exe, winlogon.exe, winvnc4.exe

    min
        The minimum number of processes that should be running. If the
        count drops below this level, a notification will be sent out
        (if alerts are configured) saying that the number of processes
        is too low. Set it to 0 to disable, or simply do not specify it.

        Example:
          min = 3

    max
        The maximum number of processes that should be running.
        If the count goes above this level, a notification will be sent
        out (if alerts are configured) saying that the number of
        processes is too high. Set it to a high number if you don't
        care, or simply do not specify it.

        Example:
          max = 10

    description
        An optional parameter. The given string will be displayed
        instead of the name of the process. You can also reference the
        other variables (for instance $name) inside description; the
        following is legal:

          name = sqlservr.exe, sqlmangr.exe, sqlhost.exe
          description = MS SQL Server ($name)

        Or you can just specify a description:

          description = VMware Tools

    Note that process names are CASE-SENSITIVE only if you are matching
    a description to a process. Otherwise, you can specify them in any
    case you wish.

    There is also an option, all_processes, specified outside the
    proc { } brackets. If it is set, all processes will be shown and all
    alerts will be disabled.

    Examples:

      proc {
        name = explorer.exe
        description = MS Explorer
        min = 1
        max = 10
      }

      proc {
        name = csrss.exe
        max = 10
      }

      proc {
        name = lmgrd.exe, VMWARELM.exe
        description = License Server ($name)
        min = 1
        max = 2
      }

      proc {
        name = explorer.exe, SNMP.EXE, WINLOGON.EXE
        description = Windows - $name
      }

    The following example will graph all processes, but will not alert
    at all:

      host {
        hostname = msserver1
        community = public
        type = ms-server
        updown = ping
        all_processes = 1
      }

alert
    Whether you should be alerted at all for this host. Set this to 0 if
    you never want to be alerted when things go wrong; the log file will
    still be updated.

alert_group
    This variable is associated with an email_group and a page_group. It
    lets you assign different servers to different administrators, so
    that not everybody gets paged for everything. If no alert_group is
    specified, it defaults to the email_group and page_group named
    "default"; if there is no email_group or page_group defined, it
    defaults to the email_to and page_to variables.
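    The fallback order above, together with the comma-separated list and
    "only the listed groups" rule stated in this entry, can be sketched
    in Python for email recipients; page_group and page_to would resolve
    the same way. This is a hypothetical illustration, not unnoc's code:
    the function name, the dictionary of groups, reading "no email_group
    defined" as "no group named default", and silently skipping unknown
    group names are all assumptions.

```python
def resolve_email_recipients(alert_group, email_groups, email_to):
    """Hypothetical sketch: which addresses get an email alert for a host.

    alert_group  -- the host's alert_group value, e.g. "web, dba", or None
    email_groups -- globally defined email_group name -> list of addresses
    email_to     -- the global email_to addresses (last-resort fallback)
    """
    if not alert_group:
        # No alert_group: use the "default" email_group if it exists,
        # otherwise fall back to email_to.
        return list(email_groups.get("default", email_to))

    recipients = []
    for name in alert_group.split(","):
        name = name.strip()
        # Only the listed groups are notified; "default" is not added
        # implicitly. Unknown names are skipped (an assumption).
        for address in email_groups.get(name, []):
            if address not in recipients:
                recipients.append(address)
    return recipients


groups = {
    "default": ["admin@example.com"],
    "dba": ["dba@example.com"],
}
print(resolve_email_recipients(None, groups, ["root@example.com"]))
# ['admin@example.com']
print(resolve_email_recipients("dba", groups, ["root@example.com"]))
# ['dba@example.com']  (the default administrator is NOT included)
print(resolve_email_recipients("dba, default", groups, ["root@example.com"]))
# ['dba@example.com', 'admin@example.com']
```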
    Valid values are any name/number corresponding to the email_group
    and page_group entries listed globally. Multiple alert_groups should
    be separated by commas.

    Please note that if a group is specified here, unnoc will ONLY
    notify that group; it will not also notify the default
    administrator. If you want the default group notified as well,
    include the default group in the list.

alert_blackout
    A time window during which unnoc will not send out any alerts
    (either pages or emails). This is very handy if:

      a. a server has a hard time coping with lots of disk activity and
         performs routine maintenance every day that makes the load
         average spike at the same time each day;
      b. a server has a hard time coping with a backup procedure that
         happens at the same time every day; or
      c. any server or device always goes down/up at the same time
         every day.

    If any of these circumstances applies and you configure this option,
    unnoc will not alert you at any time during the window.

    The syntax is a time range separated by a dash "-"; if the end time
    is on the next day, place a plus sign "+" in front of it. Times use
    24-hour notation: valid hours are 0-23 and valid minutes are 0-59.

    Examples:

      ## this will silence all alerts from 9:05PM to 10:05PM
      alert_blackout = 21:05 - 22:05

      ## this will silence all alerts from 11:50PM to 2:00AM
      ## the next day; note the "+" sign indicating that it
      ## should go to the next day
      alert_blackout = 23:50 - +02:00

      ## this will silence all alerts from 12:05PM all the way
      ## through to 11:05AM the next day. That means that the
      ## only time that unnoc is allowed to send alerts out is
      ## from 11:05AM to 12:05PM. Why you would want to do
      ## this is beyond me, but if you did, that's how you
      ## would do it
      alert_blackout = 12:05 - +11:05

    This is also a global setting.
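    The window syntax can be sketched as a small Python check that takes
    the current time as minutes since midnight. This is an illustration
    under stated assumptions, not unnoc's parser: the function names are
    hypothetical, both endpoints are treated as inclusive, and an end
    time earlier than the start time without a "+" is leniently treated
    as crossing midnight too (so a window like 23:00 - 0:00 works).
    Because per-host and global windows are both counted, the last
    helper silences alerts if any window matches.

```python
def to_minutes(hhmm):
    """Convert a 24-hour 'H:MM' or 'HH:MM' string to minutes since midnight."""
    hours, minutes = (int(part) for part in hhmm.split(":"))
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"invalid time: {hhmm!r}")
    return hours * 60 + minutes


def in_blackout(spec, now):
    """True if `now` (minutes since midnight) falls inside an
    alert_blackout window such as "21:05 - 22:05" or "23:50 - +02:00"."""
    start_text, end_text = (part.strip() for part in spec.split("-"))
    next_day = end_text.startswith("+")
    start = to_minutes(start_text)
    end = to_minutes(end_text.lstrip("+"))
    if next_day or end < start:
        # Window crosses midnight: start..23:59, then 00:00..end.
        return now >= start or now <= end
    return start <= now <= end


def alerts_silenced(now, *specs):
    """Per-host and global windows both count: silenced if any matches."""
    return any(in_blackout(spec, now) for spec in specs if spec)


print(in_blackout("23:50 - +02:00", 1 * 60 + 30))   # True  (1:30AM)
print(in_blackout("12:05 - +11:05", 11 * 60 + 30))  # False (11:30AM)
```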
    In the case that there is a global setting, *both* windows are
    counted. For instance, if host "server1" has a per-host setting of
    01:00 - 02:00 and there is also a global setting of 23:00 - 0:00,
    then during the window from 11:00PM to midnight you would receive no
    alerts, and server1, in addition to the global window, would have
    all alerts silenced from 1:00AM to 2:00AM.

    WARNING: all alerts will be silenced during this time window. If a
    host has a serious problem, or something happens that's not supposed
    to happen during this window, unnoc will NOT notify anyone. USE THIS
    WITH CAUTION, and keep the window as small as possible.

graph
    Whether RRD graphs should be generated for this host. Set to 0 to
    disable graphing.

snmp_exec
    Experimental. This special directive tells unnoc to check the SNMP
    MIB 1.3.6.1.4.1.2021.8 for an external program. It isn't really used
    right now; it is mainly intended for the future.

disktype
    This is autodetected; it is only necessary for stubborn hosts that
    refuse to display their disks properly. It takes one of two values:

      ucd   Uses the UCD disk MIB table (1.3.6.1.4.1.2021.9.1)
      hr    Uses the Host Resources MIB table (1.3.6.1.2.1.25.2.3.1)

    Which one works really depends on the device. If there are no UCD
    disks configured on the host, unnoc will choose HR automatically.
    Some device manufacturers only partially configure the UCD portion,
    so that paths do not show up correctly; in that case you can
    override the table with this value.

See the specific type documentation for other variables.
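The disktype selection described above can be sketched in Python. This
is a hypothetical illustration, not unnoc's code: the function name is
invented, and the sketch assumes the number of rows the host returned
from the UCD disk table is already known. An explicit value always wins
over autodetection, because a partially configured UCD table still has
rows and would otherwise be picked automatically.

```python
UCD_DISK_TABLE = "1.3.6.1.4.1.2021.9.1"      # UCD-SNMP-MIB dskEntry
HR_STORAGE_TABLE = "1.3.6.1.2.1.25.2.3.1"    # HOST-RESOURCES-MIB hrStorageEntry


def choose_disk_table(disktype, ucd_disk_count):
    """Hypothetical: pick the SNMP table used for a host's disk usage.

    disktype       -- the host's disktype value ("ucd", "hr") or None
    ucd_disk_count -- number of rows the host returned from the UCD table
    """
    if disktype == "ucd":
        return UCD_DISK_TABLE
    if disktype == "hr":
        return HR_STORAGE_TABLE
    if disktype is None:
        # Autodetect: fall back to HR when no UCD disks are configured.
        return UCD_DISK_TABLE if ucd_disk_count > 0 else HR_STORAGE_TABLE
    raise ValueError(f"disktype must be 'ucd' or 'hr', not {disktype!r}")


print(choose_disk_table(None, 0))  # 1.3.6.1.2.1.25.2.3.1 (no UCD disks)
print(choose_disk_table("hr", 3))  # 1.3.6.1.2.1.25.2.3.1 (explicit override)
```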