Open Source Management Options

1. Figure 58: OpenNMS Asset information page, including Node ID. The snmpStorageFlag parameter in the snmp-collection stanza of datacollection-config.xml defines for which interfaces of a device data will be stored. Possible values are:
   - all: the old default
   - primary: the primary SNMP interface
   - select: collect from all IP interfaces; the Admin GUI can be used to select additional non-IP interfaces to collect data from (the new default since OpenNMS 1.1.0)
   [Screenshot: OpenNMS Admin, Select SNMP Interfaces page] Choose SNMP Interfaces for Data: Listed below are all the interfaces discovered for the selected node. If snmpStorageFlag is set to "select" for a collection scheme that includes the interface marked as "Primary", only the interfaces checked below will have their collected SNMP data stored. This has no effect if snmpStorageFlag is set to "primary" or "all". In order to change what interfaces are scheduled for collection, simply check or uncheck the box beside th
2. [Screenshot residue: Events list row. Severity Normal; 09/07/08 23:54:08; interface 127.0.0.2; UEI uei.opennms.org/generic/traps/EnterpriseDefault; log message: Received unformatted enterprise event (enterprise .1.3.6.1.4.1.123, generic 6, specific 1234). 1 args: .1.3.6.1.4.1.123.1234 = "bad news 1"] Figure 44: OpenNMS Unknown trap appears in the Events list. Clicking on the event ID gives the detail of the event, which shows all the information that arrived with the TRAP. [Screenshot: Event Detail. Event 158730; Severity Normal; Time 7/9/08 11:54:08 PM; UEI uei.opennms.org/generic/traps/EnterpriseDefault; Log Message: Received unformatted enterprise event (enterprise .1.3.6.1.4.1.123, generic 6, specific 1234). 1 args: .1.3.6.1.4.1.123.1234 = "bad news 1"] Description: This is the default event format used when an enterprise-specific event (trap) is received for which no format has been configured (i.e. no event definition exists). The total number of arguments received
3. [Screenshot: zCollectorPlugins for /Server/Windows (drag to change order): zenoss.snmp.NewDeviceMap, zenoss.snmp.DeviceMap, zenoss.snmp.DellDeviceMap, zenoss.snmp.HPDeviceMap, zenoss.snmp.InterfaceMap, zenoss.snmp.RouteMap, zenoss.snmp.IpServiceMap, zenoss.snmp.HRFileSystemMap, zenoss.snmp.HRSWInstalledMap, zenoss.snmp.HRSWRunMap, zenoss.snmp.CpuMap, zenoss.snmp.DellCPUMap, zenoss.snmp.DellPCIMap, zenoss.snmp.HPCPUMap, zenoss.snmp.InformantHardDiskMap, zenoss.wmi.WinServiceMap] Figure 98: Zenoss default plugins for class /Devices/Server/Windows
   - zenoss.snmp.InterfaceMap: uses SNMP to query for interface info
   - zenoss.snmp.IpServiceMap: zenstatus daemon queries TCP/UDP port info
   - zenoss.snmp.HRSWRunMap: uses SNMP to get process info from the Host Resources MIB
   - zenoss.wmi.WinServiceMap: zenwin daemon uses WMI to query for Windows services
   One way to find what plugins are applied by default to device classes is to inspect the migration script supplied in /usr/local/zenoss/zenoss/Products/ZenModeler/migrate/zCollectorPlugins.py. To see what plugins are active on a specific device, use the device's main page menu and select the More menu to find the Collector Plugins menu.
4. Figure 82: Zenoss zProperties for the Device class, part 1. [Screenshot: zProperties list, values and types not recoverable] zIcon, zIfDescription, zInterfaceMapIgnoreNames, zInterfaceMapIgnoreTypes, zIpServiceMapMaxPort, zKeyPath, zLinks, zLocalInterfaceNames, zLocalIpAddresses, zMaxOIDPerRequest, zPingInterfaceDescription, zPingInterfaceName, zPingMonitorIgnore, zProdStateThreshold, zPythonClass, zRouteMapCollectOnlyIndirect, zRouteMapCollectOnlyLocal, zSnmpAuthPassword, zSnmpAuthType, zSnmpCommunities, zSnmpCommunity, zSnmpMonitorIgnore, zSnmpPort, zSnmpPrivPassword, zSnmpPrivType, zSnmpSecurityName, zSnmpTimeout, zSnmpTries, zSnmpVer, zStatusConnectTimeout, zSysedgeDiskMapIgnoreNames, zTelnetEnable. Figure 83: Zenoss zProperties for the Device class, part 2. [Screenshot: zProperties list, continued] zWinEventlog (boolean), zWinEventlogMinSeverity (int), zWinPassword (string), zWinUser (string), zWmi
5. </threshd-configuration> Figure 72: OpenNMS Modified threshd-configuration.xml. Different filters are applied to each package. The thresholding-group parameter is required here, and its value points to a matching definition in thresholds.xml, where the MIBs to threshold and the threshold values are specified. [Screenshot: thresholds.xml]
       <?xml version="1.0"?>
       <thresholding-config>
         <group name="CC-snmp" rrdRepository="/opt/opennms/share/rrd/snmp">
           <threshold type="high" ds-name="avgBusy5" ds-type="node" value="5" rearm="4" trigger="2"/>
           <threshold type="low" ds-name="freeMem" ds-type="node" value="1024" rearm="1000000" trigger="3"/>
         </group>
         <!-- Note that rearm and trigger are ignored for relativeChange thresholds; these check for 5% increase -->
         <group name="raddle-snmp" rrdRepository="/opt/opennms/share/rrd/snmp">
           <threshold type="relativeChange" ds-name="ifInOctets" ds-type="if" value="1.05" rearm="50" trigger="3"/>
           <threshold type="relativeChange" ds-name="ifOutOctets" ds-type="if" value="1.05" rearm="1000000" trigger="3"/>
         </group>
         <group name="default-snmp" rrdRepository="/opt/opennms/share/rrd/snmp">
           <threshold type="high" ds-name="avgBusy5" ds-type="node" value="90" rearm="50" trigger="3"/>
           <threshold type="low" ds-name="freeMem" ds-type="node" value="1024" rearm="1000000" trigger="3"/>
6. [Screenshot residue: Events list rows]
   - (ID and time cut off) 10.0.0.97, SNMP, uei.opennms.org/nodes/dataCollectionFailed: SNMP data collection on interface 10.0.0.97 failed.
   - 151278, Major, 09/07/08 08:59:37, hp7410.skills-1st.co.uk, uei.opennms.org/nodes/nodeDown: Node hp7410.skills-1st.co.uk is down.
   - 151197, Normal, 09/07/08 08:48:27, group-100-s2.class.example.org, 172.31.100.21, SNMP, uei.opennms.org/nodes/dataCollectionSucceeded: SNMP data collection on interface 172.31.100.21 previously failed and has been restored.
   - 151180, Normal, 09/07/08 08:46:17, deodar.skills-1st.co.uk, uei.opennms.org/internal/capsd/rescanCompleted: A services scan has been completed on this node.
   - 151163, Normal, 09/07/08 08:44:59, switch.skills-1st.co.uk, uei.opennms.org/internal/capsd/rescanCompleted: A services scan has been completed on this node.
   Figure 39: OpenNMS display of All events. The column headers can be clicked on to use as sort keys (ascending / descending). The Ack box can be ti
7. [Screenshot: OpenNMS wiki page "Event parameters", http://www.opennms.org/index.php/Event_parameters] Event parameters: Event parameters are used in the event configuration XML and notifications.xml files. The parameters are parsed as tokens delimited with percent (%) signs. This is the current list of valid parameters (someone should better define these):
   - %eventid%: The Event ID (xml tag)
   - %uei%: The UEI (xml tag)
   - %source%: The event source (xml tag)
   - %
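The percent-delimited token scheme described above can be illustrated with a small sketch. This is not OpenNMS code: the `expand` helper, the sample event dictionary and the `trapd` source value are hypothetical, and only the three token names listed in the excerpt are used.

```python
import re

# Hypothetical event record; keys mirror the token names listed in the excerpt.
SAMPLE_EVENT = {
    "eventid": "158730",
    "uei": "uei.opennms.org/generic/traps/EnterpriseDefault",
    "source": "trapd",  # assumed value, for illustration only
}

# A token is a run of letters enclosed in percent signs, e.g. %uei%.
TOKEN = re.compile(r"%([A-Za-z]+)%")


def expand(template, event):
    """Replace each %name% token with the matching event field.

    Tokens with no matching field are left in place unchanged.
    """
    def substitute(match):
        return str(event.get(match.group(1), match.group(0)))

    return TOKEN.sub(substitute, template)


print(expand("Event %eventid% (%uei%) from %source%", SAMPLE_EVENT))
```

Leaving unknown tokens untouched is a design choice of this sketch; how OpenNMS itself treats an unrecognised token is not stated in the excerpt.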
8. Cacti now has support for SNMP V3. For high-performance polling, Spine (formerly cactid) can replace the base cmd.php polling engine. The user manual suggests that Spine could support polling intervals of less than 60 seconds for at least 20,000 data sources. Cacti is supported on both Unix and Windows platforms. Get the Cacti User Manual from http://www.cacti.net/downloads/docs/pdf/manual.pdf. Cacti has a very active user forum with hundreds of posts per month. There is also a documented release roadmap going forward to the 2nd quarter of 2009. Here are a few screenshots of Cacti to give a feel for the product. [Screenshot: Cacti console, Devices list, showing rows 1 to 7 of 7, logged in as admin]
9. initial impressions and a comparison of strengths and weaknesses. Subsequent documents will investigate Nagios, OpenNMS and Zenoss in more detail. 5 A quick look at Cacti, The Dude and netdisco. Cacti, The Dude and netdisco do not meet my mandatory requirements; however, they are interesting niche solutions that were investigated during the tools evaluation process. Cacti and netdisco were installed; The Dude was only researched on the Internet. 5.1 Cacti. Cacti is a niche tool for collecting, storing and displaying performance data. It is a comprehensive frontend to RRDTool, including the concept of user management. Although the default method of data collection is SNMP, other data collectors, typically scripts, are possible. Data collection is very configurable and is driven by the Cacti Poller process, which is called periodically by the operating system scheduler (cron for Unix). The default polling interval is 5 minutes. Devices need to be manually added using the Cacti web-based GUI. Basic information such as hostname, SNMP parameters and device type should be supplied. Depending on the device type selected (e.g. ucd/net SNMP Host, Cisco Router), one or more default graph templates can be associated with a device, along with one or more default SNMP data queries. In addition to the web-based GUI, configuration of Cacti can be done from the command line using PHP, which is a general-purpose scripting language especially suited for web development
10. [Screenshot: threshd-configuration.xml]
       <?castor class-name="org.opennms.netmgt.threshd.ThreshdConfiguration"?>
       <threshd-configuration threads="5">
         <package name="example1">
           <filter>IPADDR != '0.0.0.0'</filter>
           <include-range begin="192.168.0.1" end="192.168.0.254"/>
           <service name="SNMP" interval="300000" user-defined="false" status="on">
             <parameter key="thresholding-group" value="default-snmp"/>
             <parameter key="range" value="600000"/>
           </service>
         </package>
         <thresholder service="SNMP" class-name="org.opennms.netmgt.threshd.SnmpThresholder"/>
       </threshd-configuration>
    Figure 70: OpenNMS Default threshd-configuration.xml. The default threshd-configuration.xml is set up for the interim design between versions 1.3.10 and 1.5.90. For OpenNMS 1.5.93, collectd-configuration.xml should be changed as shown below. [Screenshot: collectd-configuration.xml]
       <?xml version="1.0"?>
       <?castor class-name="org.opennms.netmgt.collectd.CollectdConfiguration"?>
       <collectd-configuration threads="50">
         <package name="example1">
           <filter>IPADDR != '0.0.0.0'</filter>
           <include-range begin="1.1.1.1" end="254.254.254.254"/>
           <service name="SNMP" interval="300000" user-defined="false" status="on">
             <parameter key="collection" value="default"/>
             <parameter key="thresholding-enabled" value="true"/>
           </service>
         </pack
11. [Screenshot: submit_check_result script]
       #  $1 = host_name (Short name of host that the service is associated with)
       #  $2 = svc_description (Description of the service)
       #  $3 = return_code (An integer that determines the state of the service check: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
       #  $4 = plugin_output (A text string that should be used as the plugin output for the service check)
       echocmd="/bin/echo"
       CommandFile="/usr/local/nagios/var/rw/nagios.cmd"
       # get the current date/time in seconds since UNIX epoch
       datetime=`date +%s`
       # create the command line to add to the command file
       cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
       # append the command to the end of the command file
       `$echocmd $cmdline >> $CommandFile`
    Figure 26: Nagios Sample submit_check_result command for event handler, from the contrib directory. 6.4 Performance management. Nagios does not have performance data collection and reporting out of the box; however, it does provide configuration parameters such that any host check or service check may also return performance data, provided the plugin supplies such data. This data can then either be processed by a Nagios command, or be written to a file to be processed asynchronously, either by a Nagios command or by some other mechanism. mrtg, RRDTool and Cacti may all be
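The same passive check submission can be sketched in Python. This is a minimal illustration of the line format the shell script builds, not part of the Nagios distribution; the function names are hypothetical, and the command file path must match the local installation.

```python
import time

VALID_RETURN_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def format_check_result(host_name, svc_description, return_code, plugin_output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the Nagios command file.

    `now` is seconds since the UNIX epoch; the current time is used if omitted.
    """
    if return_code not in VALID_RETURN_CODES:
        raise ValueError("return_code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        timestamp, host_name, svc_description, return_code, plugin_output
    )


def submit(line, command_file):
    """Append the command line to the command file (a named pipe on a live system)."""
    with open(command_file, "a") as handle:
        handle.write(line + "\n")


print(format_check_result("bino", "DNS Check", 0, "DNS OK", now=1217865172))
```

On a live system `submit(line, "/usr/local/nagios/var/rw/nagios.cmd")` would hand the result to Nagios, assuming the path shown in the script above is the one in use.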
12. Figure 27: Nagios Performance parameters in nagios.cfg. The default is that process_performance_data=0 (i.e. off) and all the other parameters are commented out. In addition to the global parameters, each host and service needs to either explicitly configure or inherit a definition for:
   - process_perf_data 1 (1 = data collection on, 0 = data collection off)
   By default, the generic_host and generic_service template definitions set these parameters to 1 (on). If a Nagios plugin is able to provide performance data, it is returned after the usual status information, separated by a pipe symbol (|). It can be retrieved as the $HOSTPERFDATA$ or $SERVICEPERFDATA$ macro. It is then up to your Nagios commands to interpret and manipulate that data. The next figure shows performance data that has been gathered into /tmp/service-perfdata using the default service_perfdata_file_template, where the last field is the $SERVICEPERFDATA$ value, if the plugin delivers performance data. [Screenshot: contents of the performance data file]
       [SERVICEPERFDATA] 1217865172 bino DNS Check 0.033 0.025 DNS OK: 0.017 seconds response time. www.skills-1st.co.uk returns 212.74.28.155 time=0.017324s;;;0.000000
       [SERVICEPERFDATA] 1217865192 bino SNMP Check 0.093 0.040 SNMP OK - Timeticks: (30534541) 3 days, 12:49:05.41 DISMAN-EVENT-MIB::sysUpTimeInstance=Timeticks: (30534541
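As a rough illustration of the pipe convention described above, the following sketch splits plugin output into its status text and performance data, then pulls out each metric. The function names are hypothetical, and the parser is deliberately simplified: it assumes space-separated `label=value;warn;crit;min;max` items and does not handle quoted labels containing spaces.

```python
def split_plugin_output(output):
    """Split plugin output at the first pipe into (status text, performance data)."""
    status, _, perfdata = output.partition("|")
    return status.strip(), perfdata.strip()


def parse_perfdata(perfdata):
    """Return {label: value} for each label=value[;warn;crit;min;max] item.

    Values are kept as strings, including any unit suffix such as 's'.
    """
    metrics = {}
    for item in perfdata.split():
        label, separator, rest = item.partition("=")
        if not separator:
            continue  # not a label=value item; skip it
        metrics[label] = rest.split(";")[0]
    return metrics


status, perf = split_plugin_output(
    "DNS OK: 0.017 seconds response time|time=0.017324s;;;0.000000"
)
print(status)
print(parse_perfdata(perf))
```

A plugin that returns no performance data simply yields an empty second element, so the same code path works for both cases.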
13. <parameter key="oid" value=".1.3.6.1.2.1.1.2.0"/> </service> Note that the default poller-configuration.xml has the SNMP monitor service turned off. Services may be defined several times with different parameters; each service will obviously require a unique name. This is so that different devices can receive availability monitoring with different characteristics. For availability polling, devices are grouped together in packages, where a package defines:
   - target interfaces
   - services, including the polling frequency
   - a downtime model, which controls how the poller will dynamically adjust its polling on services that are down
   - an outage calendar that schedules times when the poller is not to poll (i.e. scheduled downtime)
   There are two packages defined in the default poller-configuration.xml file: example1, and a separate package, strafer, to monitor StrafePing. A package definition must include a single filter stanza; it may also have specific, include-range and exclude-range stanzas. Here is the start of the default, as shipped:
       <package name="example1">
         <filter>IPADDR != '0.0.0.0'</filter>
         <include-range begin="1.1.1.1" end="254.254.254.254"/>
   It is then followed by the list of services pertinent to that package. example1 includes many of the services, with each service set to status="on" except SNMP. The opening stanza in poller conf
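To make the package membership rule concrete, here is a minimal sketch of how the filter and include-range shown above combine. It only models the two conditions in the excerpt (address not equal to 0.0.0.0, and address inside at least one include range); `in_package` is a hypothetical helper, not how OpenNMS evaluates filters internally.

```python
import ipaddress

UNSPECIFIED = ipaddress.IPv4Address("0.0.0.0")


def in_package(address, include_ranges):
    """Return True if the address passes the filter and falls in an include range.

    include_ranges is a list of (begin, end) pairs, both ends inclusive.
    """
    ip = ipaddress.IPv4Address(address)
    if ip == UNSPECIFIED:  # models the filter IPADDR != '0.0.0.0'
        return False
    return any(
        ipaddress.IPv4Address(begin) <= ip <= ipaddress.IPv4Address(end)
        for begin, end in include_ranges
    )


example1 = [("1.1.1.1", "254.254.254.254")]
print(in_package("10.0.0.97", example1))
print(in_package("0.0.0.0", example1))
```

Narrowing a package to one subnet is then just a matter of passing a tighter range, such as `[("192.168.0.1", "192.168.0.254")]`.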
14. [Screenshot: hosts.cfg, end of a host definition]
       use      host_172.31.100.32
       alias    group-100-s1.class.example.org
       address  group-100-s1.class.example.org
       }
    Figure 10: Nagios hosts.cfg file showing real host definitions. Hosts can be defined to be a member of one or more host groups. This then makes subsequent configuration more scalable; for example, a service can be applied to a host group rather than to individual hosts. Host groups are typically defined in hosts.cfg. [Screenshot: HOST GROUPS section of hosts.cfg]
       # create more than one hostgroup
       define hostgroup{
           hostgroup_name  routers
           alias           routers
           members         bino,group-100-r1,group-100-r2,group-100-r3
           }
       define hostgroup{
           hostgroup_name  nagios
           alias           nagios
           members         nagios,nagios3
           }
       define hostgroup{
           hostgroup_name  servers
           alias           servers
           members         bino,tino,server,nagios,nagios3
       define hostgroup{
           hostgroup_name  clients
           a
15. [Screenshot: OpenNMS resource graphs help page] The menu on the left lets you choose a specific resource that you want to use in a graph. A resource can be any graphable resource, such as SNMP data (node-level, interface-level or generic indexed data), response time data, or distributed response time data. These resources are organized first by top-level resources, such as nodes or domains (if enabled), and then by child resources under the top-level resources, like SNMP node-level data, response time data, etc. The resource you are currently looking at, if any, is shown just below the menu bar on the left side of the page. If the resource has any available prefabricated graphs, they will be listed in the "Choose the current resource" box, along with a "Choose this resource" button which will take you to the graph customization page. If the current resource has child resources, or if you are at the top level, a list of available child resources will be shown in the "View child resources" box. You can select a child resource and click the "View child resource" button to view the details of the selected child resource, including any available graphs and any sub-children. If you know the resource you are selecting has graphs, you can go straight to the graph customization page by clicking "Choose child resource". The "View the parent resource" box lets you see the parent resource of the current resource, or see all top-level resources. For exampl
16. [Graph residue: deodar.skills-1st.co.uk, Traffic eth0, bits per second; panels Daily (5 Minute Average), Weekly (30 Minute Average), Monthly (2 Hour Average); each panel reports Current, Average and Maximum for Inbound and Outbound] Figure 2: Cacti graph of interface traffic. [Graph residue: Cacti, Graphs > List Mode > bino.skills-1st.co.uk Memory Usage; series Memory Free, Memory Buffers, Cache Memory, each with Current, Average and Maximum]
17. Availability monitoring in Zenoss can use 3 different methods:
   - ping tests
      - implemented via zenping
      - detects device availability
   - service tests
      - implemented via zenstatus
      - detects services as defined by TCP / UDP ports
   - process tests and Windows Services tests
      - implemented via zenprocess
      - detects processes using the SNMP Host Resources MIB, using the snmp.IpServiceMap zCollectorPlugin driven by zenmodeler
      - detects Windows services using WMI, using the WinServiceMap driven by zenwin
   8.2.1 Basic reachability availability. Basic availability monitoring is controlled by Collectors. These are also known as Monitors, and the documentation can be confusing. The Collectors menu can be found on the left-hand side. [Screenshot: Zenoss Performance Collector Configuration for localhost, with fields including SNMP Performance Cycle Interval (secs), Process Cycle Interval (secs), Process Parallel Jobs, Status Cycle Interval (secs), Windows Service Cycle Interval (secs), Modeler Cycle Interval (secs), Config Cycle Interval (mins), Ping Time Out (secs), Ping Tries, Maximum Ping Pack
18. Figure 109: Zenoss menu to create a new event class. 8.3.4 email / pager alerting. Alerting Rules are Zenoss's way of sending email and/or paging notifications. These are configured on a per-user basis, starting from the Preferences menu towards the top right of the web console. The Alerting Rule tab then shows existing rules and permits rule creation / deletion. [Screenshot: Alerting Rules tab with Add Alerting Rule and Delete Rules options] Figure 110: Zenoss menu to create Alerting Rule. Using the Edit tab permits changes to existing alerting rules. Different rules can be applied based on a combination of severity, event state, production state and a more generic filter. The Production State is assigned to a device or device class:
   - Production
   - Pre-Production
   - Test
   - Maintenance
   - Decommissioned
   The Production State can be set or changed using the Edit tab from a device's main page. The default is Production. The Production State attribute can be used to control whether a device is monitored at all, whether alerts are sent, and whether a device is represented on the Zenoss main dashboard. It is very simple to modify the Production State to put a device, or class of devices, into maintenance, for example. a
19. [Screenshot residue: Alarms list rows, including "SNMP data collection on interface 172.31.100.3 failed" (09/07/08 17:02:38), entries for blue-atlas.skills-1st.co.uk (09/07/08 16:02:39), and "SNMP data collection on interface 10.0.0.2 failed" (10/07/08 07:25:01)] Figure 48: OpenNMS Alarms display. Alarms are defined as part of an event definition in eventconf.xml and its include files. It uses the <alarm-data> tag, where:
   - reduction-key: fields to compare to determine a duplicate event
   - alarm-type: 1 = problem, 2 = resolution. alarm-type=2 also takes a clear-key parameter defining the problem event this resolves
   - auto-clean: true or false. True ensures that all events other than the latest one that match the reduction-key are removed; very useful for clearing out duplicate events
   One of the key characteristics of an alarm that differentiates it from an event is the reduction-key field, which should ensure that duplicate events are treated as one event with multiple instances, rather than as multiple events. Most of the information provided with an event is also available in the Alarm display. The new field is Count, which shows the number of duplicate ev
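The de-duplication role of the reduction key can be sketched as follows. This is an illustration of the idea only, not OpenNMS code: events are plain dictionaries, the `reduction_key` field is assumed to hold the already expanded key, and `reduce_events` is a hypothetical name.

```python
def reduce_events(events):
    """Collapse events that share a reduction key into one alarm with a count.

    Returns {reduction_key: {"count": n, "first_event": ..., "last_event": ...}}.
    """
    alarms = {}
    for event in events:
        key = event["reduction_key"]
        alarm = alarms.get(key)
        if alarm is None:
            # First event with this key creates the alarm.
            alarms[key] = {"count": 1, "first_event": event, "last_event": event}
        else:
            # A duplicate only increments the count and updates the latest event.
            alarm["count"] += 1
            alarm["last_event"] = event
    return alarms


events = [
    {"reduction_key": "dataCollectionFailed:10.0.0.2", "time": "07:25:01"},
    {"reduction_key": "dataCollectionFailed:10.0.0.2", "time": "07:30:01"},
    {"reduction_key": "nodeDown:22", "time": "07:31:12"},
]
for key, alarm in reduce_events(events).items():
    print(key, alarm["count"])
```

The key strings in the sample are made up for readability; in a real configuration the key is built from event fields via the reduction-key attribute.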
20. [Screenshot: end of an event definition]
       &lt;/a&gt; for assistance.&lt;/p&gt; &lt;p&gt;When all else fails, RTFM(IYCFI)&lt;/p&gt;
       </operinstruct>
       <mouseovertext>When all else fails, RTFM (if you can find it)</mouseovertext>
       <autoaction>/tmp/action.sh %uei% %id% %generic% %specific%</autoaction>
    Figure 47: OpenNMS Configuration of specific TRAP with varbind matching a regular expression. If you have SNMP TRAP definitions in a MIB file, the open source utility mib2opennms can be obtained to convert SNMP V1 TRAPs and SNMP V2 NOTIFICATIONs into an OpenNMS event configuration XML file. For a source file vcs.mib in /home/jane, use:
       mib2opennms -f /opt/opennms/etc/events/vcs.events.xml -m /home/jane vcs.mib
    7.3.4 Alarms, notifications and automations. In OpenNMS you can add an <alarm-data> tag to an event configuration to create an alarm. Alarms are defined as "Important Events" and have a separate display. It is similar to the Events display in that you can select All Alarms, or you can specify a search to filter for particular alarms. [Screenshot: Alarm List page, Search Results 1-10 of 19]
21. [Screenshot: vacuumd configuration, trigger definition]
       <trigger name="selectResolvers" operator="&gt;=" row-count="1">
         <statement>SELECT *, now() AS _ts FROM alarms WHERE alarmType=2</statement>
       </trigger>
       </triggers>
    Figure 50: OpenNMS Definition of selectResolvers trigger in vacuumd XML, and the clearProblems action. [Screenshot: vacuumd configuration, action definitions]
       <!--
       <action name="clearProblems">
         <statement>UPDATE alarms SET severity=2, firstautomationtime = COALESCE(firstautomationtime, ${_ts}), lastautomationtime = ${_ts} WHERE alarmType=1 AND severity &gt; 2 AND lastEventTime &lt; ${lastEventTime} AND eventUei = ${clearUei} AND COALESCE(dpName,'') = COALESCE(${dpName},'') AND COALESCE(nodeID,0) = COALESCE(${nodeID},0) AND COALESCE(ipaddr,'') = COALESCE(${ipaddr},'') AND COALESCE(serviceID,0) = COALESCE(${serviceID},0)</statement>
       </action>
       -->
       <!-- New and optimized version of clearing problems -->
       <action name="clearProblems">
         <statement>UPDATE alarms SET severity=2, firstautomationtime = COALESCE(firstautomationtime, ${_ts}), lastautomationtime = ${_ts} WHERE alarmType=1 AND severity &gt; 2 AND lastEventTime &lt; ${lastEventTime} AND reductionKey = ${clearKey}</statement>
       </action>
    Figure 51: OpenNMS Definition of clearProblems action in vacuumd XML. The trigger is keyed on the field alarmType=2. Note that the first version of the action is commented out; the clear-uei element
22. [Screenshot: end of collectd-configuration.xml]
       </service>
       </package>
       <collector service="SNMP" class-name="org.opennms.netmgt.collectd.SnmpCollector"/>
       </collectd-configuration>
    Figure 55: OpenNMS collectd-configuration.xml as shipped. There is only one package specified in collectd-configuration.xml as shipped, which applies to all interfaces other than 0.0.0.0 and in the range 1.1.1.1 through 254.254.254.254. As with poller-configuration.xml, you must have one filter statement per package, and can then use multiple <specific>, <include-range> and <exclude-range> statements to define which interfaces this package applies to. You can also use the <include-url> tag to specify a file with a list of interfaces. There is only one data collection service defined for OpenNMS out of the box in collectd-configuration.xml: the SNMP service. It will run every 5 minutes (300,000 ms) and will collect the MIB variables specified in the collection called default, specified in datacollection-config.xml. The <service> stanza can also specify values for SNMP timeouts, retries and port number, which would override the default values in snmp-config.xml. The package definition can also use the <outage-calendar> tag to specify scheduled downtime for devices, during which data collection will be suspended. This should be used to prevent lots of failed SNMP collection events. Outage periods are defined in the poll outages x
23. [Screenshot: event definition file under /opt/opennms/etc on opennms.skills-1st.co.uk]
       <events>
       <!-- Event conversion for Skills 1st TRAPs -->
       <!-- Match any specific event from enterprise .1.3.6.1.4.1.123 where varbind 1 contains either Bad or bad -->
       <event>
         <mask>
           <maskelement>
             <mename>id</mename>
             <mevalue>.1.3.6.1.4.1.123</mevalue>
           </maskelement>
           <maskelement>
             <mename>generic</mename>
             <mevalue>6</mevalue>
           </maskelement>
           <varbind>
             <vbnumber>1</vbnumber>
             <vbvalue>[Bb]ad</vbvalue>
           </varbind>
         </mask>
         <uei>uei.opennms.org/vendor/skills/traps/trap123_bad</uei>
         <event-label>Skills 1st defined trap event: trap123_bad</event-label>
         <descr>&lt;p&gt;Bad news from enterprise %id% generic %generic% specific %specific% with varbinds (args) %parm[##]% %parm[all]%&lt;/p&gt;</descr>
         <logmsg dest="logndisplay">&lt;p&gt;Bad news from enterprise %id% generic %generic% specific %specific% with varbinds (args) %parm[##]% %parm[all]%&lt;/p&gt;</logmsg>
         <severity>Major</severity>
         <alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="1" auto-clean="false"/>
         <operinstruct>&lt;p&gt;check &lt;a href="http://www.skills-1st.co.uk"&gt;skills 1st&lt;
24. </group> </thresholding-config> Figure 73: OpenNMS Modified thresholds.xml for CC-snmp group and raddle-snmp group. The attributes of a threshold are:
   - type: A "high" threshold triggers when the value of the data source exceeds the "value", and is re-armed when it drops below the "re-arm" value. Conversely, a "low" threshold triggers when the value of the data source drops below the "value", and is re-armed when it exceeds the "re-arm" value. "relativeChange" is for thresholds that trigger when the change in data source value from one collection to the next is greater than "value" percent.
   - expression: A mathematical expression involving datasource names which will be evaluated and compared to the threshold values. This is used in "expression" thresholding (supported from 1.3.3).
   - ds-name: The name of the variable to be monitored. This matches the name in the "alias" parameter of the MIB statement in datacollection-config.xml.
   - ds-type: Data source type: "node" for node-level data items, and "if" for interface-level items.
   - ds-label: Data source label. The name of the collected "string" type data item to use as a label when reporting this threshold. Note: this is a data item whose value is used as the label, not the label itself.
   - value: The value that must be exceeded (either above or below, depending on whether this is a high or low threshold) in order to trigger. In the case of relativeChange thresholds, this is the p
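The trigger and re-arm behaviour of a "high" threshold described above can be sketched as a small state machine. This is an illustration, not OpenNMS code: the `HighThreshold` class is hypothetical, and it assumes that `trigger` means the number of consecutive samples that must exceed `value` before the threshold fires, which the truncated excerpt does not spell out.

```python
class HighThreshold:
    """Minimal model of a 'high' threshold with value, rearm and trigger."""

    def __init__(self, value, rearm, trigger):
        self.value = value
        self.rearm = rearm
        self.trigger = trigger
        self.armed = True      # an armed threshold is able to fire
        self.exceeded = 0      # consecutive samples above value so far

    def evaluate(self, sample):
        """Feed one sample; return 'triggered', 'rearmed' or None."""
        if self.armed:
            if sample > self.value:
                self.exceeded += 1
                if self.exceeded >= self.trigger:
                    self.armed = False
                    self.exceeded = 0
                    return "triggered"
            else:
                self.exceeded = 0  # the run of high samples was broken
        elif sample < self.rearm:
            self.armed = True
            return "rearmed"
        return None


# Values taken from the avgBusy5 threshold in the default-snmp group.
threshold = HighThreshold(value=90, rearm=50, trigger=3)
for sample in [95, 96, 80, 91, 92, 93, 94, 60, 40, 95]:
    print(sample, threshold.evaluate(sample))
```

A "low" threshold would be the mirror image, with the two comparisons reversed.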
25. [Screenshot residue: Nagios Host Status Details For All Host Groups, 13 Matching Host Entries Displayed, with Host Status Totals and Service Status Totals panels. Most hosts report "PING OK - Packet loss = 0%" with RTA values between 0.05 ms and 216.36 ms; two entries report "CRITICAL - Host Unreachable".] Availability monitoring, especially for computers rather than network devices, can mean many things. Nagio
26. <mibObj oid=".1.3.6.1.2.1.2.2.1.14" instance="ifIndex" alias="ifInErrors" type="counter" />
<mibObj oid=".1.3.6.1.2.1.2.2.1.16" instance="ifIndex" alias="ifOutOctets" type="counter" />
<mibObj oid=".1.3.6.1.2.1.2.2.1.17" instance="ifIndex" alias="ifOutUcastPkts" type="counter" />
<mibObj oid=".1.3.6.1.2.1.2.2.1.18" instance="ifIndex" alias="ifOutNUcastPkts" type="counter" />
<mibObj oid=".1.3.6.1.2.1.2.2.1.19" instance="ifIndex" alias="ifOutDiscards" type="counter" />
<mibObj oid=".1.3.6.1.2.1.2.2.1.20" instance="ifIndex" alias="ifOutErrors" type="counter" />
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.6" instance="ifIndex" alias="ifHCInOctets" type="counter" />
<mibObj oid=".1.3.6.1.2.1.31.1.1.1.10" instance="ifIndex" alias="ifHCOutOctets" type="counter" />
</group>
<group name="mib2-icmp" ifType="ignore">
<mibObj oid=".1.3.6.1.2.1.5.2" instance="0" alias="icmpInErrors" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.3" instance="0" alias="icmpInDestUnreachs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.4" instance="0" alias="icmpInTimeExcds" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.6" instance="0" alias="icmpInSrcQuenchs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.7" instance="0" alias="icmpInRedirects" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.8" instance="0" alias="icmpInEchos" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.15" instance="0" alias="icmpOutErrors" type="counter" />
<mib
27. [Screenshot residue from a Nagios hosts.cfg listing of host templates. Recoverable fragments: the templates use check-host-alive and the admins contact group and carry the comment "DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST"; a template for hosts on the 172.31.100.32 network inherits from generic-host and names group-100-r3, with the comment "group-100-r3 is the router from 172.31.100.32"; a template for hosts on the 172.30.100 network inherits from generic-host and names group-100-r1.]
Figure 9: Nagios hosts.cfg showing host template definitions
Subsequent definitions of sub-groups and real hosts will follow. Note the use of the parents stanza to denote the network node that provides access to the device. This means that Nagios can tell the difference between a node that is down and a node that is unreachable because its access router is down.
# Now start defining real hosts
# Hosts on the 10.191 network
define host{
host_name group-100-r1
use host_10.191 ; Name of host template to use
alias group-100-r1.class.example.org
address group-100-r1.class.example.org
}
# Hosts on the 172.16.100.32 network
define host{
host_name group-100-r3
use host_172.31.100.32
parents group-100-r2
alias group-100-r3.class.example.org
address group-100-r3.class.example.org
}
define host{
host_name group-100-s1
28. [Screenshot residue: sample records from a Nagios service performance data file. Each record carries a [SERVICEPERFDATA] tag, a Unix timestamp, the host name, the service description, two timing figures and the plugin output, for example: 1217865252 group-100-r3 PING 4.188 0.132 PING OK Packet loss 0 RTA 120.84 ms. The records cover PING for hosts group-100-r1, r2, r3, c1, c2, c3 and s1, plus nagios, server and bino, and the Current Users, Root Partition and Current Load services on nagios3.]
29. All other instances have to be explicitly configured. The ifType parameter can be used to specify the sort of interfaces to collect from. Legal values are:
• all - collect from all interface types
• ignore - used when the value would be the same for all interfaces, eg. CPU utilisation for a Cisco router
• <ifType number> - used to denote one or more specific interface types. For example, ifType 6 for ethernetCsmacd. See http://www.iana.org/assignments/ianaiftype-mib for a comprehensive list.
OpenNMS understands four types of variables to collect on: gauge, timeticks, integer, octetstring. Note that RRD only understands numeric data.
<systems>
<systemDef name="Enterprise">
<sysoidMask>.1.3.6.1.4.1</sysoidMask>
<collect>
<includeGroup>mib2-interfaces</includeGroup>
<includeGroup>mib2-tcp</includeGroup>
<includeGroup>mib2-icmp</includeGroup>
</collect>
</systemDef>
<systemDef name="Alvarion BreezeAccess base">
<sysoidMask>.1.3.6.1.4.1.12394.4.1</sysoidMask>
<collect>
<includeGroup>alvarion-bad-all-frames</includeGroup>
<includeGroup>alvarion-interfacesRB</includeGroup>
</collect>
</systemDef>
<systemDef name="Alvarion BreezeAccess SU">
<sysoidMask>.1.3.6.1.4.1.12394.4.1.2</sysoidMask>
<collect>
<includeGroup>alvarion-snr-lqi</includ
30. [Screenshot residue: event log entries reporting CRITICAL Host Unreachable for group-100-r2.class.example.org, and DOWN; SOFT; 1; CRITICAL Host Unreachable for group-100-r3 and group-100-c2.]
Figure 23: Nagios Event Log showing hard and soft events
Note from the earlier figure showing the topology layout that group-100-r3 sits behind group-100-r1. Each of these host devices is being polled every 5 minutes when in an OK state (or max_check_attempts has been exceeded), and every 1 minute when a problem has arisen. The actual problem that has caused the event log shown above is that group-100-r1 has failed; however, group-100-r3 is polled first and results in the first event for this device, with a status of DOWN and a state type of SOFT. Subsequently group-100-r1 is polled and found to be DOWN, which results in the associated poll to group-100-r3 receiving a status of UNREACHABLE and a state type of SOFT. The third poll of group-100-r3 again has a status of UNREACHABLE and a state type of SOFT. The next event for group-100-r3 is a service ping monitor, which runs every 5 minutes for this device. Note that this event has a state type of HARD; this is because Nagios knows that the host status associated with this service monitor is already UNREACHABLE or DOWN. The fourth event results in a state type of HARD and the status of UNREACHABLE. The hard event also
31. [Screenshot residue: Nagios service status rows, including "PROCS OK: 46 processes with STATE = RSZDT", a PING OK entry for server and a CRITICAL Host Unreachable entry; 18 Matching Service Entries Displayed.]
Figure 17: Nagios Service detail
Service dependencies are an advanced feature of Nagios that allow you to suppress notifications and active checks of services, based on the status of one or more other services (that may be on other hosts). Both host and service monitoring can be configured to generate events on failure, and this is the default.
6.3 Problem management
Nagios's event system displays events generated by Nagios's own host and service monitors. There is no built-in capability to collate events received as SNMP TRAPs or syslog messages. When an event is generated, it can be configured so that notification(s) are generated to one or more users or groups of users. It is also possible to create automated responses to events (typically scripts). Note that Nagios tends to use the terms "event" and "alert" interchangeably.
6.3.1 Event console
The Nagios Event Log is displayed from the left-hand menu.
32. [Screenshot: OpenNMS node page for node bino, with the General Status, Availability (last 24 hours), Notification, Recent Events and Recent Outages panels. The availability panel lists per-service figures for DNS, FTP, ICMP, SSH and SNMP on the node's interfaces (10.0.0.121, 172.16.222.1, 172.16.223.1, 10.191.0.1); the recent events are mostly "A services scan has been completed on this node"; the recent outage shown is for FTP.]
33. [Screenshot: Zenoss device page for a Microsoft Windows 2000 (Version 5.0) server, sysName WSVR2K1, with the Manage submenu open and the collector plugins list showing zenoss.snmp.InterfaceMap, zenoss.snmp.RouteMap, zenoss.snmp.CpuMap and others.]
Figure 101: Zenoss - Device Manage submenu
8.2.4 Running commands on devices
A few Commands are defined out-of-the-box and can be seen using the left-hand Settings menu and then selecting the Commands tab. New commands can be added using the Add User Command drop-down menu.
[Screenshot: Zenoss Settings page, Commands tab. The Define Commands table lists DNS forward, DNS reverse, ping, snmpwalk and traceroute. The Command column is only partly legible: host device manageIp, host device id, ping c2 device manageIp, snmpwalk v1]
34. [Screenshot: Nagios alert report for group-100-r1 covering 24-07-2008 13:08:02 to 31-07-2008 13:08:02 (duration 7d 0h 0m 0s), with Alert Types "Host & Service Alerts" and State Types "Soft & Hard States", displaying all 23 matching alerts. The Information column alternates between PING OK, PING CRITICAL (packet loss or high RTA) and CRITICAL Host Unreachable (group-100-r1.class.example.org).]
35. • RRD data is written to /var/lib/cacti, which must be writeable by this user
• cacti.log is in /var/log/cacti
• I found through /var/log/messages that poller.php was being run twice: once in /etc/crontab as cactiuser and once in /etc/cron.d/cacti as user wwwrun. Comment out the line in /etc/cron.d/cacti and check again that cactiuser can write to the data files in /var/lib/cacti.
• The initial console page is a good starting point to add devices to monitor and associated graphs.
About the author
Jane Curry has been a network and systems management technical consultant and trainer for 20 years. During her 11 years working for IBM she fulfilled both pre-sales and consultancy roles spanning the full range of IBM's SystemView products prior to 1996 and then, when IBM bought Tivoli, she specialised in the systems management products of Distributed Monitoring & IBM Tivoli Monitoring (ITM), the network management product, Tivoli NetView, and the problem management product, Tivoli Enterprise Console (TEC). All these products are based around the Tivoli Framework product and architecture. Since 1997 Jane has been an independent businesswoman working with many companies, both large and small, commercial and public sector, delivering Tivoli consultancy and training. Over the last 5 years her work has been more involved with Open Source offerings.
36. SNMP status check, not ping. A summary of host status is given on the Tactical Overview display. The Host Detail display then gives further information for each device. The hosts monitored using check_ping show the Round Trip Average (RTA). Note that group-100-a1 is monitored using the check_ifstatus plugin so shows different Status Information.
[Screenshot: Nagios 3.0.1 web interface in Firefox, logged in as nagiosadmin, showing Current Network Status (last updated Wed Jul 2 12:05:11 BST 2008, updated every 90 seconds) with the standard left-hand menu and the host status table.]
Figure 14: Nagios Host Detail display
37. ability to create object templates and thus an object hierarchy makes definitions flexible and easy, once you have defined your hierarchies.
A great benefit of this configuration file is the ability to denote the network devices that provide access to specific nodes (parent / child relationship). This means that a map hierarchy can be displayed and also means that node reachability is encoded. If, for example, all nodes on the 172.31.100.32 network inherit from a template that includes a "parents group-100-r3" stanza, then when group-100-r3 goes down Nagios knows that all nodes in that network are unreachable rather than down. Defining multiple parents for a meshed network seemed problematical, though.
Nagios automatically generates a topology map based on the parents stanzas in the configuration files. Colour coding provides status for nodes.
[Screenshot: Nagios 3.0.1 web interface in Firefox, logged in as nagiosadmin, showing the status map with the standard left-hand menu.]
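The down-versus-unreachable distinction that the parents stanza makes possible can be illustrated with a short Python sketch. It is not Nagios code: the function name and the host names host-a and host-b are invented for the example, and it assumes the parent relationships contain no loops. The rule it applies is that a non-responding host counts as UNREACHABLE only when none of its parents is up.

```python
def classify(host, parents, responding):
    """Return "UP", "DOWN" or "UNREACHABLE" for one host.

    parents    -- dict mapping a host name to the list of its parent hosts
    responding -- set of host names that currently answer their checks
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        # Nothing sits between the monitoring server and this host,
        # so a failed check can only mean the host itself is down.
        return "DOWN"
    if all(classify(p, parents, responding) != "UP" for p in host_parents):
        # Every access path is broken, so the host's own state is unknown.
        return "UNREACHABLE"
    return "DOWN"


# Hypothetical topology: two hosts behind router group-100-r3,
# which in turn sits behind router group-100-r1.
PARENTS = {
    "group-100-r3": ["group-100-r1"],
    "host-a": ["group-100-r3"],
    "host-b": ["group-100-r3"],
}
```

With only group-100-r1 responding, group-100-r3 is classified as DOWN and both hosts behind it as UNREACHABLE, which is the behaviour the text describes.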
38. [Screenshot residue, continuing the Command column of the commands table: c device zSnmpCommunity here manageIp system; traceroute q 1 w 2 device manageIp]
Figure 102: Zenoss - Commands provided out-of-the-box
From a device's main page there is a submenu to Run Commands.
[Screenshot: Zenoss device page for 172.31.100.21 (status Up), a Cisco 2924XLv running IOS 12.0(5.1)XP, with the Run Commands submenu offering ping, snmpwalk and traceroute.]
Figure 103: Zenoss - Run Commands for a particular device
Although much of the availability monitoring that has been demonstrated so far relies on SNMP, it is also possible to use ssh or telnet to contact remote devices and run monitoring scripts on them.
8.3 Problem management
The Zenoss event management system can collect events from syslogs, windows event logs, SNMP TRAPs and XML-RPC, in addition to managing events generated by Zenoss itself (such as availability and performance threshold events). When an event arrives in the Status table of the events database, the default state of the event is set to New. The event can then be Acknowledged, Suppressed or Dropped. From there an event wi
39. each of the services. This "managed" parameter can be overridden at the end of capsd-configuration.xml by "unmanaged" range stanzas:
<ip-management policy="unmanaged">
<specific>0.0.0.0</specific>
<range begin="127.0.0.0" end="127.255.255.255" />
</ip-management>
When a new suspect event is generated, provided the IP address is in a "managed" management policy range, the IP address is checked for each of the services in capsd-configuration.xml, starting from the top. If the device does not respond to any configured service then, even if triggered with send_event.pl, it will not be added to the OpenNMS database. Look in /opt/opennms/logs/daemon/discovery.log for debugging information.
7.1.3 Topology mapping and displays
OpenNMS does not use a topology mapping function in the core code; indeed, some of its proponents are vociferous that you do not need a mapping ability. There is a mapping capability if you use an Internet Explorer web browser with a specific Adobe Scalable Vector Graphics (SVG) plugin; this is only supported in IE and did not work for me. There is also a maps-on-firefox code branch but performance is said to be poor and the mailing lists suggest that neither mapping capability is heavily used.
A Node List is available from the main menu, where each node name is a link to a detailed node page.
[Screenshot: Node List page of the OpenNMS Web Console in Firefox.]
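The effect of the unmanaged stanza quoted above can be sketched as a simple address test. This is an illustration in Python, not how capsd is implemented; the constant and function names are invented, and the sketch assumes an address is managed unless it matches a specific entry or falls inside a range, which simplifies the policy handling the text describes.

```python
import ipaddress

# Values taken from the <ip-management policy="unmanaged"> stanza above.
UNMANAGED_SPECIFICS = {ipaddress.ip_address("0.0.0.0")}
UNMANAGED_RANGES = [
    (ipaddress.ip_address("127.0.0.0"), ipaddress.ip_address("127.255.255.255")),
]


def is_managed(address: str) -> bool:
    """Return False if the address matches an unmanaged specific or range."""
    ip = ipaddress.ip_address(address)
    if ip in UNMANAGED_SPECIFICS:
        return False
    # Address objects compare numerically, so a range test is a pair of comparisons.
    return not any(begin <= ip <= end for begin, end in UNMANAGED_RANGES)
```

Under these assumptions `is_managed("127.0.0.1")` is false, while an ordinary address such as 10.0.0.121 is treated as managed and would go on to the service scan.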
40. </specific>
</definition>
<definition retry="2" timeout="1000">
<range begin="172.31.100.1" end="172.31.100.254" />
</definition>
<definition read-community="fraclmye" write-community="rrwatr">
<range begin="10.0.0.1" end="10.0.0.254" />
</definition>
</snmp-config>
The first stanza in snmp-config.xml provides global default parameters for SNMP access. Variations in any of these global parameters can be made using a <definition> stanza and either a <range> or a <specific> statement. This file is used both for discovery and for collecting performance data.
When testing SNMP, capsd makes an attempt to receive the sysObjectID MIB-2 variable (.1.3.6.1.2.1.1.2.0). If successful, then extra discovery processing takes place. First, three threads are generated to collect the data from the SNMP MIB-2 system tree and the ipAddrTable and ifTable tables. If, for some reason, the ipAddrTable or ifTable are unavailable, the process stops (but the SNMP system data may show up on the node page).
Second, all of the IP addresses in the ipAddrTable are run through the capsd capabilities scan. Note that this is regardless of how management is configured in the configuration file. This only happens on the initial scan and on forced rescans. On normal rescans (by default, every 24 hours), IP addresses that are "unmanaged" in capsd are not polled.
Third, every IP address in the ipAddrTable that supports SNMP is t
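The way a definition stanza overrides the global defaults for a range of addresses can be sketched as a lookup. This is a Python illustration only, not OpenNMS code. The two definitions are the ones visible in the fragment above; the global default values are placeholders invented for the example, because the first stanza of the file is not shown here.

```python
import ipaddress

# Placeholder global defaults (assumed values; the real first stanza is not shown).
GLOBAL_DEFAULTS = {"retry": 3, "timeout": 3000, "read-community": "public"}

# (overrides, (range begin, range end)) pairs from the definitions above.
DEFINITIONS = [
    ({"retry": 2, "timeout": 1000}, ("172.31.100.1", "172.31.100.254")),
    ({"read-community": "fraclmye", "write-community": "rrwatr"},
     ("10.0.0.1", "10.0.0.254")),
]


def snmp_params(address: str) -> dict:
    """Start from the global defaults and apply the first matching definition."""
    ip = ipaddress.ip_address(address)
    params = dict(GLOBAL_DEFAULTS)
    for overrides, (begin, end) in DEFINITIONS:
        if ipaddress.ip_address(begin) <= ip <= ipaddress.ip_address(end):
            params.update(overrides)
            break
    return params
```

An address in 172.31.100.1 to 172.31.100.254 keeps the default community but gets the shorter retry and timeout; an address in 10.0.0.1 to 10.0.0.254 keeps the default timing but gets its own community strings.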
41. management
4.2.2 Systems management
4.3 What is out of scope
5 A quick look at Cacti, The Dude and netdisco
5.1 Cacti
5.2 netdisco
5.3 The Dude
6 Nagios
6.1 Configuration - Discovery and topology
6.2 Availability monitoring
6.3 Problem management
6.3.1 Event console
6.3.2 Internally generated events
6.3.3 SNMP TRAP reception and configuration
6.3.4 Nagios notifications
6.3.5 Automatic responses to events - event handlers
6.4 Performance management
6.5 Nagios summary
7 OpenNMS
7.1 Configuration - Discovery and topology
42. mapping. The Edit tab allows editing of any of these fields.
8.3.3 SNMP TRAP reception and configuration
Zenoss automatically listens for SNMP TRAPs on UDP/162 (the well-known trap port), using the zentrap process. Some generic TRAPs (2, 3 and 4 for Link Down, Link Up and Authentication Failure) are automatically mapped to defined classes. Other generic TRAPs, such as 0 and 1 for Cold Start and Warm Start, appear as the Unknown event class, as will any specific TRAPs. It is simple to map such events to an already configured event class by selecting the occurrence of the event and using the pull-down menu to select Map Events to Class; pick the correct class from the scrollable list.
It is also possible to create new event classes. Starting from Events on the left menu, navigate to the place in the event class hierarchy under which you want to create a new class and use the drop-down menu to Add New Organizer and give the class a unique name.
[Screenshot: Zenoss Events page, Classes tab, with the drop-down menu offering Add New Organizer, Move Organizers and Delete Organizers.]
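The out-of-the-box behaviour described above amounts to a small lookup on the generic trap number. The Python sketch below only restates that rule. It is not Zenoss code, and because the text does not name the classes that generic traps 2, 3 and 4 are mapped to, the sketch returns a descriptive label rather than a real event class path.

```python
# Generic trap numbers and names as given in the text.
GENERIC_TRAPS = {
    0: "Cold Start",
    1: "Warm Start",
    2: "Link Down",
    3: "Link Up",
    4: "Authentication Failure",
}

# Generic traps that the text says are mapped automatically.
AUTO_MAPPED = {2, 3, 4}


def initial_event_class(generic_trap: int) -> str:
    """Say where a newly received trap lands before any manual mapping."""
    if generic_trap in AUTO_MAPPED:
        return "predefined class for " + GENERIC_TRAPS[generic_trap]
    # Cold Start, Warm Start and enterprise-specific traps start out unmapped.
    return "Unknown"
```

Anything that comes back as "Unknown" is a candidate for the Map Events to Class step described in the text.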
43. modelling plugins, as distinct from performance plugins. zCollector plugins are applied to device classes or devices through the zProperties tab; use the Edit link alongside zCollectorPlugins to show or modify the plugins applied and available.
[Screenshot: Zenoss zCollectorPlugins selection. The left-hand list shows zenoss.snmp.NewDeviceMap, zenoss.snmp.DeviceMap, zenoss.snmp.InterfaceMap and zenoss.snmp.RouteMap; the list of available plugins includes zenoss.cmd.darwin.cpu, zenoss.cmd.darwin.ifconfig, zenoss.cmd.darwin.memory, zenoss.cmd.darwin.netstat_an, zenoss.cmd.darwin.process, zenoss.cmd.darwin.swap, zenoss.cmd.df and zenoss.cmd.linux.ifconfig.]
Figure 97: Zenoss - zCollectorPlugins
Note that the "Add Fields" / "Hide Fields" control appears greyed out but does actually work. The plugins shown on the left in the screenshot above are the default for the /Devices class. The /Devices/Server class has several more SNMP-based plugins by default and the /Devices/Server/Windows class has an extra wmi.WinServiceMap plugin. Documentation on these plugins seems a little sparse, but here are a few clues:
44. [Screenshot: Netdisco 0.95 Device Inventory page in Firefox, user jane, with search options for device age and uptime and inventory tables By Model (cisco 2924XLv, 3640, 7206 and wsc1900 among others, 9 devices in total), By OS (catalyst, ios and Unknown, 9 in total), By Location and Wireless SSID.]
Figure 4: Netdisco main device inventory display
[Screenshot: Netdisco Device View page in Firefox for device 10.191.100.4.]
45. options: w,u,c,r specifies notifications on service warning, unknown, critical and recovery events
• Host / service notification period: notifications only sent during this period, eg. 24x7, workdays
• Host / service notification interval: if notification already sent, problem still extant and notification period exceeded, then send another notification
Once each of these filters for notification has been tested and passed, contact filters are then applied for each contact in the group(s) indicated in the host or service contact_groups stanza. Here is the default definition:
###############################################################################
#
# CONTACT TEMPLATES
#
###############################################################################
# Generic contact definition template - This is NOT a real contact, just a template!
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can
46. over time and not necessarily just a single action (although it can be). The destination path continues to be walked until all notifications and escalations have been sent or the notification is acknowledged (automatically or by manual intervention). Out-of-the-box, the only destinationPath that is configured is for javaEmail to the Admin group of users.
The notifications.xml file specifies what events trigger notifications and to whom. Here is an example from the default file:
<?xml version="1.0" encoding="UTF-8"?>
<notifications xmlns="http://xmlns.opennms.org/xsd/notifications">
<ns1:header xmlns:ns1="http://xmlns.opennms.org/xsd/types">
<rev xmlns="">1.2</rev>
<created xmlns="">Wednesday, July 9, 2008 1:33:51 PM GMT</created>
<mstation xmlns="">localhost</mstation>
</ns1:header>
<notification name="interfaceDown" status="on" writeable="yes">
<uei xmlns="">uei.opennms.org/nodes/interfaceDown</uei>
<rule xmlns="">IPADDR != '0.0.0.0'</rule>
<destinationPath xmlns="">Email-Admin</destinationPath>
<text-message xmlns="">All services are down on interface %interfaceresolve% (%interface%) on node %nodelabel%. New Outage records have been created and service level availability calculations will be impacted until
47. programs.
2.1 Choosing systems management tools
Every organisation has different priorities for the criteria that drive tool selection. For the moment, let's leave aside the technical metrics and look at some of the other decision factors:
• Ease of use: not just what demos well but what implements well in your environment
• Skills necessary to implement the requirements versus skills available
• Requirements for and availability of user training
• Cost: all of it, not just licences and tin; evaluation time, maintenance, training
• Support from supplier and/or communities
• Scalability
• Deployability: management server(s) ease of installation and agent deployment
• Reliability
• Accountability: the ability to sue / charge the vendor if things go wrong
If accountability is high in your priorities and the software cost is a relatively low priority, then you are likely to choose one of the commercial offerings; however, if you have a well-skilled workforce, or one prepared and able to learn quickly, and overall cost is a limiting factor, then Open Source offerings are well worth considering. Interestingly, you can find offerings that suit all the other bullets above, from both the commercial and the Open Source stables.
2.2 The advantages of Open Source
One attraction of Open Source to me is that you don't actually have to fund salesfolk. Some costs do need to be invested in your own people to investig
48. respond to ping is to use a provided Perl script:
/opt/opennms/bin/send-event.pl --interface <ip-addr> uei.opennms.org/internal/discovery/newsuspect
7.1.2 Service discovery
When a new suspect event has been generated by the discovery process, it is the capabilities daemon, capsd, that takes over and discovers services on a system. capsd is configured using /opt/opennms/etc/capsd-configuration.xml. Thus discovery in OpenNMS consists of two parts: discovering an IP address to monitor (the discover process) and then discovering the services supported by that IP address (the capsd process). The basic monitored element is called an "interface", and an interface is uniquely identified by an IP address. Services are mapped to interfaces, and if a number of interfaces are discovered to be on the same device (either via SNMP or SMB) then they may be grouped together as a "node".
capsd uses a number of plugins supplied with OpenNMS to discover services. Each service has a <protocol-plugin> stanza in capsd-configuration.xml. For example:
<protocol-plugin protocol="SSH" class-name="org.opennms.netmgt.capsd.TcpPlugin" scan="on" user-defined="false">
<property key="banner" value="SSH" />
<property key="port" value="22" />
<property key="timeout" value="3000" />
<property key="retry" value="1" />
</protocol-plugin>
This defines a service (protocol) called SSH that tests TCP port 22 us
49. return the total number of parameters
%parm[#<num>]% Will return the value of parameter number <num>
Any of this data can be used in the message or description fields. In addition, the varbind data can also be used to filter the event within the <mask> tags, following the <maskelement> tags. It is possible to match more than one varbind, and more than one value per varbind. For example:
<varbind>
<vbnumber>3</vbnumber>
<vbvalue>2</vbvalue>
<vbvalue>3</vbvalue>
</varbind>
<varbind>
<vbnumber>4</vbnumber>
<vbvalue>2</vbvalue>
<vbvalue>3</vbvalue>
</varbind>
The above code snippet will match if the third parameter has a value of "2" or "3" and the fourth parameter has a value of "2" or "3". It is also possible to use regular expressions when matching varbind values. Again, note that the order in which events are listed is very important. Put the most specific events first.
Here is an example definition that includes matching a varbind with a regular expression. Note the vbvalue matches any string that contains either "Bad" or "bad". Extra stanzas have also been added for <operinstruct> help (which provides a web link on one line and plain text on the second), a <mouseovertext> tag (which doesn't appear to work) and a tag to run an automatic action (a shellscript) whenever this event occurs.
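The matching rule for varbind stanzas (every listed varbind must match, and any one of its listed values is enough) can be sketched in Python. This is an illustration, not OpenNMS code: the function names are invented, and treating a value with a leading "~" as a regular expression that must match the whole string is an assumption made for the example, not something stated in the text above.

```python
import re


def value_matches(pattern: str, actual: str) -> bool:
    """Compare one vbvalue against one received parameter value.

    Assumption for this sketch: a leading "~" marks a regular expression
    that must match the whole value; anything else is compared literally.
    """
    if pattern.startswith("~"):
        return re.fullmatch(pattern[1:], actual) is not None
    return pattern == actual


def varbinds_match(constraints: dict, parms: list) -> bool:
    """constraints maps a vbnumber (1-based) to its list of vbvalues.

    All constrained varbinds must match (AND); within one varbind any
    listed value may match (OR).
    """
    for number, allowed in constraints.items():
        if number < 1 or number > len(parms):
            return False  # the trap did not carry that parameter
        actual = str(parms[number - 1])
        if not any(value_matches(v, actual) for v in allowed):
            return False
    return True


# The two-varbind example from the text: parameters 3 and 4 must each be "2" or "3".
EXAMPLE = {3: ["2", "3"], 4: ["2", "3"]}
```

A trap whose parameters are a, b, 2, 3 matches EXAMPLE, while a, b, 2, 5 does not. Under the stated assumption, a pattern such as `~.*[Bb]ad.*` matches any value containing "Bad" or "bad".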
50. service monitoring for a specific device, access the main page for a device and open the OS tab. Under the IP Services section, click on the Name column header to see the services detected. Click on the service name, which brings up the service status window for the device, where the Monitor field can be changed (don't forget to click the Save button). Note that the Monitored box in the IP Services heading bar can be used to toggle the display between detected services and monitored services. Note also that the drop-down menu to Add IpService is driven by typing in a partial match of the service name you want; the subsequent dropdown then shows configured services that match your selection.

8.2.3 Process availability monitoring

Unix / Linux process monitoring relies on the SNMP Host Resources MIB on the target device. Processes to be monitored can be flexibly defined using regular expressions. Start from the Processes menu to see the processes defined; there are none out of the box. Use the drop-down menu to Add process.

Figure 93: Zenoss Processes with drop-down menu

Supply a process name and it will be added to the list. To modify the definition of the process, click on the process name and select the Edit tab.
51. specifies packages for collection. A package combines filters and ranges, which determine the interfaces that collections should be applied to, with services, which reference collections in datacollection-config.xml. collectd-configuration.xml can also specify data collection intervals and whether the collection is active. Note that if a device has several interfaces that:
• support SNMP
• have a valid ifIndex
• are included in a collection package in collectd-configuration.xml
then the lowest IP address is marked as primary and will be used by default for all performance data collection.

collectd is triggered when capsd generates a NodeGainedService event. The discovered protocol name (e.g. SNMP, SSH) is passed from capsd to collectd, along with the primary interface from the event. These are checked against the configuration in collectd-configuration.xml to see whether any collection packages are valid (there should be at least one by definition) and data collection is started.

    <?xml version="1.0"?>
    <?castor class-name="org.opennms.netmgt.collectd.CollectdConfiguration"?>
    <collectd-configuration threads="50">
        <package name="example1">
            <filter>IPADDR != '0.0.0.0'</filter>
            <include-range begin="1.1.1.1" end="254.254.254.254"/>
            <service name="SNMP" interval="300000" user-defined="false" status="on">
                <parameter key="collection" value="default
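The primary-interface rule described in this extract (the lowest IP address among the interfaces that support SNMP, have a valid ifIndex and fall inside a collection package) can be sketched as follows. This is a minimal illustration, not OpenNMS code: the Interface class and its field names are hypothetical, and "valid ifIndex" is assumed to mean a positive integer.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    """Hypothetical record of what is known about one discovered interface."""
    ip: str
    supports_snmp: bool
    if_index: Optional[int]
    in_collection_package: bool


def choose_primary(interfaces: List[Interface]) -> Optional[Interface]:
    """Pick the eligible interface with the numerically lowest IP address."""
    eligible = [
        i for i in interfaces
        if i.supports_snmp
        and i.if_index is not None and i.if_index > 0  # assumed meaning of "valid"
        and i.in_collection_package
    ]
    if not eligible:
        return None
    # Compare addresses numerically; comparing the strings would put
    # "10.10.0.1" before "10.2.0.1".
    return min(eligible, key=lambda i: ipaddress.ip_address(i.ip))
```

The numeric comparison is the one design point worth noting: a plain string sort gives the wrong answer as soon as octets have different numbers of digits.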
52. [Screenshot residue: Nagios service template definitions for generic-service and local-service. The directive names and most values did not survive extraction (recoverable values include admins, 60, 24x7, local-service and generic-service). The inline comments, some cut off at the edge of the screenshot, read as follows.]

For the generic-service template:
; The name of this service template
; Active service checks are enabled
; Passive service checks are enabled / accepted
; Active service checks should be parallelized (disabling this
; We should obsess over this service (if necessary)
; Default is to NOT check service freshness
; Service notifications are enabled
; Service event handler is enabled
; Flap detection is enabled
; Failure prediction is enabled
; Process performance data
; Retain status information across program restarts
; Retain non-status information across program restarts
; The service is not volatile
; The service can be checked at any time of the day
; Re-check the service up to 3 times in order to determine its
; Check the service every 10 minutes under normal conditions
; Re-check the service every two minutes until a hard state ca
; Notifications get sent out to everyone in the admins group
; Send notifications about warning, unknown, critical and rec
; Re-notify about service problems every hour
; Notifications can be sent out at any time
; DONT REGISTER THIS DEFINITION, ITS NOT A REAL SERVICE, JUST A TEMPLATE

For the local-service template:
; The name of this service template
; Inherit default values from the generic-service definition
; Re-check the servic
53. steps, i.e. this data is consolidated over 288 5-minute steps (1 day). The RRA will have 366 rows, representing 1 year of data (1-day consolidations x 366 days = 366 rows). Consolidate the samples provided 0.5 (half) of them are not UNKNOWN; otherwise the consolidated value will be UNKNOWN.
• RRA:MAX:0.5:288:366 creates an RRA holding the daily MAX values (consolidated over 288 steps) and keeps 1 year of data
• RRA:MIN:0.5:288:366 creates an RRA holding the daily MIN values (consolidated over 288 steps) and keeps 1 year of data

The top of datacollection-config.xml defines where the RRD repositories are kept and how many variables can be retrieved by an SNMP V2 GET-BULK command (10 is the default). Within the repository directory, for each node there will exist a directory named after the node number. Thus, if the system was collecting data on node 21, there would be a directory called /opt/opennms/share/rrd/snmp/21 containing a datafile for each MIB OID being collected. File names will match the alias parameter for a MIB OID in datacollection-config.xml. The node number can be found by going to the detailed node information for a device and choosing the Asset Info link.
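The arithmetic behind an RRA definition such as RRA:MAX:0.5:288:366 can be checked with a few lines of Python. This is a worked illustration under stated assumptions, not RRDTool code: the function names are hypothetical, the base step is taken as the 5 minutes used in the text, and the 0.5 factor is read as "a row is UNKNOWN when more than half of its samples are UNKNOWN".

```python
STEP_SECONDS = 300  # 5-minute collection step, as in the text


def rra_span(steps_per_row, rows, step_seconds=STEP_SECONDS):
    """Return (seconds covered by one row, seconds covered by the whole RRA)."""
    row_seconds = steps_per_row * step_seconds
    return row_seconds, row_seconds * rows


def consolidate(samples, cf, xff=0.5):
    """Consolidate one row's samples; None stands for UNKNOWN.

    Assumption: the row becomes UNKNOWN (None) when the number of UNKNOWN
    samples exceeds xff times the number of samples.
    """
    unknown = sum(1 for s in samples if s is None)
    if unknown > xff * len(samples):
        return None
    known = [s for s in samples if s is not None]
    if cf == "MAX":
        return max(known)
    if cf == "MIN":
        return min(known)
    if cf == "AVERAGE":
        return sum(known) / len(known)
    raise ValueError("unsupported consolidation function: " + cf)


row_seconds, total_seconds = rra_span(288, 366)
# 288 steps x 300 s = 86400 s (one day) per row; 366 rows cover 366 days.
```

So each row of the RRA summarises exactly one day, and the 366 rows together hold just over a year.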
54. the scale of features and complexity, some offerings are slanted more towards network management (netdisco, The Dude), others towards systems management (Nagios). Some aim to encompass a number of systems management disciplines with an architecture based around a central database (Nagios, Zenoss, OpenNMS).

Some are extremely active projects with hundreds of appends to mailing lists per month (Nagios, Zenoss, OpenNMS, cacti); others have a regular but smaller community with hundreds of mailing-list appends per year (netdisco).

Some are purely Open Source projects, typically licensed under the GNU GPL (MRTG, RRDTool, cacti) or BSD license (netdisco); some have free versions (again, typically under GPL) with extensions that have commercial licences (Zenoss). In addition to free licences, several products offer support contracts (Zenoss, Nagios, OpenNMS).

Most are available on several versions of Linux. MRTG, RRDTool and cacti are also available for Windows. The Dude is basically a Windows application but can run under WINE on Linux. Most have a web-based GUI supported on Open Source browsers; OpenNMS can only display maps by using Internet Explorer.

4 Criteria for Open Source management tool selection

It is essential to define what is in scope and what is out of scope for a systems management project. A prioritised list of mandatory and desirable requirements is helpful.

4.1 General requirements

For the purposes of this paper, here are my selectio
55. this outage is resolved.</text-message>
        <subject xmlns="">Notice #%noticeid%: %interfaceresolve% (%interface%) on node %nodelabel% down.</subject>
        <numeric-message xmlns="">111-%noticeid%</numeric-message>
    </notification>
    <notification name="nodeDown" status="on" writeable="yes">
        <uei xmlns="">uei.opennms.org/nodes/nodeDown</uei>
        <rule xmlns="">IPADDR != '0.0.0.0'</rule>
        <destinationPath xmlns="">Email-Admin</destinationPath>
        <text-message xmlns="">All services are down on node %nodelabel%. New Outage records have been created and service level availability calculations will be impacted until this outage is resolved.</text-message>
        <subject xmlns="">Notice #%noticeid%: node %nodelabel% down.</subject>
        <numeric-message xmlns="">111-%noticeid%</numeric-message>
    </notification>

Figure 53: OpenNMS Extract of notifications from notifications.xml

The notification called interfaceDown is turned on; it applies to all interfaces other than 0.0.0.0; the notification is sent to the destination Email-Admin (defined in destinationPaths.xml); and the text message of the email includes 3 parameters from the event. 4 parameters are included in the email subject.

The default notifications.xml generates email to the Admin group for the following events:
• interfaceDown
• nodeDown
• nodeLostService
• nod
56. to include any graphable resource.

Figure 66: OpenNMS KSC Reports menu

Selecting a node and clicking View child resources results in a menu of report categories.

[Screenshot residue: OpenNMS KSC Reports "Choose the current resource" page for node bino.skills-1st.co.uk, offering as child resources the SNMP node-level performance data, SNMP interface data for eth1 and several vmnet interfaces, and response time data for three IP addresses.]
57. Figure 94: Zenoss dialogue for modifying process definition

To modify the zProperties of a process, use the zProperties tab.

Figure 95: Zenoss zProperties for the firefox process

To apply process monitoring to a device, from the OS tab of the device page select the drop-down menu and use the Add OSProcess menu. Defined processes are selectable from the drop-down window.

Figure 96: Zenoss Add OSProcess monitoring to a specific device

Note that there are currently (July 4th 2008) a couple of bugs to do with process monitoring, whereby processes disappear from the OS tab of a device and/or show the wrong status (tickets 3408, 3399, 3270). To mitigate these, the zenprocess daemon should be stopped and restarted whenever modifications have been made to do with processes. You can use the GUI by choosing Settings and selecting the Daemons tab. Temporarily, it would also be wise to use the menu for the process and select to Lock the process from Deletion.

More sophisticated availability monitoring can be implemented using standard zCollectorPlugins; note that these are
58. with the trap: 1. They were: .1.3.6.1.4.1.123.1234="bad news". Operator Instructions: No instructions available.

Figure 45: OpenNMS Event detail for an unformatted TRAP

TRAPs are configured in eventconf.xml (or an include file) using the <mask> tag. This tag specifies mask elements, with name / value pairs that must match data delivered by the TRAP in order for this particular event configuration to match.

    <event>
        <mask>
            <maskelement>
                <mename>generic</mename>
                <mevalue>6</mevalue>
            </maskelement>
        </mask>
        <uei>uei.opennms.org/generic/traps/EnterpriseDefault</uei>
        <event-label>OpenNMS defined trap event: EnterpriseDefault</event-label>
        <descr>&lt;p&gt;This is the default event format used when an enterprise specific event (trap) is received for which no format has been configured (i.e. no event definition exists).&lt;/p&gt; &lt;p&gt;The total number of arguments received with the trap: %parm[##]%.&lt;/p&gt; &lt;p&gt;They were:&lt;p&gt; &lt;p&gt;%parm[all]%&lt;/p&gt;</descr>
        <logmsg dest="logndisplay">Received unformatted enterprise event (enterprise:%id% generic:%generic% specific:%specific%). %parm[##]% a
59. you can also write your own. One of the standard plugins is check_snmp, which can be used to query any host for any SNMP MIB variable (this obviously requires the target to support SNMP and the MIB in question).

It is also possible to run checks on remote hosts by installing the NRPE agent (available for both Unix / Linux and Windows hosts) and the required Nagios plugins on the remote system. The check_nrpe plugin must also be installed on the Nagios system. This allows plugins designed to be run local to the Nagios system to be run on remote hosts. With NRPE agents, checks are run on a scheduled basis, initiated from the Nagios system. Another alternative is to install the NSCA addon on remote systems. This permits remote machines to run their own periodic checks and report the results back to Nagios, where they can be defined as passive service checks.

The event subsystem of Nagios is less powerful and configurable than some of the other offerings; it has less focus on an event console but includes more information about host and service events from other menus. Nagios has no easy, built-in way to collect and process SNMP TRAPs.

If you want lots of performance graphs, then Nagios alone is not going to deliver easily.

In summary, Nagios seems good for monitoring a relatively small number of systems, provided you don't need historical performance reporting.

7 OpenNMS

OpenNMS presents itself as the first Enterprise-grade netwo
60. [Screenshot residue: Nagios 3.0.1 host notifications log for group-100-r1 on 31-07-2008, showing notify-host-by-email notifications to nagiosadmin for "CRITICAL Host Unreachable" at 10:17:26 and "PING OK Packet loss 0%, RTA 17.06 ms" at 10:36:06.]

Figure 25: Nagios Host Notifications

6.3.5 Automatic responses to events: event handlers

Nagios can run automatic actions (event handlers) when a service or host:
• is in a SOFT problem state
• initially goes into a HARD problem state
• initially recovers from a SOFT or HARD problem state

There is a global parameter, enable_event_handlers, which must take the value 1 (true) before any automation can take place. There are two global parameters, global_host_event_handler and global_service_event_handler, which can be used to run commands on all host / service events. These might be used, say, to log all events to an
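The three conditions listed above, together with the global enable_event_handlers switch, can be encoded as a small decision function. This is only a sketch of the rules as stated in the text, not Nagios source code: the function name and the (status, state type) tuples are hypothetical conventions chosen for the example.

```python
PROBLEM_STATES = {"WARNING", "UNKNOWN", "CRITICAL", "DOWN", "UNREACHABLE"}


def should_run_handler(enable_event_handlers, previous, current):
    """Decide whether an event handler would fire for a state transition.

    previous and current are (status, state_type) tuples such as
    ("CRITICAL", "SOFT") or ("OK", "HARD").
    """
    if not enable_event_handlers:
        return False  # the global switch disables all automation
    prev_status, _prev_type = previous
    status, state_type = current
    if status in PROBLEM_STATES:
        if state_type == "SOFT":
            return True  # in a SOFT problem state
        # HARD problem: only when this state is first entered
        return previous != current
    # Not a problem state: fire only on recovery from a problem
    return prev_status in PROBLEM_STATES
```

For example, the transition from ("CRITICAL", "SOFT") to ("CRITICAL", "HARD") fires a handler, whereas a host that simply stays in ("CRITICAL", "HARD") does not.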
61. [Graph residue: OpenNMS node-level memory graph (Used Other, IO Buff Ram, Shared Mem, Filesystem Cache, Avail Real Mem, Total Swap, Total Real Mem) and CPU Usage graph (user, nice, wait, System, Interrupts), each series shown with Avg, Min and Max values.]

Figure 65: OpenNMS partial display of the node-level performance data graphs

If you wish to create more selective sets of graphs for other people, use the Key SNMP Customized (KSC) Reports menu to create your own reports, which can include graphs of selected MIB variables from one device or can select MIB variables from different devices. Using the Create New button will prompt for nodes that have data collections configured, as Child Resources.
62. [Email residue: Message "ip 10.0.0.97 is down", followed by links for Event Detail, Acknowledge, Delete and Device Events.]

Figure 114: Zenoss email generated by event notification, including links

8.3.5 Event automations

Any event can be configured to run an automatic script. This can be in addition to the email / pager alerting rules described above. Such automation scripts are known as Zenoss Commands and are run by the zenactions daemon. They are configured from the Event Manager left-hand menu, using the Commands tab.

Figure 115: Zenoss Event Command definition

8.4 Performance management

Zenoss can collect performance data and threshold it, using either SNMP (through the zenperfsnmp daemon) or commands (typically ssh), using the zencommand daemon. The data is stored and displayed using RRD Tool.

8.4.1 Defining data collection, thresholding and graphs

Configuration of performance data collection, thresholding and display is done through templates. As with other Zenoss objects, templates can be applied to a specific device or to a higher level in the device class object hierarchy. To see all the defined templates, navigate to the Devices page and use the left-hand d
63. [Screenshot residue: Nagios event log for 31-07-2008, around 10:13, showing HOST ALERT entries for the group-100 hosts (r1 DOWN; c1, c3 and s1 UNREACHABLE) moving from SOFT to HARD state with "CRITICAL Host Unreachable", SERVICE ALERT entries for PING CRITICAL, and HOST NOTIFICATION entries sent to nagiosadmin by notify-host-by-email.]
64. 1.2.23-47
• net-snmp 5.4.1-19
• MySQL 5.0.45-22

Cacti, as well as all of the prerequisites, was available on the Open SuSE 10.3 standard distribution DVD. Use the "Installation under Unix" instructions available from http://www.cacti.net/downloads/docs/html/install_unix.html. A few modifications were required, such as:
• No PHP5 configuration was done, as the files documented in the installation guide did not exist.
• Configuration of Apache2 required no modifications in /etc/apache2/conf.d/php5.conf.
• Cacti was installed using the standard SuSE YaST mechanism.
• Create the MySQL database by:
      cd /usr/share/cacti
      mysql --user=root -p        (and supply the root password when prompted)
      create database cacti;
      use cacti;
      source cacti.sql;
      GRANT ALL ON cacti.* TO cactiuser@localhost IDENTIFIED BY 'cacti';
  Note that cacti in the GRANT command is the password for the user cactiuser. The use cacti line selects the new database so that cacti.sql is loaded into it.
• You need to manually create the Operating System user cactiuser, with password cacti.
• When pointing your web browser at http://<your server>/cacti/, ensure that you include the trailing slash. Use a web logon of admin, password admin.
• Ensure that apache2 and mysql are either manually started (/etc/init.d/<name> start) or start them automatically at system start using chkconfig.
• Ensure that the cactiuser user id can execute the /usr/share/cacti/poller.php script that is run by /etc/crontab.
• Also ensure that the directory that the
65. [Screenshot residue: Nagios event log for 31-07-2008, 10:35:46 to 10:35:56, showing the recovery of the group-100 hosts: HOST ALERT entries returning to UP (HARD) with "PING OK Packet loss 0%", matching SERVICE ALERT entries for PING OK, and HOST NOTIFICATION entries sent to nagiosadmin by notify-host-by-email.]
66. [Graph residue: Zenoss weekly graphs for weeks 27 to 32 (2008-06-27 11:48:30 to 2008-08-08 11:48:30): Load Average 5 min (laLoadInt5) and CPU Utilization as a percentage (ssCpuRawSystem, ssCpuRawUser, ssCpuRawWait), each with cur, avg and max values.]

Figure 125: Zenoss Performance graphs available under the Perf tab for bino

Note that the Reports left-hand menu also provides access to various reports, including performance reports.

Figure 126: Zenoss Reports menu

Following the Performance Reports link provides access to all performance reports for all devices.
67. Figure 90: Zenoss Windows services

Even more IP services come configured out of the box. There are two subclasses of IP services, Privileged and Registered; either can monitor TCP or UDP ports.

[Screenshot residue: table of Privileged IP services listing name, port and description, for example domain (53, Domain Name Server), echo (7, Echo) and doom (666), each with False in the final column.]

Figure 91: Zenoss Privileged IP services

Again, note the Count column. Clicking on the service name shows where the service has been detected.
68. 106 Zenoss Event classes and subclasses

To modify the context of any event, select the event and use the zProperties tab.

Figure 107: Zenoss zProperties for the event class Event Status OSProcess

Events are mapped to Event Classes by Event Class instances. Event Class instances are looked up by a non-unique key called EventClassKey. When an event arrives it is:
• Parsed
• Assigned to the appropriate class and class key
• Context is then applied:
    o Event context is defined in the zProperties of an event class
    o After the event context has been applied, the device context is applied, whereby the ProductionState, Location, DeviceClass, DeviceGroups and Systems are all attached to the event in the event database
• Once these properties have been associated with the event, Zenoss attempts to update the zEventProperties. This allows a particular device or class of devices to override the default values for any given event.

To change the event mapping, select the event class and use the Mappings tab.

Figure 108: Zenoss Event
69. [Screenshot residue: Nagios event log showing HOST ALERT entries for the group-100 hosts (a1, c1, c2, c3, r1, r2, r3, s1) in DOWN or UNREACHABLE state, mostly SOFT attempts 1 to 3 with "CRITICAL Host Unreachable", plus one HARD entry for group-100-a1 with "CRITICAL No response from remote host".]
70. 2002 significantly improved performance and security issues. Much more complex.

Of the Open Source management solutions available, some are excellent point solutions for specific niche requirements. MRTG (Multi Router Traffic Grapher), written by Tobi Oetiker, is an excellent example of a compact application that uses SNMP to collect and log performance information and display it graphically. If that satisfies your requirement, don't look any further; but it will not help you with defining and collecting problems from different devices and then managing those problems through to resolution.

An enhancement of MRTG is RRDTool (Round Robin Database Tool), again from Tobi Oetiker. It is still fundamentally a performance tool, gathering periodic numeric data and displaying it, but RRDTool has a database at its heart. The size of the database is predetermined on creation, and newer data overwrites old data after a predetermined interval. RRD can be found embedded in a number of other Open Source management offerings (Cacti, Zenoss, OpenNMS).

A further enhancement from RRDTool is Cacti, which provides a complete frontend to RRDTool. A backend MySQL relational database can be used behind the Round Robin databases; data sources can be pretty well any script, in addition to SNMP; and there is user management included. This is still a performance data collection and display package, not a multi-discipline, framework, systems management solution.

Moving up
71. [Screenshot residue: OpenNMS node page for group-100-r1.class.example.org, a Cisco 7200 router running IOS Version 12.0, showing its SNMP attributes (Object ID .1.3.6.1.4.1.9.1.108, Location "Virtual comms rack 100", Contact Andrew Findlay) and its interface list.]

Figure 33: OpenNMS node detail for group-100-r1

Note the services that have been discovered for the node. The services listed per interface are those that have actually been detected; whether they are Monitored or not will be discussed in the next section.

7.2 Availability monitoring

OpenNMS performs availability monitoring by polling devices with processes known as monitors, which connect to a device and perform a simple test. Polling only happens to an interface that has already been discovered by capsd. The configuration file for polling is /opt/opennms/etc/poller-configuration.xml. There are many similarities between this and capsd-configuration.xml; however, the monitors are defined with <monitor service> stanzas rather than <protocol-plugin> stanzas, which define the Java class to use for monitoring.
72. 9.2.2 OpenNMS "goodies" and "baddies"

Good points:
• Good OOTB functionality
• Code feels solid
• Clean, standard configuration through well-organised xml files
• Single database (PostgreSQL)
• LOTS of trap customisation OOTB
• Ability to do some configuration through web Admin menu
• Easy import of TRAP MIBs (mib2opennms)
• Chargeable support available from The OpenNMS Group
• Supports Nagios plugins
• Some good Howto documents for basic configuration on the wiki

Bad points:
• Written in Java; log files hopeless
• Difficult to get individual daemon status
• No map (that works reasonably)
• GUI is wordy; difficult for the eye to focus on the important things
• Need to bounce entire OpenNMS when almost any config file is changed
• Event / alarm / notification architecture is currently a mess (under review)
• No way to change colours of events
• No MIB compiler or browser
• No pdf documentation; Wiki hard to find detailed information
• Lots of things undocumented when you get down to details

9.2.3 Zenoss "goodies" and "baddies"

Good points:
• Good OOTB functionality
• Architecture good, based around object-oriented CMDB database
• Topology map (up to 4 hops)
• Lots of plugins & zenPacks available

Bad points:
• No correlation between service events and host events
• Implementation feels buggy
• No MI
73. [Screenshot residue: Nagios service state information page for a DNS service on bino.skills-1st.co.uk. Current status OK for 27d 6h 55m 13s; status information "DNS OK: 0.013 seconds response time, www.skills-1st.co.uk returns 212.74.28.155"; check type ACTIVE; last check 04-08-2008 17:07:49; next scheduled check 04-08-2008 17:12:49. The Service Commands panel offers options to disable active checks, re-schedule the next check, submit a passive check result, disable notifications, schedule downtime, and disable the event handler or flap detection for the service.]
74. Figure 7: Nagios Tactical Overview screen

6.1 Configuration - Discovery and topology

Nagios uses a number of files to configure discovery; out of the box it will find nothing. Samples are available by default in /usr/local/nagios/etc. The main configuration file is nagios.cfg, which defines a large number of parameters, most of which you can leave alone at the outset. Typically the main things to discover are hosts and services. These are defined in an object-oriented way, such that you can define host and service top-level classes with particular characteristics and then define sub-classes and hosts that inherit from their parent classes. Rather than having a single huge nagios.cfg, it can reference other files (typically in the objects subdirectory) where definitions for hosts, services and other object types can be kept. So, for example, /usr/local/nagios/etc/nagios.cfg may contain lines such as:

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg

Definitions of hosts are built up in a hierarchical manner, so the top-level definitions may look like the following screenshot. Note the "use" stanza to denote inheritance of characteristics from a previous definition.
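The inheritance described above can be pictured with a small Python sketch. It is not Nagios code: the dictionary layout, the host entry "bino" with its address, and the function name resolve are all assumptions made for illustration, and it only follows a single parent per "use" directive. It also assumes, as I understand the Nagios documentation to say, that the register directive is not passed on to children.

```python
from typing import Dict

# Hypothetical in-memory form of two host templates and one real host.
# The directive names follow the Nagios object syntax; the values and the
# "bino" address are made up for this example.
DEFINITIONS: Dict[str, Dict[str, str]] = {
    "generic-host": {"notifications_enabled": "1", "register": "0"},
    "linux-server": {
        "use": "generic-host",
        "check_interval": "5",
        "max_check_attempts": "10",
        "register": "0",
    },
    "bino": {"use": "linux-server", "address": "10.0.0.1"},
}


def resolve(name: str, definitions: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the effective directives for `name`, following the `use` chain."""
    own = dict(definitions[name])
    parent = own.pop("use", None)
    if parent is None:
        return own
    merged = resolve(parent, definitions)
    # Assumption: "register" describes the template itself and is not inherited.
    merged.pop("register", None)
    # Locally defined directives override anything inherited from the parent.
    merged.update(own)
    return merged


if __name__ == "__main__":
    for key, value in sorted(resolve("bino", DEFINITIONS).items()):
        print(f"{key:25} {value}")
```

The point of the sketch is the order of precedence: the most specific definition wins, and everything else is picked up from the templates further up the chain.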
75. B browser
Good points (continued): email notifications include URL links back to Zenoss; commercial version available; good Quick Start manual, Administrator's manual and book; supports Nagios & Cacti plugins.
Bad points (continued): no way to change colours of events; commercial version available; lots of things undocumented when you get down to details.

9.3 Conclusions

What to choose? Back to your requirements!

For smallish systems management environments, Nagios is well tested and reliable, with a huge community behind it. For anything more than simple ping checks plus SNMP checks, bear in mind that you may need a way to install remote plugins on target hosts. Notifications are fairly easy to set up, but if you need to produce analysis on your event log then Nagios may not be the best choice.

OpenNMS and Zenoss are both extremely competent products covering automatic discovery, availability monitoring, problem management, and performance management and reporting. Zenoss has some topology mapping and has better documentation, but the code feels less reliable. OpenNMS currently has a rather messy architecture around events, alarms and notifications, though this is said to be under review. I also struggle to believe that you have to recycle the whole of OpenNMS if you have changed a configuration file. The code feels very stable, though.

My choice, hoping fervently that code reliability and documentation improve, is Zenoss.

10 Referen
76. [Screenshot: OpenNMS availability report with the panels "Daily Availability, Month To Date" and "Last Months Top Offenders", each listing percentage availability.]
Figure 36: OpenNMS Overall service availability report

Note that there is an /opt/opennms/etc/examples directory with extra samples of all the OpenNMS configuration files. Also note that OpenNMS needs recycling if any configuration files have been modified. Use:

/etc/init.d/opennms stop
/etc/init.d/opennms start

7.3 Problem management

For problem management, OpenNMS has the concepts of:
• Events all sor
77. [Screenshot: Zenoss interface details for eth1 (MAC address 00:11:25:80:1C:4F, status Up, speed 100.000Mbps, SNMP index 3, monitored), with a throughput graph of ifInOctets / ifOutOctets and a packets-per-second graph of ifInUcastPackets / ifOutUcastPackets.]
Figure 124: Zenoss Performance graphs for eth1 interface on bino

You can change the range of data with the Hourly dropdown, to daily, weekly, monthly or yearly. Data can be scrolled using the < > bars at either side, and the magnifiers can be used to zoom in and out. By default, all graphs on the page are linked, so that if you change the range on one, it changes for all. They can be de-coupled with the Link Graphs check box.

Here is a partial screenshot of the graphs for bino, under the Perf tab.

[Screenshot: Zenoss Perf tab for bino, showing a weekly Load Average graph with the Link graphs check box.]
78. [Screenshot: Konsole session showing the top of the Nagios hosts.cfg file. A comment block notes that the host templates defined here are not real hosts. The generic-host template enables host notifications, the host event handler, flap detection, failure prediction and performance data processing, and retains status and non-status information across program restarts. The linux-server template has "use generic-host" plus check_period 24x7, check_interval 5, retry_interval 1, max_check_attempts 10, check_command check-host-alive, notification_period workhours, notification_interval 120, notification_options d,u,r, contact_groups admins and register 0.]
Figure 8: Nagios hosts.cfg top level definitions
79. [Screenshot: Zenoss list of performance templates with the device class each is defined on and a description, including FileSystem, HardDisk, IpService, OSProcess, WinService, ethernetCsmacd (standard ethernet interface template with 75% utilization threshold) and ethernetCsmacd_64 (template for 64-bit interface counters; must use SNMP v2c for it to work).]
Figure 116: Zenoss All Templates showing all defined performance templates

With the exception of the templates with HRMIB in the name, the above figure shows the default templates as shipped. Note that these are defined templates; there is no indication here as to which are active on what objects. Note in the screenshot above that there are several templates called Device. Templates can be bound to a device or device class to make them active. When determining what data to collect, the zenperfsnmp (or zencommand) daemon first determine
80. [Screenshot: OpenNMS node page, with the SSH service shown as Not Monitored.]
Figure 57: OpenNMS Asset Info link for a device

The resulting page includes the Node ID at the top.

[Screenshot: OpenNMS Modify Asset page for bino.skills-1st.co.uk, Node ID 4. General Information shows System Id .1.3.6.1.4.1.8072.3.2.10, System Name bino, System Location Cedar Chase, System Contact Jane Curry and the System Description (Linux bino 2.6.22.17-0.1-default). Below are the Configuration Categories, Identification and Location fields.]
81. [Screenshot: tail of /tmp/service-perfdata, with SERVICEPERFDATA records for load average, PING, Total Processes and DNS Check results, for example a DNS Check on bino with performance data time=0.013111s.]
Figure 28: Nagios Performance data collected into /tmp/service-perfdata

The most recent performance data gathered for hosts and services can also be seen from the Host Detail or Service Detail menu options.

[Screenshot: Nagios 3.0.1 Service Information page for service DNS Check on host bino.skills-1st.co.uk.]
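To make the structure of those performance data records easier to see, here is a minimal Python sketch that parses one. It assumes the records are tab-separated in the order of the sample template shipped with Nagios (tag, timestamp, host, service, execution time, latency, plugin output, performance data) and that the performance data follows the usual plugin convention of label=value[unit];warn;crit;min;max. The function names parse_perfdata and parse_record are hypothetical, and quoted labels containing spaces are not handled.

```python
import re
from typing import Dict, List

# A numeric value optionally followed by a unit such as "s", "ms" or "%".
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_perfdata(perfdata: str) -> List[Dict[str, object]]:
    """Split plugin performance data into one dictionary per metric."""
    metrics = []
    for token in perfdata.split():
        if "=" not in token:
            continue
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        match = _VALUE.match(fields[0])
        if match is None:
            continue
        metrics.append({
            "label": label.strip("'"),
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": fields[1] if len(fields) > 1 and fields[1] else None,
            "crit": fields[2] if len(fields) > 2 and fields[2] else None,
        })
    return metrics


def parse_record(line: str) -> Dict[str, object]:
    """Parse one tab-separated record (assumed field order, see lead-in)."""
    parts = line.rstrip("\n").split("\t")
    tag, timestamp, host, service, exec_time, latency, output = parts[:7]
    perfdata = parts[7] if len(parts) > 7 else ""
    return {
        "tag": tag,
        "timestamp": int(timestamp),
        "host": host,
        "service": service,
        "execution_time": float(exec_time),
        "latency": float(latency),
        "output": output,
        "metrics": parse_perfdata(perfdata),
    }
```

If your nagios.cfg uses a different perfdata file template, the field order in parse_record would need to change to match it.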
82. [Screenshot: Cacti daily (5 minute average) and weekly (30 minute average) Memory Usage graphs for bino.skills-1st.co.uk, plotting Memory Free, Memory Buffers and Cache Memory with current, average and maximum values.]
Figure 3: Cacti graph of memory for device bino

5.2 netdisco

netdisco was created at the University of California, Santa Cruz (UCSC), Networking and Technology Services (NTS) department. It is interesting as a network management configuration offering. It uses SNMP and Cisco Discovery Protocol (CDP) to try to automatically discover devices. Unlike most other management offerings, netdisco is Layer 2 (switch) aware and can both display switch ports and optionally provide access to control switch ports. It provides an inventory of devices that you can sort either by OS or by device model, displaying all ports for a device. It also has the ability to provide a network map. User management is included, so you can restrict who is allowed to actively man
83. [Screenshot: end of the zProperties list, with boolean MonitorIgnore properties such as zXmlRpcMonitorIgnore, and the Save and Delete Local Property buttons.]
Figure 84: Zenoss zProperties for the Device class (part 3)

The left-hand menus of the web console provide an Add Device option; nothing is discovered automatically out of the box.

[Screenshot: Zenoss Add Device page, with fields including Device Name, Device Class Path, Discovery Protocol, Snmp Port (161), Serial Number, Priority, HW Manufacturer, HW Product, OS Manufacturer and Collector.]
Figure 85: Zenoss Add Devices dialogue

Once a device has been discovered (which, by default, uses ping), if the discovery protocol is set to SNMP then the device will be queried for its SNMP routing table. Any networks that the device has routes to will then be added to the object class of networks.

[Screenshot: Zenoss Networks page.]
84. NMS Zenoss
Supports NRPE / NSClient: Nagios Yes; OpenNMS Yes; Zenoss Possible
SNMP support: Nagios V1, 2 & 3; OpenNMS V1, 2 & 3; Zenoss V1, 2 & 3
L3 topology map: Nagios Yes; OpenNMS No; Zenoss Yes, up to 4 hops
L2 topology map: Nagios No; OpenNMS No; Zenoss No, but may be in plan

9.1.2 Availability monitoring
Ping status monitoring: Nagios Yes; OpenNMS Yes; Zenoss Yes
Alternatives to ping status: Nagios Yes, any plugin, eg check_ifstatus; OpenNMS Nagios plugins; Zenoss Yes, ssh, telnet, ZenPacks, Nagios plugins
Port sniffing: Nagios Yes; OpenNMS Yes; Zenoss Yes
Process monitoring: Nagios Yes, with plugins; OpenNMS Nagios plugins; Zenoss Yes, Host Resources MIB
Agent technology: Nagios generally relies on Nagios plugins deployed; OpenNMS SNMP out of the box, customised plugins possible; Zenoss SNMP, ssh client, WMI for Windows, ZenPacks to be deployed
Availability reports: Nagios Yes; OpenNMS Yes; Zenoss Yes

9.1.3 Problem management
Configurable event console: Nagios No; OpenNMS Yes; Zenoss Yes
Severity customisation: Nagios Yes; OpenNMS Yes; Zenoss Yes
Event configuration: Nagios No; OpenNMS Flexible, lots OOTB; Zenoss Flexible, lots OOTB
SNMP TRAP handling: Nagios No; OpenNMS Flexible, lots OOTB; Zenoss Flexible, lots OOTB
email / pager notifications: Nagios Yes; OpenNMS Yes, with configurable escalation; Zenoss Yes
Automation: OpenNMS auto-actions on events, good news / bad news correlation on alarms and notifications; Zenoss auto-actions on events, good news / bad news correlation on events and notifications
De-duplication: Nagios No; automatic repeat count
85. <mibObj oid=".1.3.6.1.2.1.5.16" instance="0" alias="icmpOutDestUnreachs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.17" instance="0" alias="icmpOutTimeExcds" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.19" instance="0" alias="icmpOutSrcQuenchs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.20" instance="0" alias="icmpOutRedirects" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.21" instance="0" alias="icmpOutEchos" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.22" instance="0" alias="icmpOutEchoReps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.1" instance="0" alias="icmpInMsgs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.5" instance="0" alias="icmpInParmProbs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.9" instance="0" alias="icmpInEchoReps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.10" instance="0" alias="icmpInTimestamps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.11" instance="0" alias="icmpInTimestampReps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.12" instance="0" alias="icmpInAddrMasks" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.13" instance="0" alias="icmpInAddrMaskReps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.14" instance="0" alias="icmpOutMsgs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.18" instance="0" alias="icmpOutParmProbs" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.23" instance="0" alias="icmpOutTimestamps" type="counter" />
<mibObj oid=".1.3.6.1.2.1.5.24" instance="0" alias="icmpOutTimestmpRe
86. Open Source Management Options
September 30th, 2008
Jane Curry, Skills 1st Ltd
www.skills-1st.co.uk

Jane Curry, Skills 1st Ltd, 2 Cedar Chase, Taplow, Maidenhead, SL6 0EU
01628 782565
jane.curry@skills-1st.co.uk

Synopsis

Nuts and bolts network and systems management is currently unfashionable. The emphasis is far more on processes that implement service management, driven by methodologies and best practices such as the Information Technology Infrastructure Library (ITIL). Nonetheless, all service management disciplines ultimately rely on a way to determine some of the following characteristics of systems and networks:
• Configuration management
• Availability management
• Problem management
• Performance management
• Change management
• Security management

The commercial marketplace for systems and network management offerings tends to be dominated by the big four: IBM, HP, CA and BMC. Each has large, modular offerings which tend to be very expensive. Each has grown its portfolio by buying up other companies and then performing some level of integration between their respective branded products. One can argue that the resulting offerings tend to be "marketechtures" rather than architectures.

This paper looks at Open Source software that addresses the same requirements. Offerings from Netdisco, Cacti and The Dude are examined briefly, followed by an in-depth analysis of Nagios, OpenNMS and Zenoss. This paper is aimed
87. TTP, as well as having support for Nagios plugins. All the products have some user management to define users, passwords and roles, with customisation of what a user sees. OpenNMS and Zenoss use RRD Tool to hold and display performance data. Nagios doesn't really have a performance data capability; Cacti might be a good companion product. Most surprisingly, given that they all rely on SNMP, none of the products has an SNMP MIB Browser built in to assist with selecting MIBs for both status monitoring and performance data collection.

There are advocates for and against "agentless" monitoring. Personally, I don't believe in agentless. Once you have got past ping, you have to have some form of agent to do monitoring. The question is: should a management paradigm use an agent that is typically part of a box build (like ssh, SNMP, or WMI for Windows), or should the management solution provide its own agent, as Nagios provides NRPE and as most of the commercial management products come with their own agents? If your management system wants its own agents, you then have the huge problem of how you deploy them, check they are running, upgrade them, and so on.

OpenNMS and Zenoss have a strong dependency on SNMP, although Zenoss also supports ssh and telnet monitoring out of the box, if your environment permits these. SNMP may be old and Simple, but all three products support SNMP V3 for those who are worried about the se
88. Windows services is configured through the Services menu.

[Screenshot: Zenoss Services page with the IpService and WinService sub-folders.]
Figure 89: Zenoss Services menu

A very large number of Windows services are preconfigured out of the box. These services are actually monitored by the zenwin daemon, which uses (and requires) WMI on the Windows target machine. Note the Count column showing on how many devices these services have been detected.

[Screenshot: Zenoss WinService list with name, description, monitor flag and count columns, for services such as ALG (Application Layer Gateway Service), Alerter, BITS (Background Intelligent Transfer Service), Browser (Computer Browser), DHCPServer, DNS, Dhcp and Dnscache.]
89. Zenoss topology maps
8.2 Availability monitoring
8.2.1 Basic reachability availability
8.2.2 Availability monitoring of services - TCP / UDP ports and windows services
8.2.3 Process availability monitoring
8.2.4 Running commands on devices
8.3 Problem management
8.3.1 Event console
8.3.2 Internally generated events
8.3.3 SNMP TRAP reception and configuration
8.3.4 email / pager alerting
8.3.5 Event automations
8.4 Performance management
8.4.1 Defining data collection, thresholding and graphs
8.4.2 Displaying performance data graphs
8.5 Zenoss summary
9 Comparison of Nagios, OpenNMS and Zenoss
9.1 Feature comparisons
9.1.1 Discovery
90. a Each notification path can be triggered by any number of OpenNMS events and can further be associated with specific interfaces or services.

When OpenNMS was first started, the nodes, interfaces and services in the network were discovered. As your network grows and changes, the TCP/IP ranges you want to manage, as well as the interfaces and services within those ranges, may change. Manage and Unmanage Interfaces and Services allows you to change your OpenNMS configuration along with your network.

Manage SNMP Data Collection per Interface: This interface will allow you to configure which non-IP interfaces are used in SNMP Data Collection.

Configure SNMP Community Names by IP: This interface will allow you to configure the Community String used in SNMP Data Collection.

Add Interface is an interface to add an interface to the database. If the IP address of the interface is contained in the ipAddrTable of an existing node, the interface will be added into the node. Otherwise, a new node will be created.

Delete Nodes is an interface to permanently delete nodes from the database.

Import and Export Asset Information provides an easy-to-use interface for adding data to OpenNMS's asset inventory from your database or spreadsheet application, as well as extracting data from the asset inventory for use in your favorite spreadsheet or database. Our comma-delimited file format is supported by most spreadsheet and database applications, and details for usi
91. [Screenshot: list of event fields available for use in the Zenoss alerting rule message format.]
Figure 112: Zenoss Alerting rule message format

Global parameters for email and paging, along with other useful parameters, can be defined from the Settings left-hand menu.

[Screenshot: Zenoss Settings page, with fields for SMTP Host, SMTP Port (usually 25), SMTP Username, SMTP Password, From Address for Emails, Use TLS, Page Command, Dashboard Production State Threshold, Dashboard Priority Threshold, State Conversions, Priority Conversions, Administrative Roles and Google Maps API Key.]
Figure 113: Zenoss Settings parameters

The out-of-the-box email notifications provide handy links back to Zenoss to manipulate the event that is being reported on.

[Screenshot: email notification with subject "[zenoss] hp7410.skills-1st.co.uk ip 10.0.0.97 is down", listing Device hp7410.skills-1st.co.uk, Severity Critical and the event time.]
92. age devices. There is good provision of both command line interface and web-based GUI. netdisco is supported on various platforms; it was originally developed on FreeBSD. I built it on a Centos 4 platform.

If your requirement is strictly for network configuration management and your devices respond suitably to netdisco, then this might be worth a try. I found it very quirky as to what it would discover. It appears very dependent on the SNMP system sysServices variable to decide whether a device supports network layer 2 and 3 protocols; if a device did not provide sysServices or didn't indicate layer 2 / 3, then netdisco would not discover it. I also had very few devices supporting Cisco CDP, so the automatic discovery didn't work well for me. Although there is a file where you can manually describe the topology, this would be a huge job in a sizeable network if you had to hand-craft a significant amount of the network topology.

This project is not nearly so active as some of the other offerings discussed here (around 500 appends to the users maillist in 2007), but there seems to be a steady flow. Building the system was a fair marathon, but the documentation is reasonably good.

Here are some screenshots of the main device inventory panel, plus the details of a router and the details of a switch.

[Screenshot: netdisco Device Inventory page in Firefox.]
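For readers unfamiliar with sysServices, the following Python sketch shows how the value encodes protocol layers. MIB-II defines it as a sum in which each layer L from 1 to 7 that the node offers contributes 2 to the power (L - 1). The function names are hypothetical, and the layer 2 or 3 test is only my reading of the behaviour described above, not netdisco's actual code.

```python
from typing import Optional, Set


def layers_from_sys_services(sys_services: int) -> Set[int]:
    """Decode a MIB-II sysServices value into the set of layers it advertises."""
    return {layer for layer in range(1, 8) if sys_services & (1 << (layer - 1))}


def looks_like_network_device(sys_services: Optional[int]) -> bool:
    """Assumed rule: a device qualifies only if it advertises layer 2 or layer 3.

    A device that does not answer for sysServices at all (None) is rejected,
    which matches the quirk described in the text.
    """
    if sys_services is None:
        return False
    layers = layers_from_sys_services(sys_services)
    return 2 in layers or 3 in layers


if __name__ == "__main__":
    for value in (72, 4, 6, 78, None):
        decoded = sorted(layers_from_sys_services(value)) if value is not None else None
        print(value, decoded, looks_like_network_device(value))
```

A typical host reports 72 (layers 4 and 7), which is why servers with a default SNMP agent configuration would fail a layer 2 or 3 test of this kind.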
93. [Screenshot: Zenoss template page listing a CPU Utilization threshold (type MinMaxThreshold, data point ssCpuRawIdle_ssCpuRawIdle, severity Warning, enabled) and the graph definitions Load Average, Load Average 5 min, CPU Utilization and CPU Idle with their graph points.]
Figure 119: Zenoss Device template for /Devices/Server

Zenoss provides two built-in types of Data Sources, SNMP and COMMAND. Other types can be provided through ZenPacks. Clicking on the Data Source displays details which can then be modified. Typically an SNMP Data Source will provide a single Data Point (a MIB OID value). Typically the name of the data point will be the same as the name of the data source. This means that when you come to select threshold values or values to graph, you will be selecting names like ssCpuRawWait_ssCpuRawWait.

[Screenshot: Zenoss Data Source page for memAvailReal, source type SNMP, enabled, OID 1.3.6.1.4.1.2021.4.6.0.]
Figure 120: Zenoss Data Source memAvailReal

Note that there is a useful Test button to check your OID against a node that Zenoss knows about. However, beware that this Test button appears to use snmpwalk under the covers, so if a MIB OID has multiple instances then the snmpwalk will return values succe
94. [Screenshot: end of collectd-configuration.xml, including the collector entry for service SNMP with class name org.opennms.netmgt.collectd.SnmpCollector.]
Figure 71: OpenNMS Modified collectd-configuration.xml to enable thresholds

threshd-configuration.xml can be modified with different packages of thresholding to apply to different ranges of nodes.

[Screenshot: threshd-configuration.xml with threads set to 5 and two packages. Package CC includes the ranges 10.0.0.0 to 10.0.0.254 and 172.16.0.0 to 172.16.254.254, and runs the SNMP service at an interval of 300000 with its own thresholding group. Package raddle includes 10.191.0.0 to 10.191.101.254 and 172.30.0.0 to 172.31.254.254, excludes 172.31.100.3, and runs the SNMP service at an interval of 600000 with its own thresholding group. The thresholder for service SNMP uses class org.opennms.netmgt.threshd.SnmpThresholder.]
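The effect of the include and exclude ranges in those two packages can be illustrated with a short Python sketch. It is not how OpenNMS itself evaluates packages (the filter expression, for instance, is ignored here); the PACKAGES structure and the function packages_for are hypothetical, with the address ranges copied from the configuration shown above.

```python
import ipaddress
from typing import Dict, List, Tuple

Range = Tuple[str, str]

# Ranges taken from the two packages in the threshd configuration above.
PACKAGES: Dict[str, Dict[str, List[Range]]] = {
    "CC": {
        "include": [("10.0.0.0", "10.0.0.254"), ("172.16.0.0", "172.16.254.254")],
        "exclude": [],
    },
    "raddle": {
        "include": [("10.191.0.0", "10.191.101.254"), ("172.30.0.0", "172.31.254.254")],
        "exclude": [("172.31.100.3", "172.31.100.3")],
    },
}


def _in_range(addr: ipaddress.IPv4Address, begin: str, end: str) -> bool:
    """True if addr lies between begin and end, inclusive."""
    return ipaddress.IPv4Address(begin) <= addr <= ipaddress.IPv4Address(end)


def packages_for(ip: str, packages: Dict[str, Dict[str, List[Range]]] = PACKAGES) -> List[str]:
    """Return the names of the packages whose ranges cover the given address."""
    addr = ipaddress.IPv4Address(ip)
    matched = []
    for name, spec in packages.items():
        included = any(_in_range(addr, b, e) for b, e in spec["include"])
        excluded = any(_in_range(addr, b, e) for b, e in spec["exclude"])
        if included and not excluded:
            matched.append(name)
    return matched
```

With these ranges, 172.31.100.3 is carved out of the raddle package even though it sits inside one of its include ranges, so it would not be thresholded by either package.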
95. and then by time (most recent first). Events are assigned different severities:
• Critical (Red)
• Error (Orange)
• Warning (Yellow)
• Info (Blue)
• Debug (Grey)
• Clear (Green)

The events system has the concept of active status events and historical events (two different database tables in the MySQL events database). Events in the console can be filtered by Severity (Info and above, by default) and by State (New, Acknowledged and Suppressed, where New and Acknowledged are shown by default). Any event which has been Acknowledged changes to a wishy-washy version of the appropriate colour. There is also a Search box at the top right for filtering events.

[Screenshot: Zenoss Event Console listing events with device, component, eventClass, summary, firstTime, lastTime and count columns, for example /Status/Ping events reporting that an IP address is down.]
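The default console filtering described above can be sketched in a few lines of Python. This is an illustration only: the numeric severity values (0 for Clear up to 5 for Critical) are the ones I believe Zenoss uses but should be treated as an assumption, as should the sort by severity ahead of time, and the event dictionaries and the console_view function are invented for the example.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, List


class Severity(IntEnum):
    # Assumed numeric values; the names match the severities listed above.
    CLEAR = 0
    DEBUG = 1
    INFO = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5


# States shown by default; Suppressed events are hidden unless selected.
DEFAULT_STATES: FrozenSet[str] = frozenset({"New", "Acknowledged"})


def console_view(events: List[Dict],
                 min_severity: Severity = Severity.INFO,
                 states: FrozenSet[str] = DEFAULT_STATES) -> List[Dict]:
    """Filter by severity and state, then sort by severity and newest first."""
    visible = [
        e for e in events
        if e["severity"] >= min_severity and e["state"] in states
    ]
    return sorted(visible, key=lambda e: (-int(e["severity"]), -e["last_time"]))
```

Calling console_view with min_severity=Severity.DEBUG or with "Suppressed" added to the states corresponds to changing the Severity and State dropdowns in the console.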
96. [Screenshot: end of an event definition in eventconf.xml, with alarm data whose reduction key is built from the uei, dpname, nodeid, interface and service fields, alarm type 1 and auto-clean false.]
Figure 42: OpenNMS event definition for nodeLostService

The different severities available can be seen by selecting the Severity Legend option from the top of an events list.

Critical: This event means numerous devices on the network are affected by the event. Everyone who can should stop what they are doing and focus on fixing the problem.
Major: A device is completely down or in danger of going down. Attention needs to be paid to this problem immediately.
Minor: A part of a device (a service, an interface, a power supply, etc.) has stopped functioning. The device needs attention.
Warning: An event has occurred that may require action. This severity can also be used to indicate a condition that should be noted (logged) but does not require direct action.
Indeterminate: No Severity could be associated with this event.
Normal: Informational message. No action required.
Cleared: This event indicates that a prior error condition has been corrected and service is restored.
Figure 43: OpenNMS event severity legend

Note that there is no separate file to configure alarms; it is simply done with the alarm-type tag in eventconf.xml. OpenNMS comes with a huge number of events pre-defined. To make eventconf.xml much more manageable, "inclusion" files can be specified at the end, such as event fi
97. at two audiences. For a discussion on systems management selection processes and an overview of three main open source contenders, read the first few chapters. The last few chapters then provide a product comparison. For those who want lots more detail on Nagios, OpenNMS and Zenoss, the middle sections provide in-depth discussions with plenty of screenshots.

Table of Contents
1 Defining "Systems Management"
1.1 Jargon and processes
1.2 "Systems Management" for this paper
2 Systems management tools
2.1 Choosing systems management tools
2.2 The advantages of Open Source
3 Open Source management offerings
4 Criteria for Open Source management tool selection
4.1 General requirements
4.1.1 Mandatory Requirements
4.1.2 Desirable Requirements
4.2 Defining network and systems management
4.2.1 Network
98. ate the offerings available, research their features and requirements, and participate in the online fora that share experience around the globe. These costs may not be small, but at least the investment stays within the company and, hopefully, those people who have done the research will then be a key part of the team implementing the solution. This is often not the case if you purchase from a commercial supplier.

Open Source does not necessarily mean "you're on your own, pal". Most of the Linux distributions have a free version and a supported version, where a support contract is available to suit your organisation and budget. Several of the Open Source management offerings have a similar model, but do ensure that the free version has sufficient features for your requirements and is not just a well-featured demo.

All software has bugs in it. Ultimately, if you go Open Source, you have the source code, so you have some chance of fixing problems with local staff or buying in global expertise, and that doesn't necessarily mean transporting a guru from Australia to Paris. Open Source code is available to everyone, so remote support and consultancy is a distinct possibility. With the best will in the world, commercial organisations will prioritise problem reports according to their criteria, not yours.

There are some excellent fora and discussion lists for commercial products (I have participated in several of them for many years); some even
99. [Screenshot: Nagios Service Status Details, 30-07-2008. An SNMP check reports "SNMP OK". PING checks for the group-100 hosts (c1, c2, c3, r1, r2, r3, s1) all report "PING OK - Packet loss = 0%" with round trip averages between roughly 7 ms and 141 ms. One host is CRITICAL with "Host Unreachable". The local checks on nagios3 (Current Load, Users, PING, Root partition disk space) are all OK.]
100. be sent anytime
    service_notification_options    w,u,c,r,f,s    ; send notifications for all service states, flapping events, and scheduled downtime events
    host_notification_options       d,u,r,f,s      ; send notifications for all host states, flapping events, and scheduled downtime events
    service_notification_commands   notify-service-by-email    ; send service notifications via email
    host_notification_commands      notify-host-by-email       ; send host notifications via email
    register                        0              ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
    }

Figure 24: Nagios Default contact definition

Notifications for hosts and services can be sent 24x7. They are sent for all types of events and use a Nagios command that drives the email system. As with all other Nagios configurations, more specific users and groups of users can be defined which change any of these parameters. An event has to satisfy the global criteria, the specific host / service criteria, and the contact criteria before a notification is actually sent.

Remember, from the Alerts Histogram report it is possible to see notifications for a particular host.

[Screenshot: Nagios Host Notifications page for host group-100-r1.]
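The three-layer check described above (global criteria, host / service criteria, contact criteria) can be sketched roughly as follows. This is a simplified illustration, not Nagios code: the names `Event`, `CONTACT_OPTIONS` and `should_notify` are hypothetical, the option letters are based on the default contact template in Figure 24, and real Nagios applies further checks (time periods, for example) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str   # "host" or "service"
    state: str  # one-letter option code, e.g. "c" (critical) or "d" (down)


# Option letters based on the default contact template (Figure 24).
CONTACT_OPTIONS = {
    "service": set("wucrfs"),
    "host": set("durfs"),
}


def should_notify(event, notifications_enabled, object_options, contact_options):
    """Return True only when all three layers accept the event."""
    if not notifications_enabled:          # global criteria
        return False
    if event.state not in object_options:  # host / service criteria
        return False
    # contact criteria
    return event.state in contact_options.get(event.kind, set())


critical = Event(kind="service", state="c")
print(should_notify(critical, True, {"w", "c", "r"}, CONTACT_OPTIONS))  # True
print(should_notify(critical, True, {"w", "r"}, CONTACT_OPTIONS))       # False
```

The second call returns False because the (hypothetical) service definition does not list the critical state, even though the contact would accept it.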
101. 7.1.1 Interface discovery
7.1.2 Service discovery
7.1.3 Topology mapping and displays
7.2 Availability monitoring
7.3 Problem management
7.3.1 Event console
7.3.2 Internally generated events
7.3.3 SNMP TRAP reception and configuration
7.3.4 Alarms, notifications and automations
7.4 Performance management
7.4.1 Defining data collections
7.4.2 Displaying performance data
7.4.3 Thresholding
7.5 Managing OpenNMS
7.6 OpenNMS summary
8 Zenoss
8.1 Configuration - Discovery and topology
8.1.1 Zenoss discovery
8.1.2
102. Figure 78: Zenoss default dashboard

8.1 Configuration - Discovery and topology

There is a good Zenoss Quickstart document available from http://www.zenoss.com/community/docs. Similar to OpenNMS, the architecture is based on object-oriented techniques.

8.1.1 Zenoss discovery

zProperties can be defined for devices, services, processes, products and events. Objects can be grouped and sub-grouped, with zProperties being refined and changed throughout the hierarchy. So, for example, the Device object class has default subclasses for different device types, as shown below.

[Screenshot: Zenoss Devices page listing the default sub-device classes: Discovered, KVM, Network, Ping, Power, Printer, Server.]

Figure 79: Zenoss device classes

The class of Devices has a zProperties page, as do the classes Network, Server, Printer, etc. Devices will initially be added to the Discovered class and can then be moved to a more appropriate class.
103. ce data

4.2.2 Systems management

Many of the criteria for systems management are similar to the network management bullets above, but they are repeated here for convenience.

• Configuration
  o Automatic, controllable discovery of Windows and Unix devices
  o Topology display of discovered devices
  o Support for SNMP V1, V2 and preferably V3
  o Ability to discover devices that do not support ping
  o Ability to discover devices that do not support SNMP
  o Central, open database to store information for these devices
  o Ability to add to this information
• Availability monitoring
  o Customisable "ping" test for all discovered devices
  o Availability test for devices that do not respond to ping (eg. comparison of SNMP Interface administrative status with Interface operational status; support for ssh tests)
  o Ability to monitor customisable ports on a device (eg. tcp/80 for http servers)
  o Ideally, the ability to monitor applications (eg. ssh / snmp access to monitor for processes, wget to retrieve web pages)
  o Simple display of availability status of devices, preferably both tabular and graphical
  o Events raised when a device fails any availability test
  o Ability to monitor basic system metrics: CPU, memory, disk space, processes, services (eg. the SNMP Host Resources MIB)
• Problem
  o Events to be configurable for any discovered device
  o Central events console for network and systems managem
104. ces
1. itSMF "Pocket Guide: IT Service Management - a Companion to ITIL", IT Service Management Forum
2. Multi Router Traffic Grapher (MRTG) by Tobi Oetiker, http://oss.oetiker.ch/mrtg
3. RRDtool, high performance data logging and graphing system for time series data, http://oss.oetiker.ch/rrdtool
4. netdisco, network management application, http://www.netdisco.org
5. The Dude, network monitor by MikroTik, http://www.mikrotik.com/thedude.php
6. nagios, host, service and network monitoring program, http://www.nagios.org
7. Zenoss, network, systems and application monitoring, http://www.zenoss.com
8. OpenNMS, distributed network and systems management platform, http://www.opennms.org
9. cacti, network graphing solution, http://www.cacti.net
10. SNMP Requests For Comment (RFCs), http://www.ietf.org/rfc.html
11. V1 RFCs 1155, 1157, 1212, 1213, 1215
12. V2 RFCs 2578, 2579, 2580, 3416, 3417, 3418
13. V3 RFCs 2578-2580, 3416-18, 3411, 3412, 3413, 3414, 3415
14. SNMP Host Resources MIB, RFCs 1514 and 2790, http://www.ietf.org/rfc.html
15. PHP scripting language, http://www.php.net
16. "Zenoss Core Network and System Monitoring" by Michael Badger, published by PACKT Publishing, June 2008, ISBN 978-1-847194-28-2

11 Appendix A: Cacti installation details

Cacti 0.8.6j-64.4 was installed on an Open SuSE 10.3 Linux system. Prerequisites are:
• A web server (Apache 2.2.4-70)
• PHP 5.2.5-8.1
• RRDTool
105. cked to Acknowledge one or more events; they will then disappear from this display, which only shows Outstanding events. Click on the symbol beside "Event(s) outstanding" to see Event(s) Acknowledged, including the name of the user that acknowledged the event. The various [+] and [-] links can be used to filter in / out on the parameter, such as node, interface or service. The [<] and [>] beside the Time can be used to filter for events before or after this time. To see the event detail, click on the ID link.

[Screenshot: OpenNMS Event Detail for event 139192. Severity: Normal. Node: group-100-r2.class.example.org. UEI: uei.opennms.org/internal/capsd/rescanCompleted. Log Message: A services scan has been completed on this node. Description: A services scan has been completed. The list of services on this node has been updated. Operator Instructions: No instructions available.]

Figure 40: OpenNMS Event detail for event 139192

7.3.2 Internally generated events

Events, and indeed alarms, are configured in /opt/opennms/etc/eventconf.xml, where the first match for an event
106. command (also defined below). Read the HTML docs for more information on performance data.
    # Values: 1 = process performance data, 0 = do not process performance data
    process_performance_data=1

    # HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
    # These commands are run after every host and service check is
    # performed. These commands are executed only if the
    # enable_performance_data option (above) is set to 1. The command
    # argument is the short name of a command definition that you
    # define in your host configuration file. Read the HTML docs for
    # more information on performance data.
    # Don't use these - use data files option below - JC
    #host_perfdata_command=process-host-perfdata
    #service_perfdata_command=process-service-perfdata

    # HOST AND SERVICE PERFORMANCE DATA FILES
    # These files are used to store host and service performance data.
    # Performance data is only written to these files if the
    # enable_performance_data option (above) is set to 1.
    host_perfdata_file=/tmp/host-perfdata
    service_perfdata_file=/tmp/service-perfdata

    # HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
    # These options determine what data is written (and how) to the
    # performance data files. The templates may contain macros, special
    # characters (\t for tab, \r for carriage return, \n for newline)
    # and plain text. A newline is automatically added after each write
    # to the performance data file. Some examples of what you can do are
    # shown below.
    host pe
107. contenders for the post-processing. There are a number of global parameters that control the collection of performance data, typically in /usr/local/nagios/etc/nagios.cfg:

• process_performance_data: global on / off switch
• host_perfdata_command: Nagios command to be executed on data
• service_perfdata_command: Nagios command to be executed on data
• host_perfdata_file: datafile for asynchronous processing
• service_perfdata_file: datafile for asynchronous processing
  o Note: either use the command parameter for data processing when the data is retrieved, or use the data file for later processing
• host_perfdata_file_processing_interval: process data file every <n> seconds
• service_perfdata_file_processing_interval: process data file every <n> seconds
• host_perfdata_file_processing_command: Nagios command to process data
• service_perfdata_file_processing_command: Nagios command to process data
• host_perfdata_file_template: format of data file
• service_perfdata_file_template: format of data file

    # PROCESS PERFORMANCE DATA OPTION
    # This determines whether or not Nagios will process performance
    # data returned from service and host checks. If this option is
    # enabled, host performance data will be processed using the
    # host_perfdata_command (defined below) and service performance
    # data will be processed using the service_perfdata
108. counting, Performance and Security, through to the Information Technology Infrastructure Library (ITIL), which divides the ITIL V2 framework into two categories:

• Service Support, which includes the:
  o Service Desk function
  o Incident management process
  o Problem management process
  o Configuration management process
  o Change management process
  o Release management process
• Service Delivery, which includes the:
  o Service Level management process
  o Capacity management process
  o IT Service Continuity management process
  o Availability management process
  o Financial management for IT services

Key to the core of configuration management and the entire ITIL framework is the concept of the Configuration Management Database (CMDB), which stores and maintains Configuration Items (CIs) and their inter-relationships.

The art of systems management is defining what is important: what is in scope and, perhaps more importantly, what is currently out of scope. The science of systems management is then to effectively, accurately and reliably provide data to deliver your systems management requirements. The devil really is in the detail here. A comprehensive systems management tool that delivers a thousand metrics out of the box, but which is unreliable and / or not easily configurable, is simply a recipe for a project that is delivered late and over budget.

For smaller projects or Small / Medium Business (SMB) organisations, a pragmatic approach is often h
109. curity of SNMP and virtually everything has an SNMP agent available. The other form of agentless monitoring basically comes down to port sniffing for services. Whilst this can work fine for smaller installations, the n-squared nature of lots of devices and lots of services doesn't scale too well. All three products do port sniffing, so it comes down to how easy it is to configure economic monitoring.

9.1 Feature comparisons

The following tables start with my requirements definition and compare the three products on a feature-by-feature basis. (OOTB = Out Of The Box.)

9.1.1 Discovery

Feature | Nagios | OpenNMS | Zenoss
Node discovery | Config file for each node | Config file with include / exclude ranges | GUI, CLI and batch import from text or XML file
Automatic discovery | No | Yes, nodes within configured n/w ranges | Yes, networks & nodes
Interface discovery | Possible through config file | Yes, including switch ports | Yes, including switch ports
Discover nodes that don't support ping | Yes, use check_ifstatus plugin | Yes, send_event.pl | Yes, use SNMP, ssh or telnet
SQL Database | No | PostgreSQL | mySQL & Zope ZEO
Service (port) discovery | Yes, use plugin (TCP, UDP) | Yes, various out of the box | Yes, TCP and UDP
Application discovery | Yes, define service | Not without extra agent (eg. NRPE) | Yes, with ssh, zenPacks or plugins

Nagios Open
110. defines its characteristics. For this reason, the ordering of stanzas in eventconf.xml is very important. Any individual event is identified by a Universal Event Identifier (uei). Events are bracketed by <event> </event> tags. Within the event definition, the following tags can also be used:

• uei: a label to uniquely identify the event
• event-label: a text label for the event, used in the Web GUI
• descr: description of the event
• logmsg: summary of the event, where the dest parameter is one of:
  o logndisplay: log to events database and display in web GUI
  o logonly: log to database but don't display in web GUI
  o suppress: don't log to database or web GUI
  o donotpersist: don't log or display, but do pass to other daemons (eg. for notification)
  o discardtraps: trapd to discard TRAPs, no processing whatsoever
• severity
• alarm-data: create an alarm for this event with:
  o reduction-key: fields to compare to determine duplicate event
  o alarm-type: 1 = problem, 2 = resolution. alarm-type=2 also takes a clear-key parameter defining the problem event this resolves
  o auto-clean: true or false
• operinstruct: optional instructions for operators using the web GUI
• mouseovertext: text to display when mouse positioned over this event
• autoaction: absolute pathname to executable program, executed every event instance

Many of the tags can use data substituted from the event. These are documented on the OpenNMS wiki.
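To make the "first match wins" point concrete, here is a minimal Python sketch. It is only an illustration of why stanza ordering matters: the sample XML, the example uei and the function name `first_match` are invented for this sketch, the XML namespace that a real eventconf.xml carries is omitted, and matching is reduced to a plain uei comparison.

```python
import xml.etree.ElementTree as ET

# Invented sample: two stanzas share the same uei, so only the first is used.
EVENTCONF = """
<events>
  <event>
    <uei>uei.opennms.org/example/specific</uei>
    <event-label>Specific example event</event-label>
    <severity>Major</severity>
  </event>
  <event>
    <uei>uei.opennms.org/example/specific</uei>
    <event-label>Shadowed duplicate</event-label>
    <severity>Normal</severity>
  </event>
</events>
"""


def first_match(xml_text, uei):
    """Return label and severity from the first <event> stanza with this uei."""
    root = ET.fromstring(xml_text)
    for event in root.iter("event"):  # stanzas are visited in file order
        if event.findtext("uei") == uei:
            return {
                "label": event.findtext("event-label"),
                "severity": event.findtext("severity"),
            }
    return None


match = first_match(EVENTCONF, "uei.opennms.org/example/specific")
print(match["severity"])  # Major
```

The second stanza is never reached, which is the practical reason more specific definitions need to appear before more general ones.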
111. detected through SNMP queries, but there is no monitoring of any services on these networks. There are no current issues with deodar and availability has been 100% over the last 24 hours.

[Screenshot: OpenNMS node page for deodar.skills-1st.co.uk, showing 100.000% availability over the last 24 hours and a Recent Events list of completed service scans.]
112. [Screenshot: OpenNMS Admin page. Menu options: Configure Discovery; Configure Users, Groups and Roles; Configure Notifications; Manage and Unmanage Interfaces and Services; Configure SNMP Data Collection per Interface; Configure SNMP Community Names by IP; Add Interface; Delete Nodes; Import and Export Asset Information; Scheduled Outages; Manage Surveillance Categories; Manage Applications; Manage Provisioning Groups; Manage Location Monitors; Notification Status (On / Off).]

Figure 75: OpenNMS Admin menu

Option Descriptions

Configure Discovery allows you, the Administrator, to add or delete ip addresses (specific and range) to discover.

Configure Users and Groups allows you, the Administrator, to add, modify or delete existing users. If adding or modifying users, be prepared with user IDs, passwords, notification contact information (pager numbers and / or email addresses), and duty schedule information. You can then Add users to Groups.

Configure Notifications allows you to create new notification escalation plans, called notification paths, and then associate a notification path with an OpenNMS event. Each path can have any arbitrary number of escalations or targets (users or groups) and can send notices through email, pagers, et ceter
113. [Screenshot: Zenoss performance template showing ten data sources (laLoadInt5, memAvailReal, memAvailSwap, memBuffer, memCached, ssCpuRawIdle, ssCpuRawSystem, ssCpuRawUser, ssCpuRawWait, sysUpTime), a Min / Max threshold on ssCpuRawIdle with severity Warning, graph definitions for load, CPU utilization and memory, and the Thresholds dropdown menu open with the options Add Threshold, Delete Threshold and Add to Graphs.]

Figure 122: Zenoss Dropdown menu for data thresholds

Note that this dropdown menu (as is also true of the Data Sources dropdown) has an option to Add to Graphs.

Graphs can be defined for a wide combination of the collected data points and thresholds. The menu panels are basically a frontend to the RRD graphing tool and, with lots of samples provided, you don't need to get into the details of RRD Tool; however, if you wish to, there is plenty of scope to do so. Graphs can be added, deleted or re-sequenced using the dropdown. Existing graphs are modified by clicking on the graph name.
114. [Screenshot: Zenoss Alerting Rule edit page, with the rule enabled, action set to email, and a filter on Production State, Severity and Event State.]

Figure 111: Zenoss Editing alerting rule

The email or pager message of the Alerting Rule is configured by the Message tab, and the Schedule tab can be used to create different alerting rules at different times.

[Screenshot: Zenoss Alerting Rule Message tab. Message (or Subject): [zenoss] %(device)s %(summary)s. The body lists Device: %(device)s, Component: %(component)s, Severity: %(severityString)s, Message: %(message)s and an Event Detail link built from %(eventUrl)s. Clear Message (or Subject): [zenoss] CLEAR: %(device)s %(clearOrEventSummary)s.]

Message Format is a python format string. Fields are specified as %(fieldname)s. The list of fields available in the event database is: dedupid, evid, device, component, eventClass, eventKey, summary, message, severity, eventState, eventClassKey, eventGroup, stateChange, firstTime, lastTime, count, prodState, suppid, manager, agent, DeviceClass, Location, Systems, DeviceGroups, ipAddress, f
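Because the message format is an ordinary Python format string, its behaviour is easy to try outside Zenoss. The sketch below substitutes named fields using the %(fieldname)s syntax. The event values are invented for illustration, and only a handful of the field names listed above are used.

```python
# Hypothetical event record; the keys are field names from the event database list.
event = {
    "device": "group-100-r1",
    "component": "eth0",
    "summary": "interface down",
    "severity": 4,
}

# Each %(name)s placeholder is replaced by the value stored under that key.
subject_format = "[zenoss] %(device)s %(summary)s"
body_format = "Device: %(device)s\nComponent: %(component)s\nSeverity: %(severity)s"

subject = subject_format % event
body = body_format % event

print(subject)  # [zenoss] group-100-r1 interface down
print(body)
```

A placeholder that names a field missing from the record raises a KeyError in plain Python, so it is worth checking field names against the list before saving a rule.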
115. [Screenshot: OpenNMS Resource Graphs, Choose Resource page for node bino.skills-1st.co.uk. Resources offered: SNMP Node Data (Node-level Performance Data); SNMP Interface Data for eth1 (10.0.0.121, 100 Mbps), vmnet1 (172.16.222.1, 10 Mbps), vmnet2 (10.191.0.1, 10 Mbps), vmnet3 (172.16.223.1, 10 Mbps), vmnet4 (192.168.1.200, 10 Mbps) and vmnet8 (192.168.10.1, 10 Mbps); Disk Table Index (UCD-SNMP MIB).]

Figure 64: OpenNMS Standard Resource graphs available for a selected node

Here is part of the node-level performance data set of graphs.

[Screenshot: OpenNMS resource graph results, including the Swap and System Memory Stats graphs.]
116. e if you are looking at an SNMP interface resource, its parent resource would be the node which owns that SNMP interface. If you are looking at a node, you would have the option to see all top-level resources.

Figure 67: OpenNMS Report categories available for customised reports

If you select the Node-level Performance Data option and the Choose child resource button, then each of the MIB variables collected can be displayed and selected.

[Screenshot: OpenNMS Key SNMP Customized Performance Reports, Customized Report Graph Definition page, with a sample TCP Open Connections graph for node bino.skills-1st.co.uk.]

Figure 68: OpenNMS Selecting prefabricated reports to include in a customised report

The dropdown alongside the Prefabricated Report field allows you to select any of
117. [Screenshot: Zenoss Performance Templates for bino.skills-1st.co.uk. Two templates are listed: a Net-SNMP template for late vintage unix devices (has CPU threshold) and a Windows template that requires the Host Resources MIB. The dropdown menu offers Add Template, Bind Templates and Reset Bindings.]

Figure 118: Zenoss Bind Templates menu

Be aware that when selecting templates to bind, you need to select all the templates you want bound (use the Ctrl key to select multiples).

So what do these templates actually provide? Templates contain three types of sub-objects:
• Data sources: what data to collect and method to use (eg. MIB OID)
• Thresholds: expected bounds for data and events to raise if breached
• Graph definitions: how to graph the data points

Data sources shown in the Performance Template:

Name | Source | Source Type | Enabled
laLoadInt5 | 1.3.6.1.4.1.2021.10.1.5.2 | SNMP | True
memAvailReal | 1.3.6.1.4.1.2021.4.6.0 | SNMP | True
memAvailSwap | 1.3.6.1.4.1.2021.4.4.0 | SNMP | True
memBuffer | 1.3.6.1.4.1.2021.4.14.0 | SNMP | True
memCached | 1.3.6.1.4.1.2021.4.15.0 | SNMP | True
ssCpuRawIdle | 1.3.6.1.4.1.2021.11.53.0 | SNMP | True
ssCpuRawSystem | 1.3.6.1.4.1.2021.11.52.0 | SNMP | True
ssCpuRawUser | 1.3.6.1.4.1.2021.11.50.0 | SNMP | True
ssCpuRawWait | 1.3.6.1.4.1.2021.11.54.0 | SNMP | True
sysUpTime | 1.3.6.1.2.1.25.1.1.0 | SNMP | True
118. [Screenshot: OpenNMS Reports page. Menu: Resource Graphs; KSC Performance, Nodes, Domains; Availability; Statistics Reports.]

Resource Graphs provide an easy way to visualize the critical SNMP, response time, and other data collected from managed nodes throughout your network.

Key SNMP Customized (KSC) Performance Reports and Node Reports: KSC reports allow the user to create and view SNMP performance data using prefabricated graph types. The reports provide a great deal of flexibility in timespans and graphtypes. KSC report configurations may be saved, allowing the user to define key reports that may be referred to at future dates. Node reports show SNMP data for all SNMP interfaces on a node. Node reports may be loaded into the customizer and saved as a KSC report.

Availability Reports provide graphical or numeric view of your service level metrics for the current month-to-date, previous month, and last twelve months by categories.

Statistics Reports provide regularly scheduled statistical reports on collected numerical data (response time, SNMP performance data, etc.).

Figure 62: OpenNMS Report categories available out-of-the-box

• Resource Graphs: provide lots of standard reports
• KSC Performance, Nodes, Domains: allows users to customise own reports
• Availability: availability reports for interfaces & services
• Sta
119. [Screenshot: OpenNMS Events list. Columns: ID, Severity, Time, Node, Interface, Service, Ackd.]

217583 | Normal | 05/08/08 23:59:20 | OpenNMS user admin has logged in from 10.0.0.121
217582 | Normal | 05/08/08 23:58:30 | OpenNMS user rtc has logged in from 127.0.0.1
217566 | Warning | 05/08/08 23:54:54 | server.class.example.org | 10.191.101.1 | SNMP | uei.opennms.org/threshold/relativeChangeExceeded | Relative change exceeded for SNMP datasource ifInOctets on interface 10.191.101.1, parms: ds=ifInOctets value=82948.0 previousValue=38540.0 multiplier=1.05 label=Unknown ifLabel=eth0-000c29aea14f ifIndex=2
217565 | Warning | 05/08/08 23:54:54 | server.class.example.org | 10.191.101.1 | SNMP | uei.opennms.org/threshold/relativeChangeExceeded | Relative change exceeded for SNMP datasource ifOutOctets on interface 10.191.101.1, parms: ds=ifOutOctets value=80593.0 previousValue=37973.0 multiplier=1.05 label=Unknown ifLabel=eth0-000c29aea14f ifIndex=2
217564 | Warning | 05/08/08 23:54:51 | group-100-linux.class.example.o | 10.191.100.3 | SNMP | uei.opennms.org/threshold/relativeChangeExceeded | Relative change exceeded fo
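The relativeChangeExceeded events above carry a value, a previousValue and a multiplier. One plausible reading of how such a threshold fires is sketched below. The comparison rule is an assumption made for illustration (it is not taken from the OpenNMS source), and the function name is hypothetical; the figures are the ones in the ifInOctets and ifOutOctets events.

```python
def relative_change_exceeded(previous, value, multiplier):
    """Assumed rule: compare the new sample with previous * multiplier.

    A multiplier above 1 watches for increases, below 1 for decreases.
    """
    if previous == 0:
        return False  # no usable baseline to compare against
    if multiplier < 1.0:
        return value <= previous * multiplier
    return value >= previous * multiplier


# Figures from the ifInOctets event: 38540.0 * 1.05 = 40467.0, and 82948.0 is above that.
print(relative_change_exceeded(38540.0, 82948.0, 1.05))  # True
# A sample only slightly above the previous one stays under the limit.
print(relative_change_exceeded(38540.0, 39000.0, 1.05))  # False
```

With a multiplier of 1.05, any sample at least 5% above the previous one would raise an event under this reading, which is why busy interfaces can generate many such warnings.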
120. e interface(s) you wish to change and then select Update Collection.

Note: Interfaces marked as Primary or Secondary will always be selected for data collection. To remove them, edit the IP address range in the collectd configuration file.

[Screenshot: Select SNMP Interfaces table for Node ID 22 (Node Label group-100-linux.class.example.org), with columns ifIndex, IP Address, IP Hostname, ifType, ifDescription, ifName, ifAlias, SNMP Status and Collect. Interface 10.191.100.3 has SNMP Status Primary; a second interface has status Not Collected.]

Figure 59: OpenNMS GUI Admin page for specifying interfaces to collect data from

Most of the contents of datacollection-config.xml is defining groups and systems:
• groups define groups of SNMP MIB OIDs to collect
• systems use a device's System OID as a mask to determine which groups of OIDs should be collected

    <groups>
      <!-- data from standard (mib-2) sources -->
      <group name="mib2-interfaces" ifType="all">
        <mibObj oid=".1.3.6.1.2.1.2.2.1.10" instance="ifIndex" alias="ifInOctets" type="counter" />
        <mibObj oid=".1.3.6.1.2.1.2.2.1.11" instance="ifIndex" alias="ifInUcastpkts" type="counter" />
        <mibObj oid=".1.3.6.1.2.1.2.2.1.12" instance="ifIndex" alias="ifInNUcastpkts" type="counter" />
        <mibObj oid=".1.3.6.1.2.1.2.2.1.13" instance="ifIndex" alias="ifInDiscards" type="counter" />
        <mibObj
121. # service definition template for ping check. This is NOT a real service, just a template!
    define service{
        name                    ping-service     ; The name of this service template
        use                     generic-service  ; Inherit default values from the generic-service definition
        max_check_attempts      4                ; Re-check the service up to 4 times in order to determine its
        normal_check_interval   5                ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1                ; Re-check the service every minute until a hard state can be
        register                0                ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

Figure 15: Nagios service.cfg top-level objects

Again, note the check_period, max_check_attempts, normal_check_interval and retry_check_interval stanzas. More specific service definitions can then be defined, inheriting characteristics of parents through the use stanza.

    # Define a service to ping non-raddle machines
    define service{
        use                     ping-service     ; Name of service template to use
        hostgroup_name          servers
        service_description     PING
        check_command           check_ping!200.0,20%!500.0,60%
        }

    # Define a service to
122. eAdded
• interfaceDeleted
• High Threshold
• Low Threshold
• High Threshold Rearmed
• Low Threshold Rearmed
Nothing so far has handled acknowledging notifications. This can either be done manually by a user or can be performed automatically. Either way, when a notification is acknowledged, it stops the destination path being walked for the original notification. It will also create a new notification to tell users that the original issue is resolved. Automatic acknowledgements are configured in /opt/opennms/etc/notifd-configuration.xml, where <auto-acknowledge> tags specify the UEIs of the resolution and problem events, along with the parameters on the event which must also match for the notification to be automatically acknowledged.
[Screenshot text, reconstructed: Konsole session in /opt/opennms/etc]
<?xml version="1.0" encoding="UTF-8"?>
<notifd-configuration xmlns="http://xmlns.opennms.org/xsd/config/notifd" status="on"
    pages-sent="SELECT * FROM notifications"
    next-notif-id="SELECT nextval('notifynxtid')"
    next-user-notif-id="SELECT nextval('userNotifNxtId')"
    next-group-id="SELECT nextval('notifygrpid')"
    outstanding-notices-sql="SELECT notifyid FROM notifications where notifyId = ? AND respondTime is not null"
    acknowledge-id-sql="SELECT notifyid FROM notifications WHERE eventuei=? AND nodeid=? AND interfaceid=? AND serviceid=?"
    acknowledge-update-sql="UPDATE notif
123. eGroup> </collect> </systemDef>
Figure 61: OpenNMS systems definitions in datacollection-config.xml
In the figure above, any device which has satisfied the filtering in collectd-configuration.xml and has a system OID starting with .1.3.6.1.4.1 (the start of the Enterprise MIB tree) will collect performance data for MIB-2 interfaces, tcp and icmp, as specified in the earlier group stanzas. Note that the defaults in collectd-configuration.xml and datacollection-config.xml mean that a large number of SNMP data collections will be activated out of the box. This is good in providing lots of samples in small environments, but it could be a serious performance and disk usage factor if these defaults are left unchanged where a large number of interfaces are monitored by OpenNMS.
7.4.2 Displaying performance data
OpenNMS provides a large number of reports out of the box, based on the default data collection parameters. Use the Reports main menu to see the options.
[Screenshot residue: OpenNMS Web Console Reports page at http://opennms:8980/opennms/report/index.jsp, 05 Aug 2008.]
124. [Screenshot residue: Nagios event log. The recoverable entries are HOST ALERT lines reporting hosts group-100-r2, group-100-r3 and group-100-c2 as UNREACHABLE (HARD state, attempt 4); HOST NOTIFICATION lines sent to nagiosadmin through notify-host-by-email for the same hosts; and SERVICE ALERT lines reporting the PING service as CRITICAL (HARD state, attempt 1) on group-100-c1, group-100-r2, group-100-c2 and group-100-r3. Every entry carries the text "CRITICAL - Host Unreachable" followed by the host name under class.example.org.]
125. echanism. If there is a response within the timeout, a new suspect event is generated.
<discovery-configuration threads="1" packets-per-second="1" initial-sleep-time="300000" restart-sleep-time="86400000" retries="3" timeout="800">
  <include-range retries="2" timeout="3000">
    <begin>10.0.0.1</begin>
    <end>10.0.0.254</end>
  </include-range>
  <include-range>
    <begin>172.30.100.1</begin>
    <end>172.30.100.10</end>
  </include-range>
  <specific>10.191.101.1</specific>
</discovery-configuration>
In the above example, ping discovery will start 300,000 ms (5 minutes) after OpenNMS has started up; the discovery process will be restarted every 86,400,000 ms (24 hours); 1 ping will be sent per second; the timeout for a ping will be 800 ms, and there will be 3 ping retries before the discovery process gives up on an address. All devices on the Class C sized 10.0.0.0 network (10.0.0.1 through 10.0.0.254) will be polled with only 2 retries but a 3 second timeout. The 10 devices 172.30.100.1 through 172.30.100.10 will be polled with the default characteristics. The specific node 10.191.101.1 will be polled. All that the discovery process does is to generate new suspect events that are then used by other OpenNMS processes. If the device does not respond to this ping polling then it will not be added to the OpenNMS database. Another way to generate such events, say for a box that does not
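The way the per-range settings override the global discovery defaults can be checked with a short Python sketch. It parses the example stanza discussed above using the standard library; the `summarise` function and the shape of its result are illustrative assumptions, not part of OpenNMS.

```python
import xml.etree.ElementTree as ET

# The discovery example from the text, with punctuation restored.
CONFIG = """
<discovery-configuration threads="1" packets-per-second="1"
    initial-sleep-time="300000" restart-sleep-time="86400000"
    retries="3" timeout="800">
  <include-range retries="2" timeout="3000">
    <begin>10.0.0.1</begin><end>10.0.0.254</end>
  </include-range>
  <include-range>
    <begin>172.30.100.1</begin><end>172.30.100.10</end>
  </include-range>
  <specific>10.191.101.1</specific>
</discovery-configuration>
"""


def summarise(xml_text):
    """Hypothetical helper: list discovery targets with their effective settings."""
    root = ET.fromstring(xml_text)
    defaults = {
        "retries": int(root.get("retries")),
        "timeout": int(root.get("timeout")),
    }
    targets = []
    for rng in root.findall("include-range"):
        targets.append({
            "begin": rng.findtext("begin"),
            "end": rng.findtext("end"),
            # A range without its own attribute falls back to the global value.
            "retries": int(rng.get("retries", defaults["retries"])),
            "timeout": int(rng.get("timeout", defaults["timeout"])),
        })
    for spec in root.findall("specific"):
        targets.append({"begin": spec.text, "end": spec.text, **defaults})
    return {
        "initial_sleep_minutes": int(root.get("initial-sleep-time")) / 60_000,
        "restart_sleep_hours": int(root.get("restart-sleep-time")) / 3_600_000,
        "targets": targets,
    }


summary = summarise(CONFIG)
```

Here `summary["targets"]` holds three entries: the 10.0.0.x range with 2 retries and a 3000 ms timeout, and the 172.30.100.x range and the specific address with the global 3 retries and 800 ms.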
126. [Screenshot residue: Nagios service detail page, last updated 04-08-2008 17:09:42, showing Active Checks, Passive Checks, Obsessing, Notifications, Event Handler and Flap Detection as ENABLED, and a Service Comments panel with no comments.]
Figure 29: Nagios Performance data highlighted, DNS Check service
6.5 Nagios summary
Nagios is a mature systems management tool whose documentation is much better than the other open source offerings. Its strength is in checking availability of hosts and services that run on those hosts. Support for network management is less strong as there is no automatic discovery; however, it is possible to configure simple network topologies, and it includes the concept of a set of devices being UNREACHABLE (rather than DOWN) if there is a network single point of failure. Handling meshed networks with multiple routing paths to a network is problematic. Since all monitoring is performed by plugins, some of which come with the product and some of which are available as community contributions, the tool is as flexible as anyone requires. There are a large number of plugins available and
127. [Screenshot residue: Nagios Alert Histogram page. Event History For Host group-100-r1, Thu Jul 24 12:43:57 2008 to Thu Jul 31 12:43:57 2008; summary table of event types (Recovery (Up), Down, Unreachable) with MIN, MAX, SUM and AVG columns; y-axis Number of Events, x-axis Day of the Month.]
Figure 20: Nagios Alert Histogram for host group-100-r1
The Alert Summary menu option can provide various reports, specific to hosts or services.
[Screenshot residue: Nagios Alert Summary Report page, Last Updated Thu Jul 31 13:02:07 BST 2008, Nagios 3.0.1, logged in as nagiosadmin.]
128. elpful. Many people will want a say in the definition of management. Others, whose requirements may be equally valuable, may not know the art of the possible. Hence, combining top-down requirements definition workshops with a bottom-up approach of demonstrating "top 10" metrics that can easily be delivered by a tool can result in an iterative process that fairly quickly delivers at least a prototype solution.
1.2 Systems Management for this paper
For the purposes of this paper I shall define systems management as spanning:
• Configuration management
• Availability management
• Problem management
• Performance management
I shall further define systems to include local and wide area networks, as well as PCs and Unix-like systems. In my environment I do not have mainframe or proprietary midrange systems. PCs run a variety of versions of Windows. Unix-like tends to mean a flavour of Linux rather than a vendor-specific Unix, though there is some legacy IBM AIX and Sun Solaris.
2 Systems management tools
There are no systems management solutions for sale. The successful implementation of systems management requirements is a combination of:
• Appropriate requirements definition
• Appropriate tools
• Skills to translate the requirements into customisation of tools
• Project management
• User training
• Documentation
In theory, the choice of tool should be driven by the requirements. In p
129. ent events with ability to prioritise events
  o Ability to categorise events for display to specific users
  o Ability to receive and format SNMP traps for SNMP V1, V2 and preferably V3
  o Ability to monitor Unix syslogs and Windows Event Logs and generate customisable events
  o Ideally, the ability to monitor any text logfile and generate customisable events
  o Customisation of actions in response to events, both manual actions and automatic responses
  o Ability to correlate events to find root-cause problems (eg. a single-point-of-failure router is the root cause of availability failure for all devices in a network)
• Performance
  o Regular, customisable monitoring of SNMP MIB variables, both standard and enterprise-specific, with data storage and ability to threshold values to generate events
  o Ability to import any MIB
  o Ability to browse any MIB on any device
  o Ability to gather performance data by methods other than SNMP (eg. ssh)
  o Customisable graphing of performance data
4.3 What is out of scope
In my environment some things are specifically out of scope:
• Software distribution
• Remote configuration
• Remote control of devices
• High availability of management servers
• Application response time
In the next few sections of this document I will explore some of the niche products briefly and then take a slightly more in-depth look at OpenNMS, Nagios and Zenoss. These sections are not intended to be a full analysis of the products, more an
130. ents that have been integrated into this alarm. To see the individual events, click on the number in the Count column.
At present (July 10th 2008), acknowledging events has no effect on related alarms and vice versa. Note that the concepts of Acknowledging and Clearing are completely different. An operator can acknowledge an event or an alarm and then owns it. This does not clear the event (ie. remove it entirely from the events database).
Automatic actions can be configured for an event using the <autoaction> tag, but this can only run an executable, and it runs on every occurrence of the event, which may not be what you want. OpenNMS's concept of automation, however, is triggered from alarms rather than events. Automation is the concept of actions being performed on a scheduled basis, provided the correct triggers exist. An automation tag includes:
• name: the name of the automation
• interval: the frequency, in milliseconds, at which the automation runs
• trigger-name: a string that references a trigger definition
• action-name: a string that references an action definition
The triggers and actions are SQL statements that operate on the events database. Automation is defined in /opt/opennms/etc/vacuumd.xml, where there are a number of useful rules by default.
[Screenshot residue: Konsole session in /opt/opennms/etc; the file contents are not recoverable.]
131. eral templates called Device. The Device template for the class /Devices simply collects sysUpTime. The template called Device for /Devices/Server collects a number of parameters supported by the net-snmp MIB. The template called Device for /Devices/Server/Windows collects various MIB values from the Informant MIB. For each template name, Zenoss searches first the device itself and then up the Device Class hierarchy, looking for a template with that name. Zenoss uses the first template that it finds with the correct name, ignoring others with the same name that might exist further up the hierarchy.
So the zenperfsnmp daemon will collect net-snmp MIB information for Unix / Linux servers and will collect Informant MIB information for Windows servers, as /Devices/Server/Windows is more specific than /Devices/Server. Any actual device can have a local copy of a template and change parameters to suit that specific device.
Template bindings can either be modified by changing the zProperties zDeviceTemplates field, or there is a Bind Templates menu dropdown from the templates display of any device. Do remember that, for a device, both the Templates menu and the zProperties menu are off the More dropdown submenu.
[Screenshot residue: Zenoss page for device bino.skills-1st.co.uk under /Devices/Server/Linux.]
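The "first match wins" search described above (device first, then up the Device Class hierarchy) can be sketched in a few lines of Python. This is an illustration of the lookup order only; the `find_template` function and the dictionaries are hypothetical and do not reflect how Zenoss stores templates internally.

```python
def find_template(name, device_templates, device_class, class_templates):
    """Return (where_found, template) for the first template called `name`.

    Search order: local copies on the device itself, then the device's own
    class, then each parent class up to the top of the hierarchy.
    """
    if name in device_templates:
        return ("device", device_templates[name])
    parts = device_class.strip("/").split("/")
    while parts:
        path = "/" + "/".join(parts)
        if name in class_templates.get(path, {}):
            return (path, class_templates[path][name])
        parts.pop()  # move one level up the hierarchy
    return None


# The three "Device" templates named in the text, with their contents
# reduced to a short description.
class_templates = {
    "/Devices": {"Device": "sysUpTime"},
    "/Devices/Server": {"Device": "net-snmp MIB values"},
    "/Devices/Server/Windows": {"Device": "Informant MIB values"},
}

windows = find_template("Device", {}, "/Devices/Server/Windows", class_templates)
linux = find_template("Device", {}, "/Devices/Server/Linux", class_templates)
local = find_template(
    "Device", {"Device": "local copy"}, "/Devices/Server/Linux", class_templates
)
```

A Windows server picks up the Informant template, a Linux server falls back to the one on /Devices/Server because /Devices/Server/Linux defines none in this example, and a device with a local copy never looks at its class at all.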
132. ercent that things need to change in order to trigger (e.g. value=1.5 means a 50% increase).
rearm: The value at which the threshold will reset itself. Not used for relativeChange thresholds.
trigger: The number of times the threshold must be exceeded in a row before the threshold will be triggered. Not used for relativeChange thresholds.
triggeredUEI: A custom UEI to send into the events system when this threshold is triggered. If left blank, it defaults to the standard thresholds UEIs.
rearmedUEI: A custom UEI to send into the events system when this threshold is re-armed. If left blank, it defaults to the standard thresholds UEIs.
By default, standard threshold and rearm events will be generated, but it is also possible to create customised events with the threshold attributes. This would then make it easier to generate notifications for specific thresholding / rearm events. Here is a screenshot with standard events generated by thresholds on the raddle network.
[Screenshot residue: OpenNMS Events List page, search results 1-10 of 2980.]
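The interplay of the value, trigger and rearm attributes for a high threshold can be sketched as a small state machine. This is a hypothetical Python model, not OpenNMS code; in particular, treating "exceeded" as "at or above the value" and re-arming at "at or below the rearm value" are assumptions made for the sketch.

```python
class HighThreshold:
    """Hypothetical model of a high threshold with trigger count and rearm."""

    def __init__(self, value, rearm, trigger):
        self.value = value        # level that counts as exceeded
        self.rearm = rearm        # level at which the threshold resets
        self.trigger = trigger    # consecutive exceeded samples required
        self.exceeded_count = 0
        self.armed = True

    def evaluate(self, sample):
        """Return 'triggered', 'rearmed' or None for one collected sample."""
        if self.armed:
            if sample >= self.value:      # assumption: inclusive comparison
                self.exceeded_count += 1
                if self.exceeded_count >= self.trigger:
                    self.armed = False
                    self.exceeded_count = 0
                    return "triggered"
            else:
                # The run of exceeded samples was broken, so start again.
                self.exceeded_count = 0
        elif sample <= self.rearm:        # assumption: inclusive comparison
            self.armed = True
            return "rearmed"
        return None


threshold = HighThreshold(value=90, rearm=75, trigger=3)
samples = [95, 80, 92, 93, 96, 97, 70, 91]
events = [threshold.evaluate(s) for s in samples]
```

With these samples the single reading of 95 does not fire anything because the run is broken by 80; the run 92, 93, 96 fires the threshold once; 97 is ignored while the threshold is already triggered; and 70 re-arms it.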
133. [Screenshot residue: OpenNMS node page for deodar. Availability is shown per interface (172.16.224.1, 172.16.225.1) with StrafePing Not Monitored; Recent Outages reports no outages on this node in the last 24 hours; SNMP Attributes list Name deodar, Object ID .1.3.6.1.4.1.8072.3.2.10, Location Cedar Chase, Contact Jane Curry; the system description is a Linux 2.6.18 x86_64 host.]
Figure 34: OpenNMS node detail with monitored services
OpenNMS includes a standard set of Availability reports. They can be selected from the Reports menu.
[Screenshot residue: OpenNMS Web Console Availability page at http://opennms:8980/opennms/availability/index.jsp, 03 Jul 2008, with the note that generating the availability reports may take a few minutes, especially for large networks.]
134. ested to see if it maps to a valid ifIndex in the ifTable. If this is true, the IP address is marked as a secondary SNMP interface and is a contender for becoming the primary SNMP interface.
[Screenshot residue: OpenNMS node page for switch.skills-1st.co.uk at http://opennms:8980/opennms/element/node.jsp?node=8, with links View Events, View Alarms, Asset Info, Telnet, HTTP, Resource Graphs, Rescan, Admin and Update SNMP. Recent Events include "A services scan has been completed on this node", "SNMP information on 10.0.0.253 is being refreshed for data collection purposes", "SNMP data collection on interface 10.0.0.253 previously failed and has been restored" and "Node switch.skills-1st.co.uk is up". Availability over the last 24 hours is 100.000% overall and for 10.0.0.253; StrafePing and Telnet are Not Monitored.]
135. [Screenshot residue: Zenoss collector settings page listing fields such as Ping Cycle Time (secs), Maximum Ping Failures, Modeler Cycle Interval (mins), Default Discovery Networks, Render URL and Render User, followed by the list of monitored devices.]
Figure 88: Zenoss Collectors (Monitors) overview
The devices being monitored are shown at the bottom of the screen. To change any of these parameters, use the Edit tab. The defaults for availability monitoring are:
• Ping cycle time (polling): 60 sec
• Ping timeout: 1.5 sec
• Ping retries: 2
• Status (TCP / UDP service) polling interval: 60 sec
• Process (SNMP Host Resources) polling interval: 180 sec
• SNMP performance cycle interval: 300 sec
What availability checks are carried out on a device is controlled by the zProperties of that device, remembering that zProperties can be set at any level of the object hierarchy. By default, the /Devices class has zPingMonitorIgnore = False and zSnmpMonitorIgnore = False, so every device will get ping polling at 1 minute intervals and SNMP polling at 5 minute intervals.
8.2.2 Availability monitoring of services: TCP / UDP ports and windows services
Service monitoring for TCP / UDP ports and
136. external file. In addition, individual hosts and services (or groups of either) can have their own event_handler directive and their own event_handler_enabled directive. Note that if the global enable_event_handlers is off, then no individual host / service will run event handlers. Individual event handlers will run immediately after any global event handler.
Typically, an event handler will be a script or program, defined in the Nagios commands.cfg file, to run any external program. The following parameters will be passed to the event handler:
For Services: $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$
For Hosts: $HOSTSTATE$, $HOSTSTATETYPE$, $HOSTATTEMPT$
Event handler scripts will run with the same user privilege as that which runs the nagios program. Sample event handler scripts can be found in the contrib/eventhandlers subdirectory of the Nagios distribution. Here is the sample submit_check_result command:
[Screenshot text, reconstructed: Konsole session on bino]
#!/bin/sh
# SUBMIT_CHECK_RESULT
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 02-18-2002
#
# This script will write a command to the Nagios command
# file to cause Nagios to process a passive service check
# result. Note: This script is intended to be run on the
# same host that is running Nagios. If you want to
# submit passive check results from a remote machine, look
# at using the nsca addon.
#
# Arguments
137. fer to customise through a GUI, the Admin menu provides access to configure some of these files without needing to know an editor or XML. It feels like a solid, reliable product and is designed (say the developers) to scale to truly large enterprises. There are lots of good samples provided and the default configurations provide rich functionality.
Areas where it is weak are around formal documentation and the lack of a usable topology map. That said, the help that is provided with OpenNMS panels is very good. Data collection and thresholding is strong. The addition of a MIB compiler and browser would improve matters enormously. It is also short of a way to discover applications that do not support port sniffing or SNMP.
There are two large problems with OpenNMS that give me great concern. You have to bounce the whole OpenNMS system if you change any configuration files. The second big issue, known to be under review, is the association between events, alarms and notifications. Currently, notifications are driven from events, whereas driving them from alarms would seem preferable. There is also no link between acknowledging events, alarms and notifications.
I have two personal negative feelings with OpenNMS. The first is that it is written in Java. Sorry, but I hate Java applications. To be fair, OpenNMS does not suffer from performance issues that affect so many other Java applications, but its logfiles are Java logfiles and life is just too
138. generates a notification.
6.3.3 SNMP TRAP reception and configuration
Nagios's own documentation says that it is not a replacement for a full-blown SNMP management application. It has no simple way to receive SNMP TRAPs or to parse them. It is possible to integrate SNMP TRAPs by sending them to Nagios as passive checks, but this will require significant effort. The documentation suggests using a combination of net-snmp and the SNMP TRAP Translator (SNMPTT) packages.
6.3.4 Nagios notifications
In Nagios, the terms event and alert are used interchangeably. There is a comprehensive mechanism for notifications, which is driven by parameters on the host and service checks. There is also configuration for notifications on a per-contact basis: each check can have a contact_groups stanza specifying who to contact. Contacts can appear in several different contact groups, although only a single notification will be sent to any individual. Notifications are only generated for HARD status type events, not SOFT ones. Whether notifications are sent depends on the following parameters / characteristics, in this order:
• notifications enabled: global on / off parameter
• Each host / service can have scheduled downtime: no notifications in downtime
• Each host / service can be flapping: no notifications if flapping
• Host notification options (d, u, r): specifies notifications on down, unreachable and recovery events
• Service notification
139. [Screenshot residue: Zenoss graph definition page listing graph points cpuRawWait (Threshold) and ssCpuRawSystem, ssCpuRawUser and ssCpuRawWait (DataPoint).]
Figure 123: Zenoss Performance template graph definition
Note that graphs can display both data points and thresholds. All graphs are stored by default under /usr/local/zenoss/zenoss/perf/Devices. There is a subdirectory for each device. Component data rrd files are under the os subdirectory, with further subdirectories for filesystems, interfaces and processes.
8.4.2 Displaying performance data graphs
To view performance graphs, the Operating System component graphs can be seen from the OS page of a device by clicking on the relevant interface, filesystem or process. The rest of the performance graphs can be found under the Perf tab.
[Screenshot residue: Zenoss page for IpInterface eth1 (10.0.0.121/24, ethernetCsmacd) on bino.skills-1st.co.uk, showing an hourly Throughput graph in bits/sec.]
140. have input from the support and development teams; however, the source code is not open for discussion or community development. With a very active Open Source offering, there tends to be a much larger pool of developers and testers (ie. us), and the chance of getting problems fixed may be higher, even if you cannot fix it yourself. I would emphasise very active Open Source offerings: unless you really do have some very highly skilled local staff that you are sure you are going to keep, it may be a risky choice to participate in a small Open Source project.
3 Open Source management offerings
There are lots of different Open Source management offerings available. Many of them rely on the Simple Network Management Protocol (SNMP), which defines both a protocol for an SNMP manager to access a remote SNMP agent and the data that can be transferred. SNMP data values that an SNMP manager can request are defined in Management Information Bases (MIBs), which can either be standard (MIB-2) or enterprise-specific; in other words, each manufacturer can provide different data about different types of device. Information events emanating from an agent (typically problems) are SNMP traps. There are three versions of the SNMP standard:
• V1 (1988): still most prevalent. Significant potential security and performance issues.
• V2 (1993): solved some performance issues. Never reached full standard status.
• V3
141. he Dude
I put some research into The Dude as it apparently provides auto-discovery of a network with graphical map layout, something that is hard to find done well. From the Open Source perspective, though, it really doesn't qualify. It is basically a Windows application, though it can apparently run under WINE on Linux. It comes from a company called MikroTik and their website says it is free, but it is unclear what the licensing arrangement is for The Dude. It has a very active forum. It offers more than simply discovery and configuration, as it can apparently monitor links and devices for availability and graph link performance. It can also generate notifications.
6 Nagios
Nagios evolved in 2002 out of an earlier systems management project called NetSaint, which had been around since the late 1990s. It is far more a systems management product than a network management product. It is available to build on most flavours of Linux / Unix and the installation has become much easier over the years. The Nagios Quickstart document is reasonably comprehensive, although it misses a few prerequisites that I found necessary, like gd, png, jpeg, zlib, net-snmp and their related development packages. I downloaded and built Nagios 3.0.1 on a SuSE 10.3 platform (hostname nagios3) and had it working inside half a day. To start the Web Interface, point your browser at http://nagios3/nagios. The Quickstart document has you create so
142. his information
  o Ideally, ability to discover and display network Layer 2 switch topology
• Availability monitoring
  o Customisable ping test for all discovered devices and interfaces
  o SNMP availability test for devices that do not respond to ping (eg. comparison of SNMP Interface administrative status with Interface operational status)
  o Simple display of availability status of devices, preferably both tabular and graphical
  o Events raised when a device fails its availability test
  o Ability to monitor infrastructure of network devices (eg. CPU, memory, fan)
  o Differentiation between device / interface down and network unreachable
• Problem
  o Events to be configurable for any discovered device
  o Central events console with ability to prioritise events
  o Ability to categorise events for display to specific users
  o Ability to receive and format SNMP traps for SNMP V1, V2 and preferably V3
  o Customisation of actions in response to events, both manual actions and automatic responses
  o Ability to correlate events to find root-cause problems (eg. failure of a router device is root cause of all interface failure events for that device)
• Performance
  o Regular, customisable monitoring of SNMP MIB variables, both standard and enterprise-specific, with data storage and ability to threshold values to generate events
  o Ability to import any MIB
  o Ability to browse any MIB on any device
  o Customisable graphing of performan
143. [Screenshot residue: OpenNMS Availability reports form. It asks the user not to press the stop or reload buttons until the report has finished, and offers a report format (Graphical Reports in PDF Format, Numeric Reports in PDF Format, Numeric Reports in HTML Format), a monthly section format (Classic Format or Calendar Format), a category (Overall Service Availability, Network Interfaces, Email Servers, Web Servers, JMX Servers, DNS and DHCP Servers, Database Servers, Other Servers, Internet Connectivity) and the date to use for the report.]
Figure 35: OpenNMS Availability reports menu
Here is a sample:
[Screenshot residue: Availability Report, July 3 2008, section Overall Service Availability ("This category reflects availability of all services currently being monitored by OpenNMS"), with counts for Nodes having outages 20, Interfaces 28, Services 50 and a chart titled The last 12 Months Availability.]
144. ications SET answeredby=?, respondtime=? WHERE notifyId=?"
    match-all="true"
    email-address-command="javaEmail">
  <auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/serviceResponsive" acknowledge="uei.opennms.org/nodes/serviceUnresponsive">
    <match xmlns="">nodeid</match>
    <match xmlns="">interfaceid</match>
    <match xmlns="">serviceid</match>
  </auto-acknowledge>
  <auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/nodeRegainedService" acknowledge="uei.opennms.org/nodes/nodeLostService">
    <match xmlns="">nodeid</match>
    <match xmlns="">interfaceid</match>
    <match xmlns="">serviceid</match>
  </auto-acknowledge>
  <auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/interfaceUp" acknowledge="uei.opennms.org/nodes/interfaceDown">
    <match xmlns="">nodeid</match>
    <match xmlns="">interfaceid</match>
  </auto-acknowledge>
  <auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/nodeUp" acknowledge="uei.opennms.org/nodes/nodeDown">
    <match xmlns="">nodeid</match>
  </auto-acknowledge>
  <auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/correlation/remote/wideSpreadOutageResolved" acknowledge="uei.opennms.org/correlation/remote/wideSpreadOutage">
    <match xmlns="">nodeid</match>
    <
145. iguration.xml. The need for the range parameter disappeared. However, to define different filters for thresholding, different packages had to be defined in collectd-configuration.xml.
From OpenNMS 1.5.91 (this paper is based on version 1.5.93), filters can be defined in threshd-configuration.xml so that packages in collectd-configuration.xml can be kept simple. The parameter in threshd-configuration.xml changes: the thresholding-group key disappears and is replaced by:
• <parameter key="thresholding-enabled" value="true"/>
Here is the default collectd-configuration.xml:
<?xml version="1.0"?>
<?castor class-name="org.opennms.netmgt.collectd.CollectdConfiguration"?>
<collectd-configuration threads="50">
  <package name="example1">
    <filter>IPADDR != '0.0.0.0'</filter>
    <include-range begin="1.1.1.1" end="254.254.254.254"/>
    <service name="SNMP" interval="300000" user-defined="false" status="on">
      <parameter key="collection" value="default"/>
    </service>
  </package>
  <collector service="SNMP" class-name="org.opennms.netmgt.collectd.SnmpColl
Figure 69: OpenNMS Default collectd-configuration.xml
The lack of any thresholding parameter implies that thresholding is disabled. And the default threshd-configuration.xml:
<?xml version="1.0"?>
146. iguration.xml controls the overall behaviour of polling: <poller-configuration threads="30" serviceUnresponsiveEnabled="false" nextOutageId="SELECT nextval('outageNxtId')" xmlrpc="false"> <node-outage status="on" pollAllIfNoCriticalServiceDefined="true"> <critical-service name="ICMP"/> </node-outage> 30 threads are available for polling. The basic event that is generated when a poll fails is called NodeLostService. If more than one service is lost, multiple NodeLostService events will be generated. If all the services on an interface are down, then instead of a NodeLostService event an InterfaceDown event will be generated. If all the interfaces on a node are down, the node itself can be considered down, and this section of the configuration file controls the poller behaviour should that occur. If a NodeDown event occurs and node-outage status="on", then all of the InterfaceDown and NodeLostService events will be suppressed and only a NodeDown event will be generated. Instead of attempting to poll all the services on the down node, the poller will attempt to poll only the critical service. Once the critical service returns, the poller will resume polling the other services. Note in the following screenshot that six services have been discovered on the 10.0.0.95 interface of the node called deodar.skills-1st.co.uk, of which four are monitored. The two interfaces on the 172.16 network have been
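The roll-up of poll failures described in this extract can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not OpenNMS code: the function name `outage_events`, the dictionary layout and the behaviour when node outage processing is switched off are all assumptions made for the example.

```python
def outage_events(node, node_outage_on=True):
    """Return the (event, target) pairs that one polling pass would raise.

    `node` maps an interface address to {service name: is_up}.
    This layout is hypothetical and exists only for the sketch.
    """
    events = []
    down_interfaces = []
    for iface, services in node.items():
        lost = [name for name, up in services.items() if not up]
        if services and len(lost) == len(services):
            # Every service on the interface failed: roll up to the interface.
            down_interfaces.append(iface)
        else:
            events.extend(("NodeLostService", f"{iface}/{name}") for name in lost)

    all_down = bool(node) and len(down_interfaces) == len(node)
    if all_down and node_outage_on:
        # Node outage processing suppresses the lower-level events.
        return [("NodeDown", "node")]

    # Assumption: with node outage processing off, interface events are kept.
    events.extend(("InterfaceDown", iface) for iface in down_interfaces)
    return events


if __name__ == "__main__":
    sample = {
        "10.0.0.95": {"ICMP": True, "SSH": False},
        "172.16.0.1": {"ICMP": False},
    }
    print(outage_events(sample))
```

Keeping the suppression decision in one place mirrors the way the text describes node outage processing as a single switch in the poller configuration.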
147. ing the TCP plugin. It will look for the string "SSH" to be returned. Timeout is 3 seconds with 1 retry. The first protocol entry in capsd-configuration.xml is for ICMP: <protocol-plugin protocol="ICMP" class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false"> <property key="timeout" value="2000"/> <property key="retry" value="1"/> </protocol-plugin> It is possible to apply protocols to specific address ranges, or to exclude protocols from address ranges (the default is inclusion): <protocol-plugin protocol="ICMP" class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false"> <protocol-configuration scan="off" user-defined="false"> <range begin="172.31.100.1" end="172.31.100.15"/> <property key="timeout" value="4000"/> <property key="retry" value="3"/> </protocol-configuration> </protocol-plugin> Note the scan="off" for IP addresses 172.31.100.1-15. The SNMP protocol is special in that, if supported, it provides a way to collect performance data as well as poll for availability management information. SNMP parameters for different devices and ranges of devices are specified in /opt/opennms/etc/snmp-config.xml. Here is a sample: <snmp-config retry="3" timeout="800" version="v1" port="161" read-community="public" write-community="private"> <definition version="v2c"> <specific>10.0.0.121
148. is now deprecated in the <alarm-data> tag and only the clear-key element on the "good news" event is used to match against the reduction-key element of the "bad news" event, setting the severity to 2, i.e. Cleared. Also note from the <automation> tag that cosmicClear will run every 30 seconds. If users need to be notified of an event, then OpenNMS provides email and pager notifications out of the box, run by the notifd daemon. It is also possible to create other notification methods such as SNMP TRAPs or an arbitrary external program. There are several related configuration files in /opt/opennms/etc: • destinationPaths.xml: who, when and how to notify / escalate • notifd-configuration.xml: global parameters for notifd • notificationCommands.xml: notification methods (email, http, page) • notifications.xml: what events generate notifications, and where • javamail-configuration.properties: configuration for the java emailer (default). The main files that will need attention are destinationPaths.xml, notifd-configuration.xml and notifications.xml. Here is part of the examples file provided in /opt/opennms/etc/examples/destinationPaths.xml: <?xml version="1.0"?> <destinationPaths> <header> <rev>1.2</rev> <created>Wednesday, February 6, 2002 10:10:00 AM EST</created
149. isplay of services. Also note that some services run commands that are inherently local to the Nagios system, e.g. check_local_disk. The check_dns command runs nslookup on the Nagios system, but the host_name parameter can be used to specify the DNS server to query from. The commands are actually specified in the configuration file commands.cfg, which in turn calls executable plugins in /usr/local/nagios/libexec. [Screenshot, truncated: Nagios 3.0.1 Service Status Details For All Hosts, last updated Wed Jul 30 12:07:37 BST 2008; for host bino, the DNS Check service is OK (www.skills-1st.co.uk returns 212.74.28.15) and the PING service is OK (RTA 0.42 ms)]
150. istory Alert Summary Notifications Event Log Figure 19: Nagios Configuration for Alert Histogram. Note in the figure above that a host / service selection has already been prompted for and, having selected host, the specific host has been supplied. The following figure shows the resulting graph. Note the blue links towards the top left of the display, providing access to a filtered view of the events log (View History for this Host) and to notifications for this host. [Screenshot, truncated: Nagios Host Alert Histogram for host group-100-r1, report period Last 7 Days (24-07-2008 12:43:57 to 31-07-2008 12:43:57), breakdown by Day of the Month, all host events, hard and soft states; links to View Trends, View Availability Report, View Status Detail, View History and View Notifications For This Host]
151. ject ID .1.3.6.1.4.1.9.1.217 [Screenshot, truncated: OpenNMS node page for switch.skills-1st.co.uk (10.0.0.253), a Cisco C2900XL running IOS 12.0(5.1)XP, location Skills 1st Office, with no outages in the last 24 hours; interface list shows VLAN1 and ports FastEthernet0/1 (Linksys wireless access point), FastEthernet0/2 (Blue Atlas), FastEthernet0/3 (Brick), FastEthernet0/4 (Blossom), FastEthernet0/5 and FastEthernet0/6] Figure 31: OpenNMS node detail for a switch showing switch ports. The first stanza in capsd-configuration.xml defines service polling parameters: <capsd-configuration rescan-frequency="86400000" initial-sleep-time="300000" management-policy="managed" max-suspect-thread-pool-size="6" max-rescan-thread-pool-size="3" abort-protocol-scans-if-no-route="false"> This defines that capsd will wait 5 minutes after OpenNMS starts before starting the capsd discovery process. It will rescan to discover services every 24 hours. The default management policy for all IP addresses found in new suspect events will be to scan for
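The capsd timing values in this extract are given in milliseconds, which is easy to misread. As a quick check of the arithmetic, the following Python sketch converts the two values quoted above into readable durations. The dictionary `capsd_timings` and the helper `ms_to_timedelta` are names invented for the example and are not part of OpenNMS.

```python
from datetime import timedelta

# Values copied from the capsd-configuration stanza quoted in the text.
capsd_timings = {
    "rescan-frequency": 86400000,   # milliseconds
    "initial-sleep-time": 300000,   # milliseconds
}


def ms_to_timedelta(milliseconds):
    """Convert a millisecond count into a timedelta for easy comparison."""
    return timedelta(milliseconds=milliseconds)


if __name__ == "__main__":
    for key, value in capsd_timings.items():
        print(key, "=", ms_to_timedelta(value))
```

The results agree with the text: the rescan frequency is 24 hours and the initial sleep time is 5 minutes.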
152. le>events/NetSNMP.events.xml</event-file> The events subdirectory currently has around 100 files in it. For performance reasons it makes sense to edit eventconf.xml and remove any <event-file> stanzas that are not relevant for your organisation. Also note that the whole OpenNMS system must be recycled in order for changes to eventconf.xml to take effect. 7.3.3 SNMP TRAP reception and configuration. OpenNMS will automatically monitor the SNMP TRAP port (UDP 162) with the trapd process. The /opt/opennms/etc/events directory contains around 100 files which specify SNMP TRAP translations into OpenNMS events. If a TRAP is sent to OpenNMS that it has no configuration for, then it will use a default mapping found in default.events.xml. [Screenshot, truncated: OpenNMS Events List, 09 Jul 2008 23:54, showing search results 1-10 of 1770 for outstanding events]
153. ler.TcpMonitor"/> <monitor service="SQLServer" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="SSH" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="IMAP" class-name="org.opennms.netmgt.poller.ImapMonitor"/> <monitor service="POP3" class-name="org.opennms.netmgt.poller.Pop3Monitor"/> <monitor service="NSClient" class-name="org.opennms.netmgt.poller.NsclientMonitor"/> <monitor service="NSClientpp" class-name="org.opennms.netmgt.poller.NsclientMonitor"/> <monitor service="Windows-Task-Scheduler" class-name="org.opennms.netmgt.poller.Win32ServiceMonitor"/> Preceding the monitor service stanzas in poller-configuration.xml are the definitions of services. These look very similar to the entries in capsd-configuration.xml, which makes sense as these are the regular polling definitions for the same services that capsd has already found; however, parameters in the poller file may well take different values. For example, the discovery service may be allowed longer timeouts and more retries than the polling service: <service name="ICMP" interval="300000" user-defined="false" status="on"> <parameter key="retry" value="2"/> <parameter key="timeout" value="3000"/> </service> <service name="SNMP" interval="300000" user-defined="false" status="off"> <parameter key="retry" value="2"/> <parameter key="timeout" value="3000"/> <parameter key="port" value="161"
154. lias clients members group-100-s1, group-100-c1, group-100-c2, group-100-c3, group-100-a1 } define hostgroup{ hostgroup_name raddle alias raddle members server, group-100-r1, group-100-r2, group-100-r3, group-100-s1, group-100-c1, group-100-c2, group-100-c3, group-100-a1 } Figure 11: Nagios hosts.cfg host group definitions. Host groups are also used in the GUI to display data based on host groups. [Screenshot, truncated: Nagios 3.0.1 Current Network Status for host groups, last updated Wed Jul 2 17:00:32 BST 2008, with links to View Service Status Detail, View Host Status Detail, View Status Summary and View Status Grid For All Host Groups]
155. ll be archived into the Event History database in one of four ways: • Manually moved to the historical database ("historifying") • Automatic correlation (good event clears bad event) • An event class rule • A timeout. Events automatically have a duplication detection rule applied, so that if an event of the same class, from the same device, with the same severity arrives, then the repeat count of an existing event will simply be incremented. Global configuration parameters for the event system can be configured from the Event Manager left-hand menu. By default, status events of severity below Error are aged out to the Event History database after 4 hours. Historical events are never deleted. [Screenshot: Event Manager settings, including Cache Timeout, Cache Clear Count, History Cache Timeout, History Cache Clear Count, Event Aging Threshold (hours), Don't Age This Severity and Above, Delete Historical Events Older Than (days), Default Availability Report (days) and Default Syslog Priority] Figure 104: Zenoss Event Manager configuration. 8.3.1 Event console. The main Event Console is reached from the Event Console menu on the left. The default is to show all status events with a severity of Info or higher, sorted first by severity
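The duplication detection rule described in this extract (same class, same device, same severity increments a repeat count) can be illustrated with a short Python sketch. This is not Zenoss code: the function `add_event`, the in-memory `console` dictionary and the example event class strings are assumptions made purely for the illustration.

```python
def add_event(console, device, event_class, severity, summary):
    """Record an event, collapsing duplicates into a repeat count.

    `console` is a plain dict keyed on (device, event_class, severity);
    it stands in for the status event table and is hypothetical.
    Returns the repeat count for the event after it has been recorded.
    """
    key = (device, event_class, severity)
    if key in console:
        # Same class, device and severity: only bump the counter.
        console[key]["count"] += 1
    else:
        console[key] = {"count": 1, "summary": summary}
    return console[key]["count"]


if __name__ == "__main__":
    console = {}
    add_event(console, "bino", "/Status/OSProcess", "Error", "Process not running: ftp")
    add_event(console, "bino", "/Status/OSProcess", "Error", "Process not running: ftp")
    add_event(console, "bino", "/Status/OSProcess", "Warning", "Process not running: ftp")
    for key, record in console.items():
        print(key, record["count"])
```

Keying on the full triple means that a change of severity creates a new entry rather than incrementing the old one, which is how the rule reads in the text.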
156. lla Firefox [Screenshot: Netdisco Device View for switch.skills-1st.co.uk (10.0.0.253): model cisco 2924XLv, OS version ios 12.0(5.1)XP, location Skills 1st Office, contact andrew.findlay at skills-1st.co.uk, description "Cisco Internetwork Operating System Software IOS (tm) C2900XL Software (C2900XL-C3H2S-M), Version 12.0(5.1)XP", last discovered Thu Jun 26 17:42:02 2008, last MacSuck Thu Jun 26 20:00:27 2008; ports FastEthernet0/1 (full / auto, Linksys wireless access point, 100 Mbps, VLAN 1) and FastEthernet0/2 (half / auto, Blue Atlas, 10 Mbps, VLAN 1)] Figure 6: Netdisco details of a switch device including ports. 5.3 T
157. [Screenshot, truncated: Cacti Devices list, rows 1 to 7 of 7, all with status Up: adsl2.skills-1st.co.uk, bino.skills-1st.co.uk, deodar.skills-1st.co.uk, group-100-r1.class.example.org, group-100-r2.class.example.org, group-100-r3.class.example.org and Localhost (127.0.0.1)] Figure 1: Cacti main Devices panel. [Screenshot, truncated: Cacti graph "deodar.skills-1st.co.uk - Traffic - eth0" in bits per second over 24 hours; Inbound current 3.65 k, average 555.34]
158. me user ids and passwords; the default logon for the Web console is nagiosadmin, with the password you specified during installation. Here is a screenshot of the Nagios Tactical Overview display. [Screenshot, truncated: Nagios 3.0.1 Tactical Monitoring Overview, last updated Tue Jul 1 12:05:49 BST 2008, showing monitoring performance (check execution times and latencies), 12 active host checks and 34 active service checks, no passive checks, host health with 9 hosts up, and the enabled monitoring features: Flap Detection, Notifications, Event Handlers, Active Checks and Passive Checks]
159. mechanism but events do not continue to be raised for existing problems | Yes | Yes. Service / host dependencies: Nagios Yes; OpenNMS No. Root cause analysis: Nagios UNREACHABLE status for devices behind a network single point of failure, also host / service dependencies; OpenNMS Outages / Path outages; Zenoss No. 9.1.4 Performance management (Nagios | OpenNMS | Zenoss). Collect performance data using SNMP: No | Yes | Yes. Collect performance data using other methods: No | NSClient, JMX, HTTP | ssh, telnet, other methods using ZenPacks. Threshold performance data: No | Yes | Yes. Graph performance data: No | Yes, lots provided OOTB | Yes, lots provided OOTB. MIB compiler: No | No | Yes. MIB Browser: No | No | No, though a MIB Browser ZenPack is said to be available for 2.2. 9.2 Product high points and low points. This section is far more subjective; your mileage may vary! 9.2.1 Nagios "goodies" and "baddies". Good points: • Good, stable code for systems management • Good correlation between service events and host events • Command to check validity of config files • Command to reload config files without disrupting Nagios operation • Good documentation. Bad points: • No auto-discovery • Weak event console • No OOTB collection or thresholding of performance data • No easy way to receive and interpret SNMP TRAPs • No MIB compiler or browser
160. ml file. Obviously you can specify different packages, with different address ranges, collection intervals and with different collection keys. You can also specify data collectors other than SNMP, such as NSClient, JMX and HTTP. See http://blogs.opennms.org/?p=242 for a note on using an HTTP data collector. The datacollection-config.xml file defines one or more SNMP data collections that Tarus Balog, the prime developer behind OpenNMS, calls a "scheme", to differentiate it from the package defined in the collectd configuration file. These schemes bring together OIDs for collection into groups, and the groups are mapped to systems. The systems are mapped to interfaces by a device's systemOID. In addition, each scheme controls how the data will be collected and stored. Fundamentally, OpenNMS uses RRD Tool (Round Robin Database Tool) to store performance data. This paper is not a tutorial on RRD Tool, so please follow the reference to RRD at the end of this paper for more information. The basis of RRD is that a fixed amount of space is allocated for a given database when it is created. It holds data for a given period of time, say 1 month, 1 year, etc. The sampling interval is known, so you know how many datapoints will go into the database and hence how much space is required. Once the database is full, newer datapoints will replace the oldest ones, cycling around. <?xml version="1.0"?> <datacollection-config rrdRepository="/opt/openn
161. ms/share/rrd/snmp/"> <snmp-collection name="default" maxVarsPerPdu="10" snmpStorageFlag="select"> <rrd step="300"> <rra>RRA:AVERAGE:0.5:1:2016</rra> <rra>RRA:AVERAGE:0.5:12:1488</rra> <rra>RRA:AVERAGE:0.5:288:366</rra> <rra>RRA:MAX:0.5:288:366</rra> <rra>RRA:MIN:0.5:288:366</rra> </rrd> Figure 56: OpenNMS datacollection-config.xml collection and RRD parameters. The <rrd> stanza specifies how data will be stored in a Round Robin Archive (RRA). The snapshot shown in the figure above specifies: • <rrd step="300">: data to be saved every 5 minutes, per step. • RRA:AVERAGE:0.5:1:2016: create an RRA with values AVERAGE'd over 1 step, i.e. this data is raw, not consolidated. The RRA will have 2016 rows, representing 7 days of data (5 minute steps, so 12 per hour x 24 hours x 7 days = 2016). Consolidate the samples provided 0.5 (half) of them are not UNKNOWN; otherwise the consolidated value will be UNKNOWN. • RRA:AVERAGE:0.5:12:1488: create an RRA with values AVERAGE'd over 12 steps, i.e. this data is consolidated over 1 hour. The RRA will have 1488 rows, representing 2 months of data (1 hour consolidations x 24 hours x 62 days = 1488). Consolidate the samples provided 0.5 (half) of them are not UNKNOWN; otherwise the consolidated value will be UNKNOWN. • RRA:AVERAGE:0.5:288:366: create an RRA with values AVERAGE'd over 288
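The retention arithmetic in this extract can be checked mechanically. The Python sketch below parses an RRA definition string of the form shown above and, given the step in seconds, reports how long each row covers and how many days the archive holds. The function name `rra_retention` is invented for this example; only the RRA string layout and the 300 second step are taken from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rra_retention(rra_spec, step_seconds):
    """Return (seconds covered by one row, days held by the whole archive).

    `rra_spec` looks like "RRA:AVERAGE:0.5:12:1488", i.e.
    RRA : consolidation function : xff : steps per row : number of rows.
    """
    _, _function, _xff, steps_per_row, rows = rra_spec.split(":")
    row_seconds = step_seconds * int(steps_per_row)
    return row_seconds, row_seconds * int(rows) / SECONDS_PER_DAY


if __name__ == "__main__":
    for spec in (
        "RRA:AVERAGE:0.5:1:2016",
        "RRA:AVERAGE:0.5:12:1488",
        "RRA:AVERAGE:0.5:288:366",
    ):
        row_seconds, days = rra_retention(spec, step_seconds=300)
        print(spec, "->", row_seconds, "seconds per row,", days, "days")
```

With a 300 second step, the first archive holds 7 days of 5 minute samples and the second holds 62 days of hourly averages, matching the figures in the text; the third works out to one row per day for 366 days.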
162. [Screenshot, truncated: Zenoss Event Console listing status events with first seen, last seen and count columns, for example: wsv2k1.class.example.org, Windows Service "Schedule" is down; group-100-a1.class.example.org, ip 172.31.100.3 is down; bino.skills-1st.co.uk, process not running: ftp; group-100-s2.class.example.org and group-100-r1.class.example.org, snmp agent down; localhost, threshold of zenperfsnmp cycle time exceeded; zenoss.skills-1st.co.uk, zenperfsnmp heartbeat failure; adsl2.skills-1st.co.uk, threshold of Utilization 75 perc exceeded] Figure 105: Zenoss Event Console. From the Console, events can be selected by checking the box alongside the event, and the drop-down can be used for various functions including Acknowledge and Move to History. The drop-down can also be used to generate any tes
163. n criteria. 4.1.1 Mandatory Requirements: • Open Source free software • Very active fora / maillists • Established history of community support and regular fixes and releases • Integrated network and systems management, including: ◦ Configuration management ◦ Availability management ◦ Problem management ◦ Performance management • Centralised open database • Both Graphical User Interface (GUI) and Command Line Interface (CLI) • Easy deployment of agents • Scalability to several hundred devices • Adequate documentation. 4.1.2 Desirable Requirements: • Support for SNMP V3 • User management to limit aspects of the tool to certain individuals • Graphical representation of network • Controllable remote access to discovered devices • Easy server installation • No requirement for proprietary web browsers • Scalability to several thousand devices • Good documentation • Availability of (chargeable) support. 4.2 Defining network and systems management. The "Integrated network and systems management" requirement needs some further expansion. 4.2.1 Network management: • Configuration: ◦ Automatic, controllable discovery of network Layer 3 (IP) devices ◦ Topology display of discovered devices ◦ Support for SNMP V1, V2 and preferably V3 ◦ Ability to discover devices that do not support ping ◦ Ability to discover devices that do not support SNMP ◦ Central, open database to store information for these devices ◦ Ability to add to t
164. ne [Screenshot, truncated: Zenoss Networks page (tabs Overview, zProperties, Modifications) listing 23 discovered subnetworks with their number of IPs and free IPs, and a drop-down menu offering Add Network, Delete Networks and Discover Devices] Figure 86: Zenoss Networks class with drop-down menu. Once the presence of a network has been discovered, devices can automatically be discovered on that network; this uses a "spray ping" mechanism. There is a drop-down menu from the top left corner of the Networks page, which works fine for simple Class C networks. Although the GUI does manage to display subnetworks accurately even if the subnet mask is not on a byte boundary, the Discover Devices menu does not honour the subnet mask. However, a good feature of Zenoss is that there is a command line (CLI) for virtually everything, and the CLI for device discovery on a network does honour supplied netmasks. For example: zendisc run --net 10.0.0.0/24 Note that the Zenoss discovery algorithm is very dependent on getting routing tables using SNMP, and the Zenoss server must support SNMP itself. For devices that do not supp
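Why it matters that discovery honours the netmask can be seen by listing the addresses a ping sweep would have to cover. The sketch below uses Python's standard ipaddress module; it does not reproduce how zendisc works internally, and the function name `sweep_targets` and the /28 example network are assumptions chosen for the illustration.

```python
import ipaddress


def sweep_targets(cidr):
    """Return the usable host addresses in a network given in CIDR form."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    class_c = sweep_targets("10.0.0.0/24")
    print(len(class_c), class_c[0], class_c[-1])

    # A mask that is not on a byte boundary covers far fewer addresses.
    small = sweep_targets("172.31.100.16/28")
    print(len(small), small[0], small[-1])
```

A sweep that ignored the /28 mask and assumed a full Class C network would ping 254 addresses instead of 14, most of them outside the subnet.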
165. ng the Import and Export functionalities can be found through this link as well. Scheduled Outages provides an interface for adding and editing scheduled outages. You can pause notifications, polling, thresholding and data collection (or any combination of the four) for any interface / node for any time. Manage Surveillance Categories allows you to add and delete surveillance categories and edit the list of nodes belonging to each category. Manage Applications allows you to manage applications (groups of services on interfaces). Manually Provisioned Nodes allows you to manually add nodes, interfaces and services to ... Selecting the Manage Thresholds option displays all thresholds currently configured in thresholds.xml. [Screenshot, truncated: OpenNMS Admin, Threshold Groups page, 06 Aug 2008, listing threshold groups by name with their RRD repository under /opt/opennms/share/rrd/snmp/ and an Edit link for each]
166. noss [Screenshot, truncated: Zenoss zCollectorPlugins selection for path /Network/Router/Cisco, listing the plugins zenoss.snmp.NewDeviceMap, zenoss.snmp.DeviceMap, zenoss.snmp.CiscoMap, zenoss.snmp.InterfaceMap, zenoss.snmp.CiscoHSRP and zenoss.snmp.RouteMap, which can be dragged to change their order] Figure 99: Zenoss zCollectorPlugins for device group-100-r1.class.example.org. When modifying characteristics for specific devices, do note that the main page menu, from the arrow drop-down at the top left corner, has both a More submenu (which includes zProperties among other things) and a Manage submenu. [Screenshot, truncated: Zenoss device page for /Devices/Server/Windows/wsvr2k1.class.example.org with the drop-down menu open, showing the More and Manage submenus and entries such as zProperties, Run Commands, Templates, Administration and Collector Plugins]
167. ode group-100-r1.class.example.org [Screenshot, truncated: OpenNMS node page with links View Events, View Alarms, Asset Info, Telnet, Resource Graphs, Rescan, Admin and Update SNMP; availability over the last 24 hours per interface and service; recent events (interface 172.30.100.1 is down, then up; SNMP data collection on interface 10.191.100.4 failed; alarm 281 for interface 10.191.100.4 service SNMP was escalated); recent outages (172.30.100.1 ICMP lost 30/06/08 16:47:35, regained 30/06/08 16:48:52, outage ID 102; 172.30.100.1 ICMP lost 29/06/08 23:36:44, regained 30/06/08 01:45:35, outage ID 57; 10.191.100.4 ICMP lost 29/06/08 23:36:44)]
168. on hp7410 [Screenshot, truncated: OpenNMS home page showing outstanding notices and availability by category: Network Interfaces 86.691%, Web Servers 83.927%, Email Servers 100.000%, DNS and DHCP Servers 94.905%, Database Servers 99.976%, JMX Servers 100.000%, Other Servers 95.195%, Overall Service Availability 89.874%] Figure 30: Main default window for OpenNMS. The following sections will describe how to configure different aspects of OpenNMS by editing XML configuration files. It is possible to configure many aspects of OpenNMS using GUI-driven menus. See section 7.5, "Managing OpenNMS", for a brief description. 7.1 Configuration: Discovery and topology. 7.1.1 Interface discovery. OpenNMS uses a straightforward file for interface discovery; by default this is /opt/opennms/etc/discovery-configuration.xml. It comes with some commented-out defaults, so by default it discovers nothing. This file needs modifying to specify include ranges and exclude ranges to ping; specific IP addresses for discovery can also be configured. The first stanza specifies the characteristics of the ping discovery m
169. ork Map [Screenshot, truncated: list of seven Zenoss performance reports: Aggregate Reports, Availability Report, CPU Utilization, Filesystem Util Report, Interface Utilization, Memory Utilization and Threshold Summary] Figure 127: Zenoss Performance Reports menu. 8.5 Zenoss summary. Zenoss is an extremely comprehensive systems and network management product, satisfying most of my requirements. One feels that the object-oriented architecture is extremely flexible and powerful, with most things you require already configured out of the box. The automatic discovery and topology mapping options are the most powerful of the products discussed here. It can accommodate Nagios and Cacti plugins and has its own addon architecture in the form of ZenPacks. Zenoss will use SNMP to gain status and performance information from a device, but it also has ssh and telnet as alternatives for those devices where SNMP is inappropriate. The Quick Start Guide gets you running fast, and the Admin Guide provides what it says: a reasonably comprehensive Administrator's Guide. There is also a book by Michael Badger, published June 2008, "Zenoss Core Network and System Monitoring", which is well worth the investment, available both in paper and in electronic format. Howe
170. ort ping but do support SNMP, they can be added manually with the Add Device menu. The zProperties of the device (or class of devices, if you create a subclass) should have zPingMonitorIgnore = True and zSnmpMonitorIgnore = False. There are three Zenoss processes that implement discovery: • zenmodeler can use SNMP, ssh and telnet to discover detailed information about devices. zenmodeler will only be run against devices that have already been discovered by zendisc. By default, zenmodeler runs every 6 hours. • zenwin detects Windows WMI services. • zendisc is a subclass of zenmodeler. It traverses routing tables using SNMP and then uses ping to detect devices on discovered networks. 8.1.2 Zenoss topology maps. Zenoss has an automatic topology mapping option which can display up to 4 hops from a selected device. It even seems to be able to understand networks served by several routers. [Screenshot: Zenoss Network Map centred on group-100-r1.class.example.org, showing networks 10.191.100.0, 172.30.100.0, 172.31.100.0 and 172.31.100.16 and devices group-100-a1, group-100-r2 and group-100-r3] Figure 87: Zenoss Network Map showing 4 hops from group-100-r1. 8.2 Availability monitoring
171. ou want a detailed event description, type the identifier into the Get details for Event ID box and hit Enter. You will then go to the appropriate details page. Figure 37: OpenNMS Events menu. The Advanced Search option provides several ways to filter events. By default, Outstanding events are displayed, i.e. events that have not been Acknowledged. [Screenshot: Advanced Event Search page with fields for Event Text Contains, TCP/IP Address Like, Node Label Contains, Severity, Service, Events After, Events Before, Sort By and Number of Events Per Page. To select events by time, check the box for the time range you wish to limit and fill in the time; to select a specific time span, check both boxes and enter the beginning and end of the range.] Figure 38: OpenNMS Advanced Event Search options. Note that if you wish to search on severity you ha
172. [Screenshot: OpenNMS Reports > KSC Reports page. Customized Reports offers 4 custom reports to view or modify (Memory stuff on bino, CPU stuff on bino, Response time on group-100-r2 interfaces, Router interface comparisons) with View, Customize, Create New, Create New From Existing and Delete options. Node SNMP Interface Reports offers a list of nodes to select for a performance report. Domain SNMP Interface Reports shows that no data has been collected by domain.] Customized Reports allows users to create, view and edit customized reports containing any number of prefabricated reports from any available graphable resource. Node and Domain Reports allows users to view automatically generated reports for any node or domain. These reports can be further edited and saved just like other customized reports. These reports list only the SNMP interfaces on the selected node or domain, but they can be customized
173. ping raddle machines - longer ping return trip time
    define service {
        use                  ping-service      ; Name of service template to use
        hostgroup_name       raddle
        service_description  PING
        check_command        check_ping!300.0,20%!1500.0,60%
    }
    # Define a service to check the disk space of the root partition on the local machine.
    # Warning if < 10% free, critical if < 5% free space on partition.
    define service {
        use                  local-service     ; Name of service template to use
        host_name            nagios3
        service_description  Root Partition
        check_command        check_local_disk!10%!5%!/
    }
    # Define a service to check DNS resolution for www.skills-1st.co.uk on bino.
    # The name to look up is defined in the check_dns stanza in commands.cfg.
    # The host_name parameter here is the DNS server to use in a local nslookup command.
    define service {
        use                  local-service     ; Name of service template to use
        host_name            bino
        service_description  DNS Check
        check_command        check_dns
    }
    # Define a service to check SNMP on bino.
    define service {
        use                  generic-service   ; Name of service template to use
        host_name            bino
        service_description  SNMP Check
        check_command        check_snmp!-C public -o sysUpTime.0
    }
Figure 16: Nagios services.cfg showing specific services. Note that services can be applied either to groups of hosts (hostgroup_name) or to specific hosts (host_name). As with hosts, it is possible to create groups of services to improve the flexibility of configuration and the d
174. [Screenshot: Nagios Linux host template. Its comments note that the template inherits other values from the generic host template, that Linux hosts are checked round the clock by default, that each host is actively checked every 5 minutes with retries scheduled at 1 minute intervals and a maximum of 10 checks, that notifications are resent every 2 hours to the admins, and that the definition is not registered because it is a template rather than a real host.] Host availability parameters are shown in the screenshot above: check_period is 24x7, check_interval is 5 mins, retry_interval is 1 min, max_check_attempts is 10, and check_command is check_host_alive, which is based on check_ping. [Screenshot: three "define host" template definitions, each setting name, use, parents, check_command, contact_groups and register. The first, host_10.191, covers hosts on the 10.191 network, inherits from generic-host and has bino as its parent.] bino is the router from
175. ps" type="counter" />
      <mibObj oid=".1.3.6.1.2.1.5.25" instance="0" alias="icmpOutAddrMasks" type="counter" />
      <mibObj oid=".1.3.6.1.2.1.5.26" instance="0" alias="icmpOutAddrMaskReps" type="counter" />
    </group>
    <group name="mib2-host-resources-storage" ifType="all">
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.3" instance="hrStorageIndex" alias="hrStorageDescr" type="string" />
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.4" instance="hrStorageIndex" alias="hrStorageAllocUnits" type="gauge" />
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.5" instance="hrStorageIndex" alias="hrStorageSize" type="gauge" />
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.6" instance="hrStorageIndex" alias="hrStorageUsed" type="gauge" />
    </group>
Figure 60: OpenNMS group definitions in datacollection-config.xml. Unfortunately, OpenNMS does not have a MIB compiler, so all MIB OIDs need to be manually specified in this file; the good news is that there are lots there out of the box. Once groups of MIB variables are declared, system stanzas say which group(s) are to be collected for any device whose system OID matches a particular pattern. Each SNMP MIB variable consists of an OID plus an instance. Usually that instance is either zero (0) or an index to a table. At the moment OpenNMS only understands a small number of table indices, for example the ifIndex index to the ifTable and the hrStorageIndex to the hrStorageTable
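The "OID plus instance" rule described above can be illustrated with a minimal Python sketch. It is not OpenNMS code: the helper name full_oid and the storage row index 1 are assumptions made purely for illustration, while the two base OIDs are taken from the group definitions shown in the figure.

```python
def full_oid(base_oid: str, instance) -> str:
    """Join a MIB object OID and its instance into the OID that is polled."""
    return f"{base_oid.strip('.')}.{instance}"


# Scalar object: the instance is always 0.
icmp_out_addr_masks = full_oid(".1.3.6.1.2.1.5.25", 0)

# Table column: the instance is the row index (hrStorageIndex here).
# The index value 1 is a made-up example row, not taken from a real device.
hr_storage_used_row1 = full_oid(".1.3.6.1.2.1.25.2.3.1.6", 1)

print(icmp_out_addr_masks)
print(hr_storage_used_row1)
```

For a table such as hrStorageTable, a collector would repeat the second call once per row index that the device reports.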
176. [Screenshot: Zenoss service class page for "domain" (Description: Domain Name Server, Port 53, Monitor: False, Service Keys: domain, tcp_00053, udp_00053). The Service Instances table lists TCP and UDP port 53 instances on bino.skills-1st.co.uk, wsvr2k1.class.example.org, deodar.skills-1st.co.uk and blue-atlas.skills-1st.co.uk; only the TCP instance on bino has Monitor set to True.] Figure 92: Zenoss devices running the domain (DNS) service on TCP 53 or UDP 53. The fact that a service has been detected does not imply that it is being monitored for availability; the default out of the box is that nothing is monitored. The Monitor column for devices shows whether active monitoring is taking place, and hence whether events may be generated. The Monitor field in the top part of the window shows the global default for this service. To turn on service monitoring globally for a particular service, use the Services menu to find the service in question. You can then use either the zProperties tab or the Edit tab to change the Monitor global default to True (the default as shipped is False). To turn on
177. r SNMP datasource ifInOctets on interface 10.191.100.3, parms: ds=ifInOctets, value=70624.0, previousValue=19591.0, multiplier=1.05, label=Unknown, ifLabel=eth0-000c29fb7555, ifIndex=2
    Event 217563, Warning, 05/08/08 23:54:51, node group-100-linux.class.example.o, interface 10.191.100.3, service SNMP, uei.opennms.org/threshold/relativeChangeExceeded: Relative change exceeded for SNMP datasource ifOutOctets on interface 10.191.100.3, parms: ds=ifOutOctets, value=15337.0, previousValue=14119.0, multiplier=1.05, label=Unknown, ifLabel=eth0-000c29fb7555, ifIndex=2
    Event 217538, Warning, 05/08/08 23:49:41, node server.class.example.org, interface 10.191.101.1, service SNMP, uei.opennms.org/threshold/relativeChangeExceeded: Relative change exceeded for SNMP datasource ifInOctets on interface 10.191.101.1, parms: ds=ifInOctets, value=400.5987209669021, previousValue=283.394439700244, multiplier=1.05, label=Unknown, ifLabel=eth0-000c29aea14f, ifIndex=2
Figure 74: OpenNMS Threshold events from various devices in the raddle network. For those who prefer not to edit XML configuration files, the OpenNMS Admin menu provides a GUI way to create and modify thresholds.
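The relativeChangeExceeded events above carry a value, a previousValue and a multiplier. A minimal Python sketch of how such a test could work follows. It assumes that a multiplier above 1 means "trigger when the new sample is at least multiplier times the previous one" and that a multiplier below 1 means the mirror-image test for a drop; the function name and the handling of a zero previous value are assumptions for illustration, not OpenNMS code.

```python
def relative_change_exceeded(previous: float, value: float, multiplier: float) -> bool:
    """Return True when value has moved past previous * multiplier."""
    if previous == 0:
        # Assumption: no meaningful ratio exists, so do not trigger.
        return False
    if multiplier >= 1.0:
        # Growth threshold, e.g. 1.05 means a rise of 5% or more.
        return value >= previous * multiplier
    # Shrink threshold, e.g. 0.95 means a fall of 5% or more.
    return value <= previous * multiplier


# Values taken from the ifOutOctets event shown in the figure.
print(relative_change_exceeded(14119.0, 15337.0, 1.05))
```

With the figures from the second event (14119.0 rising to 15337.0 against a multiplier of 1.05), the threshold of 14824.95 is passed, which is consistent with an event having been raised.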
178. ractise this is often not the case, and a solution for one aspect of systems management in one area of a business may become the de facto standard for a whole organisation. There are good reasons why this might come about. It is not practical to run a centralised Service Desk with a plethora of different tools. A Framework-based tool, with a centralised database and a common look and feel across both Graphical User Interface (GUI) and Command Line Interface (CLI), offering modules that deliver the different systems management disciplines, is a much more cost-effective solution than different piecemeal tools for different projects, especially when the cost of building and maintaining skills and educating users is taken into account. Tool integration is a large factor in the successful rollout of systems management. The concept of a single Configuration Management Database (CMDB) that all tools feed and use is key to this. A good tool delivers useful stuff easily, out of the box, and provides a standard way to then provide local customisation. At its most basic, the tool is a compiler or interpreter (C, bash) and the customisation is writing programs from scratch. At the complex end of the spectrum, the tool may be a large suite of modules from one of the big four commercial suppliers: IBM, HP, CA and BMC. At the really complex end is where you have several of the big commercial products involved, in addition to home grown
179. Figure 76: OpenNMS Configuring thresholds through the Admin menu. Using the Edit button permits modification of an existing threshold. [Screenshot: OpenNMS Admin > Threshold Groups > Edit Group page. Basic Thresholds lists Type, Datasource, Datasource type, Datasource label, Value, Re-arm, Trigger, Triggered UEI and Re-armed UEI, with two entries: high avgBusy5 (node, value 5.0, re-arm 4.0, trigger 2) and low freeMem (node, value 1024.0, re-arm 1000000.0, trigger 3). Expression-based Thresholds has the same columns with Expression in place of Datasource, and no entries.] The upper section is Basic Thresholds: thresholds on a single datasource. The threshold details are displayed; to edit the threshold, click on the Edit link on the same line as the threshold. To delete the threshold, click on Delete on the same line as the threshold you want to delete. To crea
180. rfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
    service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
    # HOST AND SERVICE PERFORMANCE DATA FILE MODES
    # This option determines whether or not the host and service performance data files are opened in
    # write ("w") or append ("a") mode. If you want to use named pipes, you should use the special
    # pipe ("p") mode, which avoids blocking at startup; otherwise you will likely want the default
    # append ("a") mode.
    host_perfdata_file_mode=a
    service_perfdata_file_mode=a
    # HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
    # These options determine how often (in seconds) the host and service performance data files are
    # processed using the commands defined below. A value of 0 indicates the files should not be
    # periodically processed.
    host_perfdata_file_processing_interval=0
    service_perfdata_file_processing_interval=0
    # HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
    # These commands are used to periodically process the host and service performance data files.
    # The interval at which the processing occurs is determined by the options above.
    host_perfdata_file_processing_command=process-host-perfdata-file
    service_perfdata_file_processing_command=process-service-perfdata-file
181. rgs:%parm[all]% </logmsg> <severity>Normal</severity> <alarm-data reduction-key="%source%:%snmphost%:%id%:%generic%:%specific%" alarm-type="2" /> </event>
Figure 46: OpenNMS Definition in default.events.xml for an unknown specific trap. This example event will match any TRAP whose generic field is equal to 6. Note, as with other configurations in eventconf.xml, that this definition will only match the incoming TRAP if no previous definition (higher in the file or in include files) has already matched it. The mask element name tag must be one or more of the following: uei, source, host, snmphost, nodeid, interface, service, id (OID), specific, generic. It is possible to use the % symbol to indicate a wildcard in the mask values. SNMP TRAPs often have additional data with them, known as varbinds. This data can be accessed using the <parm> element, where each parameter consists of a name and a value. %parm[all]% will return a space-separated list of all parameter values in the form parmName1="parmValue1" parmName2="parmValue2" etc. %parm[values-all]% will return a space-separated list of all parameter values associated with the event. %parm[names-all]% will return a space-separated list of all parameter names associated with the event. %parm[<name>]% will return the value of the parameter named <name> if it exists. parm Will
182. [Screenshot: KSC graph customisation page showing a sample graph and the "Choose graph options" form: Title; Timespan (7 day), which selects the relative start and stop times for the report; Prefabricated Report (netsnmp.cpuUsage), which selects the prefabricated graph report to use; Graph Index (1), which selects the desired position in the report for the graph to be inserted; and buttons to cancel edits, refresh the sample view, choose a different resource or finish editing the graph.] the default reports to include in your own customised reports. You can include several different graphs from the same or different nodes in your KSC report. 7.4.3 Thresholding. The thresholding capability in OpenNMS has changed fairly significantly over time; see http://www.opennms.org/index.php/Thresholding#Merge_into_collectd for a good explanation. Before OpenNMS 1.3.10, collectd collected data and threshd performed thresholding, as two separate processes. This design used a range parameter in threshd-configuration.xml to get around problems caused by the asynchronous nature of collectd and threshd. OpenNMS 1.3.10 merged the thresholding functionality into collectd and introduced a new parameter into collectd-configuration.xml, <parameter key="thresholding-group" value="default-snmp"/>, where the value of the thresholding group matched a definition in threshd conf
183. rk management platform developed under the Open Source model. It is a Java application that runs under several flavours of Linux. A VMware Virtual Machine (VM) is also available with the latest release of OpenNMS, which makes initial evaluation very easy without having to go through a full build process. There is also an online demo system, which appears to be monitoring real kit and which gives a good first taste of the product. The following section is based on the VM download, which is OpenNMS 1.5.93 based on Mandriva; it worked very easily. The VM was set up for DHCP, but I modified the Operating System files to use a local fixed address, with the VM network bridged to my local environment. To access the OpenNMS Web Console, point your browser at http://opennms:8980/opennms. The default logon id is admin with a password of admin. Here is a screenshot of the main default window of OpenNMS. [Screenshot: OpenNMS Web Console home page at http://opennms:8980/opennms/index.jsp, logged in as admin, with the menu bar (Node List, Search, Outages, Path Outages, Dashboard, Events, Alarms, Notifications, Assets, Reports, Charts, Surveillance Map, Admin, Help) and the Nodes with Outages, Percentage change over past 24 hours and Notifications panels]
184. ropdown menu and the More submenu to choose All Templates. [Screenshot: Zenoss All Performance Templates page, with columns Name, Definition Path and Description. Device, /Devices: basic template that only collects sysUpTime. Device, /Devices/Network/Router/Cisco: Cisco template that collects cpu and free memory, with a CPU threshold at 90%. Device, /Devices/Server: Net-SNMP template for late vintage unix device, with a CPU threshold. Device, /Devices/Server/Windows: Windows template that requires Informant MIB. Device, /Devices/Server/Scan: blank device template, no collection on port scanned devices. Device, /Devices/Server/Cmd: ZenPlugin template for late vintage unix device, with a CPU threshold. Device, /Devices/Power/UPS/APC: APC Device Profile that tracks battery capacity, load and runtime. Device, /Devices/Ping: blank template. Device_HRMIB, /Devices/Server/Windows/devices/wsvr2k1.class.example.org: Windows template that requires Host Resources MIB. Device_HRMIB, /Devices/Server: Windows template that requires Host Resources MIB. FileSystem, /Devices/Server: Filesystem template that uses HOST-RESOURCES mib, with a 90% threshold.] FileSystem
185. [Screenshot: Nagios Event Log for 31-07-2008, showing SERVICE ALERT, HOST ALERT and HOST NOTIFICATION entries for the group-100 devices, for example "PING OK - Packet loss = 0%, RTA = 74.60 ms" for group-100-r3 and "CRITICAL - Host Unreachable" for group-100-c1.class.example.org, with notifications sent to nagiosadmin by notify-host-by-email] Figure 18: Nagios Event Log. By default, the event log is displayed in one-hourly sections. The log shows the event status and also shows whether a Notification has been generated (the megaphone symbol). This display is effectively simply showing u
186. [Screenshot: Netdisco Device View for group-100-r1.class.example.org (10.191.100.4), a Cisco 7206 running IOS 12.0(12), showing device details (location, contact, model, serial, OS version, description, uptime, first and last discovery times, aliases), the port list with connected devices, and the Port View options for choosing columns such as Name, VLAN, Duplex, Description, Spanning Tree, LastChange, Speed, Type, MAC/IP, MTU and Connected Device] Figure 5: Netdisco details of router device
187. [Screenshot: Nagios alert list with entries such as "PING OK - Packet loss = 0%, RTA = 5.73 ms", "PING CRITICAL - Packet loss = 100%" and "Host Check Timed Out"] Figure 22: Nagios Alert Summary for group-100-r1. 6.3.2 Internally generated events. Nagios has the concept of soft errors and hard errors to allow for occasional glitches in host and service monitoring. Any host or service monitor can specify, or inherit, parameters for the check interval under OK conditions, the check interval under non-OK conditions, and the number of check attempts that will be made.
    Host parameters:
        check_interval        default 5 mins    check interval when host OK
        retry_interval        default 1 min     check interval when host non-OK
        max_check_attempts    default 4         number of attempts before HARD event
    Service parameters:
        normal_check_interval default 10 mins
        retry_check_interval  default 2 mins
        max_check_attempts    default 3         number of attempts before HARD event
When a non-OK status is detected, a soft error is generated for each sampling interval until max_check_attempts are exhausted, after which a hard event will be generated. At this point the polling interval reverts to the check_interval rather than the retry_interval.
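The soft/hard behaviour described above can be sketched as a small Python simulation. This is a simplified illustration, not Nagios code: the CheckPolicy class, the classify function and the tuple layout are all hypothetical names introduced here, and the sketch ignores details such as flapping and host/service dependencies.

```python
from dataclasses import dataclass


@dataclass
class CheckPolicy:
    check_interval: int = 5       # minutes between checks when OK, or after a HARD event
    retry_interval: int = 1       # minutes between rechecks while the error is still SOFT
    max_check_attempts: int = 3   # non-OK results needed before the error becomes HARD


def classify(results, policy):
    """Map a sequence of check results (True = OK) to (state, attempt, next_interval)."""
    timeline = []
    failures = 0
    for ok in results:
        if ok:
            failures = 0
            timeline.append(("OK", 1, policy.check_interval))
            continue
        failures += 1
        if failures < policy.max_check_attempts:
            # Still retrying: soft error, recheck at the shorter retry interval.
            timeline.append(("SOFT", failures, policy.retry_interval))
        else:
            # Attempts exhausted: hard event, polling reverts to check_interval.
            timeline.append(("HARD", policy.max_check_attempts, policy.check_interval))
    return timeline


for entry in classify([True, False, False, False, False, True], CheckPolicy()):
    print(entry)
```

In this run the first two failures are soft errors rechecked every minute, the third failure becomes a hard event, and from then on the check falls back to the 5 minute interval until the host or service recovers.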
188. [Screenshot: OpenNMS Node List page at http://opennms:8980/opennms/element/nodeList.htm, listing 26 discovered nodes in the skills-1st.co.uk and class.example.org domains, including adsl2, bino, cisco, deodar, nagios, nagios3, opennms, switch, zenoss and the group-100 routers and switches, with a Show interfaces option] Figure 32: OpenNMS Node List of discovered nodes
189. s provide a stock way to easily visualize the critical SNMP data collected from managed nodes and interfaces throughout your network. [Screenshot: OpenNMS Resource Performance Reports page, where "Choose a resource for a custom performance report" offers a list of nodes such as adsl2.skills-1st.co.uk, bino.skills-1st.co.uk and group-100-a1.class.example.org] Figure 63: OpenNMS Standard performance reports. The standard performance reports display various collected values for one particular node, which you choose from the menu provided. The different categories provide: node-level performance data such as TCP connections, CPU and memory; interface data for each interface, such as bits in/out; response time data for services such as ICMP, DNS and SSH; and disk space information from the ucd-snmp MIB.
190. s provides many plugins for port monitoring, including generic TCP and UDP monitors. The check_snmp plugin could be used to check SNMP parameters from the Host Resources MIB, if a target supports this. Nagios also provides remote agents, NSClient for Windows and NRPE for Unix/Linux systems, which provide a much more customisable definition of system monitoring. Services are typically defined in services.cfg. As with host definitions, services can be defined in a class hierarchy where characteristics of an object are inherited from its parent. [Screenshot: the generic-service template, defined in templates.cfg, which also defines local-service. Neither is a real service, just a template. The generic-service definition sets name, active_checks_enabled, passive_checks_enabled, parallelize_check, obsess_over_service, check_freshness, notifications_enabled, event_handler_enabled, flap_detection_enabled, failure_prediction_enabled, process_perf_data, is_volatile, check_period, max_check_attempts, normal_check_interval, retry_check_interval, contact_groups, notification_options, notification_interval, notification_period and register. The local-service definition sets name, use, max_check_attempts, normal_check_interval, retry_check_interval and register.]
191. s the list of Template names that are bound to this device or component. For device components this is usually just the meta type of the component, e.g. FileSystem, CPU, HardDisk, etc. For devices this list is the list of names in the device's zDeviceTemplates zProperty. [Screenshot: Zenoss zProperties Configuration page for /Devices/Server/Linux/devices/bino.skills-1st.co.uk, with columns Property, Value, Type and Path, listing zCollectorClientTimeout, zCollectorDecoding, zCollectorLogChanges, zCollectorPlugins, zCommandCommandTimeout, zCommandCycleTime, zCommandExistanceTest, zCommandLoginTimeout, zCommandLoginTries, zCommandPassword, zCommandPath, zCommandPort, zCommandProtocol, zCommandSearchPath, zCommandUsername, zDeviceTemplates, zFileSystemMapIgnoreNames and zFileSystemMapIgnoreTypes] Figure 117: Zenoss zProperties showing zDeviceTemplates. The default out of the box is that the device template called Device is bound to each device discovered. As noted in the previous screenshot, there are sev
192. short to find anything useful in them. My second personal non-preference is that OpenNMS is very wordy. The important information never seems to hit the eye on most screens. 8 Zenoss. Zenoss is a third Open Source, multi-function systems and network management tool. Unlike Nagios and OpenNMS, there is a free core offering (which does seem to have most things you need) and Zenoss Enterprise, which has extra add-on goodies, high availability configurations, distributed management server configurations and various support contract offerings, which include some education. For a comparison of the free and fee alternatives, try http www zenoss com product subscriptions. Zenoss offers configuration discovery (including layer 3 topology maps), availability monitoring, problem management and performance management. It is based around the ITIL concept of a Configuration Management Database (CMDB), the Zenoss Standard Model. Zope Enterprise Objects (ZEO) is the back-end object database that stores the configuration model, and Zope is the web application development environment used to display the console. The relational MySQL database is used to hold current and historical events. Zenoss 2.2 has recently been released, which provides stack builds: complete bundles including Zenoss and all its prerequisites. These stack installers are available for a wide variety of Linux platforms; standard RPM and source formats are also a
193. sr/local/nagios/var/nagios.log. Under the Reporting heading on the left-hand menu there are further options to display information on events (alerts). The Alert History is effectively the same as the Event Log. The Alert Histogram produces graphs for either a host or service, with customisable parameters. [Screenshot: Nagios 3.0.1 Host Alert Histogram, Step 3: Select Report Options, with fields for Report Period (Last 7 Days), custom Start Date and End Date (inclusive), Statistics Breakdown (Day of the Month), Events To Graph (All host events), State Types To Graph (Hard and soft states), Assume State Retention, Initial States Logged and Ignore Repeated States, plus a Create Report button]
194. ssfully. When zenperfsnmp actually collects data, it requires the correct instance as well as the correct MIB OID. If your test is successful but you subsequently see empty graphs with a message of Missing RRD file, then the problem is likely to be that the MIB instance is incorrect. Data sources can be added or deleted with the dropdown AddDataSource and DeleteDataSource menus. Thresholds can be applied to any of the data points collected, along with events to generate if the threshold is breached. [Screenshot: Zenoss threshold definition for CPU Utilization, with a Data Points selection box (laLoadInt5, memAvailReal, memAvailSwap, memBuffer, memCached, ssCpuRawIdle, ssCpuRawSystem, ssCpuRawUser, ssCpuRawWait, sysUpTime) and fields for Min Value, Max Value, Event Class (/Perf/CPU), Severity (Warning), Escalate Count and Enabled (True)] Figure 121: Zenoss Threshold on CPU collected data. All of the data points defined in the data sources section are supplied in the top selection box. If an event is to be generated, dropdowns are provided to select the event class and severity. You can also specify an escalation count. Thresholds can be added or deleted from the Thresholds dropdown menu.
195. Figure 81: Zenoss Linux Server devices. Figure 80: Zenoss Server Device classes. Discovery and monitoring is largely controlled by the combination of zProperties applied to a device, of which there are a large number, most with sensible defaults. Initially, basic SNMP and ping polling parameters should be configured in the zProperties page for Devices. [Screenshot: Zenoss Devices zProperties Configuration page listing zCollectorClientTimeout, zCollectorDecoding, zCollectorLogChanges, zCollectorPlugins, zCommandCommandTimeout, zCommandCycleTime, zCommandExistanceTest, zCommandLoginTimeout, zCommandLoginTries, zCommandPassword, zCommandPath, zCommandPort, zCommandProtocol, zCommandSearchPath, zCommandUsername, zDeviceTemplates, zFileSystemMapIgnoreNames and zFileSystemMapIgnoreTypes.]
196. t <mstation>localhost</mstation> </header>
    <path name="Email-Reporting">
      <target> <name>Reporting</name> <command>javaEmail</command> </target>
    </path>
    <path name="Page-Management">
      <target> <name>Management</name> <command>textPage</command> <command>javaPagerEmail</command> <command>javaEmail</command> </target>
    </path>
    <path name="Page-Network/Systems/Management">
      <target interval="15m"> <name>Network/Systems</name> <command>textPage</command> <command>javaPagerEmail</command> <command>javaEmail</command> </target>
      <escalate delay="15m">
        <target> <name>Management</name> <command>textPage</command> <command>javaPagerEmail</command> <command>javaEmail</command> </target>
      </escalate>
    </path>
Figure 52: OpenNMS Example entries in destinationPaths.xml. The <name> tag specifies a user or group of users defined in OpenNMS. The <command> tag specifies a method that must be defined in notificationCommands.xml. Note that escalations are possible. When an event is received for which a notification is required, OpenNMS "walks" the destination path. We say that the destination path is walked because it is often a series of actions performed
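To make the idea of walking a destination path concrete, here is a minimal sketch that reads a simplified fragment shaped like the entries in excerpt 196 and lists who would be contacted, by which commands, and after what delay. It is an illustration only, not how OpenNMS implements notifications: the root element name, the absence of an XML namespace and header, and the helper names `to_minutes` and `walk_path` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment modelled on the figure; the real file
# also carries a header and may declare an XML namespace.
SAMPLE = """
<destinationPaths>
  <path name="Page-Network/Systems/Management">
    <target interval="15m">
      <name>Network/Systems</name>
      <command>textPage</command>
      <command>javaPagerEmail</command>
      <command>javaEmail</command>
    </target>
    <escalate delay="15m">
      <target>
        <name>Management</name>
        <command>textPage</command>
        <command>javaPagerEmail</command>
        <command>javaEmail</command>
      </target>
    </escalate>
  </path>
</destinationPaths>
"""

MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 1440}


def to_minutes(text):
    """Convert a duration such as '15m' or '2h' to whole minutes."""
    return int(text[:-1]) * MINUTES_PER_UNIT[text[-1]]


def walk_path(xml_text, path_name):
    """Return (minutes_after_event, target_name, commands) steps in order."""
    root = ET.fromstring(xml_text.strip())
    for path in root.findall("path"):
        if path.get("name") != path_name:
            continue
        steps = []
        # Initial targets are notified as soon as the path is walked.
        for target in path.findall("target"):
            commands = [c.text for c in target.findall("command")]
            steps.append((0, target.findtext("name"), commands))
        # Each escalation adds its delay before its targets are notified.
        elapsed = 0
        for escalate in path.findall("escalate"):
            elapsed += to_minutes(escalate.get("delay", "0m"))
            for target in escalate.findall("target"):
                commands = [c.text for c in target.findall("command")]
                steps.append((elapsed, target.findtext("name"), commands))
        return steps
    raise KeyError(path_name)


if __name__ == "__main__":
    for minutes, who, commands in walk_path(SAMPLE, "Page-Network/Systems/Management"):
        print(f"+{minutes} min: notify {who} via {', '.join(commands)}")
```

The sketch treats escalation delays as cumulative and ignores the interval attribute on a target; both are simplifications.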
197. <automations>
    <automation name="cosmicClear" interval="30000" active="true" trigger-name="selectResolvers" action-name="clearProblems" />
    <automation name="cleanUp" interval="30000" active="true" action-name="deletePastClearedAlarms" />
    <automation name="fullCleanUp" interval="300000" active="true" action-name="deleteAllPastClearedAlarms" />
    <automation name="GC" interval="300000" active="true" action-name="garbageCollect" />
    <automation name="fullGC" interval="300000" active="true" action-name="fullGarbageCollect" />
    <automation name="unclear" interval="30000" active="true" trigger-name="selectClearedAlarms" action-name="resetSeverity" />
    <automation name="escalation" interval="30000" active="true" trigger-name="selectSuspectAlarms" action-name="escalateAlarm" action-event="eventEscalated" />
    <automation name="purgeStatisticsReports" active="true" interval="3600000" action-name="deletePurgeableStatisticsReports" />
Figure 49: OpenNMS Default definitions for automations in vacuumd.xml. Note that automations always require an action-name but do not necessarily need a trigger-name. The cosmicClear automation is the means by which an <alarm-data> alarm-type="2" tag in eventconf.xml can clear "bad news" events when "good news" events arrive. Here is the definition of the selectResolvers trigger-name: <!-- Find all alarms that potentially clear problems
198. t event with the Add Event option; if you are a CLI person rather than a GUI person, the zensendevent command is also available. The column headers of the Event Console can be used to change the sorting criteria, and the icon at the far right of the event can be used to display the detailed data of fields. 8.3.2 Internally generated events. Events are automatically generated by Zenoss if an availability metric is missed, such as a ping check failing or a service check failing. Similarly, if performance sampling is set up along with thresholds, then events will be generated if the threshold is breached. Reasonable defaults for such events are configured out of the box. Events are organised in class hierarchies which have zProperties, just like Devices. To modify the properties of an event, select the Events option from the left-hand menu. [Screenshot: Zenoss Events page showing event subclasses such as Heartbeat, IpService, Nagios, OSProcess and WinService, with SubClass Count and Instance Count columns and an EventClass Mappings table.] Figure
199. <match xmlns="">serviceid</match> </auto-acknowledge> Figure 54: OpenNMS notifd-configuration.xml with auto-acknowledgements for notifications. Note that, at present (July 2008), notifications are driven by events, not alarms. Also note that acknowledging notices has no effect on their associated events or alarms. It would appear that there has been a discussion of a change in architecture around events, alarms and notifications at least throughout 2008. In the future, it is suggested that alarms will be where most automation is driven from, including notifications, and that events will become more of a background log. 7.4 Performance management. 7.4.1 Defining data collections. There are several parallels between the capability discovery subsystem and the performance data collection subsystem. Each uses the snmp-config.xml file (described in section 7.1.2) to get SNMP parameters for each device, such as SNMP version, port number and community names. The capability discovery process, capsd, uses the protocol definitions in capsd-configuration.xml to determine what services (capabilities) to discover; these are things like SNMP, DNS, ICMP and SSH. The performance data collection process, collectd, uses 2 files to define what data to collect:
    • datacollection-config.xml specifies collection names (just the snmp-collection called default, out of the box), each of which defines, typically, MIB values to collect
    • collectd-configuration.xml
200. 9.1.2 Availability monitoring
    9.1.3 Problem management
    9.1.4 Performance management
    9.2 Product high points and low points
    9.2.1 Nagios "goodies" and "baddies"
    9.2.2 OpenNMS "goodies" and "baddies"
    9.2.3 Zenoss "goodies" and "baddies"
    9.3 Conclusions
    10 References
    11 Appendix A: Cacti installation details
1 Defining Systems Management. 1.1 Jargon and processes. Every organisation and individual has their own perspective on systems management requirements; the first essential step when looking for systems management solutions is to define what those requirements are. This gives a means to measure the success of a project. There are many different methodologies and disciplines for systems management, from the International Organization for Standardization (ISO) FCAPS acronym (Fault, Configuration, Ac
201. [Screenshot: Nagios Alert Summary report, Most Recent Alerts for host group-100-r1, with columns Time, Alert Type and Host. It lists Host Alert and Service Alert (PING) entries dated 30-07-2008 and 31-07-2008.]
202. tatus for all Administratively up interfaces
    check_ssh: check that the ssh port can be contacted on a remote host
    check_by_ssh: use ssh to run a command on a remote host
    check_nt: check Windows parameters (disk, cpu, services, etc.). Needs the NSClient agent installed on Windows targets
    check_nrpe: check remote Linux parameters (disk, cpu, processes, etc.). Needs the NRPE agent installed on Unix/Linux targets
Nagios has two separate concepts, host monitoring and service monitoring, and there is a known relationship between the state of the host and the state of its services. Host monitoring is a reachability test and will generally use the check_ping Nagios plugin. If you have devices that support SNMP but do not support ping (perhaps because there is a firewall in the way that blocks ping), then the check_ifstatus plugin works well: it tests all interfaces on a device and compares the SNMP administrative status with the operational status. Host monitoring is defined in the Nagios configuration files with the check_command stanza; typically this is defined at a high level of the host definition hierarchy but can be overridden for sub-groups or specific hosts. For example, in hosts.cfg:
    define host{
        host_name       group-100-a1
        use             host-172.31.100     ; Inherits from this parent class
        parents         group-100-r2        ; This is n/w route to device
        alias           group-100-a1.class.example.org
        address         group-100-a1.class.example.org
        check_command   check_ifstatus
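The comparison that excerpt 202 attributes to check_ifstatus (administrative status against operational status) can be sketched as follows. This is not the plugin's source code: the function name and message wording are made up for illustration, and the inputs are assumed to be dictionaries of ifIndex to status value already fetched by SNMP. The status numbers are the standard IF-MIB values, and the return codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
# IF-MIB status values shared by ifAdminStatus and ifOperStatus.
IF_UP = 1
IF_DORMANT = 5       # ifOperStatus only

# Conventional Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def check_interfaces(admin_status, oper_status):
    """Compare admin and operational status for every interface.

    admin_status and oper_status map ifIndex -> integer status.
    Interfaces that are not administratively up are counted as unused
    and never raise a problem.
    """
    up = down = dormant = unused = 0
    for if_index, admin in admin_status.items():
        if admin != IF_UP:
            unused += 1
            continue
        oper = oper_status.get(if_index)
        if oper == IF_UP:
            up += 1
        elif oper == IF_DORMANT:
            dormant += 1
        else:
            # Admin up but not operational: this is the interesting case.
            down += 1

    code = NAGIOS_CRITICAL if down else NAGIOS_OK
    label = "CRITICAL" if down else "OK"
    message = (f"{label}: interfaces up: {up}, down: {down}, "
               f"dormant: {dormant}, unused: {unused}")
    return code, message


if __name__ == "__main__":
    admin = {1: 1, 2: 1, 3: 2}    # interface 3 is administratively down
    oper = {1: 1, 2: 2, 3: 2}     # interface 2 should be up but is not
    print(check_interfaces(admin, oper))
```

Counting only admin-up interfaces means a port that was deliberately shut down never marks the host as failed, which is why this style of test suits devices that cannot be pinged.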
203. te a new threshold, click on the Create New Threshold link. The lower section is for Expression-based Thresholds, where the value being checked is a mathematical expression including one or more data sources. Functionality is identical to that for the Basic Thresholds section. If you have a custom UEI for triggering or re-arming the threshold, then it will be a hyperlink. Clicking on that link takes you to the notifications wizard for that UEI, allowing you to see existing notifications for that UEI and possibly create a new notification for it. Figure 77: OpenNMS Modifying thresholds through the Admin GUI. 7.5 Managing OpenNMS. So far, this description of OpenNMS has focused very much on configuration by editing xml files. It is well worth mentioning that there is now an Admin menu (touched on in the Thresholding section previously), which means many of the configuration tasks can be driven by a menu-based, fill-in-the-blanks GUI. Refer back to Figure 75, OpenNMS Admin menu, for a list of the areas which can be configured this way. 7.6 OpenNMS summary. OpenNMS is a mature and very capable systems and network management product. It satisfies most requirements for discovery, availability monitoring, problem management and performance management. It has a clean architecture for configuration, with everything being defined in XML files. It has an excellent mechanism for collecting and configuring SNMP TRAPs. For those who pre
204. [Screenshot: Nagios Alert Summary report form. Standard Reports: Report Type "25 Most Recent Hard Alerts". Custom Report Options: Report Type, Report Period, Custom Report Period start and end dates, Limit To Hostgroup, Limit To Servicegroup, Limit To Host, Alert Types, State Types, Host States, Service States, Max List Items.] Figure 21: Nagios Alert Summary configuration options. Limiting the report to a specific host, group-100-r1, produces the following report: [Screenshot: Nagios Alert Summary Report, last updated Thu Jul 31 13:08:02 BST 2008, Nagios 3.0.1.]
205. %time%: The event time
    %dpname%: The event dpname
    %nodeid%: The event nodeid
    %nodelabel%: The nodelabel
    %host%: The host
    %interface%: The interface
    %interfaceresolve%: Reverse DNS lookup of the interface
    %ifalias%: SNMP ifAlias
    %community%: SNMP community string
    %id%: SNMP ID
    %snmphost%: SNMP host
    %snmp%: SNMP
    %service%: OpenNMS service
    %idtext%: SNMP ID Text
    %severity%: OpenNMS severity
    %version%: SNMP version
    %operinstruct%: Event defined operator instructions
    %specific%: SNMP specific ID
    %mouseovertext%: Event defined mouse over text
    %generic%: SNMP generic ID
Figure 41: OpenNMS event parameters that can be substituted. Here is an example event from the default eventconf.xml:
    <event>
      <uei>uei.opennms.org/nodes/nodeLostService</uei>
      <event-label>OpenNMS-defined node event: nodeLostService</event-label>
      <descr>&lt;p&gt;A service outage was identified on interface %interface%.&lt;/p&gt; &lt;p&gt;A new Outage record has been created and service level availability calculations will be impacted until this outage is resolved.&lt;/p&gt;</descr>
      <logmsg dest='logndisplay'>%service% outage identified on interface %interface% with reason code: %parm[eventReason]%.</logmsg>
      <severity>Minor</severity>
      <al
206. tistics Reports shows Top 20 ifInOctets across all nodes. Following the Resource Graphs link provides access to many standard reports. [Screenshot: OpenNMS Resource Graphs page at http://opennms:8980/opennms/graph/index.jsp. The Standard Resource Performance Reports panel asks you to choose a resource for a standard performance report and lists nodes such as adsl2.skills-1st.co.uk, bino.skills-1st.co.uk and group-100-a1.class.example.org. The help text notes that Custom Performance Reports can be used to produce a single graph containing the data of your choice from a single interface or node; you can select the timeframe, line colors, line styles and title of the graph, and you can bookmark the results.]
207. tor service="DominoIIOP" class-name="org.opennms.netmgt.poller.DominoIIOPMonitor"/>
    <monitor service="ICMP" class-name="org.opennms.netmgt.poller.IcmpMonitor"/>
    <monitor service="Citrix" class-name="org.opennms.netmgt.poller.CitrixMonitor"/>
    <monitor service="LDAP" class-name="org.opennms.netmgt.poller.LdapMonitor"/>
    <monitor service="HTTP" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
    <monitor service="HTTP-8080" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
    <monitor service="HTTP-8000" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
    <monitor service="HTTPS" class-name="org.opennms.netmgt.poller.HttpsMonitor"/>
    <monitor service="SMTP" class-name="org.opennms.netmgt.poller.SmtpMonitor"/>
    <monitor service="DHCP" class-name="org.opennms.netmgt.poller.DhcpMonitor"/>
    <monitor service="DNS" class-name="org.opennms.netmgt.poller.DnsMonitor"/>
    <monitor service="FTP" class-name="org.opennms.netmgt.poller.FtpMonitor"/>
    <monitor service="SNMP" class-name="org.opennms.netmgt.poller.SnmpMonitor"/>
    <monitor service="Oracle" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
    <monitor service="Postgres" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
    <monitor service="MySQL" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
    <monitor service="Sybase" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
    <monitor service="Informix" class-name="org.opennms.netmgt.pol
208. [Screenshot: Nagios event log view listing entries dated 31-07-2008 between 10:14:06 and 10:17:26, including alerts for group-100-c2 and group-100-c3 and a reference to OID 1.3.6.1.2.1.2.2.1.8.]
209. ts alarm is outstanding. [Screenshot: OpenNMS Alarms list with columns Ack, ID, Severity, Node, Interface, Service, Count, Last Event Time, First Event Time and Log Msg. Entries dated 09/07/08 and 10/07/08 include: "Bad news from enterprise .1.3.6.1.4.1.123, generic 6, specific 1234" with varbinds, for nagios3.skills-1st.co.uk; a failed login from 10.0.0.121 (BadCredentialsException, "Bad credentials"); "SNMP data collection on interface 172.31.100.21 failed" for group-100-s2.class.example.org; "DNS outage identified on interface 10.0.0.3 with reason code: Unknown" for wrt54g.skills-1st.co.uk; and "Node group-100-a1.class.example.org is down".]
210. [Screenshot: Nagios host group summary showing Host Status Totals and Service Status Totals, and a Service Overview For All Host Groups covering the clients, nagios, raddle, routers and servers host groups.] Figure 12: Nagios Host group summary. Whenever changes have taken place to any configuration file, the command /etc/init.d/nagios reload should be used. This does not stop and start the Nagios processes (use stop | start | restart | status to control the background processes); the reload parameter simply re-reads the configuration file(s). There is also a handy command to verify that your configuration files are legal and consistent before actually performing the reload: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg. All objects to be managed need defining in the Nagios configuration files; there is no form of automatic discovery; however the
211. ts of both good and bad news
    • Alarms: important events
    • Notifications: typically email or pager but could be other methods
The events subsystem is driven by the eventd process, which listens on port 5817. Out of the box, eventd receives internal events from OpenNMS (such as new suspect events) and SNMP TRAPs. It is possible to also configure for other event sources, such as from syslogs. 7.3.1 Event console. Events can be viewed from the web GUI by selecting the Events option. [Screenshot: OpenNMS Events page with an Event ID query box, Advanced Search, and links for outstanding and acknowledged events.] The page text reads: Events can be acknowledged, or removed from the view of other users, by selecting the event in the Ack check box and clicking Acknowledge Selected Events at the bottom of the page. Acknowledging an event gives users the ability to take personal responsibility for addressing a network or systems-related issue. Any event that has not been acknowledged is active in all users' browsers and is considered outstanding. If an event has been acknowledged in error, you can select the View all acknowledged events link, find the event and unacknowledge it, making it available again to all users' views. If you have a specific event identifier for which y
212. [Screenshot: Nagios Status Map showing the Nagios process on nagios3 and hosts group-100-c1, group-100-c2, group-100-c3, group-100-r1, group-100-r2, group-100-r3 and group-100-s1, all Up, with a Drawing Layers selector for the clients, nagios, raddle and routers host groups.] Figure 13: Nagios Status map. 6.2 Availability monitoring. Nagios availability monitoring focuses much more on systems than on networks. Nagios provides a large number of official plugins for monitoring; in addition, there are other community plugins available, or you can write your own. The official plugins should be installed alongside the base Nagios. The executables can be found in /usr/local/nagios/libexec (use <plugin name> --help for usage on each plugin). The official plugins include:
    check_ping: configurable ping test with warning & critical thresholds
    check_snmp: generic SNMP test to get MIB OIDs & test return values
    check_ifstatus: check SNMP ifOperStatus against ifAdminS
213. vailable. For easy evaluation, a VMware appliance can be downloaded, ready to go. I tried both the VMware build and the 2.2 stack install for SuSE 10.3; both were relatively painless. The rest of this section is based on the 2.2 stack installation on a machine whose hostname is zenoss. To access the Web console, point your browser at http://zenoss:8080. The default user is admin with a password of zenoss. The default dashboard is completely configurable, but this screenshot is close to the default. [Screenshot: Zenoss Dashboard, last updated 2008-07-01 17:50:00, with Production States, Device Issues and Object Watch List portlets listing devices such as bino.skills-1st.co.uk, zenoss.skills-1st.co.uk and group-100-a1.class.example.org.]
214. ve to specify an exact severity; you cannot specify "severity greater than". [Screenshot: OpenNMS Event List at http://opennms:8980/opennms/event/list, showing results 1-10 of 1689 outstanding events with columns Ack, ID, Severity, Time, Node, Interface, Service and Ackd. Entries include event 151463 (Normal, uei.opennms.org/internal/authentication/sessionRemoved: OpenNMS user rtc has been logged out of the WebUI, most likely due to a session timeout), event 151455 (Normal, uei.opennms.org/internal/authentication/successfulLogin: OpenNMS user rtc has logged in from 127.0.0.1) and event 151303 (Minor, hp7410.skills-1st.co.uk).]
215. ver, one feels that there is so much more in the detail of Zenoss that one needs to know and can find no information on. My only real negative comment on Zenoss, other than the lack of detailed technical information, is that it is a rapidly evolving product and it feels rather buggy. The current (August 2008) poll on the zenoss-users forum for input to Zenoss 2.3 has many requesters with code reliability and better documentation at the top of their lists. 9 Comparison of Nagios, OpenNMS and Zenoss. Necessarily, comparisons are based on a mixture of fact and feeling, and you need a clear definition of what features are important to your environment before comparisons can be valid for you. Nagios is an older, more mature product. It evolved from the NetSaint project, emerging as Nagios in 2002. OpenNMS also dates back to 2002, but it feels as though the lead developer, Tarus Balog, has learned some lessons from observing Nagios. Zenoss is a more recent offering, evolving from an earlier project by developer Erik Dahl and emerging to the community as Zenoss around 2006. All the products expect to use SNMP. OpenNMS and Zenoss use SNMP as the default monitoring protocol. They all provide other alternatives: Zenoss supports ssh and telnet along with customised ZenPacks; Nagios has NRPE and NSCA agents (both of which, of course, require installing on remote nodes). OpenNMS doesn't have much else to offer out of the box, but it can support JMX and H
216. [Screenshot: Nagios Current Event Log, last updated Thu Jul 31 12:11:40 BST 2008, Nagios 3.0.1, log file /usr/local/nagios/var/nagios.log. Entries for 31-07-2008 include hourly "Auto-save of retention data completed successfully" messages; SERVICE ALERT lines for PING on group-100-s1, group-100-r1 and group-100-c3 (OK; HARD; 1; packet loss 0%); a HOST ALERT for group-100-a1 (UP; HARD; 1; interfaces up: 2, down: 0, dormant: 0, excluded: 0, unused: 0); and a HOST NOTIFICATION to nagiosadmin for group-100-r1 (UP; notify-host-by-email).]
