
System Event Log Troubleshooting Guide for Intel® S5500


Contents

1. 3.2 BIOS POST owned Sensors (GID = 0001h)
   The following table can be used to find the details of sensors owned by BIOS POST.
   Table 6: BIOS POST owned Sensors
   Sensor Number | Sensor Name | Details Section | Next Steps
   01h | Mirroring Redundancy State | Mirrored Redundancy State Sensor | Table 55: Mirrored Redundancy State Sensor Event Trigger Offset Next Steps
   06h | POST Error | System Firmware Progress (Formerly Post Error) | System Firmware Progress (Formerly Post Error) Next Steps
   11h | Sparing Redundancy State | Sparing Redundancy State Sensor | Table 59: Sparing Redundancy State Sensor Event Trigger Offset Next Steps
   12h | Mirroring Configuration Status | Mirroring Configuration Status | Table 53: Mirroring Configuration Status Sensor Event Trigger Offset Next Steps
   13h | Sparing Configuration Status | Sparing Configuration Status | Table 57: Sparing Configuration Status Sensor Event Trigger Offset Next Steps
   83h | System Event | System Events | Not applicable
   Intel order number G74211-001, Revision 1.0
   3.3 BIOS SMI owned Sensors (GID = 0033h)
   The following table can be used to find the details of sensors owned by BIOS SMI.
   Table 7: BIOS SMI owned Sensors
   Sensor Number | Sensor Name | Details Section | Next Steps
   02h | Memory
2. Byte | Field | Description
   8, 9 | Generator ID | 0033h = BIOS SMI Handler
   11 | Sensor Type | 0Ch = Memory
   12 | Sensor Number | 02h
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 6Fh (Sensor Specific)
   14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 61
   15 | Event Data 2 | [7:2] Reserved, set to 0; [1:0] The logical rank associated with the failed DDR3 DIMM
   16 | Event Data 3 | [7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached: 000b = Processor Socket 1, 001b = Processor Socket 2, all other values are reserved
      |              | [4:3] Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached: 00b = Channel A, 01b = Channel B, 10b = Channel C, 11b is reserved
      |              | [2:0] Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached: 000b = DIMM Socket 1, 001b = DIMM Socket 2, all other values are reserved
   Table 61: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps
   Event Trigger Offset (Hex) | Description | Description | Next Steps
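The Event Data 3 layout above can be unpacked with a few shifts and masks. The following is a minimal sketch, not part of the guide: the function name `decode_ecc_event_data3` and the returned dictionary keys are hypothetical, and only the bit positions and value meanings are taken from the table.

```python
def decode_ecc_event_data3(ed3: int) -> dict:
    """Split Event Data 3 of a memory ECC SEL record into its three bit fields.

    Bit layout as listed in the table above (hypothetical helper, not Intel code):
      [7:5] processor socket, [4:3] memory channel, [2:0] DIMM socket.
    """
    if not 0 <= ed3 <= 0xFF:
        raise ValueError("Event Data 3 must be a single byte")

    socket_bits = (ed3 >> 5) & 0b111
    channel_bits = (ed3 >> 3) & 0b11
    dimm_bits = ed3 & 0b111

    sockets = {0b000: "Processor Socket 1", 0b001: "Processor Socket 2"}
    channels = {0b00: "Channel A", 0b01: "Channel B", 0b10: "Channel C"}
    dimms = {0b000: "DIMM Socket 1", 0b001: "DIMM Socket 2"}

    # Any value not listed in the table is reported as reserved.
    return {
        "socket": sockets.get(socket_bits, "reserved"),
        "channel": channels.get(channel_bits, "reserved"),
        "dimm": dimms.get(dimm_bits, "reserved"),
    }
```

For example, an Event Data 3 byte of 29h (0010 1001b) splits into socket bits 001b, channel bits 01b, and DIMM bits 001b.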
3. Event Trigger Offset 01h: Uncorrectable ECC Error
   Description: An uncorrectable (multi-bit) ECC error has occurred. This is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR (catastrophic error) and an MCE (Machine Check Exception). While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between socket and DIMM, or by bent pins in the processor socket.
   Next Steps:
   1. If needed, decode DIMM location from hex version of SEL.
   2. Verify DIMM is seated properly.
   3. Examine gold fingers on edge of DIMM to verify contacts are clean.
   4. Inspect processor socket this DIMM is connected to for bent pins, and if found, replace the board.
   5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
   Event Trigger Offset (Hex) | Description | Description | Next Steps
   Description: There have been too many (10 or more) correctable ECC errors for this particular DIMM. Even though this event doesn't immediately lead to problems, it can indicate one of the DIMM modules is slowly failing.
   Next Steps: If this error occurs more than once:
   1. If needed, decode DIMM location from hex version of SEL.
   2. Verify DIMM is seated prop
4. Sensor Number | Sensor Name
   64h | P1 Therm Ctl
   65h | P2 Therm Ctl
   Next Steps: These events normally only happen due to failures of the thermal solution.
   1. Verify heat sink is properly attached and has thermal grease.
   2. If system has a heat sink fan, ensure the fan is spinning.
   3. Check all system fans are operating properly.
   4. Check that the air used to cool the system is within limits (typically 35 °C).
   5.2.4 Discrete Thermal Sensors
   Discrete thermal sensors do not report a temperature at all; instead they report an overheating event of some kind. Examples are VRD Hot (voltage regulator is overheating) or processor Thermal Trip (the processor got so hot that its over-temperature protection was triggered and the system was shut down to prevent damage).
   Table 42: Discrete Thermal Sensors Typical Characteristics
   Byte | Field | Description
   11 | Sensor Type | 01h = Temperature
   12 | Sensor Number | See Table 43
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: See Table 43
   14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 43
   15 | Event Data 2 | Not used
   16 | Event Data 3 | Not used
   Table 43: Discrete Thermal
5. Byte | Field | Description
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 6Fh (Sensor Specific)
   14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset = 02h
   15 | Event Data 2 | [7:5] Reserved, set to 0
      |              | [4] Channel Information Validity Check: 0b = Channel Number in Event Data 3 Bits[4:3] is not valid, 1b = Channel Number in Event Data 3 Bits[4:3] is valid
      |              | [3] DIMM Information Validity Check: 0b = DIMM Slot ID in Event Data 3 Bits[2:0] is not valid, 1b = DIMM Slot ID in Event Data 3 Bits[2:0] is valid
      |              | [2:0] Error Type: 000b = Parity Error Type not known, 001b = Data Parity Error (not used), 010b = Address Parity Error, all other values reserved
   16 | Event Data 3 | [7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached: 000b = Processor Socket 1, 001b = Processor Socket 2, all other values are reserved
      |              | [4:3] Channel Number (if valid) on which the Parity Error occurred. This value will be indeterminate and should be ignored if ED2 Bit[4] is 0b. 00b = Cha
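The validity bits in Event Data 2 decide whether the channel and DIMM fields of Event Data 3 can be trusted. A minimal sketch of that check follows; the function name `decode_parity_event` and the dictionary keys are hypothetical. Because the channel and DIMM value tables are cut off in this excerpt, the sketch returns the raw bit values rather than names.

```python
def decode_parity_event(ed2: int, ed3: int) -> dict:
    """Decode Event Data 2/3 of a memory parity error record (illustrative only).

    ED2 [4] = channel field valid, ED2 [3] = DIMM field valid, ED2 [2:0] = error type.
    ED3 [7:5] = socket bits, [4:3] = channel bits, [2:0] = DIMM slot bits.
    """
    channel_valid = bool(ed2 & 0x10)
    dimm_valid = bool(ed2 & 0x08)

    error_types = {
        0b000: "Parity Error Type not known",
        0b001: "Data Parity Error (not used)",
        0b010: "Address Parity Error",
    }

    return {
        "error_type": error_types.get(ed2 & 0x07, "reserved"),
        "socket_bits": (ed3 >> 5) & 0x07,
        # Fields flagged as not valid are indeterminate, so report None instead.
        "channel_bits": ((ed3 >> 3) & 0x03) if channel_valid else None,
        "dimm_bits": (ed3 & 0x07) if dimm_valid else None,
    }
```

Returning `None` for an invalid field keeps a caller from acting on an indeterminate channel or slot number.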
6. QPI Correctable Error Sensor Next Steps
   This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
   1. Check the processor is installed correctly.
   2. Inspect the socket for bent pins.
   3. Cross test the processor if possible.
   8.4.2 QPI Non-Fatal Error Sensor
   The system detected a QPI non-fatal error that is recoverable. This is an informational event.
   Table 49: QPI Non-Fatal Error Sensor Typical Characteristics
   Byte | Field | Description
   8, 9 | Generator ID | 0033h = BIOS SMI Handler
   11 | Sensor Type | 13h = Critical Interrupt
   12 | Sensor Number | 07h
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 73h (OEM Discrete)
   14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = Reserved
   15 | Event Data 2 | 0-3 = CPU1-4
   16 | Event Data 3 | Not used
   8.4.2.1 QPI Non-Fatal Error Sensor Next Steps
   This is an informational event only. Non-fatal errors are acceptable and normal at a low rate of occurrence. If the error continues:
   1. Check the processor is installed correctly.
   2. Inspect the socket for bent pins.
   3. Cross
7. 11 Miscellaneous [...]
   11.1 System Events
   11.2 SMI Timeout
   11.2.1 SMI Timeout Next Steps
   11.3 System Event Log Cleared
   11.4 System Event [...]
   11.4.1 System Event PEF Action Next Steps
   12 Hot Swap Controller events
   12.1 HSC Backplane Temperature Sensor
   12.2 HSC Drive Slot Status Sensor
   12.2.1 HSC Drive Slot Status Sensor Next Steps
   12.3 HSC Drive Presence Sensor
   12.3.1 HSC Drive Presence Sensor Next Steps
   13 Manageability Engine (ME) events
   13.1 Node Manager Exception Event
   13.1.1 Node Manager Exception Event Next Steps
   13.2 Node Manager Health Event
   13.2.1 Node Manager Health Event Next Steps
   13.3 Node Manager Operational Ca
8. Byte | Field | Description
   11 | Sensor Type | 07h = Processor
   12 | Sensor Number | 68h
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 6Fh (Sensor Specific)
   14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 01h (State Asserted)
   15 | Event Data 2 | Not used
   16 | Event Data 3 | Not used
   8.2.1 Catastrophic Error Sensor Next Steps
   This error is typically caused by other platform components.
   1. Check for other errors near the time of the CATERR event.
   2. Verify all peripherals are plugged in and operating correctly, particularly Hard Drives, Optical Drives, and I/O.
   3. Update system firmware and drivers.
   8.3 CPU Missing Sensor
   The CPU Missing sensor is a discrete sensor reporting the processor is not installed. The most common instance of this event is due to a processor populated in the incorrect socket.
   Table 47: CPU Missing Sensor Typical Characteristics
   Byte | Field | Description
   11 | Sensor Type | 07h = Processor
   12 | Sensor Number | 69h
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 6Fh (Sensor Specific)
   14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2; [5
9. 9 | Generator ID | 0001h = BIOS POST
   11 | Sensor Type | 0Ch = Memory
   12 | Sensor Number | 13h
   13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 09h (digital Discrete)
   14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 57
   15 | Event Data 2 | Not used
   16 | Event Data 3 | Not used
   Table 57: Sparing Configuration Status Sensor Event Trigger Offset Next Steps
   Event Trigger Offset (Hex) | Description | Description | Next Steps
   01h | Sparing mode is enabled | The system has configured into Spare Channel RAS mode. Sparing mode is enabled in setup. | Informational event only.
   00h | Sparing mode is disabled | The system has configured out of Spare Channel RAS mode, either from setup or due to an error, in which case post error 8500 also occurs. | See steps below.
   Next Steps for 00h:
   1. If this event is accompanied by a post error 8500, there was a problem applying the sparing configuration to the memory. Check for other errors related to the memory and troubleshoot accordingly.
   2. If there is no post error, then sparing mode was simply disabled in BIOS setup and this should be considered informational only.
10. Byte | Field | Description
    13 (continued) | Event Direction and Event Type | [6:0] Event Type = 01h (Threshold)
    14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Triggers as described in Table 37
    15 | Event Data 2 | Reading that triggered event
    16 | Event Data 3 | Threshold value that triggered event
    Table 37: Thermal Margin Sensors Event Triggers Description
    Event Trigger (Hex) | Description | Assertion Severity | Deassert Severity | Description
    07h | Upper non-critical going high | Degraded | OK | The thermal margin has gone over its upper non-critical threshold.
    09h | Upper critical going high | non-fatal | Degraded | The thermal margin has gone over its upper critical threshold.
    Table 38: Thermal Margin Sensors Next Steps
    Sensor Number | Sensor Name | Next Steps
    22h | IOH Therm Margin | Steps 1 to 4 below
    23h | Mem P1 Therm Margin | Steps 1 to 4 below
    24h | Mem P2 Therm Margin | Steps 1 to 4 below
    62h | P1 Therm Margin | Not a logged SEL event. Sensor is used for thermal management of the processor.
    63h | P2 Therm Margin | Not a logged SEL event. Sensor is used for thermal management of the processor.
    1. Check for clear and unobstructed airflow into and out of chassis.
    2. Ensure SDR is programmed and correct chassis has been selected.
    3. Ensure there are no fan failures.
    4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
    5.2.3 Processor Thermal Control Sensors
11. OEM timestamped, bytes 8-16 OEM defined
    Byte | Field | Description
    4-7 | Timestamp (TS) | Time when event was logged. LS byte first. Example: TS:[29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: There are various websites that will convert the raw number to a date/time.
    8-10 | Manufacturer ID | LS byte first. The manufacturer ID is a 20-bit value that is derived from the IANA Private Enterprise ID. Most significant four bits = reserved (0000b). 000000h = unspecified. 0FFFFFh = reserved. This value is binary encoded. For example, the ID for the IPMI forum is 7154 decimal, which is 1BF2h, which would be stored in this record as F2h, 1Bh, 00h for bytes 8 through 10, respectively.
    11-16 | OEM Defined | OEM Defined. This is defined according to the manufacturer identified by the Manufacturer ID field.
    Table 4: OEM SEL Record (Type E0h-FFh)
    Byte | Field | Description
    1-2 | Record ID (RID) | ID used for SEL Record access.
    3 | Record Type (RT) | [7:0] Record Type: E0h-FFh = OEM system event record
    4-16 | OEM | OEM Defined. This is defined by the system integrator.
    3 Sensor Cross Reference List
    This section contains a cross reference to help find details on any specific SEL entry.
    3.1 BMC owned Sensors (GID = 0020h)
    The following table
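Both worked examples in the table above (the timestamp and the IPMI forum manufacturer ID) are little-endian conversions and can be reproduced directly. This is a minimal sketch; the function names are hypothetical, and the timestamp is treated as seconds since 1 January 1970 UTC, which is what the example conversion above implies.

```python
from datetime import datetime, timezone


def decode_sel_timestamp(raw: bytes) -> datetime:
    """Convert the 4 timestamp bytes (LS byte first) to a UTC datetime."""
    if len(raw) != 4:
        raise ValueError("timestamp field is exactly 4 bytes")
    seconds = int.from_bytes(raw, "little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def decode_manufacturer_id(raw: bytes) -> int:
    """Convert the 3 manufacturer ID bytes (LS byte first) to the 20-bit ID."""
    if len(raw) != 3:
        raise ValueError("manufacturer ID field is exactly 3 bytes")
    # The most significant four bits are reserved, so mask them off.
    return int.from_bytes(raw, "little") & 0x0FFFFF


if __name__ == "__main__":
    print(decode_sel_timestamp(bytes.fromhex("2976684C")).isoformat())
    print(decode_manufacturer_id(bytes([0xF2, 0x1B, 0x00])))
```

Feeding in the bytes 29h 76h 68h 4Ch gives the date shown in the table, and F2h 1Bh 00h gives 7154.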
12. Power Unit Status Sensor - Sensor Specific Offsets Next Steps
    Sensor Number | Sensor Name | Details Section | Next Steps
    52h | Power Supply 1 AC Power Input (PS1 Power In) | Power Supply AC Power Input Sensors | Table 22: Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
    53h | Power Supply 2 AC Power Input (PS2 Power In) | Power Supply AC Power Input Sensors | Table 22: Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
    54h | Power Supply 1 12V % of Maximum Current Output (PS1 Curr Out) | Power Supply Current Output Sensors | Table 24: Power Supply Current Output Sensor Event Trigger Offset Next Steps
    55h | Power Supply 2 12V % of Maximum Current Output (PS2 Curr Out) | Power Supply Current Output Sensors | Table 24: Power Supply Current Output Sensor Event Trigger Offset Next Steps
    56h | Power Supply 1 Temperature (PS1 Temperature) | Power Supply Temperature Sensors | Table 26: Power Supply Temperature Sensor Event Trigger Offset Next Steps
    57h | Power Supply 2 Temperature (PS2 Temperature) | Power Supply Temperature Sensors | Table 26: Power Supply Temperature Sensor Event Trigger Offset Next Steps
    60h | Processor 1 Status (P1 Status) | Table 45: Proces
13. Event Trigger Offset (Hex) | Description | Description
    [...] | Receiver Buffer Overflow Error | Indicates a synchronization problem between PCI Express devices. Extremely rare.
    09h | ACS Violation Error | Access Control Services, a transaction routing feature, failed.
    0Ah | Malformed TLP Error | Indicates a transaction was sent with data exceeding the maximum allowed number of bytes. This is not allowed and is a fatal error, usually a firmware or driver problem.
    0Bh | Received ERR_FATAL message from downstream Error | Indicates a fatal error occurred and is being reported.
    0Ch | Unexpected Completion Error | Indicates the device received a completion notification for a transaction it does not recognize.
    0Dh | Received ERR_NONFATAL Message Error | Indicates a non-fatal error is redefined as fatal and is being reported.
    Next Steps (fragment): a. Update all BIOS, firmware, and drivers. b. Replace the board.
    10.1.3 Legacy PCI Errors
    Legacy PCI errors include PERR and SERR; both are fatal errors.
    Table 67: Legacy PCI Error Sensor Typical Characteristics
    Byte | Field | Description
    8, 9 | Generator ID | 0033h = BIOS SMI Handler
    11 | Sensor Type | 13h = Critical Interrupt
    12 | Sensor Number | 03h
    13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deass
14. Table 54 through Table 80
    Table 40: Processor Thermal Control Sensors Event Triggers Description
    Table 41: Processor Thermal Control Sensors Next Steps
    Table 42: Discrete Thermal Sensors Typical Characteristics
    Table 43: Discrete Thermal Sensors Next Steps
    Table 44: Process Status Sensors Typical Characteristics
    Table 45: Processor Status Sensors Next Steps
    Table 46: Catastrophic Error Sensor Typical Characteristics
    Table 47: CPU Missing Sensor Typical Characteristics
    Table 48: QPI Correctable Error Sensor Typical Characteristics
    Table 49: QPI Non-Fatal Error Sensor Typical Characteristics
    Table 50: QPI Fatal Error Sensor Typical Characteristics
    Table 51: QPI Fatal #2 Error Sensor Typical Characteristics
    Table 52: Mirroring Configuration Status Sensor Typical Characteristics
    Table 53: Mirroring Configuration Status Sensor E
15. 4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 01h (State Asserted)
    15 | Event Data 2 | Not used
    16 | Event Data 3 | Not used
    8.3.1 CPU Missing Sensor Next Steps
    Verify the processor is installed in the correct slot.
    8.4 QuickPath Interconnect Error Sensors
    The Intel QuickPath Interconnect (QPI) bus on Intel S5500/S3420 series server boards is the interconnection between processors and to the chipset. The QPI Error sensors are all reported by the BIOS SMI Handler to the BMC, so the Generator ID will be 33h.
    8.4.1 QPI Correctable Error Sensor
    The system detected an error and corrected it. This is an informational event.
    Table 48: QPI Correctable Error Sensor Typical Characteristics
    Byte | Field | Description
    8, 9 | Generator ID | 0033h = BIOS SMI Handler
    11 | Sensor Type | 13h = Critical Interrupt
    12 | Sensor Number | 06h
    13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 72h (OEM Discrete)
    14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = Reserved
    15 | Event Data 2 | 0-3 = CPU1-4
    16 | Event Data 3 | Not used
    8.4.1.1 QPI Correctable Error Sensor Next Steps
16. Byte | Field | Description
    14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 64
    15 | Event Data 2 | PCI Bus number
    16 | Event Data 3 | [7:3] PCI Device number; [2:0] PCI Function number
    Table 64: PCI Express Correctable Error Sensor Event Trigger Offset Next Steps
    Event Trigger Offset (Hex) | Description | Description
    00h | Receiver error | Correctable error occurred.
    01h | Bad DLLP error | Correctable bad DLLP occurred.
    02h | Bad TLP error | Correctable bad TLP occurred.
    03h | [...] Rollover Error | Correctable Replay event occurred.
    04h | REPLAY Timer Timeout Error | Correctable Replay timeout event occurred.
    [...] | Advisory non-fatal Error | Correctable advisory event occurred, typically provided as notice to software driver.
    [...] | [...] received ERR_COR message | [...]
    Next Steps: Informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
    1. Decode bus, device, and function to identify the card.
    2. If this is [...]:
       a. Verify card is inserted properly.
       b. Install the card in another slot and check if the error follows the card or stays with the slot.
       c. Update all firmware and drivers, including non-Intel components.
    3. If this is an onboard device:
       a. Update all BIOS, firmware, and drivers
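Step 1 of the Next Steps ("decode bus, device, and function") follows directly from the Event Data 2 and Event Data 3 layout above. A minimal sketch, with hypothetical function names; the `bb:dd.f` output format is an assumption chosen to match the notation used by common tools such as `lspci`, not something the guide prescribes.

```python
def decode_pci_location(ed2: int, ed3: int) -> tuple:
    """Return (bus, device, function) from Event Data 2 and Event Data 3.

    ED2 holds the PCI bus number; ED3 holds the device number in bits [7:3]
    and the function number in bits [2:0].
    """
    bus = ed2 & 0xFF
    device = (ed3 >> 3) & 0x1F
    function = ed3 & 0x07
    return bus, device, function


def format_bdf(bus: int, device: int, function: int) -> str:
    """Format as bb:dd.f in hex (assumed display convention)."""
    return f"{bus:02x}:{device:02x}.{function}"
```

With the location decoded, the card can be matched against the device list reported by the operating system.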
17. [...] Next Steps
    Table 27: Fan Speed Sensors Typical Characteristics
    Table 28: Fan Speed Sensor Event Trigger Offset Next Steps
    Table 29: Fan Presence Sensors Typical Characteristics
    Table 30: Fan Presence Sensors Event Trigger Offset Next Steps
    Table 31: Fan Redundancy Sensors Typical Characteristics
    Table 32: Fan Redundancy Sensor Event Trigger Offset Next Steps
    Table 33: Temperature Sensors Typical Characteristics
    Table 34: Temperature Sensors Event Triggers Description
    Table 35: Temperature Sensors Next Steps
    Table 36: Thermal Margin Sensors Typical Characteristics
    Table 37: Thermal Margin Sensors Event Triggers Description
    Table 38: Thermal Margin Sensors Next Steps
    Table 39: Processor Thermal Control Sensors Typical Characteristics
    Table 40 through Table 53
18. Power [...]
    4.3.1 Power Supply Status Sensors
    4.3.2 Power Supply AC Power Input Sensors
    4.3.3 Power Supply Current Output Sensors
    4.3.4 Power Supply Temperature Sensors
    5 Cooling subsystem
    5.1 [...]
    5.1.1 Fan Speed Sensors
    5.1.2 Fan Presence and Redundancy Sensors
    5.2 Temperature [...]
    5.2.1 Regular Temperature [...]
    5.2.2 Thermal Margin Sensors
    5.2.3 Processor Thermal Control Sensors
    5.2.4 Discrete Thermal Sensors
    6 Processor Subsystem
    6.1 Processor [...]
    6.2 Catastrophic Error Sensor
    6.2.1 Catastrophic Error Sensor Next Steps
    6.3 CPU Missing [...]
    6.3.1 CPU Missing Sensor Next Steps
    6.4 QuickPath Interconnect Error Sensors
    6.4.1 QPI Correctable Error Sensor
    6.4.2 QPI Non-Fatal Error Sensor
    6.4.3 QPI [...]
    7 Memory subsystem
    7.1 Memory RAS Mirroring and Sparing
19. Sensor Number | Sensor Name | Details Section | Next Steps
    1Eh | BB 1.35V P2 LV DDR3 (BB 1.35v P2 MEM) | Voltage Sensors | Table 14: Voltage Sensors Next Steps
    20h | Baseboard Temperature (Baseboard Temp) | Regular Temperature sensors | Table 35: Temperature Sensors Next Steps
    21h | Front Panel Temperature (Front Panel Temp) | Regular Temperature sensors | Table 35: Temperature Sensors Next Steps
    22h | IOH Thermal Margin (IOH Therm Margin) | Thermal Margin Sensors | Table 38: Thermal Margin Sensors Next Steps
    23h | Processor 1 Memory Thermal Margin (Mem P1 Thrm Mrgn) | Thermal Margin Sensors | Table 38: Thermal Margin Sensors Next Steps
    24h | Processor 2 Memory Thermal Margin (Mem P2 Thrm Mrgn) | Thermal Margin Sensors | Table 38: Thermal Margin Sensors Next Steps
    30h-39h | Fan Tachometer Sensors (Chassis specific sensor names) | Fan Speed Sensors | Table 28: Fan Speed Sensor Event Trigger Offset Next Steps
    40h-45h | [...] | Fan Presence and Redundancy | Table 30: Fan Presence Sensors Event Trigger Offset Next Steps
    46h | [...] | Fan Presence and Redundancy | Table 32: Fan Redundancy Sensor Event Trigger Offset Next Steps
    50h | Power Supply 1 Status (PS1 Status) | Power Supply Status Sensors | Table 16: Power Unit Status Sensor - Sensor Specific Offsets Next Steps
    51h | Power Supply 2 Status (PS2 Status) | Power Supply Status Sensors | Table 16
20. Sensor Name | Sensor Number | Next Steps
    Baseboard Temp | 20h |
       1. Check for clear and unobstructed airflow into and out of chassis.
       2. Ensure SDR is programmed and correct chassis has been selected.
       3. Ensure there are no fan failures.
       4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
    Front Panel Temp | 21h |
       If the front panel temperature reads zero, check:
       1. It is connected properly.
       2. The FRUSDR has been programmed correctly for your chassis.
       If the front panel temperature is too high: Check the cooling of your server room.
    5.2.2 Thermal Margin Sensors
    Margin sensors are also linear sensors but typically report a negative value. This is not an actual temperature, but in fact an offset to a critical temperature. Example sensors are Processor Thermal Margin, Memory Thermal Margin, and IOH Thermal Margin. Values reported should be seen as number of degrees below a critical temperature for the particular component.
    Table 36: Thermal Margin Sensors Typical Characteristics
    Byte | Field | Description
    11 | Sensor Type | 01h = Temperature
    12 | Sensor Number | See Table 38
    13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0]
21. Temperature Sensors Typical Characteristics
    Byte | Field | Description
    11 | Sensor Type | 01h = Temperature
    12 | Sensor Number | See Table 35
    13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 01h (Threshold)
    14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Trigger Offset as described in Table 34
    15 | Event Data 2 | Reading that triggered event
    16 | Event Data 3 | Threshold value that triggered event
    Table 34: Temperature Sensors Event Triggers Description
    Event Trigger (Hex) | Description | Assertion Severity | Deassert Severity | Description
    00h | Lower non-critical going low | Degraded | OK | The temperature has dropped below its lower non-critical threshold.
    02h | Lower critical going low | non-fatal | Degraded | The temperature has dropped below its lower critical threshold.
    07h | Upper non-critical going high | Degraded | OK | The temperature has gone over its upper non-critical threshold.
    09h | Upper critical going high | non-fatal | Degraded | The temperature has gone over its upper critical threshold.
    Table 35: Temperature Sensors Next Steps
22. 10 PCI Express and Legacy PCI subsystem
    The PCI Express (PCIe) Specification defines standard error types under the Advanced Error Reporting (AER) capabilities. The BIOS logs AER events into the SEL. The Legacy PCI Specification error types are PERR and SERR. These errors are supported and logged into the SEL.
    10.1 PCI Express Errors
    PCIe error events are either correctable (informational event) or fatal. In both cases, information is logged to help identify the source of the PCIe error, and the bus, device, and function are included in the extended data fields. The PCIe devices are mapped in the operating system by bus, device, and function. Each device is uniquely identified by the bus, device, and function. PCIe device information can be found in the operating system.
    10.1.1 PCI Express Correctable errors
    When a PCI Express correctable error is reported to the BIOS SMI handler, it will record the error using the following format.
    Table 63: PCI Express Correctable Error Sensor Typical Characteristics
    Byte | Field | Description
    8, 9 | Generator ID | 0033h = BIOS SMI Handler
    11 | Sensor Type | 13h = Critical Interrupt
    12 | Sensor Number | 05h
    13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 71h (OEM Specific)
23. are used by the external management software or BMC to configure and control the Intel Intelligent Power Node Manager feature. Since Platform Services firmware does not have any external interface, external commands are first received by the BMC over LAN and then relayed to the Platform Services firmware over the IPMB channel. The BMC acts as a relay and the transport conversion device for these commands. For simplicity, the commands from the management console might be encapsulated in a generic CONFIG packet format (config data length, config data blob) to the BMC, so that the BMC doesn't even have to parse the actual configuration data. The BMC provides the access point for remote commands from external management software and generates alerts to them.
    Intel Intelligent Power Node Manager on Intel Manageability Engine (Intel ME) is an IPMI satellite controller. A mechanism needs to exist to forward commands to Intel ME and send the response back to the originator. Similarly, events from Intel ME have to be sent as alerts outside of the BMC. It is the responsibility of the BMC to implement these mechanisms for communication with Intel Intelligent Power Node Manager.
    The full specification can be downloaded from the following link: http://www.intel.com/content/dam/doc/technical-specification/intelligent-power-node-manager-1-5-specification.pdf
24. can be used to find the details of sensors owned by the BMC.
    Table 5: BMC owned Sensors
    Sensor Number | Sensor Name | Details Section | Next Steps
    01h | Power Unit Status (Pwr Unit Status) | Power Unit Status Sensor | Table 16: Power Unit Status Sensor - Sensor Specific Offsets Next Steps
    02h | Power Unit Redundancy (Pwr Unit Redund) | Power Unit Redundancy Sensor | Table 18: Power Unit Redundancy Sensor Event Trigger Offset Next Steps
    03h | IPMI Watchdog (IPMI Watchdog) | IPMI Watchdog | Table 77: IPMI Watchdog Sensor Event Trigger Offset Next Steps
    04h | Physical Security (Physical Scrty) | Physical Security | Table 73: Physical Security Sensor Event Trigger Offset Next Steps
    05h | FP [...] (FP NMI Diag Int) | FP NMI Interrupt | FP NMI Interrupt Next Steps
    06h | SMI Timeout (SMI Timeout) | SMI Timeout | SMI Timeout Next Steps
    07h | System Event Log (System Event Log) | System Event Log Cleared | Not applicable
    08h | System Event (System Event) | System Event PEF action | System Event PEF Action Next Steps
    09h | Button Press Event (Button Press) | Button Press Events | Not applicable
    Sensor Number | Sensor
25. can occur during Power On Self Test (POST) or when coming out of a sleep state. Not all of these events signify errors. Some events are described in other chapters in this document (for example, memory events).
    11.1 System Events
    These events can occur during POST or when coming out of a sleep state. These are informational events only.
    1. When logging events during POST, BIOS uses generator ID 0001h.
    2. When coming out of a sleep state, BIOS uses generator ID 0033h.
    11.1.1 System Boot
    The BIOS logs a system boot event every time the system boots. The event gets logged early during POST, when BIOS-BMC communication is first established. This event is not an error.
    11.1.2 Timestamp Clock Synchronization
    These events are used when the time between the BIOS and the BMC is synchronized. Two events are logged. The BIOS sends the first time synch message to the BMC for synchronization; the timestamp that message gets is unknown, that is, the timestamp in the log could be anything, since it gets the "before" timestamp. So the BIOS sends a second time synch message to get a baseline, correct timestamp in the log. That is the starting time. For example, say that the time the BMC has is March 1, 2011 21:00. The BIOS time synch updates that to the same date, 21:20 (the BMC was running behind). Without that second time synch message, you don't know that the log time jumped ahead, and when you get the next log message it looks like there was a 20 min d
26. decoding of a SEL Record

Byte | Field | Description
8, 9 | Generator ID (GID) | RqSA and LUN if the event was generated from IPMB. Software ID if the event was generated from system software.
  Byte 1:
    [7:1] 7-bit I2C Slave Address, or 7-bit system software ID
    [0] 0b = ID is IPMB Slave Address; 1b = system software ID
  Software ID values:
    0001h = BIOS POST (for POST errors, RAS Configuration/State, Timestamp Synch, OS Boot events)
    0033h = BIOS SMI Handler
    0020h = BMC Firmware
    002Ch = ME Firmware
    0041h = Server Management Software
    00C0h = HSC Firmware, HSBP A
    00C2h = HSC Firmware, HSBP B
  Byte 2:
    [7:4] Channel number. Channel that the event message was received over. 0h if the event message was received from the system interface, primary IPMB, or internally generated by the BMC.
    [3:2] Reserved. Write as 00b.
    [1:0] IPMB device LUN if byte 1 holds a Slave Address; 00b otherwise.
10 | EvM Rev (ER) | Event Message format version. 04h = IPMI v2.0; 03h = IPMI v1.0.
11 | Sensor Type (ST) | Sensor Type Code for the sensor that generated the event.
12 | Sensor # (SN) | Number of the sensor that generated the event (from SDR).
13 | Event Dir / Event Type (EDIR) | Event Dir: [7] 0b = Assertion event; 1b = Deassertion event. Event Type: type of trigger for the event, for example critical threshold going high, state asserted, and so on. Also indicates the class of the event, for example discrete, threshold, or OEM
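The Generator ID and Event Dir/Event Type fields above are bit-packed, so a short decoding sketch may help. The following Python is illustrative only: the function and dictionary names are hypothetical, and it assumes byte 8 of the record is Generator ID "Byte 1" and byte 9 is "Byte 2", as the table describes.

```python
# Illustrative sketch; names are hypothetical, not part of any Intel or IPMI tool.
# Owner names come from the "Software ID values" list in the table above.
KNOWN_GENERATOR_IDS = {
    0x01: "BIOS POST",
    0x33: "BIOS SMI Handler",
    0x20: "BMC Firmware",
    0x2C: "ME Firmware",
    0x41: "Server Management Software",
    0xC0: "HSC Firmware, HSBP A",
    0xC2: "HSC Firmware, HSBP B",
}


def decode_generator_id(byte8, byte9):
    """Split SEL bytes 8 and 9 (Generator ID bytes 1 and 2) into their fields."""
    is_software = bool(byte8 & 0x01)  # bit 0: 1b = system software ID
    return {
        "source": "system software" if is_software else "IPMB",
        "id_7bit": byte8 >> 1,        # bits 7:1, slave address or software ID
        "channel": byte9 >> 4,        # bits 7:4 of byte 2
        "lun": byte9 & 0x03,          # bits 1:0 of byte 2
        "owner": KNOWN_GENERATOR_IDS.get(byte8, "unknown"),
    }


def decode_event_dir_type(byte13):
    """Split SEL byte 13 into (direction, event type code)."""
    direction = "deassertion" if byte13 & 0x80 else "assertion"
    return direction, byte13 & 0x7F


print(decode_generator_id(0x41, 0x00))
print(decode_event_dir_type(0x6F))
```

Note that 0041h decodes to a system software ID of 20h, which agrees with the "System Software with an ID 20h" wording used in the Microsoft Windows record tables of this guide.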
27. event to all BMCs on a panic" option, you will get one event on a panic in a standard IPMI event format. If you enable the "Generate OEM events containing the panic string" option, you will also get a set of OEM events holding the panic string.

Table 96. Linux Kernel Panic Event Record Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0021h = Kernel
10 | EvM Rev | 03h = IPMI 1.0 format
11 | Sensor Type | 20h = OS Stop/Shutdown
12 | Sensor Number | The first byte of the panic string (0 if no panic string)
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 10b = OEM code in Event Data 3. [3:0] Event Trigger Offset = 1h, Run-time Critical Stop (that is, core dump, blue screen)
15 | Event Data 2 | Second byte of panic string
16 | Event Data 3 | Third byte of panic string

Table 97. Linux Kernel Panic String Extended Record Characteristics
Byte | Field | Description
1, 2 | Record ID | ID used for SEL Record access
3 | Record Type | [7:0] F0h = OEM non-timestamped, bytes 4-16 OEM defined
4 | Slave Address | The slave address of the card saving the panic
5 | Sequenc
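Table 96 spreads the first three bytes of the panic string across the Sensor Number, Event Data 2, and Event Data 3 fields. As a rough sketch, they can be joined back together as follows. The function name is hypothetical, and the code assumes the panic string is ASCII and that a zero byte means "no more characters"; the guide only states that the sensor number is 0 when there is no panic string.

```python
def panic_string_prefix(sensor_number, event_data_2, event_data_3):
    """Rebuild the first (up to) three characters of a Linux panic string.

    Assumption: the bytes are ASCII and a 0 byte terminates the string.
    """
    raw = bytes([sensor_number, event_data_2, event_data_3])
    raw = raw.split(b"\x00", 1)[0]  # keep only the bytes before the first 0
    return raw.decode("ascii", errors="replace")


# Hypothetical record bytes spelling "Oop" (4Fh, 6Fh, 70h).
print(panic_string_prefix(0x4F, 0x6F, 0x70))
```

The remainder of the string would come from the extended OEM records described in Table 97.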
28. 2. Basic decoding of a SEL Record
The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL. For more details, see the IPMI Specification. The definitions for the standard SEL can be found in Table 1. The definitions for the OEM-defined event logs can be found in Table 3 and Table 4.

2.1 Default values in the SEL records
Unless otherwise noted in the event record descriptions, the following are the default values in all SEL entries:
Byte [3] = Record Type (RT) = 02h (system event record)
Byte [9:8] = Generator ID = 0020h (BMC Firmware)
Byte [10] = Event Message Revision (ER) = 04h (IPMI 2.0)

Table 1. SEL Record Format
Byte | Field | Description
1, 2 | Record ID (RID) | ID used for SEL Record access
3 | Record Type (RT) | [7:0] Record Type: 02h = system event record; C0h-DFh = OEM timestamped, bytes 8-16 OEM defined (see Table 3); E0h-FFh = OEM non-timestamped, bytes 4-16 OEM defined (see Table 4)
4-7 | Timestamp (TS) | Time when the event was logged, LS byte first. Example: TS = [29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: There are various websites that will convert the raw number to a date and time.
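The timestamp example in Table 1 can be reproduced without a website. The sketch below is a minimal Python illustration; the function name is hypothetical, and it treats the value as seconds since 1 January 1970 UTC, which is the interpretation that yields the date shown in the example.

```python
from datetime import datetime, timezone


def sel_timestamp(ts_bytes):
    """Convert SEL bytes 4-7 (least-significant byte first) to a UTC datetime."""
    seconds = int.from_bytes(ts_bytes, byteorder="little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


raw = bytes.fromhex("2976684C")  # the example bytes [29][76][68][4C]
print(hex(int.from_bytes(raw, "little")))  # 0x4c687629
print(sel_timestamp(raw).isoformat())      # 2010-08-15T23:20:09+00:00
```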
29. going low threshold.

Event Trigger Offset (Hex) | Description | Assertion Severity | Deassertion Severity | Description
07h | Upper non-critical going high | Degraded | OK | The temperature has gone over its upper non-critical threshold.
09h | Upper critical going high | ... | Degraded | The temperature has gone over its upper critical threshold.
Next steps (shared by the rows above): ... been selected. 3. Ensure there are no fan failures. 4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below ...).

14.2 HSC Drive Slot Status Sensor
The HSC Drive Slot Status sensor will provide the current status for drives in each of the slots.

Table 83. HSC Drive Slot Status Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 00C0h = HSC Firmware, HSBP A; 00C2h = HSC Firmware, HSBP B
11 | Sensor Type | 0Dh = Drive Slot (Bay)
12 | Sensor Number | 6-Slot HSBP: 02h = Drive Slot 0 Status, 03h = Drive Slot 1 Status, 04h = Drive Slot 2 Status, 05h = Drive Slot 3 Status, 06h = Drive Slot 4 Status, 07h = Drive Slot 5 Status
   |               | 8-Slot HSBP: 02h = Drive Slot 0 Status, 03h = Drive Slot 1 Status, 04h = Drive Slot 2 Status, 05h = Drive Slot 3 Status, 06h = Drive Slot 4 Status, 07h = Drive Slot 5 Status, 08h = Drive Slot 6 Status, 09h = Drive Slot 7 Status
13 | Event Direction and Event Type | [7] Event
30. interrupts in order to log them to the SEL. If this interrupt times out, the system is frozen.

Table 78. SMI Timeout Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | F3h = SMI Timeout
12 | Sensor Number | 06h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 03h (digital Discrete)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset = 1 (State Asserted)
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

13.2.1 SMI Timeout Next Steps
This event normally only occurs after another, more critical event. Check the SEL for any critical interrupts, memory errors, bus errors, PCI errors, or any other serious errors. If these are not present, the system locked up before it was able to log the original issue. In this case, low-level debug is normally required.

13.3 System Event Log Cleared
The BMC logs a SEL clear event. This would only ever be the first event in the SEL. The cause of this event is either a manual SEL clear using the Intel SEL Viewer or some other IPMI-aware utility, or a clear done in the factory as one of the last steps
31. is exceeded over Correction Time Limit.

Table 85. Node Manager Exception Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 002Ch = ME Firmware
11 | Sensor Type | DCh = OEM
12 | Sensor Number | 18h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 72h (OEM)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 10b = OEM code in Event Data 3. [3] Node Manager Policy event: 1 = Policy Correction Time Exceeded (the policy did not meet the contract for the defined policy; the policy will continue to limit the power or shut down the platform based on the defined policy action). [2] Reserved. [1:0] 00b
15 | Event Data 2 | [3:0] Domain Id. Currently supports only one domain, Domain 0.
16 | Event Data 3 | Policy Id

15.1.1 Node Manager Exception Event Next Steps
This is an informational event. Next steps will depend on the policy that was set. See the Node Manager Specification for more details.

15.2 Node Manager Health Event
A Node Manager Health Event message provides a run-time error indication about the health of Intel Intelligent Power Node Manager. Types of service that can send an error are defined as follows: Misconfig
32. isolates systems management software from hardware. Hardware advancements can be made without impacting the systems management software. IPMI facilitates cross-platform management software. You can find more information on IPMI at the following URL: http://www.intel.com/design/servers/ipmi

1.2.2 Baseboard Management Controller (BMC)
A baseboard management controller (BMC) is a specialized microcontroller embedded on most Intel Server Boards. The BMC is the heart of the IPMI architecture and provides the intelligence behind intelligent platform management, that is, the autonomous monitoring and recovery features implemented directly in platform management hardware and firmware. Different types of sensors built into the computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system status, and so on. The BMC monitors the system for critical events by communicating with various sensors on the system board; it sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also communicate with the BMC remotely to take corrective action, such as resetting or power cycling the system to get a hung OS running again. These abilities save on the total cost of ownership of a system. For Intel server boards and Intel Server platforms, the BMC supports the industry-standard IPMI 2.0 Specification e
33. 16.3 Bug Check (Blue Screen) Event Records
When the system experiences a bug check (blue screen), there will be multiple records written to the event log. The first is a Bug Check/Blue Screen OS Stop/Shutdown Event Record; this can be followed by multiple Bug Check/Blue Screen code OEM records that will contain the Bug Check/Blue Screen codes. This information can be used to determine what caused the failure.

Table 94. Bug Check/Blue Screen OS Stop Event Record Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0041h = System Software with an ID = 20h
11 | Sensor Type | 20h = OS Stop/Shutdown
12 | Sensor Number | 00h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset = 1h, Run-time Critical Stop (that is, core dump, blue screen)
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

Table 95. Bug Check/Blue Screen code OEM Event Record Typical Characteristics
Byte | Field | Description
1, 2 | Record ID | ID used for SEL Record access
3 | Record Type | [7:0] DEh = OEM timestamped, bytes 8-16 OEM defined
34. Data 2 | Not used
16 | Event Data 3 | Not used

Table 20. Power Supply Status Sensor - Sensor Specific Offsets - Next Steps
Sensor Specific Offset (Hex) | Offset Description | Description | Next Steps
00h | Presence | Power supply detected. | Informational Event.
01h | Failure | Power supply failed. | Indicates a power supply failed. 1. Remove and reapply AC. 2. If the power supply still fails, replace it.
02h | Predictive Failure | Typically means a fan inside the power supply is not cooling the power supply. It may indicate the fan is failing. | Replace the power supply.
03h | AC lost | AC removed. | Informational Event.
06h | Configuration error | Power supply configuration is not supported. | Indicates that at least one of the supplies is not correct for your system configuration. 1. Remove the power supply and verify compatibility. 2. If the power supply is compatible, it may be faulty. Replace it.

4.3.2 Power Supply AC Power Input Sensors
These sensors will log an event when a power supply in the system is exceeding its AC power-in threshold.

Table 21. Power Supply AC Power Input Sensors Typical Characteristics
Byte | Field | Description
11 | Sens
35. Sensor Number | Sensor Name | Next Steps
13h | BB 1.5V P1 DDR3 | This 1.5V line is supplied by the main board. This 1.5V line is used by the memory on processor 1. 1. Ensure all cables are connected correctly. 2. Check the DIMMs are seated properly. 3. Cross-test DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.
14h | BB 1.5V P2 DDR3 | This 1.5V line is supplied by the main board. This 1.5V line is used by the memory on processor 2. 1. Ensure all cables are connected correctly. 2. Check the DIMMs are seated properly. 3. Cross-test DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.
15h | BB 1.8V AUX | 1.8V is supplied by the main board. 1.8V is used by the onboard NIC and I/O hub. 1. Ensure all cables are connected correctly. 2. If the issue remains, replace the main board.
16h | BB 3.3V | 3.3V is supplied by the power supplies. 3.3V is used by the PCIe and PCI-X slots. 1. Ensure all cables are connected correctly. 2. Reseat any PCI cards; try other slots. 3. If the issue follows the card, swap it; otherwise replace the main board. 4. If the issue remains, replace the power supplies.
17h | BB 3.3V STBY | 3.3V Stby is supplied by the main board. 3.3V Stby is used by the BMC, on-board NIC, IOH, and ICH. 1. Ensure all cables are connected correctly. 2. If the issue remains, replace the board. 3. If the issue remains, replace the power supplies.
3.3V Vbat is supplied by the CMOS battery
36. ECC Error | Memory Correctable and Uncorrectable ECC Error | Table 61. Correctable and Uncorrectable ECC Error Sensor - Event Trigger Offset - Next Steps
03h | Legacy PCI Error | Legacy PCI Errors | Table 68. Legacy PCI Error Sensor - Event Trigger Offset - Next Steps
04h | PCI Express Fatal Error | PCI Express Fatal Errors | Table 66. PCI Express Fatal Error Sensor - Event Trigger Offset - Next Steps
05h | PCI Express Correctable Error | PCI Express Correctable errors | Table 64. PCI Express Correctable Error Sensor - Event Trigger Offset - Next Steps
06h | Intel QuickPath Interface Correctable Error | QPI Correctable Error Sensor | QPI Correctable Error Sensor - Next Steps
07h | Intel QuickPath Interface Non-Fatal Error | QPI Non-Fatal Error Sensor | QPI Non-Fatal Error Sensor - Next Steps
14h | Memory Address Parity Error | Memory Address Parity Error | Memory Address Parity Error Sensor - Next Steps
17h | Intel QuickPath Interface Fatal Error | QPI Fatal and Fatal 2 | QPI Fatal and Fatal 2 - Next Steps
18h | Intel QuickPath Interface Fatal2 Error | QPI Fatal and Fatal 2 | QPI Fatal and Fatal 2 - Next Steps
83h | System Event | System Events | Not applicable

3.4 Hot Swap Controller Firmware-owned Sensors (GID = 00C0h/00C2h)
The following table can be used to find the details of sensors owned by the Hot Swap Controller (HSC) firmware. The HSC firmware resides on a
37. Event. Event Type: 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset = 1h, C: boot completed
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

Table 90. Boot-up OEM Event Record Typical Characteristics
Byte | Field | Description
1, 2 | Record ID | ID used for SEL Record access
3 | Record Type | [7:0] DCh = OEM timestamped, bytes 8-16 OEM defined
4-7 | Timestamp | Time when the event was logged, LS byte first
8-10 | IPMI Manufacturer ID | 0137h (311d) = IANA enterprise number for Microsoft
11-16 | OEM-defined fields, in order | Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL. Boot Time: Timestamp of when the system booted into the OS. Reserved: 00h.

16.2 Shutdown Event Records
When the system shuts down from the Microsoft Windows OS, there can be multiple events logged. The first is an O
38. Hot Swap Back Planes (HSBP). There can be up to two HSBPs in a system. Each HSBP will have its own GID:
00C0h = HSC Firmware, HSBP A
00C2h = HSC Firmware, HSBP B

Table 8. Hot Swap Controller Firmware-owned Sensors
Sensor Number | Sensor Name | Details Section | Next Steps
01h | Backplane Temperature | HSC Backplane Temperature Sensor | Table 82. HSC Backplane Temperature Sensor - Event Trigger Offset - Next Steps
02h | Drive Slot 0 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
03h | Drive Slot 1 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
04h | Drive Slot 2 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
05h | Drive Slot 3 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
06h | Drive Slot 4 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
07h | Drive Slot 5 Status | HSC Drive Slot Status Sensor | HSC Drive Slot Status Sensor - Next Steps
6-Slot HSBP:
08h | Drive Slot 0 Presence | HSC Drive Presence Sensor | HSC Drive Presence Sensor - Next Steps
09h | Drive Slot 1 Presence | HSC Drive Presence Sensor | HSC Drive Presence Sensor - Next Steps
0Ah | Drive Slot 2 Presence | HSC Drive Presence Sensor | HSC Drive Presence Sensor - Next Steps
0Bh | Drive Slot 3 Presence | HSC Drive Presence Sensor | HSC Drive Pre
39. L entry. This event will be logged every time a POST error is displayed. Even though this event indicates an error, it may not be a fatal error. If this is a serious error, there will typically also be a corresponding SEL entry logged for whatever was the cause of the error; this event may contain more information about what happened than the POST error event.

Table 70. POST Error Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0001h = BIOS POST
11 | Sensor Type | 0Fh = System Firmware Progress (formerly POST Error)
12 | Sensor Number | 06h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 10b = OEM code in Event Data 3. [3:0] Event Trigger Offset = 0
15 | Event Data 2 | Low byte of POST Error Code
16 | Event Data 3 | High byte of POST Error Code

11.2.1 System Firmware Progress (Formerly Post Error) Next Steps
See the following table for POST error codes.

Table 71. POST Error Codes
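Because Table 70 splits the POST error code across Event Data 2 (low byte) and Event Data 3 (high byte), the two bytes have to be recombined before looking the code up in Table 71. A minimal Python sketch follows; the function name and the sample byte values are hypothetical.

```python
def post_error_code(event_data_2, event_data_3):
    """Combine Event Data 2 (low byte) and Event Data 3 (high byte)."""
    return (event_data_3 << 8) | event_data_2


# Hypothetical sample bytes: Event Data 2 = 12h, Event Data 3 = 00h.
code = post_error_code(0x12, 0x00)
print(f"{code:04X}")  # 0012
```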
40. Sensor Number | Sensor Name | Details Section | Next Steps
10h | BB 1.1V IOH | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
11h | BB 1.1V P1 Vccp | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
12h | BB 1.1V P2 Vccp | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
13h | BB 1.5V P1 DDR3 | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
14h | BB 1.5V P2 DDR3 | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
15h | BB 1.8V AUX | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
16h | BB 3.3V | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
17h | BB 3.3V STBY | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
18h | BB 3.3V Vbat | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
19h | BB 5.0V | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
1Ah | BB 5.0V STBY | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
1Bh | BB 12.0V | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
1Ch | BB 12.0V | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
1Dh | BB 1.35V P1 LV DDR3 (BB 1.35v P1 MEM) | Voltage Sensors | Table 14. Voltage Sensors - Next Steps
41. S Stop/Shutdown Event Record; this can be followed by a shutdown reason code OEM record, and then zero or more shutdown comment OEM records. These are all informational-only records.

Table 91. Shutdown Reason Code Event Record Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0041h = System Software with an ID = 20h
11 | Sensor Type | 20h = OS Stop/Shutdown
12 | Sensor Number | 00h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset = 3h, OS Graceful Shutdown
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

Table 92. Shutdown Reason OEM Event Record Typical Characteristics
Byte | Field | Description
1, 2 | Record ID | ID used for SEL Record access
3 | Record Type | [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
4-7 | Timestamp | Time when the event was logged, LS byte first
8-10 | IPMI Manufacturer ID | 0137h (311d) = IANA enterprise number for Microsoft
11-16 | OEM-defined fields, in order | Record ID: Sequential number reflecting t
42. monitor power supply power consumption.
09h | Upper critical going high | non-fatal | Degraded | 1. Verify the power budget is within the specified range. 2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.

4.3.4 Power Supply Temperature Sensors
The BMC will monitor one power supply temperature sensor for each installed PMBus-compliant power supply.

Table 25. Power Supply Temperature Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | ... Power Supply Temperature
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 01h (Threshold)
14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2. [5:4] 01b = Trigger threshold in Event Data 3. [3:0] Event Trigger Offset as described in Table 26
15 | Event Data 2 | Reading that triggered the event
16 | Event Data 3 | Threshold value that triggered the event

The following table describes the severity of each of the event triggers, for both assertion and deassertion.

Table 26. Power Supply Temperature Sensor - Event Trigger Offset - Next Steps
Event Trigger Offset | Asserti
43. Sensors - Next Steps
Sensor Number | Sensor Name | Event Type | Event Trigger Offset (Hex) | Offset Description | Description
66h, 67h | P1 VRD Hot, P2 VRD Hot | 05h | 01h | Limit Exceeded |
6Ah | IOH Thermal Trip | 03h | 01h | State Asserted | I/O Hub (IOH) overheated.
Next steps: 1. Check for clear and unobstructed airflow into and out of the chassis. 2. Ensure the SDR is programmed and the correct chassis has been selected. 3. Ensure there are no fan failures. 4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 C).

6. Processor subsystem
Intel servers report several processor-centric sensors in the SEL.

6.1 Processor Status Sensor
The status sensor reports processor presence or a thermal trip condition. Each processor has a status sensor.

Table 44. Processor Status Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 07h = Processor
12 | Sensor Number | See Table 45
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset as described in Table 45
15 | Event Da
44. System Event Log Troubleshooting Guide for Intel S5500/S3420 series Server Boards
Intel order number G74211-001, Revision 1.0, August 2012
Enterprise Platforms and Services Division - Marketing

Disclaimers
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty relating to sale and/or use of Intel products, including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright, or other intellectual property right. Intel products are not intended for use in medical, lifesaving, or life-sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. This document contains information on products in the design p
45. The Event Type field is encoded using the Event/Reading Type Code.
  [6:0] Event Type Codes: 01h = Threshold (States = 0x00-0x0b); 02h-0Ch = Discrete; 6Fh = Sensor-Specific; 70h-7Fh = OEM
14 | Event Data 1 (ED1) | Per Table 2. Event Request Message Event Data Field Contents
15 | Event Data 2 (ED2) |
16 | Event Data 3 (ED3) |

Table 2. Event Request Message Event Data Field Contents
Sensor Class | Event Data
threshold | Event Data 1:
  [7:6] 00b = unspecified Event Data 2; 01b = trigger reading in Event Data 2; 10b = OEM code in Event Data 2; 11b = sensor-specific event extension code in Event Data 2
  [5:4] 00b = unspecified Event Data 3; 01b = trigger threshold value in Event Data 3; 10b = OEM code in Event Data 3; 11b = sensor-specific event extension code in Event Data 3
  [3:0] Offset from Event/Reading Code for threshold event
  Event Data 2: reading that triggered the event, FFh or not present if unspecified.
  Event Data 3: threshold value that triggered the event, FFh or not present if unspecified. If present, Event Data 2 must be present.
discrete | Event Data 1:
  [7:6] 00b = unspecified Event Data 2; 01b = previous state and/or severity in Event Data 2; 10b = OEM code in Event Data 2; 11b = sensor-specific event extension code in Event Data 2
  [5:4] 00b = unspecified Ev
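The Event Type ranges and the threshold-class layout of Event Data 1 listed above translate directly into a few bit masks. The Python below is a sketch under that reading; the function names are hypothetical, and only the threshold class is decoded because the discrete-class description is cut off in this excerpt.

```python
# Meanings of Event Data 1 bits 7:6 and 5:4 for threshold-class events (Table 2).
ED2_MEANING = {
    0b00: "unspecified Event Data 2",
    0b01: "trigger reading in Event Data 2",
    0b10: "OEM code in Event Data 2",
    0b11: "sensor-specific event extension code in Event Data 2",
}
ED3_MEANING = {
    0b00: "unspecified Event Data 3",
    0b01: "trigger threshold value in Event Data 3",
    0b10: "OEM code in Event Data 3",
    0b11: "sensor-specific event extension code in Event Data 3",
}


def event_type_class(event_type):
    """Classify the 7-bit Event Type code using the ranges listed above."""
    if event_type == 0x01:
        return "threshold"
    if 0x02 <= event_type <= 0x0C:
        return "discrete"
    if event_type == 0x6F:
        return "sensor-specific"
    if 0x70 <= event_type <= 0x7F:
        return "OEM"
    return "not listed"


def decode_threshold_event_data_1(ed1):
    """Split Event Data 1 of a threshold event into its three fields."""
    return {
        "event_data_2": ED2_MEANING[(ed1 >> 6) & 0x03],
        "event_data_3": ED3_MEANING[(ed1 >> 4) & 0x03],
        "offset": ed1 & 0x0F,
    }


print(event_type_class(0x01))
print(decode_threshold_event_data_1(0x59))
```

For example, an Event Data 1 value of 59h carries a trigger reading in Event Data 2, a trigger threshold value in Event Data 3, and offset 09h.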
46. List of Tables
Table 72. Physical Security Sensor Typical Characteristics
Table 73. Physical Security Sensor - Event Trigger Offset - Next Steps
Table 74. FP NMI Interrupt Sensor Typical Characteristics
Table 75. Button Press Events Sensor Typical Characteristics
Table 76. IPMI Watchdog Sensor Typical Characteristics
Table 77. IPMI Watchdog Sensor - Event Trigger Offset - Next Steps
Table 78. SMI Timeout Sensor Typical Characteristics
Table 79. System Event Log Cleared Sensor Typical Characteristics
Table 80. System Event - PEF Action Sensor Typical Characteristics
Table 81. HSC Backplane Temperature Sensor Typical Characteristics
Table 82. HSC Backplane Temperature Sensor - Event Trigger Offset - Next Steps
Table 83. HSC Drive Slot Status Sensor Typical Cha
47. al memory mirroring domain that is restricted to memory mirroring pairs within a processor socket only.
  1b = Global memory sparing domain instance. This SEL pertains to a global memory mirroring domain that pertains to memory mirroring between processor sockets.
  [6:4] Reserved
  [3:0] 0-based Instance ID of this sparing domain
(The bit fields above belong to byte 16, Event Data 3.)

Table 55. Mirrored Redundancy State Sensor - Event Trigger Offset - Next Steps
Event Trigger Offset (Hex) | Offset Description | Description | Next Steps
01h | Memory is configured in Mirrored Channel Mode and the memory is operating in the fully redundant state. | Entry per processor. | Informational event.
00h | Memory is configured in Mirrored Channel Mode and the memory has lost redundancy and is operating in the degraded state. | One of the channels in the mirror pair is taken offline; loss of mirror (one entry only, for the affected processor). | This event should be accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace the affected DIMM).

9.1.3 Sparing Configuration Status
This sensor provides the Spare Channel mode RAS Configuration status.

Table 56. Sparing Configuration Status Sensor Typical Characteristics
Byte | Field | Description
8
48. ay occur if this situation remains for a longer period of time.
06h | Non-redundant: degraded from fully redundant | System has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
07h | Redundant degraded: from non-redundant | System has lost one or more fans and is running in a degraded mode, but still is redundant. There are enough fans to keep the system properly cooled.

5.2 Temperature Sensors
There are a variety of temperature sensors that can be implemented on Intel server systems. They are split into three types: regular temperature sensors, thermal margin sensors, and discrete temperature sensors. Each of them has its own types of events that can be logged.

5.2.1 Regular Temperature sensors
Regular temperature sensors are sensors that report an actual temperature. These are linear, threshold-based sensors. In most Intel server systems, there are at least two sensors defined: front panel temperature and baseboard temperature. Both these sensors typically have upper and lower thresholds set: upper to warn in case of an over-temperature situation, lower to warn against sensor failure (temperature sensors typically read out 0 if they stop working).

Table 33
49. Event Trigger Offset (Hex) | Offset Description | Description
00h | Data Link Layer Protocol Error | Indicates a CRC error detected during a DLLP transaction. This means the transaction was corrupted.
01h | Surprise Link Down | The link was lost and is no longer functional. Requires a reboot to bring the link back.
02h | Unexpected Completion | Indicates the device received a completion notification for a transaction it does not recognize. This is a fatal error.
03h | Received Unsupported request condition on inbound address decode, with the exception of SAD | Indicates a failure due to an incorrect address sent to the target. This unknown address is a fatal error.
04h | Poisoned TLP Error | Typically indicates a parity error in a TLP transaction. This means the data received is not correct.
05h | Flow Control Protocol Error | Indicates an error during initialization, with the device not providing enough flow control credits. This means the bus configuration is incorrect and it cannot continue.
06h | Completion Timeout Error | Indicates a transaction did not complete in the specified amount of time.
07h | Completer Abort Error | Indicates a transaction had unexpected content or format.
08h
Next steps (shared by the rows above): Decode bus, device, and function to identify the card.
If this is an add-in card:
a. Verify the card is inserted properly.
b. Install the card in another slot and check if the error follows the card or stays with the slot.
c. Update all firmware and drivers, including non-Intel components.
If this is an onboard device:
50. de Manager Specification for more details.

15.4 Node Manager Alert Threshold Exceeded
A Policy Correction Time Exceeded event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.

Table 88. Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 002Ch = ME Firmware
11 | Sensor Type | DCh = OEM
12 | Sensor Number | 1Bh
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type = 72h (OEM)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 10b = OEM code in Event Data 3. [3] Node Manager Policy event: 0 = Threshold exceeded; 1 = Policy Correction Time Exceeded (the policy did not meet the contract for the defined policy; the policy will continue to limit the power or shut down the platform based on the defined policy action). [2] Reserved. [1:0] Threshold Number (valid only if Byte 5 bit 3 is set to 0): 0 to 2, threshold index
15 | Event Data 2 | [7:4] Reserved. [3:0] Domain Id. Currently supports only one domain, Domain 0.
16 | Event Data 3 | Policy ID
51. direction Ob Assertion Event 1b Deassertion Event 6 0 Event Type 6Fh Sensor Specific 14 Event Data 1 40h Failed Drive 15 Event Data 2 Not used 16 Event Data 3 Not used 14 2 1 HSC Drive Slot Status Sensor Next Steps If during normal operation a drive gets reported as failed then ensure that the drive was seated properly and the drive carrier was properly latched If that does not work then replace the drive 14 3 HSC Drive Presence Sensor The HSC Drive Slot Presence sensor will provide the current presence state for drive in each of the slots After an AC power cycle there will be a SEL entry to report the presence of the drive in a slot and there will be another entry for any changes in the presence of drives after that Table 84 HSC Drive Presence Sensor Typical Characteristics Byte Field Description Generator ID 00COh HSC Firmware HSBP A 00C2h HSC Firmware HSBP B 11 Sensor Type ODh Drive Slot Bay 12 Sensor Number 6 Slot HSBP 8 Slot HSBP 88 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards Byte Field Description OAh Drive Slot 0 Presence 08h Drive Slot 0 Presence OBh Drive Slot 1 Presence 09h Drive Slot 1 Presence OCh Drive Slot 2 Presence OAh Drive Slot 2 Presence ODh Drive Slot 3 Presenc
52. e
  Sensor Number (continued; 6-slot HSBP value / 8-slot HSBP value):
    0Bh: Drive Slot 3 Presence (8-slot HSBP)
    0Eh / 0Ch: Drive Slot 4 Presence
    0Fh / 0Dh: Drive Slot 5 Presence
    10h: Drive Slot 6 Presence
    11h: Drive Slot 7 Presence
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 08h (Digital Discrete)
  Byte 14, Event Data 1:
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 0h = Device Removed / Device Absent, 1h = Device Inserted / Device Present
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

14.3.1 HSC Drive Presence Sensor - Next Steps
On AC power on, the drive presence will be logged as an informational event. If a drive is removed or installed during normal operation, an event is also logged. If a drive is reported as removed or installed without operator intervention, ensure that the drive is seated properly and the drive carrier is properly latched.

15 Manageability Engine (ME) events
The Manageability Engine controls the PECI interface and also contains the Node Manager functionality.

15.1 Node Manager Exception Event
A Node Manager Exception Event will be sent each time when maintained policy power limit
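The Event Direction / Event Type byte and the drive presence offset from the HSC Drive Presence table above can be decoded as in the sketch below. The function names are hypothetical; the offset meanings (0h removed/absent, 1h inserted/present) follow the table and the IPMI generic event type 08h.

```python
# Offsets for Event Type 08h as listed in the HSC Drive Presence table.
PRESENCE_OFFSETS = {
    0x0: "Device Removed / Device Absent",
    0x1: "Device Inserted / Device Present",
}


def decode_direction_and_type(byte13: int) -> tuple:
    """Split byte 13 into (direction, event type). Hypothetical helper."""
    direction = "deassertion" if byte13 & 0x80 else "assertion"  # bit 7
    event_type = byte13 & 0x7F                                   # bits 6:0
    return direction, event_type


def decode_drive_presence(ed1: int) -> str:
    """Map Event Data 1 bits 3:0 to the presence state."""
    return PRESENCE_OFFSETS.get(ed1 & 0x0F, "unknown offset")


if __name__ == "__main__":
    print(decode_direction_and_type(0x08), decode_drive_presence(0x01))
```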
53. e Number: A sequence number, starting at zero.
  Bytes 6-16, Kernel Panic Data: These hold the panic string. If the panic string is longer than 11 bytes, multiple messages will be sent with increasing sequence numbers.
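The chunking rule above (11 bytes of panic string per record, sequence numbers counting up from zero) can be illustrated with a short sketch. The function name is hypothetical, and padding the final short chunk with zero bytes is an assumption, since the excerpt does not say how a partial chunk is filled.

```python
def split_panic_string(panic: bytes, chunk_size: int = 11) -> list:
    """Split a panic string into (sequence_number, payload) pairs.

    Each payload is chunk_size bytes long, matching bytes 6-16 of the record.
    """
    messages = []
    for seq, start in enumerate(range(0, len(panic), chunk_size)):
        chunk = panic[start:start + chunk_size]
        # Assumption: a short final chunk is zero-padded to the full width.
        messages.append((seq, chunk.ljust(chunk_size, b"\x00")))
    return messages


if __name__ == "__main__":
    for seq, payload in split_panic_string(b"Fatal exception in interrupt"):
        print(seq, payload)
```

Reassembling the original string is then a matter of sorting by sequence number, concatenating the payloads, and stripping the trailing padding.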
54. el Panic Events
  Table 12. Voltage Sensors Typical Characteristics
  Table 13. Voltage Sensors Event Triggers Description
  Table 14. Voltage Sensors Next Steps
  Table 15. Power Unit Status Sensors Typical Characteristics
  Table 16. Power Unit Status Sensor Sensor Specific Offsets Next Steps
  Table 17. Power Unit Redundancy Sensors Typical Characteristics
  Table 18. Power Unit Redundancy Sensor Event Trigger Offset Next Steps
  Table 19. Power Supply Status Sensors Typical Characteristics
  Table 20. Power Supply Status Sensor Sensor Specific Offsets Next Steps
  Table 21. Power Supply AC Power Input Sensors Typical Characteristics
  Table 22. Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
  Table 23. Power Supply Current Output Sensors Typical Characteristics
  Table 24. Power Supply Current Output Sensor Event Trigger Offset Next Steps
  Table 25. Power Supply Temperature Sensors Typical Characteristics
  Table 26. Power Supply Temperature Sensor Event Trigger Offset
55. elay during the boot for some unknown reason. Without that second time synch message, the time span to the next logged message is indeterminate. With the second time synch as a baseline, the following log timestamps are always determinate.

Table 69. System Event Sensor Typical Characteristics
  Byte 8-9, Generator ID: 0001h = BIOS POST; 0033h = BIOS SMI Handler
  Byte 11, Sensor Type: 12h = System Event
  Byte 12, Sensor Number: 83h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1:
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 01h = System Boot, 05h = Timestamp Clock Synchronization
  Byte 15, Event Data 2: For Event Trigger Offset 05h only (Timestamp Clock Synchronization): 00h = 1st in pair, 80h = 2nd in pair
  Byte 16, Event Data 3: Not used

11.2 System Firmware Progress (Formerly Post Error)
The BIOS logs any POST errors to the SEL. The two-byte POST code gets logged in the ED2 and ED3 bytes in the SE
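To make the System Event sensor layout above concrete, the sketch below decodes the trigger offset and, for Timestamp Clock Synchronization events, whether the record is the first or second of the pair. The function name and return shape are hypothetical; the values come from Table 69.

```python
SYSTEM_EVENT_OFFSETS = {
    0x01: "System Boot",
    0x05: "Timestamp Clock Synchronization",
}

SYNC_PAIR = {0x00: "1st in pair", 0x80: "2nd in pair"}


def decode_system_event(ed1: int, ed2: int) -> tuple:
    """Return (event name, pair position or None). Hypothetical helper."""
    offset = ed1 & 0x0F  # Event Data 1 bits 3:0
    name = SYSTEM_EVENT_OFFSETS.get(offset, "unknown offset")
    # Event Data 2 is only defined for offset 05h.
    pair = SYNC_PAIR.get(ed2) if offset == 0x05 else None
    return name, pair


if __name__ == "__main__":
    print(decode_system_event(0x05, 0x00))
    print(decode_system_event(0x05, 0x80))
```

Seeing a "1st in pair" record without a matching "2nd in pair" record is the situation the text describes, where the time span to the next logged message is indeterminate.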
56. ent Data 3; 01b = reserved; 10b = OEM code in Event Data 3; 11b = sensor-specific event extension code in Event Data 3
    [3:0] Offset from Event/Reading Code for discrete event state
  Event Data 2:
    [7:4] Optional offset from "Severity" Event/Reading Code (0Fh if unspecified)
    [3:0] Optional offset from Event/Reading Type Code for previous discrete event state (0Fh if unspecified)
  Event Data 3: Optional OEM code. FFh or not present if unspecified.

  OEM
  Event Data 1:
    [7:6] 00b = unspecified Event Data 2; 01b = previous state and/or severity in Event Data 2; 10b = OEM code in Event Data 2; 11b = reserved
    [5:4] 00b = unspecified Event Data 3; 01b = reserved; 10b = OEM code in Event Data 3; 11b = reserved
    [3:0] Offset from Event/Reading Type Code
  Event Data 2:
    [7:4] Optional OEM code bits or offset from "Severity" Event/Reading Type Code (0Fh if unspecified)
    [3:0] Optional OEM code or offset from Event/Reading Type Code for previous event state (0Fh if unspecified)
  Event Data 3: Optional OEM code. FFh or not present if unspecified.

Basic decoding of a SEL Record

Table 3. OEM SEL Record (Type C0h-DFh)
  Byte 1-2, Record ID (RID): ID used for SEL Record access
  Byte 3, Record Type: [7:0] Record Type (RT) C0h-DFh
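The Event Data 1 byte of an OEM event class, as laid out above, packs three fields into one byte. A minimal sketch of unpacking it follows; the function and dictionary names are hypothetical, and the code covers only the OEM variant of the table because that is the one listed in full in this excerpt.

```python
# Meaning of Event Data 1 bits 7:6 and 5:4 for OEM event classes.
ED2_CONTENT = {
    0b00: "unspecified",
    0b01: "previous state and/or severity",
    0b10: "OEM code",
    0b11: "reserved",
}
ED3_CONTENT = {
    0b00: "unspecified",
    0b01: "reserved",
    0b10: "OEM code",
    0b11: "reserved",
}


def decode_oem_event_data1(ed1: int) -> dict:
    """Unpack Event Data 1 into its three fields. Hypothetical helper."""
    if not 0 <= ed1 <= 0xFF:
        raise ValueError("Event Data 1 must be a single byte")
    return {
        "event_data2": ED2_CONTENT[(ed1 >> 6) & 0b11],  # bits 7:6
        "event_data3": ED3_CONTENT[(ed1 >> 4) & 0b11],  # bits 5:4
        "offset": ed1 & 0x0F,                           # bits 3:0
    }


if __name__ == "__main__":
    print(decode_oem_event_data1(0xA4))
```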
57. erly Correctable ECC t GA since last boot This event in itself does not pose any direct problems as the ECC errors 3 Examine gold fingers on edge of DIMM to 00h Error threshold ill bei D S he RAS fi fth he IMC reached are still being corrected epending on the configuration of the memory the may verify contacts are clean lake NE GUE Cie DIMM omine 4 Inspect processor socket this DIMM is connected to for bent pins and if found replace the board 5 Consider replacing the DIMM as a preventative measure For multiple occurrences replace the DIMM 9 2 2 Memory Address Parity Error Address Parity errors are errors detected in the memory addressing hardware Since these affect the addressing of memory contents they can potentially lead to the same sort of failures as ECC errors They are logged as a distinct type of error since they affect memory addressing rather than memory contents but otherwise they are treated exactly the same as Uncorrectable ECC Errors Address Parity errors are logged to the BMC SEL with Event Data to identify the failing address by channel and DIMM to the extent that it is possible to do so 60 Table 62 Address Parity Error Sensor Typical Characteristics Byte Field Description 8 9 Generator ID 0033h BIOS SMI Handler 11 Sensor Type Och Memory 12 Sensor Number 14h Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel
58. ertion Event
    [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1:
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Event Trigger Offset, as described in Table 68
  Byte 15, Event Data 2: PCI Bus number
  Byte 16, Event Data 3:
    [7:3] PCI Device number
    [2:0] PCI Function number

Table 68. Legacy PCI Error Sensor Event Trigger Offset Next Steps
  04h PERR (Parity Error): PERR asserted. This is a fatal error.
  05h SERR (System Error): SERR asserted. This is a fatal error.
  Next steps (both offsets):
  1. Decode bus, device, and function to identify the card.
  2. If this is an add-in card:
     a. Verify the card is inserted properly.
     b. Install the card in another slot and check if the error follows the card or stays with the slot.
     c. Update all firmware and drivers, including non-Intel components.
  3. If this is an onboard device:
     a. Update all BIOS, firmware, and drivers.
     b. Replace the board.

11 System BIOS events
There are a number of events that are owned by the system BIOS. These events
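Step 1 of the next steps above ("decode bus, device, and function") follows directly from the Event Data 2 and Event Data 3 layout. The sketch below shows one way to do it; the function names are hypothetical, and the `bus:device.function` output format is simply the notation commonly printed by tools such as `lspci`, not something the guide prescribes.

```python
def decode_pci_location(ed2: int, ed3: int) -> tuple:
    """Return (bus, device, function) from Event Data 2 and 3."""
    bus = ed2 & 0xFF            # Event Data 2: PCI bus number
    device = (ed3 >> 3) & 0x1F  # Event Data 3 bits 7:3
    function = ed3 & 0x07       # Event Data 3 bits 2:0
    return bus, device, function


def format_bdf(bus: int, device: int, function: int) -> str:
    """Format as bus:device.function in hexadecimal."""
    return f"{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    print(format_bdf(*decode_pci_location(0x05, 0x1A)))
```

The resulting string can be compared against the operating system's PCI device listing to find which card or onboard device raised the error.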
59. Table 77. IPMI Watchdog Sensor Event Trigger Offset Next Steps
  Event Trigger Offsets:
    00h timer expired, status only
    01h hard reset
    02h power down
    03h power cycle
    08h timer interrupt
  Description: Our server systems support a BMC watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and must be enabled manually. It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).
  Next steps:
  1. If this event is being logged, it is because the BMC has been configured to check the watchdog timer. Make sure you have support for this in your OS (typically using a third-party IPMI-aware utility like ipmitool or ipmiutil, along with the openipmi driver).
  2. If this is the case, then it is likely your OS has hung, and you should investigate the OS event logs to determine what may have caused this.

13.2 SMI Timeout
SMI stands for system management interrupt, an interrupt that gets generated so the processor can service server management events (typically memory or PCI errors, or other forms of critical
60. ger Offset as described in Table 55 52 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards Memory subsystem Byte Field Description 7 4 H Domain Instance Type ED3 is set to Local this field specifies the mirroring domain local sub instances which channels are included in this sub instance 0000b Reserved 0001b Ch A Ch B 0010b Ch A Ch C 0011b Ch B Ch C 0100b 1110b Reserved If Domain Instance Type ED3 is set to Global this field specifies the 0 based Socket ID of the first participant processor in this mirroring domain global instance A value of 1111b indicates that this field is unused and does not contain valid data 3 0 If Domain Instance Type ED3 is set to Local this field specifies the sparing domain local sub instances which channels are included in this sub instance 0000b Reserved 0001b Ch A Ch B Ch C only configuration possible on Intel 5500 S5520 Server Boards 0010b 1110b Reserved If Domain Instance Type ED3 is set to Global this field specifies the 0 based Socket ID of the first participant processor in this sparing domain global instance A value of 1111b indicates that this field is unused and does not contain valid data 15 Event Data 2 7 Domain Instance Type Ob Local memory sparing domain instance This SEL pertains to a loc
61. hase of development Do not finalize a design with this information Revised information will be published when the product is available Verify with your local sales office that you have the latest datasheet before finalizing a design The product may contain design defects or errors known as errata which may cause the product to deviate from the published specifications Current characterized errata are available on request This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license The information in this manual is furnished for informational use only is subject to change without notice and should not be construed as a commitment by Intel Corporation Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document Except as permitted by such license no part of this document may be reproduced stored in a retrieval system or transmitted in any form or by any means without the express written consent of Intel Corporation Intel Pentium Itanium and Xeon are trademarks or registered trademarks of Intel Corporation Other brands and names may be claimed as the property of others Copyright Intel Corporation 2012 All rights reserved ii Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide fo
62. hassis Intrusion is monitored on supported chassis and the BMC logs corresponding events when the chassis lid is opened and closed 12 1 2 LAN Leash lost The LAN Leash lost sensor monitors the physical connection on the onboard network ports If a LAN Leash lost event is logged this means the network port lost its physical connection Table 72 Physical Security Sensor Typical Characteristics Byte Field Description 11 Sensor Type 05h Physical Security 12 Sensor Number 04h 7 Event direction 13 Event Direction and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type 6Fh Sensor Specific 7 6 00b Unspecified Event Data 2 14 Event Data 1 5 4 00b Unspecified Event Data 3 3 0 Event Trigger Offset as described in Table 73 78 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards Chassis subsystem Byte Field Description 15 Event Data 2 Not used 16 Event Data 3 Not used Table 73 Physical Security Sensor Event Trigger Offset Next Steps Event Trigger Offset Description Next Steps Hex Description 1 Use the Quick Start Guide and the Service Guide to determine whether the chassis intrusion switch is connected properly 00h chassis Somebody has opened ine NESS KTE Chassis 2 If this is the case make sure it makes proper contact when the cha
63. hat is part of Intel server boards and Intel server platforms serves as a vital part of the overall server management strategy. The server management hardware provides essential information to the system administrator and gives the administrator the ability to remotely control the server, even when the operating system is not running.

The Intel server boards and Intel server platforms offer comprehensive hardware- and software-based solutions. The server management features make the servers simple to manage and provide alerting on system events. From entry to enterprise systems, good overall server management is essential to reducing the total cost of ownership.

This Troubleshooting Guide is intended to help users better understand the events that are logged in the Baseboard Management Controller's (BMC) System Event Log (SEL) on these Intel server boards. A separate User's Guide covers general server management and the server management software offered on Intel server boards and Intel server platforms.

Server boards currently supported by this document:
  Intel S3200/X38ML server boards
  Intel S5500/S3420 series server boards

1.1 Purpose
The purpose of this document is to list all possible events generated by the Intel platform. Other sources not under our control may also generate events, which are not described in this document.

1.2 Industry Standard
1.2.1 I
64. he order in which the records are read. The numbers start at 1 for the 1st entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15, Shutdown Reason: Shutdown Reason code from the registry (LSB first): HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\ReasonCode
  Byte 16, Reserved: 00h

Table 93. Shutdown Comment OEM Event Record Typical Characteristics
  Byte 1-2, Record ID: ID used for SEL Record access
  Byte 3, Record Type: [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
  Byte 4-7, Timestamp: Time when event was logged. LS byte first.
  Byte 8-10, IPMI Manufacturer ID: 0137h (311d) = IANA enterprise number for Microsoft; 0157h (343d) = IANA enterprise number for Intel. The value logged will depend on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
  Byte 11, Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the 1st entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15, Shutdown Comment: Shutdown Comment from the registry (LSB first): HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\Comment
  Byte 16, Reserved: 00h
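As an illustration of the Microsoft Windows OEM record layout above, the following sketch unpacks a raw 16-byte record. It rests on several assumptions that are not confirmed by the excerpt: that the OEM payload occupies bytes 12-15 with byte 16 reserved, that the 3-byte Manufacturer ID is stored LS byte first (the usual IPMI convention), and that the Record ID is also LS byte first. The function name and dictionary keys are hypothetical.

```python
import struct

# IANA enterprise numbers listed in the table.
MANUFACTURERS = {0x000137: "Microsoft", 0x000157: "Intel"}


def parse_windows_oem_record(raw: bytes) -> dict:
    """Unpack a 16-byte OEM timestamped SEL record (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    # Bytes 1-2 Record ID, byte 3 Record Type, bytes 4-7 Timestamp.
    record_id, record_type, timestamp = struct.unpack_from("<HBI", raw, 0)
    manufacturer_id = int.from_bytes(raw[7:10], "little")  # bytes 8-10
    return {
        "record_id": record_id,
        "record_type": record_type,
        "timestamp": timestamp,
        "manufacturer": MANUFACTURERS.get(manufacturer_id, hex(manufacturer_id)),
        "sequence": raw[10],                                # byte 11
        # Assumption: bytes 12-15 carry the registry value, LSB first.
        "payload": int.from_bytes(raw[11:15], "little"),
    }


if __name__ == "__main__":
    sample = (struct.pack("<HBI", 1, 0xDD, 0x50000000)
              + (0x157).to_bytes(3, "little") + bytes([1])
              + (0x80020003).to_bytes(4, "little") + b"\x00")
    print(parse_windows_oem_record(sample))
```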
65. i b. Replace the board.
  06h Link bandwidth changed: Link bandwidth changed.

10.1.2 PCI Express Fatal Errors
When a PCI Express fatal error is reported to the BIOS SMI handler, it will record the error using the following format.

Table 65. PCI Express Fatal Error Sensor Typical Characteristics
  Byte 8-9, Generator ID: 0033h = BIOS SMI Handler
  Byte 11, Sensor Type: 13h = Critical Interrupt
  Byte 12, Sensor Number: 04h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 70h (OEM Specific)
  Byte 14, Event Data 1:
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Event Trigger Offset, as described in Table 66
  Byte 15, Event Data 2: PCI Bus number
  Byte 16, Event Data 3:
    [7:3] PCI Device number
    [2:0] PCI Function number

Table 66. PCI Express Fatal Error Sensor Event Trigger Offset Next Steps
  Event Trigger Offset (Hex) | Description | Description | Next Steps
  Indi
66. if possible 3 Replace power subsystem 06h Power Unit Failure Power subsystem experienced a Indicates a power supply failed failure 1 Remove and reapply AC power 2 If power supply still fails replace it 4 2 2 Power Unit Redundancy Sensor This sensor is enabled on systems that support redundant power supplies When a system has AC applied or if it loses redundancy of the power supplies a message will get logged into the SEL 24 Table 17 Power Unit Redundancy Sensors Typical Characteristics Byte Field Description 11 Sensor Type 09h Power Unit 12 Sensor Number 02h Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel SS500 S3420 series Server Boards Byte Field Description Event Direction and Event Type 7 Event direction Ob Assertion Event 1b Deassertion Event 6 0 Event Type OBh Generic Discrete 14 Event Data 1 7 6 00b Unspecified Event Data 2 5 4 00b Unspecified Event Data 3 3 0 Event Trigger Offset as described in Table 18 15 Event Data 2 Not used 16 Event Data 3 Not used Power Subsystems Table 18 Power Unit Redundancy Sensor Event Trigger Offset Next Steps Event Trigger Offset Description Next Steps Hex Description 00h fully redundant System is fully operational Informational Event Oth redu
67. in the manufacturing process This is an informational event only Table 79 System Event Log Cleared Sensor Typical Characteristics Byte Field Description 11 Sensor Type 10h Event Logging Disabled 12 Sensor Number 07h 7 Event direction 13 Event Direction and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type 6Fh Sensor Specific 7 6 00b Unspecified Event Data 2 5 4 00b Unspecified Event Data 3 3 0 Event Trigger Offset 2 Log area reset cleared 15 Event Data 2 Not used 14 Event Data 1 84 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 53420 series Server Boards Miscellaneous events Byte Field Description 16 Event Data 3 Not used 13 4 System Event PEF action The BMC is configurable to send alerts for events logged into the SEL These alerts are called Platform Event Filters PEF and are disabled by default The user must configure and enable this feature PEF events are logged if the BMC takes action due to a PEF configuration The BMC event triggering the PEF action will also be in the SEL This functionality is built into the BMC to allow it to send alerts SNMP or other for any event that gets logged to the SEL PEF filters are turned off by default and would have to be enabled manually using Intel deployment assistant Intel syscfg utility Intel or man
68. is sub instance 0000b Reserved 0001b Ch A Ch B Ch C only configuration possible on Intel 5500 S5520 Server Boards 0010b 1110b Reserved If Domain Instance Type ED3 is set to Global this field specifies the 0 based Socket ID of the first participant processor in this sparing domain global instance A value of 1111b indicates that this field is unused and does not contain valid data 15 Event Data 2 56 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards Memory subsystem Byte Field Description 7 Domain Instance Type Ob Local memory sparing domain instance This SEL pertains to a local memory sparing domain that is restricted to memory sparing pairs within a processor socket only 16 Event Data 3 1b Global memory sparing domain instance This SEL pertains to a global memory sparing domain that pertains to memory sparing between processor sockets 6 4 Reserved 3 0 0 based Instance ID of this sparing domain Table 59 Sparing Redundancy State Sensor Event Trigger Offset Next Steps Event Trigger Offset Description Next Steps Hex Description Memory is configured in Spare Channel Mode and the memory is System boots with spare Oih operating in the fully redundant channel mode active one Informational event state with the spare channel entry per processor i
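The Event Data 3 byte of the sparing sensors, as described above, carries the domain instance type in bit 7 and a 0-based instance ID in bits 3:0. A minimal sketch of decoding it follows; the function name is hypothetical.

```python
def decode_sparing_ed3(ed3: int) -> dict:
    """Decode Event Data 3 of a memory sparing SEL record."""
    return {
        # Bit 7: 0b = local (within one processor socket),
        #        1b = global (between processor sockets).
        "domain_instance_type": "global" if ed3 & 0x80 else "local",
        # Bits 6:4 are reserved; bits 3:0 hold the 0-based instance ID.
        "instance_id": ed3 & 0x0F,
    }


if __name__ == "__main__":
    print(decode_sparing_ed3(0x83))
```

The instance type matters when interpreting Event Data 2, because the meaning of its upper and lower nibbles differs between local and global domains.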
69. ision 1 0 Intel order number G7421 1 001 55 Memory subsystem System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards 9 1 4 Sparing Redundancy State Sensor This sensor provides the RAS Redundancy state for the Spare Channel Mode Table 58 Sparing Redundancy State Sensor Typical Characteristics Byte Field Description 8 9 Generator ID 0001h BIOS POST 11 Sensor Type Och Memory 12 Sensor Number 11h 7 Event direction 13 Event Direction and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type OBh Generic Discrete 7 6 10b OEM code in Event Data 2 14 Event Data 1 5 4 10b OEM code in Event Data 3 3 0 Event Trigger Offset as described in Table 59 7 4 If Domain Instance Type ED3 is set to Local this field specifies the 0 based Socket ID of the processor that contains the sparing domain local sub instances A value of 1110b indicates that the sparing configuration specified in Bits 8 0 applies globally to all sockets in the system If Domain Instance Type ED3 is set to Global this field specifies the 0 based Socket ID of the second participant processor in this sparing domain global instance A value of 1111b indicates that this field is unused and does not contain valid data 3 0 If Domain Instance Type ED3 is set to Local this field specifies the sparing domain local sub instances which channels are included in th
70. lane Temperature Sensor There is a thermal sensor on the Hot Swap Backplane to measure the ambient temperature Table 81 HSC Backplane Temperature Sensor Typical Characteristics Byte Field Description 8 9 Generator ID 00COh HSC Firmware HSBP A 00C2h HSC Firmware HSBP B 11 Sensor Type 01h Temperature 12 Sensor Number Oth Event Direction and Event Type 7 Event direction Ob Assertion Event 1b Deassertion Event 6 0 Event Type 01h Threshold Event Data 1 7 6 01b Trigger reading in Event Data 2 5 4 01b Trigger threshold in Event Data 3 3 0 Event Trigger Offset as described in Table 82 15 Event Data 2 Reading that triggered event 16 Event Data 3 Threshold value that triggered event 86 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 53420 series Server Boards Hot Swap Controller events Table 82 HSC Backplane Temperature Sensor Event Trigger Offset Next Steps Event Trigger Assertion Deassert Hex Description Severity Severity Description Next Steps 00h SE aea Degraded OK Dees has dropped PRO MOI non 1 Check for clear and unobstructed airflow into and out of i chassis 02h Lower critical AE Degraded The temperature has dropped below its lower critical 2 Ensure SDR is programmed and correct chassis has
71. logs state changes Expected power on events such as DC ON OFF are logged and unexpected events are also logged such as AC loss and power good loss Revision 1 0 Table 15 Power Unit Status Sensors Typical Characteristics Byte Field Description 11 Sensor Type 09h Power Unit 12 Sensor Number Oth 7 Event direction ua Event Direction and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type 6Fh Sensor Specific 7 6 00b Unspecified Event Data 2 14 Event Data 1 5 4 00b Unspecified Event Data 3 3 0 Sensor Specific offset as described in Table 9 Intel order number G7421 1 001 23 Power Subsystems System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards Byte Field Description 15 Event Data 2 Not used 16 Event Data 3 Not used Table 16 Power Unit Status Sensor Sensor Specific Offsets Next Steps Sensor Specific Offset Description Next Steps Hex Description 00h Power down System is powered down Informational Event 04h A C Lost AC removed Informational Event 05h Soft Power Control Generally means power good was lost This could be cause by the power supply subsystem or system components Failure in the system causing a shutdown 1 Verify all power cables and adapters are connected properly AC cables as well as the cables between PSU and system components 2 Cross test PSU
72. n Deassert SE Hex Description Severity Severity P 00h SE critica Degraded OK The voltage has dropped below its lower non critical threshold 02h eae non fatal Degraded The voltage has dropped below its lower critical threshold 07h AE critical Degraded OK The voltage has gone over its upper non critical threshold 09h ry ie non fatal Degraded The voltage has gone over its upper critical threshold Table 14 Voltage Sensors Next Steps Sensor Sens r Name Next Steps Number This 1 1V line is supplied by the main board This 1 1V line is used by the I O hub OH 10h BB 1 1V IOH 1 Ensure all cables are connected correctly 2 Ifthe issue remains replace the motherboard This 1 1V line is supplied by the main board This 1 1V line is used by processor 1 11h BB 1 1V P1 Vccp 1 Ensure all cables are connected correctly 2 Cross test processor if possible If the issue remains with the socket replace the main board otherwise the processor 12h BB 1 1V P2 Vccp This 1 1V line is supplied by the main board This 1 1V line is used by processor 2 1 Ensure all cables are connected correctly 2 Cross test processor if possible If the issue remains with the socket replace the main board otherwise the processor 20 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel S5500 53420 series Server Boards Power Subsystems E Sensor Name NEX
73. n and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type 74h OEM Discrete 7 6 10b OEM code in Event Data 2 14 Event Data 1 5 4 00b Unspecified Event Data 3 3 0 Event Trigger Offset Reserved 15 Event Data 2 0 3 CPU1 4 16 Event Data 3 Not used QPI Fatal and Fatal 2 Next Steps Processor subsystem This is an Informational event only Correctable errors are acceptable and normal at a low rate of occurrence If error continues 1 2 Revision 1 0 Check the processor is installed correctly Inspect the socket for bent pins 3 Cross test the processor if possible Intel order number G7421 1 001 49 Memory subsystem System Event Log Troubleshooting Guide for Intel S5500 S3420 series Server Boards 9 Memory subsystem Intel servers report memory errors status and configuration in the SEL 9 1 Memory RAS Mirroring and Sparing Memory RAS Configuration Status refers to the BIOS sending the current RAS mode and RAS operational state to the BMC to log into the SEL as a SEL record This allows a remote software application to query and retrieve the system memory state The memory configuration state sensors are virtual sensors In other words these sensors are owned and controlled completely by the BIOS independently of the BMC The RAS configuration and state definitions are aligned with the definitions within the Intelligent Platform Management Inte
74. n happen if the max min power consumption of the platform exceeds the values in policy due to hardware reconfiguration First occurrence of an unacknowledged event will be retransmitted no faster than every 300 milliseconds Real time clock synchronization failure alert is sent when NM is enabled and capable of limiting power but within 10 minutes the firmware cannot obtain valid calendar time from the host side so NM cannot handle suspend periods Next steps will depend on the policy that was set See the Node Manager Specification for more details 92 Intel order number G7421 1 001 Revision 1 0 System Event Log Troubleshooting Guide for Intel SS500 S3420 series Server Boards 15 3 Node Manager Operational Capabilities Change Manageability Engine ME events This message provides a run time error indication about Intel Intelligent Power Node Manager s operational capabilities This applies to all domains Assertion and deassertion of these events are supported Table 87 Node Manager Operational Capabilities Change Sensor Typical Characteristics Byte Field Description 8 9 Generator ID 002Ch ME Firmware 11 Sensor Type DCh OEM 12 Sensor Number 1Ah 7 Event direction 13 Event Direction and Ob Assertion Event Event Type 1b Deassertion Event 6 0 Event Type 74h OEM 7 6 00b Unspecified Event Data 2 5 4 00b Unspecified Event Data 3 3 0 Current state of O
75. nabling you to configure, monitor, and recover systems remotely.

1.2.2.1 System Event Log (SEL)
The BMC provides a centralized, non-volatile repository for critical, warning, and informational system events, called the System Event Log or SEL. Having the BMC manage the SEL and logging functions helps to ensure that post-mortem logging information is available should a failure occur that disables the system's processor(s).

The BMC allows access to the SEL through in-band and out-of-band mechanisms. Various tools and utilities can be used to access the SEL, including the Intel SELViewer and multiple open-source IPMI tools.

1.2.3 Intel Intelligent Power Node Manager version 1.5
Intel Intelligent Power Node Manager version 1.5 (NM) is a platform-resident technology that enforces power and thermal policies for the platform. These policies are applied by exploiting subsystem knobs (such as processor P and T states) that can be used to control power consumption. Intel Intelligent Power Node Manager enables data center power and thermal management by exposing an external interface to management software, through which platform policies can be specified. It also enables specific data center power management usage models, such as power limiting. The configuration and control commands
76. nactive and available

00h  Memory is configured in Spare Channel Mode and the memory has lost redundancy and is operating in the degraded state, with the spare channel active and used to replace a failed channel.
     Description: Spare channel replaces failing channel; one SEL entry for processor with failing memory to signify loss of redundancy.
     Next Steps: This event should be accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace affected DIMM).

9.2 ECC and Address Parity

1. Memory data errors are logged as correctable or uncorrectable.
2. Uncorrectable errors are fatal.
3. Memory addresses are protected with parity bits, and a parity error is logged. This is a fatal error.

9.2.1 Memory Correctable and Uncorrectable ECC Error

ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A Correctable ECC Error actually represents a threshold overflow: more Correctable Errors were detected at the memory controller level for a given DIMM within a given timeframe than the threshold allows. In both cases the error can be narrowed down to particular DIMM(s). The BIOS SMI error handler uses this information to log the data to the BMC SEL and identify the failing DIMM module.

Table 60. Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
77. ndancy lost
02h  redundancy degraded
03h  non-redundant: sufficient from redundant
04h  non-redundant: sufficient from insufficient
05h  non-redundant: insufficient
06h  non-redundant: degraded from fully redundant
07h  redundant: degraded from non-redundant

Description: System is not running in redundant power supply mode.
Next Steps: This event should be accompanied by specific power supply errors (AC lost, PSU failure, and so on). Troubleshoot these events accordingly.

4.3 Power Supply

The BMC monitors the power supply subsystem.

4.3.1 Power Supply Status Sensors

These sensors report the status of the power supplies in the system. An event can be logged when AC is first applied to or removed from a system. An event can also be logged on a failure, a predictive failure, or a configuration error.

Table 19. Power Supply Status Sensors Typical Characteristics

Byte    Field                             Description
11      Sensor Type                       08h = Power Supply
12      Sensor Number                     50h = Power Supply 1 Status
                                          51h = Power Supply 2 Status
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 6Fh = Sensor Specific
14      Event Data 1                      [7:6] 00b = Unspecified Event Data 2
                                          [5:4] 00b = Unspecified Event Data 3
                                          [3:0] Sensor Specific offset as described in Table 20
15      Event
78. nnel A, 01b = Channel B, 10b = Channel C, 11b = reserved
[2:0] DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value will be indeterminate and should be ignored if ED2 bit 3 is 0b.
      000b = DIMM Socket 1, 001b = DIMM Socket 2. All other values are reserved.

9.2.2.1 Memory Address Parity Error Sensor Next Steps

These are bit errors that are detected in the memory addressing hardware. An Address Parity Error implies that the memory address transmitted to the DIMM addressing circuitry has been compromised, and data read or written are compromised in turn. An Address Parity Error is logged as such in the SEL but in all other ways is treated the same as an Uncorrectable ECC Error. While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between socket and DIMM, or by bent pins in the processor socket.

1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins and, if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
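Step 1 above (decoding the DIMM location from the hex SEL bytes) can be sketched in Python. This is only an illustrative sketch, not part of the original guide: the function name `decode_dimm_location` is hypothetical, and it assumes the Event Data 3 layout given for the memory error sensors in this guide (bits 7:5 processor socket, bits 4:3 channel, bits 2:0 DIMM socket) together with the rule that the DIMM field is ignored when ED2 bit 3 is 0b.

```python
# Hypothetical helper: decode Event Data 3 of a memory error SEL record.
# Assumed layout (from the memory sensor tables in this guide):
#   bits 7:5 = processor socket, bits 4:3 = channel, bits 2:0 = DIMM socket.
CHANNELS = {0b00: "A", 0b01: "B", 0b10: "C"}  # 11b is reserved


def decode_dimm_location(event_data_3, event_data_2=None):
    """Return socket, channel and DIMM numbers; None marks reserved/invalid."""
    socket_bits = (event_data_3 >> 5) & 0b111
    channel_bits = (event_data_3 >> 3) & 0b11
    dimm_bits = event_data_3 & 0b111

    # Only 000b and 001b are defined for the socket and DIMM fields.
    socket = socket_bits + 1 if socket_bits in (0, 1) else None
    channel = CHANNELS.get(channel_bits)
    dimm = dimm_bits + 1 if dimm_bits in (0, 1) else None

    # For address parity events, the DIMM field is valid only if ED2 bit 3 is set.
    if event_data_2 is not None and not (event_data_2 & 0x08):
        dimm = None

    return {"processor_socket": socket, "channel": channel, "dimm_socket": dimm}


if __name__ == "__main__":
    print(decode_dimm_location(0x29))
```

Passing `event_data_2` is optional so that the same helper can be used for ECC records, where the validity bit is not described in this section.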
79. 7.1.1 Mirroring Configuration Status
7.1.2 Mirrored Redundancy State Sensor
7.1.3 Sparing Configuration Status
7.1.4 Sparing Redundancy State Sensor
7.2 ECC and Address Parity
7.2.1 Memory Correctable and Uncorrectable ECC Error
7.2.2 Memory Address Parity Error
8 PCI Express and Legacy PCI subsystem
8.1 PCI Express Errors
8.1.1 PCI Express …
8.1.2 PCI Express Fatal …
8.1.3 Legacy …
9 System BIOS events
9.1 …
9.1.1 …
9.1.2 Timestamp Clock Synchronization
9.2 System Firmware Progress (Formerly Post Error)
9.2.1 System Firmware Progress (Formerly Post Error) Next Steps
10 Chassis subsystem
10.1 Physical Security
10.1.1 Chassis Intrusion
10.1.2 LAN Leash lost
10.2 FP NMI Interrupt
10.2.1 FP NMI Interrupt Next Steps
10.3 Button Press Events
80. nspecified Event Data 3
                                          [3:0] Event Trigger Offset = 0
15      Event Data 2                      Not used
16      Event Data 3                      Not used

12.2.1 FP NMI Interrupt Next Steps

The purpose of this button is to diagnose software issues: when a critical interrupt is generated, the OS typically saves a memory dump. This allows for exact analysis of what is going on in system memory, which can be useful for software developers or for troubleshooting OS, software, and driver issues. If this button was not actually pressed, you should ensure there is no physical fault with the front panel. This event is only logged if a user pressed the NMI button and, although it causes the OS to crash, it is not an error.

12.3 Button Press Events

The BMC logs when the front panel power and reset buttons are pressed. This is purely for informational purposes, and these events do not indicate errors.

Table 75. Button Press Events Sensor Typical Characteristics

Byte    Field                             Description
11      Sensor Type                       14h = Button/Switch
12      Sensor Number                     09h
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 6Fh = Sensor Specific
                                          [7:6] 00b = Unspecified Event Data 2
                                          [5:4] 00b = Unspecified E
81. ntelligent Platform Management Interface (IPMI)

The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independent of the main processors, BIOS, and operating system. Platform management functions can also be made available when the system is in a powered-down state. IPMI works by interfacing with the BMC, which extends management capabilities in the server system and operates independent of the main processor by monitoring the on-board instrumentation. Through the BMC, IPMI also allows administrators to control power to the server and remotely access BIOS configuration and operating system console information.

IPMI defines a common platform instrumentation interface to enable interoperability between:
- The baseboard management controller and chassis
- The baseboard management controller and systems management software
- Between servers

IPMI enables the following:
- Common access to platform management information, consisting of:
  - Local access from systems management software
  - Remote access from LAN
  - Inter-chassis access from Intelligent Chassis Management Bus
  - Access from LAN, serial/modem, IPMB, PCI SMBus, or ICMB, available even if the processor is down
- IPMI interface
82. oltage rail and report the current usage as a percentage of the maximum power output for that rail.

Table 23. Power Supply Current Output Sensors Typical Characteristics

Byte    Field                             Description
11      Sensor Type                       03h = Current
12      Sensor Number                     54h = Power Supply 1 Current Output
                                          55h = Power Supply 2 Current Output
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 01h = Threshold
14      Event Data 1                      [7:6] 01b = Trigger reading in Event Data 2
                                          [5:4] 01b = Trigger threshold in Event Data 3
                                          [3:0] Event Trigger Offset as described in Table 24
15      Event Data 2                      Reading that triggered event
16      Event Data 3                      Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and for deassertion.

Table 24. Power Supply Current Output Sensor Event Trigger Offset Next Steps

Hex     Description                       Assertion Severity    Deassert Severity
07h     Upper non-critical going high     Degraded              OK

Description: If you see this event, the system is using too much power on the output for the PSU rating. P
83. on
13      Event Direction and Event Type    0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 0Bh = Generic Discrete
14      Event Data 1                      [7:6] 00b = Unspecified Event Data 2
                                          [5:4] 00b = Unspecified Event Data 3
                                          [3:0] Event Trigger Offset as described in Table 32
15      Event Data 2                      Not used
16      Event Data 3                      Not used

The following table describes the severity of each of the event triggers for both assertion and for deassertion.

Table 32. Fan Redundancy Sensor Event Trigger Offset Next Steps

Hex     Description
00h     fully redundant
01h     redundancy lost
02h     redundancy degraded
03h     non-redundant: sufficient from redundant
04h     non-redundant: sufficient from insufficient
05h     non-redundant: insufficient

Description: System has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.

Next Steps: Fan redundancy loss indicates failure of one or more fans. Look for lower non-critical fan errors or fan removal errors in the SEL to indicate which fan is causing the problem, and follow the troubleshooting steps for these event types.

Description (05h): System has lost fans and may no longer be able to cool itself adequately. Overheating m
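The redundancy offsets listed above lend themselves to a simple lookup when scanning raw SEL records. The following is a minimal sketch under stated assumptions: the names `REDUNDANCY_STATES` and `describe_redundancy` are hypothetical, offsets 00h to 05h come from Table 32, and offsets 06h and 07h are taken from the Power Unit Redundancy listing elsewhere in this guide, which uses the same Generic Discrete (0Bh) event type.

```python
# Hypothetical lookup for Generic Discrete (0Bh) redundancy event offsets.
REDUNDANCY_STATES = {
    0x00: "fully redundant",
    0x01: "redundancy lost",
    0x02: "redundancy degraded",
    0x03: "non-redundant: sufficient from redundant",
    0x04: "non-redundant: sufficient from insufficient",
    0x05: "non-redundant: insufficient",
    0x06: "non-redundant: degraded from fully redundant",
    0x07: "redundant: degraded from non-redundant",
}


def describe_redundancy(event_data_1):
    """Map the Event Trigger Offset (Event Data 1 bits 3:0) to its description."""
    offset = event_data_1 & 0x0F
    return REDUNDANCY_STATES.get(offset, "unknown offset %02Xh" % offset)


if __name__ == "__main__":
    for ed1 in (0x00, 0x01, 0x05):
        print("%02Xh -> %s" % (ed1, describe_redundancy(ed1)))
```

Masking with `0x0F` keeps the helper usable even when the upper bits of Event Data 1 carry the Event Data 2/3 qualifiers.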
84. A5A4  PCI Express IBIST error
A6A0  DXE boot services driver: Not enough memory available to shadow a legacy option ROM. Minor
B6A3  DXE boot services driver: Unrecognized. Major

12 Chassis subsystem

The BMC monitors several aspects of the chassis. In addition to logging when the power and reset buttons are pressed, the BMC monitors chassis intrusion (if a chassis intrusion switch is included in the chassis) and the network connections, logging an event whenever the physical network link is lost.

12.1 Physical Security

Two sensors are included in the physical security subsystem: chassis intrusion and LAN leash lost.

12.1.1 Chassis Intrusion

C
85. on    Deassert
Hex     Description                       Severity              Severity
07h     Upper non-critical going high     Degraded              OK
09h     Upper critical going high         non-fatal             Degraded

Description: An upper non-critical or critical temperature threshold has been crossed.

Next Steps:
1. Check for clear and unobstructed airflow into and out of the chassis.
2. Ensure the SDR is programmed and the correct chassis has been selected.
3. Ensure there are no fan failures.
4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

5 Cooling subsystem

5.1 Fan sensors

There are three types of fan sensors that can be present on Intel server systems: speed, presence, and redundancy. The last two are only present in systems with hot-swap redundant fans.

5.1.1 Fan Speed Sensors

Fan speed sensors monitor the RPM signal on the relevant fan headers on the platform. Fan speed sensors are threshold-based sensors. Usually they only have lower critical thresholds set, so that a SEL entry is only generated should the fan spin too slowly.

Table 27. Fan Speed Sensors Typical Characteristics

Byte    Field                             Description
11      Sensor Type                       04h = Fan
12      Sensor Number                     30h to 39h (chassis specific)
13      Event Directi
86. on and Event Type                     [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 01h = Threshold
14      Event Data 1                      [7:6] 01b = Trigger reading in Event Data 2
                                          [5:4] 01b = Trigger threshold in Event Data 3
                                          [3:0] Event Trigger Offset as described in Table 28
15      Event Data 2                      Reading that triggered event
16      Event Data 3                      Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and for deassertion.

Table 28. Fan Speed Sensor Event Trigger Offset Next Steps

Hex     Description                       Assertion Severity    Deassert Severity
00h     Lower non-critical going low      Degraded              OK

Description: The fan speed has dropped below its lower non-critical threshold.

Next Steps: A fan speed error on a new system build is typically not caused by the fan spinning too slowly; instead it is caused by the fan being connected to the wrong header (the BMC expects them on certain headers for each chassis and will log this event if there is no fan on that header).
1. Refer to the Quick Start Guide or the Service Guide to identify the correct fan headers to use.
2. Ensure the latest FRUSDR update has been run and the correct chassis was detected or selected.
3. If yo
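The bit fields in bytes 13 and 14 shown in the "Typical Characteristics" tables can be unpacked with a few shifts and masks. The sketch below is illustrative only: the function name `decode_event_bytes` and the dictionary keys are hypothetical, while the bit positions follow the tables in this guide (byte 13 bit 7 is the event direction and bits 6:0 the event type; Event Data 1 bits 7:6 and 5:4 qualify Event Data 2 and 3, and bits 3:0 hold the Event Trigger Offset).

```python
# Hypothetical helper: split SEL bytes 13 and 14 into their bit fields.
def decode_event_bytes(byte13, event_data_1):
    """Decode Event Direction/Type (byte 13) and Event Data 1 (byte 14)."""
    return {
        "deassertion": bool(byte13 & 0x80),        # bit 7: 0b assert, 1b deassert
        "event_type": byte13 & 0x7F,               # bits 6:0, e.g. 01h = Threshold
        "ed2_code": (event_data_1 >> 6) & 0b11,    # bits 7:6, meaning of Event Data 2
        "ed3_code": (event_data_1 >> 4) & 0b11,    # bits 5:4, meaning of Event Data 3
        "trigger_offset": event_data_1 & 0x0F,     # bits 3:0, Event Trigger Offset
    }


if __name__ == "__main__":
    # Example: threshold event, assertion, offset 00h with reading and threshold present.
    print(decode_event_bytes(0x01, 0x50))
```

The returned `trigger_offset` is the value to look up in the per-sensor "Event Trigger Offset Next Steps" tables.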
87. or Type                               0Bh = Other Units
12      Sensor Number                     52h = Power Supply 1 AC Power Input
                                          53h = Power Supply 2 AC Power Input
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 01h = Threshold
14      Event Data 1                      [7:6] 01b = Trigger reading in Event Data 2
                                          [5:4] 01b = Trigger threshold in Event Data 3
                                          [3:0] Event Trigger Offset as described in Table 22
15      Event Data 2                      Reading that triggered event
16      Event Data 3                      Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and for deassertion.

Table 22. Power Supply AC Power Input Sensor Event Trigger Offset Next Steps

Hex     Description                       Assertion Severity    Deassert Severity
07h     Upper non-critical going high     Degraded
09h     Upper critical going high         non-fatal             Degraded

Description: If you see this event, the system is pulling too much power on the input for the PSU rating.

Next Steps:
1. Verify the power budget is within the specified range.
2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.

4.3.3 Power Supply Current Output Sensors

PMBus-compliant power supplies may monitor the current output of the main 12v v
88. 15.4.1 Node Manager Alert Threshold Exceeded Next Steps

The first occurrence of an unacknowledged event will be retransmitted no faster than every 300 milliseconds. The first occurrence of a Threshold exceeded event assertion/deassertion will be retransmitted no faster than every 300 milliseconds. Next steps will depend on the policy that was set. See the Node Manager Specification for more details.

16 Microsoft Windows Records

With Microsoft Windows Server 2003 R2 and later versions, an Intelligent Platform Management Interface (IPMI) driver was added. This added the capability of logging some OS events to the SEL. The driver can write multiple records to the SEL for the following events:
- Boot up
- Shutdown
- Bug Check / Blue Screen

16.1 Boot up Event Records

When the system boots into the Microsoft Windows OS, two events can be logged. The first is a boot up record and the second is an OEM event. These are informational-only records.

Table 89. Boot up Event Record Typical Characteristics

Byte    Field                             Description
        Generator ID                      0041h = System Software with an ID = 20h
11      Sensor Type                       1Fh = OS Boot
12      Sensor Number                     00h
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion
89. pabilities Change
13.3.1 Node Manager Operational Capabilities Change Next Steps
13.4 Node Manager Alert Threshold Exceeded
13.4.1 Node Manager Alert Threshold Exceeded Next Steps
14 Microsoft Windows Records
14.1 Boot up Event Records
14.2 Shutdown Event Records
14.3 Bug Check / Blue Screen Event Records
15 Linux Kernel Panic Records

List of Tables

Table 1. SEL Record Format
Table 2. Event Request Message Event Data Field Contents
Table 3. OEM SEL Record (Type C0h-DFh)
Table 4. OEM SEL Record (Type E0h-FFh)
Table 5. BMC owned Sensors
Table 6. BIOS POST owned Sensors
Table 7. BIOS SMI owned Sensors
Table 8. Hot Swap Controller Firmware owned Sensors
Table 9. Management Engine Firmware owned Sensors
Table 10. Microsoft OS owned Events
Table 11. Linux Kern
90. perational Capabilities. Bit pattern (byte 14, Event Data 1):
        [0] Policy interface capability: 0 = Not Available, 1 = Available
        [1] Monitoring capability: 0 = Not Available, 1 = Available
        [2] Power limiting capability: 0 = Not Available, 1 = Available
15      Event Data 2                      Not used
16      Event Data 3                      Not used

15.3.1 Node Manager Operational Capabilities Change Next Steps

Policy Interface available indicates that Intel Intelligent Power Node Manager is able to respond to the external interface about querying and setting Intel Intelligent Power Node Manager policies. This is generally available as soon as the microcontroller is initialized.

Monitoring Interface available indicates that Intel Intelligent Power Node Manager has the capability to monitor power and temperature. This is generally available when firmware is operational.

Power limiting interface available indicates that Intel Intelligent Power Node Manager can do power limiting and is indicative of an ACPI-compliant OS loaded (unless the OEM has indicated support for a non-ACPI-compliant OS).

The current value of a not-acknowledged capability sensor will be retransmitted no faster than every 300 milliseconds. Next steps will depend on the policy that was set. See the No
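The capability bit pattern above can be decoded with simple bit tests. This is a minimal sketch rather than anything defined by the guide or the Node Manager Specification: the function name `decode_nm_capabilities` and the dictionary keys are hypothetical, and only bits 0 to 2 of Event Data 1 are interpreted, as described in the table.

```python
# Hypothetical helper: decode Node Manager operational capabilities
# from Event Data 1 (byte 14) of a sensor 1Ah event.
def decode_nm_capabilities(event_data_1):
    """Return which Node Manager capabilities are reported as available."""
    state = event_data_1 & 0x0F  # bits 3:0 hold the current capability state
    return {
        "policy_interface": bool(state & 0x01),  # bit 0
        "monitoring": bool(state & 0x02),        # bit 1
        "power_limiting": bool(state & 0x04),    # bit 2
    }


if __name__ == "__main__":
    # Example: policy interface and monitoring available, power limiting not.
    print(decode_nm_capabilities(0x03))
```

A record reporting monitoring without power limiting would, per the text above, be consistent with firmware being operational before an ACPI-compliant OS has loaded.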
91. Revision History

August 2012    Initial draft

Revision 1.0, Intel order number G7421 1 001

Table of Contents

1 Introduction
1.1 Purpose
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
1.2.2 Baseboard Management Controller (BMC)
1.2.3 Intel Intelligent Power Node Manager version 1.5
2 Basic decoding of a SEL Record
2.1 Default values in the SEL records
3 Sensor Cross Reference List
3.1 BMC owned Sensors (GID = 0020h)
3.2 BIOS POST owned Sensors (GID = 0001h)
3.3 BIOS SMI owned Sensors (GID = 0033h)
3.4 Hot Swap Controller Firmware owned Sensors (GID = 00C0h, 00C2h)
3.5 Node Manager / ME Firmware owned Sensors (GID = 002Ch)
3.6 Microsoft OS owned Events (GID = 0041h)
3.7 Linux Kernel Panic Events (GID = 0021h)
4 Power Subsystems
4.1 Voltage Sensors
4.2 Power Unit
4.2.1 Power Unit Status Sensor
4.2.2 Power Unit Redundancy Sensor
4.3
92. racteristics
HSC Drive Presence Sensor Typical Characteristics
Node Manager Exception Sensor Typical Characteristics
Node Manager Health Event Sensor Typical Characteristics
Node Manager Operational Capabilities Change Sensor Typical Characteristics
Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
Boot up Event Record Typical Characteristics
Boot up OEM Event Record Typical Characteristics
Shutdown Reason Code Event Record Typical Characteristics
Shutdown Reason OEM Event Record Typical Characteristics
Shutdown Comment OEM Event Record Typical Characteristics
Bug Check / Blue Screen OS Stop Event Record Typical Characteristics
Bug Check / Blue Screen code OEM Event Record Typical Characteristics
Linux Kernel Panic Event Record Characteristics
Linux Kernel Panic String Extended Record Characteristics

1 Introduction

The server management hardware t
93. rface Specification Version 2.0. Accordingly, these sensors are read as Status and Redundancy sensors (Event/Reading Type 0x09 and 0x0B respectively):

Sensor Number 12h, Event Type 0x09: Mirroring Configuration Status
Sensor Number 01h, Event Type 0x0B: Mirroring Redundancy State
Sensor Number 13h, Event Type 0x09: Sparing Configuration Status
Sensor Number 11h, Event Type 0x0B: Sparing Redundancy State

9.1.1 Mirroring Configuration Status

This sensor provides the Mirroring mode RAS configuration status.

Table 52. Mirroring Configuration Status Sensor Typical Characteristics

Byte    Field                             Description
        Generator ID                      0001h = BIOS POST
11      Sensor Type                       0Ch = Memory
12      Sensor Number                     12h
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 09h = Digital Discrete
14      Event Data 1                      [7:6] 10b = OEM code in Event Data 2
                                          [5:4] 00b = Unspecified Event Data 3
                                          [3:0] Event Trigger Offset as described in Table 53
15      Event Data 2                      Not used
16      Event Data 3                      Not used

Table 53. Mirroring Configuration Status Sensor Event Trigger Offset Next Steps

Event Trigger Offset    Description    Next Steps
Hex     De
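The sensor number and event type pairs listed at the start of this section can be kept in a small lookup table when filtering SEL records for memory RAS events. The sketch below is an assumption-laden illustration: `MEMORY_RAS_SENSORS` and `identify_memory_ras_sensor` are hypothetical names, and the caller is assumed to have already checked that the record comes from Generator ID 0001h (BIOS POST) with Sensor Type 0Ch (Memory).

```python
# Hypothetical lookup: sensor number -> (expected event type, sensor name),
# using the four memory RAS sensors listed in this section.
MEMORY_RAS_SENSORS = {
    0x12: (0x09, "Mirroring Configuration Status"),
    0x01: (0x0B, "Mirroring Redundancy State"),
    0x13: (0x09, "Sparing Configuration Status"),
    0x11: (0x0B, "Sparing Redundancy State"),
}


def identify_memory_ras_sensor(sensor_number, byte13):
    """Return the sensor name, or None if number and event type do not match."""
    event_type = byte13 & 0x7F  # strip the assertion/deassertion bit (bit 7)
    entry = MEMORY_RAS_SENSORS.get(sensor_number)
    if entry is None or entry[0] != event_type:
        return None
    return entry[1]


if __name__ == "__main__":
    print(identify_memory_ras_sensor(0x12, 0x09))
```

Checking the event type as well as the sensor number guards against misreading a record from a different sensor that happens to reuse the same number under another generator.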
94. s 12V is used by the serial port and by PCI cards. In addition, it is used to generate various processor voltages.
    Sensor 1Ch, BB 12.0V. Next Steps:
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards; try other slots.
    3. If the issue follows the card, swap it; otherwise replace the main board.
    4. If the issue remains, replace the power supplies.

1Dh  BB 1.35 P1 Mem
    This 1.35V line is supplied by the main board. It is used by low-voltage memory on processor 1.
    1. Ensure all cables are connected correctly.
    2. Check the DIMMs are seated properly.
    3. Cross-test DIMMs.
    4. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.

1Eh  BB 1.35 P2 Mem
    This 1.35V line is supplied by the main board. It is used by low-voltage memory on processor 2.
    1. Ensure all cables are connected correctly.
    2. Check the DIMMs are seated properly.
    3. Cross-test DIMMs.
    4. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.

4.2 Power Unit

The power unit monitors the power state of the system and logs the state changes in the SEL.

4.2.1 Power Unit Status Sensor

The power unit status sensor monitors the power state of the system and
95. scription

01h  The system has been configured into Mirrored Channel RAS Mode.
     Description: User enabled mirrored channel mode in setup.

00h  The system has been configured out of Mirrored Channel RAS Mode.
     Description: Mirrored channel mode is disabled, either in setup or due to unavailability of memory at POST, in which case POST error 8500 is also logged.
     Next Steps:
     1. If this event is accompanied by a POST error 8500, there was a problem applying the mirroring configuration to the memory. Check for other errors related to the memory and troubleshoot accordingly.
     2. If there is no POST error, then mirror mode was simply disabled in BIOS setup and this should be considered informational only.

9.1.2 Mirrored Redundancy State Sensor

This sensor provides the RAS Redundancy state for the Memory Mirrored Channel Mode.

Table 54. Mirrored Redundancy State Sensor Typical Characteristics

Byte    Field                             Description
8, 9    Generator ID                      0001h = BIOS POST
11      Sensor Type                       0Ch = Memory
12      Sensor Number                     01h
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 0Bh = Generic Discrete
14      Event Data 1                      [7:6] 10b = OEM code in Event Data 2
                                          [5:4] 10b = OEM code in Event Data 3
                                          [3:0] Event Trig
96. sence Sensor Next Steps
0Ch     Drive Slot 4 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Dh     Drive Slot 5 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps

8 Slot HSBP

Sensor Number    Sensor Name    Details Section    Next Steps
08h     Drive Slot 6 Status      HSC Drive Slot Status Sensor    HSC Drive Slot Status Sensor Next Steps
09h     Drive Slot 7 Status      HSC Drive Slot Status Sensor    HSC Drive Slot Status Sensor Next Steps
0Ah     Drive Slot 0 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Bh     Drive Slot 1 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Ch     Drive Slot 2 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Dh     Drive Slot 3 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Eh     Drive Slot 4 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
0Fh     Drive Slot 5 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
10h     Drive Slot 6 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
11h     Drive Slot 7 Presence    HSC Drive Presence Sensor       HSC Drive Presence Sensor Next Steps
Re
97. sor Status Sensors Next Steps; Processor Status Sensor

Sensor Number    Sensor Name    Details Section    Next Steps
61h     Processor 2 Status (P2 Status)                  Processor Status Sensor              Table 45, Processor Status Sensors Next Steps
62h     Processor 1 Thermal Margin (P1 Therm Margin)    Thermal Margin Sensors               Table 38, Thermal Margin Sensors Next Steps
63h     Processor 2 Thermal Margin (P2 Therm Margin)    Thermal Margin Sensors               Table 38, Thermal Margin Sensors Next Steps
64h     Processor 1 Thermal Control (P1 Therm Ctrl)     Processor Thermal Control Sensors    Table 41, Processor Thermal Control Sensors Next Steps
65h     Processor 2 Thermal Control (P2 Therm Ctrl)     Processor Thermal Control Sensors    Table 41, Processor Thermal Control Sensors Next Steps
66h     Processor 1 VRD Temp (P1 VRD Hot)               Discrete Thermal Sensors             Table 43, Discrete Thermal Sensors
67h     Processor 2 VRD Temp (P2 VRD Hot)               Discrete Thermal Sensors             Table 43, Discrete Thermal Sensors
68h     Catastrophic Error (CATERR)                     Catastrophic Error Sensor            Catastrophic Error Sensor Next Steps
69h     CPU Missing (CPU Missing)                       CPU Missing Sensor                   CPU Missing Sensor Next Steps
6Ah     IOH Thermal Trip (IOH Thermal Trip)             Discrete Thermal Sensors             Table 43, Discrete Thermal Sensors
98. ssis is closed intrusion intrusion sensor is not connected
    3. If this is also the case, someone has opened the chassis. Ensure nobody has access to the system that shouldn't.

04h  LAN leash lost
     Description: Someone has unplugged a LAN cable that was present when the BMC initialized. This event gets logged when the electrical connection on the NIC connector gets lost.
     Next Steps: This is most likely due to unplugging the cable but could also happen if there is an issue with the cable or switch.
     1. Check the LAN cable and connector for issues.
     2. Investigate switch logs where possible.
     3. Ensure nobody has access to the server that shouldn't.

12.2 FP NMI Interrupt

The front panel interrupt button (also referred to as the NMI button) is a recessed button on the front panel that allows the user to force a critical interrupt, which causes a crash error or kernel panic.

Table 74. FP NMI Interrupt Sensor Typical Characteristics

Byte    Field                             Description
11      Sensor Type                       13h = Critical Interrupt
12      Sensor Number                     05h
13      Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                          [6:0] Event Type: 6Fh = Sensor Specific
14      Event Data 1                      [7:6] 00b = Unspecified Event Data 2
                                          [5:4] 00b = U
99. stics; Not applicable
DCh; Not applicable; Table 90, Boot up OEM Event Record Typical Characteristics

Shutdown Event:
02h; 20h = OS Stop/Shutdown; Table 91, Shutdown Reason Code Event Record Typical Characteristics; Not applicable
Table 92, Shutdown Reason OEM Event Record Typical Characteristics
Not applicable; Table 93, Shutdown Comment OEM Event Record Typical Characteristics

Bug Check / Blue Screen:
02h; 20h = OS Stop/Shutdown; Table 94, Bug Check / Blue Screen OS Stop Event Record Typical Characteristics; Not applicable
DEh; Not applicable; Table 95, Bug Check / Blue Screen code OEM Event Record Typical Characteristics

3.7 Linux Kernel Panic Events (GID = 0021h)

The following table can be used to find the details of records that can be generated when there is a Linux kernel panic.

Table 11. Linux Kernel Panic Events

Sensor Name           Record Type    Sensor Type               Details Section                                                           Next Steps
Linux Kernel Panic    02h            20h = OS Stop/Shutdown    Table 96, Linux Kernel Panic Event Record Characteristics                 Not applicable
                      F0h            Not applicable            Table 97, Linux Kernel Panic String Extended Record Characteristics

4 Power Sub
100. systems
The BMC monitors the power subsystem, including power supplies, select onboard voltages, and related sensors.
4.1 Voltage Sensors
The BMC monitors the main voltage sources in the system, including the baseboard, memory, and processors, using IPMI-compliant analog threshold sensors.
Note: A voltage error could be caused by the device supplying the voltage or by the device using the voltage. For each sensor it will be noted who is supplying the voltage and who is using it.
Table 12: Voltage Sensors Typical Characteristics
  Byte 11, Sensor Type: 02h = Voltage
  Byte 12, Sensor Number: See Table 14
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 01h = Threshold
  Byte 14, Event Data 1:
    [7:6] 01b = Trigger reading in Event Data 2
    [5:4] 01b = Trigger threshold in Event Data 3
    [3:0] Event Triggers as described in Table 13
  Byte 15, Event Data 2: Reading that triggered event
  Byte 16, Event Data 3: Threshold value that triggered event
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 13: Voltage Sensors Event Triggers Description. Event Trigger Assertio
101. ta 2: Not used
  Byte 16, Event Data 3: Not used
Table 45: Processor Status Sensors Next Steps
Sensors 60h (P1 Status) and 61h (P2 Status):
  01h, Thermal trip: The processor exceeded the maximum temperature. This event normally only happens due to failures of the thermal solution.
    1. Verify the heatsink is properly attached and has thermal grease.
    2. If the system has a heatsink fan, ensure the fan is spinning.
    3. Check that all system fans are operating properly.
    4. Check that the air used to cool the system is within limits (typically 35 °C).
  07h, State Asserted: Indicates the processor is present.
8.2 Catastrophic Error Sensor
When the Catastrophic Error signal (CATERR#) stays asserted, it is a sign that something serious has gone wrong in the hardware. The BMC monitors this signal and reports when it stays asserted.
Table 46: Catastrophic Error Sensor Typical Characteristics. Byte, Field, Description
102. tem
Processor Thermal Control sensors report the percentage of time that the processor is throttling its performance due to thermal issues. If this is not addressed, the processor could overheat and shut down the system to protect itself from damage.
Table 39: Processor Thermal Control Sensors Typical Characteristics
  Byte 11, Sensor Type: 01h = Temperature
  Byte 12, Sensor Number: See Table 41
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 01h = Threshold
  Byte 14, Event Data 1:
    [7:6] 01b = Trigger reading in Event Data 2
    [5:4] 01b = Trigger threshold in Event Data 3
    [3:0] Event Triggers as described in Table 40
  Byte 15, Event Data 2: Reading that triggered event
  Byte 16, Event Data 3: Threshold value that triggered event
Table 40: Processor Thermal Control Sensors Event Triggers Description
  07h, Upper non-critical going high (assertion severity: Degraded; deassertion severity: OK): The thermal margin has gone over its upper non-critical threshold.
  09h, Upper critical going high (assertion severity: Non-fatal; deassertion severity: Degraded): The thermal margin has gone over its upper critical threshold.
Table 41: Processor Thermal Control Sensors Next Steps
103. test the processor if possible.
8.4.3 QPI Fatal and Fatal #2
The system detected a QPI fatal or non-recoverable error. This is a fatal error.
Table 50: QPI Fatal Error Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
  Byte 11, Sensor Type: 13h = Critical Interrupt
  Byte 12, Sensor Number: 17h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 74h = OEM Discrete
  Byte 14, Event Data 1:
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: Reserved
  Byte 15, Event Data 2: 0 to 3 = CPU1 to CPU4
  Byte 16, Event Data 3: Not used
The QPI Fatal #2 Error is a continuation of the QPI Fatal Error.
Table 51: QPI Fatal #2 Error Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
  Byte 11, Sensor Type: 13h = Critical Interrupt
  Byte 12, Sensor Number: 18h
8.4.3.1
  Byte 13, Event Directio
104. u are sure this was done, the event may be a sign of impending fan failure (although this would normally only apply if the system has been in use for a while). Replace the fan.
  02h, Lower critical going low (assertion severity: Non-fatal; deassertion severity: Degraded): The fan speed has dropped below its lower critical threshold.
5.1.2 Fan Presence and Redundancy Sensors
Fan presence sensors are only implemented for hot-swap fans and require an additional pin on the fan header. Fan redundancy is an aggregate of the fan presence sensors and will warn when redundancy is lost. Typically, the redundancy mode on Intel servers is n+1 redundancy (if one fan fails, there are still sufficient fans to cool the system, but it is no longer redundant), although other modes are also possible.
Table 29: Fan Presence Sensors Typical Characteristics
  Byte 11, Sensor Type: 04h = Fan
  Byte 12, Sensor Number: 40h to 45h (chassis specific)
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 08h = Generic digital Discrete
  Byte 14, Event Data 1:
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset as described in Table 30
  Byte 15, Event Data 2: Not used
  Byte 16, E
105. ually
Table 80: System Event PEF Action Sensor Typical Characteristics
  Byte 11, Sensor Type: 12h = System Event
  Byte 12, Sensor Number: 08h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14, Event Data 1:
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 4 = PEF Action
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
13.4.1 System Event PEF Action Next Steps
This event gets logged if the BMC takes an action due to PEF configuration. Actions can be sending an alert, or resetting, power cycling, or powering down the system. There will be another event that has led to the action, so you should investigate the SEL and PEF settings to identify this event and troubleshoot accordingly.
14 Hot Swap Controller events
The Hot Swap Controller (HSC) implements the same basic sensor model that is utilized by the other management controllers in the system. Sensor model information is contained in the document Intelligent Platform Management Interface Specification. A common set of IPMI commands is used for configuring the sensors and returning threshold status.
14.1 HSC Backp
106. ured policy
  Error reading power data
  Error reading inlet temperature
Table 86: Node Manager Health Event Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 002Ch = ME Firmware
  Byte 11, Sensor Type: DCh = OEM
  Byte 12, Sensor Number: 19h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 73h = OEM
  Byte 14, Event Data 1:
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Health Event Type: 02h = Sensor Node Manager
  Byte 15, Event Data 2:
    [7:4] Error type:
      0 to 9 = Reserved
      10 = Policy Misconfiguration
      11 = Power Sensor Reading Failure
      12 = Inlet Temperature Reading Failure
      13 = Host Communication error
      14 = Real-time clock synchronization failure
      15 = Platform shutdown initiated by NM policy due to execution of action defined by Policy Exception Action
    [3:0] Domain Id. Currently supports only one domain, Domain 0.
  Byte 16, Event Data 3:
    If Error type = 10 or 15: <Policy Id>
    If Error type = 11: <Power Sensor Address>
    If Error type = 12: <Inlet Sensor Address>
    Otherwise set to 0.
15.2.1 Node Manager Health Event Next Steps
Misconfigured policy ca
107. vent Data 3
  Byte 14, Event Data 1:
    [3:0] Event Trigger Offset: 0h = Power Button, 2h = Reset Button
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
13 Miscellaneous events
The miscellaneous events section addresses sensors not easily grouped with other sensor types.
13.1 IPMI Watchdog
EPSD server systems support an IPMI watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).
Table 76: IPMI Watchdog Sensor Typical Characteristics
  Byte 11, Sensor Type: 23h = Watchdog 2
  Byte 12, Sensor Number: 03h
  Byte 13, Event Direction and Event Type:
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14, Event Data 1:
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset as described in Table 77
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
108. vent Data 3: Not used
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 30: Fan Presence Sensors Event Trigger Offset Next Steps
  01h, Device Present (assertion severity: OK; deassertion severity: Degraded)
    Assertion: A fan was inserted. This event may also get logged when the BMC initializes when AC is applied.
      Next steps: Informational only.
    Deassertion: A fan was removed or was not present at the expected location when the BMC initialized.
      Next steps: These events only get generated in systems with hot-swappable fans, and normally only when a fan is physically inserted or removed. If fans were not physically removed:
      1. Use the Quick Start Guide to check if the right fan headers were used.
      2. Swap the fans round to see if the problem stays with the location or follows the fan.
      3. Replace the fan or the fan wiring/housing, depending on the outcome of step 2.
      4. Ensure the latest FRUSDR update has been run and the correct chassis was detected or selected.
Table 31: Fan Redundancy Sensors Typical Characteristics
  Byte 11, Sensor Type: 04h = Fan
  Byte 12, Sensor Number: 46h
  Byte 13: [7] Event directi
109. vent Trigger Offset Next Steps
  Mirrored Redundancy State Sensor Typical Characteristics
  Mirrored Redundancy State Sensor Event Trigger Offset Next Steps
  Sparing Configuration Status Sensor Typical Characteristics
  Sparing Configuration Status Sensor Event Trigger Offset Next Steps
  Sparing Redundancy State Sensor Typical Characteristics
  Sparing Redundancy State Sensor Event Trigger Offset Next Steps
  Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
  Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps
  Address Parity Error Sensor Typical Characteristics
  PCI Express Correctable Error Sensor Typical Characteristics
  PCI Express Correctable Error Sensor Event Trigger Offset Next Steps
  PCI Express Fatal Error Sensor Typical Characteristics
  PCI Express Fatal Error Sensor Event Trigger Offset Next Steps
  Legacy PCI Error Sensor Typical Characteristics
  Legacy PCI Error Sensor Event Trigger Offset Next Steps
  System Event Sensor Typical Characteristics
  POST Error Sensor Typical Characteristics
110. Microsoft Windows Records
  Bytes 4-7, Timestamp: Time when the event was logged, LS byte first.
  Bytes 8-10, IPMI Manufacturer ID: 0137h (311) = IANA enterprise number for Microsoft; 0157h (343) = IANA enterprise number for Intel. The value logged will depend on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
  Byte 11, Sequence Number: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Bytes 12-15, Bug Check/Blue Screen Data: The first record of this type will contain the Bug Check/Blue Screen Stop code and will be followed by the four Bug Check/Blue Screen parameters, LSB first. Note that each of the Bug Check/Blue Screen parameters requires two records. Both records for each parameter will have the same Record ID. There will be a total of 9 records.
  Byte 16, Operating system type: 01 = 64-bit OS
17 Linux Kernel Panic Records
The OpenIPMI driver supports the ability to put semi-custom and custom events in the system event log if a panic occurs. If you enable the Generate a panic
111. Sensor Cross Reference List
3.5 Node Manager/ME Firmware owned Sensors (GID 002Ch)
The following table can be used to find the details of sensors owned by the Node Manager/Management Engine (ME) firmware.
Table 9: Management Engine Firmware owned Sensors (columns: Sensor Number, Sensor Name, Details Section, Next Steps)
  18h, Node Manager Exception Events: details in Node Manager Exception Event; next steps in Node Manager Exception Event Next Steps.
  19h, Node Manager Health Events: details in Node Manager Health Event; next steps in Node Manager Health Event Next Steps.
  1Ah, Node Manager Operational Capabilities Change: details in Node Manager Operational Capabilities Change; next steps in Node Manager Operational Capabilities Change Next Steps.
  1Bh, Node Manager Alert Threshold Exceeded: details in Node Manager Alert Threshold Exceeded; next steps in Node Manager Alert Threshold Exceeded Next Steps.
3.6 Microsoft OS owned Events (GID 0041)
The following table can be used to find the details of records that are owned by the Microsoft Operating System (OS).
Table 10: Microsoft OS owned Events (columns: Record Type, Sensor Type, Sensor Name, Details Section, Next Steps)
  Record type 02h, sensor type 1Fh (OS Boot): Table 89, Boot up Event Record Typical Characteri
112. when power is off and by the main board when power is on. 3.3V Vbat is used by the CMOS and related circuits.
  Sensor 18h, BB 3.3V Vbat:
    1. Replace the CMOS battery. Any battery of type CR2032 can be used.
    2. If the error remains (unlikely), replace the board.
  Sensor 19h, BB 5.0V: 5.0V is supplied by the power supplies. 5.0V is used by the PCI slots.
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards; try other slots.
    3. If the issue follows the card, swap it; otherwise replace the main board.
    4. If the issue remains, replace the power supplies.
  Sensor 1Ah, BB 5.0V STBY: 5.0V STBY is supplied by the power supplies. 5.0V STBY is used to generate other standby voltages.
    1. Ensure all cables are connected correctly.
    2. If the issue remains, replace the board.
    3. If the issue remains, replace the power supplies.
  Sensor 1Bh, BB 12.0V: 12V is supplied by the power supplies. 12V is used by SATA drives, fans, and PCI cards. In addition, it is used to generate various processor voltages.
    1. Ensure all cables are connected correctly.
    2. Check connections on fans and HDDs.
    3. If the issue follows the component, swap it; otherwise replace the board.
    4. If the issue remains, replace the power supplies.
  12V is supplied by the power supplie
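
The "Typical Characteristics" tables in the excerpts above all split byte 13 and Event Data 1 (byte 14) into the same bit fields, and the Node Manager Health Event table packs an error type and a domain ID into Event Data 2. As a rough illustration only, the sketch below unpacks those fields in Python. The function and class names are hypothetical and not part of the guide or of any IPMI tool, and the input is assumed to be the raw byte values already read from a SEL record.

```python
from typing import NamedTuple, Tuple


class EventFields(NamedTuple):
    """Bit fields of byte 13 and Event Data 1 (hypothetical helper type)."""
    deassertion: bool    # byte 13, bit 7: 0b = assertion, 1b = deassertion
    event_type: int      # byte 13, bits 6:0 (e.g. 01h Threshold, 6Fh Sensor Specific)
    data2_code: int      # Event Data 1, bits 7:6 (what Event Data 2 carries)
    data3_code: int      # Event Data 1, bits 5:4 (what Event Data 3 carries)
    trigger_offset: int  # Event Data 1, bits 3:0 (event trigger offset)


def decode_event_fields(byte13: int, event_data1: int) -> EventFields:
    """Split byte 13 and Event Data 1 into the fields used by the tables."""
    return EventFields(
        deassertion=bool(byte13 & 0x80),
        event_type=byte13 & 0x7F,
        data2_code=(event_data1 >> 6) & 0x03,
        data3_code=(event_data1 >> 4) & 0x03,
        trigger_offset=event_data1 & 0x0F,
    )


# Error types from the Node Manager Health Event table; 0 to 9 are reserved.
# The text for error type 15 is shortened here.
NM_HEALTH_ERROR_TYPES = {
    10: "Policy Misconfiguration",
    11: "Power Sensor Reading Failure",
    12: "Inlet Temperature Reading Failure",
    13: "Host Communication error",
    14: "Real-time clock synchronization failure",
    15: "Platform shutdown initiated by NM policy",
}


def decode_nm_health_data2(event_data2: int) -> Tuple[str, int]:
    """Return (error type description, domain ID) from Event Data 2."""
    error_type = (event_data2 >> 4) & 0x0F
    domain_id = event_data2 & 0x0F
    return NM_HEALTH_ERROR_TYPES.get(error_type, "Reserved"), domain_id


if __name__ == "__main__":
    # Threshold assertion with trigger offset 07h, reading and threshold present.
    print(decode_event_fields(0x01, 0x57))
    # Node Manager health event: error type 11, domain 0.
    print(decode_nm_health_data2(0xB0))
```

With these helpers, a voltage or thermal threshold event can be matched against the event trigger tables by its trigger_offset value, and a Node Manager health event by its error type.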
