
System Event Log Troubleshooting Guide for EPSD


Contents

1. Byte  Field         Description
         [3:0] Event Trigger Offset: 0Ah = Critical over temperature
   15    Event Data 2  Not used
   16    Event Data 3  Not used

   5.2.6.1 DIMM Thermal Trip Sensors Next Steps
   Check for clear and unobstructed airflow into and out of the chassis.
   Ensure the SDR is programmed and the correct chassis has been selected.
   Ensure there are no fan failures.
   Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

   5.3 System Air Flow Monitoring Sensor
   The BMC provides an IPMI sensor to report the volumetric system airflow in CFM (cubic feet per minute). The airflow in CFM is calculated based on the system fan PWM values. The specific Pulse Width Modulation (PWM) or PWMs used to determine the CFM is SDR configurable. The relationship between PWM and CFM is based on a lookup table in an OEM SDR. The airflow data is used in the calculation for exit air temperature monitoring. It is exposed as an IPMI sensor to allow a data center management application to access this data for use in rack-level thermal management. This sensor is informational only and will not log events into the SEL.

   Intel order number G90620-002, Revision 1.1. System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600/2600/2400/1600/1400 Product Families.

   6 Processor Subsystem
   Intel servers report multiple processor
2. Table 23 Power Supply Power In Sensor Event Trigger Offset Next Steps
   Table 24 Power Supply Current Out Sensors Typical Characteristics
   Table 25 Power Supply Current Out Sensor Event Trigger Offset Next Steps
   Table 26 Power Supply Temperature Sensors Typical Characteristics
   Table 27 Power Supply Temperature Sensor Event Trigger Offset Next Steps
   Table 28 Power Supply Fan Tachometer Sensors Typical Characteristics
   Table 29 Fan Tachometer Sensors Typical Characteristics
   Table 30 Fan Tachometer Sensor Event Trigger Offset Next Steps
   Table 31 Fan Presence Sensors Typical Characteristics
   Table 32 Fan Presence Sensors Event Trigger Offset Next Steps
   Table 33 Fan Redundancy Sensors Typical Characteristics
   Table 34 Fan Redundancy Sensor Event Trigger Offset Next Steps
   Table 35 Temperature Sensors Typical Characteristics
   Table 36 Temperature Sensors Event Triggers Description
   Table 37 Temperature Sensors Next Steps
   Table 38 Thermal Margin Sensors Typical Character
3. Table 7 BIOS SMI Handler owned Sensors
   Table 8 Management Engine Firmware Owned Sensors
   Table 9 Microsoft OS Owned Events
   Table 10 Linux Kernel Panic Events
   Table 11 Threshold based Voltage Sensors Typical Characteristics
   Table 12 Threshold based Voltage Sensors Event Triggers Description
   Table 13 Threshold based Voltage Sensors Next Steps
   Table 14 Voltage Regulator Watchdog Timer Sensor Typical Characteristics
   Table 15 Power Unit Status Sensors Typical Characteristics
   Table 16 Power Unit Status Sensor Sensor Specific Offsets Next Steps
   Table 17 Power Unit Redundancy Sensors Typical Characteristics
   Table 18 Power Unit Redundancy Sensor Event Trigger Offset Next Steps
   Table 19 Node Auto Shutdown Sensor Typical Characteristics
   Table 20 Power Supply Status Sensors Typical Characteristics
   Table 21 Power Supply Status Sensor Sensor Specific Offsets Next Steps
   Table 22 Power Supply Power In Sensors Typical Characteristics
4. Byte  Field                             Description
   8-9   Generator ID                      0001h = BIOS POST
   11    Sensor Type                       0Ch = Memory
   12    Sensor Number                     02h
   13    Event Direction and Event Type    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                           [6:0] Event Type: 09h (digital Discrete)
   14    Event Data 1                      [7:6] 10b = OEM code in Event Data 2
                                           [5:4] 10b = OEM code in Event Data 3
                                           [3:0] Event Trigger Offset as described in Table 59
   15    Event Data 2                      RAS Configuration Error Type
                                           [7:4] Reserved
                                           [3:0] Configuration Error: 0 = None, 3 = Invalid DIMM Configuration for RAS Mode. All other values are reserved.
   16    Event Data 3                      RAS Mode Configured
                                           [7:4] Reserved
                                           [3:0] RAS Mode: 0h = None (Independent Channel Mode), 1h = Mirroring Mode, 2h = Lockstep Mode, 4h = Rank Sparing Mode

   Table 59 Memory RAS Configuration Status Sensor Event Trigger Offset Next Steps
   Event Trigger Offset (Hex / Description), Description, Next Steps
   01h  RAS configuration enabled. User enabled mirrored channel mode in setup. Next Steps: Informational event only.
   00h  RAS configuration disabled. Mirrored channel mode is disabled either in se
        Next Steps: 1. If this event is accompanied by a post error 8500 there was a problem
5. Table 58 Memory RAS Configuration Status Sensor Typical ae 02h Memory RAS Configuration Status Memory RAS Configuration Status Characteristics System Firmware Progress Formerly Post 06h POST Error Syston Firmware Progress Formerny Post System Firmware Progress Formerly Post Error Next Steps Error On S 09h Intel Quick Path Interface Link QPI Link Width Reduced Sensor QPI Link Width Reduced Sensor Next Steps Width Reduced meg eo Pe 12h Memory RAS Mode Select Memory RAS Mode Select Not applicable 83h System Event System Events Not applicable 3 3 BIOS SMI Handler owned Sensors GID 0033h The following table can be used to find the details of sensors owned by BIOS SMI Handler Table 7 BIOS SMI Handler owned Sensors Sensor Sensor Name Details Section Next Steps Number Oth Mirroring Redundancy State Mirroring Redundancy State Mirroring Redundancy State Sensor Next Steps 02h Memory ECC Error Memory Correctable and Uncorrectable Table 64 Correctable and Uncorrectable ECC Error Sensor y ECC Error Event Trigger Offset Next Steps 03h Legacy PCI Error Legacy PCI Errors Legacy PCI Error Sensor Next Steps 04h PCI Express Fatal Error PCI Express Fatal Errors and Fatal Error 2 PCI Express Fatal Error and Fatal Error 2 Sensor Next express ata US and Tata Error e Steps 05h PCI Express Correctable Error PCI Express Correctable Errors PCI Expres
6. IPMI v1.0 [ER]
   Byte  Field                   Description
   11    Sensor Type [ST]        Sensor Type Code for sensor that generated the event.
   12    Sensor Number [SN]      Number of sensor that generated the event (From SDR).
   13    Event Dir | Event Type [EDIR]
                                 [7] Event Dir: 0b = Assertion event, 1b = Deassertion event
                                 [6:0] Event Type: Type of trigger for the event, for example critical threshold going high, state asserted, and so on. Also indicates class of the event, for example discrete, threshold, or OEM. The Event Type field is encoded using the Event Reading Type Code.
                                 Event Type Codes:
                                   01h = Threshold (States 0x00-0x0b)
                                   02h-0Ch = Discrete
                                   6Fh = Sensor Specific
                                   70h-7Fh = OEM
   14    Event Data 1 [ED1]      Per Table 2
   15    Event Data 2 [ED2]
   16    Event Data 3 [ED3]

   Table 2 Event Request Message Event Data Field Contents
   Sensor Class  Event Data
   Threshold     Event Data 1 [7:6]: 00b = Unspecified Event Data 2, 01b = Trigger reading in Event Data 2, 10b = OEM code in Event
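The byte 13 layout in the excerpt above (bit 7 for direction, bits 6:0 for the Event Type code) can be sketched in a few lines of Python. This is only an illustration: the function name `decode_event_dir_type` and the label "Unlisted" for codes outside the ranges quoted in the excerpt are assumptions, not part of the guide.

```python
def decode_event_dir_type(byte13: int) -> dict:
    """Split SEL byte 13 into event direction and Event Type class.

    The ranges below are the ones listed in the excerpt; anything else
    is reported as "Unlisted" (a label invented for this sketch).
    """
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"  # bit 7
    event_type = byte13 & 0x7F                                   # bits 6:0
    if event_type == 0x01:
        event_class = "Threshold"
    elif 0x02 <= event_type <= 0x0C:
        event_class = "Discrete"
    elif event_type == 0x6F:
        event_class = "Sensor Specific"
    elif 0x70 <= event_type <= 0x7F:
        event_class = "OEM"
    else:
        event_class = "Unlisted"
    return {"direction": direction, "event_type": event_type, "class": event_class}


if __name__ == "__main__":
    # 6Fh: assertion of a sensor-specific event
    print(decode_event_dir_type(0x6F))
    # 81h: deassertion of a threshold event
    print(decode_event_dir_type(0x81))
```

Masking with 0x80 and 0x7F keeps the two fields independent, so the same helper works for any sensor whose byte 13 follows this layout.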
7. If you have replaced a hard drive this is expected Rebuild Remap S e o g 07h in progress If you have a hot spare and one of the drives failed this is expected Check logs for which drive has failed If this is seen unexpectedly it could be an indication of a drive that is close to failing 12 3 Hot Swap Controller Health Sensor The BMC supports an IPMI sensor to indicate the health of the Hot Swap Controller HSC This sensor will indicate that the controller is offline for the cases that the BMC either cannot communicate with it or it is stuck in a degraded state so that the BMC cannot restore it to full operation through a firmware update Revision 1 1 Table 91 HSC Health Sensor Typical Characteristics Byte Field Description 11 Sensor Type 16h Microcontroller 69h Hot Swap Controller 1 Status 12 Sensor Number 6Ah Hot Swap Controller 2 Status 6Bh Hot Swap Controller 3 Status 13 Event Direction and 7 Event direction Event Type Ob Assertion Event 1b Deassertion Event 6 0 Event Type OAh Discrete 14 Event Data 1 7 6 00b Unspecified Event Data 2 5 4 00b Unspecified Event Data 3 3 0 Event Trigger Offset 4h Transition to offline Intel order number G90620 002 113 Hot Swap Controller Backplane Events System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families
8. MIC Status Sensors Intel Xeon Phi Coprocessor MIC Status Sensors Next Steps Processor 1 DIMM Aggregate BOh Thermal Margin 1 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P1 DIMM Thrm Mrgn1 Processor 1 DIMM Aggregate Bih Thermal Margin 2 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P1 DIMM Thrm Mrgn2 Processor 2 DIMM Aggregate B2h Thermal Margin 1 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P2 DIMM Thrm Mrgn1 Processor 2 DIMM Aggregate B3h Thermal Margin 2 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P2 DIMM Thrm Mrgn2 Processor 3 DIMM Aggregate B4h Thermal Margin 1 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P3 DIMM Thrm Mrgn1 Processor 3 DIMM Aggregate B5h Thermal Margin 2 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P3 DIMM Thrm Mrgn2 Processor 4 DIMM Aggregate B6h Thermal Margin 1 Thermal Margin Sensors Table 40 Thermal Margin Sensors Next Steps P4 DIMM Thrm Mrgn1 20 Intel order number G90620 002 Revision 1 1 System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families Sensor Cross Reference List Sensor Sensor Name Details Section Next Steps Num
9. Field                            Description
   Sensor Number                    05h
   Event Direction and Event Type   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                    [6:0] Event Type: 71h (OEM Specific)
   Event Data 1                     [7:6] 10b = OEM code in Event Data 2
                                    [5:4] 10b = OEM code in Event Data 3
                                    [3:0] Event Trigger Offset:
                                      0h = Receiver Error
                                      1h = Bad DLLP
                                      2h = Bad TLP
                                      3h = Replay Num Rollover
                                      4h = Replay Timer timeout
                                      5h = Advisory Non-fatal
                                      6h = Link BW Changed
                                      7h = Correctable Internal
                                      8h = Header Log Overflow
                                      Fh = Unspecified Non-AER Correctable Error
   15 Event Data 2                  PCI Bus number
   16 Event Data 3                  [7:3] PCI Device number
                                    [2:0] PCI Function number

   8.1.3.1 PCI Express Correctable Error Sensor Next Steps
   This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
   1. Decode the bus, device, and function to identify the card.
   2. If this is an add-in card:
      a. Verify the card is inserted properly.
      b. Install the card in another slot and check whether the error follows the card or stays with the slot.
      c. Update all firmware and drivers, including non-Intel components.
   3. If this is an on-board device:
      a. Update all
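As a rough sketch of the "decode the bus, device, and function" step in the excerpt above, the following Python pulls the three values out of Event Data 2 and Event Data 3 using the bit positions quoted there. The function name, the dictionary name, and the sample byte values are illustrative assumptions.

```python
# Event Trigger Offsets for the PCI Express Correctable Error sensor,
# copied from the excerpt above.
CORRECTABLE_OFFSETS = {
    0x0: "Receiver Error",
    0x1: "Bad DLLP",
    0x2: "Bad TLP",
    0x3: "Replay Num Rollover",
    0x4: "Replay Timer timeout",
    0x5: "Advisory Non-fatal",
    0x6: "Link BW Changed",
    0x7: "Correctable Internal",
    0x8: "Header Log Overflow",
    0xF: "Unspecified Non-AER Correctable Error",
}


def decode_pcie_correctable(ed1: int, ed2: int, ed3: int) -> dict:
    """Decode Event Data 1-3 of a PCIe correctable error SEL record."""
    offset = ed1 & 0x0F                # ED1 bits 3:0, Event Trigger Offset
    return {
        "error": CORRECTABLE_OFFSETS.get(offset, "Unlisted offset"),
        "bus": ed2,                    # ED2, PCI Bus number
        "device": (ed3 >> 3) & 0x1F,   # ED3 bits 7:3, PCI Device number
        "function": ed3 & 0x07,        # ED3 bits 2:0, PCI Function number
    }


if __name__ == "__main__":
    # Hypothetical bytes: receiver error on bus 0, device 2, function 2.
    print(decode_pcie_correctable(0xA0, 0x00, 0x12))
```

The resulting bus, device, and function triple can then be matched against the device listing in the operating system to identify the card.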
10. Error Code  Error Message                                                         Response
    84FF        System event log full                                                 Minor
    8500        Memory component could not be configured in the selected RAS mode     Major
    8501        DIMM Population Error                                                 Major
    8520        DIMM_A1 failed test initialization                                    Major
    8521        DIMM_A2 failed test initialization                                    Major
    8522        DIMM_A3 failed test initialization                                    Major
    8523        DIMM_B1 failed test initialization                                    Major
    8524        DIMM_B2 failed test initialization                                    Major
    8525        DIMM_B3 failed test initialization                                    Major
    8526        DIMM_C1 failed test initialization                                    Major
    8527        DIMM_C2 failed test initialization                                    Major
    8528        DIMM_C3 failed test initialization                                    Major
    8529        DIMM_D1 failed test initialization                                    Major
    852A        DIMM_D2 failed test initialization                                    Major
    852B        DIMM_D3 failed test initialization                                    Major
    852C        DIMM_E1 failed test initialization                                    Major
    852D        DIMM_E2 failed test initialization                                    Major
    852E        DIMM_E3 failed test initialization                                    Major
    852F        DIMM_F1 failed test initialization                                    Major
    8530        DIMM_F2 failed test initialization                                    Major
    8531        DIMM_F3 failed test initialization                                    Major
    8532        DIMM_G1 failed test initialization                                    Major
    8533        DIMM_G2 failed test initialization                                    Major
    8534        DIMM_G3 failed test initialization                                    Ma
11. 85DA  DIMM_R1 disabled                                                    Major
    85DB  DIMM_R2 disabled                                                    Major
    85DC  DIMM_R3 disabled                                                    Major
    85DD  DIMM_T1 disabled                                                    Major
    85DE  DIMM_T2 disabled                                                    Major
    85DF  DIMM_T3 disabled                                                    Major
    85E0  DIMM_L3 encountered a Serial Presence Detection (SPD) failure       Major
    85E1  DIMM_M1 encountered a Serial Presence Detection (SPD) failure       Major
    85E2  DIMM_M2 encountered a Serial Presence Detection (SPD) failure       Major
    85E3  DIMM_M3 encountered a Serial Presence Detection (SPD) failure       Major
    85E4  DIMM_N1 encountered a Serial Presence Detection (SPD) failure       Major
    85E5  DIMM_N2 encountered a Serial Presence Detection (SPD) failure       Major
    85E6  DIMM_N3 encountered a Serial Presence Detection (SPD) failure       Major
    85E7  DIMM_P1 encountered a Serial Presence Detection (SPD) failure       Major
    85E8  DIMM_P2 encountered a Serial Presence Detection (SPD) failure       Major
    85E9  DIMM_P3 encountered a Serial Presence Detection (SPD) failure       Major
    85EA  DIMM_R1 encountered a Serial Presence Detection (SPD) failure       Major
    85EB  DIMM_R2 encountered a Serial Presence Detection (SPD) failure       Major
    85EC  DIMM_R3 encountered a Serial Presence Detection (SPD) failure       Major
    85ED  DIMM_T1 encountered a Serial Presence Detection (SPD) failure       Major
    85EE  DIMM_T2 encountered a Serial Presence Detection (SPD) failure       Major
    85EF  DIMM_T3 encountered a Serial Presence Detection (SPD) failure       Major
    8604  POST Reclaim of non-critical NVRAM variables                        Minor
12. Catastrophic Error Sensor Typical Characteristics Byte Field Description 11 Sensor Type 07h Processor 12 Sensor Number 80h 13 Event Direction and 7 Event direction Event Type Ob Assertion Event 1b Deassertion Event 6 0 Event Type 03h Digital Discrete 14 Event Data 1 7 6 10b OEM code in Event Data 2 5 4 10b OEM code in Event Data 3 3 0 Event Trigger Offset 1h State Asserted 15 Event Data 2 Event Data 2 values as described in Table 50 16 Event Data 3 Bitmap of the CPU that causes the system CATERR 0 CPU1 1 CPU2 2 CPU3 3 CPU4 Note If more than one bit is set the BMC cannot determine the source of the CATERR Table 50 Catastrophic Error Sensor Event Data 2 Values Next Steps ED2 Description Next Steps 1 h Pag ee Cross test the processors 2 Replace the processors depending on the results of the test Revision 1 1 Intel order number G90620 002 61 Processor Subsystem System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families ED2 Description Next Steps This error is typically caused by other platform components 1 Check for other errors near the time of the CATERR event Oih CATERR 2 Verify all peripherals are plugged in and operating correctly particularly Hard Drives Optical Drives and I O 3 Update system f
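The Event Data 3 bitmap described in the excerpt above (bit 0 = CPU1 through bit 3 = CPU4, with more than one bit set meaning the BMC cannot determine the source) can be unpacked as follows. This is a minimal sketch; the function name `caterr_cpus` and its return shape are assumptions made for illustration.

```python
def caterr_cpus(ed3: int) -> tuple:
    """Return (list of flagged CPUs, source_undetermined) for CATERR Event Data 3.

    Bits 0..3 map to CPU1..CPU4. Per the note in the excerpt, if more than
    one bit is set the BMC cannot determine the source of the CATERR.
    """
    cpus = [f"CPU{bit + 1}" for bit in range(4) if ed3 & (1 << bit)]
    source_undetermined = len(cpus) > 1
    return cpus, source_undetermined


if __name__ == "__main__":
    print(caterr_cpus(0x02))  # only bit 1 set
    print(caterr_cpus(0x05))  # bits 0 and 2 set
```

When the second element is true, the list shows which bits were set but should not be read as identifying a single faulty processor.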
13. RELATING TO SALE AND OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE MERCHANTABILITY OR INFRINGEMENT OF ANY PATENT COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT A Mission Critical Application is any application in which failure of the Intel Product could result directly or indirectly in personal injury or death SHOULD YOU PURCHASE OR USE INTEL S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES SUBCONTRACTORS AND AFFILIATES AND THE DIRECTORS OFFICERS AND EMPLOYEES OF EACH HARMLESS AGAINST ALL CLAIMS COSTS DAMAGES AND EXPENSES AND REASONABLE ATTORNEYS FEES ARISING OUT OF DIRECTLY OR INDIRECTLY ANY CLAIM OF PRODUCT LIABILITY PERSONAL INJURY OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN MANUFACTURE OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS Intel may make changes to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them The information here is subject to change without notice Do not finalize a design with this information T
14. Revision 1 1 Intel order number G90620 002 v Table of Contents System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1 600 1 400 Product Families 9 2 1 System Firmware Progress Formerly Post Error Next Steps ceeeeeeeeees 89 TO C assis Subsystem 2 6522 eevee sectesvasedoced sae sac ennec seen saan aaae ay seugseseiecs eves saccacetecseues Eoee eenseseaee 97 10 1 Physical Security EE 97 10 11 GAASSIS ng e EE 97 101 2 LAN Ce sh Feiere eege gue 97 10 2 FP NMI Interrupt cote colent depen EE ege Seege 98 10 2 1 PP NMI Interrupt Next Steps enge cc decercesacetsedeneedeacadateadeta vehi EES penitence 99 10 3 e RE 100 11 Miscellaneous E 101 11 1 BIN Een OO EE 101 11 2 MOM MIMIC EE 102 11 2 1 SMI Timeout Next Steps wees s ckescissdedesa deca apeldetd edees nate aslcaiadk auesiendeeedeadeasteeee 103 11 3 System Event Log Cleared a 4 atv ea eee ie ace ace Gs Ba es eds see 103 11 4 System Event PEP ACHON DE 104 11 4 1 System Event PEF Action Next Steps EEN 104 11 5 BMC Watchdog EE TEE 105 11 5 1 BMC Watchdog Sensor Next Gens 105 11 6 BMC FW Health BEE 106 11 6 1 BMC FW Health Sensor Next Giepns AAA 106 11 7 Firmware Update Status Gensor EEN 107 11 8 Add In Module Presence Gensor ANNE 108 11 8 1 Add In Module Presence Next Giepe AEN 108 11 9 Intel Xeon Phi Coprocessor Management Sensors cccccesesssessssssesssesensees
15. 001Fh
    RT   Record Type                   02h = system event record
    TS   Timestamp                     4F8D70C3h
    GID  Generator ID                  0033h = BIOS SMI Handler
    ER   Event Message Revision        04 = IPMI v2.0
    ST   Sensor Type                   12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
    SN   Sensor Number                 83h
    EDIR Event Direction/Event Type    6Fh: [7] = 0, Assertion Event; [6:0] = 6Fh, Sensor specific
    ED1  Event Data 1                  05h = Timestamp Clock Synchronization
    ED2  Event Data 2                  00h = First in pair

    RID 20 00 RT 02 TS C4 70 8D 4F GID 33 00 ER 04 ST 12 SN 83 EDIR 6F ED1 05 ED2 80 ED3 FF

    RID  Record ID                     0020h
    RT   Record Type                   02h = system event record
    TS   Timestamp                     4F8D70C4h
    GID  Generator ID                  0033h = BIOS SMI Handler
    ER   Event Message Revision        04 = IPMI v2.0
    ST   Sensor Type                   12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
    SN   Sensor Number                 83h
    EDIR Event Direction/Event Type    6Fh: [7] = 0, Assertion Event; [6:0] = 6Fh, Sensor specific
    ED1  Event Data 1                  05h = Timestamp Clock Synchronization
    ED2  Event Data 2                  80h = Second in pair

    2.2.2 Example of Decoding a PCI Express Correctable Error Event
    The following is an example of decoding a PCI Express correctable error event. This particular event recorded a receiver error on Bus 0, Device 2, Function 2. Note that correctable errors are acceptable and normal at a low rate of occurrence.
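The field-by-field decoding shown above can be reproduced with a short Python sketch that unpacks the 16 raw bytes of a SEL record. Multi-byte fields are little-endian, so the raw timestamp bytes C4 70 8D 4F yield 4F8D70C4h, matching the decoded value in the excerpt. The function name `parse_sel_record` and the dictionary keys are assumptions, and rendering the timestamp as UTC is an assumption about how the BMC clock was set.

```python
import struct
from datetime import datetime, timezone


def parse_sel_record(hex_bytes: str) -> dict:
    """Unpack a 16-byte system event record given as space-separated hex."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    # RID(2) RT(1) TS(4) GID(2) ER ST SN EDIR ED1 ED2 ED3, little-endian
    rid, rt, ts, gid, er, st, sn, edir, ed1, ed2, ed3 = struct.unpack(
        "<HBIHBBBBBBB", raw
    )
    return {
        "RID": rid,
        "RT": rt,
        "TS": ts,
        # Seconds since 1970-01-01; shown as UTC by assumption.
        "TS_utc": datetime.fromtimestamp(ts, tz=timezone.utc),
        "GID": gid,
        "ER": er,
        "ST": st,
        "SN": sn,
        "assertion": not (edir & 0x80),   # EDIR bit 7: 0 = assertion
        "event_type": edir & 0x7F,        # EDIR bits 6:0
        "ED1": ed1,
        "ED2": ed2,
        "ED3": ed3,
    }


if __name__ == "__main__":
    rec = parse_sel_record("20 00 02 C4 70 8D 4F 33 00 04 12 83 6F 05 80 FF")
    for key, value in rec.items():
        print(key, hex(value) if isinstance(value, int) and key != "assertion" else value)
```

Keeping the raw integers in the result lets the per-sensor tables elsewhere in the guide be applied to ED1 through ED3 afterwards.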
16. 010b = DIMM Socket 3
    All other values are reserved.

    7.5.2.1 Memory Address Parity Error Sensor Next Steps
    These are bit errors that are detected in the memory addressing hardware. An Address Parity Error implies that the memory address transmitted to the DIMM addressing circuitry has been compromised, and data read or written is compromised in turn. An Address Parity Error is logged as such in the SEL but in all other ways is treated the same as an Uncorrectable ECC Error. While the error may be due to a failing DRAM chip on the DIMM, it can also be caused by incorrect seating or improper contact between the socket and DIMM, or by bent pins in the processor socket.
    1. If needed, decode the DIMM location from the hex version of the SEL.
    2. Verify the DIMM is seated properly.
    3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
    4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
    5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
17. going high
    1. Verify the power budget is within the specified range.
    2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.

    4.4.4 Power Supply Temperature Sensors
    The BMC monitors one or two power supply temperature sensors for each installed PMBus-compliant power supply.

    Table 26 Power Supply Temperature Sensors Typical Characteristics
    Byte  Field                            Description
    11    Sensor Type                      01h = Temperature
    12    Sensor Number                    5Ch = Power Supply 1 Temperature
                                           5Dh = Power Supply 2 Temperature
    13    Event Direction and Event Type   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                           [6:0] Event Type: 01h (Threshold)
    14    Event Data 1                     [7:6] 01b = Trigger reading in Event Data 2
                                           [5:4] 01b = Trigger threshold in Event Data 3
                                           [3:0] Event Trigger Offset as described in Table 27
    15    Event Data 2                     Reading that triggered event
    16    Event Data 3                     Threshold value that triggered event

    The following table describes the severity of each of the event triggers for both assertion and deassertion.

    Table 27 Power Supply Temperature Sensor Event Trigger Offset Next Steps
    Event Trigger Offset Assertion Deass
18. 2 Check the DIMMs are seated properly 3 Cross test the DIMMs If the issue remains with the DIMMs on this socket replace the main board otherwise the DIMM 3 3V Riser 1 Power Good is supplied by Riser 1 on specific platforms 3 3V Riser 1 Power Good is an indication of the 3 3V on Riser 1 EAh Baseboard 3 3V Riser 1 Power Good 1 Ensure that the riser is seated correctly BB 3 3 RSR1 PGD 2 If issue remains replace the riser 3 If issue remains replace the main board 4 Ifthe issue remains replace the power supplies 3 3V Riser 2 Power Good is supplied by Riser 2 on specific platforms 3 3V Riser 2 Power Good is an indication of the 3 3V on Riser 2 EBh Baseboard 3 3V Riser 2 Power Good 1 Ensure that the riser is seated correctly BB 3 3 RSR2 PGD 2 If issue remains replace the riser 3 If issue remains replace the main board 4 If the issue remains replace the power supplies 32 Intel order number GS0620 002 Revision 1 1 System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1 600 1 400 Product Families Power Subsystems Sensor Sensor Name Next Steps Number 0 9V Core IB is supplied by the main board on specific platforms 0 9V Core IB is used by the on board Infiniband controller on those specific platforms Baseboard 0 9V ECh 1 Ensure all cables are connected correctly BB 0 9V Core IB F 2 If t
19. 600 1 400 Product Families Introduction 1 Introduction The server management hardware that is part of the Intel Server Boards and Intel Server Platforms serves as a vital part of the overall server management strategy The server management hardware provides essential information to the system administrator and provides the administrator the ability to remotely control the server even when the operating system is not running The Intel Server Boards and Intel Server Platforms offer comprehensive hardware and software based solutions The server management features make the servers simple to manage and provide alerting on system events From entry to enterprise systems good overall server management is essential to reduce overall total cost of ownership This Troubleshooting Guide is intended to help the users better understand the events that are logged in the Baseboard Management Controllers BMC System Event Logs SEL on these Intel Server Boards There is a separate User s Guide that covers the general server management and the server management software offered on the Intel Server Boards and Intel Server Platforms Server boards currently supported by this document Intel S1400FP Server Boards Intel S1400SP Server Boards Intel S1600JP Server Boards Intel S2400BB Server Boards Intel S2400EP Server Boards Intel S2400GP Server Boards Intel S2400LP Server Boards Intel S2400SC Server Boards Intel
20. 88 15 Event Data 2 Reading that triggered event 16 Event Data 3 Threshold value that triggered event Revision 1 1 Intel order number GS0620 002 111 Hot Swap Controller Backplane Events System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families Table 88 HSC Backplane Temperature Sensor Event Trigger Offset Next Steps Event Trigger Assertion Deassert Description Next Steps Hex Description Severity Severity 00h Lower non critical Degraded OK The temperature has dropped below its lower Check for clear and unobstructed airflow into going low non critical threshold and out of the chassis 02h Lower critical non fatal Degraded The temperature has dropped below its lower Ensure the SDR is programmed and correct going low critical threshold chassis has been selected 07h U itical D ded OK The temperature has gone over its upper non See en es er non critica egrade e EES pper n 9 a p g PP Ensure the air used to cool the system is within going high critical threshold A ed the thermal specifications for the system 09h Upper critical non fatal Degraded The temperature has gone over its upper typically below 35 C going high critical threshold 12 2 Hard Disk Drive Monitoring Sensor The new backplane design for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600
21. BIOS, firmware, and drivers.
    b. Replace the board.

    9 System BIOS Events
    There are a number of events that are owned by the system BIOS. These events can occur during Power-On Self-Test (POST) or when coming out of a sleep state. Not all of these events signify errors. Some events are described in other chapters in this document (for example, memory events).

    9.1 System Events
    These events can occur during POST or when coming out of a sleep state. These are informational events only.
    1. When logging events during BIOS POST, the BIOS uses generator ID 0001h.
    2. When logging events from the BIOS SMI Handler, the BIOS uses generator ID 0033h.

    9.1.1 System Boot
    At the end of POST, just before the actual OS boot occurs, a System Boot Event is logged. This marks the transition of control from completed POST to the OS Loader. It is an informational-only event.

    9.1.2 Timestamp Clock Synchronization
    These events are logged when the time between the BIOS and the BMC is synchronized. Two events are logged. The BIOS logs the first one when it sends the time synch message to the BMC for synchronization; the timestamp that message gets is unknown, that is, the timestamp in the log can be anything because it gets the "before" timestamp. So the BIOS sends a s
22. BIOS Events System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families 96 Error Code Error Message Response 8605 BIOS Settings are corrupted Major 8606 NVRAM variable space was corrupted and has been reinitialized Major 92A3 Serial port component was not detected Major 92A9 Serial port component encountered a resource conflict error Major A000 TPM device not detected Minor A001 TPM device missing or not responding Minor A002 TPM device failure Minor A003 TPM device failed self test Minor A100 BIOS ACM Error Major A421 PCI component encountered a SERR error Fatal A5A0 PCI Express component encountered a PERR error Minor A5A1 PCI Express component encountered an SERR error Fatal A6A0 DXE Boot Services driver Not enough memory available to shadow a Legacy Option ROM Minor Intel order number G90620 002 Revision 1 1 System Event Log Troubleshooting Guide Tor EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1 600 1 400 Product Families Chassis Subsystem 10 Chassis Subsystem The BMC monitors several aspects of the chassis Next to logging when the power and reset buttons get pressed the BMC also monitors chassis intrusion if a chassis intrusion switch is included in the chassis as well as looking at the network connections and logging an event whenever the physical network li
23. Byte Field Description 15 Event Data 2 Not used 16 Event Data 3 Not used 12 3 1 HSC Health Sensor Next Steps Ensure that all connections to the HSC are well seated Cross test with another HSC If the issue remains with the HSC replace the HSC otherwise start cross testing all interconnections 114 Intel order number GS0620 002 Revision 1 1 System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1 600 1 400 Product Families Manageability Engine ME Events 13 Manageability Engine ME Events The Manageability Engine controls the PECI interface and also contains the Node Manager functionality 13 1 ME Firmware Health Event This sensor is used in Platform Event messages to the BMC containing health information including but not limited to firmware upgrade and application errors Table 92 ME Firmware Health Event Sensor Typical Characteristics Byte Field Description 8 Generator ID 002Ch or 602Ch ME Firmware 9 11 Sensor Type DCh OEM 12 Sensor Number 17h 13 Event Direction and 7 Event direction Event Type 0b Assertion Event 1b Deassertion Event 6 0 Event Type 75h OEM 14 Event Data 1 7 6 10b OEM code in Event Data 2 5 4 10b OEM code in Event Data 3 3 0 Health event type Oh Firmware Status 15 Event Data 2 See Table 93 16 Event Data 3 See Table 93 13 1 1 ME Firmware
24. 8 PCI Express and Legacy PCI Subsystem
    The PCI Express (PCIe) Specification defines standard error types under the Advanced Error Reporting (AER) capabilities. The BIOS logs AER events into the SEL. The Legacy PCI Specification error types are PERR and SERR. These errors are supported and logged into the SEL.

    8.1 PCI Express Errors
    PCIe error events are either correctable (informational event) or fatal. In both cases, information is logged to help identify the source of the PCIe error, and the bus, device, and function are included in the extended data fields. The PCIe devices are mapped in the operating system by bus, device, and function, and each device is uniquely identified by that combination. PCIe device information can be found in the operating system.

    8.1.1 Legacy PCI Errors
    Legacy PCI errors include PERR and SERR; both are fatal errors.

    Table 66 Legacy PCI Error Sensor Typical Characteristics
    Byte  Field                            Description
    8-9   Generator ID                     0033h = BIOS SMI Handler
    11    Sensor Type                      13h = Critical Interrupt
    12    Sensor Number                    03h
    13    Event Direction and Event Type   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                                           [6:0] Event Type: 6Fh (Sensor Specific)
    14    Event Data 1                     [7:6] 10b = OEM code in Event Data 2
                                           [5:4] 10b = OEM code in Event Data 3
                                           [3:0] Event Trigger Offset
25. Fan Presence and Redundancy Sensors
5.2 Temperature
5.2.1 Threshold-based Temperature Sensors
5.2.2 Thermal Margin Sensors
5.2.3 Processor Thermal Control Sensors
5.2.4 Processor DTS Thermal Margin Sensors
5.2.5 Discrete Thermal Sensors
5.2.6 DIMM Thermal Trip Sensors
5.3 System Air Flow Monitoring Sensor
6 Processor Subsystem
6.1 Processor Status Sensor
6.2 Catastrophic
6.3
6.3.1 CPU Missing Sensor Next Steps
6.4 Quick Path Interconnect Sensors
6.4.1 QPI Link Width Reduced Sensor
6.4.2 QPI Correctable
6.4.3 QPI Fatal Error and Fatal Error 2
6.5 Processor ERR2
6.5.1 Processor ERR2 Timeout Next Steps
6.6 Processor MSID Mismatch Sensor
6.6.1 Processor MSID Mismatc
26. Health Event Next Steps
In the following table, Event Data 3 is noted only for specific errors. If the issue persists, provide the content of Event Data 3 to the Intel support team for interpretation. Event Data 3 codes are generally not documented because their meaning varies, provides only some clues, and usually needs to be interpreted individually.
Table 93. ME Firmware Health Event Sensor Next Steps
  ED2 00h, Recovery GPIO forced. Recovery Image loaded due to recovery MGPIO pin asserted. Pin number is configurable in factory presets. Default recovery pin is MGPIO1. Next steps: Deassert MGPIO1 and reset the Intel ME.
  ED2 01h, Image execution failed. Recovery Image or backup operational image loaded because the operational image is corrupted. This may be caused either by flash device corruption or by a failed upgrade procedure. Next steps: Either the flash device must be replaced (if the error is persistent) or the upgrade procedure must be started again.
  ED2 02h, Flash erase error. Error during flash erasure procedure. Next steps: The flash device must be replaced.
  ED2 03h, ED3 00h, Flash state information. Recovery bootloader image or factory presets image corrupted. 01h Check exten
27. If the system has a heatsink fan, ensure the fan is spinning. Check that all system fans are operating properly. Check that the air used to cool the system is within limits (typically 35 °C).
5.2.4 Processor DTS Thermal Margin Sensors
Intel Xeon processor E5-4600/2600/2400/1600 v2 product families incorporate a DTS-based thermal specification. This allows much more accurate control of the thermal solution and enables lower fan speeds and lower fan power consumption. For Intel Xeon processor E5-4600/2600/2400/1600 product families, this requires significant BMC FW calculations to derive the sensor value. Intel Xeon processor E5-4600/2600/2400/1600 v2 product families are the follow-on processors to the Intel Xeon processor E5-4600/2600/2400/1600 product families. For the v2 product families, the BMC's derivation of this value is greatly simplified because the majority of the calculations are performed within the processor itself.
The main use of this sensor is as an input to the BMC's fan control algorithms. The BMC implements it as a threshold sensor. There is one DTS sensor for each installed physical processor package. Thresholds are not set and alert generation is not enabled for these se
28. Sensor Number / Sensor Name / Details Section / Next Steps
  D2h: [name illegible]. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D3h: [name illegible]. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D4h: [name illegible] Voltage. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D6h: Baseboard 1.05V Processor1 Vccp (BB 1.05Vccp P1). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D7h: Baseboard 1.05V Processor2 Vccp (BB 1.05Vccp P2). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D8h: Baseboard 1.5V P1 Memory AB VDDQ (BB 1.5 P1MEM AB). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  D9h: Baseboard 1.5V P1 Memory CD VDDQ (BB 1.5 P1MEM CD). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  DAh: Baseboard 1.5V P2 Memory AB VDDQ (BB 1.5 P2MEM AB). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  DBh: Baseboard 1.5V P2 Memory CD VDDQ (BB 1.5 P2MEM CD). Details: Threshold-based Voltage Sensors. Next steps: Table 13, Threshold-based Voltage Sensors Next Steps.
  Baseboard 1.8V Aux. Details: Threshold-based Voltag
29. Out (PS2 Curr Out). Next steps: Table 25, Power Supply Current Out Sensor Event Trigger Offset.
Sensor Cross Reference List
Sensor Number / Sensor Name / Details Section / Next Steps
  5Ch: Power Supply 1 Temperature (PS1 Temperature). Details: Power Supply Temperature Sensors. Next steps: Table 27, Power Supply Temperature Sensor Event Trigger Offset Next Steps.
  5Dh: Power Supply 2 Temperature (PS2 Temperature). Details: Power Supply Temperature Sensors. Next steps: Table 27, Power Supply Temperature Sensor Event Trigger Offset Next Steps.
  60h-68h: Hard Disk Drive 15-23 Status (HDD 15-23 Status). Details: Hard Disk Drive Monitoring Sensor. Next steps: Table 90, Hard Disk Drive Monitoring Sensor Event Trigger Offset Next Steps.
  69h-6Bh: Hot Swap Controller 1-3 Status (HSC 3 Status). Details: Hot Swap Controller Health Sensor. Next steps: HSC Health Sensor Next Steps.
  70h: Processor 1 Status (P1 Status). Details: Processor Status Sensor. Next steps: Table 48, Processor Status Sensors Next Steps.
  71h: Processor 2 Status (P2 Status). Details: Processor Status Sensor. Next steps: Table 48, Processor Status Sensors Next Steps.
  72h: Processor 3 Status (P3 Status). Details: Processor Status Sensor. Next steps: Table 48, Processor Status Sensors Next Steps.
  73h: Processor 4 Status. Details: Processor Status S
30. Processor E5-4600/2600/2400/1600 Product Families, this timeout is 500 ms.
If the SystemPowerGood signal has not asserted by the time the VR Watchdog Timer expires, the FW powers down the system, logs a SEL entry, and emits beep code 1-5-1-2. This failure is termed a VR Watchdog Timeout.
Table 14. Voltage Regulator Watchdog Timer Sensor Typical Characteristics
  Byte 11, Sensor Type: 02h (Voltage)
  Byte 12, Sensor Number: 0Bh
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 03h (digital Discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
4.2.1 Voltage Regulator Watchdog Timer Sensor Next Steps
  1. Ensure that all the connectors from the power supply are well seated.
  2. Cross-test the baseboard. If the issue remains with the baseboard, replace the baseboard.
4.3 Power Unit
The power unit monitors the power state of the system and logs the state changes in the SEL.
4.3.1 Power Unit Status Sensor
The power unit status sensor monitors the power state of t
31. RID 27 00 | RT 02 | TS 0A 9B 2E 50 | GID 33 00 | ER 04 | ST 13 | SN 05 | EDIR 71 | ED1 A0 | ED2 00 | ED3 12
  RID (Record ID) = 0027h
  RT (Record Type) = 02h (system event record)
  TS (Timestamp) = 502E9B0Ah
  GID (Generator ID) = 0033h (BIOS SMI Handler)
  ER (Event Message Revision) = 04h (IPMI v2.0)
  ST (Sensor Type) = 13h (Critical Interrupt), from IPMI Specification Table 42-3, Sensor Type Codes
  SN (Sensor Number) = 05h
  EDIR (Event Direction/Event Type) = 71h: [7] 0b = Assertion Event; [6:0] 71h = OEM Specific for PCI Express correctable errors
  ED1 (Event Data 1) = A0h: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset = 0h (Receiver Error)
  ED2 (Event Data 2) = 00h: PCI Bus number = 0
  ED3 (Event Data 3) = 12h: [7:3] PCI Device number = 02h; [2:0] PCI Function number = 2
2.2.3 Example of Decoding a Power Supply Predictive Failure Event
The following is an example of decoding a Power Supply predictive failure event. In this example, power supply 1 saw an AC power loss event, with both the input under-voltage warning and fault events set. In most cases, this means that the AC power dipped below the minimum warning and fault thresholds for over 20 milliseconds.
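The PCI Express correctable error record decoded in the item above can be unpacked mechanically. The following is a minimal sketch, assuming the standard 16-byte IPMI system event record layout with little-endian multi-byte fields; the function names `decode_sel_record` and `decode_pci_location` are illustrative and not part of any Intel utility.

```python
import struct

def decode_sel_record(raw: bytes) -> dict:
    """Split a 16-byte system event record into its named fields."""
    if len(raw) != 16:
        raise ValueError("a SEL record is exactly 16 bytes")
    # Record ID (2), Record Type (1), Timestamp (4), Generator ID (2),
    # EvM Rev (1), Sensor Type (1), Sensor Number (1), Event Dir/Type (1),
    # Event Data 1..3 (1 each); multi-byte fields are least significant byte first.
    (rid, rtype, ts, gid, evm_rev,
     sensor_type, sensor_num, edir, ed1, ed2, ed3) = struct.unpack("<HBIHBBBBBBB", raw)
    return {
        "record_id": rid,
        "record_type": rtype,
        "timestamp": ts,                      # seconds, as stored in the record
        "generator_id": gid,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_number": sensor_num,
        "direction": "deassertion" if edir & 0x80 else "assertion",
        "event_type": edir & 0x7F,
        "ed2_code": (ed1 >> 6) & 0x3,         # 10b means OEM code in Event Data 2
        "ed3_code": (ed1 >> 4) & 0x3,         # 10b means OEM code in Event Data 3
        "trigger_offset": ed1 & 0x0F,
        "ed2": ed2,
        "ed3": ed3,
    }

def decode_pci_location(ed2: int, ed3: int) -> tuple:
    """For the PCIe error sensors: ED2 is the bus, ED3 packs device [7:3] and function [2:0]."""
    return ed2, (ed3 >> 3) & 0x1F, ed3 & 0x07

# The example record from the text, in the order the bytes appear in the SEL.
example = bytes.fromhex("2700" "02" "0A9B2E50" "3300" "04" "13" "05" "71" "A0" "00" "12")
fields = decode_sel_record(example)
bus, device, function = decode_pci_location(fields["ed2"], fields["ed3"])
```

The bus/device/function split applies only to sensors that document Event Data 2 and 3 this way; other sensors reuse the same bytes for different data.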
32. S2600CO Server Boards
  Intel S2600CP Server Boards
  Intel S2600GZ/S2600GL Server Boards
  Intel S2600IP Server Boards
  Intel S2600JF Server Boards
  Intel S2600WP Server Boards
  Intel S4600LH Server Boards
  Intel W2600CR Workstation Boards
1.1 Purpose
The purpose of this document is to list all possible events generated by the Intel platform. Other sources not under Intel's control may also generate events, which are not described in this document.
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independently of the main processors, BIOS, and operating system. Platform management functions can also be made available when the system is in a power-down state.
IPMI works by interfacing with the BMC, which extends management capabilities in the server system and operates independently of the main processor by monitoring the on-board instrumentation. Through the BMC, IPMI also allows administrators to control power to the server and remotely access BIOS configuration and operating system console information.
IPMI defines
33. Sensor Specific
  Byte 14, Event Data 1: [7:6] 11b = Sensor-specific event extension code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 4h (Sensor failure)
  Byte 15, Event Data 2: Sensor number of the failed sensor
  Byte 16, Event Data 3: Not used
11.6.1 BMC FW Health Sensor Next Steps
  1. Check the SEL for any other events around the time of the failure.
  2. Take note of all IPMI activity that was occurring around the time of the failure.
  3. Capture a System BMC Debug Log as soon as you can after experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg /sbmcdl private filename.zip).
  4. Send the log file to your system manufacturer or Intel representative for failure analysis.
  5. If the failure continues around a specific sensor, replace the board with that sensor.
11.7 Firmware Update Status Sensor
The BMC FW supports a single Firmware Update Status sensor. This sensor is used to generate SEL events related to updates of embedded firmware on the platform. This includes updates to the BMC, BIOS, and ME FW. This sensor is an event-only sensor that is not readable. Event generation is only enabled for assert
34. Status event is logged after an AC power-on occurs, only if a RAS Mode is currently configured and only if RAS Mode is successfully initiated. This ensures that the SEL contains a record of the RAS Mode in effect when the system started up. The event is logged only after AC power-on, not DC power-on.
The Memory RAS Configuration Status Sensor is also used to log an event during POST whenever there is a RAS configuration error. This is the case where a RAS Mode has been selected but, when the system boots, the memory configuration cannot support the RAS Mode. The memory configuration fails and operates in Independent Channel Mode. In the SEL record logged, the ED1 Offset value is "RAS Configuration Disabled" and ED3 contains the RAS Mode that is currently selected but could not be configured. ED2 gives the reason for the RAS configuration failure. At present, only two RAS Configuration Error Type values are implemented:
  0 = None. This is used for an AC power-on log record when the RAS configuration is successfully configured.
  3 = Invalid DIMM Configuration for RAS Mode. The installed DIMM configuration cannot support the currently selected RAS Mode. This may be due to DIMMs that have failed or been disabled, so when this reason has been logged, the user should check the preceding SEL events to see whether there are DIMM error events.
Table 58. Memory RAS Configuration Status Sensor Typical Characteristics
35. Steps (P4 VRD Hot)
Sensor Cross Reference List
Sensor Number / Sensor Name / Details Section / Next Steps
  94h: Processor 1 Memory VRD Hot 0-1 (P1 Mem01 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  95h: Processor 1 Memory VRD Hot 2-3 (P1 Mem23 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  96h: Processor 2 Memory VRD Hot 0-1 (P2 Mem01 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  97h: Processor 2 Memory VRD Hot 2-3 (P2 Mem23 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  98h: Processor 3 Memory VRD Hot 0-1 (P3 Mem01 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  99h: Processor 3 Memory VRD Hot 2-3 (P3 Mem23 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  9Ah: Processor 4 Memory VRD Hot 0-1 (P4 Mem01 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  9Bh: Processor 4 Memory VRD Hot 2-3 (P4 Mem23 VRD Hot). Details: Discrete Thermal Sensors. Next steps: Table 45, Discrete Thermal Sensors Next Steps.
  Power Supply 1 Fan Tachometer 1. Details: Power Supply Fan Tachometer
36. Subsystems
Sensor Number / Sensor Name / Next Steps
  D0h, [name illegible]: 12V is supplied by the power supplies. 12V is used by SATA drives, fans, and PCI cards. In addition, it is used to generate various processor voltages.
    1. Ensure all cables are connected correctly.
    2. Check connections on the fans and HDDs.
    3. If the issue follows the component, swap it; otherwise, replace the board.
    4. If the issue remains, replace the power supplies.
  Baseboard 5V (BB 5.0V): 5.0V is supplied by the power supplies for pedestal systems and by the main board on rack-optimized systems. 5.0V is used by the PCI slots.
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards.
    3. Try PCI cards in other PCI slots.
    4. If the issue follows the card, swap it; otherwise, replace the main board.
    5. If the issue remains, replace the power supplies.
  Baseboard 3.3V (BB 3.3V): 3.3V is supplied by the power supplies for pedestal systems and by the main board on rack-optimized systems. 3.3V is used by the PCIe and PCI-X slots.
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards.
    3. Try PCI cards in other PCI slots.
    4. If the issue follows the card, swap it; otherwise, replace the main board.
    5. If the issue remains, replace the power supplies.
  Baseboard 5V Stand-by: 5.0V STBY is supplied by the power supplies for pedestal systems and by the main board on rack-optimized systems. 5.0V STBY is used to generate o
37. Type = 0Bh (Generic Discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 18
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
Table 18. Power Unit Redundancy Sensor Event Trigger Offset Next Steps
  00h, Fully redundant: System is fully operational. Next steps: Informational event.
  01h, Redundancy lost
  02h, Redundancy degraded
  03h, Non-redundant, sufficient from redundant
  04h, Non-redundant, sufficient from insufficient
  05h, Non-redundant, insufficient
  06h, Non-redundant, degraded from fully redundant
  07h, Redundant, degraded from non-redundant
  For offsets 01h-07h: System is not running in redundant power supply mode. Next steps: This event is accompanied by specific power supply errors ([illegible] failure, and so on). Troubleshoot these events.
4.3.3 Node Auto Shutdown Sensor
The BMC supports a Node Auto Shutdown sensor for logging a SEL event when a node undergoes an emergency shutdown caused by loss of power supply redundancy or by PSU CLST throttling in response to an over-current warning condition. This sensor is applicable only to multi-node systems. Th
38. Type = 01h (Temperature)
  Byte 12, Sensor Number: 78h = Processor 1 Thermal Control; 79h = Processor 2 Thermal Control; 7Ah = Processor 3 Thermal Control; 7Bh = Processor 4 Thermal Control
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 01h (Threshold)
  Byte 14, Event Data 1: [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Triggers as described in Table 42
  Byte 15, Event Data 2: Reading that triggered event
  Byte 16, Event Data 3: Threshold value that triggered event
Table 42. Processor Thermal Control Sensors Event Triggers Description
  07h, Upper non-critical going high. Assertion severity: Degraded. Deassertion severity: OK. The thermal margin has gone over its upper non-critical threshold.
  09h, Upper critical going high. Assertion severity: non-fatal. Deassertion severity: Degraded. The thermal margin has gone over its upper critical threshold.
5.2.3.1 Processor Thermal Control Sensors Next Steps
These events normally occur due to failures of the thermal solution.
  Verify heatsink is properly attached and has thermal grease
39. as soon as the microcontroller is initialized.
"Monitoring interface available" indicates that Intel Intelligent Power Node Manager has the capability to monitor power and temperature. This is generally available when firmware is operational.
"Power limiting interface available" indicates that Intel Intelligent Power Node Manager can do power limiting, and is indicative of an ACPI-compliant OS being loaded (unless the OEM has indicated support for a non-ACPI-compliant OS).
The current value of a capability sensor that has not been acknowledged is retransmitted no more often than every 300 milliseconds.
Next steps depend on the policy that was set. See the Node Manager Specification for more details.
13.5 Node Manager Alert Threshold Exceeded
A Policy Correction Time Exceeded event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.
Table 97. Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 002Ch (ME Firmware)
  Byte 11, Sensor Type: DCh (OEM)
  Byte 12, Sensor Number: 1Bh
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 72h (OEM)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Dat
40. bent pins and, if found, replace the board.
  5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
7.5.2 Memory Address Parity Error
Address Parity errors are errors detected in the memory addressing hardware. Because these affect the addressing of memory contents, they can potentially lead to the same sort of failures as ECC errors. They are logged as a distinct type of error because they affect memory addressing rather than memory contents, but otherwise they are treated exactly the same as Uncorrectable ECC Errors. Address Parity errors are logged to the BMC SEL, with Event Data identifying the failing address by channel and DIMM to the extent that it is possible to do so.
Table 65. Address Parity Error Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 0033h (BIOS SMI Handler)
  Byte 11, Sensor Type: 0Ch (Memory)
  Byte 12, Sensor Number: 13h
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5
41. centric sensors in the SEL.
6.1 Processor Status Sensor
The BMC provides an IPMI sensor of type processor for monitoring status information for each processor slot. If an event state (sensor offset) has been asserted, it remains asserted until one of the following happens:
  A Rearm Sensor Events command is executed for the processor status sensor.
  An AC or DC power cycle, system reset, or system boot occurs.
CPU Presence status is not saved across AC power cycles and therefore will not generate a deassertion after cycling AC power.
Table 47. Processor Status Sensors Typical Characteristics
  Byte 11, Sensor Type: 07h (Processor)
  Byte 12, Sensor Number: 70h = Processor 1 Status; 71h = Processor 2 Status; 72h = Processor 3 Status; 73h = Processor 4 Status
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 48
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
Table 48. Processor Status Sensors Next Steps
Eve
42. fault
  ED2 (continued): 04h = Over-temperature fault; 05h = Fan fault
  ED3: contents of the associated PMBus Status register. For example, Data 3 will have the contents of the VOLTAGE_STATUS register at the time an Output Voltage fault was detected. Refer to the PMBus Specification for details on specific register contents.
  Next steps: 2. If the power supply still fails, replace it.
Sensor Specific Offset / Description / ED2 / ED3 / Next Steps
  02h, Predictive Failure. Check the data in ED2 and ED3 for more details.
  ED1: 10b = OEM code in Event Data 2; 10b = OEM code in Event Data 3.
  ED2: 01h = Output voltage warning; 02h = Output power warning; 03h = Output over-current warning; 04h = Over-temperature warning; 05h = Fan warning; 06h = Input under-voltage warning; 07h = Input over-current warning; 08h = Input ov
  ED3: Will have the contents of the associated PMBus Status register. For example, Data 3 will have the contents of the VOLTAGE_STATUS register at the time an Output Voltage warning was detected. Refer to the PMBus Specification for specific register contents.
  Next steps (depends on the warning event):
    1. Replace the power supply.
    2. Verify proper airflow to the system.
    3. Verify the power source.
    4. [illegible] the system
43. generate a critical interrupt.
Table 77. IPMI Watchdog Sensor Typical Characteristics
  Byte 11, Sensor Type: 23h (Watchdog 2)
  Byte 12, Sensor Number: 03h
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 78
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
Table 78. IPMI Watchdog Sensor Event Trigger Offset Next Steps
  00h, Timer expired
  01h, Hard reset
  02h, Power down
  03h, Power cycle
  08h, Timer interrupt
  Description (all offsets): Our server systems support a BMC watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).
  Next steps: If this event
44. owned by the BMC.
Table 5. BMC-owned Sensors
Sensor Number / Sensor Name / Details Section / Next Steps
  01h: Power Unit Status (Pwr Unit Status). Details: Power Unit Status Sensor. Next steps: Table 16, Power Unit Status Sensor Sensor Specific Offsets Next Steps.
  02h: Power Unit Redundancy (Pwr Unit Redund). Details: Power Unit Redundancy Sensor. Next steps: Table 18, Power Unit Redundancy Sensor Event Trigger Offset Next Steps.
  03h: IPMI Watchdog (IPMI Watchdog). Details: IPMI Watchdog. Next steps: Table 78, IPMI Watchdog Sensor Event Trigger Offset Next Steps.
  04h: Physical Security (Physical Scrty). Details: Physical Security. Next steps: Table 74, Physical Security Sensor Event Trigger Offset Next Steps.
  05h: FP Interrupt (FP NMI Diag Int). Details: FP NMI Interrupt. Next steps: FP NMI Interrupt Next Steps.
  06h: SMI Timeout (SMI Timeout). Details: SMI Timeout. Next steps: SMI Timeout Next Steps.
  07h: System Event Log (System Event Log). Details: System Event Log Cleared. Next steps: Not applicable.
  08h: System Event (System Event). Details: System Event PEF Action. Next steps: System Event PEF Action Next Steps.
  09h: Button Sensor (Button). Details: Button Sensor. Next steps: Not applicable.
45. panic string (0 if no panic string)
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset = 1h (Runtime Critical Stop, a.k.a. core dump, blue screen)
  Byte 15, Event Data 2: The second byte of the panic string
  Byte 16, Event Data 3: The third byte of the panic string
Table 106. Linux Kernel Panic String Extended Record Characteristics
  Bytes 1-2, Record ID: ID used for SEL Record access
  Byte 3, Record Type: [7:0] F0h = OEM non-timestamped
  Bytes 4-16: OEM defined
  Byte 4, Slave Address: The slave address of the card saving the panic
  Byte 5, Sequence Number: A sequence number, starting at zero
  Bytes 6-16, Kernel Panic Data: These hold the panic string. If the panic string is longer than 11 bytes, multiple messages will be sent with increasing sequence numbers.
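As a rough illustration of how the extended records in Table 106 fit together, the sketch below rebuilds a panic string from a set of 16-byte OEM records by ordering them on the sequence number in byte 5 and joining the 11 data bytes from bytes 6-16. It assumes that a short final chunk is padded with zero bytes and that the string is ASCII; neither is stated in the table. The helper `make_panic_records` and the slave address 20h exist only to produce test data.

```python
RECORD_TYPE_PANIC = 0xF0   # OEM non-timestamped record type from Table 106
CHUNK_SIZE = 11            # bytes 6-16 carry the panic string data

def make_panic_records(text: str, slave_addr: int = 0x20) -> list:
    """Hypothetical helper: split a string into Table 106 style records."""
    data = text.encode("ascii")
    records = []
    for seq, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        # Assumption: unused trailing bytes are zero-filled.
        chunk = data[start:start + CHUNK_SIZE].ljust(CHUNK_SIZE, b"\x00")
        record_id = (seq + 1).to_bytes(2, "little")
        records.append(record_id + bytes([RECORD_TYPE_PANIC, slave_addr, seq]) + chunk)
    return records

def reassemble_panic_string(records: list) -> str:
    """Join the data chunks of all F0h records in sequence-number order."""
    chunks = {}
    for rec in records:
        if len(rec) != 16:
            raise ValueError("a SEL record is exactly 16 bytes")
        if rec[2] != RECORD_TYPE_PANIC:
            continue                     # skip unrelated record types
        chunks[rec[4]] = rec[5:16]       # byte 5 = sequence, bytes 6-16 = data
    data = b"".join(chunks[seq] for seq in sorted(chunks))
    return data.rstrip(b"\x00").decode("ascii", errors="replace")

message = "Kernel panic - not syncing: demo"
records = make_panic_records(message)
recovered = reassemble_panic_string(list(reversed(records)))  # order does not matter
```

Sorting on the sequence number rather than on the Record ID keeps the reconstruction correct even if the log is read out of order.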
46. power cycle with Recovery jumper asserted. If this does not clear the issue, reflash the SPI flash.
  10h-FFh: Reserved
13.2 Node Manager Exception Event
A Node Manager Exception Event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.
Table 94. Node Manager Exception Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 002Ch or 602Ch (ME Firmware)
  Byte 11, Sensor Type: DCh (OEM)
  Byte 12, Sensor Number: 18h
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 72h (OEM)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3] Node Manager Policy event (0 = Reserved; 1 = Policy Correction Time Exceeded: the policy did not meet the contract for the defined policy, and the policy will continue to limit the power or shut down the platform based on the defined policy action); [2] Reserved; [1:0] 00b
  Byte 15, Event Data 2: [7:4] Reserved; [3:0] Domain Id (currently supports only one domain, Domain 0)
  Byte 16, Event Data 3: Policy Id
13.2.1 Node Manager Exception Event Next Steps
This is an informational eve
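The Event Data layout in Table 94 can be unpacked with a few bit masks. The following is a small sketch, not taken from any Intel tool; the function name `decode_nm_exception` and the sample byte values are made up for illustration.

```python
def decode_nm_exception(ed1: int, ed2: int, ed3: int) -> dict:
    """Unpack Event Data 1-3 of a Node Manager Exception event (Table 94)."""
    return {
        "ed2_is_oem_code": (ed1 >> 6) & 0x3 == 0b10,   # ED1 [7:6]
        "ed3_is_oem_code": (ed1 >> 4) & 0x3 == 0b10,   # ED1 [5:4]
        "policy_correction_time_exceeded": bool(ed1 & 0x08),  # ED1 [3]
        "domain_id": ed2 & 0x0F,                       # ED2 [3:0]
        "policy_id": ed3,                              # ED3
    }

# Hypothetical sample: ED1 = A8h (OEM codes in ED2/ED3, bit 3 set),
# ED2 = 00h (Domain 0), ED3 = 05h (policy 5).
event = decode_nm_exception(0xA8, 0x00, 0x05)
```

Masking ED2 with 0Fh discards the reserved upper nibble, so a non-zero value there does not change the reported domain.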
47. selected.
Table 33. Fan Redundancy Sensors Typical Characteristics
  Byte 11, Sensor Type: 04h (Fan)
  Byte 12, Sensor Number: 0Ch
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 0Bh (Generic Discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 34
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 34. Fan Redundancy Sensor Event Trigger Offset Next Steps
  00h, Fully redundant
  01h, Redundancy lost
  02h, Redundancy degraded
  Description: The system has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
  Next steps: Fan redundancy loss indicates failure of one or more fans. Look for lower non-critical fan errors or fan removal errors in the SEL to indicate which fan is causin
48. sensor is rearmed on power-on (AC or DC power-on transitions).
Table 57. Processor MSID Mismatch Sensor Typical Characteristics
  Byte 11, Sensor Type: 07h (Processor)
  Byte 12, Sensor Number: 81h = Processor 1 MSID Mismatch; 87h = Processor 2 MSID Mismatch; 88h = Processor 3 MSID Mismatch; 89h = Processor 4 MSID Mismatch
  Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 03h (digital discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used
6.6.1 Processor MSID Mismatch Sensor Next Steps
Verify the processor is supported by your baseboard. Check your board's Technical Product Specification (TPS).
7 Memory Subsystem
Intel servers report memory errors, status, and configuration in the SEL.
7.1 Memory RAS Configuration Status
A Memory RAS Configuration
49. the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15  Bug Check/Blue Screen Data: The first record of this type contains the Bug Check/Blue Screen Stop code and is followed by the four Bug Check/Blue Screen parameters, LSB first. Note that each of the Bug Check/Blue Screen parameters requires two records. Both records for each parameter have the same Record ID. There is a total of nine records.
  Byte 16     Operating system type: 00 = 32-bit OS, 01 = 64-bit OS

15 Linux Kernel Panic Records
The OpenIPMI driver can put semi-custom and custom events in the system event log if a panic occurs. If you enable the "Generate a panic event to all BMCs on a panic" option, you get one event per panic in the standard IPMI event format. If you enable the "Generate OEM events containing the panic string" option, you also get a set of OEM events holding the panic string.

Table 105. Linux Kernel Panic Event Record Characteristics
  Byte 8-9  Generator ID: 0021h = Kernel
  Byte 10   EvM Rev: 03h = IPMI 1.0 format
  Byte 11   Sensor Type: 20h = OS Stop/Shutdown
  Byte 12   Sensor Number: The first byte of the
50. 11.9.1 Intel Xeon Phi Coprocessor (MIC) Thermal Margin Sensors
11.9.2 Intel Xeon Phi Coprocessor (MIC) Status Sensors
12 Hot Swap Controller Backplane Events
12.1 HSC Backplane Temperature Sensor
12.2 Hard Disk Drive Monitoring Sensor
12.3 Hot Swap Controller Health Sensor
12.3.1 HSC Health Sensor Next Steps
13 Manageability Engine (ME) Events
13.1 ME Firmware Health Event
13.1.1 ME Firmware Health Event Next Steps
13.2 Node Manager Exception Event
13.2.1 Node Manager Exception Event Next Steps
13.3 Node Manager Health Event
13.3.1 Node Manager Health Event Next Steps
13.4 Node Manager Operational Capabilities Change
13.4.1 Node Manager Operationa
51. Byte 14  Event Data 1
           [7:6] 10b = OEM code in Event Data 2
           [5:4] 10b = OEM code in Event Data 3
           [3:0] Event Trigger Offset: 0h
  Byte 15  Event Data 2: Low Byte of POST Error Code
  Byte 16  Event Data 3: High Byte of POST Error Code

9.2.1 System Firmware Progress (Formerly Post Error) Next Steps
See the following table for POST Error Codes.

Table 72. POST Error Codes
  Error Code  Error Message                                                    Response
  0012        System RTC date/time not set                                     Major
  0048        Password check failed                                            Major
  0140        PCI component encountered a PERR error                           Major
  0141        PCI resource conflict                                            Major
  0146        PCI out of resources error                                       Major
  0191        Processor core/thread count mismatch detected                    Fatal
  0192        Processor cache size mismatch detected                           Fatal
  0194        Processor family mismatch detected                               Fatal
  0195        Processor Intel(R) QPI link frequencies unable to synchronize    Fatal
  0196        Processor model mismatch detected                                Fatal
  0197        Processor frequencies unable to synchronize                      Fatal
  5220        BIOS Settings reset to default settings                          Major
  5221        Passwords cleared by jumper                                      Major
  5224        Password clear jumper is Set                                     Major
  8130        Processor 01 disabled                                            Major
  8131        Processor 02 disabled                                            Major
  8132        Processor 03 disabled                                            Ma
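Because the POST error code is split across Event Data 2 (low byte) and Event Data 3 (high byte), it has to be reassembled before it can be looked up in Table 72. The following is a minimal Python sketch, assuming the four-character codes in Table 72 are hexadecimal digits; the function names and the dictionary are illustrative only, and the dictionary holds just a few of the rows listed above.

```python
# Hypothetical helper names; only a subset of Table 72 is included here.
POST_ERROR_CODES = {
    0x0012: ("System RTC date/time not set", "Major"),
    0x0192: ("Processor cache size mismatch detected", "Fatal"),
    0x5221: ("Passwords cleared by jumper", "Major"),
    0x8130: ("Processor 01 disabled", "Major"),
}

def post_error_code(ed2: int, ed3: int) -> int:
    """Combine Event Data 2 (low byte) and Event Data 3 (high byte)."""
    return ((ed3 & 0xFF) << 8) | (ed2 & 0xFF)

def describe_post_error(ed2: int, ed3: int) -> str:
    """Return 'CODE: message (response)' for a POST error SEL entry."""
    code = post_error_code(ed2, ed3)
    message, response = POST_ERROR_CODES.get(
        code, ("not in this subset of Table 72", "unknown")
    )
    return f"{code:04X}: {message} ({response})"

# ED2 = 92h, ED3 = 01h reassembles to code 0192.
summary = describe_post_error(0x92, 0x01)
```

A SEL entry whose Event Data 2 is 30h and Event Data 3 is 81h would map to code 8130 in the same way.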
52. 37, Temperature Sensors Next Steps
  2Fh      Network Interface Controller Temperature (LAN NIC Temp). Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  30h-3Fh  Fan Tachometer Sensors (chassis-specific sensor names). Details: Fan Tachometer Sensors. Next steps: Table 30, Fan Tachometer Sensor Event Trigger Offset Next Steps.
  40h-4Fh  Fan Present Sensors. Details: Fan Presence and Redundancy. Next steps: Table 32, Fan Presence Sensors Event Trigger Offset Next Steps.
  50h      Power Supply 1 Status (PS1 Status). Details: Power Supply Status Sensors. Next steps: Table 16, Power Unit Status Sensor Sensor-Specific Offsets Next Steps.
  51h      Power Supply 2 Status (PS2 Status). Details: Power Supply Status Sensors. Next steps: Table 16, Power Unit Status Sensor Sensor-Specific Offsets Next Steps.
  54h      Power Supply 1 AC Power Input (PS1 Power In). Details: Power Supply Power In Sensors. Next steps: Table 23, Power Supply Power In Sensor Event Trigger Offset Next Steps.
  55h      Power Supply 2 AC Power Input (PS2 Power In). Details: Power Supply Power In Sensors. Next steps: Table 23, Power Supply Power In Sensor Event Trigger Offset Next Steps.
  58h      Power Supply 1 12V % of Maximum Current Output (PS1 Curr Out). Details: Power Supply Current Out. Next steps: Table 25, Power Supply Current Out Sensor Event Trigger Offset.
  59h      Power Supply 2 12V % of Maximum Current Output. Details: Power Supply Current
53.        [5:4] 10b = OEM code in Event Data 3
           [3:0] Event Trigger Offset: 2h
  Byte 15  Event Data 2
           [7:5] Reserved. Set to 0.
           [4] Channel Information Validity Check
               0b = Channel Number in Event Data 3 bits 4:3 is not valid
               1b = Channel Number in Event Data 3 bits 4:3 is valid
           [3] DIMM Information Validity Check
               0b = DIMM Slot ID in Event Data 3 bits 2:0 is not valid
               1b = DIMM Slot ID in Event Data 3 bits 2:0 is valid
           [2:0] Error Type
               000b = Parity Error Type not known
               001b = Data Parity Error (not used)
               010b = Address Parity Error
               All other values are reserved.
  Byte 16  Event Data 3
           [7:5] Indicates the processor socket to which the DDR3 DIMM having the ECC error is attached: 0-3 = CPU1-4. All other values are reserved.
           [4:3] Channel Number (if valid) on which the Parity Error occurred. This value will be indeterminate and should be ignored if ED2 bit 4 is 0b.
               00b = Channel A
               01b = Channel B
               10b = Channel C
               11b = Channel D
           [2:0] DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value will be indeterminate and should be ignored if ED2 bit 3 is 0b.
               000b = DIMM Socket 1
               001b = DIMM Socket 2
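The validity bits in Event Data 2 decide whether the channel and DIMM fields in Event Data 3 may be trusted, which is easy to get wrong when decoding by hand. The following Python sketch shows one way to apply the bit layout above. The function name and result keys are hypothetical, and mapping DIMM Slot ID n to "DIMM Socket n+1" beyond the two values listed in this excerpt is an assumption.

```python
ERROR_TYPES = {
    0b000: "Parity Error Type not known",
    0b001: "Data Parity Error",
    0b010: "Address Parity Error",
}

def decode_parity_event(ed2: int, ed3: int) -> dict:
    """Decode Event Data 2/3 of an Address Parity Error event."""
    channel_valid = bool(ed2 & 0x10)   # ED2 bit 4
    dimm_valid = bool(ed2 & 0x08)      # ED2 bit 3
    socket = (ed3 >> 5) & 0x07         # ED3 bits 7:5, 0-3 = CPU1-4
    channel_bits = (ed3 >> 3) & 0x03   # ED3 bits 4:3
    slot_bits = ed3 & 0x07             # ED3 bits 2:0
    return {
        "error_type": ERROR_TYPES.get(ed2 & 0x07, "reserved"),
        "cpu": f"CPU{socket + 1}" if socket <= 3 else "reserved",
        # Indeterminate fields are reported as None instead of a guess.
        "channel": "ABCD"[channel_bits] if channel_valid else None,
        # Assumption: slot ID n corresponds to DIMM Socket n + 1.
        "dimm_socket": slot_bits + 1 if dimm_valid else None,
    }

# ED2 = 1Ah: both validity bits set, error type 010b.
# ED3 = 29h: socket 001b, channel 01b, slot 001b.
event = decode_parity_event(0x1A, 0x29)
```

Returning `None` for a field whose validity bit is clear keeps an indeterminate value from being mistaken for Channel A or DIMM Socket 1.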
54. 51 CPU Missing Sensor Typical Characteristics
Table 52 QPI Link Width Reduced Sensor Typical Characteristics
Table 53 QPI Correctable Error Sensor Typical Characteristics
Table 54 QPI Fatal Error Sensor Typical Characteristics
Table 55 QPI Fatal 2 Error Sensor Typical Characteristics
Table 56 Processor ERR2 Timeout Sensor Typical Characteristics
Table 57 Processor MSID Mismatch Sensor Typical Characteristics
Table 58 Memory RAS Configuration Status Sensor Typical Characteristics
Table 59 Memory RAS Configuration Status Sensor Event Trigger Offset Next Steps
Table 60 Memory RAS Mode Select Sensor Typical Characteristics
Table 61 Mirroring Redundancy State Sensor Typical Characteristics
Table 62 Sparing Redundancy State Sensor Typical Characteristics
Table 63 Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
Table 64 Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps
Table 65 Address Parity Error Sensor Typical Cha
55. B
       2. Check the DIMMs are seated properly.
       3. Cross test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

  E5h  Baseboard 1.35V P1 Low Voltage Memory CD VDDQ (BB 1.35 P1LV CD)
       This 1.35V line is supplied by the main board. This 1.35V line is used by processor 1 memory slots C and D.
       1. Ensure all cables are connected correctly.
       2. Check the DIMMs are seated properly.
       3. Cross test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

  E6h  Baseboard 1.35V P2 Low Voltage Memory AB VDDQ (BB 1.35 P2LV AB)
       This 1.35V line is supplied by the main board. This 1.35V line is used by processor 2 memory slots A and B.
       1. Ensure all cables are connected correctly.
       2. Check the DIMMs are seated properly.
       3. Cross test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

  E7h  Baseboard 1.35V P2 Low Voltage Memory CD VDDQ (BB 1.35 P2LV CD)
       This 1.35V line is supplied by the main board. This 1.35V line is used by processor 2 memory slots C and D.
       1. Ensure all cables are connected correctly.
56. DDh = OEM timestamped, bytes 8-16 OEM defined
  Byte 4-7    Timestamp: Time when the event was logged. LS byte first.
  Byte 8-10   IPMI Manufacturer ID
              0137h (311d) = IANA enterprise number for Microsoft
              0157h (343d) = IANA enterprise number for Intel
              The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
  Byte 11     Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15  Shutdown Comment: Shutdown Comment from the registry, LSB first:
              HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\Comment
  Byte 16     Reserved: 00h

14.3 Bug Check/Blue Screen Event Records
When the system experiences a bug check (blue screen), multiple records are written to the event log. The first is a Bug Check/Blue Screen OS Stop/Shutdown Event Record. This can be followed by multiple Bug Check/Blue Screen code OEM records that contain the Bug Check/Blue Screen codes. This information can be used to determine what caused the failure.

Table 103. Bug Check/Blue Screen OS Stop Event Record
57. Data 2
           [5:4] 00b = Unspecified Event Data 3
           [3:0] Event Trigger Offset, as described in Table 45
  Byte 15  Event Data 2: Not used
  Byte 16  Event Data 3: Not used

Table 45. Discrete Thermal Sensors Next Steps
  Sensor Number  Sensor Name        Event Type  Event Trigger Offset   Description
  0Dh            SSB Thermal Trip   03h         01h = State Asserted   South Side Bridge (SSB) overheated
  90h            P1 VRD Hot         05h         01h = Limit Exceeded   Processor 1 voltage regulator overheated
  91h            P2 VRD Hot                                            Processor 2 voltage regulator overheated
  92h            P3 VRD Hot                                            Processor 3 voltage regulator overheated
  93h            P4 VRD Hot                                            Processor 4 voltage regulator overheated
  94h            P1 Mem01 VRD Hot                                      Processor 1 Memory 0/1 voltage regulator overheated
  95h            P1 Mem23 VRD Hot                                      Processor 1 Memory 2/3 voltage regulator overheated
  96h            P2 Mem01 VRD Hot                                      Processor 2 Memory 0/1 voltage regulator overheated

  Next Steps:
  1. Check for clear and unobstructed airflow into and out of the chassis.
  2. Ensure the SDR is programmed and correct chassis has been selected.
  3. Ensure there are no fan failures.
  4. Ensure the air used for cooling the system is within the thermal specifications for the system (typically below 35 °C).
58. Data 2
             11b = Sensor-specific event extension code in Event Data 2
           [5:4] 00b = Unspecified Event Data 3
             01b = Trigger threshold value in Event Data 3
             10b = OEM code in Event Data 3
             11b = Sensor-specific event extension code in Event Data 3
           [3:0] Offset from Event/Reading Code for threshold event
  Event Data 2  Reading that triggered event. FFh or not present if unspecified.
  Event Data 3  Threshold value that triggered event. FFh or not present if unspecified. If present, Event Data 2 must be present.

discrete
  Event Data 1
           [7:6] 00b = Unspecified Event Data 2
             01b = Previous state and/or severity in Event Data 2
             10b = OEM code in Event Data 2
             11b = Sensor-specific event extension code in Event Data 2
           [5:4] 00b = Unspecified Event Data 3
             01b = Reserved
             10b = OEM code in Event Data 3
             11b = Sensor-specific event extension code in Event Data 3
           [3:0] Offset from Event/Reading Code for discrete event state
  Event Data 2
           [7:4] Optional offset from Severity Event/Reading Code. 0Fh if unspecified.
           [3:0] Optional offset from Event/Reading Type Code for previous discrete event state. 0Fh if unspecified.
  Event Data 3  Optional OEM code. FFh or not present if unspecified.

OEM
  Event Data 1
           [7:6] 00b = Unspecified in Event Data 2
             01b = Previous state and/or severity in Event Data 2
             10b = OEM code in Event Data 2
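The Event Data 1 layout for the threshold and discrete classes can be expressed as two small lookup tables. The following Python sketch is illustrative only: `decode_event_data1` and its string labels are hypothetical names, and the threshold-class meanings of 00b, 01b, and 10b for bits 7:6 (cut off at the start of this excerpt) are assumed to follow the standard IPMI event data definitions.

```python
# Meaning of ED1 bits 7:6 (what Event Data 2 holds) and bits 5:4 (Event Data 3).
THRESHOLD = (
    {0: "unspecified", 1: "trigger reading", 2: "OEM code",
     3: "sensor-specific event extension code"},
    {0: "unspecified", 1: "trigger threshold value", 2: "OEM code",
     3: "sensor-specific event extension code"},
)
DISCRETE = (
    {0: "unspecified", 1: "previous state and/or severity", 2: "OEM code",
     3: "sensor-specific event extension code"},
    {0: "unspecified", 1: "reserved", 2: "OEM code",
     3: "sensor-specific event extension code"},
)

def decode_event_data1(ed1: int, event_class: str) -> dict:
    """Split Event Data 1 into its three bit fields for the given class."""
    ed2_map, ed3_map = {"threshold": THRESHOLD, "discrete": DISCRETE}[event_class]
    return {
        "ed2_contains": ed2_map[(ed1 >> 6) & 0x03],
        "ed3_contains": ed3_map[(ed1 >> 4) & 0x03],
        "offset": ed1 & 0x0F,
    }

# 57h = 01b, 01b, offset 7h: reading in ED2, threshold in ED3.
fields = decode_event_data1(0x57, "threshold")
```

The OEM class is left out because its bit definitions are only partly visible in this excerpt.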
59. Event
           [6:0] Event Type: 76h = OEM Specific
  Byte 14  Event Data 1
           [7:6] 10b = OEM code in Event Data 2
           [5:4] 10b = OEM code in Event Data 3
           [3:0] Event Trigger Offset
               0h = Atomic Egress Blocked
               1h = TLP Prefix Blocked
               Fh = Unspecified Non-AER Fatal Error
  Byte 15  Event Data 2: PCI Bus number
  Byte 16  Event Data 3
           [7:3] PCI Device number
           [2:0] PCI Function number

8.1.2.1 PCI Express Fatal Error and Fatal Error 2 Sensor Next Steps
1. Decode the bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers, including non-Intel components.
3. If this is an on-board device:
   a. Update all BIOS, firmware, and drivers.
   b. Replace the board.

8.1.3 PCI Express Correctable Errors
When a PCI Express correctable error is reported to the BIOS SMI handler, it will record the error using the following format.

Table 69. PCI Express Correctable Error Sensor Typical Characteristics
  Generator ID: 0033h = BIOS SMI Handler
  Sensor Type: 13h = Critical Interrupt
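Step 1 of the next steps above, decoding the bus, device, and function, amounts to a few shifts and masks on Event Data 2 and Event Data 3. A minimal Python sketch follows; the function names are hypothetical, and the `bb:dd.f` output format is simply the notation commonly used by tools such as lspci, not something this guide prescribes.

```python
def decode_pci_location(ed2: int, ed3: int) -> tuple:
    """Return (bus, device, function) from Event Data 2 and Event Data 3."""
    bus = ed2 & 0xFF              # ED2: PCI Bus number
    device = (ed3 >> 3) & 0x1F    # ED3 bits 7:3: PCI Device number
    function = ed3 & 0x07         # ED3 bits 2:0: PCI Function number
    return bus, device, function

def format_bdf(ed2: int, ed3: int) -> str:
    """Format the location as bus:device.function in hexadecimal."""
    bus, device, function = decode_pci_location(ed2, ed3)
    return f"{bus:02x}:{device:02x}.{function}"

# ED2 = 03h, ED3 = 19h (00011b device, 001b function).
location = format_bdf(0x03, 0x19)
```

The resulting string can then be compared against the operating system's PCI device listing to identify the card or on-board device.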
60. Exception Action
           [3:0] Domain Id
  Byte 16  Event Data 3
           If Error type = 10 or 15: <Policy Id>
           If Error type = 11: <Power Sensor Address>
           If Error type = 12: <Inlet Sensor Address>
           Otherwise set to 0.

13.3.1 Node Manager Health Event Next Steps
Misconfigured policy can happen if the max/min power consumption of the platform exceeds the values in policy due to hardware reconfiguration.
First occurrence of not acknowledged event will be retransmitted no faster than every 300 milliseconds.
Real-time clock synchronization failure alert is sent when NM is enabled and capable of limiting power, but within 10 minutes the firmware cannot obtain valid calendar time from the host side, so NM cannot handle suspend periods.
Next steps depend on the policy that was set. See the Node Manager Specification for more details.

13.4 Node Manager Operational Capabilities Change
This message provides a runtime error indication about Intel
61. FF

  RID (Record ID) = 0119h
  RT (Record Type) = 02h (system event record)
  TS (Timestamp) = 4E6A4957h
  GID (Generator ID) = 0001h (BIOS POST)
  ER (Event Message Revision) = 04 (IPMI v2.0)
  ST (Sensor Type) = 12h (System Event, from IPMI Specification Table 42-3, Sensor Type Codes)
  SN (Sensor Number) = 83h
  EDIR (Event Direction/Event Type) = 6Fh: [7] 0 = Assertion Event; [6:0] 6Fh = Sensor specific
  ED1 (Event Data 1) = 05h (Timestamp Clock Synchronization)
  ED2 (Event Data 2) = 00h (First in pair)

RID 1A 01 RT 02 TS 57 49 6A 4E GID 01 00 ER 04 ST 12 SN 83 EDIR 6F ED1 05 ED2 80 ED3 FF

  RID (Record ID) = 011Ah
  RT (Record Type) = 02h (system event record)
  TS (Timestamp) = 4E6A4957h
  GID (Generator ID) = 0001h (BIOS POST)
  ER (Event Message Revision) = 04 (IPMI v2.0)
  ST (Sensor Type) = 12h (System Event, from IPMI Specification Table 42-3, Sensor Type Codes)
  SN (Sensor Number) = 83h
  EDIR (Event Direction/Event Type) = 6Fh: [7] 0 = Assertion Event; [6:0] 6Fh = Sensor specific
  ED1 (Event Data 1) = 05h (Timestamp Clock Synchronization)
  ED2 (Event Data 2) = 80h (Second in pair)

2.2.1.2 BIOS SMI Handler Timestamp Events

RID 1F 00 RT 02 TS C3 70 5D 4F GID 33 00 ER 04 ST 12 SN 83 EDIR 6F ED1 05 ED2 00 ED3 FF

  RID (Record ID)
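The field-by-field breakdown of the raw records above can be reproduced mechanically. The following is a minimal Python sketch, assuming the 16 bytes arrive in the order shown in the raw dumps (RID, RT, TS, GID, ER, ST, SN, EDIR, ED1, ED2, ED3) with multi-byte fields stored least significant byte first. `decode_sel_record` and the dictionary keys are illustrative names, not part of any Intel or IPMI tool.

```python
import struct

def decode_sel_record(raw: bytes) -> dict:
    """Split a 16-byte system event record into the fields used in this guide."""
    if len(raw) != 16:
        raise ValueError("expected a 16-byte SEL record")
    # "<" = little-endian: RID (2 bytes), RT (1), TS (4), GID (2), then 7 single bytes.
    rid, rt, ts, gid, er, st, sn, edir, ed1, ed2, ed3 = struct.unpack(
        "<HBIHBBBBBBB", raw
    )
    return {
        "record_id": rid,
        "record_type": rt,
        "timestamp": ts,
        "generator_id": gid,
        "evm_rev": er,
        "sensor_type": st,
        "sensor_number": sn,
        "deassertion": bool(edir & 0x80),  # byte 13, bit 7
        "event_type": edir & 0x7F,         # byte 13, bits 6:0
        "event_offset": ed1 & 0x0F,        # byte 14, bits 3:0
        "ed1": ed1,
        "ed2": ed2,
        "ed3": ed3,
    }

# Second record of the BIOS POST timestamp pair shown above.
record = decode_sel_record(
    bytes.fromhex("1A 01 02 57 49 6A 4E 01 00 04 12 83 6F 05 80 FF")
)
```

For this record the sketch yields Record ID 011Ah, timestamp 4E6A4957h, Generator ID 0001h, an assertion event of type 6Fh, and Event Data 2 of 80h (second in pair), matching the manual decode.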
62. Fatal Error and Fatal Error 2
The system detected a QPI fatal or non-recoverable error. This is a fatal error.

Table 54. QPI Fatal Error Sensor Typical Characteristics
           Generator ID: 0033h = BIOS SMI Handler
  Byte 11  Sensor Type: 13h = Critical Interrupt
  Byte 12  Sensor Number: 07h
  Byte 13  Event Direction and Event Type
           [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
           [6:0] Event Type: 73h = OEM Discrete
  Byte 14  Event Data 1
           [7:6] 10b = OEM code in Event Data 2
           [5:4] 00b = Unspecified Event Data 3
           [3:0] Event Trigger Offset
               0h = Link Layer Uncorrectable ECC Error
               1h = Protocol Layer Poisoned Packet Reception Error
               2h = Link/PHY Init Failure with resultant degradation in link width
               3h = PHY Layer detected drift buffer alarm
               4h = PHY detected latency buffer rollover
               5h = PHY Init Failure
               6h = Link Layer generic control error (buffer overflow/underflow, credit underflow, and so on)
               7h = Parity error in link or PHY layer
               8h = Protocol layer timeout detected
               9h = Protocol layer failed response
               Ah = Protocol layer illegal packet field, target Node ID Error, and so on
               Bh = Protocol Layer Queue/table over
63. Intelligent Power Node Manager's operational capabilities. This applies to all domains. Assertion and deassertion of these events are supported.

Table 96. Node Manager Operational Capabilities Change Sensor Typical Characteristics
  Byte 8-9  Generator ID: 002Ch or 602Ch = ME Firmware
  Byte 11   Sensor Type: DCh = OEM
  Byte 12   Sensor Number: 1Ah
  Byte 13   Event Direction and Event Type
            [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
            [6:0] Event Type: 74h = OEM
  Byte 14   Event Data 1
            [7:6] 00b = Unspecified Event Data 2
            [5:4] 00b = Unspecified Event Data 3
            [3:0] Current state of Operational Capabilities. Bit pattern:
                [0] Policy interface capability: 0 = Not Available, 1 = Available
                [1] Monitoring capability: 0 = Not Available, 1 = Available
                [2] Power limiting capability: 0 = Not Available, 1 = Available
  Byte 15   Event Data 2: Not used
  Byte 16   Event Data 3: Not used

13.4.1 Node Manager Operational Capabilities Change Next Steps
Policy Interface available indicates that Intel Intelligent Power Node Manager is able to respond to the external interface about querying and setting Intel Intelligent Power Node Manager policies. This is generally available
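The capability bit pattern in Event Data 1 can be unpacked with simple masks. The following Python sketch is one possible reading of Table 96; `decode_nm_capabilities` and the key names are hypothetical.

```python
def decode_nm_capabilities(ed1: int) -> dict:
    """Map Event Data 1 bits 2:0 to Node Manager capability flags."""
    state = ed1 & 0x0F  # bits 3:0 carry the current capability state
    return {
        "policy_interface": bool(state & 0x01),  # bit 0
        "monitoring": bool(state & 0x02),        # bit 1
        "power_limiting": bool(state & 0x04),    # bit 2
    }

# 05h: policy interface and power limiting available, monitoring not available.
capabilities = decode_nm_capabilities(0x05)
```

Comparing the decoded flags of consecutive events shows which capability changed between them.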
64. Mrgn2
  B2h  P2 DIMM Thrm Mrgn1
  B3h  P2 DIMM Thrm Mrgn2
  B4h  P3 DIMM Thrm Mrgn1
  B5h  P3 DIMM Thrm Mrgn2
  B6h  P4 DIMM Thrm Mrgn1
  B7h  P4 DIMM Thrm Mrgn2
  C8h  Agg Therm Mrgn 1
  C9h  Agg Therm Mrgn 2
  CAh  Agg Therm Mrgn 3
  CBh  Agg Therm Mrgn 4
  CCh  Agg Therm Mrgn 5
  CDh  Agg Therm Mrgn 6
  CEh  Agg Therm Mrgn 7
  CFh  Agg Therm Mrgn 8

  Next Steps:
  2. Ensure the SDR is programmed and correct chassis has been selected.
  3. Ensure there are no fan failures.
  4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

5.2.3 Processor Thermal Control Sensors
The BMC FW monitors the percentage of time that a processor has been operationally constrained over a given time window (nominally six seconds) due to internal thermal management algorithms engaging to reduce the temperature of the device. This monitoring is instantiated as one IPMI analog/threshold sensor per processor package. If this is not addressed, the processor will overheat and shut down the system to protect itself from damage.

Table 41. Processor Thermal Control Sensors Typical Characteristics
  Byte 11  Sensor
65. 97h  P2 Mem23 VRD Hot  Processor 2 Memory 2/3 voltage regulator overheated
  98h  P3 Mem01 VRD Hot  Processor 3 Memory 0/1 voltage regulator overheated
  99h  P3 Mem23 VRD Hot  Processor 3 Memory 2/3 voltage regulator overheated
  9Ah  P4 Mem01 VRD Hot  Processor 4 Memory 0/1 voltage regulator overheated
  9Bh  P4 Mem23 VRD Hot  Processor 4 Memory 2/3 voltage regulator overheated

5.2.6 DIMM Thermal Trip Sensors
The BMC supports DIMM Thermal Trip monitoring that is instantiated as one aggregate IPMI discrete sensor per CPU. When a DIMM Thermal Trip occurs, the system hardware automatically powers down the server, and the BMC asserts the sensor offset and logs an event.

Table 46. DIMM Thermal Trip Typical Characteristics
  Byte 11  Sensor Type: 0Ch = Memory
  Byte 12  Sensor Number
           C0h = Processor 1 DIMM Thermal Trip
           C1h = Processor 2 DIMM Thermal Trip
           C2h = Processor 3 DIMM Thermal Trip
           C3h = Processor 4 DIMM Thermal Trip
  Byte 13  Event Direction and Event Type
           [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
           [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14  Event Data 1
           [7:6] 00b = Unspecified Event Data 2
           [5:4] 00b = Unspecified Event Data 3
66. 4.4.3 Power Supply Current Out Sensors
PMBus-compliant power supplies may monitor the current output of the main 12V voltage rail and report the current usage as a percentage of the maximum power output for that rail.

Table 24. Power Supply Current Out Sensors Typical Characteristics
  Byte 11  Sensor Type: 03h = Current
  Byte 12  Sensor Number
           58h = Power Supply 1 Current Out
           59h = Power Supply 2 Current Out
  Byte 13  Event Direction and Event Type
           [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
           [6:0] Event Type: 01h = Threshold
  Byte 14  Event Data 1
           [7:6] 01b = Trigger reading in Event Data 2
           [5:4] 01b = Trigger threshold in Event Data 3
           [3:0] Event Trigger Offset, as described in Table 25
  Byte 15  Event Data 2: Reading that triggered event
  Byte 16  Event Data 3: Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 25. Power Supply Current Out Sensor Event Trigger Offset Next Steps
  07h  Upper non-critical going high. Assertion severity: Degraded. Deassert severity: OK.
       Description: PMBus feature to monitor power supply power consumption.
       Next steps: If you see this event, the system is using too much power on the output for the PSU rating
  09h  Upper critical. Assertion severity: non-fatal. Deassert severity: Degraded
67. moves IPMI ownership of the HDD sensors to the BMC. Note that systems may have multiple storage backplanes. Hard Disk Drive status monitoring is supported through disk status sensors owned by the BMC.

Table 89. Hard Disk Drive Monitoring Sensor Typical Characteristics
  Byte 11  Sensor Type: 0Dh = Drive Slot (Bay)
  Byte 12  Sensor Number
           60h-68h = Hard Disk Drive 15-23 Status
           F0h-FEh = Hard Disk Drive 0-14 Status
  Byte 13  Event Direction and Event Type
           [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
           [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14  Event Data 1
           [7:6] 00b = Unspecified Event Data 2
           [5:4] 00b = Unspecified Event Data 3
           [3:0] Event Trigger Offset, as described in Table 90
  Byte 15  Event Data 2: Not used
  Byte 16  Event Data 3: Not used

Table 90. Hard Disk Drive Monitoring Sensor Event Trigger Offset Next Steps
  00h  Drive Presence. If during normal operation the state changes unexpectedly, ensure that the drive was seated properly and the drive carrier was properly latched. If that does not work, replace the drive.
  01h  Drive Fault
68. Record access
  Byte 3      Record Type: [7:0] DCh = OEM timestamped, bytes 8-16 OEM defined
  Byte 4-7    Timestamp: Time when the event was logged. LS byte first.
  Byte 8-10   IPMI Manufacturer ID: 0137h (311d) = IANA enterprise number for Microsoft
  Byte 11     Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15  Boot Time: Timestamp of when the system booted into the OS
  Byte 16     Reserved: 00h

14.2 Shutdown Event Records
When the system shuts down from the Microsoft Windows OS, multiple events can be logged. The first is an OS Stop/Shutdown Event Record. This can be followed by a shutdown reason code OEM record and then zero or more shutdown comment OEM records. These are all informational-only records.

Table 100. Shutdown Reason Code Event Record Typical Characteristics
  Byte 8-9  Generator ID: 0041h = System Software with an ID = 20h
  Byte 11   Sensor Type: 20h = OS Stop/Shutdown
  Byte 12   Sensor Number: 00h
  Byte 13   Event Direction and Event Type
            [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
            [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14   E
69. Sensor Specific
  Byte 14  Event Data 1
           [7:6] 00b = Unspecified Event Data 2
           [5:4] 00b = Unspecified Event Data 3
           [3:0] Event Trigger Offset
               01h = System Boot
               05h = Timestamp Clock Synchronization
  Byte 15  Event Data 2: For Event Trigger Offset 05h only (Timestamp Clock Synchronization)
               00h = 1st in pair
               80h = 2nd in pair
  Byte 16  Event Data 3: Not used

9.2 System Firmware Progress (Formerly Post Error)
The BIOS logs any POST errors to the SEL. The 2-byte POST code is logged in the ED2 and ED3 bytes of the SEL entry. This event is logged every time a POST error is displayed. Even though this event indicates an error, it may not be a fatal error. If this is a serious error, there will typically also be a corresponding SEL entry logged for whatever caused the error; that entry may contain more information about what happened than the POST error event.

Table 71. POST Error Sensor Typical Characteristics
  Byte 8-9  Generator ID: 0001h = BIOS POST
  Byte 11   Sensor Type: 0Fh = System Firmware Progress (formerly POST Error)
  Byte 12   Sensor Number: 06h
  Byte 13   Event Direction and Event Type
            [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
            [6:0] Event Type: 6Fh = Sensor Specific
70. State
Rank Sparing Mode is a Memory RAS configuration option that reserves one memory rank per channel as a spare rank. If any rank on a given channel experiences enough Correctable ECC Errors to cross the Correctable Error Threshold, the data in that rank is copied to the spare rank, and then the spare rank is mapped into the memory array to replace the failing rank.

Rank Sparing Mode protects memory data by reserving a Spare Rank on each channel that has memory installed on it. If a Correctable Error Threshold event occurs, the data from the failing rank is copied to the Spare Rank on the same channel, and the failing DIMM is disabled. Because the Sparing Domain is no longer redundant, a Sparing Redundancy State SEL Event is logged.

Table 62. Sparing Redundancy State Sensor Typical Characteristics
  Byte 8-9  Generator ID: 0033h = BIOS SMI Handler
  Byte 11   Sensor Type: 0Ch = Memory
  Byte 12   Sensor Number: 11h
  Byte 13   Event Direction and Event Type
            [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
            [6:0] Event Type: 0Bh = Generic Discrete
  Byte 14   Event Data 1
            [7:6] 10b = OEM code in Event Data 2
            [5:4] 10b = OEM code in Event Data 3
            [3:0] Event Trigger Offset
                0h = Fully Redundant
                2h = Redundancy Degr
71. System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5-4600/2600/2400/1600/1400 Product Families

Intel order number G90620-002
Revision 1.1
September 2013
Enterprise Platforms and Services Division - Marketing

Revision History
  Date            Revision Number  Modifications
  January 2013                     Initial release.
  September 2013  1.1              Added MIC Thermal Margin sensors C4 through C7.
                                   Added MIC Status sensors A2, A3, A6, and A7.
                                   Added voltage sensors EA, EB, EC, ED, and EF.
                                   Corrected typographical errors.
                                   Made corrections to Firmware Update Status table.
                                   Made corrections to Catastrophic Error Sensor table.
                                   Added support for S1400FP, S1400SP, S1600JUP, and S4600LH.

Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY
72. 23h      Platform Specific. Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  24h      Baseboard Temperature 3 (Platform Specific). Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  25h      Baseboard Temperature 4 (Platform Specific). Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  26h      IO Module Temperature. Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  27h      PCI Riser 1 Temperature. Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  28h      IO Riser Temperature. Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  29h-2Bh  Hot Swap Back Plane 1-3 Temperature (HSBP 1-3 Temp). Details: HSC Backplane Temperature Sensor. Next steps: Table 88, HSC Backplane Temperature Sensor Event Trigger Offset Next Steps.
  2Ch      PCI Riser 2 Temperature. Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  2Dh      SAS Module Temperature (SAS Mod Temp). Details: Threshold-based Temperature Sensors. Next steps: Table 37, Temperature Sensors Next Steps.
  2Eh      Exit Air Temperature. Details: Threshold-based Temperature Sensors. Next steps: Table
73. Typical Characteristics (columns: Byte, Field, Description). Bytes 8-9, Generator ID: 0041h, System Software with an ID = 20h. Byte 11, Sensor Type: 20h = OS Stop/Shutdown. Byte 12, Sensor Number: 00h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 6Fh, Sensor Specific. Byte 14, Event Data 1: [7:6] = 00b, unspecified Event Data 2; [5:4] = 00b, unspecified Event Data 3; [3:0] Event Trigger Offset = 1h, Runtime Critical Stop (that is, core dump or blue screen). Byte 15, Event Data 2: not used. Byte 16, Event Data 3: not used. Microsoft Windows Records. Table 104: Bug Check (Blue Screen) Code OEM Event Record Typical Characteristics (columns: Byte, Field, Description). Bytes 1-2, Record ID: ID used for SEL record access. Byte 3, Record Type: [7:0] = DEh, OEM timestamped, bytes 8-16 OEM defined. Bytes 4-7, Timestamp: time when the event was logged, LS byte first. Bytes 8-10, IPMI Manufacturer ID: 0137h (311) = IANA enterprise number for Microsoft; 0157h (343) = IANA enterprise number for Intel. The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded. Sequence Number: sequential number reflecting the order in which the records are read. The numbers start at 1 for
74. a 2; [5:4] = 10b, OEM code in Event Data 3; [3] Node Manager Policy event: 0 = Threshold exceeded, 1 = Policy Correction Time Exceeded (the policy did not meet the contract for the defined policy; the policy will continue to limit the power or shut down the platform based on the defined policy action); [2] Reserved; [1:0] Threshold Number, valid only if Byte 5 bit 8 is set to 0: 0 to 2 = threshold index. Byte 15, Event Data 2: [7:4] Reserved; [3:0] Domain ID (currently supports only one domain, Domain 0). Byte 16, Event Data 3: Policy ID. Manageability Engine (ME) Events. 13.5.1 Node Manager Alert Threshold Exceeded Next Steps. The first occurrence of a not-acknowledged event will be retransmitted no faster than every 300 milliseconds. The first occurrence of a Threshold exceeded event assertion or deassertion will be retransmitted no faster than every 300 milliseconds. Next steps depend on the policy that was set. See the Node Manager Specification for more details. 14 Microsoft Windows Records. With Microsoft Windows S
75. a common platform instrumentation interface to enable interoperability between: the baseboard management controller and chassis; the baseboard management controller and systems management software; between servers. IPMI enables the following: common access to platform management information, consisting of local access from systems management software, remote access from LAN, inter-chassis access from the Intelligent Chassis Management Bus, and access from LAN, serial/modem, IPMB, PCI SMBus, or ICMB, available even if the processor is down. The IPMI interface isolates systems management software from hardware. Hardware advancements can be made without impacting the systems management software. IPMI facilitates cross-platform management software. You can find more information on IPMI at the following URL: http://www.intel.com/design/servers/ipmi 1.2.2 Baseboard Management Controller (BMC). A baseboard management controller (BMC) is a specialized microcontroller embedded on most Intel Server Boards. The BMC is the heart of the IPMI architecture and provides the intelligence behind intelligent platform management, that is, the autonomous monitoring and recovery features implemented directly in platform management hardware and firmware. Different types of sensors built into the computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system status, and so on. The BMC monito
76. Table 8: Management Engine Firmware-owned Sensors (columns: Sensor Number, Sensor Name, Details Section, Next Steps). 17h, ME Firmware Health Events: ME Firmware Health Event; ME Firmware Health Event Next Steps. 18h, Node Manager Exception Events: Node Manager Exception Event; Node Manager Exception Event Next Steps. 19h, Node Manager Health Events: Node Manager Health Event; Node Manager Health Event Next Steps. 1Ah, Node Manager Operational Capabilities Change Events: Node Manager Operational Capabilities Change; Node Manager Operational Capabilities Change Next Steps. 1Bh, Node Manager Alert Threshold Exceeded Events: Node Manager Alert Threshold Exceeded; Node Manager Alert Threshold Exceeded Next Steps. 3.5 Microsoft OS-owned Events (GID = 0041h). The following table can be used to find the details of records that are owned by the Microsoft Operating System (OS). Table 9: Microsoft OS-owned Events (columns: Sensor Name, Record Type, Sensor Type, Details Section, Next Steps). Boot Event, record type 02h, sensor type 1Fh (OS Boot): Table 98, Boot-up Event Record Typical Characteristics; not applicable. Boot Event, record type DCh, sensor type not applicable: Table 99, Boot-up OEM Event Record Typical Characteristics; 02
77. aded. Byte 15, Event Data 2 (location): [7:4] Sparing Domain, 0-3 = Channel A-D for the socket; [3:2] Reserved; [1:0] Rank on DIMM, 0-3 = rank number. Byte 16, Event Data 3 (location): [7:5] Socket ID, 0-3 = CPU1-4; [4:3] Channel, 0-3 = Channel A-D for the socket; [2:0] DIMM, 0-2 = DIMM 1-3 on the channel. 7.4.1 Sparing Redundancy State Sensor Next Steps. This event is accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace the affected DIMM). For boards with DIMM Fault LEDs, the appropriate Fault LED is lit to indicate which DIMM was the source of the error triggering the Mirroring Failover action, that is, the failing DIMM. 7.5 ECC and Address Parity. 1. Memory data errors are logged as correctable or uncorrectable. 2. Uncorrectable errors are fatal. 3. Memory addresses are protected with parity bits, and a parity error is logged; this is a fatal error. 7.5.1 Memory Correctable and Uncorrectable ECC Error. ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A Correctable ECC Error actually represents a threshold overflow: more correctable errors are detected at the memory controller level for a given DIMM within a given timeframe. In both cases th
78. ailure (Major). 857F: DIMM_L2 encountered a Serial Presence Detection (SPD) failure (Major); go to 85E0. 85C0: DIMM_L3 failed test/initialization (Major). 85C1: DIMM_M1 failed test/initialization (Major). 85C2: DIMM_M2 failed test/initialization (Major). 85C3: DIMM_M3 failed test/initialization (Major). 85C4: DIMM_N1 failed test/initialization (Major). 85C5: DIMM_N2 failed test/initialization (Major). 85C6: DIMM_N3 failed test/initialization (Major). 85C7: DIMM_P1 failed test/initialization (Major). 85C8: DIMM_P2 failed test/initialization (Major). 85C9: DIMM_P3 failed test/initialization (Major). 85CA: DIMM_R1 failed test/initialization (Major). 85CB: DIMM_R2 failed test/initialization (Major). 85CC: DIMM_R3 failed test/initialization (Major). 85CD: DIMM_T1 failed test/initialization (Major). 85CE: DIMM_T2 failed test/initialization (Major). 85CF: DIMM_T3 failed test/initialization (Major). 85D0: DIMM_L3 disabled (Major). (Columns: Error Code, Error Message, Response.) 85D1: DIMM_M1 disabled (Major). 85D2: DIMM_M2 disabled (Major). 85D3: DIMM_M3 disabled (Major). 85D4: DIMM_N1 disabled (Major). 85D5: DIMM_N2 disabled (Major). 85D6: DIMM_N3 disabled (Major). 85D7: DIMM_P1 disabled (Major). 85D8: DIMM_P2 disabled (Major). 85D9: DIMM_P3 disabled (Major).
79. ajor. 8555: DIMM_H1 disabled (Major). 8556: DIMM_H2 disabled (Major). 8557: DIMM_H3 disabled (Major). 8558: DIMM_J1 disabled (Major). 8559: DIMM_J2 disabled (Major). 855A: DIMM_J3 disabled (Major). 855B: DIMM_K1 disabled (Major). 855C: DIMM_K2 disabled (Major). 855D: DIMM_K3 disabled (Major). 855E: DIMM_L1 disabled (Major). 855F: DIMM_L2 disabled (Major); go to 85D0. 8560: DIMM_A1 encountered a Serial Presence Detection (SPD) failure (Major). 8561: DIMM_A2 encountered a Serial Presence Detection (SPD) failure (Major). 8562: DIMM_A3 encountered a Serial Presence Detection (SPD) failure (Major). 8563: DIMM_B1 encountered a Serial Presence Detection (SPD) failure (Major). 8564: DIMM_B2 encountered a Serial Presence Detection (SPD) failure (Major). 8565: DIMM_B3 encountered a Serial Presence Detection (SPD) failure (Major). 8566: DIMM_C1 encountered a Serial Presence Detection (SPD) failure (Major). 8567: DIMM_C2 encountered a Serial Presence Detection (SPD) failure (Major). 8568: DIMM_C3 encountered a Serial Presence Detection (SPD) failure (Major). 8569: DIMM_D1 encountered a Serial Presence Detection (SPD) failure (Major). 856A: DIMM_D2 encountered a Serial Presence Detection (SPD) failure (Major). 856B: DIMM_D3 encountered a Serial Presence Detection (SPD) failure (Major). 856C: DIMM_E1 encountered a Serial Presence Detection (SPD) failure (Major). 856D: DIMM_E2 encountered a Serial Presence Detection (SPD) failure (Major). 856E: DIMM_E3 encountered a Serial Presence Detection (SPD) failure (Major). 856F: DIMM_F1 encountered a Serial Presence Detecti
80. an spins too slowly. Table 29: Fan Tachometer Sensors Typical Characteristics (columns: Byte, Field, Description). Byte 11, Sensor Type: 04h = Fan. Byte 12, Sensor Number: 30h-3Fh (chassis specific); BAh-BFh (chassis specific). Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 01h, Threshold. Byte 14, Event Data 1: [7:6] = 01b, trigger reading in Event Data 2; [5:4] = 01b, trigger threshold in Event Data 3; [3:0] Event Trigger Offset as described in Table 30. Byte 15, Event Data 2: reading that triggered the event. Byte 16, Event Data 3: threshold value that triggered the event. The following table describes the severity of each of the event triggers for both assertion and deassertion. Table 30: Fan Tachometer Sensor Event Trigger Offset Next Steps (columns: Event Trigger Offset (Hex, Description), Assertion Severity, Deassert Severity, Description, Next Steps). 00h, Lower non-critical going low: assertion severity Degraded; deassertion severity OK. Description: the fan speed has dropped below its lower non-critical threshold. Next steps: a fan speed error on a new system build is typically not caused by the fan spinning too slowly; instead, it is caused by the fan being connected to the wrong header (the BMC expects them on certai
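As a minimal sketch of how bytes 13 through 16 of a threshold-based event, such as the fan tachometer events described in Table 29, can be unpacked: the function name and dictionary keys below are illustrative assumptions and do not come from any Intel utility. The offset names are the generic threshold offsets defined by the IPMI specification. The reading and threshold bytes are raw sensor values; converting them to RPM requires the conversion factors from the sensor's SDR, which this sketch does not attempt.

```python
# Generic IPMI threshold event offsets (event type 01h).
THRESHOLD_OFFSETS = {
    0x00: "Lower non-critical going low",
    0x02: "Lower critical going low",
    0x04: "Lower non-recoverable going low",
    0x07: "Upper non-critical going high",
    0x09: "Upper critical going high",
    0x0B: "Upper non-recoverable going high",
}


def decode_threshold_event(byte13, ed1, ed2, ed3):
    """Unpack SEL bytes 13-16 of a threshold event (hypothetical helper)."""
    return {
        # Byte 13 bit 7: 0b = assertion, 1b = deassertion.
        "deassertion": bool(byte13 & 0x80),
        # Byte 13 bits 6:0: event type (01h = threshold).
        "event_type": byte13 & 0x7F,
        # Event Data 1 bits 7:6 = 01b means ED2 holds the trigger reading.
        "reading_in_ed2": (ed1 >> 6) == 0b01,
        # Event Data 1 bits 5:4 = 01b means ED3 holds the trigger threshold.
        "threshold_in_ed3": ((ed1 >> 4) & 0x03) == 0b01,
        # Event Data 1 bits 3:0: event trigger offset.
        "offset": THRESHOLD_OFFSETS.get(ed1 & 0x0F, "other"),
        "raw_reading": ed2,
        "raw_threshold": ed3,
    }


# Illustrative values only: assertion, threshold type, offset 00h.
event = decode_threshold_event(0x01, 0x50, 0x10, 0x14)
print(event["offset"])
```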
81. ber. Processor 4 DIMM Aggregate Thermal Margin 2, B7h, P4 DIMM Thrm Mrgn2: Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps. Node Auto Shutdown Sensor, B8h, Auto Shutdown: Node Auto Shutdown Sensor; Node Auto Shutdown Sensor Next Steps. Fan Tachometer Sensors, BAh-BFh, chassis-specific sensor names: Fan Tachometer Sensors; Table 30, Fan Tachometer Sensor Event Trigger Offset Next Steps. Processor 1-4 DIMM Thermal Trip, C0h-C3h, P1-P4 Mem Thrm Trip: DIMM Thermal Trip Sensors; DIMM Thermal Trip Sensors Next Steps. Intel Xeon Phi Coprocessor Thermal Margin 1, C4h, MIC 1 Margin: MIC Thermal Margin Sensors; not applicable. Intel Xeon Phi Coprocessor Thermal Margin 2, C5h, MIC 2 Margin: MIC Thermal Margin Sensors; not applicable. Intel Xeon Phi Coprocessor Thermal Margin 3, C6h, MIC 3 Margin: MIC Thermal Margin Sensors; not applicable. Intel Xeon Phi Coprocessor Thermal Margin 4, C7h, MIC 4 Margin: MIC Thermal Margin Sensors. Global Aggregate Temperature Margin 1-8, C8h-CFh, Agg Therm Mrgn 1-8: Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps. Baseboard 12V, D0h, BB 12.0V: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 5V, D1h, BB 5.0V: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Sensor Cross Reference List. System Event
82. but the system remained powered on. If these events continue to occur, it is advisable to check your power source. RID = 5D 00, RT = 02, TS = D3 B1 AE 4E, GID = 20 00, ER = 04, ST = 08, SN = 50, EDIR = 6F, ED1 = A2, ED2 = 06, ED3 = 30. RID (Record ID) = 005Dh. RT (Record Type) = 02h, system event record. TS (Timestamp) = 4EAEB1D3h. GID (Generator ID) = 0020h, BMC. ER (Event Message Revision) = 04h, IPMI v2.0. ST (Sensor Type) = 08h, Power Supply (from IPMI Specification Table 42-3, Sensor Type Codes). SN (Sensor Number) = 50h, Power Supply 1. EDIR (Event Direction/Event Type) = 6Fh: [7] = 0, Assertion Event; [6:0] = 6Fh, Sensor specific. ED1 (Event Data 1) = A2h: [7:6] = 10b, OEM code in Event Data 2; [5:4] = 10b, OEM code in Event Data 3; [3:0] Event Trigger Offset = 2h, Predictive Failure. ED2 (Event Data 2) = 06h, input under-voltage warning. ED3 (Event Data 3) = 30h, from the PMBus Specification STATUS_INPUT command: [5] VIN_UV_WARNING (Input Under-voltage Warning) = 1; [4] VIN_UV_FAULT (Input Under-voltage Fault) = 1. 3 Sensor Cross Reference List. This section contains a cross reference to help find details on any specific SEL entry. 3.1 BMC-owned Sensors (GID = 0020h). The following table can be used to find the details of sensors
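The worked decode above can be reproduced programmatically. The following is a minimal sketch, assuming the 16 raw bytes are available in record order; `decode_sel_record` and its field names are hypothetical and chosen only for illustration. It relies on the multi-byte fields being stored least significant byte first, as the example's RID, TS, and GID values show.

```python
import struct


def decode_sel_record(raw: bytes) -> dict:
    """Split a 16-byte system event record (type 02h) into named fields."""
    if len(raw) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    # Little-endian layout: RID (2), RT (1), TS (4), GID (2), then 7 single bytes.
    rid, rt, ts, gid, er, st, sn, edir, ed1, ed2, ed3 = struct.unpack("<HBIH7B", raw)
    return {
        "record_id": rid,
        "record_type": rt,
        "timestamp": ts,
        "generator_id": gid,
        "evm_rev": er,
        "sensor_type": st,
        "sensor_number": sn,
        "deassertion": bool(edir & 0x80),   # byte 13 bit 7
        "event_type": edir & 0x7F,          # byte 13 bits 6:0
        "ed2_class": ed1 >> 6,              # Event Data 1 bits 7:6
        "ed3_class": (ed1 >> 4) & 0x03,     # Event Data 1 bits 5:4
        "offset": ed1 & 0x0F,               # Event Data 1 bits 3:0
        "ed2": ed2,
        "ed3": ed3,
    }


# The example record from the text above.
record = decode_sel_record(
    bytes.fromhex("5D 00 02 D3 B1 AE 4E 20 00 04 08 50 6F A2 06 30")
)
print(f"RID={record['record_id']:04X}h offset={record['offset']:X}h")
```

Keeping the decoder to plain field extraction leaves the sensor-specific interpretation (for example, the PMBus STATUS_INPUT bits in ED3) to the tables in this guide.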
83. cation from hex version of SEL. 2. Verify the DIMM is seated properly. 3. Examine gold fingers on edge of the DIMM to verify contacts are clean. 4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board. 5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM. Uncorrectable ECC Error, description (continued): is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR (catastrophic error) and an MCE (Machine Check Exception Error). While the error may be due to a failing DRAM chip on the DIMM, it can also be caused by incorrect seating or improper contact between socket and DIMM, or by bent pins in the processor socket. 00h, Correctable ECC Error threshold reached. Description: there have been too many (10 or more) correctable ECC errors for this particular DIMM since last boot. This event in itself does not pose any direct problems because the ECC errors are still being corrected. Depending on the RAS configuration of the memory, the IMC may take the affected DIMM offline. Next steps: even though this event doesn't immediately lead to problems, it can indicate one of the DIMM modules is slowly failing. If this error occurs more than once: 1. If needed, decode DIMM location from hex version of SEL. 2. Verify the DIMM is seated properly. 3. Examine gold fingers on edge of the DIMM to verify contacts are clean. 4. Inspect the processor socket this DIMM is connected to for
84. chassis intrusion sensor is not connected. 00h, chassis intrusion, next steps: 1. Use the Quick Start Guide and the Service Guide to determine whether the chassis intrusion switch is connected properly. 2. If this is the case, make sure it makes proper contact when the chassis is closed. 3. If this is also the case, someone has opened the chassis. Ensure nobody has access to the system that shouldn't. 04h, LAN leash lost. Description: someone has unplugged a LAN cable that was present when the BMC initialized. This event gets logged when the electrical connection on the NIC connector gets lost. Next steps: this is most likely due to unplugging the cable but can also happen if there is an issue with the cable or switch. 1. Check the LAN cable and connector for issues. 2. Investigate switch logs where possible. 3. Ensure nobody has access to the server that shouldn't. 10.2 FP NMI Interrupt. The BMC supports an NMI sensor for logging an event when a diagnostic interrupt is generated for the following cases: the front panel diagnostic interrupt button is pressed; the BMC receives an IPMI Chassis Control command that requests this action. The front panel interrupt button (also referred to as the NMI button) is a recessed button on the front panel that allows the user to force a critical interrupt, which causes a crash error or kernel panic.
85. 11.2.1 SMI Timeout Next Steps. This event normally only occurs after another, more critical event. 1. Check the SEL for any critical interrupts, memory errors, bus errors, PCI errors, or any other serious errors. 2. If these are not present, the system locked up before it was able to log the original issue. In this case, low-level debug is normally required. 11.3 System Event Log Cleared. The BMC logs a SEL clear event. This is only ever the first event in the SEL. The cause of this event is either a manual SEL clear, using selview or some other IPMI-aware utility, or a clear done in the factory as one of the last steps in the manufacturing process. This is an informational event only. Table 80: System Event Log Cleared Sensor Typical Characteristics (columns: Byte, Field, Description). Byte 11, Sensor Type: 10h = Event Logging Disabled. Byte 12, Sensor Number: 07h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 6Fh, Sensor Specific. Byte 14, Event Data 1: [7:6] = 00b, unspecified Event Data 2; [5:4] = 00b, unspecified Event Data 3; [3:0] Event Trigger Offset = 2h, Log area reset/cleared. Byte 15, Event Data 2: not used. Byte 16, Event Data 3: not used.
86. 4h: PCI PERR; 5h: PCI SERR. Byte 15, Event Data 2: PCI bus number. Byte 16, Event Data 3: [7:3] PCI device number; [2:0] PCI function number. 8.1.1.1 Legacy PCI Error Sensor Next Steps. 1. Decode the bus, device, and function to identify the card. 2. If this is an add-in card: a. Verify the card is inserted properly. b. Install the card in another slot and check whether the error follows the card or stays with the slot. c. Update all firmware and drivers, including non-Intel components. 3. If this is an on-board device: a. Update all BIOS, firmware, and drivers. b. Replace the board. 8.1.2 PCI Express Fatal Errors and Fatal Error 2. When a PCI Express fatal error is reported to the BIOS SMI handler, it will record the error using the following format. Table 67: PCI Express Fatal Error Sensor Typical Characteristics (columns: Byte, Field, Description). Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 13h = Critical Interrupt. Byte 12, Sensor Number: 04h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 70h, OEM Specific.
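Step 1 of the Legacy PCI Error Sensor next steps (decoding the bus, device, and function) can be sketched as follows. This is an illustration under the byte layout given above (Event Data 2 is the bus number; Event Data 3 holds the device number in bits 7:3 and the function number in bits 2:0). The function name and the `bus:device.function` output format are assumptions made for readability, not something this guide prescribes.

```python
def decode_pci_location(ed2: int, ed3: int) -> str:
    """Return a 'bus:device.function' string from Event Data 2 and 3."""
    bus = ed2 & 0xFF                # Event Data 2: PCI bus number
    device = (ed3 >> 3) & 0x1F      # Event Data 3 bits 7:3: PCI device number
    function = ed3 & 0x07           # Event Data 3 bits 2:0: PCI function number
    return f"{bus:02x}:{device:02x}.{function}"


# Illustrative values only.
print(decode_pci_location(0x03, 0x08))
print(decode_pci_location(0x80, 0xE1))
```

The resulting string can be compared against the operating system's PCI device listing to identify the card or on-board device.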
87. 11b = Reserved; [5:4] 00b = unspecified Event Data 3, 01b = Reserved, 10b = OEM code in Event Data 3, 11b = Reserved; [3:0] offset from Event/Reading Type Code. Event Data 2: [7:4] optional OEM code bits or offset from Severity Event/Reading Type Code (0Fh if unspecified); [3:0] optional OEM code or offset from Event/Reading Type Code for previous event state (0Fh if unspecified). Event Data 3: optional OEM code (FFh or not present if unspecified). Table 3: OEM SEL Record (Type C0h-DFh) (columns: Byte, Field, Description). Bytes 1-2, Record ID (RID): ID used for SEL record access. Byte 3, Record Type (RT): [7:0] record type, C0h-DFh = OEM timestamped, bytes 8-16 OEM defined. Bytes 4-7, Timestamp (TS): time when the event was logged, LS byte first. Example: TS = 29 76 68 4C = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: there are various websites that will convert the raw number to a date and time. Bytes 8-10, Manufacturer ID: LS byte first. The manufacturer ID is a 20-bit value that is derived from the IANA Private Enterprise ID. Most significant four bits: Reserved (0000b). 000000h = Unspecified; 0FFFFFh = Reserved. This value is binary encoded. For example, the ID for the IPMI forum is 7154 decimal, which is 1BF2h, which will be s
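As a small sketch of the two conversions described above, the following Python helpers turn the four timestamp bytes into a UTC date and the three manufacturer ID bytes into the 20-bit IANA number. The function names are hypothetical. The byte sequence F2 1B 00 used for the IPMI forum ID is an assumption that follows from the LS-byte-first rule, since the original sentence is cut off before it shows the stored bytes.

```python
from datetime import datetime, timezone


def sel_timestamp(ts_bytes: bytes) -> datetime:
    """Convert the 4 timestamp bytes (LS byte first) to a UTC datetime."""
    seconds = int.from_bytes(ts_bytes, "little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def manufacturer_id(id_bytes: bytes) -> int:
    """Convert the 3 manufacturer ID bytes (LS byte first) to the IANA number."""
    # Only the low 20 bits are the ID; the top four bits are reserved.
    return int.from_bytes(id_bytes, "little") & 0x0FFFFF


# The timestamp example from the table: TS = 29 76 68 4C.
print(sel_timestamp(bytes.fromhex("29 76 68 4C")).isoformat())

# Assumed storage of 1BF2h (7154, the IPMI forum) with LS byte first.
print(manufacturer_id(bytes.fromhex("F2 1B 00")))
```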
88. Byte 14, Event Data 1: [7:6] = 00b, unspecified Event Data 2; [5:4] = 00b, unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 32. Byte 15, Event Data 2: not used. Byte 16, Event Data 3: not used. The following table describes the severity of each of the event triggers for both assertion and deassertion. Table 32: Fan Presence Sensors Event Trigger Offset Next Steps (columns: Event Trigger Offset (Hex, Description), Assertion Severity, Deassert Severity, Description, Next Steps). 01h, Device Present: assertion severity OK; deassertion severity Degraded. Assertion: a fan was inserted. This event may also get logged when the BMC initializes (when AC is applied). Next steps: informational only. Deassert: a fan was removed or was not present at the expected location when the BMC initialized. Next steps: these events only get generated in the systems with hot-swappable fans, and normally only when a fan is physically inserted or removed. If fans were not physically removed: 1. Use the Quick Start Guide to check whether the right fan headers were used. 2. Swap the fans round to see whether the problem stays with the location or follows the fan. 3. Replace the fan or fan wiring/housing, depending on the outcome of step 2. 4. Ensure the latest FRUSDR update has been run and the correct chassis is detected or
89. ded info byte in ED3 whether this is wear out Flash erase limit has been reached protection causing this event If so just wait until wear out protection 02h expires otherwise probably the flash device must be replaced if Flash write limit has been reached writing to flash has been disabled 03h SEENEN Writing to the flash has been enabled 04h Internal error Error during firmware execution FW Watchdog Operational image needs to be updated to other version or hardware board Timeout repair is needed if error is persistent 05h BMC did not respond to cold reset request and Intel ME rebooted Verify the Intel Node Manager configuration the platform 06h Direct Flash update requested by the BIOS Intel ME firmware will This is transient state Intel ME firmware will return to operational mode switch to recovery mode to perform full update from the BIOS after successful image update performed by the BIOS 07h 04h Manufacturing error Wrong manufacturing configuration detected The flash device must be replaced if error is persistent by Intel ME firmware Intel ME FW configuration is inconsistent or out of range 08h Persistent storage integrity error Flash file system error detected If error is persistent restore factory presets using Force ME Recovery IPMI command or by doing AC power cycle with Recovery jumper asserted 09h Firmware Exception Restore factory presets using Force ME Recovery IPMI command or by doing AC
90. dundancy State Sensor Typical Characteristics (columns: Byte, Field, Description). Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 0Ch = Memory. Byte 12, Sensor Number: 01h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 0Bh, Generic Discrete. Byte 14, Event Data 1: [7:6] = 10b, OEM code in Event Data 2; [5:4] = 10b, OEM code in Event Data 3; [3:0] Event Trigger Offset: 0h = Fully Redundant, 2h = Redundancy Degraded. Byte 15, Event Data 2 (location): [7:4] Mirroring Domain, 0-1 = Channel Pair for the socket; [3:2] Reserved; [1:0] Rank on DIMM, 0-3 = rank number. Byte 16, Event Data 3 (location): [7:5] Socket ID, 0-3 = CPU1-4; [4:3] Channel, 0-3 = Channel A-D for the socket; [2:0] DIMM, 0-2 = DIMM 1-3 on the channel. 7.3.1 Mirroring Redundancy State Sensor Next Steps. This event is accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace the affected DIMM). For boards with DIMM Fault LEDs, the appropriate Fault LED is lit to indicate which DIMM was the source of the error triggering the Mirroring Failover action, that is, the failing DIMM. 7.4 Sparing Redundancy
91. e, DCh, BB 1.8V AUX: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 1.1V Stand-by, DDh, BB 1.1V STBY: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard CMOS Battery, DEh, BB 3.3V Vbat: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Sensor Cross Reference List (columns: Sensor Name, Sensor Number, Details Section, Next Steps). Baseboard 1.35V P1 Low Voltage Memory AB VDDQ, E4h, BB 1.35 P1LV AB: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 1.35V P1 Low Voltage Memory CD VDDQ, E5h, BB 1.35 P1LV CD: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 1.35V P2 Low Voltage Memory AB VDDQ, E6h, BB 1.35 P2LV AB: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 1.35V P2 Low Voltage Memory CD VDDQ, E7h, BB 1.35 P2LV CD: Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps. Baseboard 3.3V Riser 1 Pow
92. e: 01h = Temperature. Byte 12, Sensor Number: see Table 40. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 01h, Threshold. Byte 14, Event Data 1: [7:6] = 01b, trigger reading in Event Data 2; [5:4] = 01b, trigger threshold in Event Data 3; [3:0] event triggers as described in Table 39. Byte 15, Event Data 2: reading that triggered the event. Byte 16, Event Data 3: threshold value that triggered the event. Table 39: Thermal Margin Sensors Event Triggers Description (columns: Event Trigger (Hex, Description), Assertion Severity, Deassert Severity, Description). 07h, Upper non-critical going high: assertion severity Degraded; deassertion severity OK. The thermal margin has gone over its upper non-critical threshold. 09h, Upper critical going high: assertion severity non-fatal; deassertion severity Degraded. The thermal margin has gone over its upper critical threshold. Table 40: Thermal Margin Sensors Next Steps (columns: Sensor Number, Sensor Name, Next Steps). 74h, P1 Therm Margin; 75h, P2 Therm Margin; 76h, P3 Therm Margin; 77h, P4 Therm Margin: not a logged SEL event. The sensor is used for thermal management of the processor. B0h, P1 DIMM Thrm Mrgn1: check for clear and unobstructed airflow into and out of the chassis. B1h, P1 DIMM Thrm
93. e Table 4. Bytes 4-7, Timestamp (TS): time when the event was logged, LS byte first. Example: TS = 29 76 68 4C = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: there are various websites that will convert the raw number to a date and time. Basic Decoding of a SEL Record (columns: Byte, Field, Description). Bytes 8-9, Generator ID (GID): RqSA and LUN if the event was generated from IPMB; Software ID if the event was generated from system software. Byte 1: [7:1] 7-bit I2C Slave Address or 7-bit system software ID; [0] 0b = ID is IPMB Slave Address, 1b = system software ID. Software ID values: 0001h = BIOS POST (for POST errors, RAS Configuration/State, Timestamp Synch, OS Boot events); 0033h = BIOS SMI Handler; 0020h = BMC Firmware; 002Ch = ME Firmware; 0041h = Server Management Software; 00C0h = HSC Firmware, HSBP A; 00C2h = HSC Firmware, HSBP B. Byte 2: [7:4] channel number, the channel that the event message was received over (0h if the event message was received from the system interface, primary IPMB, or internally generated by the BMC); [3:2] Reserved, write as 00b; [1:0] IPMB device LUN if byte 1 holds a Slave Address, 00b otherwise. Byte 10, EvM Rev: event message format version. 04h = IPMI v2.0; 03h
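The Generator ID bit layout above lends itself to a short worked example. The sketch below assumes the GID is handled as the 16-bit value shown in the tables (for example 0020h), with byte 1 as the low byte; the function name and returned keys are illustrative assumptions only.

```python
def decode_generator_id(gid: int) -> dict:
    """Split a 16-bit Generator ID into its documented bit fields."""
    byte1 = gid & 0xFF          # low byte: address or software ID plus type flag
    byte2 = (gid >> 8) & 0xFF   # high byte: channel number and LUN
    return {
        # Byte 1 bit 0: 0b = IPMB slave address, 1b = system software ID.
        "is_software_id": bool(byte1 & 0x01),
        # Byte 1 bits 7:1: 7-bit slave address or 7-bit software ID.
        "id_7bit": byte1 >> 1,
        # Byte 2 bits 7:4: channel the event message was received over.
        "channel": byte2 >> 4,
        # Byte 2 bits 1:0: IPMB device LUN (00b when byte 1 is a software ID).
        "lun": byte2 & 0x03,
    }


for gid in (0x0020, 0x0033, 0x0041):
    print(f"{gid:04X}h", decode_generator_id(gid))
```

With this layout, 0041h decodes to a system software ID of 20h, and 0020h decodes to an IPMB slave address whose 7-bit form is 10h.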
94. e error can be narrowed down to particular DIMM(s). The BIOS SMI error handler uses this information to log the data to the BMC SEL and identify the failing DIMM module. Table 63: Correctable and Uncorrectable ECC Error Sensor Typical Characteristics (columns: Byte, Field, Description). Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 0Ch = Memory. Byte 12, Sensor Number: 02h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 6Fh, Sensor Specific. Byte 14, Event Data 1: [7:6] = 10b, OEM code in Event Data 2; [5:4] = 10b, OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 64. Byte 15, Event Data 2: [7:2] Reserved, set to 0; [1:0] Rank on DIMM, 0-3 = rank number. Byte 16, Event Data 3: [7:5] Socket ID, 0-3 = CPU1-4; [4:3] Channel, 0-3 = Channel A-D for the socket; [2:0] DIMM, 0-2 = DIMM 1-3 on the channel. Table 64: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps (columns: Event Trigger Offset (Hex, Description), Description, Next Steps). 01h, Uncorrectable ECC Error. Description: an uncorrectable (multi-bit) ECC error has occurred. This Next steps: 1. If needed, decode DIMM lo
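Decoding the DIMM location from the hex version of the SEL, the first of the next steps in Table 64, follows directly from the Event Data 2 and Event Data 3 layout in Table 63. The following is a hedged sketch: the function name and the returned tuple format are assumptions for illustration, and the mapping to a board silkscreen label (such as the DIMM_xx names used in POST error codes) is board specific and is not attempted here.

```python
def decode_dimm_location(ed2: int, ed3: int) -> tuple:
    """Return (cpu, channel, dimm, rank) from Event Data 2 and Event Data 3."""
    rank = ed2 & 0x03                 # Event Data 2 bits 1:0: rank on DIMM (0-3)
    socket_id = (ed3 >> 5) & 0x07     # Event Data 3 bits 7:5: 0-3 = CPU1-4
    channel = (ed3 >> 3) & 0x03       # Event Data 3 bits 4:3: 0-3 = Channel A-D
    dimm = ed3 & 0x07                 # Event Data 3 bits 2:0: 0-2 = DIMM 1-3
    return (
        f"CPU{socket_id + 1}",
        f"Channel {'ABCD'[channel]}",
        f"DIMM {dimm + 1}",
        f"Rank {rank}",
    )


# Illustrative values only: ED2 = 01h, ED3 = 29h (001 01 001b).
print(decode_dimm_location(0x01, 0x29))
```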
95. e sensor is rearmed on power-on (AC or DC power-on transitions). This sensor is only used for triggering SEL to indicate node or power auto shutdown assertion or deassertion. Table 19: Node Auto Shutdown Sensor Typical Characteristics (columns: Byte, Field, Description). Byte 11, Sensor Type: 09h = Power Unit. Byte 12, Sensor Number: B8h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 03h, digital discrete. Byte 14, Event Data 1: [7:6] = 00b, unspecified Event Data 2; [5:4] = 00b, unspecified Event Data 3; [3:0] Event Trigger Offset = 1h, State Asserted. Byte 15, Event Data 2: not used. Byte 16, Event Data 3: not used. 4.3.3.1 Node Auto Shutdown Sensor Next Steps. This event is accompanied by specific power supply errors (AC lost, PSU failure, and so on) or other system events. Troubleshoot these events accordingly. 4.4 Power Supply. The BMC monitors the power supply subsystem. 4.4.1 Power Supply Status Sensors. These sensors report the status of the power supplies in the system. When AC power is first applied to or removed from a system, it can log an event. Also, if there is a failure, predictive failure, or a configuration error
96. econd time synch message to get a baseline correct timestamp in the log. That is the starting time. For example, say that the time the BMC has is March 1, 2011, 21:00. The BIOS time synch updates that to the same date, 21:20 (the BMC was running behind). Without that second time synch message, you don't know that the log time jumped ahead, and when you get the next log message, it looks like there was a 20-minute delay during the boot for some unknown reason. Without that second time synch message, the time span to the next logged message is indeterminate. With the second time synch as a baseline, the following log timestamps are always determinate. The timestamp clock synchronization is run, and the events are logged, by the BIOS POST every time the system boots. In addition, during the shutdown from some operating systems, the BIOS SMI Handler is called to run timestamp clock synchronization and log the events. Table 70: System Event Sensor Typical Characteristics (columns: Byte, Field, Description). Bytes 8-9, Generator ID: 0001h = BIOS POST; 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 12h = System Event. Byte 12, Sensor Number: 83h. Byte 13, Event Direction and Event Type: [7] event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0] event type = 6Fh
97. ectable errors are acceptable and normal at a low rate of occurrence. If the error continues:
  1. Check the processor is installed correctly.
  2. Inspect the socket for bent pins.
  3. Cross test the processor. If the issue remains with the processor socket, replace the main board; otherwise, replace the processor.

6.5 Processor ERR2 Timeout Sensor
The BMC supports an ERR2 Timeout Sensor (1 per CPU) that asserts if a CPU's ERR2 signal has been asserted for longer than a fixed time period (> 90 seconds). ERR2 is a processor signal that indicates when the IIO (Integrated IO) module in the processor has a fatal error which could not be communicated to the core to trigger SMI. ERR2 events are fatal error conditions where the BIOS and OS will attempt to gracefully handle the error, but may not always do so reliably. A continuously asserted ERR2 signal is an indication that the BIOS cannot service the condition that caused the error. This is usually because that condition prevents the BIOS from running. When an ERR2 timeout occurs, the BMC asserts/deasserts the ERR2 Timeout Sensor and logs a SEL event for that sensor. The default behavior for BMC core firmware is to initiate a system reset upon detection of an ERR2 timeout. The BIOS setup utility provides an option to disable or enable system reset by the BMC on detection of this condition.
98. ed Event Data 3; [3:0] Event Trigger Offset: 1h = State Asserted
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

4.4.5.1 Power Supply Fan Tachometer Sensors Next Steps
These events are only generated in systems with PMBus-capable power supplies, and normally when the airflow to the power supply is obstructed.
  1. Remove and then reinstall the power supply to see whether something might have temporarily caused the fan failure.
  2. Swap the power supply with another one to see whether the problem stays with the location or follows the power supply.
  3. Replace the power supply depending on the outcome of steps 1 and 2.
  4. Ensure the latest FRUSDR update has been run and the correct chassis is detected or selected.

5 Cooling Subsystem

5.1 Fan Sensors
There are three types of fan sensors that can be present on Intel Server Systems: speed, presence, and redundancy. The last two are only present in systems with hot-swap redundant fans.

5.1.1 Fan Tachometer Sensors
Fan tachometer sensors monitor the RPM signal on the relevant fan headers on the platform. Fan speed sensors are threshold-based sensors. Usually they only have lower critical thresholds set, so that a SEL entry is only generated if the f
99. ME Firmware Health Event Sensor Typical Characteristics
  ME Firmware Health Event Sensor Next Steps
  Node Manager Exception Sensor Typical Characteristics
  Node Manager Health Event Sensor Typical Characteristics
  Node Manager Operational Capabilities Change Sensor Typical Characteristics
  Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
  Boot up Event Record Typical Characteristics
  Boot up OEM Event Record Typical Characteristics
  Shutdown Reason Code Event Record Typical Characteristics
  Shutdown Reason OEM Event Record Typical Characteristics
  Shutdown Comment OEM Event Record Typical Characteristics
  Bug Check (Blue Screen) OS Stop Event Record Typical Characteristics
  Bug Check (Blue Screen) code OEM Event Record Typical Characteristics
  Linux Kernel Panic Event Record Characteristics
  Linux Kernel Panic String Extended Record Characteristics
100. ensor; next steps: Table 48, Processor Status Sensors Next Steps (P4 Status)
  Columns: Sensor Name; Sensor Number; Details Section; Next Steps
  Processor 1 Thermal Margin (P1 Therm Margin); 74h; Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps
  Processor 2 Thermal Margin (P2 Therm Margin); 75h; Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps
  Processor 3 Thermal Margin (P3 Therm Margin); 76h; Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps
  Processor 4 Thermal Margin (P4 Therm Margin); 77h; Thermal Margin Sensors; Table 40, Thermal Margin Sensors Next Steps
  Processor 1-4 Thermal Control (P1-P4 Therm Ctrl); 78h-7Bh; Processor Thermal Control Sensors; Processor Thermal Control Sensors Next Steps
  Processor 1 ERR2 Timeout (P1 ERR2); 7Ch; Processor ERR2 Timeout Sensor; Processor ERR2 Timeout Next Steps
  Processor 2 ERR2 Timeout (P2 ERR2); 7Dh; Processor ERR2 Timeout Sensor; Processor ERR2 Timeout Next Steps
  Processor 3 ERR2 Timeout (P3 ERR2); 7Eh; Processor ERR2 Timeout Sensor; Processor ERR2 Timeout Next Steps
  Processor 4 ERR2 Timeout; 7Fh; Processor ERR2 Timeout Sensor; Pr
101. Table 75: FP NMI Interrupt Sensor Typical Characteristics
  Byte 11, Sensor Type: 13h = Critical Interrupt
  Byte 12, Sensor Number: 05h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 0h
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

10.2.1 FP NMI Interrupt Next Steps
The purpose of this button is for diagnosing software issues: when a critical interrupt is generated, the OS typically saves a memory dump. This allows for exact analysis of what is going on in system memory, which can be useful for software developers or for troubleshooting OS, software, and driver issues. If this button was not actually pressed, you should ensure there is no physical fault with the front panel. This event only gets logged if a user pressed the NMI button or sent an IPMI Chassis Control command requesting this action, and although it causes the OS to crash, it is not an error.
102. er Good (BB 3.3 RSR1 PGD); EAh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Baseboard 3.3V Riser 2 Power Good (BB 3.3 RSR2 PGD); EBh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Baseboard 0.9V (BB 0.9V Core IB); ECh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Baseboard 1.8V (BB 1.8V IB I/O); EDh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Baseboard 1.1V (BB 1.1V PCH); EEh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Baseboard 1.2V (BB 1.2V IB); EFh; Threshold-based Voltage Sensors; Table 13, Threshold-based Voltage Sensors Next Steps
  Hard Disk Drive 0-14 Status (HDD 0-14 Status); F0h-FEh; Hard Disk Drive Monitoring Sensor; Table 90, Hard Disk Drive Monitoring Sensor Event Trigger Offset Next Steps

3.2 BIOS POST owned Sensors (GID 0001h)
The following table can be used to find the details of sensors owned by BIOS POST.

Table 6: BIOS POST owned Sensors
  Columns: Sensor Name; Sensor Number; Details Section; Next Steps
103. er power warning
  03h, A/C lost: AC removed. Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3. Next steps: Informational Event.
  06h, Configuration error: Power supply configuration is not supported. Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3. Next steps: Indicates that at least one of the supplies is not correct for your system configuration. Check the data in ED2 for more details:
    01h = The BMC cannot access the PMBus device on the PSU, but its FRU device is responding.
    02h = The PMBUS_REVISION command returns a version number that is not supported (only versions 1.1 and 1.2 are supported).
    03h = The PMBus device does not successfully respond to the PMBUS_REVISION command.
    04h = The PSU is incompatible with one or more PSUs that are present in the system.
    05h = The PSU FW is operating in a degraded mode (likely due to a failed firmware update).
    1. Remove the power supply and verify compatibility.
    2. If the power supply is compatible, it may be faulty. Replace it.

4.4.2 Power Supply Power In Sensors
These sensors will log an event when a power supply in the system is exceeding its AC power in threshold.

Table 22: Power Supply Power In Sensors Typical Charact
104. eristics
  Byte 11, Sensor Type: 0Bh = Other Units
  Byte 12, Sensor Number: 54h = Power Supply 1 Status; 55h = Power Supply 2 Status
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 01h (Threshold)
  Byte 14, Event Data 1: [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Trigger Offset, as described in Table 23
  Byte 15, Event Data 2: Reading that triggered event
  Byte 16, Event Data 3: Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 23: Power Supply Power In Sensor Event Trigger Offset Next Steps
  07h, Upper non-critical going high: assertion severity Degraded; deassertion severity OK
  09h, Upper critical going high: assertion severity non-fatal; deassertion severity Degraded
  Description (both offsets): PMBus feature to monitor power supply power consumption.
  Next steps (both offsets): If you see this event, the system is pulling too much power on the input for the PSU rating.
    1. Verify the power budget is within the specified range.
    2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.
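The Event Data 1 layout for threshold events can be unpacked with a few bit operations. The sketch below is a hedged Python illustration that assumes only the bit positions given in the table above and the two offsets listed in Table 23. The names `THRESHOLD_OFFSETS` and `decode_event_data1` are hypothetical.

```python
# Offsets listed in Table 23; other offsets are not covered by this excerpt.
THRESHOLD_OFFSETS = {
    0x07: "Upper non-critical going high",
    0x09: "Upper critical going high",
}


def decode_event_data1(ed1: int) -> dict:
    """Unpack byte 14 (Event Data 1) of a threshold event."""
    offset = ed1 & 0x0F               # bits 3:0, event trigger offset
    return {
        "ed2_code": (ed1 >> 6) & 0x3,  # bits 7:6, 01b = trigger reading in ED2
        "ed3_code": (ed1 >> 4) & 0x3,  # bits 5:4, 01b = trigger threshold in ED3
        "offset": offset,
        "offset_name": THRESHOLD_OFFSETS.get(offset, "not listed in this excerpt"),
    }


print(decode_event_data1(0x57))
```

For 57h the two code fields are both 01b and the offset is 07h, so bytes 15 and 16 hold the reading and the threshold for an upper non-critical event.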
105. ert Severity, Description, Next Steps
  07h, Upper non-critical going high: assertion severity Degraded; deassertion severity OK
  09h, Upper critical going high: assertion severity non-fatal; deassertion severity Degraded
  Description (both offsets): An upper non-critical or critical temperature threshold has been crossed.
  Next steps (both offsets):
    1. Check for clear and unobstructed airflow into and out of the chassis.
    2. Ensure the SDR is programmed and the correct chassis has been selected.
    3. Ensure there are no fan failures.
    4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

4.4.5 Power Supply Fan Tachometer Sensors
The BMC polls each installed power supply using the PMBus fan status commands to check for failure conditions for the power supply fans.

Table 28: Power Supply Fan Tachometer Sensors Typical Characteristics
  Byte 11, Sensor Type: 04h = Fan
  Byte 12, Sensor Number: A0h = Power Supply 1 Fan Tachometer 1; A1h = Power Supply 1 Fan Tachometer 2; A4h = Power Supply 2 Fan Tachometer 1; A5h = Power Supply 2 Fan Tachometer 2
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 03h (digital discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecifi
106. erver 2003 R2 and later versions, an Intelligent Platform Management Interface (IPMI) driver was added. This added the capability of logging some OS events to the SEL. The driver can write multiple records to the SEL for the following events:
  Boot up
  Shutdown
  Bug Check/Blue Screen

14.1 Boot up Event Records
When the system boots into the Microsoft Windows OS, two events can be logged. The first is a boot up record and the second is an OEM event. These are informational-only records.

Table 98: Boot up Event Record Typical Characteristics
  Bytes 8-9, Generator ID: 0041h = System Software with an ID = 20h
  Byte 11, Sensor Type: 1Fh = OS Boot
  Byte 12, Sensor Number: 00h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 1h = C: boot completed
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

Table 99: Boot up OEM Event Record Typical Characteristics
  Byte 1, Record ID: ID used for SEL
107. 10.3 Button Sensor
The BMC logs when the front panel power and reset buttons get pressed. This is purely for informational purposes and these events do not indicate errors.

Table 76: Button Sensor Typical Characteristics
  Byte 11, Sensor Type: 14h = Button/Switch
  Byte 12, Sensor Number: 09h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 0h = Power Button; 2h = Reset Button
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

11 Miscellaneous Events
The miscellaneous events section addresses sensors not easily grouped with other sensor types.

11.1 IPMI Watchdog
EPSD server systems support an IPMI watchdog timer, which can check to see whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that will reset the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or
108. ffset: Refer to the latest Intel Xeon Phi Coprocessor Adapter specification
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
Refer to the latest Intel Xeon Phi Coprocessor Adapter specification for the next steps.

12 Hot Swap Controller Backplane Events
All new EPSD Platforms Based on Intel Xeon Processor E5-4600/2600/2400/1600 Product Families backplanes follow a hybrid architecture, in which the IPMI functionality previously supported in the HSC is integrated into the BMC FW.

12.1 HSC Backplane Temperature Sensor
There is a thermal sensor on the Hot Swap Backplane to measure the ambient temperature.

Table 87: HSC Backplane Temperature Sensor Typical Characteristics
  Byte 11, Sensor Type: 01h = Temperature
  Byte 12, Sensor Number: 29h = HSBP 1 Temp; 2Ah = HSBP 2 Temp; 2Bh = HSBP 3 Temp
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 01h (Threshold)
  Byte 14, Event Data 1: [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Trigger Offset, as described in Table
109. flow/underflow; Ch = Viral Error; Dh = Protocol Layer parity error; Eh = Routing Table Error; Fh = unused (Reserved)
  Byte 15, Event Data 2: 0-3 = CPU1-4
  Byte 16, Event Data 3: Not used

The QPI Fatal Error 2 is a continuation of QPI Fatal Error.

Table 55: QPI Fatal 2 Error Sensor Typical Characteristics
  Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
  Byte 11, Sensor Type: 13h = Critical Interrupt
  Byte 12, Sensor Number: 17h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 74h (OEM Discrete)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset:
    0h = Illegal inbound request
    1h = IIO Write Cache Uncorrectable Data ECC Error
    2h = IIO CSR crossing 32-bit boundary Error
    3h = IIO Received XPF physical/logical redirect interrupt inbound
    4h = IIO Illegal SAD or Illegal or non-existent address or memory
    5h = IIO Write Cache Coherency Violation
  Byte 15, Event Data 2: 0-3 = CPU1-4
  Byte 16, Event Data 3: Not used

6.4.3.1 QPI Fatal Error and Fatal Error 2 Next Steps
This is an Informational event only. Corr
110. g the problem and follow the troubleshooting steps for these event types.
  03h, Non-redundant, sufficient from redundant
  04h, Non-redundant, sufficient from insufficient
  05h, Non-redundant, insufficient: The system has lost fans and may no longer be able to cool itself adequately. Overheating may occur if this situation remains for a longer period of time.
  06h, Non-redundant, degraded from fully redundant: The system has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
  07h, Redundant, degraded from non-redundant: The system has lost one or more fans and is running in a degraded mode, but still is redundant. There are enough fans to keep the system properly cooled.

5.2 Temperature Sensors
There are a variety of temperature sensors that can be implemented on Intel Server Systems. They are split into various types, each with their own events that can be logged:
  Threshold-based Temperature
  Thermal Margin
  Processor Thermal Control
  Processor DTS Thermal Margin (Monitor only)
  Discrete Thermal
  DIMM Thermal Trip

5.2.1 Threshold-based Temperature Sensors
Threshold-based temperature sensors are sensors that
111. ge sources in the system, including the baseboard, memory, and processors, using IPMI-compliant analog/threshold sensors. Some voltages are only on specific platforms. For details, check your platform's Technical Product Specification (TPS).

Note: A voltage error can be caused by the device supplying the voltage or by the device using the voltage. For each sensor, it will be noted who is supplying the voltage and who is using it.

Table 11: Threshold-based Voltage Sensors Typical Characteristics
  Byte 11, Sensor Type: 02h = Voltage
  Byte 12, Sensor Number: See Table 13
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 01h (Threshold)
  Byte 14, Event Data 1: [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Triggers, as described in Table 12
  Byte 15, Event Data 2: Reading that triggered event
  Byte 16, Event Data 3: Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 12: Threshold-based Voltage Sensors Event Triggers Description
  Eve
112. gine (Intel ME) is an IPMI satellite controller. A mechanism exists to forward commands to Intel ME and then send the response back to the originator. Similarly, events from Intel ME will be sent as alerts outside of the BMC.

2 Basic Decoding of a SEL Record
The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL. For more details, see the IPMI Specification. The definitions for the standard SEL can be found in Table 1. The definitions for the OEM-defined event logs can be found in Table 3 and Table 4.

2.1 Default Values in the SEL Records
Unless otherwise noted in the event record descriptions, the following are the default values in all SEL entries:
  Byte [3] = Record Type (RT) = 02h = System event record
  Byte [9:8] = Generator ID = 0020h = BMC Firmware
  Byte [10] = Event Message Revision (ER) = 04h = IPMI 2.0

Table 1: SEL Record Format
  Bytes 1-2, Record ID (RID): ID used for SEL Record access
  Byte 3, Record Type (RT): [7:0] Record Type: 02h = System event record; C0h-DFh = OEM timestamped, bytes 8-16 OEM defined (See Table 3); E0h-FFh = OEM non-timestamped, bytes 4-16 OEM defined (Se
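The record type ranges listed for byte 3 in Table 1 map directly onto a simple range check. The following Python sketch assumes only the three ranges quoted above. The function name `classify_record_type` is hypothetical, and values outside the listed ranges are reported as not covered rather than guessed.

```python
def classify_record_type(record_type: int) -> str:
    """Classify SEL byte 3 (Record Type) using the ranges from Table 1."""
    if not 0 <= record_type <= 0xFF:
        raise ValueError("record type must fit in one byte")
    if record_type == 0x02:
        return "System event record"
    if 0xC0 <= record_type <= 0xDF:
        return "OEM timestamped (bytes 8-16 OEM defined)"
    if 0xE0 <= record_type <= 0xFF:
        return "OEM non-timestamped (bytes 4-16 OEM defined)"
    return "not covered by this excerpt"


for rt in (0x02, 0xDE, 0xF0):
    print(f"{rt:02X}h: {classify_record_type(rt)}")
```

Knowing the class first matters because it determines whether bytes 4-7 hold a timestamp or OEM data.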
113. h; 20h = OS Stop/Shutdown; Table 100, Shutdown Reason Code Event Record Typical Characteristics; Not applicable
  Shutdown Event; Table 101, Shutdown Reason OEM Event Record Typical Characteristics
  Shutdown Event; Table 102, Shutdown Comment OEM Event Record Typical Characteristics
  Bug Check/Blue Screen; 02h; 20h = OS Stop/Shutdown; Table 103, Bug Check (Blue Screen) OS Stop Event Record Typical Characteristics; Not applicable
  Bug Check/Blue Screen; DEh; Not applicable; Table 104, Bug Check (Blue Screen) code OEM Event Record Typical Characteristics

3.6 Linux Kernel Panic Events (GID 0021)
The following table can be used to find the details of records that can be generated when there is a Linux Kernel panic.

Table 10: Linux Kernel Panic Events
  Columns: Sensor Name; Record Type; Sensor Type; Details Section; Next Steps
  Linux Kernel Panic; 02h; 20h = OS Stop/Shutdown; Table 105, Linux Kernel Panic Event Record Characteristics; Not applicable
  Linux Kernel Panic; F0h; Not applicable; Table 106, Linux Kernel Panic String Extended Record Characteristics

4 Power Subsystems
The BMC monitors the power subsystem, including power supplies, select onboard voltages, and related sensors.

4.1 Threshold-based Voltage Sensors
The BMC monitors the main volta
114. h Sensor Next Steps
  7 Memory Subsystem
  7.1 Memory RAS Configuration Status
  7.2 Memory RAS Mode Select
  7.3 Mirroring Redundancy State
  7.3.1 Mirroring Redundancy State Sensor Next Steps
  7.4 Sparing Redundancy State
  7.4.1 Sparing Redundancy State Sensor Next Steps
  7.5 ECC and Address Parity
  7.5.1 Memory Correctable and Uncorrectable ECC Error
  7.5.2 Memory Address Parity Error
  8 PCI Express and Legacy PCI Subsystem
  8.1 PCI Express Errors
  8.1.1
  8.1.2 PCI Express Fatal Errors and Fatal Error 2
  8.1.3 PCI Express Correctable Errors
  9 System BIOS Events
  9.1
  9.1.1 System Boot
  9.1.2 Timestamp Clock Synchronization
  9.2 System Firmware Progress (Formerly Post Error)
115. he issue remains, replace the board. 3. If the issue remains, replace the power supplies.
  EDh, Baseboard 1.8V (BB 1.8V IB I/O): 1.8V IB I/O is supplied by the main board on specific platforms. 1.8V IB I/O is used by the on-board InfiniBand controller on those specific platforms.
    1. Ensure all cables are connected correctly.
    2. If the issue remains, replace the board.
    3. If the issue remains, replace the power supplies.
  EEh, Baseboard 1.1V (BB 1.1V PCH): This 1.1V line is supplied by the main board. This 1.1V line is used by the Intel C600 series Chipset.
    1. Ensure all cables are connected correctly.
    2. If the issue remains, replace the board.
  EFh, Baseboard 1.2V (BB 1.2V IB): 1.2V is supplied by the main board on specific platforms. 1.2V is used by the on-board InfiniBand controller on those specific platforms.
    1. Ensure all cables are connected correctly.
    2. If the issue remains, replace the board.
    3. If the issue remains, replace the power supplies.

4.2 Voltage Regulator Watchdog Timer Sensor
The BMC FW monitors that the power sequence for the board VR controllers is completed when a DC power-on is initiated. Incompletion of the sequence indicates a board problem, in which case the FW powers down the system. The sequence is as follows: BMC FW monitors the PowerSupplyPowerGood signal for assertion, indicating a DC power-on has been initiated, and starts a timer (VR Watchdog Timer). For EPSD Platforms Based on Intel Xeon
116. he products described in this document may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725 or go to http://www.intel.com/design/literature.

Table of Contents
  1 Introduction
  1.1
  1.2 Industry Standard
  1.2.1 Intelligent Platform Management Interface (IPMI)
  1.2.2 Baseboard Management Controller (BMC)
  1.2.3 Intel Intelligent Power Node Manager Version 2.0
  2 Basic Decoding of a SEL Record
  2.1 Default Values in the SEL Records
  2.2 Notes on SEL Logs and Collecting SEL Information
  2.2.1 Examples of Decoding BIOS Timestamp Events
  2.2.2 Example of Decoding a PCI Express Correctable Error Events
  2.2.3 Example of Decoding a Power S
117. he system and logs state changes. Expected power-on events such as DC ON/OFF are logged, and unexpected events such as AC loss and power good loss are also logged.

Table 15: Power Unit Status Sensors Typical Characteristics
  Byte 11, Sensor Type: 09h = Power Unit
  Byte 12, Sensor Number: 01h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh (Sensor Specific)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Sensor Specific offset, as described in Table 16
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

Table 16: Power Unit Status Sensor, Sensor Specific Offsets, Next Steps
  00h, Power down: System is powered down. Next steps: Informational Event.
  02h, 240 VA power down: 240 VA power limit was exceeded and the hardware forced a power down. Next steps: This could have been caused by many things.
    1. If you recently added hardware, try removing it.
    2. Remove/replace any add-in adapters.
    3. Remove/replace the power supply.
    4. Remove/replace the processors, DIMM, and/or hard drives.
    5. Remove/replace the boards in
118. i Coprocessor adapter provides an IPMI sensor that is read to get the temperature data. The BMC then instantiates its own version of this sensor, which is used for fan speed control. The thermal margin sensor is the difference between the Core Temp sensor value and the TControl value reported by the Intel Xeon Phi Coprocessor adapter. This sensor will not log events into the SEL.

11.9.2 Intel Xeon Phi Coprocessor (MIC) Status Sensors
Every time DC power is turned on, the BMC checks for Intel Xeon Phi Coprocessor adapters installed in the system. All compatible cards will be enabled for management. The status sensor is a direct copy of the status sensor reported by the Intel Xeon Phi Coprocessor adapter.

Table 86: MIC Status Sensors Typical Characteristics
  Byte 11, Sensor Type: C0h = OEM defined
  Byte 12, Sensor Number: A2h = MIC 1 Status; A3h = MIC 2 Status; A6h = MIC 3 Status; A7h = MIC 4 Status
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 70h (OEM defined)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger O
119. ion and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 08h (digital discrete)
  Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 0h = Device Removed/Device Absent; 1h = Device Inserted/Device Present
  Byte 15, Event Data 2: Not used
  Byte 16, Event Data 3: Not used

11.8.1 Add-In Module Presence Next Steps
If an unexpected device is removed or inserted, ensure that the module has been seated properly.

11.9 Intel Xeon Phi Coprocessor Management Sensors
The Intel Xeon Processor E5-4600/2600/2400/1600 Product Families BMC supports limited manageability of the Intel Xeon Phi Coprocessor adapter as described in this section. The Intel Xeon Phi Coprocessor adapter uses the Many Integrated Core (MIC) architecture, and the sensors are referred to as MIC sensors. For each manageable Intel Xeon Phi Coprocessor adapter found in the system, the BMC automatically enables the associated thermal margin sensors (0xC4-0xC7) and status sensors (0xA2, 0xA3, 0xA6, 0xA7).

11.9.1 Intel Xeon Phi Coprocessor (MIC) Thermal Margin Sensors
The management controller FW of the Intel Xeon Ph
120. ion events

Table 84: Firmware Update Status Sensor Typical Characteristics
  Byte 11, Sensor Type: 2Bh = Version Change
  Byte 12, Sensor Number: 12h
  Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 70h (OEM defined)
  Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 0h = Update started; 1h = Update completed successfully; 02h = Update failure
  Byte 15, Event Data 2: Bits [7:4] Target of update: 0000b = BMC; 0001b = BIOS; 0010b = ME; all other values are reserved. Bits [3:1] Target instance (zero-based). Bit [0] Reserved.
  Byte 16, Event Data 3: Not used

11.8 Add-In Module Presence Sensor
Some server boards provide dedicated slots for add-in modules/boards (for example, SAS, IO, and PCIe riser). For these boards the BMC provides an individual presence sensor to indicate whether the module/board is installed.

Table 85: Add-In Module Presence Sensor Typical Characteristics
  Byte 11, Sensor Type: 15h = Module/Board
  Byte 12, Sensor Number: 0Eh = IO Module Presence; 0Fh = SAS Module Presence; 13h = IO Module2 Presence
  Byte 13, Event Direct
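Event Data 2 of the Firmware Update Status sensor (Table 84) carries two bit fields. As a hedged illustration, the Python sketch below extracts them using only the bit positions and target codes listed in that table. The names `FW_TARGETS` and `decode_fw_update_ed2` are hypothetical.

```python
# Target codes from Table 84; all other values are reserved.
FW_TARGETS = {0b0000: "BMC", 0b0001: "BIOS", 0b0010: "ME"}


def decode_fw_update_ed2(ed2: int) -> tuple[str, int]:
    """Return (update target, zero-based target instance) from Event Data 2."""
    if not 0 <= ed2 <= 0xFF:
        raise ValueError("ed2 must fit in one byte")
    target = FW_TARGETS.get((ed2 >> 4) & 0xF, "reserved")  # bits 7:4
    instance = (ed2 >> 1) & 0x7                            # bits 3:1
    return target, instance


print(decode_fw_update_ed2(0x10))  # ('BIOS', 0)
print(decode_fw_update_ed2(0x22))  # ('ME', 1)
```

Bit 0 is reserved in the table, so the sketch ignores it.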
121. irmware and drivers
oh CPU Core Error: 1. Cross-test the processors. 2. Replace the processors depending on the results of the test.
3h MSID Mismatch: Verify the processor is supported by your baseboard. Check your board's Technical Product Specification (TPS).

6.3 CPU Missing Sensor
The CPU Missing sensor is a discrete sensor that reports that the processor is not installed. The most common cause of this event is a processor populated in the incorrect socket.

Table 51. CPU Missing Sensor Typical Characteristics
Byte 11, Sensor Type: 07h = Processor
Byte 12, Sensor Number: 82h
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

6.3.1 CPU Missing Sensor Next Steps
Verify the processor is installed in the correct socket.

6.4 Quick Path Interconnect Sensors
The Intel Quick Path Interconnect (QPI) bus on Intel EPSD Boards Based on Intel Xeon Processor E5-4600/2600/2400/1600/1400 Prod
122. is being logged it is because the BMC has been configured to check the watchdog timer.
1. Make sure you have support for this in your OS (typically using a third-party IPMI-aware utility such as ipmitool or ipmiutil, along with the OpenIPMI driver).
2. If this is the case, it is likely that your OS has hung, and you need to investigate the OS event logs to determine what may have caused this.

11.2 SMI Timeout
SMI stands for system management interrupt. It is an interrupt generated so that the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts) in order to log them to the SEL. If this interrupt times out, the system is frozen. The BMC will reset the system after logging the event.

Table 79. SMI Timeout Sensor Typical Characteristics
Byte 11, Sensor Type: F3h = SMI Timeout
Byte 12, Sensor Number: 06h
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 03h (digital discrete)
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
123. istics

List of Tables
Table 39. Thermal Margin Sensors Event Triggers Description
Table 40. Thermal Margin Sensors Next Steps
Table 41. Processor Thermal Control Sensors Typical Characteristics
Table 42. Processor Thermal Control Sensors Event Triggers Description
Table 43. Processor DTS Thermal Margin Sensors Typical Characteristics
Table 44. Discrete Thermal Sensors Typical Characteristics
Table 45. Discrete Thermal Sensors Next Steps
Table 46. DIMM Thermal Trip Typical Characteristics
Table 47. Processor Status Sensors Typical Characteristics
Table 48. Processor Status Sensors Next Steps
Table 49. Catastrophic Error Sensor Typical Characteristics
Table 50. Catastrophic Error Sensor Event Data 2 Values Next Steps
Table
124. it can log an event

Table 20. Power Supply Status Sensors Typical Characteristics
Byte 11, Sensor Type: 08h = Power Supply
Byte 12, Sensor Number: 50h = Power Supply 1 Status; 51h = Power Supply 2 Status
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1: [7:6] ED2 data in Table 21; [5:4] ED3 data in Table 21; [3:0] Sensor Specific offset as described in Table 21
Byte 15, Event Data 2: As described in Table 21
Byte 16, Event Data 3: As described in Table 21

Table 21. Power Supply Status Sensor: Sensor Specific Offsets and Next Steps
00h Presence: Power supply detected.
  ED2: 00b = Unspecified Event Data 2
  ED3: 00b = Unspecified Event Data 3
  Next Steps: Informational event.
01h Failure: Power supply failed.
  ED2: 10b = OEM code in Event Data 2. Values: 01h = Output voltage fault; 02h = Output power fault; 03h = Output over current
  ED3: 10b = OEM code in Event Data 3. Will have the contents of the
  Next Steps: Indicates a power supply failed. Check the data in ED2 and ED3 for more details. 1. Remove and reapply
125. jor
Error Code  Error Message  Response
8133  Processor 04 disabled  Major
8160  Processor 01 unable to apply microcode update  Major
8161  Processor 02 unable to apply microcode update  Major
8162  Processor 03 unable to apply microcode update  Major
8163  Processor 04 unable to apply microcode update  Major
8170  Processor 01 failed Self Test (BIST)  Major
8171  Processor 02 failed Self Test (BIST)  Major
8172  Processor 03 failed Self Test (BIST)  Major
8173  Processor 04 failed Self Test (BIST)  Major
8180  Processor 01 microcode update not found  Minor
8181  Processor 02 microcode update not found  Minor
8182  Processor 03 microcode update not found  Minor
8183  Processor 04 microcode update not found  Minor
8190  Watchdog timer failed on last boot  Major
8198  OS boot watchdog timer failure  Major
8300  Baseboard management controller failed self test  Major
8305  Hot Swap Controller failure  Major
83A0  Management Engine (ME) failed self test  Major
83A1  Management Engine (ME) failed to respond  Major
84F2  Baseboard management controller failed to respond  Major
84F3  Baseboard management controller in update mode  Major
84F4  Sensor data record empty  Major
126. jor
Error Code  Error Message  Response
8535  DIMM_H1 failed test/initialization  Major
8536  DIMM_H2 failed test/initialization  Major
8537  DIMM_H3 failed test/initialization  Major
8538  DIMM_J1 failed test/initialization  Major
8539  DIMM_J2 failed test/initialization  Major
853A  DIMM_J3 failed test/initialization  Major
853B  DIMM_K1 failed test/initialization  Major
853C  DIMM_K2 failed test/initialization  Major
853D  DIMM_K3 failed test/initialization  Major
853E  DIMM_L1 failed test/initialization  Major
853F  DIMM_L2 failed test/initialization  Major
(Go to 85C0)
8540  DIMM_A1 disabled  Major
8541  DIMM_A2 disabled  Major
8542  DIMM_A3 disabled  Major
8543  DIMM_B1 disabled  Major
8544  DIMM_B2 disabled  Major
8545  DIMM_B3 disabled  Major
8546  DIMM_C1 disabled  Major
8547  DIMM_C2 disabled  Major
8548  DIMM_C3 disabled  Major
8549  DIMM_D1 disabled  Major
854A  DIMM_D2 disabled  Major
854B  DIMM_D3 disabled  Major
854C  DIMM_E1 disabled  Major
854D  DIMM_E2 disabled  Major
854E  DIMM_E3 disabled  Major
854F  DIMM_F1 disabled  Major
8550  DIMM_F2 disabled  Major
8551  DIMM_F3 disabled  Major
8552  DIMM_G1 disabled  Major
8553  DIMM_G2 disabled  Major
8554  DIMM_G3 disabled  M
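The "DIMM disabled" codes listed above are assigned sequentially, three slots per channel, starting at 8540 for DIMM_A1. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, the channel lettering (which skips "I") is inferred from the DIMM names in the list, and the range check is deliberately limited to the codes visible in this excerpt (8540 to 8554) rather than assuming how the sequence continues.

```python
# Hypothetical helper; base code and DIMM naming taken from the list above.
DIMM_CHANNELS = "ABCDEFGHJKL"   # channel letters used in the tables; 'I' is skipped
DIMM_DISABLED_BASE = 0x8540     # 8540 = DIMM_A1 disabled
DIMM_DISABLED_LAST = 0x8554     # last "disabled" code visible in this excerpt


def dimm_for_disabled_code(code: int) -> str:
    """Map a 'DIMM disabled' POST error code to its DIMM slot name."""
    if not DIMM_DISABLED_BASE <= code <= DIMM_DISABLED_LAST:
        raise ValueError(f"code {code:04X} is outside the range shown in this excerpt")
    channel, slot = divmod(code - DIMM_DISABLED_BASE, 3)   # three slots per channel
    return f"DIMM_{DIMM_CHANNELS[channel]}{slot + 1}"


print(dimm_for_disabled_code(0x8552))
```

The same divide-by-three pattern appears to hold for the other DIMM code families in the list (for example, 8538 to 853A map to DIMM_J1 through DIMM_J3), but their base codes are not shown in this excerpt, so they are left out of the sketch.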
127. ket, replace the main board; otherwise, the processor.

D8h, Baseboard 1.5V P1 Memory AB VDDQ (BB 1.5 P1MEM AB)
  This 1.5V line is supplied by the main board. This 1.5V line is used by processor 1 memory slots A and B.
  1. Ensure all cables are connected correctly.
  2. Check the DIMMs are seated properly.
  3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise, the DIMM.

D9h, Baseboard 1.5V P1 Memory CD VDDQ (BB 1.5 P1MEM CD)
  This 1.5V line is supplied by the main board. This 1.5V line is used by processor 1 memory slots C and D.
  1. Ensure all cables are connected correctly.
  2. Check the DIMMs are seated properly.
  3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise, the DIMM.

DAh, Baseboard 1.5V P2 Memory AB VDDQ (BB 1.5 P2MEM AB)
  This 1.5V line is supplied by the main board. This 1.5V line is used by processor 2 memory slots A and B.
  1. Ensure all cables are connected correctly.
  2. Check the DIMMs are seated properly.
  3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise, the DIMM.

This 1.5V line is supplied by the
128. l Capabilities Change Next Steps
13.5 Node Manager Alert Threshold Exceeded
13.5.1 Node Manager Alert Threshold Exceeded Next Steps
14 Microsoft Windows Records
14.1 Boot up Event Records
14.2 Shutdown Event Records
14.3 Bug Check (Blue Screen) Event Records
15 Linux Kernel Panic Records

List of Tables
Table 1. SEL Record Format
Table 2. Event Request Message Event Data Field Content
Table 3. OEM SEL Record (Type C0h-DFh)
Table 4. OEM SEL Record (Type E0h-FFh)
Table 5. BMC owned Sensors
Table 6. BIOS POST owned Sensors
129. main board. This 1.5V line is used by processor 2 memory slots C and D.
DBh, Baseboard 1.5V P2 Memory CD VDDQ (BB 1.5 P2MEM CD)
  1. Ensure all cables are connected correctly.
  2. Check the DIMMs are seated properly.
  3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise, the DIMM.

DCh, Baseboard 1.8V Aux (BB 1.8V AUX)
  1.8V AUX is supplied by the main board. 1.8V AUX is used by the BMC and on-board NIC.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the board.
  3. If the issue remains, replace the power supplies.

DDh, Baseboard 1.1V Stand-by (BB 1.1V STBY)
  1.1V STBY is supplied by the main board. 1.1V STBY is used by the Intel C600 series Chipset.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the board.
  3. If the issue remains, replace the power supplies.

DEh, Baseboard CMOS Battery (BB 3.3V Vbat)
  3.3V Vbat is supplied by the CMOS battery when power is off and by the main board when power is on. 3.3V Vbat is used by the CMOS and related circuits.
  1. Replace the CMOS battery. Any battery of type CR2032 can be used.
  2. If the error remains (unlikely), replace the board.

E4h, Baseboard 1.35V P1 Low Voltage Memory AB VDDQ (BB 1.35 P1LV A
  This 1.35V line is supplied by the main board. This 1.35V line is used by processor 1 memory slots A and B.
  1. Ensure all cables are connected correctly.
130. n headers for each chassis and will log this event if there is no fan on that header.
1. Refer to the Quick Start Guide or the Service Guide to identify the correct fan headers to use.
2. Ensure the latest FRUSDR update has been run and the correct chassis is detected or selected.
3. If you are sure this was done, the event may be a sign of impending fan failure (although this normally only applies if the system has been in use for a while). Replace the fan.

02h Lower critical going low. Assertion severity: non-fatal. Deassert severity: Degraded. The fan speed has dropped below its lower critical threshold.

5.1.2 Fan Presence and Redundancy Sensors
Fan presence sensors are only implemented for hot-swap fans and require an additional pin on the fan header. Fan redundancy is an aggregate of the fan presence sensors and will warn when redundancy is lost. Typically, the redundancy mode on Intel servers is n+1 redundancy (if one fan fails, there are still sufficient fans to cool the system, but it is no longer redundant), although other modes are also possible.

Table 31. Fan Presence Sensors Typical Characteristics
Byte 11, Sensor Type: 04h = Fan
Byte 12, Sensor Number: 40h-4Fh (chassis specific)
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 08h (Generic digital discrete)
131. n is the BIOS timestamp synchronization event log. This event can be logged by the BIOS during POST, or by the BIOS SMI Handler when the operating system (OS) requests a shutdown or restart. See section 2.2.1 for examples. Most utilities report this as just a BIOS event and do not differentiate between the two. The distinction is sometimes useful, however, because it shows the sequence of events more clearly. For example, if there are multiple sequences of timestamp synchronization events: was power lost after booting to the OS and the system then restarted, were there multiple POST events, or was it a restart from the OS?

Other examples of information that is not fully decoded are the PCI Express errors and some of the Power Supply events. For PCI Express errors, the type of error and the PCI Bus, Device, and Function are all part of Event Data 1 through Event Data 3. See section 2.2.2. For Power Supply events, when there is a failure, predictive failure, or configuration error, Event Data 2 and Event Data 3 hold additional information that describes the power supply's PMBus command registers and values for that particular event. See section 2.2.3.

2.2.1 Examples of Decoding BIOS Timestamp Events
The following are some samples of BIOS timestamp events during POST and during an OS shutdown.

2.2.1.1 BIOS POST Timestamp Events
RID 19 01 RT 02 TS 57 49 6A 4E GID 01 00 ER 04 ST 12 SN 83 EDIR 6F ED1 05 ED2 00 ED3
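A record dump in the labelled form shown above can be split into fields and partly decoded with a short script. The sketch below is illustrative only: the function names are hypothetical, the label set is simply the labels visible in the sample, and it assumes the usual IPMI conventions that multi-byte fields are stored least significant byte first and that the timestamp counts seconds since 1970-01-01 (treated here as UTC). The truncated ED3 value is left out.

```python
from datetime import datetime, timezone

# Field labels as they appear in the sample record dump.
LABELS = {"RID", "RT", "TS", "GID", "ER", "ST", "SN", "EDIR", "ED1", "ED2", "ED3"}


def parse_sel_fields(text: str) -> dict:
    """Group the hex bytes of a 'LABEL xx xx ...' dump under their labels."""
    fields = {}
    current = None
    for token in text.split():
        if token in LABELS:
            current = token
            fields[current] = bytearray()
        elif current is not None:
            fields[current].append(int(token, 16))
    return {label: bytes(data) for label, data in fields.items()}


def sel_timestamp(ts: bytes) -> datetime:
    """Interpret 4 little-endian bytes as seconds since 1970-01-01 (assumed UTC)."""
    return datetime.fromtimestamp(int.from_bytes(ts, "little"), tz=timezone.utc)


sample = "RID 19 01 RT 02 TS 57 49 6A 4E GID 01 00 ER 04 ST 12 SN 83 EDIR 6F ED1 05 ED2 00"
fields = parse_sel_fields(sample)
record_id = int.from_bytes(fields["RID"], "little")   # bytes 19 01 -> 0119h
when = sel_timestamp(fields["TS"])                     # bytes 57 49 6A 4E -> 4E6A4957h seconds
print(f"record {record_id:04X}h logged at {when.isoformat()}")
```

Comparing the decoded timestamps of consecutive synchronization events is one way to see the boot and shutdown sequence that the text describes.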
132. nk is lost.

10.1 Physical Security
Two sensors are included in the physical security subsystem: chassis intrusion and LAN leash lost.

10.1.1 Chassis Intrusion
Chassis Intrusion is monitored on supported chassis, and the BMC logs corresponding events when the chassis lid is opened and closed.

10.1.2 LAN Leash Lost
The LAN Leash Lost sensor monitors the physical connection on the on-board network ports. If a LAN Leash Lost event is logged, the network port has lost its physical connection.

Table 73. Physical Security Sensor Typical Characteristics
Byte 11, Sensor Type: 05h = Physical Security
Byte 12, Sensor Number: 04h
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 74
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

Table 74. Physical Security Sensor Event Trigger Offset Next Steps
Event Trigger Offset (Hex)  Description  Next Steps
Somebody has opened the chassis or the
133. nsor Name  Details Section  Next Steps
0Ah, BMC Watchdog (BMC Watchdog). Details: BMC Watchdog Sensor. Next Steps: BMC Watchdog Sensor Next Steps.
0Bh, Voltage Regulator Watchdog (VR Watchdog). Details: Voltage Regulator Watchdog Timer Sensor. Next Steps: Voltage Regulator Watchdog Timer Sensor Next Steps.
0Ch, Fan Redundancy (Fan Redundancy). Details: Fan Presence and Redundancy Sensors. Next Steps: Table 34, Fan Redundancy Sensor Event Trigger Offset Next Steps.
0Dh, SSB Thermal Trip (SSB Thermal Trip). Details: Discrete Thermal Sensors. Next Steps: Table 45, Discrete Thermal Sensors Next Steps.
0Eh, IO Module Presence (IO Mod Presence). Details: Add-In Module Presence Sensor. Next Steps: Add-In Module Presence Next Steps.
0Fh, SAS Module Presence (SAS Mod Presence). Details: Add-In Module Presence Sensor. Next Steps: Add-In Module Presence Next Steps.
10h, BMC Firmware Health (BMC FW Health). Details: BMC FW Health Sensor. Next Steps: BMC FW Health Sensor Next Steps.
11h, System Airflow (System Airflow). Details: System Air Flow Monitoring Sensor. Next Steps: Not applicable.
12h, Firmware Update Status (FW Update Status). Details: Firmware Update Status Sensor. Next Steps: Not applicable.
13h, IO Module2 Presence (IO Mod Presence). Details: Add-In Module Presence Sensor. Next Steps: Add-In Module Presence Next Steps.
14h, Baseboard Temperature 5 (Platform Specific). Details: Threshold-based Temperature Sensors. Next Steps: Table 37, Temperature Sensors Next Steps.
15h, Baseboard Temperature 6 (Platform Specific). Details: Threshold-based Temperature Sensors. Next Steps: Table 37, Temperature Sensors Next Steps.
134. nsors

Table 43. Processor DTS Thermal Margin Sensors Typical Characteristics
Byte 11, Sensor Type: 01h = Temperature
Byte 12, Sensor Number: 83h = Processor 1 DTS Thermal Margin; 84h = Processor 2 DTS Thermal Margin; 85h = Processor 3 DTS Thermal Margin; 86h = Processor 4 DTS Thermal Margin
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 01h (Threshold)

5.2.5 Discrete Thermal Sensors
Discrete thermal sensors do not report a temperature at all; instead, they report an overheating event of some kind. Examples are VRD Hot (a voltage regulator is overheating) and processor Thermal Trip (the processor got so hot that its over-temperature protection was triggered and the system was shut down to prevent damage).

Table 44. Discrete Thermal Sensors Typical Characteristics
Byte 11, Sensor Type: 01h = Temperature
Byte 12, Sensor Number: See Table 45
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type: See Table 45
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event
135. nt
Next steps depend on the policy that was set. See the Node Manager Specification for more details.

13.3 Node Manager Health Event
A Node Manager Health Event message provides a runtime error indication about the health of Intel Intelligent Power Node Manager. The types of service that can send an error are:
- Misconfigured policy
- Error reading power data
- Error reading inlet temperature

Table 95. Node Manager Health Event Sensor Typical Characteristics
Bytes 8-9, Generator ID: 002Ch or 602Ch (ME Firmware)
Byte 11, Sensor Type: DCh = OEM
Byte 12, Sensor Number: 19h
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 73h (OEM)
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Health Event Type = 02h (Sensor Node Manager)
Byte 15, Event Data 2: [7:4] Error type:
  0-9 = Reserved
  10 = Policy Misconfiguration
  11 = Power Sensor Reading Failure
  12 = Inlet Temperature Reading Failure
  13 = Host Communication error
  14 = Real-time clock synchronization failure
  15 = Platform shutdown initiated by NM policy due to execution of action defined by Policy
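The error type carried in bits [7:4] of Event Data 2 (Table 95) can be looked up as follows. This is a small sketch with hypothetical names; the value-to-text mapping is copied from the table, and the text for value 15 is shortened because the table entry is cut off in this excerpt.

```python
# Hypothetical lookup; values taken from Table 95, Event Data 2 bits [7:4].
NM_HEALTH_ERROR_TYPES = {
    10: "Policy Misconfiguration",
    11: "Power Sensor Reading Failure",
    12: "Inlet Temperature Reading Failure",
    13: "Host Communication error",
    14: "Real-time clock synchronization failure",
    15: "Platform shutdown initiated by NM policy",
}


def nm_health_error_type(ed2: int) -> str:
    """Return the Node Manager health error type encoded in Event Data 2."""
    code = (ed2 >> 4) & 0x0F   # upper nibble holds the error type
    return NM_HEALTH_ERROR_TYPES.get(code, "Reserved")   # 0-9 are reserved


print(nm_health_error_type(0xB0))
```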
136. nt Trigger Offset (Hex)  Description  Assertion Severity  Deassert Severity  Description
00h  Lower non-critical going low  Degraded  OK  The voltage has dropped below its lower non-critical threshold.
02h  Lower critical going low  non-fatal  Degraded  The voltage has dropped below its lower critical threshold.
07h  Upper non-critical going high  Degraded  OK  The voltage has gone over its upper non-critical threshold.
09h  Upper critical going high  non-fatal  Degraded  The voltage has gone over its upper critical threshold.

Table 13. Threshold-based Voltage Sensors Next Steps
19h, Baseboard 1.05V Processor3 Vccp (BB 1.05Vccp P3)
  This 1.05V line is supplied by the main board. This 1.05V line is used by processor 1.
  1. Ensure all cables are connected correctly.
  2. Check the processor is seated properly.
  3. Cross-test the processors. If the issue remains with the processor socket, replace the main board; otherwise, the processor.

1Ah, Baseboard 1.05V Processor4 Vccp (BB 1.05Vccp P4)
  This 1.05V line is supplied by the main board. This 1.05V line is used by processor 1.
  1. Ensure all cables are connected correctly.
  2. Check the processor is seated properly.
  3. Cross-test the processors. If the issue remains with the processor socket, replace the main board; otherwise, the processor.
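The threshold event trigger offsets in the table above lend themselves to a simple lookup keyed on the low nibble of Event Data 1. The sketch below uses hypothetical names; the offsets, severities, and descriptions are those listed in the table, and any other offset is reported as not covered rather than guessed.

```python
# Hypothetical lookup; offsets and severities taken from the threshold table above.
# Each entry: (description, severity on assertion, severity on deassertion)
THRESHOLD_OFFSETS = {
    0x00: ("Lower non-critical going low", "Degraded", "OK"),
    0x02: ("Lower critical going low", "non-fatal", "Degraded"),
    0x07: ("Upper non-critical going high", "Degraded", "OK"),
    0x09: ("Upper critical going high", "non-fatal", "Degraded"),
}


def describe_threshold_event(ed1: int, deassertion: bool = False) -> str:
    """Describe a threshold event from Event Data 1 and the event direction."""
    offset = ed1 & 0x0F   # bits [3:0] hold the event trigger offset
    entry = THRESHOLD_OFFSETS.get(offset)
    if entry is None:
        return f"offset {offset:02X}h: not covered by this table"
    description, on_assert, on_deassert = entry
    severity = on_deassert if deassertion else on_assert
    return f"{description} ({severity})"


print(describe_threshold_event(0x52))
```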
137. nt Trigger Offset  Processor Status  Next Steps
0h  Internal error (IERR)
  1. Cross-test the processors.
  2. Replace the processors depending on the results of the test.
1h  Thermal trip
  This event normally only happens due to failures of the thermal solution.
  1. Verify the heatsink is properly attached and has thermal grease.
  2. If the system has a heatsink fan, ensure the fan is spinning.
  3. Check all system fans are operating properly.
  4. Check that the air used to cool the system is within limits (typically 35 C).
2h  FRB1/BIST failure
3h  FRB2/Hang in POST failure
4h  FRB3/Processor startup/initialization failure (CPU fails to start)
5h  Configuration error (for DMI)
6h  SM BIOS uncorrectable CPU-complex error
  1. Cross-test the processors.
  2. Replace the processors depending on the results of the test.
7h  Processor presence detected
  Informational event.
8h  Processor disabled
9h  Terminator presence detected
  1. Cross-test the processors.
  2. Replace the processors depending on the results of the test.

6.2 Catastrophic Error Sensor
When the Catastrophic Error signal (CATERR#) stays asserted, it is a sign that something serious has gone wrong in the hardware. The BMC monitors this signal and reports when it stays asserted.

Table 49
138. o the action, so you need to investigate the SEL and PEF settings to identify this event and troubleshoot accordingly.

11.5 BMC Watchdog Sensor
The BMC supports an IPMI sensor to report that a BMC reset has occurred due to an action taken by the BMC Watchdog feature. A SEL event is logged whenever either the BMC FW stack is reset or the BMC CPU itself is reset.

Table 82. BMC Watchdog Sensor Typical Characteristics
Byte 11, Sensor Type: 28h = Management Subsystem Health
Byte 12, Sensor Number: 0Ah
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 03h (digital discrete)
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

11.5.1 BMC Watchdog Sensor Next Steps
A SEL event is logged whenever either the BMC FW stack is reset or the BMC CPU itself is reset.
1. Check the SEL for any other events around the time of the failure.
2. Take note of all IPMI activity that was occurring around the time of the failure.
3. Capture a System BMC Debug Log as soon as you can afte
139. ocessor ERR2 Timeout Next Steps (P4 ERR2)
80h, Catastrophic Error (CATERR). Details: Catastrophic Error Sensor. Next Steps: Table 50, Catastrophic Error Sensor Event Data 2 Values Next Steps.
81h, Processor 1 MSID Mismatch (P1 MSID Mismatch). Details: Processor MSID Mismatch Sensor. Next Steps: Processor MSID Mismatch Sensor Next Steps.
82h, Processor Population Fault (CPU Missing). Details: CPU Missing Sensor. Next Steps: CPU Missing Sensor Next Steps.
83h-86h, Processor 1-4 DTS Thermal Margin (P1-P4 DTS Therm Mgn). Details: Processor DTS Thermal Margin. Next Steps: Not applicable.
87h, Processor 2 MSID Mismatch (P2 MSID Mismatch). Details: Processor MSID Mismatch Sensor. Next Steps: Processor MSID Mismatch Sensor Next Steps.
88h, Processor 3 MSID Mismatch (P3 MSID Mismatch). Details: Processor MSID Mismatch Sensor. Next Steps: Processor MSID Mismatch Sensor Next Steps.
89h, Processor 4 MSID Mismatch (P4 MSID Mismatch). Details: Processor MSID Mismatch Sensor. Next Steps: Processor MSID Mismatch Sensor Next Steps.
90h, Processor 1 VRD Temp (P1 VRD Hot). Details: Discrete Thermal Sensors. Next Steps: Table 45, Discrete Thermal Sensors Next Steps.
91h, Processor 2 VRD Temp (P2 VRD Hot). Details: Discrete Thermal Sensors. Next Steps: Table 45, Discrete Thermal Sensors Next Steps.
92h, Processor 3 VRD Temp (P3 VRD Hot). Details: Discrete Thermal Sensors. Next Steps: Table 45, Discrete Thermal Sensors Next Steps.
93h, Processor 4 VRD Temp (P4 VRD Hot). Details: Discrete Thermal Sensors. Next Steps: Table 45, Discrete Thermal Sensors Next
140. ode Manager Version 2.0
Intel Intelligent Power Node Manager Version 2.0 (NM) is a platform-resident technology that enforces power and thermal policies for the platform. These policies are applied by exploiting subsystem knobs (such as processor P- and T-states) that can be used to control power consumption. Intel Intelligent Power Node Manager enables data center power and thermal management by exposing an external interface to management software through which platform policies can be specified. It also enables specific data center power management usage models, such as power limiting.

The configuration and control commands are used by the external management software or the BMC to configure and control the Intel Intelligent Power Node Manager feature. Because the Platform Services firmware does not have any external interface, external commands are first received by the BMC over LAN and then relayed to the Platform Services firmware over the IPMB channel. The BMC acts as a relay and as the transport conversion device for these commands. For simplicity, the commands from the management console might be encapsulated in a generic CONFIG packet format (configuration data length, configuration data blob) sent to the BMC, so that the BMC does not even have to parse the actual configuration data.

The BMC provides the access point for remote commands from external management SW and generates alerts to them. Intel Intelligent Power Node Manager on Intel Manageability En
141. on (SPD) failure  Major
Error Code  Error Message  Response
8570  DIMM_F2 encountered a Serial Presence Detection (SPD) failure  Major
8571  DIMM_F3 encountered a Serial Presence Detection (SPD) failure  Major
8572  DIMM_G1 encountered a Serial Presence Detection (SPD) failure  Major
8573  DIMM_G2 encountered a Serial Presence Detection (SPD) failure  Major
8574  DIMM_G3 encountered a Serial Presence Detection (SPD) failure  Major
8575  DIMM_H1 encountered a Serial Presence Detection (SPD) failure  Major
8576  DIMM_H2 encountered a Serial Presence Detection (SPD) failure  Major
8577  DIMM_H3 encountered a Serial Presence Detection (SPD) failure  Major
8578  DIMM_J1 encountered a Serial Presence Detection (SPD) failure  Major
8579  DIMM_J2 encountered a Serial Presence Detection (SPD) failure  Major
857A  DIMM_J3 encountered a Serial Presence Detection (SPD) failure  Major
857B  DIMM_K1 encountered a Serial Presence Detection (SPD) failure  Major
857C  DIMM_K2 encountered a Serial Presence Detection (SPD) failure  Major
857D  DIMM_K3 encountered a Serial Presence Detection (SPD) failure  Major
857E  DIMM_L1 encountered a Serial Presence Detection (SPD) f
142. orms Based on Intel Xeon Processor E5-4600/2600/2400/1600/1400 Product Families

Table 56. Processor ERR2 Timeout Sensor Typical Characteristics
Byte 11, Sensor Type: 07h = Processor
Byte 12, Sensor Number: 7Ch = Processor 1 ERR2 Timeout; 7Dh = Processor 2 ERR2 Timeout; 7Eh = Processor 3 ERR2 Timeout; 7Fh = Processor 4 ERR2 Timeout
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 03h (digital discrete)
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

6.5.1 Processor ERR2 Timeout Next Steps
1. Check the SEL for any other events around the time of the failure.
2. Take note of all IPMI activity that was occurring around the time of the failure.
3. Capture a System BMC Debug Log as soon as you can after experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg /sbmcdl private filename.zip).
4. Send the log file to your system manufacturer or Intel representative for failure analysis.

6.6 Processor MSID Mismatch Sensor
The BMC supports an MSID Mismatch sensor that monitors for the fault condition that occurs if there is a power rating incompatibility between a baseboard and a processor. The
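Every "Typical Characteristics" table in this guide uses the same packing for byte 13 (event direction plus event type) and byte 14 (Event Data 1). A minimal Python sketch of that unpacking follows; the function names are hypothetical, and only the bit positions come from the tables.

```python
# Hypothetical helpers; bit positions follow the "Typical Characteristics" tables.

def decode_event_dir_type(byte13: int) -> tuple:
    """Split byte 13 into (direction, event type)."""
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"   # bit [7]
    event_type = byte13 & 0x7F                                    # bits [6:0]
    return direction, event_type


def decode_event_data1(ed1: int) -> dict:
    """Split Event Data 1 into its three bit fields."""
    return {
        "ed2_code": (ed1 >> 6) & 0x03,   # bits [7:6]: what Event Data 2 holds
        "ed3_code": (ed1 >> 4) & 0x03,   # bits [5:4]: what Event Data 3 holds
        "offset": ed1 & 0x0F,            # bits [3:0]: event trigger offset
    }


# A Processor ERR2 Timeout assertion: Event Type 03h, offset 1h (State Asserted).
print(decode_event_dir_type(0x03), decode_event_data1(0x01))
```

For the ERR2 Timeout sensor, a byte 13 value of 83h would be the matching deassertion event, since only bit 7 differs.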
143. oss-test the processor. If the issue remains with the processor socket, replace the main board; otherwise, the processor.

6.4.2 QPI Correctable Error Sensor
The system detected an error and corrected it. This is an informational event.

Table 53. QPI Correctable Error Sensor Typical Characteristics
Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 13h = Critical Interrupt
Byte 12, Sensor Number: 06h
Byte 13, Event Direction and Event Type: [7] Event direction (0b = Assertion Event, 1b = Deassertion Event); [6:0] Event Type = 72h (OEM Discrete)
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = Reserved
Byte 15, Event Data 2: 0-3 = CPU1-4
Byte 16, Event Data 3: Not used

6.4.2.1 QPI Correctable Error Sensor Next Steps
This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
1. Check the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cross-test the processor. If the issue remains with the processor socket, replace the main board; otherwise, the processor.

6.4.3 QPI
144. r EPSD Platforms Based on Intel Xeon Processor E5-4600/2600/2400/1600/1400 Product Families

List of Tables
Table 79. SMI Timeout Sensor Typical Characteristics
Table 80. System Event Log Cleared Sensor Typical Characteristics
Table 81. System Event PEF Action Sensor Typical Characteristics
Table 82. BMC Watchdog Sensor Typical Characteristics
Table 83. BMC FW Health Sensor Typical Characteristics
Table 84. Firmware Update Status Sensor Typical Characteristics
Table 85. Add-In Module Presence Sensor Typical Characteristics
Table 86. MIC Status Sensors Typical Characteristics
Table 87. HSC Backplane Temperature Sensor Typical Characteristics
Table 88. HSC Backplane Temperature Sensor Event Trigger Offset Next Steps
Table 89. Hard Disk Drive Monitoring Sensor Typical Characteristics
Table 90. Hard Disk Drive Monitoring Sensor Event Trigger Offset Next Steps
Table 91. HSC Health Sensor Typical Characteristics
145. r experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg somcdl private filename zip). Send the log file to your system manufacturer or Intel representative for failure analysis.

11.6 BMC FW Health Sensor
The BMC tracks the health of each of its IPMI sensors and reports failures through a BMC FW Health sensor of the IPMI 2.0 sensor type Management Subsystem Health, with support for the Sensor Failure offset. Only assertions are logged into the SEL for the Sensor Failure offset. The BMC Firmware Health sensor asserts for any sensor when 10 consecutive sensor errors are read. These are not standard sensor events (that is, threshold crossings or discrete assertions). They are BMC Hardware Access Layer (HAL) errors such as I2C NAKs or internal errors while attempting to read a register. If a successful sensor read is completed, the counter resets to zero.

Table 83: BMC FW Health Sensor Typical Characteristics
  Byte 11: Sensor Type = 28h (Management Subsystem Health)
  Byte 12: Sensor Number = 10h
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 6Fh
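The consecutive-error rule described for the BMC FW Health sensor can be sketched as a small counter. This is only an illustration of the logic stated above, not BMC firmware code: the class name `SensorHealthTracker` and its methods are hypothetical, and the choice to report the assertion only once per tracker is an assumption.

```python
class SensorHealthTracker:
    """Tracks consecutive read failures for one sensor (hypothetical helper)."""

    FAILURE_LIMIT = 10  # consecutive errors before Sensor Failure asserts, per the text

    def __init__(self) -> None:
        self.consecutive_errors = 0
        self.asserted = False  # assumption: the assertion is reported only once

    def record_read(self, ok: bool) -> bool:
        """Record one read attempt; return True if this read triggers the assertion."""
        if ok:
            # A successful read resets the counter to zero.
            self.consecutive_errors = 0
            return False
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.FAILURE_LIMIT and not self.asserted:
            self.asserted = True
            return True
        return False
```

Nine failures followed by one good read never assert, because the good read clears the count; ten failures in a row assert on the tenth.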
146. Sensor Cross Reference List (Sensor Number, Sensor Name, Details Section, Next Steps)
  A0h  PS1 Fan Tach 1
       Details: Power Supply Fan Tachometer Sensors
       Next steps: Power Supply Fan Tachometer Sensors Next Steps
  A1h  Power Supply 1 Fan Tachometer 2 (PS1 Fan Tach 2)
       Details: Power Supply Fan Tachometer Sensors
       Next steps: Power Supply Fan Tachometer Sensors Next Steps
  A2h  Intel Xeon Phi Coprocessor Status 1 (MIC 1 Status)
       Details: Intel Xeon Phi Coprocessor (MIC) Status Sensors
       Next steps: Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
  A3h  Intel Xeon Phi Coprocessor Status 2 (MIC 2 Status)
       Details: Intel Xeon Phi Coprocessor (MIC) Status Sensors
       Next steps: Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
  A4h  Power Supply 2 Fan Tachometer 1 (PS2 Fan Tach 1)
       Details: Power Supply Fan Tachometer Sensors
       Next steps: Power Supply Fan Tachometer Sensors Next Steps
  A5h  Power Supply 2 Fan Tachometer 2 (PS2 Fan Tach 2)
       Details: Power Supply Fan Tachometer Sensors
       Next steps: Power Supply Fan Tachometer Sensors Next Steps
  A6h  Intel Xeon Phi Coprocessor Status 3 (MIC 3 Status)
       Details: Intel Xeon Phi Coprocessor (MIC) Status Sensors
       Next steps: Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
  A7h  Intel Xeon Phi Coprocessor Status 4 (MIC 4 Status)
       Details: Intel Xeon Phi Coprocessor
147. racteristics
  Table 66: Legacy PCI Error Sensor Typical Characteristics
  Table 67: PCI Express Fatal Error Sensor Typical Characteristics
  Table 68: PCI Express Fatal Error 2 Sensor Typical Characteristics
  Table 69: PCI Express Correctable Error Sensor Typical Characteristics
  Table 70: System Event Sensor Typical Characteristics
  Table 71: POST Error Sensor Typical Characteristics
  Table 72: POST Error Code
  Table 73: Physical Security Sensor Typical Characteristics
  Table 74: Physical Security Sensor Event Trigger Offset Next Steps
  Table 75: FP NMI Interrupt Sensor Typical Characteristics
  Table 76: Button Sensor Typical Characteristics
  Table 77: IPMI Watchdog Sensor Typical Characteristics
  Table 78: IPMI Watchdog Sensor Event Trigger Offset Next Steps
148. report an actual temperature. These are linear, threshold-based sensors. In most Intel Server Systems, multiple sensors are defined: front panel temperature and baseboard temperature. Multiple other sensors can also be defined and are platform specific. Most of these sensors typically have upper and lower thresholds set: upper to warn in case of an over-temperature situation, lower to warn against sensor failure (temperature sensors typically read out 0 if they stop working).

Table 35: Temperature Sensors Typical Characteristics
  Byte 11: Sensor Type = 01h (Temperature)
  Byte 12: Sensor Number: see Table 37
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 01h = Threshold
  Byte 14: Event Data 1
    [7:6] 01b = Trigger reading in Event Data 2
    [5:4] 01b = Trigger threshold in Event Data 3
    [3:0] Event Trigger Offset, as described in Table 36
  Byte 15: Event Data 2: Reading that triggered event
  Byte 16: Event Data 3: Threshold value that triggered event

Table 36: Temperature Sensors Event Triggers Description
  Event Trigger Assertion Deassert Descrip
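The bit layout of bytes 13 and 14 shown in Table 35 is shared by most of the sensor tables in this guide, so it can help to split those two bytes mechanically before looking up sensor-specific meanings. The following is a minimal sketch; the function names are illustrative and not part of any Intel utility.

```python
def decode_event_dir_type(byte13: int) -> dict:
    """Split SEL byte 13 into event direction (bit 7) and event type (bits 6:0)."""
    return {
        "deassertion": bool(byte13 & 0x80),  # 0b = assertion, 1b = deassertion
        "event_type": byte13 & 0x7F,         # e.g. 01h = Threshold
    }


def decode_event_data1(byte14: int) -> dict:
    """Split SEL byte 14 (Event Data 1) into its three bit fields."""
    return {
        "ed2_code": (byte14 >> 6) & 0x3,  # bits 7:6, what Event Data 2 holds
        "ed3_code": (byte14 >> 4) & 0x3,  # bits 5:4, what Event Data 3 holds
        "offset": byte14 & 0x0F,          # bits 3:0, event trigger offset
    }
```

For a temperature event with byte 14 equal to 59h, the fields come out as 01b, 01b, and offset 9h, meaning the reading is in Event Data 2 and the threshold is in Event Data 3.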
149. PCI Express and Legacy PCI Subsystem
  Byte 14: Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Event Trigger Offset:
      0h = Data Link Layer Protocol Error
      1h = Surprise Link Down Error
      2h = Completer Abort
      3h = Unsupported Request
      4h = Poisoned TLP
      5h = Flow Control Protocol
      6h = Completion Timeout
      7h = Receiver Buffer Overflow
      8h = ACS Violation
      9h = Malformed TLP
      Ah = ECRC Error
      Bh = Received Fatal Message From Downstream
      Ch = Unexpected Completion
      Dh = Received ERR_NONFATAL Message
      Eh = Uncorrectable Internal
      Fh = MC Blocked TLP
  Byte 15: Event Data 2: PCI Bus number
  Byte 16: Event Data 3
    [7:3] PCI Device number
    [2:0] PCI Function number

The PCI Express Fatal Error 2 is a continuation of the PCI Express Fatal Error.

Table 68: PCI Express Fatal Error 2 Sensor Typical Characteristics
  Byte 8-9: Generator ID = 0033h (BIOS SMI Handler)
  Byte 11: Sensor Type = 13h (Critical Interrupt)
  Byte 12: Sensor Number = 14h
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion
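As an illustration of the Event Data layout above, the sketch below maps the offset in Event Data 1 to the listed error name and unpacks the bus, device, and function numbers from Event Data 2 and 3. The function name and the `bus:device.function` output format are assumptions chosen for readability; the offset names are copied from the list above.

```python
PCIE_FATAL_OFFSETS = {
    0x0: "Data Link Layer Protocol Error",
    0x1: "Surprise Link Down Error",
    0x2: "Completer Abort",
    0x3: "Unsupported Request",
    0x4: "Poisoned TLP",
    0x5: "Flow Control Protocol",
    0x6: "Completion Timeout",
    0x7: "Receiver Buffer Overflow",
    0x8: "ACS Violation",
    0x9: "Malformed TLP",
    0xA: "ECRC Error",
    0xB: "Received Fatal Message From Downstream",
    0xC: "Unexpected Completion",
    0xD: "Received ERR_NONFATAL Message",
    0xE: "Uncorrectable Internal",
    0xF: "MC Blocked TLP",
}


def decode_pcie_fatal(ed1: int, ed2: int, ed3: int) -> tuple:
    """Return (error name, 'bus:device.function') for a PCI Express Fatal Error event."""
    error = PCIE_FATAL_OFFSETS[ed1 & 0x0F]  # bits 3:0 of Event Data 1
    bus = ed2                               # Event Data 2: PCI bus number
    device = (ed3 >> 3) & 0x1F              # Event Data 3 bits 7:3
    function = ed3 & 0x07                   # Event Data 3 bits 2:0
    return error, f"{bus:02x}:{device:02x}.{function}"
```

For example, `decode_pcie_fatal(0xA4, 0x03, 0xE1)` identifies a Poisoned TLP at bus 03h, device 1Ch, function 1.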
150. rs the system for critical events by communicating with various sensors on the system board. It sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also remotely communicate with the BMC to take corrective action, such as resetting or power cycling the system to get a hung OS running again. These abilities reduce the total cost of ownership of a system.

For Intel Server Boards and Intel Server Platforms, the BMC supports the industry-standard IPMI 2.0 Specification, enabling you to configure, monitor, and recover systems remotely.

1.2.2.1 System Event Log (SEL)
The BMC provides a centralized, non-volatile repository for critical, warning, and informational system events, called the System Event Log or SEL. Because the BMC manages the SEL and logging functions, post-mortem logging information remains available if a failure occurs that disables the system processor(s). The BMC allows access to the SEL through in-band and out-of-band mechanisms. Several tools and utilities can be used to access the SEL, including the Intel SELView utility and multiple open-source IPMI tools.

1.2.3 Intel Intelligent Power N
151. s Correctable Error Sensor Next Steps

Sensor Cross Reference List (Sensor Number, Sensor Name, Details Section, Next Steps)
  06h  Intel Quick Path Interface Correctable Error
       Details: QPI Correctable Error Sensor
       Next steps: QPI Correctable Error Sensor Next Steps
  07h  Intel Quick Path Interface Fatal Error
       Details: QPI Fatal Error and Fatal Error 2
       Next steps: QPI Fatal Error and Fatal Error 2 Next Steps
  11h  Sparing Redundancy State
       Details: Sparing Redundancy State
       Next steps: Sparing Redundancy State Sensor Next Steps
  13h  Memory Parity Error
       Details: Memory Address Parity Error
       Next steps: Memory Address Parity Error Sensor Next Steps
  14h  PCI Express Fatal Error 2 (continuation of Sensor 04h)
       Details: PCI Express Fatal Errors and Fatal Error 2
       Next steps: PCI Express Fatal Error and Fatal Error 2 Sensor Next Steps
  17h  Intel Quick Path Interface Fatal Error 2 (continuation of Sensor 07h)
       Details: QPI Fatal Error and Fatal Error 2
       Next steps: QPI Fatal Error and Fatal Error 2 Next Steps
  83h  System Event
       Details: System Events
       Next steps: Not applicable

3.4 Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
The following table can be used to find the details of sensors owned by the Node Manager / Management Engine (ME) firmware. T
152. sors IO Module Temperature Threshold based Temperature 16h p threshold based Temperature Table 37 Temperature Sensors Next Steps I O Mod Temp Sensors PCI Ri T 1nresnola basea emperature 17h CAIRE Temperature Threshold based Temperature Table 37 Temperature Sensors Next Steps PCI Riser 3 Temp Sensors 14 Intel order number G90620 002 Revision 1 1 System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families Sensor Cross Reference List Sensor Sensor Name Details Section Next Steps Number PCI Riser 4 Temperature 4 p 18h PCI Riser 4 sets Tech based Temperature Table 37 Temperature Sensors Next Steps Baseboard 1 05V Processor3 Jnresnoid dased Vollage 19h Vccp e based Voltage Table 13 Threshold based Voltage Sensors Next Steps BB 1 05Vccp P3 Baseboard 1 05V Processor4 Jnresnoid based Vollage 1Ah Vccp Tee based Voltage Table 13 Threshold based Voltage Sensors Next Steps BB 1 05Vccp P4 gt Baseboard Temperature 1 B 20h Platform See e SE Temperature Table 37 Temperature Sensors Next Steps Front Panel Temperature B 21h Front Panel Ce NM EE Table 37 Temperature Sensors Next Steps SSB Temperature Threshold based Temperature 22h SSB Temp Sensors Table 37 Temperature Sensors Next Steps Baseboard Temperature 2
153. Table 37 (continued): Sensor Number, Sensor Name
  23h  Baseboard Temperature 2
  24h  Baseboard Temperature 3
  25h  Baseboard Temperature 4
  26h  I/O Mod Temp
  27h  PCI Riser 1 Temp
  28h  IO Riser Temp
  2Ch  PCI Riser 2 Temp
  2Dh  SAS Mod Temp
  2Eh  Exit Air Temp
  2Fh  LAN NIC Temp

5.2.2 Thermal Margin Sensors
Margin sensors are also linear sensors but typically report a negative value. This is not an actual temperature but an offset to a critical temperature. The reported value is the number of degrees below the critical temperature for the particular component.

The BMC supports DIMM aggregate temperature margin IPMI sensors. The temperature readings from the physical temperature sensors on each DIMM (such as Temperature Sensor on DIMM, or TSOD) are aggregated into IPMI temperature margin sensors for groupings of DIMM slots. The partitioning is platform/SKU specific and generally corresponds to fan domains.

The BMC also supports global aggregate temperature margin IPMI sensors. There may be as many unique global aggregate sensors as there are fan domains. Each sensor aggregates the readings of multiple other IPMI temperature sensors supported by the BMC FW. The mapping of child sensors into each global aggregate sensor is SDR configurable. The primary usage for these sensors is to trigger turning off fans when a lower threshold is reached.

Table 38: Thermal Margin Sensors Typical Characteristics
  Byte 11: Sensor Typ
154. t direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 09h = Digital Discrete
  Byte 14: Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Event Trigger Offset: 0h = RAS Configuration Disabled, 1h = RAS Configuration Enabled
  Byte 15: Event Data 2: Prior RAS Mode
    [7:4] Reserved
    [3:0] RAS Mode: 0h = None (Independent Channel Mode), 1h = Mirroring Mode, 2h = Lockstep Mode, 4h = Rank Sparing Mode
  Byte 16: Event Data 3: Selected RAS Mode
    [7:4] Reserved
    [3:0] RAS Mode: 0h = None (Independent Channel Mode), 1h = Mirroring Mode, 2h = Lockstep Mode, 4h = Rank Sparing Mode

7.3 Mirroring Redundancy State
Mirroring Mode protects memory data by full redundancy, keeping complete copies of all data on both channels of a Mirroring Domain (channel pair). If an Uncorrectable Error, which is normally fatal, occurs on one channel of a pair and the other channel is still intact and operational, then the Uncorrectable Error is demoted to a Correctable Error and the failed channel is disabled. Because the Mirror Domain is no longer redundant, a Mirroring Redundancy State SEL Event is logged.

Table 61: Mirroring Re
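The Memory RAS Mode Select fields above lend themselves to a small lookup. The sketch below turns Event Data 1 through 3 into a readable summary; the function name and the returned dictionary keys are illustrative assumptions, while the mode and offset values come from the table above.

```python
RAS_MODES = {
    0x0: "None (Independent Channel Mode)",
    0x1: "Mirroring Mode",
    0x2: "Lockstep Mode",
    0x4: "Rank Sparing Mode",
}

RAS_OFFSETS = {
    0x0: "RAS Configuration Disabled",
    0x1: "RAS Configuration Enabled",
}


def describe_ras_mode_select(ed1: int, ed2: int, ed3: int) -> dict:
    """Summarize a Memory RAS Mode Select event from its three event data bytes."""
    return {
        "status": RAS_OFFSETS.get(ed1 & 0x0F, "Unknown offset"),
        # Bits 7:4 of Event Data 2 and 3 are reserved, so only bits 3:0 are used.
        "prior_mode": RAS_MODES.get(ed2 & 0x0F, "Unknown mode"),
        "selected_mode": RAS_MODES.get(ed3 & 0x0F, "Unknown mode"),
    }
```

An event with Event Data 1 = A1h, Event Data 2 = 00h, and Event Data 3 = 01h therefore reads as a change from Independent Channel Mode to Mirroring Mode that left a RAS configuration enabled.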
155. the system

Sensor Specific Offset (Hex), Description, Next Steps
  04h  A/C Lost
       Description: AC power was removed.
       Next steps: Informational event.
  05h  Soft Power Control Failure
       Description: Generally means power good was lost in the system, causing a shutdown.
       Next steps: This could be caused by the power supply subsystem or system components.
         1. Verify all power cables and adapters are connected properly (AC cables as well as the cables between the PSU and system components).
         2. Cross test the PSU if possible.
         3. Replace the power subsystem.
  06h  Power Unit Failure
       Description: Power subsystem experienced a failure.
       Next steps: Indicates a power supply failed.
         1. Remove and reapply AC power.
         2. If the power supply still fails, replace it.

4.3.2 Power Unit Redundancy Sensor
This sensor is enabled on systems that support redundant power supplies. When a system has AC applied, or if it loses redundancy of the power supplies, a message is logged into the SEL.

Table 17: Power Unit Redundancy Sensors Typical Characteristics
  Byte 11: Sensor Type = 09h (Power Unit)
  Byte 12: Sensor Number = 02h
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event
156. ther standby voltages Sh BB 5 0V STBY 1 Ensure all cables are connected correctly 2 Ifthe issue remains replace the board 3 Ifthe issue remains replace the power supplies Revision 1 1 Intel order number GS0620 002 29 Power Subsystems System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel Xeon Processor E5 4600 2600 2400 1600 1 400 Product Families Sensor Sensor Name Next Steps Number 3 3V AUX is supplied by the main board 3 3V AUX is used by the BMC clock chips PCI E Slot on board NIC Intel C600 series Chipset and Baseboard 3 3V Auxiliary ICH D4h BB 3 3V AUX 1 Ensure all cables are connected correctly 2 If the issue remains replace the board 3 Ifthe issue remains replace the power supplies This 1 05V line is supplied by the main board This 1 05V line is used by processor 1 D6h Baseboard 1 05V Processor1 Vccp 1 Ensure all cables are connected correctly BB 1 05Vccp P1 2 Check the processor is seated properly 3 Cross test the processors If the issue remains with the processor socket replace the main board otherwise the processor This 1 05V line is supplied by the main board This 1 05V line is used by processor 2 D7h Baseboard 1 05V Processor2 Vccp 1 Ensure all cables are connected correctly BB 1 05Vccp P2 2 Check the processor is seated properly 3 Cross test the processors If the issue remains with the processor soc
157. Table 36 (continued): Event Trigger Offset (Hex), Description, Assertion Severity, Deassert Severity, Description
  00h  Lower non-critical going low. Assertion: Degraded. Deassert: OK. The temperature has dropped below its lower non-critical threshold.
  02h  Lower critical going low. Assertion: non-fatal. Deassert: Degraded. The temperature has dropped below its lower critical threshold.
  07h  Upper non-critical going high. Assertion: Degraded. Deassert: OK. The temperature has gone over its upper non-critical threshold.
  09h  Upper critical going high. Assertion: non-fatal. Deassert: Degraded. The temperature has gone over its upper critical threshold.

Table 37: Temperature Sensors Next Steps
  21h  Front Panel Temp
       If the front panel temperature reads zero, check:
         1. It is connected properly.
         2. The SDR has been programmed correctly for your chassis.
       If the front panel temperature is too high:
         1. Check the cooling of your server room.
  14h  Baseboard Temperature 5
  15h  Baseboard Temperature 6
  16h  I/O Mod2 Temp
  17h  PCI Riser 5 Temp
  18h  PCI Riser 4 Temp
  20h  Baseboard Temperature 1
  22h  SSB Temperature
       For the sensors above:
         1. Check for clear and unobstructed airflow into and out of the chassis.
         2. Ensure the SDR is programmed and the correct chassis has been selected.
         3. Ensure there are no fan failures.
         4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 C).
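To connect Table 36 with the byte layout of a threshold event, the sketch below looks up the trigger offset and picks the severity column based on the event direction bit. It assumes the standard IPMI names for offsets 00h, 02h, 07h, and 09h. The function name is hypothetical, and the reading and threshold are returned as raw bytes because converting them to degrees requires the sensor's SDR conversion factors, which are not covered here.

```python
# offset: (name, severity on assertion, severity on deassertion), per Table 36
THRESHOLD_OFFSETS = {
    0x00: ("Lower non-critical going low", "Degraded", "OK"),
    0x02: ("Lower critical going low", "non-fatal", "Degraded"),
    0x07: ("Upper non-critical going high", "Degraded", "OK"),
    0x09: ("Upper critical going high", "non-fatal", "Degraded"),
}


def describe_temperature_event(byte13: int, ed1: int, ed2: int, ed3: int) -> dict:
    """Interpret a temperature threshold event from SEL bytes 13 through 16."""
    name, on_assert, on_deassert = THRESHOLD_OFFSETS.get(
        ed1 & 0x0F, ("Unknown offset", "Unknown", "Unknown")
    )
    deassertion = bool(byte13 & 0x80)  # bit 7: 0b = assertion, 1b = deassertion
    return {
        "event": name,
        "deassertion": deassertion,
        "severity": on_deassert if deassertion else on_assert,
        "raw_reading": ed2,    # Event Data 2: reading that triggered the event
        "raw_threshold": ed3,  # Event Data 3: threshold that was crossed
    }
```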
158. tored in this record as F2h, 1Bh, 00h for bytes 8 through 10 respectively.

  Byte 11-16: OEM Defined. This is defined according to the manufacturer identified by the Manufacturer ID field.

Table 4: OEM SEL Record (Type E0h-FFh)
  Byte 1-2: Record ID (RID): ID used for SEL Record access
  Byte 3: Record Type (RT): [7:0] Record Type E0h-FFh = OEM system event record
  Byte 4-16: OEM Defined. This is defined by the system integrator.

2.2 Notes on SEL Logs and Collecting SEL Information
Whenever you capture the SEL log, always collect both the text (human-readable) version and the hex version. Because some of the data is OEM specific, some utilities cannot decode the information correctly. In addition, with some OEM-specific data there may be additional variables that are not decoded at all. An example of not decoding all of the informatio
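Because the record type byte decides how the rest of a SEL record must be read, a first step when working from a hex dump is to classify byte 3. The sketch below uses the type ranges from the IPMI 2.0 specification: 02h for system event records, C0h-DFh for timestamped OEM records, and E0h-FFh for the non-timestamped OEM records described in Table 4. The function name and the returned labels are illustrative.

```python
def classify_sel_record_type(record_type: int) -> str:
    """Classify SEL record byte 3 into the record families defined by IPMI 2.0."""
    if record_type == 0x02:
        return "system event"
    if 0xC0 <= record_type <= 0xDF:
        # Bytes 4-7 hold a timestamp and bytes 8-10 a Manufacturer ID.
        return "OEM timestamped"
    if 0xE0 <= record_type <= 0xFF:
        # Bytes 4-16 are entirely OEM defined.
        return "OEM non-timestamped"
    return "reserved or unknown"
```

A utility that only understands type 02h can still print the other families in hex, which is one reason to keep the hex version of the log alongside the text version.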
159. tup or due to unavailability of memory at POST, in which case POST error 8500 is also logged.
Next steps:
  1. applying the mirroring configuration to the memory. Check for other errors related to the memory and troubleshoot accordingly.
  2. If there is no POST error, mirror mode was simply disabled in BIOS setup and this should be considered informational only.

7.2 Memory RAS Mode Select
Memory RAS Mode Select events are logged to record changes in RAS Mode. When a RAS Mode selection is made that changes the RAS Mode (including selecting a RAS Mode from or to Independent Channel Mode), that change is logged to the SEL in a Memory RAS Mode Select event message, which records the previous RAS Mode (from) and the newly selected RAS Mode (to). The event also includes an Offset value in ED1 that indicates whether the mode change left the system with a RAS Mode active (Enabled) or not (Disabled: Independent Channel Mode selected). This sensor provides the Spare Channel mode RAS Configuration status. Memory RAS Mode Select is an informational event.

Table 60: Memory RAS Mode Select Sensor Typical Characteristics
  Byte 8-9: Generator ID = 0001h (BIOS POST)
  Byte 11: Sensor Type = 0Ch (Memory)
  Byte 12: Sensor Number = 12h
  Byte 13: Event Direction and Event Type
    [7] Even
160. 11.4 System Event PEF Action
The BMC is configurable to send alerts for events logged into the SEL. These alerts are configured through Platform Event Filters (PEF) and are disabled by default. PEF events are logged if the BMC takes action due to a PEF configuration. The BMC event triggering the PEF action will also be in the SEL. This functionality is built into the BMC to allow it to send alerts (SNMP or other) for any event that gets logged to the SEL. PEF filters have to be enabled manually using the Intel Deployment Assistant, the Intel syscfg utility, or an IPMI-aware utility.

Table 81: System Event PEF Action Sensor Typical Characteristics
  Byte 11: Sensor Type = 12h (System Event)
  Byte 12: Sensor Number = 08h
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 6Fh = Sensor Specific
  Byte 14: Event Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 4h = PEF Action
  Byte 15: Event Data 2: Not used
  Byte 16: Event Data 3: Not used

11.4.1 System Event PEF Action Next Steps
This event gets logged if the BMC takes an action due to PEF configuration. Actions can be sending an alert, possibly along with resetting, power cycling, or powering down the system. There will be another event that has led t
161. is the interconnect between processors. The QPI Link Width Reduced sensor is used by the BIOS POST to report when the link width has been reduced. Therefore the Generator ID will be 01h. The QPI Error sensors are reported by the BIOS SMI Handler to the BMC, so the Generator ID will be 33h.

6.4.1 QPI Link Width Reduced Sensor
BIOS POST has reduced the QPI Link Width because of an error condition seen during initialization.

Table 52: QPI Link Width Reduced Sensor Typical Characteristics
  Byte 8-9: Generator ID = 0001h (BIOS POST)
  Byte 11: Sensor Type = 13h (Critical Interrupt)
  Byte 12: Sensor Number = 09h
  Byte 13: Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type: 77h = OEM Discrete
  Byte 14: Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 1h = Reduced to 1/2 width, 2h = Reduced to 1/4 width
  Byte 15: Event Data 2: 0-3 = CPU1-4
  Byte 16: Event Data 3: Not used

6.4.1.1 QPI Link Width Reduced Sensor Next Steps
If the error continues:
  1. Check that the processor is installed correctly.
  2. Inspect the socket for bent pins.
  3. Cr
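A QPI Link Width Reduced event can be read directly from Table 52: the offset in Event Data 1 gives the new width and Event Data 2 gives the zero-based processor index. The following is a minimal sketch with a hypothetical function name and message format.

```python
QPI_WIDTH_OFFSETS = {
    0x1: "1/2 width",
    0x2: "1/4 width",
}


def describe_qpi_link_width_event(ed1: int, ed2: int) -> str:
    """Describe a QPI Link Width Reduced event from Event Data 1 and Event Data 2."""
    width = QPI_WIDTH_OFFSETS.get(ed1 & 0x0F)  # bits 3:0 hold the trigger offset
    if width is None or not 0 <= ed2 <= 3:
        return "Unrecognized QPI Link Width Reduced event data"
    # Event Data 2 values 0-3 map to CPU1-CPU4.
    return f"CPU{ed2 + 1}: QPI link reduced to {width}"
```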
162. upply Predictive Failure Event
3 Sensor Cross Reference List
  3.1 BMC owned Sensors (GID = 0020h)
  3.2 BIOS POST owned Sensors (GID = 0001h)
  3.3 BIOS SMI Handler owned Sensors (GID = 0033h)
  3.4 Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
  3.5 Microsoft OS owned Events (GID = 0041h)
  3.6 Linux Kernel Panic Events (GID = 0021h)
4 Power Subsystems
  4.1 Threshold-based Voltage Sensors
  4.2 Voltage Regulator Watchdog Timer Sensor
    4.2.1 Voltage Regulator Watchdog Timer Sensor Next Steps
  4.3 Power Unit
    4.3.1 Power Unit Status Sensor
    4.3.2 Power Unit Redundancy Sensor
    4.3.3 Node Auto Shutdown Sensor
  4.4 Power Supply
    4.4.1 Power Supply Status Sensors
    4.4.2 Power Supply Power In Sensors
    4.4.3 Power Supply Current Out Sensors
    4.4.4 Power Supply Temperature Sensors
    4.4.5 Power Supply Fan Tachometer Sensors
5 Cooling Subsystem
  5.1 Fan Sensors
    5.1.1 Fan Tachometer Sensors
    5.1.2
163. vent Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset: 3h = OS Graceful Shutdown
  Byte 15: Event Data 2: Not used
  Byte 16: Event Data 3: Not used

Table 101: Shutdown Reason OEM Event Record Typical Characteristics
  Byte 1-2: Record ID: ID used for SEL Record access
  Byte 3: Record Type: [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
  Byte 4-7: Timestamp: Time when the event was logged, LS byte first
  Byte 8-10: IPMI Manufacturer ID: 0137h (311d) = IANA enterprise number for Microsoft
  Byte 11: Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
  Byte 12-15: Shutdown Reason: Shutdown Reason code from the registry (LSB first): HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\ReasonCode
  Byte 16: Reserved: 00h

Table 102: Shutdown Comment OEM Event Record Typical Characteristics
  Byte 1-2: Record ID: ID used for SEL Record access
  Byte 3: Record Type: [7:0]
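The Shutdown Reason record in Table 101 is a fixed 16-byte layout, so it can be unpacked in one step. The sketch below assumes the byte ranges as reconstructed in that table (1-2, 3, 4-7, 8-10, 11, 12-15, 16) and little-endian ordering for every multi-byte field. The function name and the dictionary keys are illustrative and do not come from any Microsoft or Intel tool.

```python
import struct

MICROSOFT_IANA_ID = 0x000137  # 311 decimal, per Table 101


def parse_shutdown_reason_record(record: bytes) -> dict:
    """Unpack a 16-byte Windows Shutdown Reason OEM SEL record (assumed layout)."""
    if len(record) != 16:
        raise ValueError("a SEL record is exactly 16 bytes")
    # "<" means little-endian with no padding: 2 + 1 + 4 + 3 + 1 + 4 + 1 = 16 bytes.
    rec_id, rec_type, timestamp, mfr, seq, reason, _reserved = struct.unpack(
        "<HBI3sBIB", record
    )
    manufacturer_id = int.from_bytes(mfr, "little")
    return {
        "record_id": rec_id,
        "record_type": rec_type,
        "timestamp": timestamp,
        "manufacturer_id": manufacturer_id,
        "is_microsoft": manufacturer_id == MICROSOFT_IANA_ID,
        "sequence": seq,
        "shutdown_reason": reason,
    }
```

Checking `record_type == 0xDD` and `is_microsoft` before trusting the remaining fields helps avoid misreading OEM records from other vendors that share the same timestamped record family.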
