Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide
Contents
1. c. From the Event Logging Details screen, select View Event Log. All unread events are displayed.
   4. View the BMC system event log:
      a. From the BIOS Main Menu screen, select Advanced. The Advanced Settings screen is displayed.
      b. From the Advanced Settings screen, select IPMI 2.0 Configuration. The Advanced Menu IPMI 2.0 Configuration screen is displayed. The screen shows the BMC status (Status Of BMC: Working) and these menu items: View BMC System Event Log, Reload BMC System Event Log, Clear BMC System Event Log, LAN Configuration, PEF Configuration, and BMC Watch Dog Timer Action (Disabled). The help text for View BMC System Event Log reads "View all events in the BMC Event Log. It will take up to 60 seconds approx. to read all BMC SEL records."
      c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log. The log takes about 60
2. How DIMM Errors Are Handled by the System on page 12. Isolating and Correcting DIMM ECC Errors on page 18.
   DIMM Population Rules
   The DIMM population rules for the server are as follows:
   - Each CPU can support a maximum of eight DIMMs.
   - The DIMM slots are paired, and the DIMMs must be installed in pairs (0-1, 2-3, 4-5, and 6-7). See FIGURE 3-1 and FIGURE 3-2. The memory sockets are colored black or white to indicate which slots are paired by matching colors.
   - DIMMs are populated starting from the outside (away from the CPU) and working toward the inside.
   - CPUs with only a single pair of DIMMs must have those DIMMs installed in that CPU's outside white DIMM slots (6 and 7). See FIGURE 3-1 and FIGURE 3-2.
   - Only DDR2 800 MHz, 667 MHz, and 533 MHz DIMMs are supported.
   - Each pair of DIMMs must be identical (same manufacturer, size, and speed).
   DIMM Replacement Policy
   Replace a DIMM when one of the following events takes place:
   - The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs).
   - UCEs occur and investigation shows that the errors originated from memory.
   In addition, a DIMM should be replaced whenever more than 24 Correctable Errors (CEs) originate in 24 hours from a single DIMM and no other DIMM is showing further CEs.
   - If more than one DIMM has experienced multiple CEs, other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs. R
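   The population rules and the replacement threshold in this excerpt can be expressed as a small checker. What follows is a minimal sketch, not a Sun tool: the names Dimm, check_population, and should_replace are hypothetical, and the outside-to-inside pair order 6-7, 4-5, 2-3, 0-1 is an assumption inferred from the statement that slots 6 and 7 are the outside pair.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed outside-to-inside order; the guide only states that 6-7 is outermost.
PAIRS_OUTSIDE_IN = [(6, 7), (4, 5), (2, 3), (0, 1)]
SUPPORTED_SPEEDS_MHZ = {800, 667, 533}   # DDR2 speeds listed in the rules
CE_LIMIT_PER_24H = 24                    # "more than 24 CEs in 24 hours"


@dataclass(frozen=True)
class Dimm:
    """Hypothetical description of one installed DIMM."""
    manufacturer: str
    size_gb: int
    speed_mhz: int


def check_population(slots: Dict[int, Dimm]) -> List[str]:
    """Return rule violations for one CPU's slots 0-7 (empty list if none)."""
    problems: List[str] = []
    seen_empty_pair = False
    for a, b in PAIRS_OUTSIDE_IN:
        first, second = slots.get(a), slots.get(b)
        if first is None and second is None:
            seen_empty_pair = True
            continue
        if first is None or second is None:
            problems.append(f"slots {a}-{b}: DIMMs must be installed in pairs")
            continue
        if seen_empty_pair:
            problems.append(f"slots {a}-{b}: populated while an outer pair is empty")
        if first != second:
            problems.append(f"slots {a}-{b}: pair is not identical")
        if first.speed_mhz not in SUPPORTED_SPEEDS_MHZ:
            problems.append(f"slots {a}-{b}: unsupported speed {first.speed_mhz} MHz")
    return problems


def should_replace(ce_count_24h: int, other_dimms_with_ces: int) -> bool:
    """Apply the CE part of the replacement policy to a single DIMM."""
    return ce_count_24h > CE_LIMIT_PER_24H and other_dimms_with_ces == 0
```

   For example, check_population({6: d, 7: d}) returns an empty list for two identical supported DIMMs, while a pair installed only in slots 0 and 1 is reported as populated out of order. The UCE-based replacement conditions need investigation by a person and are deliberately left out of should_replace.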
3. This section lists facts and considerations about how the server handles system errors (SERR).
   - System error handling works through the HyperTransport Synch Flood Error mechanism on 8111 and 8131.
   - The following events happen during BIOS POST:
     - POST reports any previous system errors at the bottom of screen. See FIGURE D-4 for an example.
   FIGURE D-4 POST Screen, Previous System Error Listed
     BMC Firmware Revision: 1.00
     Checking NVRAM..
     Initializing USB Controllers .. Done
     Press F2 to run Setup (CTRL+E on Remote Keyboard)
     Press F12 to boot from the network (CTRL+N on Remote Keyboard)
     USB Device(s): 3 Keyboards, 3 Mice, 2 Storage Devices
     Auto-Detecting Pri Master..ATAPI CDROM
     Pri Master: DU 28SL 1 0A DE 8 Ultra DMA Mode 2
     Auto-detecting USB Mass Storage Devices ..
     Device 01: AMI Virtual CDROM
     Device 02: AMI Virtual Floppy
     2 USB mass storage devices found and configured.
     0085 BMC Responding
     1 Hyper Transport sync flood error occurred on last boot
     - PCI System Error (SERR) and HyperTransport Synch Flood Error are logged in DMI and the SP SEL. See the following sample output:
       SEL Record ID : 0a00
       Record Type : 00
       Timestamp : 08/10/2005 06:05:32
       Generator ID : 0001
       EvM Revision : 04
       Sensor Type : Critical Interrupt
       Sensor Number : 00
       Event Type : Sensor-specific Discrete
       Event Direction : Assertion Event
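   The sample SEL output above is a list of field/value pairs, so it is straightforward to turn into a dictionary for scripting. The sketch below assumes the record has been captured as text with one "Field : Value" pair per line; that layout, the timestamp punctuation, and the helper names parse_sel_record and is_critical_interrupt are assumptions for illustration, not part of the guide.

```python
from typing import Dict

# Sample record in the assumed "Field : Value" layout, using the values
# shown in the guide's SEL example.
SAMPLE_RECORD = """\
SEL Record ID : 0a00
Record Type : 00
Timestamp : 08/10/2005 06:05:32
Generator ID : 0001
EvM Revision : 04
Sensor Type : Critical Interrupt
Sensor Number : 00
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
"""


def parse_sel_record(text: str) -> Dict[str, str]:
    """Split each 'Field : Value' line into a dictionary entry."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first " : " only, so colons inside the timestamp survive.
        key, separator, value = line.partition(" : ")
        if separator:
            record[key.strip()] = value.strip()
    return record


def is_critical_interrupt(record: Dict[str, str]) -> bool:
    """True for an asserted Critical Interrupt event, such as a PCI SERR."""
    return (record.get("Sensor Type") == "Critical Interrupt"
            and record.get("Event Direction", "").startswith("Assertion"))
```

   Calling parse_sel_record(SAMPLE_RECORD) yields a dictionary whose "Sensor Type" entry is "Critical Interrupt", which is the field to filter on when looking for SERR and sync flood events.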
4. - The system enters Halt mode and the following message is displayed:
     Warning: Bad Mix of Processors
     Multiple core processors cannot be installed with single core processors.
     Fatal Error. System Halted.
   Hardware Error Handling Summary
   TABLE D-1 summarizes the most common hardware errors that you might encounter with these servers.
   TABLE D-1 Hardware Error Handling Summary
   Error: SP failure
     Description: The SP fails to boot upon application of system power.
     Handling: The SP controls the system reset, so the system may power on but will not come out of reset. During power up, the SP's boot loader turns on the power LED. During SP boot (Linux startup and SP sanity check), the power LED blinks. The LED is turned off when SP management code (the IPMI stack) is started. At exit of BIOS POST, the LED goes to STEADY ON state.
     Logged (DMI Log or SP SEL): Not logged
     Fatal?: Fatal
   Error: SP failure
     Description: SP boots but fails POST.
     Handling: The SP controls the system RESET, so the system will not come out of reset.
     Logged (DMI Log or SP SEL): Not logged
     Fatal?: Fatal
   Error: BIOS POST failure
     Description: Server BIOS does not pass POST.
     Handling: There are fatal and non-fatal errors in POST. The BIOS does detect some errors that are announced during POST as POST codes on the bottom right corner of the display, on the serial console, and on the video display. Some POST codes are forwarded to the SP for logging. The POST codes do not come out in sequen
5. Event Data : 05FFFE
   Description : PCI SERR
   - FIGURE D-5 shows an example DMI log screen (from the BIOS Setup Page) with a system error.
   FIGURE D-5 DMI Log Screen with Error
     BIOS SETUP UTILITY, Advanced, View Event Log: 09/12/05 14:23:47 A Hyper Transport sync flood error occurred on last boot
   Handling Mismatching Processors
   This section lists facts and considerations about how the server handles mismatching processors.
   - The BIOS performs a complete POST.
   - The BIOS displays a report of any mismatching CPUs, as shown in the following example:
     AMIBIOS (C) 2003 American Megatrends, Inc.
     BIOS Date: 08/10/05 14:51:11 Ver: 08.00.10
     CPU: AMD Opteron(tm) Processor 254, Speed: 2.4 GHz, Count: 3
     CPU Revision, CPU0: E4, CPU1: E6
     Microcode Revision, CPU0: 0, CPU1: 0
     DRAM Clocking, CPU0 = 400 MHz, CPU1 Core0/1 = 400 MHz
     Sun Fire Server
     1 AMD North Bridge, Rev E
     1 AMD North Bridge, Rev E6
     1 AMD 8111 I/O Hub, Rev C2
     2 AMD 8131 PCI-X Controllers, Rev B2
     System Serial Number: 0505AMF028
     BMC Firmware Revision: 1.00
     Checking NVRAM..
     Initializing USB Controllers .. Done
     Press F2 to run Setup (CTRL+E on Remote Keyboard)
     Press F12 to boot from the network (CTRL+N on Remote Keyboard)
     Press F8 for BBS POPUP (CTRL+P on Remote Keyboard)
   - No SEL or DMI event is recorded.
6. Preface
   The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using available tools to diagnose problems with the servers.
   Before You Read This Document
   It is important that you review the safety guidelines in the Sun Fire X4140, X4240, and X4440 Safety and Compliance Guide.
   Related Documentation
   The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can also find the documentation at http://docs.sun.com.
   Translated versions of some of these documents are available at http://docs.sun.com. Select a language from the drop-down list and navigate to the Sun Fire X4140, X4240, and X4440 Servers document collection using the Product category link. Available translations for the Sun Fire X4140, X4240, and X4440 Servers include Simplified Chinese, Traditional Chinese, French, Japanese, and Korean.
   English documentation is revised more frequently and might be more up-to-date than the translated documentation. For all Sun documentation, go to the following URL: http://docs.sun.com
   Typographic Conventions
   Typeface: AaBbCc123. Meaning: The names of commands, files, and directories; onscreen computer output
7. AC power cords are attached firmly to the server's power supplies and to the AC sources.
   - Check that the main cover is firmly in place. There is an intrusion switch on the motherboard that automatically shuts down the server power to standby mode when the cover is removed.
   Externally Inspecting the Server
   To perform a visual inspection of the external system:
   1. Inspect the external status indicator LEDs, which can indicate component malfunction. For the LED locations and descriptions of their behavior, see External Status Indicator LEDs on page 37.
   2. Verify that nothing in the server environment is blocking air flow or making a contact that could short out power.
   3. If the problem is not evident, continue with the next section, Internally Inspecting the Server on page 4.
   Internally Inspecting the Server
   To perform a visual inspection of the internal system:
   1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2.
      - Graceful shutdown: Use a ballpoint pen or other stylus to press and release the Power button on the front panel. This causes Advanced Configuration and Power Interface (ACPI) enabled operating systems to perform an orderly shutdown of the operating system. Servers not running ACPI-enabled operating systems will shut down to standby power mode immediately.
      - Emergen
8. Figure Legend
   1 Locator LED/Locator button: White
   2 Service Required LED: Amber
   3 Power OK LED: Green
   4 Rear PS LED: Amber (power supply fault)
   5 System Over Temperature LED: Amber
   6 Top Fan LED: Amber (service action required on fan(s))
   Back Panel LEDs
   FIGURE B-2 Back Panel LEDs (X4140 shown)
   Figure Legend
   1 Power Supply LEDs: Power Supply OK: Green. Power Supply Fail: Amber. AC OK: Green.
   2 Locator LED/Button
   3 Service Required LED
   4 Power OK LED
   5 Ethernet Port LEDs: Left side: Green indicates link activity. Right side: Green indicates link activity; Amber indicates link is operating at less than maximum speed.
   Hard Drive LEDs
   FIGURE B-3 Hard Drive LEDs
   Figure Legend
   1 Ready to remove LED: Blue (service action is allowed)
   2 Fault LED: Amber (service action is required)
   3 Status LED: Green (blinks when data is being transferred)
   Internal Status Indicator LEDs
   The server has internal status indicators on the motherboard and on the mezzanine board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see FIGURE B-5.
   - The DIMM Fault LEDs indicate a problem with the corresponding DIMM. They are located next to the DIMM ejector handles. When you press the Press to See Fault button, if there is a problem with a DIMM, the corresponding DIMM Fault LED flashes. See DIMM
9. LED and System signal upon Overheat Fault LED blink detecting an overtemp condition Boot device The BIOS is not able The BIOS goes to the next boot device in the DMI Log Non fatal failure to boot from a device in the boot device list list If all devices in the list fail an error message is displayed retry from beginning of list SP can control change boot order Appendix D Error Handling 67 68 Sun Fire X4140 X4240 and X4440 Servers Diagnostics Guide August 2008 Index B BIOS changing POST options 28 event logs 21 POST code checkpoints 33 POST codes 31 POST overview 25 redirecting console output for POST 26 Bootable Diagnostics CD 8 C comments and suggestions x component inventory viewing with ILOM SP GUI 48 console output redirecting 26 correctable errors handling 56 D diagnostic software Bootable Diagnostics CD 8 SunVTS 7 DIMMs error handling 12 fault LEDs 15 isolating errors 18 population rules 11 E emergency shutdown 4 error handling correctable 56 DIMMs 12 hardware errors 64 mismatching processors 63 parity errors 59 system errors 61 uncorrectable errors 53 event logs BIOS 21 external inspection 3 external LEDs 37 F faults DIMM 15 FRU inventory viewing with ILOM SP GUI 48 G gathering service visit information 2 general troubleshooting guidelines 2 graceful shutdown 4 guidelines for troubleshooting 2 H hardwa
10. gt amp o SUN microsystems Sun Fire X4140 X4240 and X4440 Servers Diagnostics Guide Sun Microsystems Inc www sun com Part No 820 3067 11 August 2008 Revision A Submit comments about this document at http www sun com hwdocs feedback Copyright 2008 Sun Microsystems Inc 4150 Network Circle Santa Clara California 95054 U S A All rights reserved Unpublished rights reserved under the Copyright Laws of the United States THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS INC USE DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SUN MICROSYSTEMS INC This distribution may include materials developed by third parties Sun Sun Microsystems the Sun logo Java Solaris Sun Fire 4140 Sun Fire 4240 and Sun Fire 4440 are trademarks or registered trademarks of Sun Microsystems Inc in the U S and other countries AMD Opteron and Opteron are trademarks of Advanced Micro Devices Inc Intel is a registered trademark of Intel Corporation This product is covered and controlled by U S Export Control laws and may be subject to the export or import laws in other countries Nuclear missile chemical biological weapons or nuclear maritime end uses or end users whether direct or indirect are strictly prohibited Export or reexport to countries subject to U S embargo or to entities identified on U S export exclusion lists including but not
11. in BMC: XXX.XXX.XXX.XXX
       - If you choose Static to assign the IP address manually, perform the following steps:
         i. Type the IP address in the IP Address field. You can also enter the subnet mask and default gateway settings in their respective fields.
         ii. Select Commit and press Return to commit the changes.
         iii. Select Refresh and press Return to see your new settings displayed in the Current IP address in BMC field.
    6. Start a web browser and type the service processor's IP address in the browser's URL field.
    7. When you are prompted for a user name and password, type the following:
       - User Name: root
       - Password: changeme
       The Sun Integrated Lights Out Manager main GUI screen is displayed.
    8. Click the Remote Control tab.
    9. Click the Redirection tab.
    10. Set the color depth for the redirection console at either 6 or 8 bits.
    11. Click the Start Redirection button.
    12. When you are prompted for a user name and password, type the following:
       - User Name: root
       - Password: changeme
       The current POST screen is displayed.
    Changing POST Options
    These instructions are optional, but you can use them to change the operations that the server performs during POST testing.
    To change POST options:
    1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power on
12. limited to the denied persons and specially designated nationals lists is strictly prohibited Use of any spare or replacement CPUs is limited to repair or one for one es of CPUs in products exported in compliance with U S export laws Use of CPUs as product upgrades unless authorized by the U S Government is strictly prohibited Copyright 2008 Sun Microsystems Inc 4150 Network Circle Santa Clara California 95054 Etats Unis Tous droits r serv s Non publie droits r serv s selon la l gislation des Etats Unis sur le droit d auteur CE PRODUIT CONTIENT DES INFORMATIONS CONFIDENTIELLES ET DES SECRETS COMMERCIAUX DE SUN MICROSYSTEMS INC SON UTILISATION SA DIVULGATION ET SA REPRODUCTION SONT INTERDITES SANS L AUTORISATION EXPRESSE ECRITE ET PREALABLE DE SUN MICROSYSTEMS INC Cette distribution peut inclure des l ments d velopp s par des tiers Sun Sun Microsystems le logo Sun Java Solaris et Sun Fire 4140 Sun Fire 4240 and Sun Fire 4440 sont des marques de fabrique ou des marques d pos es de Sun Microsystems Inc aux Etats Unis et dans d autres pays AMD Opteron et Opteron sont marques d pos es de Advanced Micro Devices Inc Intel est une marque d pos e de Intel Corporation Ce produit est soumis la l gislation am ricaine sur le contr le des exportations et peut tre soumis la r glementation en vigueur dans d autres pays dans le domaine des exportations et importations Les utilisations finales
13. or front panel for 5 seconds to initiate a "push-to-test" mode that illuminates all other LEDs, both inside and outside of the chassis, for 15 seconds.
    4. Verify that there are no loose or improperly seated components.
    5. Verify that all cable connectors inside the system are firmly and correctly attached to their appropriate connectors.
    6. Verify that any after-factory components are qualified and supported. For a list of supported PCI cards and DIMMs, refer to your server's service manual.
    7. Check that the installed DIMMs comply with the supported DIMM population rules and configurations, as described in DIMM Population Rules on page 11.
    8. Replace the server cover.
    9. To restore the server to main power mode (all components powered on), use a ballpoint pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1 and FIGURE 1-2. When main power is applied to the full server, the Power OK LED next to the Power button lights and remains lit.
    10. If the problem with the server is not evident, you can obtain additional information by viewing the power-on self-test (POST) messages and BIOS event logs during system startup. Continue with Viewing Event Logs on page 21.
    CHAPTER 2 Using SunVTS Diagnostic Software
    This chapter contains information about t
14. pattern 55aa55aa).
    Note: Enabling Quick Boot causes the BIOS to skip the memory test. See Changing POST Options on page 28 for more information.
    Note: Because the server can contain up to 64 GB of memory (128 GB for the X4440), the memory test can take several minutes. You can cancel POST testing by pressing any key during POST.
    3. The BIOS polls the memory controllers for both correctable and uncorrectable memory errors and logs those errors into the service processor.
    Redirecting Console Output
    Use the following instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read.
    1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.
    2. Select the Advanced menu tab. The Advanced Settings screen is displayed.
    3. Select IPMI 2.0 Configuration. The IPMI 2.0 Configuration screen is displayed.
    4. Select the LAN Configuration menu item. The LAN Configuration screen displays the service processor's IP address.
    5. To configure the service processor's IP address (optional):
       a. Select the IP Assignment option that you want to use (DHCP or Static).
          - If you choose DHCP, the server's IP address is retrieved from your network's DHCP server and displayed using the following format: Current IP address
15. seconds to generate, then it is displayed on the screen.
    5. If the problem with the server is not evident, continue with Using the ILOM Service Processor GUI to View System Information on page 43 or Viewing ILOM SP Event Logs on page 45.
    Power-On Self-Test (POST)
    The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to operate are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel dual Gigabit Ethernet controllers are initialized.
    The progress of the self-test is indicated by a series of POST codes. These codes are displayed at the bottom right corner of the system's VGA screen (once the self-test has progressed far enough to initialize the system video). However, the codes are displayed as the self-test runs and scroll off of the screen too quickly to be read. An alternate method of displaying the POST codes is to redirect the output of the console to a serial port (see Redirecting Console Output on page 26).
    How BIOS POST Memory Testing Works
    The BIOS POST memory testing is performed as follows:
    1. The first megabyte of DRAM is tested by the BIOS before the BIOS code is shadowed (that is, copied from ROM to DRAM).
    2. Once executing out of DRAM, the BIOS performs a simple memory test (a write/read of every location with the
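    The write/read test described in step 2 can be illustrated with a short simulation. This is only a sketch under stated assumptions: it models memory as a list of 32-bit words in Python, uses the pattern 55aa55aa that the guide names elsewhere, and the classes Memory and StuckBitMemory are hypothetical stand-ins, not a description of how the BIOS is implemented.

```python
from typing import List

PATTERN = 0x55AA55AA  # test pattern named in the guide


class Memory:
    """Hypothetical word-addressable memory that behaves correctly."""

    def __init__(self, n_words: int) -> None:
        self._words = [0] * n_words

    def __len__(self) -> int:
        return len(self._words)

    def write(self, index: int, value: int) -> None:
        self._words[index] = value & 0xFFFFFFFF

    def read(self, index: int) -> int:
        return self._words[index]


class StuckBitMemory(Memory):
    """Memory in which some bits of one word always read back as 0."""

    def __init__(self, n_words: int, bad_index: int, stuck_mask: int) -> None:
        super().__init__(n_words)
        self.bad_index = bad_index
        self.stuck_mask = stuck_mask

    def read(self, index: int) -> int:
        value = super().read(index)
        if index == self.bad_index:
            value &= ~self.stuck_mask  # clear the stuck-at-0 bits
        return value


def pattern_test(memory: Memory, pattern: int = PATTERN) -> List[int]:
    """Write the pattern to every word, read it back, return failing indexes."""
    failures = []
    for index in range(len(memory)):
        memory.write(index, pattern)
        if memory.read(index) != pattern:
            failures.append(index)
    return failures
```

    With a healthy Memory(16) the function returns an empty list; with StuckBitMemory(16, 5, 0x2) it reports word 5, because bit 1 is set in the pattern but reads back as 0. A single fixed pattern cannot catch every fault (a bit stuck at the value the pattern already has goes unnoticed), which is consistent with the guide calling this a simple test.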
16. self-test (POST). The BIOS Main menu screen is displayed.
    2. Select Boot. The Boot Settings screen is displayed.
       Screen contents: menu tabs Main, Advanced, PCIPnP, Boot, Security, Chipset, Exit. Boot Settings items: Boot Settings Configuration, Boot Device Priority, Hard Disk Drives, CD/DVD Drives. Help text: "Configure Settings during System Boot." Keys: Select Screen, Select Item, Enter (Go to Sub Screen), F1 (General Help), F10 (Save and Exit), ESC (Exit). v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.
    3. Select Boot Settings Configuration. The Boot Settings Configuration screen is displayed.
       Screen contents: Boot Settings Configuration. Help text: "Allows BIOS to skip certain tests while booting. This will decrease the time". Items: Quick Boot: Disabled. Quiet Boot: Disabled. AddOn ROM Display Mode: Force
17. system timer interrupt. Traps INT1Ch vector to "POSTINT1ChHandlerBlock."
    C0  Early CPU Init Start. Disable Cache. Init Local APIC.
    C1  Set up boot strap processor information.
    C2  Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user requested value for GART Error Reporting setup question.
    C3  Errata workarounds applied to the BSP (78 & 110).
    C5  Enumerate and set up application processors. This includes microcode loading and workarounds for errata (78, 110, 106, 107, 69, 63).
    C6  Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata 106, 107, 69, and 63 if appropriate. In case of mixed CPU steppings, errors are sought and logged, and an appropriate frequency for all CPUs is found and applied. NOTE: APs are left in the CLI-HLT state.
    C7  The HT sets link frequencies and widths to their final values. This routine gets called after CPU frequency has been calculated to prevent bad programming.
    0A  Initializes the 8042 compatible Keyboard Controller.
    0B  Detects the presence of PS/2 mouse.
    0C  Detects the presence of Keyboard in KBC port.
    TABLE A-2 POST Code Checkpoints (Continued)
    Post Code  Description
    0E  Testing and initialization of different Input Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all a
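    When POST codes are captured from a redirected console, a lookup table makes them easier to read. The sketch below transcribes only the checkpoints visible in this excerpt, with shortened descriptions; the names POST_CHECKPOINTS and describe_post_code are hypothetical, and codes outside the excerpt are reported as unknown rather than guessed.

```python
# Checkpoint codes from the TABLE A-2 excerpt above (descriptions shortened).
POST_CHECKPOINTS = {
    0xC0: "Early CPU Init Start: disable cache, init local APIC",
    0xC1: "Set up boot strap processor information",
    0xC2: "Set up boot strap processor for POST",
    0xC3: "Errata workarounds applied to the BSP",
    0xC5: "Enumerate and set up application processors",
    0xC6: "Re-enable cache for boot strap processor",
    0xC7: "HT sets link frequencies and widths to their final values",
    0x0A: "Initialize the 8042 compatible keyboard controller",
    0x0B: "Detect the presence of PS/2 mouse",
    0x0C: "Detect the presence of keyboard in KBC port",
    0x0E: "Test and initialize input devices",
}


def describe_post_code(code: str) -> str:
    """Map a two-digit hexadecimal POST code string to its description."""
    value = int(code, 16)  # accepts "C0", "c0", "0a", and so on
    return POST_CHECKPOINTS.get(value, "unknown checkpoint (not in this excerpt)")
```

    For example, describe_post_code("C5") returns the application-processor entry, and a code that is not listed here returns the "unknown checkpoint" string. Parsing the code as hexadecimal also guards against the common transcription slip of writing the letter O for the digit 0.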
18. the default password, changeme. Once you have successfully logged in to the SP, it displays its default command prompt:
       ->
    4. To start the serial console, type the following commands:
       cd /SP/console
       start
       To exit console mode and return to the service processor, type Escape-Shift-9.
    - Continue with the following procedures:
      - Viewing ILOM SP Event Logs on page 45
      - Viewing Replaceable Component Information on page 48
      - Viewing Sensors on page 50
    Viewing ILOM SP Event Logs
    Events are notifications that occur in response to some actions. The IPMI system event log (SEL) provides status information about the server's hardware and software to the ILOM software, which displays the events in the ILOM web GUI.
    To view event logs:
    1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI.
       a. Type the IP address of the server's SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
       b. Type your user name and password. When you first try to access the ILOM SP, you are prompted to type the default user name and password. The default user name and password are:
          Default user name: root
          Default password: changeme
    2. From the System Monitoring tab, select Event Logs. The System Event Logs page is displayed. See FIGURE C-1 for a page that shows sample information.
19. the network. SunVTS software also provides a TTY-mode interface for situations in which running a GUI is not possible.
    SunVTS Documentation
    For the most up-to-date information on SunVTS software, go to http://docs.sun.com/app/docs/prod/test.validate
    Diagnosing Server Problems With the Bootable Diagnostics CD
    SunVTS 6.4 or later software is preinstalled on your server. The server is also shipped with the Bootable Diagnostics CD. This CD is designed so that the server will boot from the CD. This CD boots and starts SunVTS software. Diagnostic tests run and write output to log files that the service technician can use to determine the problem with the server.
    Requirements
    - To use the diagnostics CD, you must have a keyboard, mouse, and monitor attached to the server on which you are performing diagnostics, or available through a remote KVM.
    Using the Bootable Diagnostics CD
    To use the diagnostics CD to perform diagnostics:
    1. With the server powered on, insert the CD into the DVD-ROM drive.
    2. Reboot the server, and press F2 during the start of the reboot so that you can change the BIOS setting for boot device priority.
    3. When the BIOS Main menu appears, navigate to the BIOS Boot menu. Instructions for navigating within the BIOS screens appear on the BIOS screens.
    4. On the BIOS Boot menu screen, select Boot Device Priority. The Boot Device
20. to See Fault button on the motherboard or the mezzanine board, LEDs next to the DIMMs flash to indicate that the system has detected 24 or more CEs in a 24-hour period on that DIMM.
    Note: The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an hour.
    Note: Disconnecting the AC power removes the fault indication. To recover fault information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager 2.0 User's Guide.
    - DIMM fault LED is off: The DIMM is operating properly.
    - DIMM fault LED is flashing (amber): At least one of the DIMMs in this DIMM pair has reported 24 CEs within a 24-hour period.
    - Motherboard Fault LED on mezzanine is on: There is a fault on the motherboard. This LED is there because you cannot see the motherboard LEDs when the mezzanine board is present.
    Note: The Motherboard Fault LED operates independently of the Press to See Fault button and does not operate on stored power.
    See FIGURE 3-1 for the locations of DIMMs and LEDs on the motherboard. See FIGURE 3-2 for the locations of DIMMs and LEDs on the mezzanine board.
    FIGURE 3-1 DIMMs and LEDs on Mot
21. to see details of the error.
    - Solaris: Solaris FMA reports and sometimes retires memory with correctable Error Correction Code (ECC) errors. See your Solaris Operating System documentation for details. Use the command fmdump -eV to view ECC errors.
    - Linux: The HERD utility can be used to manage DIMM errors in Linux. See the x64 Servers Utilities Reference Manual for details. If HERD is installed, it copies messages from /dev/mcelog to /var/log/messages. If HERD is not installed, a program called mcelog copies messages from /dev/mcelog to /var/log/mcelog.
    - The Bootable Diagnostics CD described in Chapter 2 also captures and logs CEs.
    BIOS DIMM Error Messages
    The BIOS displays and logs the following DIMM error messages:
    NODE-n Memory Configuration Mismatch
    The following conditions will cause this error message:
    - The DIMMs mode is not paired (running in 64-bit mode instead of 128-bit mode).
    - The DIMMs speed is not same.
    - The DIMMs do not support ECC.
    - The DIMMs are not registered.
    - The MCT stopped due to errors in the DIMM.
    - The DIMM module type (buffer) is mismatched.
    - The DIMM generation (I or II) is mismatched.
    - The DIMM CL/T is mismatched.
    - The banks on a two-sided DIMM are mismatched.
    - The DIMM organization is mismatched (128-bit).
    - The SPD is missing Trc or Trfc information.
    DIMM Fault LEDs
    When you press the Press
22. 0000h shadow RAM cacheability. Ported to handle any OEM specific programming needed during End POST. Copy OEM specific data from POST_DSEG to RUN_CSEG.
    TABLE A-2 POST Code Checkpoints (Continued)
    Post Code  Description
    B1  Save system context for ACPI.
    00  Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI-HLT state.
    61-70  OEM POST Error. This range is reserved for chipset vendors and system manufacturers. The error associated with this value may be different from one platform to the next.
    APPENDIX B Status Indicator LEDs
    This appendix contains information about the locations and behavior of the LEDs on the server. It describes the external LEDs that can be viewed on the outside of the server and the internal LEDs that can be viewed only with the main cover removed.
    External Status Indicator LEDs
    See the following figures and tables for information about the LEDs that are viewable on the outside of the server.
    - FIGURE B-1 shows and describes the front panel LEDs.
    - FIGURE B-2 shows and describes the back panel LEDs.
    - FIGURE B-3 shows and describes the hard drive LEDs.
    - FIGURE B-4 and FIGURE B-5 show the location of the internal LEDs.
    Front Panel LEDs
    FIGURE B-1 Front Panel LEDs (X4140 shown)
23. Typeface: AaBbCc123. Meaning: What you type, when contrasted with onscreen computer output. Examples: su, Password:
    Typeface: AaBbCc123. Meaning: Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. Examples: Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
    Examples for the first typeface (names of commands, files, and directories; onscreen computer output): Edit your .login file. Use ls -a to list all files. You have mail.
    Note: The settings on your browser might differ from these settings.
    Third-Party Web Sites
    Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.
    Sun Welcomes Your Comments
    Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to http://www.sun.com/hwdocs/feedback
    Please include the title and part number of your document with your feedback: Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide, part number 820-3067-11
24. BIOS. Help text (continued): "needed to boot the system." Remaining items: Bootup Num Lock: On. Wait For F1 If Error: Disabled. Interrupt 19 Capture: Enabled. Keys: Select Screen, Select Item, Change Option, General Help, F10 (Save and Exit), ESC (Exit). v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.
    4. On the Boot Settings Configuration screen, there are several options that you can enable or disable:
       - Quick Boot: This option is disabled by default. If you enable this, the BIOS skips certain tests while booting, such as the extensive memory test. This decreases the time it takes for the system to boot.
       - Quiet Boot: This option is disabled by default. If you enable this, the Sun Microsystems logo is displayed instead of POST codes.
       - Add On ROM Display Mode: This option is set to Force BIOS by default. This option has effect only if you have also enabled the Quiet Boot option, but it controls whether output from the Option ROM is displayed. The two settings for this option are as follows:
         - Force BIOS: Remove the Sun logo and display Option ROM output.
         - Keep Current: Do not remove the Sun logo. The Option ROM output is not displayed.
       - Boot Num Lock: This option is On by default (keyboard Num Lock is turned on during boot). If
25. Fault LEDs on page 15 for details.
- The CPU Fault LEDs indicate a problem with the corresponding CPU. When you press the Press to See Fault button, if there is a problem with a CPU, the corresponding CPU Fault LED flashes.
Note: The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an hour.
- The Motherboard Fault LED on the mezzanine board indicates that there is a problem with the motherboard.
Note: The mezzanine board, when present, obscures part of the motherboard, including the LEDs. The Motherboard Fault LED indicates that one or more of the LEDs on the motherboard is active.
FIGURE B-4 DIMMs and LEDs on Motherboard [figure omitted]
FIGURE B-5 DIMMs and LEDs on Mezzanine Board [figure omitted]
APPENDIX C: Using the ILOM Service Processor GUI to View System Information. This appendix contains information about using the Integrated Lights Out Manager (ILOM) service processor (SP) GUI to view mon
26. Linux NMI trap catches the interrupt and reports the following "NMI confusion report" sequence:
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 1.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 1.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Note: The Linux system reboots but does not inform the BIOS of this incident.
Handling of System Errors (SERR)
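The report sequence above can be picked out of a saved system log mechanically. The following is a minimal sketch, not part of the guide: it assumes the log lines carry the kernel message text quoted above, and the function name `find_nmi_events` and the host name in the usage note are hypothetical.

```python
import re

# The pattern follows the kernel message quoted in the guide. The exact
# punctuation of real syslog lines is an assumption.
NMI_RE = re.compile(
    r"NMI received for unknown reason\s+(?P<reason>[0-9a-fA-F]{2})"
    r"(?:\s+on CPU\s+(?P<cpu>\d+))?"
)


def find_nmi_events(lines):
    """Return a (reason, cpu) tuple for each unknown-reason NMI report.

    reason is the two-digit hex code as a lowercase string; cpu is an int,
    or None when the line does not name a CPU.
    """
    events = []
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            cpu = match.group("cpu")
            events.append(
                (match.group("reason").lower(),
                 int(cpu) if cpu is not None else None)
            )
    return events
```

Feeding it the lines of a messages file, for example `find_nmi_events(open(path))`, yields one tuple per NMI report; the "Dazed and confused" follow-up lines are ignored.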
27. Priority screen appears. Select the DVD-ROM drive to be the primary boot device. Save and exit the BIOS screens. Reboot the server. When the server reboots from the CD in the DVD-ROM drive, the Solaris Operating System boots, and SunVTS software starts and opens its first GUI window. In the SunVTS GUI, press Enter or click the Start button when you are prompted to start the tests. The test suite will run until it encounters an error or the test is completed.
Note: The CD will take approximately nine minutes to boot.
9. When SunVTS software completes the test, review the log files generated during the test. SunVTS provides access to four different log files:
- SunVTS test error log contains time-stamped SunVTS test error messages. The log file path name is /var/opt/SUNWvts/logs/sunvts.err. This file is not created until a SunVTS test failure occurs.
- SunVTS kernel error log contains time-stamped SunVTS kernel and SunVTS probe errors. SunVTS kernel errors are errors that relate to running SunVTS, and not to testing of devices. The log file path name is /var/opt/SUNWvts/logs/vtsk.err. This file is not created until SunVTS reports a SunVTS kernel error.
- SunVTS information log contains informative messages that are generated when you start and stop the SunVTS test sessions. The log file path name is /var/opt/SUNWvts/logs/sunvts.info. This file is not created until a SunVTS test session runs.
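Because each of these log files exists only after the corresponding event has occurred, any script that collects them has to tolerate missing files. Here is a hedged sketch of bundling whichever logs are present into one archive for transfer to another system. The directory and file names are reconstructed from the guide's text and should be treated as assumptions; `archive_logs` is a hypothetical helper, not a SunVTS tool.

```python
import tarfile
from pathlib import Path

# Paths reconstructed from the guide's text; verify them on a real system.
LOG_DIR = Path("/var/opt/SUNWvts/logs")
LOG_NAMES = ("sunvts.err", "vtsk.err", "sunvts.info")


def archive_logs(dest, log_dir=LOG_DIR, names=LOG_NAMES):
    """Write the log files that exist into a gzip-compressed tar archive.

    Returns the list of file names that were added. Files that have not
    been created yet are skipped rather than treated as errors.
    """
    added = []
    with tarfile.open(str(dest), "w:gz") as tar:
        for name in names:
            path = Path(log_dir) / name
            if path.is_file():
                tar.add(str(path), arcname=name)
                added.append(name)
    return added
```

An empty return value simply means no SunVTS session has produced logs yet; the archive is still created, but it contains nothing.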
28. are Error | No usable system memory
    300 | 08/26/2005 | 11:36:12 | Memory | Memory Device Disabled | CPU 0 DIMM 0
When the faulty DIMM is beyond the BIOS's low 1MB extraction space, proper boot happens:
    ipmitool> sel list
    100 | 08/26/2005 | 05:04:04 | OEM #0xfb
    200 | 08/26/2005 | 05:04:09 | Memory | Memory Device Disabled | CPU 0 DIMM 0
Note the following considerations for this revision:
- Uncorrectable ECC Memory Error is not reported.
- Multi-bit ECC errors are reported as Memory Device Disabled.
- On first reboot, BIOS logs a HyperTransport Error in the DMI log.
- The BIOS disables the DIMM.
- The BIOS sends the SEL records to the BMC.
- The BIOS reboots again.
- The BIOS skips the faulty DIMM on the next POST memory test.
- The BIOS reports available memory, excluding the faulty DIMM pair.
FIGURE D-1 shows an example of a DMI log screen from BIOS Setup Page.
FIGURE D-1 DMI Log Screen, Uncorrectable Error [screen capture omitted; the View Event Log entry reads: 09/12/05 11:51:05 A Hyper Transport sync flood error occurred on last boot]
Handling of Correctabl
29. ate
TABLE A-1 POST Codes (Continued)
Post Code  Description
de00  Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
8613  Initialize PM regs and PM PCI regs at Early-POST. Initialize multi host bridge, if system supports it. Setup ECC options before memory clearing. Enable PCI-X clock lines in the 8131.
0024  Uncompress and initialize any platform-specific BIOS modules.
862a  BBS ROM initialization.
002a  Generic Device Initialization Manager (DIM): Disable all devices.
042a  ISA PnP devices: Disable all devices.
052a  PCI devices: Disable all devices.
122a  ISA devices: Static device initialization.
152a  PCI devices: Static device initialization.
252a  PCI devices: Output device initialization.
202c  Initializing different devices. Detecting and initializing the video adapter installed in the system that have optional ROMs.
002e  Initializing all the output devices.
0033  Initializing the silent boot module. Set the window for displaying text information.
0037  Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
4538  PCI devices: IPL device initialization.
5538  PCI devices: General device initialization.
8600  Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left i
30. ck will be in UTC.
- Via the CLI, ILOM web GUI, and IPMI
Viewing Replaceable Component Information
Depending on the component you select, information about the manufacturer, component name, serial number, and part number can be displayed. To view replaceable component information:
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
a. Type the IP address of the server's SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
b. Type your user name and password. When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:
   Default user name: root
   Default password: changeme
2. From the System Information tab, select Components. The Replaceable Component Information page is displayed. See FIGURE C-2.
FIGURE C-2 Replaceable Component Information Page [screen capture omitted; the Component Management table lists each component name and type, for example /SYS (Host System) and /SYS/MB (Motherboard)]
31. cy shutdown: Use a ballpoint pen or other stylus to press and hold the Power button for four seconds to force main power off and enter standby power mode.
Caution: Performing an emergency shutdown can cause open files to become corrupt. Use an emergency shutdown only when necessary.
When main power is off, the Power/OK LED on the front panel will begin flashing, indicating that the server is in standby power mode.
Caution: When you use the Power button to enter standby power mode, power is still directed to the service processor and power supply fans, indicated when the Power/OK LED is flashing. To completely power off the server, you must disconnect the AC power cords from the back panel of the server.
FIGURE 1-1 X4140 Server Front Panel [figure omitted; callouts: Locate Button/LED, Power Button]
FIGURE 1-2 X4440 Server Front Panel [figure omitted; callouts: Locate Button/LED, Power Button]
2. Remove the server cover. For instructions on removing the server cover, refer to your server's service manual.
3. Inspect the internal status indicator LEDs. These can indicate component malfunction. For the LED locations and descriptions of their behavior, see Internal Status Indicator LEDs on page 39.
Note: The server must be in standby power mode for viewing the internal LEDs. You can hold down the Locate button on the server back panel
32. dix C Using the ILOM Service Processor GUI to View System Information
FIGURE C-1 System Event Logs Page [screen capture omitted]. The page displays every event in the SP, including IPMI, Audit, and FMA events. Click the Clear Log button to delete all current log entries. Sample entries from the capture:
Event ID  Severity  Date/Time                 Description
162       minor     Wed Nov 28 09:39:10 2007  root: Open Session, object session type, value www, success
161       minor     Wed Nov 28 09:23:06 2007  root: Open Session, object session type, value shell, success
160       critical  Wed Nov 28 09:21:01 2007  ID 81: pre-init timestamp, Entity Presence, hdd prsnt, Device Absent
159       critical  Wed Nov 28 09:20:57 2007  ID 80: pre-init timestamp, Entity Presence, hdd2 prsnt, Device Absent
3. Select the category of event that you want to view in the log from the drop-down list box. You can select from the following types of events:
- Sensor-specific events. These events relate to a specific sensor for a component, for example, a fan sensor or a power supply sensor.
- BIOS-generated events. These events relate to error messages generated in the BIOS.
- System management software events. These events relate to events that occur within the ILOM software.
33. e Errors
This section lists facts and considerations about how the server handles correctable errors.
- During BIOS POST:
  The BIOS polls the MCK registers.
  The BIOS logs to DMI.
  The BIOS logs to the SP SEL through the BMC.
- The feature is turned off at OS boot time by default.
- The following Linux versions report correctable ECC syndrome and memory fill errors in /var/log if kernel flag mce is indicated at boot time, or if mce is enabled through kernel compile or installation:
  RH3 Updated single core
  RH4 Update1
  SLES9 SP1
- The Linux kernel (x86_64/kernel/mce.c) repeats a report every 30 seconds until another error is encountered and an 8131 flag is reset.
- Solaris support provides full self-healing and automated diagnosis for the CPU and Memory subsystems.
FIGURE D-2 shows an example of a DMI log screen from BIOS Setup Page.
FIGURE D-2 DMI Log Screen, Correctable Error [screen capture omitted; the View Event Log entry, dated 09/12/05 12:33:16, reports an ECC memory error on Node 1, DIMM Pair 0]
If, during any stage of memory testing, the BIOS finds itself incapable of reading/writing to the DIMM, it takes the following actions:
- The BIOS disables the DIMM, as indicated by the Memory Decreased message in the example in EXAMPLE D-1.
- The BIOS logs an SEL record.
- The BIOS logs an event in DMI.
34. etain copies of the logs showing the memory errors (per the above rules) to send to Sun for verification prior to calling Sun.
How DIMM Errors Are Handled by the System
This section describes system behavior for the two types of DIMM errors, UCEs and CEs, and also describes BIOS DIMM error messages.
Uncorrectable DIMM Errors
For all operating systems (OSs), the behavior is the same for UCEs:
1. When a UCE occurs, the memory controller causes an immediate reboot of the system.
2. During reboot, the BIOS checks the Machine Check registers and determines that the previous reboot was due to a UCE, then reports this in POST after the memtest stage:
   A Hypertransport Sync Flood occurred on last boot
3. BIOS reports this event in the service processor's system event log (SEL), as shown in the sample IPMItool output below:
   ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list
   8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 | Initiated by warm reset | Asserted
   9 | 09/25/2007 | 03:22:03 | Processor #0x04 | Presence detected | Asserted
   a | 09/25/2007 | 03:22:03 | OEM #0x12 | | Asserted
   b | 09/25/2007 | 03:22:03 | System Event #0x12 | Undetermined system hardware failure | Asserted
   c | OEM record e0 | 00000002000000000029000002
   d | OEM record e0 | 00000004000000000000b00006
   e | OEM record e0 | 00000048000000000011110322
   f | OEM rec
35. g the cause of a problem with the server is to gather information from the service-call paperwork or the onsite personnel. Use the following general guideline steps when you begin troubleshooting.
To gather service information:
1. Collect information about the following items:
- Events that occurred prior to the failure
- Whether any hardware or software was modified or installed
- Whether the server was recently installed or moved
- How long the server exhibited symptoms
- The duration or frequency of the problem
2. Document the server settings before you make any changes. If possible, make one change at a time, in order to isolate potential problems. In this way, you can maintain a controlled environment and reduce the scope of troubleshooting.
3. Take note of the results of any change that you make. Include any errors or informational messages.
4. Check for potential device conflicts before you add a new device.
5. Check for version dependencies, especially with third-party software.
System Inspection
Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components.
Troubleshooting Power Problems
- If the server will power on, skip this section and go to Externally Inspecting the Server on page 3.
- If the server will not power on, check the following: Check that
36. (row continued from the previous page)
  Description: DRAM on the DIMM interface
  Handling: handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled through a software interface.

Error: Uncorrectable DRAM ECC error
  Description: The CPU detects an uncorrectable multiple-bit DIMM error.
  Handling: The sync flood method is used to prevent the erroneous data from being propagated across the HyperTransport links. The system reboots, the BIOS recovers the machine check register information, maps this information to the failing DIMM (when CHIPKILL is disabled) or DIMM pair (when CHIPKILL is enabled), and logs that information to the SP. The BIOS will halt the CPU.
  Logged: SP SEL
  Fatal: Fatal

Error: Unsupported DIMM configuration
  Description: Unsupported DIMMs are used, or supported DIMMs are loaded improperly.
  Handling: The BIOS displays an error message, logs an error, and halts the system.
  Logged: DMI Log, SP SEL
  Fatal: Fatal

Error: HyperTransport link failure
  Description: CRC or link error on one of the HyperTransport links.
  Handling: Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
  Logged: DMI Log, SP SEL
  Fatal: Fatal

TABLE D-1 Hardware Error Handling Summary (Continued)
Columns: Error, Description, Handling, Logged (DMI Log or SP SEL), Fatal

Error: PCI SERR, PERR
  Description: System or parity error on a PCI bus.
  Handling: Sync floods on HyperTransport links, the machine resets
  Logged: DMI Log
  Fatal: Fatal
37. he SunVTS diagnostic software tool.
Running SunVTS Diagnostic Tests
The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software. SunVTS provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms. SunVTS software can be tailored with modifiable test instances and processor affinity features. The following tests are supported on x86 platforms:
- CD DVD Test (cddvdtest)
- CPU Test (cputest)
- Cryptographics Test (cryptotest)
- Disk and Diskette Drives Test (disktest)
- Data Translation Look-aside Buffer (dtlbtest)
- Emulex HBA Test (emlxtest)
- Floating Point Unit Test (fputest)
- InfiniBand Host Channel Adapter Test (ibhcatest)
- Level 1 Data Cache Test (l1dcachetest)
- Level 2 SRAM Test (l2sramtest)
- Ethernet Loopback Test (netlbtest)
- Network Hardware Test (nettest)
- Physical Memory Test (pmemtest)
- QLogic Host Bus Adapter Test (qlctest)
- RAM Test (ramtest)
- Serial Port Test (serialtest)
- System Test (systest)
- Tape Drive Test (tapetest)
- Universal Serial Board Test (usbtest)
- Virtual Memory Test (vmemtest)
SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can be run on one system to display the SunVTS testing of another system on
38. herboard [figure omitted; labels: fans, CPU 0, Press to See Fault button]
FIGURE 3-2 DIMMs and LEDs on Mezzanine Board [figure omitted; labels: fans, CPU 2, CPU 3, Motherboard Fault LED, Press to See Fault button]
Isolating and Correcting DIMM ECC Errors
If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault. In this example, the log file reports an error with the DIMM in CPU0, slot 7. The fault LEDs on CPU0, slots 6 and 7, are on.
To isolate and correct DIMM ECC errors:
1. If you have not already done so, shut down your server to standby power mode and remove the cover.
2. Inspect the installed DIMMs to ensure that they comply with the DIMM Population Rules on page 11.
3. Press the PRESS TO SEE FAULT button and inspect the DIMM fault LEDs. See FIGURE 3-1 and FIGURE 3-2. A flashing LED identifies a component with a fault.
- For CEs, the LEDs correctly identify the DIMM where the errors were detected.
- For UCEs, both LEDs in the pair flash if there is a problem with either DIMM in the pair.
Note: If your server is equipped with a mezzanine board, the motherboard DIMMs and LEDs will be hidden beneath it. However, the Motherboard Fault LED lights to indicate that there is a problem on the motherboard (only while AC power is still connected). If
39. ilure (row continued from the previous page)
  Description: detected by reading tach signals.
  Handling: and individual fan module LEDs are lit.

TABLE D-1 Hardware Error Handling Summary (Continued)
Columns: Error, Description, Handling, Logged (DMI Log or SP SEL), Fatal

Error: Multiple fan failure
  Description: Fan failure is detected by reading tach signals.
  Handling: The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.
  Logged: SP SEL
  Fatal: Fatal

Error: Single power supply failure
  Description: When any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals are deasserted.
  Handling: Service Action Required and Power Supply Fault LEDs are lit.
  Logged: SP SEL
  Fatal: Non-fatal

Error: DC/DC power converter failure
  Description: Any POWER_GOOD signal is deasserted from the DC/DC converters.
  Handling: The Service Action Required LED is lit, the system is powered down to standby power mode, and the Power LED enters standby blink state.
  Logged: SP SEL
  Fatal: Fatal

Error: Voltage above/below threshold
  Description: The SP monitors system voltages and detects voltage above or below a given threshold.
  Handling: The Service Action Required LED and Power Supply Fault LED blink.
  Logged: SP SEL
  Fatal: Fatal

Error: High temperature
  Description: The SP monitors CPU and system temperatures and detects temperatures above a given threshold.
  Handling: The Service Action Required LED and System Overheat Fault LED blink. The motherboard is shut down above the specified critical level.
  Logged: SP SEL
  Fatal: Fatal

Error: Processor thermal trip
  Description: The CPU drives the THERMTRIP_L
  Handling: CPLD shuts down power to the CPU. The Service Action Required
  Logged: SP SEL
  Fatal: Fatal
40. ing SunVTS Diagnostic Software
- Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.
a. Click the Log button. The Log file window is displayed.
b. Specify the log file that you want to view by selecting it from the Log file window. The content of the selected log file is displayed in the window.
c. With the three lower buttons, you can perform the following actions:
- Print the log file. A dialog box appears for you to specify your printer options and printer name.
- Delete the log file. The file remains on the display, but it will not be available the next time you try to display it.
- Close the Log file window. The window is closed.
Note: If you want to save the log files: When you use the Bootable Diagnostics CD, the server boots from the CD. Therefore, the test log files are not on the server's hard disk drive, and they will be deleted when you power cycle the server. To save the log files, you must save them to a removable media device or FTP them to another system.
CHAPTER 3: Troubleshooting DIMM Problems
This chapter describes how to detect and correct problems with the server's Dual Inline Memory Modules (DIMMs). It includes the following sections:
- DIMM Population Rules on page 11
- DIMM Replacement Policy on page 12
41. itoring and maintenance information for your server.
- Making a Serial Connection to the SP on page 44
- Viewing ILOM SP Event Logs on page 45
- Viewing Replaceable Component Information on page 48
- Viewing Sensors on page 50
For more information on using the ILOM SP GUI to maintain the server (for example, configuring alerts), refer to the Integrated Lights Out Manager Administration Guide.
- If any of the logs or information screens indicate a DIMM error, see Chapter 3.
- If the problem with the server is not evident after viewing ILOM SP logs and information, continue with Running SunVTS Diagnostic Tests on page 7.
Making a Serial Connection to the SP
To make a serial connection to the SP:
1. Connect a serial cable from the RJ-45 Serial Management port on the server to a terminal device.
2. Press ENTER on the terminal device to establish a connection between that terminal device and the ILOM SP.
Note: If you are connecting to the serial port on the SP before it has been powered up or during its power-up sequence, you will see boot messages.
The service processor eventually displays a login prompt. For example:
   SUNSP0003BA84D777 login:
The first string in the prompt is the default host name for the ILOM SP. It consists of the prefix SUNSP and the MAC address of the ILOM SP. The MAC address for each ILOM SP is unique.
3. Log in to the SP and type the default user name, root, with
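Since the default host name is the prefix SUNSP followed by the SP's MAC address, the address can be recovered from the login prompt. A small sketch follows, assuming the MAC is written as twelve hexadecimal digits with no separators, as in the example prompt; `mac_from_sp_hostname` is a hypothetical helper name, not an ILOM command.

```python
import re

# Assumes the default name is "SUNSP" plus twelve hex digits, as in the
# guide's example prompt. A renamed SP will not match.
HOSTNAME_RE = re.compile(r"^SUNSP([0-9A-Fa-f]{12})$")


def mac_from_sp_hostname(hostname):
    """Return the MAC address encoded in a default ILOM SP host name.

    The result is colon-separated uppercase hex, or None if the host name
    does not follow the default pattern.
    """
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        return None
    digits = match.group(1).upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

For the prompt shown in the guide, `mac_from_sp_hostname("SUNSP0003BA84D777")` gives `00:03:BA:84:D7:77`.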
42. (row continued from the previous page)
  Handling: itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
  Logged: SP SEL

Error: BIOS POST Microcode Error
  Description: The BIOS could not find or load the CPU Microcode Update to the CPU. The message most likely appears when a new CPU is installed in a motherboard with an outdated BIOS. In this case, the BIOS must be updated.
  Handling: The BIOS displays an error message, logs the error to DMI, and boots.
  Logged: DMI Log
  Fatal: Non-fatal

Error: BIOS POST CMOS Checksum Bad
  Description: CMOS contents failed the Checksum check.
  Handling: The BIOS displays an error message, logs the error to DMI, and boots.
  Logged: DMI Log
  Fatal: Non-fatal

Error: Unsupported CPU configuration
  Description: The BIOS supports mismatched frequency and steppings in CPU configuration, but some CPUs might not be supported.
  Handling: The BIOS displays an error message, logs the error, and halts the system.
  Logged: DMI Log
  Fatal: Fatal

Error: Correctable error
  Description: The CPU detects a variety of correctable errors in the MCi_STATUS registers.
  Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts and is done by the BIOS SMI handler. The SMI handler logs a message to the SP SEL if the SEL is available; otherwise, SMI logs a message to DMI. The BIOS's polling can be disabled through software SMI.
  Logged: DMI Log, SP SEL
  Fatal: Normal operation

Error: Single fan fa
  Description: Fan failure is
  Handling: The Front Fan Fault, Service Action Required
  Logged: SP SEL
  Fatal: Non-fatal
43. n of the event. TABLE 3-1 describes the contents of the display.
TABLE 3-1 Lines in IPMI Output
Event (hex)  Description
8         UCE caused a HyperTransport sync flood, which led to the system's warm reset. 0x02 refers to a reboot count maintained since the last AC power reset.
9         BIOS detected and initiated 4 processors in system.
a         BIOS detected that a Sync Flood caused this reboot.
b         BIOS detected that a hardware error caused the Sync Flood.
c to 1e   BIOS retrieved and reported some hardware evidence, including all processors' Machine Check Error registers (events 14 to 18).
1f        After BIOS detected that a UCE had occurred, it located the DIMM and reset. 0x03 refers to reboot count.
21 to 25  BIOS off-lined faulty DIMMs from system memory space and reported them. Each DIMM of a pair is being reported, since hardware UCE evidence cannot lead BIOS any further than detection of a faulty pair.
Correctable DIMM Errors
If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective and should be replaced. At this time, CEs are not logged in the server's system event logs. They are reported or handled in the supported OSs as follows:
Windows Server:
a. A Machine Check error message bubble appears on the task bar.
b. The user must manually open Event Viewer to view errors. Access Event Viewer through this menu path: Start > Administration Tools > Event Viewer
c. The user can then view individual errors by time
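The 24-errors-in-24-hours rule can be checked against a list of error timestamps collected from the operating system's logs. The following is a rough sketch under stated assumptions: the timestamps have already been extracted for a single DIMM, "in 24 hours" is read as any sliding 24-hour span, and the function name `exceeds_ce_threshold` is hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def exceeds_ce_threshold(timestamps, threshold=24, window=timedelta(hours=24)):
    """Return True if `threshold` or more events fall inside any `window`.

    timestamps is an iterable of datetime objects for one DIMM's
    correctable errors, in any order. Two events exactly `window` apart
    are treated as outside the same window.
    """
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that are too old to share a window with ts.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False
```

The threshold is a parameter because the guide words the rule both as "24 or more" and, in the replacement policy, as "more than 24"; pass `threshold=25` for the stricter reading.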
44. n the CLI HLT state.
POST Code Checkpoints
The POST code checkpoints are the largest set of checkpoints during the BIOS pre-boot process. TABLE A-2 describes the type of checkpoints that might occur during the POST portion of the BIOS. These two-digit checkpoints are the output from primary I/O port 80.
TABLE A-2 POST Code Checkpoints
Post Code  Description
03  Disable NMI, Parity, video for EGA, and DMA controllers. At this point, only ROM accesses go to the GPNV. If BB size is 64K, turn on ROM Decode below FFFF0000h. It should allow USB to run in the E000 segment. The HT must program the NB-specific initialization and OEM-specific initialization, and can program if it need be at beginning of BIOS POST, similar to overriding the default values of kernel variables.
04  Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. Verify CMOS checksum manually by reading storage area. If the CMOS checksum is bad, update CMOS with power-on default values and clear passwords. Initialize status register A. Initialize data variables that are based on CMOS setup questions. Initialize both the 8259-compatible PICs in the system.
05  Initialize the interrupt controlling hardware (generally PIC) and interrupt vector table.
06  Do R/W test to CH-2 count reg. Initialize CH-0 as system timer. Install the POSTINT1Ch handler. Enable IRQ-0 in PIC for
45. nitializes different devices through DIM.
39  Initializes DMAC-1 and DMAC-2.
3A  Initialize RTC date/time.
3B  Test for total memory installed in the system. Also, check for DEL or ESC keys to limit memory test. Display total memory in the system.
3C  By this point, RAM read/write test is completed; program memory holes or handle any adjustments needed in RAM size with respect to NB. Test if HT Module found an error in BootBlock and CPU compatibility for MP environment.
40  Detect different devices (parallel ports, serial ports, and coprocessor in CPU, etc.) successfully installed in the system and update the BDA, EBDA, etc.
50  Programming the memory hole or any kind of implementation that needs an adjustment in system RAM size, if required.
52  Updates CMOS memory size from memory found in memory test. Allocates memory for Extended BIOS Data Area from base memory.
TABLE A-2 POST Code Checkpoints (Continued)
Post Code  Description
60  Initializes NUM-LOCK status and programs the KBD typematic rate.
75  Initialize Int-13 and prepare for IPL detection.
78  Initializes IPL devices controlled by BIOS and option ROMs.
7A  Initializes remaining option ROMs.
7C  Generate and write contents of ESCD in NVRam.
84  Log errors encountered during POST.
85  Displays errors to the user and gets the user response for error.
87  Execute BIOS setup if needed/requested.
8C  Afte
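When a server stalls during POST, the last two-digit checkpoint written to port 80 indicates the stage that was in progress. A lookup table makes that mapping easy to script. The sketch below is illustrative only: it includes just a handful of the checkpoints listed above, and `describe_checkpoint` is a hypothetical helper.

```python
# A partial table, copied from the checkpoints listed above. Codes not in
# this excerpt are deliberately left out rather than guessed.
POST_CHECKPOINTS = {
    0x39: "Initializes DMAC-1 and DMAC-2",
    0x3A: "Initialize RTC date/time",
    0x60: "Initializes NUM-LOCK status and programs the KBD typematic rate",
    0x75: "Initialize Int-13 and prepare for IPL detection",
    0x78: "Initializes IPL devices controlled by BIOS and option ROMs",
    0x7A: "Initializes remaining option ROMs",
    0x84: "Log errors encountered during POST",
    0x87: "Execute BIOS setup if needed/requested",
}


def describe_checkpoint(code):
    """Return the description for a POST checkpoint.

    code may be an int or a hex string such as "3A" or "0x3a".
    """
    if isinstance(code, str):
        code = int(code, 16)
    return POST_CHECKPOINTS.get(code, "unknown checkpoint (not in this excerpt)")
```

Keying the table by integer rather than by string means "3a", "3A", and "0x3A" all resolve to the same entry.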
46. or Handling
This appendix contains information about how the servers process and log errors. See the following sections:
- Handling of Uncorrectable Errors on page 53
- Handling of Correctable Errors on page 56
- Handling of Parity Errors (PERR) on page 59
- Handling of System Errors (SERR) on page 61
- Handling Mismatching Processors on page 63
- Hardware Error Handling Summary on page 64
Handling of Uncorrectable Errors
This section lists facts and considerations about how the server handles uncorrectable errors.
Note: The BIOS ChipKill feature must be disabled if you are testing for failures of multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit wide DRAM).
- The BIOS logs the error to the SP system event log (SEL) through the board management controller (BMC).
- The SP's SEL is updated with the failing DIMM pair's particular bank address.
- The system reboots.
- The BIOS logs the error in DMI.
Note: If the error is on low 1MB, the BIOS freezes after rebooting. Therefore, no DMI log is recorded.
- An example of the error reported by the SEL through IPMI 2.0 is as follows. When low memory is erroneous, the BIOS is frozen on pre-boot low memory test because the BIOS cannot decompress itself into faulty DRAM and execute the following items:
    ipmitool> sel list
    100 | 08/26/2005 | 11:36:09 | OEM #0xfb
    200 | 08/26/2005 | 11:36:12 | System Firmw
47. or Readings
FIGURE C-3 Sensor Readings Page [screen capture omitted]. The page lets you view readings for system sensors; click on a sensor name for more information, including threshold values. The table columns are Name, Type, and Reading. Sample rows from the capture include /SYS/MB/P0/PRSNT (Entity Presence, Present) and /SYS/MB/P0/T_CORE (Temperature, 16.000 degrees C).
3. Click the Refresh button to update the sensor readings to their current status.
4. Click a sensor to display its thresholds. A display of properties and values appears. See the example in FIGURE C-4.
FIGURE C-4 Sensor Details Page [screen capture omitted; shows all of the properties and values for sensor /SYS/MB/P0/PRSNT: type Entity Presence, class Discrete Sensor, value Present]
5. If the problem with the server is not evident after viewing sensor readings information, continue with Running SunVTS Diagnostic Tests on page 7.
APPENDIX D Err
48. ord e0 | 00000058000000000000030000
   10 | OEM record e0 | 000100440000000000feff 000
   11 | OEM record e0 | 00010048000000000000ff3efa
   12 | OEM record e0 | 10ab0000000010000006040012
   13 | OEM record e0 | 10ab0000001111002011110020
   14 | OEM record e0 | 0018304c00 200002000020c0
   15 | OEM record e0 | 0019304c00f200004000020c0f
   16 | OEM record e0 | 001a304c00f45aa10015080a13
   17 | OEM record e0 | 001a3054000000000320004880
   18 | OEM record e0 | 001b304c00f200001000020c0f
   19 | OEM record e0 | 80000002000000000029000002
   1a | OEM record e0 | 80000004000000000000b00006
   1b | OEM record e0 | 80000048000000000011110322
   1c | OEM record e0 | 80000058000000000000030000
   1d | OEM record e0 | 800100440000000000feff 000
   1e | OEM record e0 | 80010048000000000000ff3efa
   1f | 09/25/2007 | 03:22:06 | System Boot Initiated #0x03 | Initiated by warm reset | Asserted
   20 | 09/25/2007 | 03:22:06 | Processor #0x04 | Presence detected | Asserted
   21 | 09/25/2007 | 03:22:15 | System Firmware Progress #0x01 | Memory initialization | Asserted
   22 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 0
   23 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 1
   24 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 0
   25 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 1
The lines in the display start with event numbers (in hex), followed by a descriptio
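The final SEL entries in this output name the DIMMs that the BIOS took offline, so they are the lines worth extracting when deciding which pair to inspect. Here is a minimal sketch that pulls those entries out of saved `sel list` output. It assumes the fields are separated by `|` characters, as ipmitool normally prints them, and that the location appears in the last field as `CPU n DIMM m`; `disabled_dimms` is a hypothetical helper name.

```python
import re

# Location format as it appears in the guide's sample output; other
# platforms may word this field differently.
LOCATION_RE = re.compile(r"CPU\s+(\d+)\s+DIMM\s+(\d+)")


def disabled_dimms(sel_lines):
    """Return sorted (cpu, dimm) pairs from 'Memory Device Disabled' events.

    sel_lines is an iterable of text lines from `ipmitool sel list`,
    assumed to be pipe-delimited.
    """
    found = set()
    for line in sel_lines:
        fields = [field.strip() for field in line.split("|")]
        if "Memory Device Disabled" not in fields:
            continue
        match = LOCATION_RE.search(fields[-1])
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return sorted(found)
```

Run against the sample above, this reports both DIMMs of the pair on CPU 2, which matches the guide's point that UCE evidence cannot narrow the fault further than the pair.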
49. ostic Tests 7
  SunVTS Documentation 8
  Diagnosing Server Problems With the Bootable Diagnostics CD 8
  Requirements 8
  Using the Bootable Diagnostics CD 9
Troubleshooting DIMM Problems 11
  DIMM Population Rules 11
  DIMM Replacement Policy 12
  How DIMM Errors Are Handled by the System 12
  Uncorrectable DIMM Errors 12
  Correctable DIMM Errors 14
  BIOS DIMM Error Messages 15
  DIMM Fault LEDs 15
  Isolating and Correcting DIMM ECC Errors 18
Event Logs and POST Codes 21
  Viewing Event Logs 21
  Power-On Self-Test (POST) 25
  How BIOS POST Memory Testing Works 25
  Redirecting Console Output 26
  Changing POST Options 28
  POST Codes 31
  POST Code Checkpoints 33
Status Indicator LEDs 37
  External Status Indicator LEDs 37
  Front Panel LEDs 38
  Back Panel LEDs 38
  Hard Drive LEDs 39
  Internal Status Indicator LEDs 39
Using the ILOM Service Processor GUI to View System Information 43
  Making a Serial Connection to the SP 44
  Viewing ILOM SP Event Logs 45
  Interpreting Event Log Time Stamps 47
  Viewing Replaceable Component Information 48
  Viewing Sensors 50
Error Handling 53
  Handling of Uncorrectable Errors 53
  Handling of Correctable Errors 56
  Handling of Parity Errors (PERR) 59
  Handling of System Errors (SERR) 61
  Handling Mismatching Processors 63
  Hardware Error Handling Summary 64
Index 69
50. or end users, for nuclear weapons, missiles, chemical or biological weapons, or nuclear maritime uses, whether direct or indirect, are strictly prohibited. Exports or re-exports to countries subject to U.S. embargo, or to entities identified on U.S. export exclusion lists, including but not limited to the list of persons who are subject to an order denying them direct or indirect participation in exports of products or services governed by U.S. export control law, and the list of specially designated nationals, are strictly prohibited. The use of spare parts or replacement CPUs is limited to repair or one-for-one replacement of CPUs in products exported in compliance with U.S. export laws. Unless authorized by the U.S. authorities, the use of CPUs for product upgrades is strictly prohibited.
Contents
Preface vii
Initial Inspection of the Server 1
  Service Troubleshooting Flowchart 1
  Gathering Service Information 2
  System Inspection 3
  Troubleshooting Power Problems 3
  Externally Inspecting the Server 3
  Internally Inspecting the Server 4
Using SunVTS Diagnostic Software 7
  Running SunVTS Diagn
51. out of RAM.
01d6 | Key sequence and OEM-specific method is checked to determine if BIOS recovery is forced. If next code is E0, BIOS recovery is being executed. Main BIOS checksum is tested.
01d7 | Restoring CPUID; moving bootblock-runtime interface module to RAM; determine whether to execute serial flash.
01d8 | Uncompressing runtime module into RAM. Storing CPUID information in memory.
01d9 | Copying main BIOS into memory.
01da | Giving control to BIOS POST.
0004 | Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. If the CMOS checksum is bad, update CMOS with power-on default values.
00c2 | Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user-requested value for GART Error Reporting setup question.
00c3 | Errata workarounds applied to the BSP (#78 & #110).
00c6 | Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata #106, #107, #69, and #63 if appropriate.
00c7 | HT sets link frequencies and widths to their final values.
000a | Initializing the 8042-compatible Keyboard Controller.
000c | Detecting the presence of Keyboard in KBC port.
000e | Testing and initialization of different Input Devices. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1.
8600 | Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT st
52. ps. When the service processor reboots, the SP clock is set to Thu Jan 1 00:00:00 UTC 1970. The SP reboots as a result of the following:
- A complete system unplug/replug power cycle
- An IPMI command; for example, mc reset cold
- A command-line interface (CLI) command; for example, reset /SP
- An ILOM web GUI operation; for example, from the Maintenance tab, selecting Reset SP
- An SP firmware upgrade
After an SP reboot, the SP clock is changed by the following events:
- When the host is booted. The host's BIOS unconditionally sets the SP time to that indicated by the host's RTC. The host's RTC is set by the following operations:
  - When the host's CMOS is cleared as a result of changing the host's RTC battery or inserting the CMOS-clear jumper on the motherboard. The host's RTC starts at Jan 1 00:01:00 2002.
  - When the host's operating system sets the host's RTC. The BIOS does not consider time zones. Solaris and Linux software respect time zones and will set the system clock to UTC. Therefore, after the OS adjusts the RTC, the time set by the BIOS will be UTC.
  - When the user sets the RTC using the host BIOS Setup screen.
- Continuously via NTP, if NTP is enabled on the SP. NTP jumping is enabled to recover quickly from an erroneous update from the BIOS or user. NTP servers provide UTC time. Therefore, if NTP is enabled on the SP, the SP clo
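The two reset values named in this excerpt (Thu Jan 1 00:00:00 UTC 1970 after an SP reboot, and Jan 1 00:01:00 2002 after a CMOS clear) can be used to flag event log time stamps that probably reflect an unset clock rather than real wall-clock time. The following is only a minimal Python sketch of that idea. The function name, the 30-day window, and the treatment of the 2002 RTC start value as UTC are assumptions made for illustration; none of them comes from the guide.

```python
from datetime import datetime, timedelta, timezone

# Reset values quoted in the guide.
SP_CLOCK_RESET = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
# The guide gives no time zone for the RTC start value; UTC is assumed here.
HOST_RTC_RESET = datetime(2002, 1, 1, 0, 1, 0, tzinfo=timezone.utc)

# Hypothetical threshold: how long after a reset value a time stamp is
# still treated as suspicious. Not taken from the guide.
SUSPICION_WINDOW = timedelta(days=30)


def classify_timestamp(stamp: datetime) -> str:
    """Classify a timezone-aware event time stamp.

    Returns "sp-clock-reset" if the stamp falls shortly after the SP reboot
    value, "host-rtc-reset" if it falls shortly after the CMOS-clear value,
    and "plausible" otherwise.
    """
    if SP_CLOCK_RESET <= stamp < SP_CLOCK_RESET + SUSPICION_WINDOW:
        return "sp-clock-reset"
    if HOST_RTC_RESET <= stamp < HOST_RTC_RESET + SUSPICION_WINDOW:
        return "host-rtc-reset"
    return "plausible"


if __name__ == "__main__":
    example = datetime(2002, 1, 10, 20, 16, 16, tzinfo=timezone.utc)
    print(classify_timestamp(example))
```

A stamp flagged this way is only a hint that the clock may not have been set when the event was logged; it does not prove it.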
53. r [Screenshot text, component drop-down list: /SYS/MB/P0/D0 DIMM, /SYS/MB/P0/D1 DIMM, /SYS/MB/P0/D2 DIMM, /SYS/MB/P0/D3 DIMM, /SYS/MB/P0/D4 DIMM]
3. Select a component from the drop-down list. Information about the selected component is displayed.
4. If the problem with the server is not evident after viewing replaceable component information, continue with "Running SunVTS Diagnostic Tests" on page 7.

Viewing Sensors
This section describes how to view the server temperature, voltage, and fan sensor readings. For a complete list of sensors, see Appendix D.
To view sensor readings:
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
  a. Type the IP address of the server's SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
  b. Type your user name and password. When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:
    Default user name: root
    Default password: changeme
2. From the System Monitoring tab, select Sensor Readings. The Sensor Readings page is displayed. See FIGURE C-3.
FIGURE C-3 Sensor Readings Page
[Screenshot header text: Administrator root, SP, Sun Integrated Lights Out Manager, 00144F8D2DB7] Sens
54. r Handling 57
EXAMPLE D-1 DMI Log Screen, Correctable Error, Memory Decreased
[BIOS View Event Log screenshot; legible text: 09/12/05 13:30:00 Memory decreased; 05 13:29:54 on Node 1, DIMM Pair 0, SPD address 0A0h; 04A2h Memory Error; American Megatrends]

Handling of Parity Errors (PERR)
This section lists facts and considerations about how the server handles parity errors (PERR).
- The handling of parity errors works through NMIs. During BIOS POST, the NMI is logged in the DMI and the SP SEL. See the following example command and output:
    [root@d-mpk12-53-238 root]# ipmitool -H 129.146.53.95 -U root -P changeme -I lan sel list -v
    SEL Record ID : 0100
    Record Type : 00
    Timestamp : 01/10/2002 20:16:16
    Generator ID : 0001
    EvM Revision : 04
    Sensor Type : Critical Interrupt
    Sensor Number : 00
    Event Type : Sensor-specific Discrete
    Event Direction : Assertion Event
    Event Data : 04ff00
    Description : PCI PERR
- FIGURE D-3 shows an example of a DMI log screen from the BIOS Setup Page with a parity error.
Appendix D Error Handling
FIGURE D-3 DMI Log Screen, PCI Parity Error
[BIOS SETUP UTILITY View Event Log screenshot; legible text: 09/12/05 14:27 PCI Parity; 2002 American Megatrends, Inc.]
- The BIOS displays the following messages and freezes (during POST or DOS):
    NMI EVENT
    System Halted due to Fatal NMI
- The
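The verbose SEL record in the ipmitool example above is a flat list of field names and values, which makes it easy to check programmatically for a PCI PERR entry. The following Python sketch illustrates one way to do that. It assumes each output line has the form "name : value", as ipmitool normally prints it; the function names are hypothetical and are not part of ipmitool or the guide.

```python
def parse_sel_record(text: str) -> dict:
    """Parse one verbose SEL record ("name : value" per line) into a dict.

    Only the first colon on a line separates name from value, so time
    stamps such as "01/10/2002 20:16:16" stay intact.
    """
    record = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator:  # skip lines that carry no field
            record[name.strip()] = value.strip()
    return record


def is_pci_parity_error(record: dict) -> bool:
    """Return True if the record matches the PCI PERR example in the guide."""
    return (
        record.get("Sensor Type") == "Critical Interrupt"
        and record.get("Description") == "PCI PERR"
    )


# Sample text copied from the guide's example output (assumed layout).
SAMPLE = """\
SEL Record ID : 0100
Record Type : 00
Timestamp : 01/10/2002 20:16:16
Generator ID : 0001
EvM Revision : 04
Sensor Type : Critical Interrupt
Sensor Number : 00
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : 04ff00
Description : PCI PERR
"""

if __name__ == "__main__":
    parsed = parse_sel_record(SAMPLE)
    print(parsed["Timestamp"], is_pci_parity_error(parsed))
```

Matching on the Description string is a deliberately simple choice for a sketch; a more careful check would decode the Event Data bytes instead.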
55. r all device initialization is done, program any user-selectable parameters relating to NB/SB, such as timing parameters, non-cacheable regions, and the shadow RAM cacheability, and do any other NB/SB/PCIX/OEM-specific programming needed during Late POST. Background scrubbing for DRAM and L1 and L2 caches are set up based on setup questions. Get the DRAM scrub limits from each node.
8D | Build ACPI tables (if ACPI is supported).
8E | Program the peripheral parameters. Enable/Disable NMI as selected.
90 | Late POST initialization of system management interrupt.
A0 | Check boot password if installed.
A1 | Clean-up work needed before booting to OS.
A2 | Takes care of runtime image preparation for different BIOS modules. Fills the free area in F000h segment with 0FFh. Initializes the Microsoft IRQ Routing Table. Prepares the runtime language module. Disables the system configuration display if needed.
A4 | Initialize runtime language module.
A7 | Displays the system configuration screen if enabled. Initializes the CPUs before boot, which includes the programming of the MTRRs.
A8 | Prepare CPU for OS boot, including final MTRR values.
A9 | Wait for user input at configuration display if needed.
AA | Uninstall POST INT1Ch vector and INT09h vector. Deinitializes the ADM module.
AB | Prepare BBS for Int 19 boot.
AC | Any kind of Chipsets (NB/SB) specific programming needed during End POST, just before giving control to runtime code booting to OS. Program the system BIOS 0F
56. re errors, handling 64
I
ILOM SP GUI
  general information 43
  serial connection 44
  time stamps 47
  viewing component inventory 48
  viewing sensors 50
  viewing SP event log 45
inspection
  external 3
  internal 4
Integrated Lights Out Manager Service Processor. See ILOM SP GUI
internal inspection 4
isolating DIMM ECC errors 18
L
LEDs
  external 37
LEDs, ports, and slots illustrated 38, 39
locations of ports, slots, and LEDs (illustration) 38, 39
M
mismatching processors, error handling 63
P
parity errors, handling 59
PERR 59
population rules for DIMMs 11
ports, slots, and LEDs illustrated 38, 39
POST
  changing options 28
  code checkpoints 33
  codes table 31
  overview 25
  redirecting console output 26
Power button 4, 5
Power button location 4, 5
power off procedure 4
power problems, troubleshooting 3
power-on self-test. See POST
processors mismatched, error 63
R
redirecting console output 26
related documentation viii
S
safety guidelines vii
sensors, viewing with ILOM SP GUI 50
serial connection to ILOM SP 44
SERR 61
Service Processor system event log. See SP SEL
service visit information, gathering 2
shutdown procedure 4
slots, ports, and LEDs illustrated 38, 39
SP event log, viewing with ILOM SP GUI 45
SP SEL time stamps 47
SunVTS
  Bootable Diagnostics CD 8
  documentation 8
  logs 9
  overview 7
system errors, handling 61
T
third party Web sites ix
time
57. After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1.
TABLE C-1 Event Log Fields
Field | Description
Event ID | The number of the event, in sequence from number 1.
Time Stamp | The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the SP time, the SP clock will use Universal Coordinated Time (UTC). For more information about time stamps, see "Interpreting Event Log Time Stamps" on page 47.
Sensor Name | The name of a component for which an event was recorded. The sensor name abbreviations correspond to these components: sys = System or chassis; p0 = Processor 0; p1 = Processor 1; io = I/O board; ps = Power supply; fp = Front panel; ft = Fan tray; mb = Motherboard.
Sensor Type | The type of sensor for the specified event.
Description | A description of the event.
4. To clear the event log, click the Clear Event Log button. A confirmation dialog box is displayed.
5. Click OK to clear all entries in the log.
6. If the problem with the server is not evident after viewing ILOM SP logs and information, continue with "Running SunVTS Diagnostic Tests" on page 7.

Interpreting Event Log Time Stamps
The system event log time stamps are related to the service processor clock settings. If the clock settings change, the change is reflected in the time stam
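The sensor name abbreviations listed for TABLE C-1 can be expanded mechanically when reading a log. The Python sketch below is one illustrative way to do so. The abbreviation table is taken from this excerpt; the function name is hypothetical, and the assumption that name components are separated by "/" or "." is mine rather than something the guide states.

```python
import re

# Abbreviations as listed for the Sensor Name field in TABLE C-1.
ABBREVIATIONS = {
    "sys": "System or chassis",
    "p0": "Processor 0",
    "p1": "Processor 1",
    "io": "I/O board",
    "ps": "Power supply",
    "fp": "Front panel",
    "ft": "Fan tray",
    "mb": "Motherboard",
}


def describe_sensor(name: str) -> list:
    """Expand each known abbreviation in a sensor name.

    Components are assumed to be separated by "/" or "."; components that
    are not in the table are returned unchanged.
    """
    parts = [part for part in re.split(r"[/.]", name) if part]
    return [ABBREVIATIONS.get(part.lower(), part) for part in parts]


if __name__ == "__main__":
    print(describe_sensor("/SYS/MB/P0/PRSNT"))
```

Unknown components are passed through untouched so that sensor-specific suffixes remain visible to the reader.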
58. se system to malfunction.
[BIOS screen text, Advanced Settings menu:]
  CPU Configuration
  IDE Configuration
  Hyper Transport Configuration
  ACPI Configuration
  Event Log Configuration
  IPMI 2.0 Configuration
  MPS Configuration
  PCI Express Configuration
  Remote Access Configuration
  USB Configuration
  Key legend: Select Screen; Select Item; Enter: Go to Sub Screen; F1: General Help; F10: Save and Exit; ESC: Exit
  v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.
b. From the Advanced Settings screen, select Event Log Configuration. The Advanced Menu Event Logging Details screen is displayed.
[BIOS screen text, Advanced, Event Logging details:]
  View Event Log
  Mark all events as read
  Clear Event Log
  Help text: View all unread events on the Event Log.
  Key legend: Select Screen; Select Item; Enter: Go to Sub Screen; F1: General Help; F10: Save and Exit; ESC: Exit
  v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.
59. stamps in ILOM SP SEL 47
troubleshooting guidelines 2
typographic conventions ix
U
uncorrectable errors, handling 53
60. t 2008
CHAPTER 1 Initial Inspection of the Server
This chapter includes the following topics:
- "Service Troubleshooting Flowchart" on page 1
- "Gathering Service Information" on page 2
- "System Inspection" on page 3

Service Troubleshooting Flowchart
Use the following flowchart as a guideline for using the subjects in this book to troubleshoot the server.
TABLE 1-1 Troubleshooting Flowchart
To perform this task | Refer to this section
Gather initial service information. | "Gathering Service Information" on page 2
Investigate any powering-on problems. | "Troubleshooting Power Problems" on page 3
Perform external visual inspection and internal visual inspection. | "Externally Inspecting the Server" on page 3; "Internally Inspecting the Server" on page 4; Chapter 3
View BIOS event logs and POST messages. | "Viewing Event Logs" on page 21; "Power-On Self-Test (POST)" on page 25
View service processor logs and sensor information. | "Using the ILOM Service Processor GUI to View System Information" on page 43
Or view service processor logs and sensor information. | "Using IPMItool to View System Information" on page 55
Run SunVTS diagnostics. | "Diagnosing Server Problems With the Bootable Diagnostics CD" on page 8

Gathering Service Information
The first step in determinin
61. the Motherboard Fault LED on the mezzanine board lights, remove the mezzanine board as described in your server's service manual, and inspect the LEDs on the motherboard.
4. Disconnect the AC power cords from the server.
Caution: Before handling components, attach an ESD wrist strap to a chassis ground (any unpainted metal surface). The system's printed circuit boards and hard disk drives contain components that are extremely sensitive to static electricity.
Note: To recover fault information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager 2.0 User's Guide.
5. Remove the DIMMs from the DIMM slots in the CPU. Refer to your server's service manual for details.
6. Visually inspect the DIMMs for physical damage, dust, or any other contamination on the connector or circuits.
7. Visually inspect the DIMM slot for physical damage. Look for cracked or broken plastic on the slot.
8. Dust off the DIMMs, clean the contacts, and reseat them.
Caution: Use only compressed air to dust DIMMs.
9. If there is no obvious damage, replace any failed DIMMs. For UCEs, if the LEDs indicate a fault with the pair, replace both DIMMs. Ensure that they are inserted correctly, with ejector latches secured.
10. Reconnect AC power cords to the server.
11. Power on the server and run the diagnostics test again.
12. Review the log file. If the tests identify
62. the same error, the problem is in the CPU, not the DIMMs.

APPENDIX A Event Logs and POST Codes
This appendix contains information about the BIOS event log, the BMC system event log, the power-on self-test (POST), and console redirection. It contains the following sections:
- "Viewing Event Logs" on page 21
- "Power-On Self-Test (POST)" on page 25

Viewing Event Logs
Use this procedure to view the BIOS event log and the BMC system event log.
1. To turn on main power mode (all components powered on), if necessary, use a ballpoint pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1. When main power is applied to the full server, the Power/OK LED next to the Power button lights and remains lit.
2. Enter the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.
3. View the BIOS event log.
  a. From the BIOS Main Menu screen, select Advanced. The Advanced Settings screen is displayed.
[BIOS screen text: menu bar Main, Advanced, PCIPnP, Boot, Security, Chipset, Exit; Advanced Settings; help text: Configure CPU.]
WARNING: Setting wrong values in below sections may cau
63. tial order, and some are repeated, because some POST codes are issued by code in add-in card BIOS expansion ROMs. In the case of early POST failures (for example, the BSP fails to operate correctly), BIOS just halts without logging. For some other POST failures subsequent to memory and SP initialization, the BIOS logs a message to the SP's SEL.

TABLE D-1 Hardware Error Handling Summary (Continued)
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

Error: Single-bit DRAM ECC error
Description: With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts, and is done by the BIOS SMI handler. The BIOS SMI handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled through a software interface.
Logged (DMI Log or SP SEL): SP SEL
Fatal?: Normal operation

Error: Single four-bit DRAM error
Description: With CHIP-KILL enabled in the BIOS Setup, the CPU detects and corrects for the failure of a four-bit-wide
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts, and is done by the BIOS SMI handler. The BIOS SMI
Logged (DMI Log or SP SEL): SP SEL
Fatal?: Normal operation
64. vailable language, BIOS logo, and Silent logo modules.
13 | Initialize PM regs and PM PCI regs at Early POST. Initialize multi-host bridge, if system will support it. Set up ECC options before memory clearing. REDIRECTION causes corrected data to be written to RAM immediately. CHIPKILL provides 4-bit error detection/correction of x4-type memory. Enable PCI-X clock lines in the 8131.
20 | Relocate all the CPUs to a unique SMBASE address. The BSP will be set to have its entry point at A000:0. If fewer than 5 CPU sockets are present on a board, subsequent CPUs' entry points will be separated by 8000h bytes. If more than 4 CPU sockets are present, entry points are separated by 200h bytes. The CPU module will be responsible for the relocation of the CPU to the correct address. NOTE: APs are left in the INIT state.
24 | Uncompress and initialize any platform-specific BIOS modules.
30 | Initialize System Management Interrupt.
2A | Initializes different devices through DIM.
2C | Initializes different devices. Detects and initializes the video adapter installed in the system that have optional ROMs.
2E | Initializes all the output devices.
31 | Allocate memory for ADM module and uncompress it. Give control to ADM module for initialization. Initialize language and font modules for ADM. Activate ADM module.
33 | Initializes the silent boot module. Set the window for displaying text information.
37 | Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
38 | I
65. you set this to off, the keyboard Num Lock is not turned on during boot.
- Wait for F1 if Error: This option is disabled by default. If you enable this, the system will pause if an error is found during POST and will only resume when you press the F1 key.
- Interrupt 19 Capture: This option is reserved for future use. Do not change.
- Default Boot Order: The letters in the brackets represent the boot devices. To see the letters defined, position your cursor over the field and read the definition in the right side of the screen.

POST Codes
TABLE A-1 contains descriptions of each of the POST codes, listed in the same order in which they are generated. These POST codes appear as a four-digit string that is a combination of two-digit output from primary I/O port 80 and two-digit output from secondary I/O port 81. In the POST codes listed in TABLE A-1, the first two digits are from port 81 and the last two digits are from port 80.
TABLE A-1 POST Codes
Post Code | Description
00d0 | Coming out of POR, PCI configuration space initialization, enabling 8111's SMBus.
00d2 | Disable cache, full memory sizing, and verify that flat mode is enabled.
00d3 | Memory detections and sizing in boot block, cache disabled, IO APIC enabled.
01d4 | Test base 512KB memory. Adjust policies and cache first 8MB.
01d5 | Bootblock code is copied from ROM to lower RAM. BIOS is now executing
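The four-digit POST code layout described for TABLE A-1 (first two digits from port 81, last two digits from port 80) can be illustrated with a short Python sketch that splits a code into its two bytes. The function name and the returned dictionary keys are hypothetical and chosen only for this example; nothing here is a Sun or AMI tool.

```python
import string


def split_post_code(code: str) -> dict:
    """Split a four-digit hexadecimal POST code into its two port values.

    Per the description of TABLE A-1, the first two digits come from
    secondary I/O port 81 and the last two from primary I/O port 80.
    """
    if len(code) != 4 or any(ch not in string.hexdigits for ch in code):
        raise ValueError("expected exactly four hexadecimal digits")
    value = int(code, 16)
    return {
        "port81": value >> 8,    # high byte: first two digits
        "port80": value & 0xFF,  # low byte: last two digits
    }


if __name__ == "__main__":
    parts = split_post_code("01d4")
    print(f"port 81 = {parts['port81']:02x}, port 80 = {parts['port80']:02x}")
```

Validating the input up front keeps malformed strings, such as a code with a "0x" prefix, from being silently accepted.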