
819-3248-10 Sun Fire T1000 Server Service Manual


Contents

1. See
   Note: Depending on the configuration of ALOM POST variables and whether POST detected faults, the system might boot or it might remain at the ok prompt. If the system is at the ok prompt, type boot.
   d. Issue the Solaris OS fmadm faulty command:
      fmadm faulty
      No memory or DIMM faults should be displayed. If any faults are reported, return to the Diagnostic Flow Chart on page 11 for an approach to diagnosing the fault.
   To Remove the Motherboard and Chassis
   The motherboard, power supply, and chassis are replaced as a unit. Therefore, remove all other FRUs and associated cables from your chassis and install them in the new chassis. The FRUs to remove and replace, and the procedures for doing so, are:
   Sun Fire T1000 Server Service Manual, January 2006
   Remove the PCI Express card. See To Remove the Optional PCI Express Card on page 58.
   Remove the fan tray assembly and cable. See To Remove the Fan Tray Assembly on page 60.
   Remove the power supply and cable. See To Remove the Power Supply on page 61.
   Remove the hard drive and cable. See To Remove the Hard Drive on page 63.
   Remove the memory DIMMs. See To Remove DIMMs on page 65.
   Remove the socketed system configuration SEEPROM from the motherboard and place it on an antistatic mat. The system configuration SEEPROM contains the persistent storage for the
2. Note: Never run the system with the top cover removed. The top cover must be in place for proper air flow. The cover interlock switch immediately shuts the system down when the cover is removed.
   Caution: The system supplies 3.3 Vdc standby power to the circuit boards even when the system is powered off, if the AC power cord is plugged in.
   1. Press the cover release button (FIGURE 3-3).
   2. While pressing the release button, grasp the rear of the cover and slide the cover toward the rear of the server about one half inch.
   3. Lift the cover off the chassis.
   FIGURE 3-3 Location of Top Cover Release Button (callouts: cover release button, top cover)
   Removing and Replacing CRUs
   This section provides procedures for replacing the following customer-replaceable parts (CRUs) inside the server chassis:
   - To Remove the Optional PCI Express Card on page 58 and To Add or Replace the Optional PCI Express Card on page 60
   - To Remove the Fan Tray Assembly on page 60 and To Replace the Fan Tray Assembly on page 61
   - To Remove the Power Supply on page 61 and To Replace the Power Supply on page 62
   - To Remove the Hard Drive on page 63 and To Replace the Hard Drive on page 64
   - To Remove DIMMs on page 65 and To Add or Replace DIMMs on page 66
   - To Remove the Clock Battery on the Motherboard on page 70 and To Replace
3. SunVTS software features both character-based and graphics-based interfaces. This procedure assumes that you are using the graphical user interface (GUI) on a system running the Common Desktop Environment (CDE). For more information about the character-based SunVTS TTY interface, and specifically for instructions on accessing it by tip or telnet commands, refer to the SunVTS User's Guide.
   SunVTS software can be run in several modes. This procedure assumes that you are using the default mode.
   This procedure also assumes that the Sun Fire T1000 server is headless, that is, it is not equipped with a monitor capable of displaying bitmapped graphics. In this case, you access the SunVTS GUI by logging in remotely from a machine that has a graphics display.
   Finally, this procedure describes how to run SunVTS tests in general. Individual tests may presume the presence of specific hardware, or may require specific drivers, cables, or loopback connectors. For information about test options and prerequisites, refer to the following documentation:
   - SunVTS Test Reference Manual
   - SunVTS 6.0 PS3 Doc Supplement (SPARC)
   To Exercise the System Using SunVTS Software
   Log in as superuser to a system with a graphics display. The display system should be one with a frame buffer and monitor capable of displaying bitmapped graphics such as those produced by the SunVTS GUI.
   Enable remote display. On the display system, type: /usr/openwin/bi
4. FIGURE 2-7 SunVTS Test Selection Panel (screen capture; visible entries include bge2 netlbtest and bge3 netlbtest)
   6. (Optional) Select the tests you want to run. Certain tests are enabled by default, and you can choose to accept these. Alternatively, you can enable and disable individual tests or blocks of tests by clicking the checkbox next to the test name or test category name. Tests are enabled when checked, and disabled when not checked. TABLE 2-8 lists tests that are especially useful to run on a Sun Fire T1000 server.
   TABLE 2-8 Useful SunVTS Tests to Run on a Sun Fire T1000 Server
   SunVTS Tests                                      FRUs Exercised by Tests
   cmttest, cputest, fputest, iutest,                DIMMs, motherboard
     l1dcachetest, dtlbtest, and l2sramtest
     (indirectly: mptest and systest)
   disktest                                          Disks, cables, disk backplane
   nettest, netlbtest                                Network interface, network cable, motherboard
   pmemtest, vmemtest, ramtest                       DIMMs, motherboard
   serialtest                                        I/O (serial port interface)
   hsclbtest                                         Motherboard, ALOM system controller (host to system controller interface)
   (Optional) Customize individual tests. You can customize individual tests by right-clicking on the name of the test. For example, in the illustration under FIGURE 2-7, right-clicking on the text string bg0 nettest brings up a menu that enables you to co
5. online documentation for the Solaris operating environment
   - Other software documentation that you received with your system
   Typographic Conventions
   Typeface    Meaning                                        Examples
   AaBbCc123   The names of commands, files, and              Edit your .login file.
               directories; on-screen computer output         Use ls -a to list all files.
                                                              % You have mail.
   AaBbCc123   What you type, when contrasted with            % su
               on-screen computer output                      Password:
   AaBbCc123   Book titles, new words or terms, words to      Read Chapter 6 in the User's Guide.
               be emphasized. Replace command-line            These are called class options.
               variables with real names or values.           You must be superuser to do this.
                                                              To delete a file, type rm filename.
   Note: The settings on your browser might differ from these settings.
   Shell Prompts
   Shell                                        Prompt
   C shell                                      machine-name%
   C shell superuser                            machine-name#
   Bourne shell and Korn shell                  $
   Bourne shell and Korn shell superuser        #
   Sun Fire T1000 Server Documentation
   You can view and print the following documents from the Sun documentation web site at http://www.sun.com/documentation
   Columns: Title, Description, Part Number
   - Sun Fire T1000 Server Site Planning Data Guide
   - Sun Fire T1000 Server Product Notes
   - Sun Fire T1000 Server Product Overview
   - Sun Fire T1000 Server Getting Started Guide
   - Sun Fire T1000 Server Installation Guide
   - Sun Fire T1000 Server Sy
6. service personnel and system administrators who service and repair computer systems. The following topics are covered:
   - Overview of Sun Fire T1000 Server Diagnostics on page 9
   - Using LEDs to Identify the State of Devices on page 14
   - Using ALOM For Diagnosis and Repair Verification on page 17
   - Running POST on page 27
   - Using the Solaris Predictive Self-Healing Feature on page 35
   - Collecting Information From Solaris OS Files and Commands on page 39
   - Managing System Components with Automatic System Recovery Commands on page 40
   - Exercising the System with SunVTS on page 43
   Overview of Sun Fire T1000 Server Diagnostics
   There are a variety of diagnostic tools, commands, and indicators you can use to troubleshoot a Sun Fire T1000 server:
   - LEDs: Provide a quick visual notification of the status of the server and of some of the FRUs.
   - ALOM CMT firmware: The system firmware that runs on the system controller. In addition to providing the interface between the hardware and OS, ALOM also tracks and reports the health of key server components. ALOM works closely with POST and Solaris predictive self-healing technology to keep the system up and running even when there is a faulty component.
   - Power-on self-test (POST): Performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and works with ALOM to take faulty compo
7. The PSH console message provides the following information:
   - Type
   - Severity
   - Description
   - Automated Response
   - Impact
   - Suggested Action for System Administrator
   - Details
   If the Solaris OS PSH facility has detected a faulty component, use the fmdump command to identify the fault.
   Note: Additional predictive self-healing information is available at http://www.sun.com/msg
   To Use the fmdump Command to Identify Faults
   The fmdump command displays the list of faults detected by the Solaris PSH facility. Use this command for the following reasons:
   - To see if any faults have been detected by the Solaris PSH facility.
   - To obtain the fault message ID (SUNW-MSG-ID) for detected faults.
   - To verify that the replacement of a FRU has cleared the fault and not generated any additional faults.
   If you already have a fault message ID, go to Step 2 to obtain more information about the fault from Sun's Predictive Self-Healing Knowledge Article web site.
   1. Check the event log using the fmdump command with -v for verbose output:
      fmdump -v
      TIME                 UUID                                 SUNW-MSG-ID
      Oct 21 10:32:47.2211 a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc SUN4U-8000-2S
        95%  fault.memory.dimm
             FRU: mem:///component=MB/CMP0/CH0:R1/D0/J0701
            rsrc: mem:///component=MB/CMP0/CH0:R1/D0/J0701
      In this example, a fault is displayed, indicating the following details:
      - Date and time of
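   The summary line of the fmdump -v example above carries the three values used in the rest of the procedure: the time of the fault, the UUID, and the Sun message ID. The following Python sketch shows one way to pull those fields out of a captured line. It assumes the line has the whitespace-separated form "month day HH:MM:SS.ffff UUID MSG-ID" with hyphenated UUID and message ID; the names FaultSummary and parse_summary are hypothetical and are not part of any Sun tool.

```python
import re
from typing import NamedTuple, Optional


class FaultSummary(NamedTuple):
    """Fields of one fmdump summary line (hypothetical container)."""
    time: str
    uuid: str
    msg_id: str


# Assumed layout: "<Mon> <day> <HH:MM:SS[.ffff]> <UUID> <MSG-ID>".
_SUMMARY = re.compile(
    r"^(?P<time>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msg_id>[A-Z0-9]+-\d{4}-[A-Z0-9]+)\s*$"
)


def parse_summary(line: str) -> Optional[FaultSummary]:
    """Return the parsed fields, or None if the line is not a summary line."""
    match = _SUMMARY.match(line.strip())
    if match is None:
        return None
    return FaultSummary(match["time"], match["uuid"], match["msg_id"])


if __name__ == "__main__":
    sample = "Oct 21 10:32:47.2211 a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc SUN4U-8000-2S"
    print(parse_summary(sample))
```

   Header lines and the indented FRU and rsrc detail lines do not match the pattern, so parse_summary returns None for them and a caller can simply skip those lines.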
8. CH1/R1/D1, but this table lists the DIMM names in an abbreviated way: the preceding MB/CMP0 is omitted for clarity.
   Grasp the top corners of the DIMM and remove it from the motherboard.
   Place the DIMM on an antistatic mat.
   To Add or Replace DIMMs
   Use the following guidelines, FIGURE 3-11, and TABLE 3-1 to plan the memory configuration of your server:
   - Eight slots hold industry-standard DDR-2 memory DIMMs, providing a total of up to 16 GBytes of memory.
   - The Sun Fire T1000 server accepts the following DIMM sizes:
     - 512 MB
     - 1 GB
     - 2 GB
   - All DIMMs installed must be the same size.
   - DIMMs must be added four at a time.
   - Rank 0 memory must be fully populated for the Sun Fire T1000 to function.
   1. Unpackage the replacement DIMMs and place them on an antistatic mat.
   2. Ensure that the socket ejector tabs are in the open position.
   3. Line up the replacement DIMM with the connector.
   4. Push the DIMM into the socket until the ejector tabs lock the DIMM in place.
   5. Perform the procedures described in Common Procedures for Finishing Up on page 72.
   6. Perform the following steps to clear the memory fault:
      a. Gain access to the ALOM sc> prompt. Refer to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM) Guide for instructions.
      b. Run the showfaults -v command to determine how to clear the fault.
         - If the fault is a host-detected fault (displays a UUID), such
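   The DIMM population guidelines in this procedure can be checked mechanically before opening the chassis. The sketch below is a minimal Python illustration, not a Sun utility. It assumes that the eight slots form two ranks (0 and 1) of four DIMMs each, which is an inference from the "four at a time" and "Rank 0" rules rather than something the text states outright; the function name check_dimm_plan and the plan layout are hypothetical.

```python
# Sizes accepted by the server, in MB (512 MB, 1 GB, 2 GB).
ALLOWED_SIZES_MB = {512, 1024, 2048}
# Assumption: eight slots arranged as two ranks of four DIMMs.
RANKS = (0, 1)
DIMMS_PER_RANK = 4


def check_dimm_plan(plan):
    """Check a planned memory layout against the guidelines.

    plan maps a rank number to the list of DIMM sizes (in MB) planned
    for that rank. Returns a list of problem strings; an empty list
    means the plan satisfies every guideline modeled here.
    """
    problems = []
    for rank in sorted(set(plan) - set(RANKS)):
        problems.append(f"unknown rank: {rank}")

    sizes = [size for dimms in plan.values() for size in dimms]
    for size in sorted(set(sizes) - ALLOWED_SIZES_MB):
        problems.append(f"unsupported DIMM size: {size} MB")
    if len(set(sizes)) > 1:
        problems.append("all DIMMs must be the same size")

    for rank in RANKS:
        count = len(plan.get(rank, []))
        if count not in (0, DIMMS_PER_RANK):
            problems.append(
                f"rank {rank}: DIMMs must be added four at a time (found {count})"
            )
    if len(plan.get(0, [])) != DIMMS_PER_RANK:
        problems.append("rank 0 must be fully populated")
    return problems


if __name__ == "__main__":
    print(check_dimm_plan({0: [1024] * 4}))              # valid plan
    print(check_dimm_plan({0: [1024] * 4, 1: [512] * 4}))  # mixed sizes
```

   Returning a list of problems rather than raising on the first one lets a caller report every violated guideline in a single pass.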
9. For example, if one of the processor cores is deemed faulty by POST, the core will be disabled, and the system will boot and run using the remaining cores. Devices can be manually enabled or disabled using ASR commands (see Managing System Components with Automatic System Recovery Commands on page 40).
   Controlling How POST Runs
   The server can be configured for normal, extensive, or no POST execution. You can also control the level of tests that run, the amount of POST output that is displayed, and which reset events trigger POST by using ALOM variables.
   TABLE 2-5 lists the ALOM variables used to configure POST, and FIGURE 2-5 shows how the variables work together.
   TABLE 2-5 ALOM Parameters Used For POST Configuration
   Parameter      Values    Description
   setkeyswitch   normal    The system can power on and run POST (based on the other parameter settings). For details see FIGURE 2-5. This parameter overrides all other commands.
                  diag      The system runs POST based on predetermined settings.
                  stby      The system cannot power on.
                  locked    The system can power on and run POST, but no flash updates can be made.
   diag_mode      off       POST does not run.
                  normal    Runs POST according to the diag_level value.
                  service   Runs POST with preset values for diag_level and diag_verbosity.
   diag_level     min       If diag_mode = normal, runs the minimum set of tests.
                  max       If diag_mode = normal, runs all the minimum tests, plus extensive CPU
10. If the suggested action does not recommend replacing a FRU, perform the suggested action. Contact Sun for additional support if needed.
    The showenvironment command reports over-temperature conditions when the ambient room temperature exceeds the upper limit.
    For more information, see these sections:
    - To Remove the Power Supply on page 61 and To Replace the Power Supply on page 62
    - To Run the showfaults Command on page 21
    - Using the Solaris Predictive Self-Healing Feature on page 35
    - Sun Support information: http://www.sun.com/service/contacting
    - To Run the showenvironment Command on page 22
    TABLE 2-1 Diagnostic Flow Chart Actions (Continued)
    Columns: Action No.; Diagnostic Action; Resulting Action; For more information, see these sections
    Action No.: 8 10 11 12
    Diagnostic Action:
    - Identify the cause of the over-temperature condition.
    - Identify the faulty FRU.
    - Check the Solaris log files for fault information.
    - Run POST.
    - Run SunVTS.
    Resulting Action:
    - The over-temperature condition may be caused by excessive ambient room temperature, an overheating power supply, or a faulty fan tray assembly. If the ambient room temperature is too high, reduce the room temperature. If the over-temperature condition still exists, go to Action 9. If the over-temperature condition does not exist, go to Action 10.
    - The FRUs require that you shut down the server to perform a cold swap. After replacing the faulty FRU, go
11. Information on page 51
    - Common Procedures for Parts Replacement on page 53
    - Removing and Replacing CRUs on page 57
    - Common Procedures for Finishing Up on page 72
    For a list of CRUs, see Appendix A, Field-Replaceable Units (FRUs) on page 75.
    Note: Never attempt to run the system with the cover removed. The cover must be in place for proper air flow. The cover interlock switch immediately shuts the system down when the cover is removed.
    Safety Information
    This section describes important safety information you need to know prior to removing or installing parts in the Sun Fire T1000 server. For your protection, observe the following safety precautions when setting up your equipment:
    - Follow all Sun standard cautions, warnings, and instructions marked on the equipment and described in Important Safety Information for Sun Hardware Systems.
    - Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment's electrical rating label.
    - Follow the electrostatic discharge safety practices as described in this section.
    The document Important Safety Information for Sun Hardware Systems (816-7190) contains a listing of safety precautions for Sun systems. This document is located in the packing carton of your server.
    The Sun Fire T1000 server complies with regulatory requirements for safety and EMI. Document about compliance is available onl
12. System Reliability, Availability, and Serviceability
    Environmental Monitoring
    Error Correction and Parity Checking
    Predictive Self-Healing
    Chassis Identification
    Additional Service-Related Information
    Sun Fire T1000 Server Diagnostics
    Overview of Sun Fire T1000 Server Diagnostics
    Using LEDs to Identify the State of Devices
    Front and Rear Panel LEDs
    Power Supply LEDs
    Using ALOM For Diagnosis and Repair Verification
    Running ALOM Service-Related Commands
    Connecting to ALOM
    Switching Between the System Console and ALOM
    Service-Related ALOM Commands
    To Run the showfaults Command
    To Run the showenvironment Command
    To Run the showfru Command
    Running POST
    Controlling How POST Runs
    To Change POST Parameters
    Reasons to Run POST
    Routine Sanity Check of the Hardware
    Diagnosing the System Hardware
    To Run POST
    Using the Solaris Predictive Self-Healing Feature
    To Use the fmdump Command to Identify Faults
    Collecting Information From Solaris OS Files and Commands
    To Check the Message Buffer
    To View System Message Log Files
    Managing System Components with Automatic System Recovery Commands
    To Run the showcomponent Command
    To Run the disablecomponent Command
    To Run the enablecomponent Command
    Exercising the System with SunVTS
    Checking Whether SunVTS Software Is Installed
    To Check Whether SunVTS Software Is Installed
    Exercising
13. and memory tests.
    diag_trigger     none             Does not run POST on reset.
                     user_reset       Runs POST upon user-initiated resets.
                     power_on_reset   Only runs POST for the first power on. This is the default.
                     error_reset      Runs POST if fatal errors are detected.
                     all_resets       Runs POST after any reset.
    diag_verbosity   none             No POST output is displayed.
                     min              POST output displays functional tests with a banner and pinwheel.
                     normal           POST output displays all test and informational messages.
                     max              POST displays all test, informational, and some debugging messages.
    All of these parameters are set using the ALOM setsc command, except for the setkeyswitch command.
    FIGURE 2-5 Flowchart of ALOM Variables for POST Configuration
    (Flowchart: diag_trigger selects among user_reset, power_on_reset, and error_reset; diag_mode then selects the path to System Boot and the OpenBoot PROM.
    Service Mode: Forces a Sun-prescribed level of diagnostic execution. Overrides user-defined settings as if the parameters were diag_level = max, diag_verbosity = max, diag_trigger = all resets. User-defined settings are not modified.
    Normal Mode: Diagnostic execution is enabled. User-defined settings control test coverage and verbosity via diag_level, diag_verbosity, and diag_trigger.)
    TABLE 2-6 shows typical combinations of ALOM variables and associated POST modes.
    TABLE 2-6 ALOM Param
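    The interaction between diag_mode and the other variables described for FIGURE 2-5 can be summarized in a few lines of code. The Python sketch below is only a model of the rules stated in this section, not ALOM itself. It assumes diag_trigger holds a single value, spells the "any reset" value as all_resets, and leaves out the setkeyswitch override; PostSettings, effective_settings, and post_runs_on are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostSettings:
    """User-defined ALOM POST variables (simplified, one trigger value)."""
    diag_mode: str        # "off", "normal", or "service"
    diag_level: str       # "min" or "max"
    diag_verbosity: str   # "none", "min", "normal", or "max"
    diag_trigger: str     # "none", "user_reset", "power_on_reset",
                          # "error_reset", or "all_resets" (assumed spelling)


def effective_settings(user: PostSettings) -> PostSettings:
    """Apply the service-mode override; the user's settings are not modified."""
    if user.diag_mode == "service":
        # Service mode behaves as if level and verbosity were max
        # and the trigger covered all resets.
        return PostSettings("service", "max", "max", "all_resets")
    return user


def post_runs_on(reset_event: str, user: PostSettings) -> bool:
    """Return True if POST would run for the given reset event."""
    eff = effective_settings(user)
    if eff.diag_mode == "off" or eff.diag_trigger == "none":
        return False
    if eff.diag_trigger == "all_resets":
        return True
    return eff.diag_trigger == reset_event


if __name__ == "__main__":
    defaults = PostSettings("normal", "min", "normal", "power_on_reset")
    print(post_runs_on("power_on_reset", defaults))
    print(post_runs_on("user_reset", defaults))
```

    Because effective_settings returns a new object, the model mirrors the statement that service mode overrides, but does not modify, the user-defined settings.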
14. command takes effect:
       sc> reset
    To Run the enablecomponent Command
    The enablecomponent command enables a disabled component by removing it from the ASR blacklist.
    1. At the sc> prompt, enter the enablecomponent command:
       sc> enablecomponent MB/CMP0/CH3/R1/D1
       sc>
       SC Alert: MB/CMP0/CH3/R1/D1 reenabled.
    2. After receiving confirmation that the enablecomponent command is complete, reset the server so that the ASR command takes effect:
       sc> reset
    Exercising the System with SunVTS
    Sometimes a server exhibits a problem that cannot be isolated definitively to a particular hardware or software component. In such cases, it may be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Sun provides the SunVTS software for this purpose.
    This chapter describes the tasks necessary to use SunVTS software to exercise your Sun Fire T1000 server:
    - Checking Whether SunVTS Software Is Installed on page 43
    - Exercising the System Using SunVTS Software on page 44
    Checking Whether SunVTS Software Is Installed
    This procedure assumes that the Solaris OS is running on the Sun Fire T1000 server, and that you have access to the Solaris OS command line.
    To Check Whether SunVTS Software Is Installed
    1. Check for the
15. connect to the system console after performing the operation.
    Prepares a FRU for removal and illuminates the host system's OK to Remove LED.
    Generates a hardware reset on the host server. The -y option enables you to skip the confirmation question. The -c option instructs ALOM to connect to the system console after performing the operation.
    Reboots the ALOM system controller. The -y option enables you to skip the confirmation question.
    Sets the virtual keyswitch.
    Turns the Locator LED on the server on or off.
    TABLE 2-4 Service-Related ALOM Commands (Continued)
    ALOM Command / Description
    showenvironment
        Displays the environmental status of the host server. This information includes system temperatures, power supply, front panel LED, hard drive, fan, voltage, and current sensor status. See To Run the showenvironment Command on page 22.
    showfaults [-v]
        Displays current system faults. See To Run the showfaults Command on page 21.
    showfru [-g lines] [-s|-d] [FRU]
        Displays information about the FRUs in the server. The -g lines option specifies the number of lines to display before pausing the output to the screen. The -s option displays static information about system FRUs (defaults to all FRUs, unless one is specified). The -d option displays dynamic information about system FRUs (defaults to all FRUs, unless one is specified). See To Run the sho
16. drive and remove the drive and tray assembly from the chassis.
    FIGURE 3-9 Removing the Hard Drive (callouts: latches, hard drive)
    To Replace the Hard Drive
    1. Unpackage the replacement hard drive and tray assembly.
    2. Slide the hard drive and tray assembly into the chassis until it mates with the front of the chassis (FIGURE 3-10).
    FIGURE 3-10 Replacing the Hard Drive (callouts: hard drive, latches)
    3. Snap the catches on the latches to lock the drive and tray assembly into place in the chassis.
    4. Redress the power and cable through the midwall in the chassis, and reconnect the cable to the rear of the drive.
    5. Perform the procedures described in Common Procedures for Finishing Up on page 72.
    6. Perform administrative tasks to reconfigure the hard disk drive. The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have it updated from a RAID configuration.
       Example: cfgadm -c configure c0t0d0s0C
    To Remove DIMMs
    Caution: This procedure requires that you handle components that are sensitive to static discharge, which can cause the components to fail. To avoid this problem, ensure that you follow antistatic practices as described in To Pe
17. >IO-Bridge unit 1 interrupt test
    0:0>IO-Bridge unit 1 Config MB bridges
    0:0>Config port B, bus 2 dev 0 func 0, tag 5714 BRIDGE
    0:0>Config port B, bus 3 dev 8 func 0, tag PCIX BRIDGE
    0:0>IO-Bridge unit 1 PCI id test
    0:0>INFO: 10 count read passed for MB/IOB_PCIEb/BRIDGE Last read VID 1166 DID 103
    0:0>INFO: 10 count read passed for MB/IOB_PCIEb/BRIDGE/GBE Last read VID 14e4 DID 1648
    0:0>INFO: 10 count read passed for MB/IOB_PCIEb/BRIDGE/HBA Last read VID 1000 DID 50
    0:0>Quick JBI Loopback Block Mem Test
    0:0>Quick jbus loopback Test 262144 bytes at 00000000 00600000
    0:0>INFO:
    0:0>POST Passed all devices.
    0:0>POST: Return to VBSC.
    0:0>Master set ACK for vbsc runpost command and spin
    5. Perform further investigation if needed.
       When POST is finished running, the system continues to boot even if POST detects a faulty FRU, provided the fault does not leave the system without memory or a CPU core.
       Note that certain DIMM failures may not be diagnosable to a single DIMM. These failures are fatal, and will result in both logical banks being unconfigured.
       If POST detects a faulty device, the fault is displayed and the fault information is passed to ALOM for fault handling.
       a. Interpret the POST messages. POST error messages use the following syntax:
          c:s>ERROR: TEST = failing-test
          c:s>H/W under test = FRU
          c:s>Repair Instructions: Replace items in order listed by H/W Ch
18. host ID and Ethernet MAC addresses of the system, as well as the ALOM configuration, including the IP addresses and ALOM user accounts, if configured. This information will be lost unless the system configuration SEEPROM is removed and installed in the replacement motherboard. The PROM does not hold the fault data, and this data will no longer be accessible when the motherboard and chassis assembly is replaced. The location of this SEEPROM is shown in Appendix A, Field-Replaceable Units (FRUs) on page 75.
    To Replace the Motherboard and Chassis Assembly
    Reconnect the front panel LED cable.
    Replace the PCI Express card. See To Add or Replace the Optional PCI Express Card on page 60.
    Replace the fan tray assembly and cable. See To Replace the Fan Tray Assembly on page 61.
    Replace the power supply and cable. See To Replace the Power Supply on page 62.
    Replace the hard disk drive and cable. See To Replace the Hard Drive on page 64.
    Replace the memory DIMMs. See To Add or Replace DIMMs on page 66.
    Replace the socketed system configuration SEEPROM. The location of this SEEPROM is shown in Appendix A, Field-Replaceable Units (FRUs) on page 75.
    Perform the procedures described in Common Procedures for Finishing Up on page 72.
    Boot the system and run POST to verify that the system is fully operational. See Running POST on pag
19. indicate the source of a fault, check the message buffer and log files for notifications of faults. Hard drive faults are usually captured by the Solaris message files.
    Use the dmesg command to view the most recent system messages. To view the system messages log file, view the contents of the /var/adm/messages file.
    To Check the Message Buffer
    Log in as superuser.
    Issue the dmesg command:
       dmesg
    The dmesg command displays the most recent messages generated by the system.
    To View System Message Log Files
    The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.
    The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every ten days), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted.
    1. Log in as superuser.
    2. Issue the following command:
       more /var/adm/messages
    3. If you want to view all logged messages, issue the following command:
       more /var/adm/messages*
    Managing System Components with Automatic System Recovery Co
20. location is OK
    sensor at location is within normal range
    Environmental faults can be repaired through removal and replacement of the faulty FRU. FRU removal is automatically detected by the environmental monitoring, and all faults associated with the removed FRU are cleared. The message for that case, and the alert sent for all FRU removals, is:
    fru at location has been removed.
    There is no ALOM command to manually repair an environmental fault.
    ALOM does not handle hard drive faults. Use the Solaris message files to view hard drive faults. See Collecting Information From Solaris OS Files and Commands on page 39.
    Running ALOM Service-Related Commands
    This section describes the ALOM commands that are commonly used for service-related activities.
    Connecting to ALOM
    Before you can run ALOM commands, you must connect to the ALOM. There are several ways to connect to the system controller:
    - Connect an ASCII terminal directly to the serial management port.
    - Use the telnet command to connect to ALOM through an Ethernet connection on the network management port.
    - Connect an external modem to the network management port and dial in to the modem.
    Note: Refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide for instructions on configuring and connecting to ALOM.
    Switching Between the System Console and ALOM
    - To switch from the console output to the
21. or interfere when the server chassis is removed from the rack.
    3. Disconnect the power cord from the power supply.
    4. Disconnect all cables from the server and label them.
    5. From the front of the server, unlock both mounting brackets (FIGURE 3-1) and pull the server chassis out until the brackets lock in the open position.
    FIGURE 3-1 Unlocking a Mounting Bracket
    6. Press the release buttons on both mounting brackets (FIGURE 3-2) to release the right and left mounting brackets, then pull the server chassis out of the rails. The mounting brackets slide approximately 4 in. (10 cm) further before disengaging.
    FIGURE 3-2 Location of the Mounting Bracket Release Buttons
    7. Set the chassis on a sturdy work surface.
    To Perform Electrostatic Discharge (ESD) Prevention Measures
    1. Prepare an antistatic surface on which to set parts during removal and installation. Place ESD-sensitive components, such as the printed circuit boards, on an antistatic mat. The following items can be used as an antistatic mat:
       - Antistatic bag used to wrap a Sun replacement part
       - Sun ESD mat, part number 250-1088
       - Disposable ESD mat (shipped with some replacement parts or optional system components)
    2. Use an antistatic wrist strap.
    To Remove the Top Cover
    Access to all customer-replaceable units (CRUs) requires the removal of the top cover.
22. presence of SunVTS packages. Type:
       pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn
    - If SunVTS software is loaded, information about the packages is displayed.
    - If SunVTS software is not loaded, you see an error message for each missing package:
       ERROR: information for "SUNWvts" was not found
       ERROR: information for "SUNWvtsr" was not found
    The pertinent packages are as follows:
    Package      Description
    SUNWvts      SunVTS framework
    SUNWvtsr     SunVTS framework (root)
    SUNWvtsts    SunVTS for tests
    SUNWvtsmn    SunVTS man pages
    If SunVTS is not installed, you can obtain the installation packages from the following:
    - Solaris Operating System DVDs
    - The Sun Download Center: http://www.sun.com/oem/products/vts
    The SunVTS 6.0 PS3 software, and future compatible versions, are supported on the Sun Fire T1000 server.
    SunVTS installation instructions are described in the SunVTS User's Guide.
    Exercising the System Using SunVTS Software
    Before you begin, the Solaris OS must be running. You also need to ensure that SunVTS validation test software is installed on your system. See Checking Whether SunVTS Software Is Installed on page 43.
    SunVTS software requires that you use one of two security schemes. The security scheme you choose must be properly configured in order for you to perform this procedure. For details, refer to the SunVTS User's Guide.
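    The package check described above lends itself to a small script when many servers need to be verified. The Python sketch below only parses text that has already been captured from pkginfo; it does not run the command. It assumes each missing package produces a line of the form: ERROR: information for "PACKAGE" was not found (quotes and colon optional). The function name missing_packages is hypothetical.

```python
import re

# The four SunVTS packages named in this section.
REQUIRED = ("SUNWvts", "SUNWvtsr", "SUNWvtsts", "SUNWvtsmn")

# Assumed error line format; the colon and the quotes are treated as optional.
_NOT_FOUND = re.compile(
    r'ERROR:?\s+information for\s+"?(?P<pkg>[\w.-]+)"?\s+was not found'
)


def missing_packages(pkginfo_output: str) -> list:
    """Return the required SunVTS packages reported as not found."""
    reported = {m["pkg"] for m in _NOT_FOUND.finditer(pkginfo_output)}
    return [pkg for pkg in REQUIRED if pkg in reported]


if __name__ == "__main__":
    captured = (
        'ERROR: information for "SUNWvts" was not found\n'
        'ERROR: information for "SUNWvtsr" was not found\n'
    )
    print(missing_packages(captured))
```

    An empty result means pkginfo reported no missing packages among the four, which matches the "information about the packages is displayed" case.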
23. are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc.
    The OPEN LOOK and Sun graphical user interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox graphical user interface, which license also covers Sun's licensees who implement OPEN LOOK graphical user interfaces and otherwise comply with Sun's written license agreements.
    DOCUMENTATION IS PROVIDED "AS IS" AND ALL OTHER EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES ARE FORMALLY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW, INCLUDING IN PARTICULAR ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT.
    Contents
    Preface
    Sun Fire T1000 Server Overview
    Sun Fire T1000 Server Features
    Chip Multithreaded (CMT) Multicore Processor and Memory Technology
    Performance Enhancements
    Remote Manageability With ALOM
24. temperature observed by a sensor falls below a low-temperature threshold or rises above a high-temperature threshold, the monitoring subsystem software lights the amber Service Required LEDs on the front and back panels. If the temperature condition persists and reaches a critical threshold, the system initiates a graceful system shutdown.
    All error and warning messages are sent to the ALOM system controller and the system console, and are logged in the ALOM log file. Additionally, some FRUs, such as the power supply, provide LEDs that indicate a failure within the FRU.
    Error Correction and Parity Checking
    The SPARC T1 multicore processor provides parity protection on its internal cache memories, including tag parity and data parity on the D-cache and I-cache. The internal 3 MB L2 cache has parity protection on the tags and ECC protection of the data.
    Advanced ECC, also called Chipkill, detects up to 4 bits in error
    Predictive Self-Healing
    The Sun Fire T1000 server features the latest fault management technologies. With the Solaris 10 Operating System (OS), Sun is introducing a new architecture for building and deploying systems and services capable of predictive self-healing. Self-healing technology enables Sun systems to accurately predict component failures and mitigate many serious problems before th
25. the fault Oct 21 10 32 EDT 2004
- A Universal Unique Identifier (UUID) that is unique for every fault: a26d5379-24b8-4a46-bchf-d9e1ff75albc
- A Sun message identifier (SUN4U-8000-2S) that can be used to obtain additional fault information
- The faulted FRU (FRU: mem:///component=MB/CMP0/CH0:R1/D0/J0701), which in this example is identified as the DIMM at R1/D0 (J0701)
2. Use the Sun message ID to obtain more information about this type of fault.
a. In a browser, go to the Predictive Self-Healing Knowledge Article web site: http://www.sun.com/msg
b. Enter the message ID in the SUNW-MSG-ID field and press Lookup. In this example, the message ID SUN4U-8000-2S returns the following information for corrective action:
Memory module errors exceeded acceptable levels
Type: Fault
Severity: Major
Description: The Solaris(TM) Fault Manager has determined that the number of correctable single-bit memory errors reported against a memory DIMM module indicates that a fault requiring repair action is present.
Automated Response: The system will attempt to remove the affected page of memory from service.
Impact: The system is at increased risk of incurring an uncorrectable error, which will cause a service interruption, until the memory DIMM module is replaced.
Suggested Action for System Administrator: For Sun Fire(TM) T1000, T2000, 1280, 3800, 6800, 2900, 6900, E12K, E15K, F20K, and F25K
26. to Action 14.
The Solaris message buffer and log files record system events and can provide information about faults.
- If system messages indicate a faulty device, replace the FRU (Action 11).
- To obtain more diagnostic information, go to Action 7.
POST performs basic tests of the server components and reports faulty FRUs.
- If POST indicates a faulty FRU, replace the FRU (Action 9).
- If POST does not indicate a faulty FRU, go to Action 12.
SunVTS provides tests used to exercise and diagnose FRUs. To run SunVTS, the server must be running the Solaris OS.
- If SunVTS reports a faulty device, replace the FRU (Action 9).
- If SunVTS does not report a faulty device, go to Action 11.
For more information, see these sections:
- To Remove the Fan Tray Assembly on page 60 and To Replace the Fan Tray Assembly on page 61
- To Remove the Power Supply on page 61 and To Replace the Power Supply on page 62
- Collecting Information From Solaris OS Files and Commands on page 39
- Running POST on page 27
- Exercising the System with SunVTS on page 43
TABLE 2-1 Diagnostic Flow Chart Actions (Continued)
Action No. | Diagnostic Action | Resulting Action | For more information, see these sections
13 | Replace faulty FRU | The FRUs require that you shut down the server to perform a cold swap. After replacing the faulty FRU, go to Action 14. | Removing and Replacing FRUs on page 51
27. Preface
The Sun Fire T1000 Server Service Manual provides information to aid in troubleshooting problems with, and replacing components within, the Sun Fire T1000 server. This manual is written for technicians, service personnel, and system administrators who service and repair computer systems. The person qualified to use this manual:
- Can open a system chassis, and identify and replace internal components
- Understands the Solaris Operating System and the command-line interface
- Has superuser privileges for the system being serviced
- Understands typical hardware troubleshooting tasks
How This Book Is Organized
This guide is organized into the following chapters:
Chapter 1 describes the main features of the Sun Fire T1000 server.
Chapter 2 describes the diagnostics that are available for monitoring and troubleshooting the Sun Fire T1000 server.
Chapter 3 describes how to remove and replace the FRUs.
Appendix A lists the customer-replaceable components in the Sun Fire T1000 server.
Using UNIX Commands
This document might not contain information on basic UNIX commands and procedures such as shutting down the system, booting the system, and configuring devices. See one or more of the following for this information:
- Solaris Handbook for Sun Peripherals
- AnswerBook2
28. 0/100/1000 Mbit, auto-negotiating. Each of the 4 Ethernet RJ45s includes two LEDs:
- A green Link indicator, lit when a link is established at any speed
- A yellow Activity indicator, which blinks during packet transfers
1 DB-9 serial port
1 SATA disk drive, 3.5-inch form factor. Support for hardware-embedded RAID 1 (mirroring)
4 fans in a single assembly
1 PCI Express (PCI-E) slot for low-profile cards; supports 1x, 4x, and 8x width cards
1 power supply (PS)
ALOM system controller integrated on the motherboard, with a serial and 10/100 Mbit Ethernet port
OpenBoot PROM for reset and POST support
ALOM CMT for remote management administration
Solaris 10 1/06 or later Operating System, preinstalled on the hard disk drive
Java Enterprise System with a 90-day trial license
For additional information on the Sun Fire T1000 server features, refer to the Sun Fire T1000 Server Product Overview.
Remote Manageability With ALOM
The Sun Advanced Lights Out Manager (ALOM) feature is a system controller (SC) that enables you to remotely manage and administer the Sun Fire T1000 server. The ALOM CMT software is preinstalled as firmware, and therefore ALOM initializes as soon as you apply power to the system. You can customize ALOM to work with your particular installation. ALOM enables you to monitor and control your server over a network, or by using a dedicated serial port for co
29. 14 | Verify the repair | Various commands and utilities can be used to verify the functionality of the system components. Two useful commands are:
- The ALOM showfaults command
- The ASR showcomponent command
If the FRU is blacklisted, you can manually remove it from the blacklist with the enablecomponent command. If the fault is cleared and the component is not blacklisted, the repair is verified well enough to boot the server. For added assurance, you can run the SunVTS diagnostic software.
For more information, see: To Run the showfaults Command on page 21; Managing System Components with Automatic System Recovery Commands on page 40; Exercising the System with SunVTS on page 43
15 | Contact Sun for support | The majority of hardware faults are detected by the server's diagnostics. In rare cases, it is possible that a problem requires additional troubleshooting. If you are unable to determine the cause of the problem, contact Sun for support.
For more information, see: Sun Support information at http://www.sun.com/service/contacting
Using LEDs to Identify the State of Devices
The Sun Fire T1000 server provides the following groups of LEDs:
- Front and rear panel LEDs (FIGURE 2-2, FIGURE 2-3, and TABLE 2-2)
- Power supply LEDs (FIGURE 2-3 and TABLE 2-3)
These LEDs provide a quick visual check of the state of the system.
FIGURE 2-2 Sun Fire T1000 Server Front Panel Ac
30. ALOM sc> prompt, type #. (Pound-Period).
- To switch from the sc> prompt to the console, type console.
Service-Related ALOM Commands
TABLE 2-4 describes the typical ALOM commands for servicing a Sun Fire T1000 server. For descriptions of all ALOM commands, issue the help command or refer to the Sun Fire T1000 Server Advanced Lights Out Management (ALOM) Guide.
TABLE 2-4 Service-Related ALOM Commands
ALOM Command | Description
help [command] | Displays a list of all ALOM commands with syntax and descriptions. Specifying a command name as an option displays help for that command.
clearfault UUID | Manually clears system faults. UUID is the unique fault ID of the fault to be cleared.
powercycle [-f] | Performs a poweroff followed by a poweron. The -f option forces an immediate poweroff; otherwise the command attempts a graceful shutdown.
poweroff [-y] [-f] | Removes the main power from the host server. The -y option enables you to skip the confirmation question. The -f option forces an immediate shutdown. CAUTION: Using the -y option to skip the confirmation question could enable you to inadvertently shut down the system.
removefru [-y] [FRU] |
reset [-y] [-c] |
resetsc [-y] |
setkeyswitch [normal | stby | diag | locked] |
setlocator [on | off] |
poweron [-y] [-c] [FRU] | Applies the main power to the host server or FRU. The -y option enables you to skip the confirmation question. The -c option instructs ALOM to
31. GURE 2-1 Diagnostic Flow Chart
TABLE 2-1 Diagnostic Flow Chart Actions
Action No. | Diagnostic Action | Resulting Action
1 | Check the power supply fault LED | The amber Fault LED indicates that the power cord is unplugged or the power supply is faulty. If the Fault LED is lit, go to Action 2.
2 | Check the power cord | Connect the power cord. If the Fault LED is still lit, replace the faulty power supply. If the green LEDs are lit, go to Action 3.
3 | Run the ALOM showfaults command | The showfaults command displays faults detected by the system firmware. If faults are displayed, go to Action 2. If no faults are displayed, go to Action 6.
4 | Check fault message for a Sun Message ID | Sun Message IDs (SUNW-MSG-ID) indicate that information is available from Sun's knowledge article database. If you have a message ID number, go to Action 5. If you do not have a message ID number, go to Action 10.
5 | Enter the Sun Message ID into the Sun Knowledge Article web site | Enter the Sun Message ID number into the knowledge article web site at http://www.sun.com/msg and go to Action 4.
6 | Analyze the suggested actions | In some cases, fault-related messages are identified with suggested actions. If the suggested action recommends replacing a FRU, go to Action 9
7 | Run the ALOM showenvironment command |
32. Refer to your application documentation for specific information on these processes.
4. Shut down the OS.
a. At the Solaris OS prompt, issue the uadmin command to halt the Solaris OS and to return to the ok prompt.
# uadmin 2 0
WARNING: proc_exit: init exited
syncing file systems... done
Program terminated
ok
This command is described in Solaris system administration documentation.
5. Switch from the system console prompt to the SC console prompt by issuing the #. (Pound-Period) escape sequence.
ok #.
sc>
b. Using the SC console, issue the poweroff command.
sc> poweroff -fy
SC Alert: SC Request to Power Off Host Immediately.
Note: You can also use the Power On/Off button on the front of the server to initiate a graceful system shutdown. Refer to the Sun Fire T1000 Server Administration Guide for more information about the ALOM poweroff command.
To Remove the Server From a Rack
If the server is installed in a rack with the extendable slide rails that were supplied with the server, use this procedure to remove the server chassis from the rack.
(Optional) Issue the following command from the ALOM SC prompt to locate the system that requires maintenance:
sc> setlocator on
Locator LED is on.
Once you have located the server, press the Locator button to turn it off. Check to see that no cables will be damaged
33. ademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. U.S. Government Rights - Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Copyright 2006 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, United States. All rights reserved. Sun Microsyst
34. an display includes system temperatures, hard drive status, power supply and fan status, and voltage and current sensors.
Note: You do not need user permissions to use this command.
At the sc> prompt, type the showenvironment command.
sc> showenvironment
System Temperatures (Temperatures in Celsius):
Sensor           Status  Temp  LowHard  LowSoft  LowWarn  HighWarn  HighSoft  HighHard
MB/T_AMB         OK      28    -10      -5       0        45        50        55
MB/CMP0/T_TCORE  OK      50    -10      -5       0        85        90        95
MB/CMP0/T_BCORE  OK      51    -10      -5       0        85        90        95
MB/IOB/T_CORE    OK      49    -10      -5       0        95        100       105
SYS/LOCATE  SYS/SERVICE  SYS/ACT
OFF         OFF          ON
Fans (Speeds in Revolutions Per Minute):
Sensor  Status  Speed  Warn  Low
FT0/F0  OK      6762   2240  1920
FT0/F1  OK      6762   2240  1920
FT0/F2  OK      6762   2240  1920
FT0/F3  OK      6653   2240  1920
Voltage sensors (in Volts):
Sensor         Status  Voltage  LowSoft  LowWarn  HighWarn  HighSoft
MB/V_VCORE     OK      1.30     1.20     1.24     1.36      1.39
MB/V_VMEM      OK      1.79     1.69     1.72     1.87      1.90
MB/V_VTT       OK      0.89     0.84     0.86     0.93      0.95
MB/V_+1V2      OK      1.18     1.09     1.11     1.28      1.30
MB/V_+1V5      OK      1.49     1.36     1.39     1.60      1.63
MB/V_+2V5      OK      2.51     2.27     2.32     2.67      2.72
MB/V_+3V3      OK      3.29     3.06     3.10     3.49      3.53
MB/V_+5V       OK      5.02     4.55     4.65     5.35      5.45
MB/V_+12V      OK      12.25    10.92    11.16    12.84     13.08
MB/V_+3V3STBY  OK      3.33     3.13     3.16     3.53      3.59
System Load (in amps):
Sensor      Status  Load    Warn    Shutdown
MB/I_VCORE  OK      20.560  80.000  88.000
MB/I_VMEM   OK      8.160   60.000  66.000
Current sensors:
Sensor Stat
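The threshold columns in the temperature table can be read as nested bands around the normal operating range. The following is a minimal Python sketch of that reading, not ALOM's actual logic: the `Thresholds` class, the `classify` function, the inclusive comparisons, and the negative low-side values are all assumptions made for illustration, loosely modeled on the ambient temperature row.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical container for one sensor's threshold columns."""
    low_hard: float
    low_soft: float
    low_warn: float
    high_warn: float
    high_soft: float
    high_hard: float


def classify(reading: float, t: Thresholds) -> str:
    """Return the most severe band the reading falls into.

    Assumption: a reading at or beyond a threshold counts as crossing it.
    The firmware's real comparison rules are not stated in the manual.
    """
    if reading <= t.low_hard or reading >= t.high_hard:
        return "hard"
    if reading <= t.low_soft or reading >= t.high_soft:
        return "soft"
    if reading <= t.low_warn or reading >= t.high_warn:
        return "warn"
    return "ok"


# Illustrative values modeled on the ambient sensor row (low side assumed negative).
AMBIENT = Thresholds(-10, -5, 0, 45, 50, 55)

if __name__ == "__main__":
    for temp in (28, 47, 52, 55):
        print(temp, classify(temp, AMBIENT))
```

A banded check like this mirrors the behavior described for the monitoring subsystem: a warning-level crossing is reported first, and only a persistent, more severe crossing leads to shutdown.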
35. and the server.
Use an Antistatic Mat
Place ESD-sensitive components such as the motherboard, memory, and other PCB cards on an antistatic mat.
Common Procedures for Parts Replacement
Before you can remove and replace parts that are inside the Sun Fire T1000 server, you must perform the following procedures:
- To Shut the System Down on page 53
- To Remove the Server From a Rack on page 55
- To Perform Electrostatic Discharge (ESD) Prevention Measures on page 56
- To Remove the Top Cover on page 57
The corresponding procedures that you perform when maintenance is complete are described in Common Procedures for Finishing Up on page 72.
Required Tools
The Sun Fire T1000 server can be serviced with the following tools:
- Antistatic wrist strap
- Antistatic mat
- No. 2 Phillips screwdriver
To Shut the System Down
Performing a graceful shutdown makes sure that all of your data is saved and the system is ready for restart.
1. Log in as superuser or equivalent. Depending on the nature of the problem, you might want to view the system status or the log files, or run diagnostics, before you shut down the system. Refer to the Sun Fire T1000 Server Administration Guide for log file information.
2. Notify affected users. Refer to your Solaris system administration documentation for additional information.
3. Save any open files and quit all running programs.
36. under test above
c:s>MSG = test-error-message
c:s>END_ERROR
where c = the core number and s = the strand number.
Warning and informational messages use the following syntax:
INFO or WARNING: message
The following is an example of a POST error message:
0:0>Data Bitwalk
0:0>L2 Scrub Data
0:0>L2 Enable
0:0>Testing Memory Channel 0 Rank 0 Stack 0
0:0>Testing Memory Channel 3 Rank 0 Stack 0
0:0>Testing Memory Channel 0 Rank 1 Stack 0
0:0>ERROR: TEST = Data Bitwalk
0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>Repair Instructions: Replace items in order listed by H/W under test above.
0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>END_ERROR
0:0>Testing Memory Channel 3 Rank 1 Stack 0
In this example, POST is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 (J0701).
b. Run the showfaults command to obtain additional fault information. The fault is captured by ALOM, where the fault is logged, the Service Required LED is lit, and the faulty component is disabled.
Example:
ok #.
sc> showfaults -v
ID  Time             FRU                Fault
1   APR 24 12:47:27  MB/CMP0/CH2/R0/D0  MB/CMP0/CH2/R0/D0 deemed faulty and disabled
In this example, MB/CMP0/CH2/R0/D0 (DIMM 0 at J0701) is disabled. Until the faulty component is replaced, the
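The error syntax above lends itself to simple log scraping. The following is a minimal Python sketch of collecting the fields of each POST error block from captured console output. It assumes the `core:strand>` prefix, the `ERROR: TEST = ...` opening line, and the `key = value` field layout, none of which is fully legible in this copy of the manual; `parse_post_errors` and `SAMPLE` are hypothetical names, and the sample text is illustrative rather than captured output.

```python
import re

# Assumed line shape: "<core>:<strand>><text>", for example "0:0>END_ERROR".
PREFIX = re.compile(r"^(\d+):(\d+)>(.*)$")


def parse_post_errors(lines):
    """Collect one dict per ERROR ... END_ERROR block found in POST output."""
    errors = []
    current = None
    for raw in lines:
        match = PREFIX.match(raw.strip())
        if not match:
            continue  # skip SC alerts and anything without a core:strand prefix
        core, strand = int(match.group(1)), int(match.group(2))
        body = match.group(3).strip()
        if body.startswith("ERROR:"):
            test = body.split("=", 1)[1].strip() if "=" in body else ""
            current = {"core": core, "strand": strand, "test": test}
        elif body == "END_ERROR":
            if current is not None:
                errors.append(current)
            current = None
        elif current is not None and "=" in body:
            key, value = body.split("=", 1)
            current[key.strip()] = value.strip()
    return errors


# Illustrative input only; punctuation is an assumption about the real output.
SAMPLE = """\
0:0>Testing Memory Channel 0 Rank 1 Stack 0
0:0>ERROR: TEST = Data Bitwalk
0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>Repair Instructions: Replace items in order listed by H/W under test above.
0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>END_ERROR
""".splitlines()

if __name__ == "__main__":
    for error in parse_post_errors(SAMPLE):
        print(error["test"], "->", error["H/W under test"])
```

Keying each block on its H/W under test field gives the FRU name to compare against the showfaults output.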
37. as the following:
sc> showfaults -v
ID  Time             FRU                Fault
0   SEP 09 11:09:26  MB/CMP0/CH0/R0/D0  Host detected fault, MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
Run the showfaults -v command to obtain the UUID to clear the fault:
sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
Clearing fault from all indicted FRUs...
Fault cleared.
- If the fault resulted in the DIMM being disabled, such as the following:
sc> showfaults -v
ID  Time             FRU                Fault
1   OCT 13 12:47:27  MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled
Run the enablecomponent command to enable the FRU:
sc> enablecomponent
7. Perform the following steps to verify that there are no faults:
a. Set the virtual keyswitch to diag mode so that POST will run in service mode.
sc> setkeyswitch diag
b. Issue the poweron command.
sc> poweron
c. Switch to the system console to view POST output.
sc> console
Watch the POST output for possible fault messages. The following output is an indication that POST did not detect any faults:
0:0>POST Passed all devices.
0:0>
0:0>DEMON: (Diagnostics Engineering MONitor)
0:0>Select one of the following functions
0:0>POST: Return to OBP.
0:0>INFO:
0:0>POST Passed all devices.
0:0>Master set ACK for vbsc runpost command and spin
38. breaking information about the system, including required software patches, updated hardware and compatibility information, and solutions to known issues. The product notes are available online at: http://www.sun.com/documentation
Release Notes
The Solaris OS Release Notes contain important information about the Solaris operating system. The release notes are available online at: http://www.sun.com/documentation
SunSolve Online
Provides a collection of support resources. Depending on the level of your service contract, you have access to Sun patches, the Sun System Handbook, the SunSolve knowledge base, the Sun Support Forum, and additional documents, bulletins, and related links. Access this site at: http://sunsolve.sun.com
Predictive Self-Healing Knowledge Database
You can access the knowledge article corresponding to a self-healing message by taking the Sun Message Identifier (SUNW-MSG-ID) and entering it into the field on this page: http://www.sun.com/msg
CHAPTER 2 Sun Fire T1000 Server Diagnostics
This chapter describes the diagnostics that are available for monitoring and troubleshooting the Sun Fire T1000 server. This chapter does not provide detailed troubleshooting procedures, but instead describes the Sun Fire T1000 server diagnostics facilities and how to use them. This chapter is intended for technicians
39. e 27
To Remove the Clock Battery on the Motherboard
1. Perform the procedures described in Common Procedures for Parts Replacement on page 53.
2. Using a small flat-head screwdriver, carefully pry the battery (FIGURE 3-12) from the motherboard.
FIGURE 3-12 Removing the Clock Battery from the Motherboard
To Replace the Clock Battery on the Motherboard
1. Unpack the replacement battery.
2. Press the new battery into the motherboard (FIGURE 3-13) with the positive (+) side facing upward.
FIGURE 3-13 Replacing the Clock Battery on the Motherboard
3. Perform the procedures described in Common Procedures for Finishing Up on page 72.
4. Use the ALOM setdate command to set the day and time. Use the setdate command before you power on the host system. For details about this command, refer to the Sun Fire T1000 Server Advanced Lights Out Management (ALOM) Guide.
Common Procedures for Finishing Up
To Replace the Top Cover
1. Place the top cover on the chassis. Set the cover down so that it hangs over the rear of the server by about an inch (2.5 cm).
2. Slide the cover forward until it latches into place.
To Reinstall the Server Chassis in the Rack
Refer to the Sun Fire T1000 System Installation Manual for installation instructions. After you have reins
40. e and for troubleshooting, as described in the following sections.
Routine Sanity Check of the Hardware
POST tests critical hardware components to verify functionality before the system boots and accesses software. If POST detects an error, the faulty component is disabled automatically, preventing faulty hardware from impacting system operation. Under normal operating conditions, the server is usually configured to run POST in maximum mode for all power-on or error-generated resets. This enables the system to initialize quickly and still have hardware checkups to ensure a healthy system.
Diagnosing the System Hardware
You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in diagnostic service mode for maximum test coverage and verbose output.
To Run POST
This procedure describes how to run POST when you want maximum testing, as in the case when you are troubleshooting a system.
1. Switch from the system console prompt to the SC console prompt by issuing the #. escape sequence, and type the command setsc diag_mode normal.
ok #.
sc> setsc diag_mode normal
2. Set the virtual keyswitch to diag so that POST will run in service mode.
sc> setkeyswitch diag
3. Reset the system so that POST runs. The following example uses the powercycle command. For other methods, refer to the Sun Fire T1000 Server Administrat
41. ems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the United States and in other countries. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and in other countries. All SPARC trademarks
42. erver Installation Guide and Sun Fire T1000 Server Administration Guide.
Use this flow chart to understand what diagnostics are available to troubleshoot faulty hardware, and use TABLE 2-1 to find more information about each diagnostic in this chapter. For many faults, service can be deferred, either because the faulty component has been configured out by ASR, the fault is being corrected, or the fault is predictive.
Flow chart (image not reproduced): starting from "Suspect faulty hardware", the decision steps cover the power supply fault LED, the power cord and power supply, the showfaults command, fault message IDs and the Sun Knowledge Article web site, the Solaris logs, over-temperature conditions reported by the showenvironment command, POST, SunVTS, and contacting Sun for support. Numbers in the flow chart correspond to the Action numbers in TABLE 2-1.
FI
43. eters and POST Modes
Parameter | Normal Diagnostic Mode (default settings) | No POST Execution | Diagnostic Service Mode | Keyswitch Diagnostic Preset Values
diag_mode | normal | off | service | normal
setkeyswitch | normal | normal | normal | diag
diag_level | max | n/a | max | max
diag_trigger | power-on-reset, error-reset | none | all-resets | all-resets
diag_verbosity | normal | n/a | max | max
Description of POST execution | This is the default POST configuration, and provides a reasonable compromise between testing thoroughness and quick server initialization. | POST does not run, resulting in quick system initialization, but this is not a suggested configuration. | POST runs the full spectrum of tests with the maximum output displayed. | POST runs the full spectrum of tests with the maximum output displayed.
Note: The setkeyswitch parameter, when set to diag, overrides all the other ALOM POST variables.
To Change POST Parameters
1. Access the ALOM sc> prompt. At the console, issue the #. key sequence.
2. At the ALOM sc> prompt, use the setsc command to set the POST parameter.
Example:
sc> setsc diag_mode service
The setkeyswitch parameter is a command that sets the virtual keyswitch, so it does not use the setsc command.
Example:
sc> setkeyswitch diag
Reasons to Run POST
You can use POST for basic sanity checking of the server hardwar
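The override rule for the virtual keyswitch can be sketched as a small lookup. The following Python is illustrative only: it models just the four columns of the table, the function and dictionary names are hypothetical, and the value strings (for example `all-resets`) are assumptions about how ALOM spells these settings.

```python
from typing import Dict, Optional

# Settings per diag_mode column (value spellings are assumptions).
POST_MODES: Dict[str, Dict[str, str]] = {
    "normal": {
        "diag_level": "max",
        "diag_trigger": "power-on-reset error-reset",
        "diag_verbosity": "normal",
    },
    "service": {
        "diag_level": "max",
        "diag_trigger": "all-resets",
        "diag_verbosity": "max",
    },
}

# Preset applied when the virtual keyswitch is set to diag.
KEYSWITCH_DIAG_PRESET: Dict[str, str] = {
    "diag_level": "max",
    "diag_trigger": "all-resets",
    "diag_verbosity": "max",
}


def effective_post_settings(keyswitch: str, diag_mode: str) -> Optional[Dict[str, str]]:
    """Return the settings POST would run with, or None if POST does not run."""
    if keyswitch == "diag":
        # The keyswitch overrides the other POST variables, including diag_mode off.
        return dict(KEYSWITCH_DIAG_PRESET)
    if diag_mode == "off":
        return None
    return dict(POST_MODES[diag_mode])


if __name__ == "__main__":
    print(effective_post_settings("normal", "normal"))
    print(effective_post_settings("diag", "off"))
```

The point of the sketch is the ordering of the checks: the keyswitch is consulted first, so a diag keyswitch produces full testing even when diag_mode would otherwise suppress POST.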
44. ey actually occur. This technology is incorporated into both the hardware and software of the Sun Fire T1000 server. At the heart of the predictive self-healing capabilities is the Solaris Fault Manager, a new service that receives data relating to hardware and software errors, and automatically and silently diagnoses the underlying problem. Once a problem is diagnosed, a set of agents automatically responds by logging the event and, if necessary, takes the faulty component offline. By automatically diagnosing problems, business-critical applications and essential system services can continue uninterrupted in the event of software failures or major hardware component failures.
Chassis Identification
FIGURE 1-3 and FIGURE 1-4 show the physical characteristics of the Sun Fire T1000 server.
FIGURE 1-3 Sun Fire T1000 Server Front Panel (callouts: Locator LED and button, Power OK LED, Service Required LED, Power On/Off button)
FIGURE 1-4 Sun Fire T1000 Server Rear Panel (callouts: Ethernet ports, PCI-E slot, power supply, power supply LEDs, Locator LED and button, Service Required LED, Power OK LED, DB-9 serial port, system console ports)
Additional Service-Related Information
In addition to this document, the following resources are available to help you keep your server running optimally.
Product Notes
The Sun Fire T1000 Server Product Notes (819-3244) contain late
45. ine at http://www.sun.com/documentation
Safety Symbols
The following symbols might appear in this document. Note their meanings.
Caution: There is a risk of personal injury and equipment damage. To avoid personal injury and equipment damage, follow the instructions.
Caution: Hot surface. Avoid contact. Surfaces are hot and might cause personal injury if touched.
Caution: Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.
Electrostatic Discharge Safety
Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards, hard drives, and memory cards, require special handling.
Caution: The boards and hard drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy components. Do not touch the components along their connector edges.
Use an Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components such as drive assemblies, boards, or cards. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Do this after you disconnect the power cords from the server. Following this practice equalizes the electrical potentials between you
46. ion Guide.
sc> powercycle
Are you sure you want to powercycle the system [y/n]? y
Powering host off at MON JAN 10 02:52:02 2000
Waiting for host to Power Off; hit any key to abort.
SC Alert: SC Request to Power Off Host.
SC Alert: Host system has shut down.
Powering host on at MON JAN 10 02:52:13 2000
SC Alert: SC Request to Power On Host.
4. Switch to the system console to view the POST output.
sc> console
Example of POST output:
SC Alert: Host system has reset1
Note: Some output omitted.
0:0>
0:0>ERIE Integrated POST 4 x 0 build 17 2005 08 30 11 25 export common source firmware re ontario fireball fio build 17 post Niagara erie integrated firmware re
0:0>Copyright 2005 Sun Microsystems, Inc. All rights reserved
SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
0:0>VBSC selecting POST IO Testing.
0:0>VBSC enabling threads: 1
0:0>VBSC setting verbosity level 3
0:0>Start Selftest.....
0:0>Init CPU
0:0>Master CPU Tests Basic.....
0:0>CPU 0
Note: Some output omitted.
0:0>
0:0>Test 6291456 bytes at 00000001.00000000 Memory Channel [0 3] Rank 0 Stack 1
0:0>IO-Bridge unit 1 ilu init test
0:0>IO-Bridge unit 1 tlu init test
0:0>IO-Bridge unit 1 lpu init test
0:0>IO-Bridge unit 1 link train port B
0:0>
47. issipating less heat than conventional processor designs. Depending on the model purchased, the processor has six or eight UltraSPARC cores. Each core equates to a 64-bit execution pipeline capable of running four threads. The result is that the 8-core processor handles up to 32 active threads concurrently. Additional processor components, such as the DDR2 memory controllers, L1 cache, L2 cache, and the JBus I/O interface, have been carefully tuned for optimal performance. FIGURE 1-2 shows the major components in the Sun Fire T1000 server.
FIGURE 1-2 Sun Fire T1000 Server Components (callouts: PCI-E socket and slot, motherboard and chassis assembly, UltraSPARC T1 multicore processor assembly)
Performance Enhancements
The Sun Fire T1000 server introduces several new technologies with its sun4v architecture and multicore, multithreaded UltraSPARC T1 processor.
TABLE 1-1 lists feature specifications for the Sun Fire T1000 server.
TABLE 1-1 Sun Fire T1000 System Features
Feature | Description
Features listed: Processor, Memory, Ethernet ports, DB-9 serial port, Internal hard disk drive, Cooling, PCI interface, Power, Firmware, Operating system, Other software
Processor | 1 UltraSPARC T1 multicore processor (6 or 8 cores)
Memory | 8 slots that can be populated with one of the following types of DDR-2 DIMMs: 512 MB (4 GB maximum), 1 GB (8 GB maximum), 2 GB (16 GB maximum)
Ethernet ports | 4 ports, 1
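The thread and memory figures quoted above follow from simple multiplication. The following short Python sketch reproduces that arithmetic; the constant and function names are hypothetical, chosen only for this illustration.

```python
THREADS_PER_CORE = 4  # each core runs four threads, per the overview text
DIMM_SLOTS = 8        # memory slots listed in the features table


def max_threads(cores: int) -> int:
    """Concurrent hardware threads for a processor with the given core count."""
    return cores * THREADS_PER_CORE


def max_memory_gb(dimm_size_gb: float) -> float:
    """Total memory when all slots hold DIMMs of one size."""
    return DIMM_SLOTS * dimm_size_gb


if __name__ == "__main__":
    print(max_threads(8))      # 8-core model
    print(max_threads(6))      # 6-core model
    print(max_memory_gb(0.5))  # 512 MB DIMMs
    print(max_memory_gb(2))    # 2 GB DIMMs
```

The results match the stated limits: 32 threads on the 8-core processor, and 4 GB, 8 GB, or 16 GB of memory depending on the DIMM size installed.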
48. le on or through such sites or resources.
Contacting Sun Technical Support
If you have technical questions about this product that are not answered in this document, go to: http://www.sun.com/service/contacting
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to: http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback: Sun Fire T1000 Server Service Manual, part number 819-3248-10
CHAPTER 1 Sun Fire T1000 Server Overview
This chapter provides an overview of the features of the Sun Fire T1000 server. The following topics are covered:
- Sun Fire T1000 Server Features on page 1
- Chassis Identification on page 6
Sun Fire T1000 Server Features
The Sun Fire T1000 server (FIGURE 1-1) is a high-performance, entry-level server that is highly scalable and very reliable.
FIGURE 1-1 Sun Fire T1000 Server
Chip Multithreaded (CMT) Multicore Processor and Memory Technology
The UltraSPARC T1 multicore processor is the basis of the Sun Fire T1000 server. The UltraSPARC T1 processor is based on chip multithreading (CMT) technology that is optimized for highly threaded transactional processing. The UltraSPARC T1 processor improves throughput while using less power and d
49. mmands

The Automatic System Recovery (ASR) feature enables the server to automatically configure failed components out of operation until they can be replaced. In the Sun Fire T1000 server, the following components are managed by the ASR feature:
- UltraSPARC T1 processor strands
- Memory DIMMs
- I/O bus

The database that contains the list of disabled components is called the ASR blacklist (asr-db).

In most cases, POST and ALOM automatically manage the disabling of faulty components. When the faulty FRU is replaced, it must be manually enabled.

Example: A component appears faulty and is automatically disabled. The problem is due to a loose connector, and no FRU replacement is required to fix the problem. ALOM, which would normally detect a FRU replacement and enable the FRU, does not do so. In this case, after the loose cable is reseated, the disabled component must be manually enabled.

The ASR commands (TABLE 2-7) enable you to view and manually add or remove components from the ASR blacklist. These commands are run from the ALOM sc> prompt.

TABLE 2-7 ASR Commands
Command: Description
showcomponent: Displays system components and their current state.
enablecomponent asrkey: Removes a component from the asr-db blacklist, where asrkey is the component to enable.
disablecomponent asrkey: Adds a component to the asr-db blacklist, where asrkey is the component to disable.
clea
50. n xhost + test-system
where test-system is the name of the Sun Fire T1000 server you plan to test.

Remotely log in to the Sun Fire T1000 server as superuser. Use a command such as rlogin or telnet.

4. Start SunVTS software. Type:
/opt/SUNWvts/bin/sunvts -display display-system:0
where display-system is the name of the machine through which you are remotely logged in to the Sun Fire T1000 server.

If you have installed SunVTS software in a location other than the default /opt directory, alter the path in this command accordingly.

The SunVTS GUI appears on the display system's screen.

FIGURE 2-6 The SunVTS GUI Screen (test categories shown: Processor(s), Memory, Cryptography, SCSI-Devices (mpt0), OtherDevices, Network, USB-Devices)

5. Expand the test lists to see the individual tests. The test selection area lists tests in categories, such as Network, as shown in FIGURE 2-7. To expand a category, left-click the expand-category icon, which looks like a plus sign and appears to the left of the category name.

[Figure: test categories Processor(s), Memory, Cryptography, SCSI-Devices (mpt0), OtherDevices, and Network, with Network expanded to show bge0 (nettest) and bge1 (netlbtest)]
51. nents offline if needed and blacklist them in the asr-db.
- Solaris OS predictive self-healing (PSH): Continuously monitors the health of the CPU and memory, and works with ALOM to take a faulty component offline if needed.
- Log files and console messages: Provide the standard Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice.
- SunVTS: An application you can run that exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair.

The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages are integrated. For example, when the Solaris PSH software detects a fault, it displays the fault, logs it, and passes information to ALOM, where the fault is logged and, depending on the fault, might result in the illumination of one or more LEDs.

The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for using the server's diagnostics that is likely to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might not follow this flow step by step.

The flowchart assumes that you have already performed some rudimentary troubleshooting, such as verifying proper installation, visually inspecting cables and power, and possibly resetting the server. For details, refer to the Sun Fire T1000 S
52. nfigure this Ethernet test.

Start testing. Click the Start button that is located at the top left of the SunVTS window. Status and error messages appear in the test messages area located across the bottom of the window. You can stop testing at any time by clicking the Stop button.

During testing, SunVTS software logs all status and error messages. To view these, click the Log button or select Log Files from the Reports menu. This opens a log window from which you can choose to view the following logs:
- Information: Detailed versions of all the status and error messages that appear in the test messages area.
- Test Error: Detailed error messages from individual tests.
- VTS Kernel Error: Error messages pertaining to SunVTS software itself. You should look here if SunVTS software appears to be acting strangely, especially when it starts up.
- UNIX Messages (/var/adm/messages): A file containing messages generated by the operating system and various applications.
- Log Files (/var/opt/SUNWvts/logs): A directory containing the log files.

For further information, refer to the documents that accompany the SunVTS software.

CHAPTER 3 Removing and Replacing FRUs

This chapter describes how to remove and replace field-replaceable units (FRUs) in the Sun Fire T1000 server. The following topics are covered: Safety
53. nnection to a terminal or terminal server. ALOM provides a command-line interface that you can use to remotely administer geographically distributed or physically inaccessible machines. In addition, ALOM enables you to run diagnostics, such as POST, remotely that would otherwise require physical proximity to the server's serial port.

You can configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM. The ALOM circuitry runs independently of the server, using the server's standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off.

ALOM monitors the following Sun Fire T1000 server components:
- Hard disk drive status
- Enclosure thermal conditions
- Power supply status
- Voltage levels
- Faults detected by POST (power-on self-test)
- Solaris OS Predictive Self-Healing (PSH) diagnostic facilities

For information about configuring and using the ALOM system controller, refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide.

System Reliability, Availability, and Serviceability

Reliability, availability, and serviceability (RAS) are aspects of a system's design that affect its ability to operate continuously and to minimize the time necessary to service the system. Reliability refers to a system's ability to operate continuously without failures and to maintain data in
54. ons that are not covered here.

Insert the PCI Express card into the connector slot and retention bracket (FIGURE 3-5) on the PCI Express riser board.
On the rear of the chassis, engage the retention latch (FIGURE 3-4) to secure the card to the chassis.
Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
Run the Solaris prtdiag command to verify that the PCI Express card is being recognized by the system.

To Remove the Fan Tray Assembly

Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.
Disconnect the fan power cable from the motherboard.
Release the tabs (FIGURE 3-6) on both sides of the fan assembly.
Remove the fan assembly from the sheet metal mounting brackets.

FIGURE 3-6 Removing the Fan Tray Assembly

To Replace the Fan Tray Assembly

Unpackage the replacement fan tray assembly and place it on an antistatic mat.
Align the fan tray assembly with the sheet metal mounting brackets and slide it into place until the tabs on each side lock it into place.
Reconnect the fan power cable to the motherboard.
Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
Verify that the Service required and Locator LEDs are not lit.

To Remove the Power Supply

Perform the procedures de
55. rasrdb: Removes all entries from the asr-db blacklist.

The showcomponent command may not report all blacklisted DIMMs.

Note: The components (asrkeys) vary from system to system, depending on how many cores and how much memory are present. Use the showcomponent command to see the asrkeys on a specific system.

Note: A reset or powercycle is required after disabling or enabling a component. If component status is changed with power on, there is no effect on the system until the next reset or powercycle.

The following examples show the output of these commands.

To Run the showcomponent Command

The showcomponent command displays the system components (asrkeys) and reports their status.

1. At the sc> prompt, enter the showcomponent command.

Example with no disabled components:
sc> showcomponent
Keys:
ASR state: clean

Example showing a disabled component:
sc> showcomponent
Keys:
ASR state: Disabled Devices
MB/CMP0/CH3/R1/D1 : dimm8 deemed faulty

To Run the disablecomponent Command

The disablecomponent command disables a component by adding it to the ASR blacklist.

1. At the sc> prompt, enter the disablecomponent command.
sc> disablecomponent MB/CMP0/CH3/R1/D1
sc>
SC Alert: MB/CMP0/CH3/R1/D1 disabled

2. After receiving confirmation that the disablecomponent command is complete, reset the server so that the ASR
56. rform Electrostatic Discharge (ESD) Prevention Measures on page 56.

Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.
Locate the DIMM (FIGURE 4-8) that you want to replace. Use FIGURE 3-11 and TABLE 3-1 to identify the DIMM you want to remove.
Make note of the DIMM location so you can install the replacement DIMM in the same socket.
Push down on the ejector levers on each side of the DIMM until the DIMM is released.

FIGURE 3-11 DIMM Locations (front-to-back view of the motherboard: sockets J0501, J0701, J0601, and J0801 on Channel 0; sockets J1301, J1101, J1201, and J1001 on Channel 3; each channel holds Rank 0 and Rank 1, with DIMM 0 and DIMM 1 in each rank)

TABLE 3-1 maps the DIMM names that are displayed in faults to the socket numbers that identify the location of the DIMM on the motherboard.

TABLE 3-1 DIMM Names and Socket Numbers
Socket Number: DIMM Name Used in Messages
J0501: CH0/R0/D0
J0601: CH0/R0/D1
J0701: CH0/R1/D0
J0801: CH0/R1/D1
J1001: CH3/R0/D0
J1101: CH3/R0/D1
J1201: CH3/R1/D0
J1301: CH3/R1/D1

DIMM names in messages are displayed with the full name, such as MB/CMP0
57. rt No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03e620

FRU_PROM at MB/CMP0/CH3/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d040920

FRU_PROM at MB/CMP0/CH3/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03ec27

FRU_PROM at MB/CMP0/CH3/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d040924
sc>

If you do not provide a command-line argument, all FRUs are listed.

Running POST

Power-on self-test (POST) is a group of PROM-based tests that run when the server is powered on or reset. POST checks the basic integrity of the critical hardware components in the server (motherboard, memory, and I/O buses).

If POST detects a faulty component, the component is disabled automatically. If the system is capable of running without the disabled component, the system will boot when POST is complete
58. s
/ManR/Initial HW Dash Level: 02
/ManR/Initial HW Rev Level: 01
/ManR/Shortname: PS
/SpecPartNo: 885-0407-02

FRU_PROM at MB/CMP0/CH0/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03fe27

FRU_PROM at MB/CMP0/CH0/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03 623

FRU_PROM at MB/CMP0/CH0/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03fc26

FRU_PROM at MB/CMP0/CH0/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3 7A
/SPD/Vendor Serial No: d03eb26

FRU_PROM at MB/CMP0/CH3/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Pa
59. s below limits.

Using ALOM For Diagnosis and Repair Verification

The Sun Advanced Lights Out Manager (ALOM) is a system controller on the Sun Fire T1000 server motherboard that enables you to remotely manage and administer your server.

ALOM enables you to run diagnostics remotely, such as power-on self-test (POST), that would otherwise require physical proximity to the server's serial port. You can also configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM.

The ALOM circuitry runs independently of the server, using the server's standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off.

Note: For comprehensive ALOM information, refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide.

Faults detected by ALOM, POST, and the Solaris Predictive Self-Healing (PSH) technology are forwarded to ALOM for fault handling (FIGURE 2-4).

In the event of a system fault, ALOM ensures that the Service required LED is lit, FRU ID PROMs are updated, the fault is logged, and alerts are displayed.

[Figure: Environmentals, POST, and Solaris PSH feed the ALOM fault manager, which drives the Service Required LED, FRU LEDs, FRUID PROMs, logs, and alerts]
FIGURE 2-4 ALOM Fault Managemen
60. Sun Microsystems

Sun Fire T1000 Server Service Manual

Sun Microsystems, Inc.
www.sun.com

Part No. 819-3248-10
January 2006, Revision A

Submit comments about this document at: http://www.sun.com/hwdocs/feedback

Copyright 2006 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.

This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.

Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are tr
61. scribed in "Common Procedures for Parts Replacement" on page 53.
Disconnect the power cable from the motherboard and pull it through the midwall.
Loosen the fastener (FIGURE 3-7) on the front of the power supply and slide the power supply forward to remove it from the chassis.

FIGURE 3-7 Removing the Power Supply

To Replace the Power Supply

1. Unpackage the replacement power supply.
2. Slide the power supply into the chassis and engage the two alignment pins in the rear of the chassis that mate with the power supply.
3. Tighten the fastener (FIGURE 3-8) to lock the power supply into place in the chassis.
4. Redress the power cable through the midwall in the chassis and connect the cable to the motherboard.
5. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
6. Verify that the amber Fault LED on the replaced power supply and the Service required LED are not lit.
7. At the sc> prompt, issue the showenvironment command to verify the status of the power supply.

FIGURE 3-8 Replacing the Power Supply (callouts: fastener, power supply)

To Remove the Hard Drive

1. Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.
2. Disconnect the cable from the hard drive.
3. Unsnap the catches on the latches (FIGURE 3-9) on the front of the disk
62. ssembly" on page 60

Item 4: Power supply unit (PS)
  Replacement instructions: "To Remove the Power Supply" on page 61
  Description: The power supply provides 3.3 Vdc standby power at 3.3 Amps and 12 Vdc at 25 Amps.
  Location: PS0

Item 5: Hard drive
  Replacement instructions: "To Remove the Hard Drive" on page 63
  Description: SATA disk drive, 3.5-inch form factor
  Location: HD0

Item 6: PCI Express card
  Replacement instructions: "To Remove the Optional PCI Express Card" on page 58
  Description: Optional add-on express card
  Location: PCI0 slot

Item 7: Clock battery
  Replacement instructions: "To Remove the Clock Battery on the Motherboard" on page 70
  Description: Battery is located on the motherboard.
  Location: SC/BAT

Item 8: SEEPROM
  Replacement instructions: Remove and replace the socketed SEEPROM
  Description: The socketed SEEPROM contains the MAC address and system configuration information.
  Location: MB/SEEPROM

TABLE A-2 Location of DIMMs
Connector Number: Location
J0501: MB/CMP0/CH0/R0/D0
J0601: MB/CMP0/CH0/R0/D1
J0701: MB/CMP0/CH0/R1/D0
J0801: MB/CMP0/CH0/R1/D1
J1001: MB/CMP0/CH3/R0/D0
J1101: MB/CMP0/CH3/R0/D1
J1201: MB/CMP0/CH3/R1/D0
J1301: MB/CMP0/CH3/R1/D1
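The connector-to-location mapping in TABLE A-2 lends itself to a small lookup helper, for example when translating a DIMM name from a fault message into the socket to open on the motherboard. The Python sketch below is illustrative only: the dictionary is transcribed from the table (using socket J0801 for CH0/R1/D1), and the function names socket_for and parse_location are hypothetical, not part of any Sun tool.

```python
# Illustrative sketch: socket-to-location map transcribed from TABLE A-2.
# The helper names below are hypothetical and exist only for this example.
DIMM_SOCKETS = {
    "J0501": "MB/CMP0/CH0/R0/D0",
    "J0601": "MB/CMP0/CH0/R0/D1",
    "J0701": "MB/CMP0/CH0/R1/D0",
    "J0801": "MB/CMP0/CH0/R1/D1",
    "J1001": "MB/CMP0/CH3/R0/D0",
    "J1101": "MB/CMP0/CH3/R0/D1",
    "J1201": "MB/CMP0/CH3/R1/D0",
    "J1301": "MB/CMP0/CH3/R1/D1",
}


def socket_for(location):
    """Return the socket label for a full DIMM location, or None if unknown."""
    wanted = location.strip().upper()
    for socket, loc in DIMM_SOCKETS.items():
        if loc == wanted:
            return socket
    return None


def parse_location(location):
    """Split a name such as MB/CMP0/CH3/R1/D1 into channel, rank, and DIMM numbers."""
    parts = location.strip().upper().split("/")
    if len(parts) != 5 or parts[0] != "MB" or parts[1] != "CMP0":
        raise ValueError("unexpected DIMM location: %r" % location)
    channel, rank, dimm = parts[2], parts[3], parts[4]
    if not (channel.startswith("CH") and rank.startswith("R") and dimm.startswith("D")):
        raise ValueError("unexpected DIMM location: %r" % location)
    return {
        "channel": int(channel[2:]),
        "rank": int(rank[1:]),
        "dimm": int(dimm[1:]),
    }


if __name__ == "__main__":
    print(socket_for("MB/CMP0/CH0/R1/D0"))
    print(parse_location("MB/CMP0/CH3/R1/D1"))
```

A reverse scan over eight entries is cheap, so the sketch keeps a single dictionary instead of maintaining two mappings that could drift apart.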
63. stem Administration Guide
Advanced Lights Out Management (ALOM) CMT v1.1 Guide

Descriptions listed for the related documents:
- Site planning information for the Sun Fire T1000 server
- Late-breaking information about the server. The latest notes are posted at: http://www.sun.com/documentation
- Provides an overview of the features of this server
- Information about where to find documentation to get your system installed and running quickly
- Detailed rack mounting, cabling, power-on, and configuration information
- How to perform administrative tasks that are specific to the Sun Fire T1000 server
- How to use the Advanced Lights Out Manager (ALOM) software on the Sun Fire T1000 server

Part numbers listed for the related documents: 819-3246, 819-3244, 819-3247, 819-3249, 819-3248, 819-3250, 819-3246

Accessing Sun Documentation

You can view, print, or purchase a broad selection of Sun documentation, including localized versions, at:
http://www.sun.com/documentation

Third-Party Web Sites

Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are availab
64. stem is running at a minimum level in standby and is ready to be quickly returned to full function. The service processor is running.
  Slow blink: Indicates that a normal transitory activity is taking place. This could indicate that the system diagnostics are running or that the system is booting.
  The Power On/Off button turns the server on and off. There is no Power On/Off button on the rear panel.

TABLE 2-2 Front and Rear Panel LEDs (continued)
LED (Color): Description
Ethernet Activity LEDs (Green): These LEDs indicate that there is activity on the associated net(s).
Ethernet Link LEDs (Yellow): These LEDs indicate that the system is linked to the associated net(s).
System console Activity LED (Green): This LED indicates that there is activity on the associated system console.
System console Link LED (Yellow): These LEDs indicate that the system is linked to the associated system console.
Provided on the front and rear panel.

Power Supply LEDs

The power supply LEDs (TABLE 2-3) are located on the back of the power supply.

TABLE 2-3 Power Supply LEDs
Name (Color): Description
Fault (Amber): On: Power supply has detected a failure. Off: Normal operation.
DC OK (Green): On: Normal operation. DC output voltage is within normal limits. Off: Power is off.
AC OK (Green): On: Normal operation. Input power is within normal limits. Off: No input voltage or input voltage i
65. system can boot using memory that was not disabled.

Note: You can use ASR commands to display and control disabled components. See "Managing System Components with Automatic System Recovery Commands" on page 40.

Using the Solaris Predictive Self-Healing Feature

The Solaris OS predictive self-healing (PSH) technology enables the Sun Fire T1000 server to diagnose problems while the Solaris OS is running and to mitigate many serious problems before they occur.

The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time and runs in the background to monitor the system. If a component generates an error, the daemon handles the error by correlating it with data from previous errors and other related information to diagnose the problem. Once the problem is diagnosed, the fault manager daemon assigns it a universally unique identifier (UUID) that distinguishes the problem across any set of systems. When possible, the fault manager daemon initiates steps to self-heal the failed component and take the component offline. The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from Sun's knowledge article database.

The predictive self-healing technology covers the following Sun Fire T1000 server components:
- UltraSPARC T1 multicore processor
- Memory
- I/O bus
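The UUID and message ID are the two values you carry forward when following up on a fault. As a rough illustration, the Python sketch below pulls both identifiers out of one line of fault text. The MSGID: and UUID: labels, the sample line, and the function name extract_fault_ids are assumptions made for this example (loosely modeled on the showfaults example in this manual), not a documented output format.

```python
import re

# Assumed label formats; real console output may differ.
MSGID_RE = re.compile(r"MSGID:?\s*([A-Z0-9]+(?:-[A-Z0-9]+)+)")
UUID_RE = re.compile(
    r"UUID:?\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_fault_ids(line):
    """Return (msgid, uuid) found in a line of fault text; None for any that is missing."""
    msgid = MSGID_RE.search(line)
    uuid = UUID_RE.search(line)
    return (
        msgid.group(1) if msgid else None,
        uuid.group(1) if uuid else None,
    )


if __name__ == "__main__":
    # Hypothetical sample line, written for this sketch.
    sample = (
        "Host detected fault, MSGID: SUN4U-8000-2S "
        "UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc"
    )
    print(extract_fault_ids(sample))
```

Returning None for a missing identifier, instead of raising, lets a caller scan a whole log and keep only the lines that carry both values.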
66. systems, it is imperative that the System Controller be checked for evidence of a faulty system board to ensure that the appropriate service action is performed.

Use the fmdump(1M) command, fmdump -vu <event-id>, to view the results of diagnosis and the specific Field Replaceable Unit (FRU) identified for repair. The event-id can be found in the EVENT-ID field of the message. For example:
EVENT-ID: 39b30371 009 c76c 90ee b245784d2277

Details: The Message ID SUN4U-8000-2S indicates that the Solaris Fault Manager has received reports that multiple correctable single-bit errors associated with a memory DIMM module have been detected. Diagnosis applied to the error reports has determined that a fault requiring repair action is present.

A service case should be opened and time scheduled to replace the FRU identified in the fmdump(1M) output on which the suspect DIMM is located. If Customer Enabled Services apply to the product, then refer to the FRU replacement procedures in the appropriate service manual.

c. Follow the suggested actions to repair the fault.

Collecting Information From Solaris OS Files and Commands

With the Solaris OS running on the Sun Fire T1000 server, you have the full complement of Solaris OS files and commands available for collecting information and for troubleshooting. If POST, ALOM, or the Solaris PSH features did not
67. t ALOM sends alerts to all ALOM users that are logged in, sends the alert through email to a configured email address, and writes the event to the ALOM event log.
- Fault recovery: The system automatically detects that the fault condition is no longer present. ALOM extinguishes the Service required LED and updates the FRU's PROM, indicating that the fault is no longer present.
- Fault repair: The fault has been repaired by human intervention. In most cases, ALOM detects the repair and extinguishes the Service required LED. In the event that ALOM does not perform these actions, you must perform these tasks manually with the clearfault or enablecomponent commands.

ALOM can detect the removal of a FRU, in many cases even if the FRU is removed while ALOM is powered off. This enables ALOM to know that a fault diagnosed to a specific FRU has been repaired. The ALOM clearfault command enables you to manually clear certain types of faults without a FRU replacement, or if ALOM was unable to automatically detect the FRU replacement.

ALOM does not automatically detect hard drive replacement.

Persistent environmental faults can automatically recover. A temperature that is exceeding a threshold may return to normal limits. An unplugged power supply can be plugged in, and so on. Recovery of environmental faults is automatically detected. Recovery events are reported using one of two forms: fru at
68. talled the server chassis in the rack, reconnect all cables that you disconnected when you removed the chassis from the rack.

To Apply Power to the Server

1. Reconnect the power cord to the power supply.

Note: As soon as the power cord is connected, standby power is applied. Depending on the configuration of the firmware, the system might boot.

Safety Information on page 43

APPENDIX A Field Replaceable Units (FRUs)

FIGURE A-1 shows the locations of the field-replaceable units (FRUs) in the Sun Fire T1000 server. TABLE A-1 lists the FRUs. TABLE A-2 lists the locations of the DIMMs. The Channel Rank DIMM locations

FIGURE A-1 Field Replaceable Units

TABLE A-1 Sun Fire T1000 Server FRU List
Columns: Item No., CRU, Replacement Instructions, Description, Location

Item 1: Motherboard and chassis assembly
  Replacement instructions: "To Remove the Motherboard and Chassis" on page 68
  Description: The motherboard and chassis are replaced as a single assembly. The motherboard is provided in different configurations to accommodate the different processor models (6 core and 8 core).
  Location: MB

Item 2: DIMMs
  Replacement instructions: "To Remove DIMMs" on page 65
  Description: Can be ordered in the following sizes: 512 MB, 1 GB, 2 GB
  Location: See TABLE A-2 and FIGURE 3-11

Item 3: Fan assembly
  Description: A single assembly containing 4 fans
  Location: FAN TRAY
  Replacement instructions: To Remove the Fan Tray A
69. tegrity. System availability refers to the ability of a system to recover to an operational state after a failure, with minimal impact. Serviceability relates to the time it takes to restore a system to service following a system failure. Together, reliability, availability, and serviceability features provide for near-continuous system operation.

To deliver high levels of reliability, availability, and serviceability, the Sun Fire T1000 server offers the following features:
- Environmental monitoring
- Error detection and correction for improved data integrity
- Easy access for most component replacements
- Extensive POST tests that automatically delete faulty components from the configuration
- PSH automated run-time diagnosis capability that takes faulty components offline

For more information about using RAS features, refer to the Sun Fire T1000 Server System Administration Guide.

Environmental Monitoring

The Sun Fire T1000 server features an environmental monitoring subsystem designed to protect the server and its components against:
- Extreme temperatures
- Lack of adequate airflow through the system
- Power supply failure
- Hardware faults

Temperature sensors throughout the system monitor the ambient temperature of the system and internal components. The software and hardware ensure that the temperatures within the enclosure do not exceed predetermined safe operating ranges. If the
70. the Clock Battery on the Motherboard" on page 71

To locate these CRUs, refer to "Appendix A, Field Replaceable Units (FRUs)" on page 75.

To Remove the Optional PCI Express Card

Use this procedure to remove the optional low-profile PCI Express card from the server.

1. Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.
2. Remove any cable(s) that are attached to the card.
3. On the rear of the chassis, release the retention latch (FIGURE 3-4) that secures the PCI Express card to the chassis.

FIGURE 3-4 Releasing the PCI Express Card Retention Latch (callouts: PCI Express card, retention latch)

4. Gently work the PCI Express card out of the socket on the PCI Express riser board (FIGURE 3-5) and the retention bracket.

FIGURE 3-5 Removing and Replacing the PCI Express Card (callouts: retention bracket, PCI Express card, riser board)

Place the PCI Express card on an antistatic mat.

To Add or Replace the Optional PCI Express Card

Use this procedure to replace the PCI Express card.

Unpackage the replacement PCI Express card and place it on an antistatic mat.

Note: Only low-profile PCI-E cards with low brackets will fit into the chassis. There are a variety of PCI-E cards on the market. Read the product documentation for your device for additional installation requirements and instructi
71. the System Using SunVTS Software 44
    To Exercise the System Using SunVTS Software 45
    For further information refer to the documents that accompany the SunVTS software 49

3. Removing and Replacing FRUs 51
    Safety Information 51
    Safety Symbols 52
    Electrostatic Discharge Safety 52
    Use an Antistatic Wrist Strap 53
    Use an Antistatic Mat 53
    Common Procedures for Parts Replacement 53
    Required Tools 53
    To Shut the System Down 53
    To Remove the Server From a Rack 55
    To Perform Electrostatic Discharge (ESD) Prevention Measures 56
    To Remove the Top Cover 57
    Removing and Replacing CRUs 57
    To Remove the Optional PCI Express Card 58
    To Add or Replace the Optional PCI Express Card 60
    To Remove the Fan Tray Assembly 60
    To Replace the Fan Tray Assembly 61
    To Remove the Power Supply 61
    To Replace the Power Supply 62
    To Remove the Hard Drive 63
    To Replace the Hard Drive 64
    To Remove DIMMs 65
    To Add or Replace DIMMs 66
    To Remove the Motherboard and Chassis 68
    To Replace the Motherboard and Chassis Assembly 69
    To Remove the Clock Battery on the Motherboard 70
    To Replace the Clock Battery on the Motherboard 71
    Common Procedures for Finishing Up 72
    To Replace the Top Cover 72
    To Reinstall the Server Chassis in the Rack 73
    To Apply Power to the Server 73

A. Field Replaceable Units (FRUs) 75
72. FIGURE 2-3 Sun Fire T1000 Server Rear Panel LEDs (callouts: Locator LED, Service required LED, Power OK LED, Fault LED, DB-9 system console port with Link and Activity LEDs, Ethernet ports with Link and Activity LEDs)

Front and Rear Panel LEDs

Two LEDs and one LED button are located in the upper left corner of the front panel (TABLE 2-2). The LEDs are also provided on the rear panel.

TABLE 2-2 Front and Rear Panel LEDs
LED (Color): Description

Locator LED and button (White): Enables you to identify a particular server. The LED is controlled using one of the following methods:
  - Issuing the setlocator on or off command
  - Pressing the button to toggle the indicator on or off
  This LED provides the following indications:
  - Off: Normal operating state.
  - Fast blink: The server received a signal as a result of one of the preceding methods and is indicating ("here I am") that it is operational.

Service required LED (Yellow): If on, indicates that service is required. The ALOM showfaults command will indicate any faults causing this indicator to light.

Power OK LED and Power On/Off button (Green): The LED provides the following indications:
  - Off: The system is unavailable. Either it has no power or ALOM is not running.
  - Steady on: Indicates that the system is powered on and is running in its normal operating state. No service actions are required.
  - Standby blink: Indicates the sy
73. us
    MB/BAT/V_BAT  OK

    Power Supplies:
    Supply  Status  Underspeed  Overtemp  Overvolt  Undervolt  Overcurrent
    PS0     OK      OFF         OFF       OFF       OFF        OFF

Note: Some information might not be available when the server is in standby mode.

To Run the showfru Command

Note: By default, the output of the showfru command for all FRUs is very long.

The showfru command displays information about the FRUs in the server. Use this command to see information about an individual FRU or about all the FRUs.

Note: You do not need user permissions to use this command.

At the sc> prompt, enter the showfru command.

    sc> showfru -s
    FRU_PROM at MB/SEEPROM
    SEGMENT: SD
    /ManR
    /ManR/UNIX_Timestamp32: TUE OCT 18 21:17:55 2005
    /ManR/Description: ASSY Sun Fire T1000 Motherboard
    /ManR/Manufacture Location: Sriracha Chonburi Thailand
    /ManR/Sun Part No: 5017302
    /ManR/Sun Serial No: 002989
    /ManR/Vendor: Celestica
    /ManR/Initial HW Dash Level: 03
    /ManR/Initial HW Rev Level: 01
    /ManR/Shortname: T1000_MB
    /SpecPartNo: 885-0505-04
    FRU_PROM at PS0/SEEPROM
    SEGMENT: SD
    /ManR
    /ManR/UNIX_Timestamp32: SUN JUL 31 19:45:13 2005
    /ManR/Description: PSU 300W AC_INPUT A207
    /ManR/Manufacture Location: Matamoros Tamps Mexico
    /ManR/Sun Part No: 3001799
    /ManR/Sun Serial No: G00001
    /ManR/Vendor: Tyco Electronic
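Because the showfru output for all FRUs is long, it can help to capture it from the console and group the fields by FRU. The following Python sketch illustrates one way to do that. It assumes the output has been saved as text, that each FRU section begins with a line starting with `FRU_PROM at `, and that each field is a `key: value` line; the extracted text here does not preserve that punctuation exactly, so treat the layout as an assumption. The function name `parse_showfru` and the abbreviated sample are hypothetical and are not part of ALOM.

```python
from typing import Dict

HEADER = "FRU_PROM at "  # assumed section header prefix


def parse_showfru(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'key: value' lines of captured showfru output by FRU."""
    frus: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(HEADER):
            # Start a new FRU section, keyed by the text after the header.
            current = frus.setdefault(line[len(HEADER):], {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return frus


# Abbreviated sample in the assumed layout (not a complete capture).
SAMPLE_FRU = """\
FRU_PROM at MB/SEEPROM
SEGMENT: SD
/ManR
/ManR/UNIX_Timestamp32: TUE OCT 18 21:17:55 2005
/ManR/Sun Part No: 5017302
/ManR/Shortname: T1000_MB
FRU_PROM at PS0/SEEPROM
SEGMENT: SD
/ManR/Sun Part No: 3001799
"""

if __name__ == "__main__":
    for name, fields in parse_showfru(SAMPLE_FRU).items():
        print(name, fields.get("/ManR/Sun Part No"))
```

Lines without a colon, such as the bare `/ManR` record marker, are skipped by design in this sketch.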
74. wfru Command on page 24

showkeyswitch
    Displays the status of the virtual keyswitch.

showlocator
    Displays the current state of the Locator LED as either on or off.

showlogs [-b lines | -e lines] [-g lines] [-v]
    Displays the history of all events logged in the ALOM event buffer.

showplatform [-v]
    Displays information about the host system's hardware configuration, and whether the hardware is providing service.

Note: For the ALOM ASR commands, see TABLE 2-7.

To Run the showfaults Command

The showfaults command displays faults handled by ALOM. Use the showfaults command for the following reasons:
- To see if any faults have been passed to or detected by ALOM.
- To obtain the fault message ID (SUNW-MSG-ID).
- To verify that the replacement of a FRU has cleared the fault and not generated any additional faults.

At the sc> prompt, type the showfaults command.

    sc> showfaults -v
    Last POST run: WED OCT 20 19:32:24 2004
    POST status: Passed all devices
    ID Time            FRU               Fault
     1 OCT 21 14:32:48 MB/CMP0/CH0/R1/D0 Host detected fault, MSGID: SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc

In this example, showfaults is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 10701.

To Run the showenvironment Command

The showenvironment command displays a snapshot of the server's environmental status. The information this command c
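When showfaults output is captured to a log, the FRU name and message ID can be extracted with a script instead of being read by eye. The Python sketch below is an illustration under stated assumptions: the line layout (ID, time, slash-separated FRU path, fault text, then `MSGID:` and `UUID:` labels) is inferred from the single showfaults example in this section, whose punctuation is not fully preserved in the extracted text. The names `Fault`, `FAULT_LINE`, and `parse_showfaults` are hypothetical and are not part of ALOM.

```python
import re
from typing import List, NamedTuple


class Fault(NamedTuple):
    fault_id: int
    time: str
    fru: str
    description: str
    msgid: str
    uuid: str


# Assumed layout of one fault record:
#   <ID> <MON DD HH:MM:SS> <FRU path> <text>, MSGID: <id> UUID: <uuid>
FAULT_LINE = re.compile(
    r"^\s*(?P<id>\d+)\s+"
    r"(?P<time>[A-Z]{3} \d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<fru>\S+)\s+"
    r"(?P<desc>.*?),?\s*MSGID:\s*(?P<msgid>\S+)\s+"
    r"UUID:\s*(?P<uuid>[0-9a-fA-F-]+)\s*$"
)


def parse_showfaults(text: str) -> List[Fault]:
    """Return one Fault per line that matches the assumed record layout."""
    faults = []
    for line in text.splitlines():
        m = FAULT_LINE.match(line)
        if m:  # header and POST status lines do not match and are skipped
            faults.append(Fault(int(m["id"]), m["time"], m["fru"],
                                m["desc"].strip(), m["msgid"], m["uuid"]))
    return faults


# Sample text in the assumed layout, based on the example in this section.
SAMPLE = """\
Last POST run: WED OCT 20 19:32:24 2004
POST status: Passed all devices
ID Time            FRU               Fault
 1 OCT 21 14:32:48 MB/CMP0/CH0/R1/D0 Host detected fault, MSGID: SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc
"""

if __name__ == "__main__":
    for fault in parse_showfaults(SAMPLE):
        print(fault.fru, fault.msgid)
```

An empty result from `parse_showfaults` means only that no line matched this assumed layout. It does not by itself confirm that a replaced FRU cleared the fault, so check the raw output as well.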
