Ultra Enterprise 2 Cluster Server Service Manual
Contents
1. [Figure 1-1: Ultra Enterprise 2 Cluster using SPARCstorage Arrays, Functional Block Diagram. Figure notes: (1) The second boot drive and the CD-ROM are optional devices. (2) The UltraSPARC processor speed and the DIMM size should be the same on both nodes.]
Ultra Enterprise 2 Cluster Hardware Service Manual, April 1997
1.1.1 Minimum Hardware Required for an Ultra Enterprise 2 Cluster Using SPARCstorage Arrays
Figure 1-2 shows the minimum hardware required to support the PDB 1.2 or Solstice HA 1.3 software using SPARCstorage Arrays.
[Figure 1-2: Ultra Enterprise 2 Cluster Hardware]
Two Ultra Enterprise 2 Servers, each containing:
• One UltraSPARC processor module
• 64 Mbyte RAM
• 2.1 Gbyte internal
2. [Figure 1-3: Ultra Enterprise 2 Cluster using MultiPacks, Functional Block Diagram. Figure notes: (1) The second internal drive (not shown) and the CD-ROM are optional devices. (2) The SPARC processor speed and the DIMM size should be the same on both nodes.]
1.2.1 Minimum Hardware Required for an Ultra Enterprise 2 Cluster Using SPARCstorage MultiPacks
Figure 1-4 shows the minimum hardware required to support the HA 1.3 or PDB 1.2 software.
• Two Ultra Enterprise 2 Servers, each containing:
  - One SPARC processor module
  - 64 Mbyte DIMM for HA, 128 Mbyte DIMM for PDB
  - Two SunSwift SBus Adapter cards
• Two Sun Private Net cables
• Two six- or twelve-disk SPARCstorage MultiPacks
• Four SCSI-2 cables
• Terminal concentrator supports up to three two node clu
3. soc link 4020 soc Unsupported Link Service command
soc link 4030 soc Unknown FC 4 command
soc link 4040 soc unsupported FC frame R_CTL
soc link 4010 soc incomplete continuation entry
soc link 3010 soc unknown LS_Command
B.3.2 pln Driver
Transport error FCP_RSP_CMD_INCOMPLETE
Transport error FCP_RSP_CMD_DMA_ERR
Transport error FCP_RSP_CMD_TRAN_ERR
Transport error FCP_RSP_CMD_RESE
Transport error FCP_RSP_CMD_ABORTED
An error internal to the SPARCstorage Array controller has occurred during an I/O operation. This may be due to a hardware failure in a SCSI interface of the SPARCstorage Array controller, a failure of the associated SCSI bus (drive tray) in the SPARCstorage Array package, or a faulty disk drive.
Transport error FCP_RSP_CMD_TIMEOUT
The SCSI interface logic on the SPARCstorage Array controller board has timed out on a command issued to a disk drive. This may be caused by a faulty drive, drive tray, or array controller.
Transport error FCP_RSP_CMD_OVERRUN
This error on an individual I/O operation may indicate either a hardware failure of a disk drive in the SPARCstorage Array, a failure of the associated drive tray, or a fault in the SCSI interface on the SPARCstorage Array controller. The system will try to access the failed ha
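The pln transport error descriptions above amount to a lookup from error code to the hardware to suspect. A minimal sketch of that lookup follows, assuming console messages have been captured as plain text; the names `SUSPECTS` and `suspects_for` are hypothetical, and the truncated FCP_RSP_CMD_RESE entry is left out because its full name is not recoverable from this excerpt.

```python
# Suspect hardware for each pln transport error, taken from the text above.
# Names here (SUSPECTS, suspects_for) are illustrative, not part of any Sun tool.
CONTROLLER_INTERNAL = (
    "SCSI interface of the SPARCstorage Array controller",
    "associated SCSI bus (drive tray)",
    "disk drive",
)

SUSPECTS = {
    "FCP_RSP_CMD_INCOMPLETE": CONTROLLER_INTERNAL,
    "FCP_RSP_CMD_DMA_ERR": CONTROLLER_INTERNAL,
    "FCP_RSP_CMD_TRAN_ERR": CONTROLLER_INTERNAL,
    "FCP_RSP_CMD_ABORTED": CONTROLLER_INTERNAL,
    "FCP_RSP_CMD_TIMEOUT": ("disk drive", "drive tray", "array controller"),
    "FCP_RSP_CMD_OVERRUN": (
        "disk drive",
        "associated drive tray",
        "SCSI interface on the SPARCstorage Array controller",
    ),
}


def suspects_for(console_line):
    """Return the hardware to check for a console line, or () if no code matches."""
    for code, parts in SUSPECTS.items():
        if code in console_line:
            return parts
    return ()


print(suspects_for("Transport error FCP_RSP_CMD_TIMEOUT"))
```

The lookup only narrows the search; the service procedures in the manual still decide which part to replace.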
4. [Figure 7-15: Terminal Concentrator Rear View (power switch)]
[Figure 7-16: Terminal Concentrator Front View (power indicator; STATUS indicators: POWER, UNIT, NET, ATTN, LOAD, ACTIVE; ports 1 to 8)]
8 Internal Access
Use Table 8-1 and Table 8-2 as a guide to determine the action you must take before you can access the Field Replaceable Unit (FRU).
Note: For all power-down and power-up procedures, also refer to the Ultra Enterprise Cluster PDB Administration Guide.
Table 8-1 Ultra Enterprise 2 Assembly Access
FRU Item | Replace live | Power Down Node
Memory | No | Yes
Internal Disk(s) | No | Yes
CPU | No | Yes
Mother board | No | Yes
Power supply | No | Yes
CD/Tape/Floppy | No | Yes
SunSwift | No | Yes
SCI SBus adapter | No | Yes
SCI SBus cable | Yes | No
Power cable | No | Yes
Table 8-2 SPARCstorage MultiPack Assembly and Accessories Access
FRU Item | Replace live | Power Down MultiPack
Disk drive | Yes |
Power supply | No | Yes
Ethernet cable | Yes |
Power cable | No | Yes
SCSI cable | No | Yes
Terminal Concentrator | Yes |
Serial cable | Yes |
Table 8-3 Terminal Concentrator Access
FRU Item | Replace live | Power Down MultiPack
Terminal Concentrator | Yes |
Seri
5. [Figure 7-3: First SCSI Cable Attached to the New Node]
9. Connect the SCSI cable from the detached MultiPack to the new node (Figure 7-3).
10. Power up the detached MultiPack.
11. Use the running node to attach the MultiPack. Use the vxdiskadm command of the CVM or VxVM to attach the MultiPack (Figure 7-4).
[Figure 7-4: First MultiPack Attached]
12. Use the running node to detach the next MultiPack (Figure 7-5).
[Figure 7-5: Second MultiPack Detached]
13. Power off the detached MultiPack.
14. At the powered-down node, physically disconnect the SCSI cable that goes from the detached MultiPack to the powered-down node (Figure 7-6).
6. [Figure 7-6: Second SCSI Cable Attached to the New Node]
15. Connect the SCSI cable from the detached MultiPack to the new node (Figure 7-6).
16. Power up the detached MultiPack.
17. Use the running node to attach the MultiPack (Figure 7-7).
[Figure 7-7: Second MultiPack Attached]
18. Connect the private net cables (Figure 7-8).
[Figure 7-8: New Node in the Cluster]
19. Have the system administrator rejoin the node to the cluster.
7.1.3 Server Startup
1. Begin with a safety inspection.
   a. Ensure that the AC power switch on the rear of the server is off (Figure 7-1).
   b. Verify that the power cord is connected to the correct facilities power outlet.
2. Turn the AC power switch to ON. You will hear the fans begin to turn, and the green LED on the front of the server will light.
7. [Figure 3-1: Errors on Both Nodes on Same SPARCstorage Array]
To isolate the probable failure to a SPARCstorage Array controller board:
1. Check the AC and DC lights on the disk array power supply. Refer to the SPARCstorage Array Model 100 Series Service Manual. If the light display is normal, proceed to step 2. Otherwise, check the AC power or the power supply.
2. Have the system administrator prepare the cluster for replacement of a controller in a SPARCstorage Array.
3. Shut down the SPARCstorage Array as described in Section 7.3.1, Complete Disk Array Shutdown.
4. Replace the controller board as described in the SPARCstorage Array Model 100 Series Service Manual.
5. Bring up the array tray as described in Section 7.3.4, Single Drive and Tray Startup.
6. Have the system administrator return the node to the cluster.
3.2.2 Multiple Disk Errors or Disk Access Error For One Node Only
[Figure 3-2: Multiple Disk Errors on One Node]
Note: You can remove and replace a disk drive without powering off the disk array. You only need to pull out the tray in which the drive is located.
To isolate a failed disk or the path to the disk (for example, an optical cable or a Fibre Channel Optical Module on the node or on the SPARCstorage Array):
1. Have the system administrator prepare the node for disk replacement.
2
8. Serial Port RJ-45 Receptacle
15-pin Ethernet Receptacle
Twisted Pair Ethernet RJ-45 Receptacle
Ultra Enterprise 2 Cluster Service Manual, November 1996
Tables
Table 2-1 HA Device to Troubleshooting Cross Reference
Table 2-2 HA Error Messages and Symptoms
Table 2-3 HA Device Replacement Cross Reference
Table 2-4 Graphical User Interfaces
Table 2-5 PDB Device Troubleshooting Cross Reference
Table 2-6 PDB Device Replacement Cross Reference
Table 3-1 POST Codes
Table 6-1 Safety Precautions
Table 7-1 Shutdown Procedure Summary
Table 8-1 Ultra Enterprise 2 Assembly Access
Table 8-2 SPARCstorage MultiPack Assembly and Accessories Access
Table 8-3 Terminal Concentrator Access
Table 8-4 List of Service Manuals
Table 10-1 Replaceable Parts List and Documentation Cross Reference
Table A-1 Serial Port Pinout and Signals
Table A-2 Ethernet Port Pinout and Signals
Table A-3 Private Ethernet Port Pinout and Signals
9. [Figure 7-14: SPARCstorage MultiPack AC Power Switch and AC Plug]
7.4.3 Complete MultiPack Startup
Warning: Never move a SPARCstorage MultiPack when the power is on. Failure to heed this warning can result in catastrophic disk drive failure. Always power the system off before moving the array.
1. Begin with a safety inspection.
   a. Ensure that the SPARCstorage MultiPack AC power switch is off (Figure 7-14).
   b. Verify that the power cord is connected to the chassis and a wall socket.
2. Turn on the AC power switch on the chassis rear. You should hear the fans begin turning.
3. Watch the front panel LEDs. When powering on, the LEDs light to indicate which drive bays have drives installed.
It may take a few minutes for a SPARCstorage MultiPack to become ready, depending on the total number of disk drives.
7.5 Terminal Concentrator
To power the terminal concentrator on or off, use the power switch on the back panel, as depicted in Figure 7-15. The power indicator on the front panel is lit when the power is on (Figure 7-16).
10. • VxVM
4.3 Software Faults
4.3.1 Operating System Failures
To determine the severity and content of operating system related error messages, refer to the Solaris documentation that came with your system. The following message is a sample message:
node0 Unix Link down cable problem
4.3.2 Solstice HA 1.3
For a listing of error messages related to the Solstice HA software, refer to Appendix A of the Solstice HA 1.3 User's Guide.
4.3.3 PDB Failures
For an explanation of the error messages related to the PDB software, refer to Chapter 4 of the Ultra Enterprise Cluster PDB Administration Guide and the Ultra Enterprise Cluster PDB Error Messages Manual.
4.3.4 SPARCstorage Array Failures
For a listing of error messages specific to SPARCstorage Array firmware and device drivers, see Appendix B, Firmware and Device Driver Error Messages, and the Ultra Enterprise PDB Error Messages Manual.
4.3.5 SPARCstorage MultiPack Failures
Error messages are displayed on the system console.
4.3.6 NFS or Other Data Service Failures
To determine the severity and content of NFS related error messages, refer to the Solaris documentation that came with your system (for example, the NFS Administration Guide). For information on other data services, refer to the applicable administration guide.
11. 2.4 HA PDB Differences
2.5 Troubleshooting Flow in an HA Cluster
2.5.1 HA Node Takeover
2.5.2 HA Node Switchover
2.5.3 HA Failures Without Takeover
2.5.4 HA Fault Classes and Principal Assemblies
2.5.5 HA Device Troubleshooting Cross Reference
2.5.6 HA Error Messages Symptoms
2.5.7 HA Device Replacement Cross Reference
2.6 PDB Cluster GUIs
2.7 Troubleshooting Flow in a PDB Cluster
2.7.1 PDB Fault Classes and Principal Assemblies
2.7.2 PDB Device Troubleshooting Cross Reference
2.7.3 PDB Error Messages Symptoms
2.7.4 PDB Device Replacement Cross Reference
3 Hardware Troubleshooting
3.1 Solaris Reconfiguration Reboot
3.2 SPARCstorage Array and Optical Connections Faults
3.2.1 Multidisk Errors from Both Nodes on the Same SPARCstorage Array
3.2.2 Multiple Disk Errors or Disk Access Error For One Node Only
3.2.3 SPARCstorage Array Fails to Communicate
3.3 MultiPack and SCSI Connection Faults
3.3.1 Multidisk Errors from Both Nodes on the Same SPARCstora
12. or disk drive Array and Optical Connections Faults Terminal Concentrator Terminal Section 3 6 Terminal concentrator Concentrator and Serial Connection Faults Model 100 Series Service Manual
2.5.7 HA Device Replacement Cross Reference
Table 2-3 lists the devices and corresponding documents that contain the applicable replacement procedures.
Table 2-3 HA Device Replacement Cross Reference
Device | Document Reference | Part No.
Ultra 2 Server: Power supply | Ultra 2 Series Service Manual, Chapter 8, Major Subassemblies | 802-2561
Ultra 2 Server: Boot disk | Ultra 2 Series Service Manual, Chapter 9, Storage Devices |
Ultra 2 Server: System board, SBus card, DSIMM, CPU module | Ultra 2 Series Service Manual, System Board and Component Replacement |
Optical Module | Fibre Channel Optical Module Installation Guide | 801-6326
FC/S SBus card | Fibre Channel SBus Adapter card Installation Guide | 801-6313
SPARCstorage Array: Controller, Disk drives, Power supply | SPARCstorage Array Model 100 Series Service Manual, Chapter 5, Major Subassemblies | 802-2206
 | Disk Drive Installation Manual for the SPARCstorage Array | 801-2207
SPARCstorage MultiPack disk drives | SPARCstorage MultiPack Service Manual, Chapter 3, Parts Replacement | 801-4430
SunSwift SBus Adapter card | SunSwift SBus Adapter Installation User's Guide | 802-6021
SCI | SCI SBus Adapter User's Guide | 802-7103
2.6 PDB
13. This terminal concentrator is now ready for telnet(1M) use. Confirm that you are able to establish a connection to this terminal concentrator. You may also want to set the superuser password and other site-specific configuration settings. If desired, you may disconnect the serial cable and store it for future use.
4 Software Troubleshooting
Much of the fault management is performed by the Solstice HA or the PDB cluster software. While the underlying hardware architecture ensures that there is no single point of hardware failure and that there are redundant paths to all components, the software detects, isolates, and recovers from failures.
4.1 Troubleshooting Solstice HA 1.3 Software
Most software problems are manifested as messages on the system console, which displays messages from the following sources:
• Solaris operating environment
• Solstice HA cluster software
• SPARCstorage Array firmware and device driver
• NFS (Sun's distributed computing file system) and other data services
• Solstice DiskSuite
4.2 Troubleshooting PDB Software
Most software problems are manifested as messages on the Cluster Monitor, which displays messages from the following sources:
• Solaris operating environment
• PDB cluster software
• SPARCstorage MultiPack device driver
• SPARCstorage Array firmware and device driver
• Cluster Volume Manager
14. Caution: If you replace the array controller, the system administrator must reprogram the new controller with the original World Wide Name (WWN). If this number is incorrect, the Solstice DiskSuite software will not recognize the new controller, and the disk array cannot be rejoined to the cluster. For WWN reprogramming procedures, refer to the Solstice HA 1.3 User's Guide or the Ultra Enterprise PDB Cluster Administration Guide, as applicable.
6. Log on as superuser and shut down the processor for the node. Verify that the system returns to the ok prompt after the shutdown is complete. If the system returns to the > prompt after the shutdown, enter n to display the ok prompt.
7. Enter the following commands at the ok prompt:
ok true to diag-switch?
ok true to fcode-debug?
ok reset
8. Immediately press Control-] to get the telnet prompt, and then enter the following:
telnet> send break
After the ok prompt is displayed, enter the following:
ok show-devs
SBus slot 2 of the system board has an SQEC, and SBus slots 0, 1, and 3 have an FC/S. You should see output similar to the following:
/sbus@1f,0/SUNW,soc@1,0
/sbus@1f,0/SUNW,soc@0,0
9. Locate the lines in the output that list the information on the FC/S cards installed in the node. You can find the lines by looking for soc@x,x in the output. The first x
15. See Ultra Enterprise 2 Cluster Hardware Planning and Installation Manual Chapter 5 Hardware Installation for cabling details 2m SCI cable 530 2360 01 5m SCI cable 530 2361 01 10m SCI cable 530 2362 01 Private net cables 1 meter Ethernet 530 2149 5 meter Ethernet 530 2150 2 SPARCstorage Array SPARCstorage Array Model 100 Series 802 2206 Service Manual SPARCstorage Array Model 200 Series 802 2028 Service Manual See Ultra Enterprise 2 Cluster Hardware Planning and 802 6313 Installation Manual Chapter 5 Hardware Installation for cabling details Disk drive 801 2207 Fiber optic cables See Ultra Enterprise 2 Cluster Hardware Planning and 801 6313 2 meter cable Installation Manual Chapter 5 Hardware Installation for cabling details 537 1004 10 2 Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 1 0 E Table 10 1 Replaceable Parts List and Documentation Cross Reference Continued Document Part Key Description Part Number Reference Number 15 meter cable 537 1006 System administration Service manual provided with equipment workstation or terminal Serial port 1 to terminal 530 2151 or 530 2152 concentrator cable 3 Terminal concentrator 370 1434 802 6314 Terminal concentrator See Ultra Enterprise 2 Cluster Hardware Planning and 802 6313 cabling Installation Manual Chapter 5 Hardware Installation for cabling details 2 meter serial cable 530 2152 5 meter serial cable
16. 5 Diagnostics
5.1 Failure Diagnosis and Confirmation of Component Repair Using SunVTS
Before the PDB software is installed, use the SunVTS diagnostic for initial hardware configuration confirmation and component diagnosis during server hardware installation. SunVTS is packaged with the Solaris operating system. For instructions on installing and using SunVTS, refer to the SunVTS 2.0 User's Guide.
5.2 Verify HA 1.3 Configuration Using the hacheck(1m) Command
The Solstice HA 1.3 hacheck command verifies system configurations. For more information regarding this command, refer to the Solstice HA 1.3 User's Guide; for information concerning the error messages associated with the hacheck command, refer to Appendix A of the same manual.
5.3 Verify PDB Configuration
Use the Cluster Monitor Front Panel for a graphic representation of the cluster (see Figure 2-4). Use the pdbconf script to verify the cluster, the private network interface, and the quorum device. For additional information, refer to the Ultra Enterprise Cluster PDB Software Planning and Installation Guide.
6 Safety and Tools Requirements
6.1 Safety Precautions
For your protection, observe the following safety precautions while repairing your equipment: Follow all cautions, warnings, and instr
17. Note: Do not use the probe-scsi command in a PDB system, as this can cause the system to hang at the boot PROM monitor.
1. Have the system administrator remove the node from the cluster and halt it. After the system halts, several system messages are displayed. When the messages finish, the ok prompt is displayed.
ok probe-scsi-all
This command may hang the system if a Stop-A or halt command has been executed. Please type reset-all to reset the system before executing this command. Do you wish to continue? (y/n) y
/sbus@1f,0/SUNW,fas@2,8800000
Target 2 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 3 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 4 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 5 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 8 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 9 Unit 0 Disk SEAGATE ST32550W SUN2.1G041600000000 Copyright (c) 1995 Seagate All rights reserved ASA2
/sbus@1f,0/SUNW,fas@0,8800000
Target 2 Unit 0 Disk SEAGATE ST32550W SUN2.1G0416000
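When comparing the probe-scsi-all listing against the drives you expect on each SCSI controller, it can help to reduce the console text to a list of responding targets per controller. The following is a minimal sketch, assuming the console output was captured through the terminal concentrator as text with each device path and each Target entry starting its own line, and assuming target numbers are printed in hexadecimal; the function name `targets_by_controller` and the sample text are hypothetical.

```python
import re

# Assumption: each target entry begins with "Target <n>", with <n> in hexadecimal.
TARGET = re.compile(r"^Target\s+([0-9a-fA-F]+)\b")


def targets_by_controller(probe_output):
    """Map each controller device path to the SCSI targets that answered on it."""
    found = {}
    current = None
    for raw in probe_output.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # A device path line opens the section for the next controller.
            current = line
            found[current] = []
            continue
        match = TARGET.match(line)
        if match and current is not None:
            found[current].append(int(match.group(1), 16))
    return found


sample = """\
/sbus@1f,0/SUNW,fas@2,8800000
Target 2 Unit 0 Disk SEAGATE ST32550W
Target 3 Unit 0 Disk SEAGATE ST32550W
/sbus@1f,0/SUNW,fas@0,8800000
Target 2 Unit 0 Disk SEAGATE ST32550W
"""

for path, targets in targets_by_controller(sample).items():
    print(path, targets)
```

A target that is installed but missing from a controller's list points at that drive or its SCSI path.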
18. PDB differences 2 4 hacheck command 5 1 I internal access reference guide 8 1 L List of Service Manuals 8 2 loopback connector 6 4 M maintenance authorization 2 4 manual switchover HA 2 6 N network failures 3 17 primary A 3 network failure private 3 17 public 3 21 node takeover HA 2 4 O on line serviceability PDB 1 1 Solstice HA 1 1 optional hardware Ultra Enterprise 2 Cluster 1 4 1 7 P parts list 10 2 PDB Cluster Console 2 12 Cluster Control Panel 2 12 Cluster Monitor 2 12 configuration verify 5 1 differences 2 4 graphical user interfaces 2 12 on line serviceability 1 1 pdbconf script 5 1 ping command 3 27 pinout 10Base5 A 3 RJ 45 A 2 terminal concentrator A 1 port terminal concentrator 2 2 misconfigured 2 2 power off server 7 2 SPARCstorage Array 7 10 SPARCstorage MultiPack 7 14 terminal concentrator 7 15 power on server 7 6 SPARCstorage Array 7 11 SPARCstorage MultiPack 7 14 terminal concentrator 7 15 precautions safety 6 1 system safety 6 3 primary network connection A 3 probe scsi command 3 16 R reboot 3 2 Solaris reconfiguration 3 2 replacing disk drives SPARCstorage MultiPack 9 2 Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 major subassemblies 9 2 9 3 terminal concentrator 9 3 trays and disk drives SPARCstorage Array 9 2 required hardware Ultra Enterprise 2 Cluster 1 3 1 6 required to
19. Setup Instructions | 802-5933
Sun Ultra 2 Series Installation Guide | 802-5934
Sun Ultra 2 Series Service Manual | 802-2561
SPARCstorage Array 100
SPARCstorage Array 100 Installation and Service Binder Set | 825-2513
SPARCstorage Array Model 100 Series Installation Manual | 801-2205
SPARCstorage Array Model 100 Series Service Manual | 801-2206
SPARCstorage Array Regulatory Compliance Manual | 801-7103
SPARCstorage Array 100 User's Guide Binder Set | 825-2514
SPARCstorage Array Configuration Guide | 802-2041
SPARCstorage Array User's Guide | 802-2042
SPARCstorage Array Product Note | 802-2043
Table P-2 List of Related Documentation (Continued)
Product Family | Title | Part Number
SPARCstorage Array 200
SPARCstorage Array 200 Manuals
SPARCstorage Array Model 200 Series Installation Manual
SPARCstorage Array Model 100 Series Service Manual
SPARCstorage Array Battery and Prom Installation Note
SPARCstorage Array Regulatory Compliance Manual
SPARCstorage MultiPack
SPARCstorage MultiPack Installation Guide
SPARCstorage MultiPack User's Guide
SPARCstorage MultiPack Installation Supplement
SPARCstorage MultiPack Service Manual
Ultra Enterprise 2 Cluster HA
Ultra Enterprise 2 Cluster HA Document Binder Set
Getting Started (roadmap)
Solstice HA 1.3 User's Guide
Ultra Enterprise 2 Cluster Hardware Planning and Installa
Ultra Enterprise 2 Cluster PDB
20. Solstice High Availability (HA) 1.3 software and the Parallel Database (PDB) 1.2 software.
Two different basic cluster configurations are available. One configuration uses SPARCstorage Arrays for multihost data storage. The other configuration uses SPARCstorage MultiPacks. Both configurations support the HA and PDB software packages and use the internal (onboard) hard disk as the boot device, which can be mirrored if a second drive is provided.
Other minor differences exist between the HA and PDB configurations. These differences are in the network interconnects, both public and private.
Both HA and PDB software provide online serviceability. Online serviceability enables system administrators to take one node of the cluster off line for repair or routine maintenance while the data services remain available from the other node.
1.1 Ultra Enterprise 2 Cluster Using SPARCstorage Arrays
The Ultra Enterprise 2 Cluster is implemented on the Ultra Enterprise 2 Server platform using either two SPARCstorage Array Model 100 Series disk arrays. Two identical compute nodes and a shared set of disk arrays comprise a cluster. Figure 1-1 is a functional block diagram of the Ultra Enterprise 2 Cluster using SPARCstorage Arrays. This diagram shows the HA configuration, using SQECs and onboard Ethernet connectors for the Private Nets.
21. in soc@x,x tells you the SBus slot in which the FC/S card is installed. For example, looking at the preceding output, the first line, /sbus@1f,0/SUNW,soc@1,0, tells you that an FC/S card is installed in SBus slot 1.
10. Locate the FC/S card that is connected to the SPARCstorage Array that is not communicating with the node.
11. Determine what the SBus slot number is for that FC/S card. For more information on SBus slot numbers for your system, refer to the Ultra 2 Series Service Manual.
• If you can find an entry in the show-devs output for the FC/S card installed in that SBus slot, go to Step 12.
• If you cannot find an entry in the show-devs output for the FC/S card installed in that SBus slot, replace the FC/S card in that SBus slot according to the instructions given in the Ultra 2 Series Service Manual. Following replacement of the FC/S card, have the system administrator return the node to the cluster.
12. Enter the following at the ok prompt:
ok " path" select-dev
where path is the entire path given in the line containing the soc@x,x output. The path must be preceded by a double open quote and a space. Thus, using the previous output as an example, you would enter:
ok " /sbus@1f,0/SUNW,soc@1,0" select-dev
Note: From this point on, if you enter a command incorrectly and you get the error
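The slot lookup in steps 9 through 11 (the first number after soc is the SBus slot) can be sketched as a small parser. This is an illustrative sketch only: it assumes the show-devs output was captured as text and that soc entries use the OpenBoot device path form /sbus@1f,0/SUNW,soc@1,0; the names `SOC_ENTRY`, `fcs_slots`, and `missing_fcs` are hypothetical.

```python
import re

# Assumed device path form: /sbus@1f,0/SUNW,soc@<slot>,<offset>
SOC_ENTRY = re.compile(r"SUNW,soc@([0-9a-fA-F]+),([0-9a-fA-F]+)")


def fcs_slots(show_devs_output):
    """Return the sorted SBus slot numbers that have a soc (FC/S) entry."""
    slots = set()
    for line in show_devs_output.splitlines():
        match = SOC_ENTRY.search(line)
        if match:
            # The first number after "soc@" is the SBus slot.
            slots.add(int(match.group(1), 16))
    return sorted(slots)


def missing_fcs(show_devs_output, expected_slots):
    """Return the expected FC/S slots that have no entry in the output."""
    present = set(fcs_slots(show_devs_output))
    return sorted(set(expected_slots) - present)


sample = """\
/sbus@1f,0/SUNW,soc@1,0
/sbus@1f,0/SUNW,soc@0,0
"""

print(fcs_slots(sample))
print(missing_fcs(sample, [0, 1, 3]))
```

A slot reported by `missing_fcs` corresponds to the second bullet of step 11: no show-devs entry for a card that is physically installed, so the FC/S card in that slot is the replacement candidate.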
22. interfaces
  - Power supply
• Ultra Enterprise 2 Server faults
  - Power supply
  - Boot disk drive and SCSI cable
  - UltraSPARC CPU modules
  - DIMMs
  - SBus cards
  - SunSwift SBus Adapter cards
  - System board
  - Fibre Channel Optical Modules (FC/OM)
  - Fibre Channel SBus cards (FC/S)
  - SBus Quad Ethernet Controller card interface (SQEC)
  - Public Net SBus card
  - SCI SBus Adapter card
• Cluster faults
  - Terminal concentrator serial connections
  - Private net connections
• Software faults
  - Application program died
  - System crash (panic)
  - Hung system (lock up)
  - Cluster-wide failures
All troubleshooting begins at the system console, Cluster Monitor, or with other operator information. The system console or Cluster Monitor must be checked regularly by the system administrator.
2.7.2 PDB Device Troubleshooting Cross Reference
Table 2-5 cross-references devices to the appropriate troubleshooting manual.
Table 2-5 PDB Device Troubleshooting Cross Reference
Device Trouble Part Area Reference Number SPARCstorage SPARCstorage MultiPack Service Manual 802 4430 MultiPack Chapter 2 Diagnostics for Troubleshooting SPARCstorage SPARCstorage Array Model 100 Series Service 802 2206 Array Manual Chapter 2 Troubleshooting Controller SPARCstorage Array Model 200 Series Service 802 2028 Fiber optic Manual
23. node 0 is most likely defective, as this message indicates that the private net 1 cable is functional. Replace the system board in node 0 and have the system administrator return node 0 to the cluster. If the message string indicated in step 5 is not returned, then the private net 1 cable is probably defective.
Note: In an HA cluster, check the green LEDs labeled 0, 1, 2, and 3 on the SQEC cards in both nodes to verify that private net 2 has not failed. The 0 LED on both SQEC cards (private net 2) should be lighted.
3.5.2 Public Network Failure
Messages on the system console will identify the specific port that has failed. Otherwise, for information on test commands as well as additional troubleshooting, refer to the documentation that came with your public network interface.
3.6 Terminal Concentrator and Serial Connection Faults
Note: It is not necessary to stop or remove either node from a cluster to replace the terminal concentrator.
Isolate terminal concentrator faults using the diagrams depicted in Section 3.6.4, Terminal Concentrator Flow Diagrams, as well as the information contained in Section 3.6.5, Additional Troubleshooting Tips.
[Figure: terminal concentrator front panel indicators: STATUS, POWER, UNIT, NET, ATTN, LOAD, ACTIVE]
24. of link errors has been exhausted
Transport error CMD_DATA_OVR
Transport error Unknown CQ type
Transport error Bad SEG CNT
Transport error Fibre Channel Invalid X_ID
Transport error Fibre Channel Exchange Busy
Transport error Insufficient CQEs
Transport error ALLOC FAIL
Transport error Fibre Channel Invalid S_ID
Transport error Fibre Channel Seq Init Error
Transport error Unknown FC Status
These errors indicate that the driver or host adapter microcode has detected a condition from which it cannot recover. The associated I/O operation will fail. This message should be followed or preceded by other error messages; refer to these other error messages to determine what action you should take to fix the problem.
Timeout recovery failed resetting
This message may be displayed by the pln driver if the normal I/O timeout error recovery procedures were unsuccessful. In this case, the software will perform a hardware reset of the host adapter and attempt to continue system operation.
reset recovery failed
This message will be printed only if the hardware reset error recovery has failed, following the failure of normal fibre channel link error recovery. The associated SPARCstorage Array(s) will be inaccessible by the system. This situation should only occur due to failed host adapter hardware.
25. pln_ctlr_attach controller struct alloc failed
pln_ctlr_attach scsi_device alloc failed
pln_ctlr_attach pln_address alloc failed
The pln driver was unable to obtain enough kernel memory space for some of its internal structures if one of these messages is displayed. The SPARCstorage Array(s) associated with these messages will not be functional.
pln_init mod_install failed error %d
Module installation of the pln driver failed. None of the SPARCstorage Arrays connected to the machine will be operable.
B.3 Hardware Errors
Errors under this classification are generally due to hardware failures (transient or permanent) or improper configuration of some subsystem components.
B.3.1 soc driver
soc wwn 3010 soc No SSA World Wide Name using defaults
The associated SPARCstorage Array has an invalid World Wide Name (WWN). A default World Wide Name is being assumed by the software. The system will still function with a default World Wide Name if only one SSA gives this message (they all would be using the same default WWN). A valid World Wide Name should be programmed into the SPARCstorage Array; refer to the ssaadm(1m) man pages and the Solstice HA 1.2 Administration Guide or
26. prom monitor

3.3.1 Multidisk Errors from Both Nodes on the Same SPARCstorage MultiPack

Figure 3-4 Errors on Both Nodes on Same SPARCstorage MultiPack (diagram: one SPARCstorage MultiPack reporting errors to both Node 0 and Node 1)

To isolate the probable failure to a SPARCstorage MultiPack:

1. Check the power-on LED on the MultiPack. Refer to the SPARCstorage MultiPack Service Manual. If the LED display is normal, proceed to step 2. Otherwise, check the AC power or the power supply.
2. Check that the front panel LEDs are lit. Check that the lit LEDs match the corresponding installed drives in the MultiPack.
3. Check the SCSI ID switch. If the MultiPack has six drives installed, check that the ID switch is fully in either the 1-6 SCSI target address position or the 9-14 SCSI target address position. Refer to Appendix B, "SCSI Bus Information," in the SPARCstorage MultiPack Service Manual.
4. Check the SCSI cables to the MultiPack. Check that both ends of the SCSI cables are connected.

3.3.2 Multiple Disk Errors or Disk Access Error For One Node Only

Figure 3-5 Multiple Disk Errors on One Node Only (diagram: SPARCstorage MultiPack reporting errors to one of Node 0 and Node 1)

To replace a SCSI controller on the node:

1. Have the system administrator prepare the node for SCSI controller replacement. See Section 7.1.2, "Server Shutdown with SPARCstorage MultiPacks and a Spare Ultra Enterprise 2 Ser
27. the PDB 1.2 System Administration Guide for more information.

soc.wwn.3020: soc: Could not get port world wide name
If there is a failure on the SPARCstorage Array and the driver software is unable to obtain the device's WWN, this message is displayed.

soc.wwn.5020: soc: INCORRECT WWN: Found: Expected:
This message is usually the result of plugging the wrong Fibre Channel cable into a host adapter. It indicates that the World Wide Name of the device connected to the host adapter does not match the World Wide Name of the device that was connected when the system was booted.

soc.driver.3010: soc: host adapter fw date code: <not available>
This may appear if no date code is present in the host adapter microcode. This situation should not occur under normal circumstances and possibly indicates the use of invalid SPARCstorage Array drivers or a failed host adapter. For reference, the expected message is:

soc.driver.1010: soc: host adapter fw date code:
This is printed at boot time to indicate the revision of the microcode loaded into the host adapter.

soc.link.4060: soc: invalid FC packet
The soc driver has detected some invalid fields in a packet received from the host adapter. The cause of this is most likely incorrectly functioning hardware, either the host adapter itself or some other SBus hardware
28. two possibilities exist:

• The port is busy, being used by someone else.
• The port is not accepting network connections because the terminal concentrator settings are incorrect. Refer to the Ultra Enterprise 2 Cluster Hardware Planning and Installation Guide, Section 6.4, "Resetting the Terminal Concentrator Configuration Parameters."

To isolate and correct the problem, telnet to the terminal concentrator and specify the port interactively:

telnet tc_lm
Trying ip_address
Connected to tc_lm
Escape character is '^]'.

You may have to press Return to display the following prompts:

Rotaries Defined:
    cli
Enter Annex port name or number: 2
Port(s) busy, do you wish to wait? (y/n) [y]:

If you see the preceding message, the port is in use. You can use the cli who command to determine which node has the port.

If you see the following message, the port is misconfigured:

Port 2
Error: Permission denied
Rotaries Defined:
    cli
Enter Annex port name or number:

To correct the problem:

1. Select the command line interpreter and log on as superuser.
2. In terminal concentrator administrative mode, set the port to slave mode as follows:

Enter Annex port name or number: cli
Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex: su
Password:
annex# admin
Annex administration
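The two responses described above (port in use versus port misconfigured) can be told apart mechanically from captured terminal concentrator output. The following Python sketch is illustrative only: the function name `classify_port_response` and the result labels are hypothetical, and the matched substrings assume the message wording quoted in this section.

```python
def classify_port_response(response_text):
    """Classify captured terminal concentrator output for a port request.

    Returns "in_use", "misconfigured", or "unknown". The substrings matched
    here are assumptions based on the messages quoted in this section.
    """
    text = response_text.lower()
    if "busy" in text and "wish to wait" in text:
        return "in_use"          # another session currently holds the port
    if "permission denied" in text:
        return "misconfigured"   # port is likely not set to slave mode
    return "unknown"


print(classify_port_response("Port(s) busy, do you wish to wait? (y/n) [y]:"))
print(classify_port_response("Port 2\nError: Permission denied"))
```

An "in_use" result points to the cli who check; a "misconfigured" result points to the slave-mode procedure.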
29. 00000 Copyright (c) 1995 Seagate All rights reserved ASA2
Target 3
  Unit 0   Disk   SEAGATE ST32550W SUN2.1G041600000000
                  Copyright (c) 1995 Seagate All rights reserved ASA2

2. At the ok prompt, enter the appropriate command to probe the system for SCSI-2 devices. To probe all SCSI-2 devices installed in the system, type:

ok probe-scsi-all

The preceding command displays a list of drives. The example shown below is for an Ultra Enterprise 2 Cluster.

3. Verify that the drive in question is listed. The Target lines identify the SCSI-2 addresses of installed devices. If the address is listed for the device in question, installation was successful. If the address is absent, run the appropriate diagnostics to identify the problem.

4. Reboot the system using the command:

ok reset

The screen goes blank for several seconds as the system reboots.

5. Have the system administrator return the node to the cluster.

3.5 Network Failures

3.5.1 Private Network Failure

Caution: Problems on the private networks may be due to temporary communication conditions. A fix on the private network must be verified with before-and-after traffic condition measurements to determine that comparable traffic has been supported. Do not consider a problem resolved without running netstat before and after you replace a cable and savi
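The check in step 3 of the probe procedure (confirming that the SCSI-2 target address of the drive in question appears in the probe output) can be sketched in Python when the console output has been captured as text. This is an illustration only: the function names are hypothetical, and the sample text is abbreviated from the excerpt above.

```python
import re

# A "Target N" entry introduces each SCSI-2 address in the probe listing.
TARGET_RE = re.compile(r"\bTarget\s+(\d+)\b")


def listed_targets(probe_output):
    """Return the set of SCSI target numbers found in captured probe output."""
    return {int(number) for number in TARGET_RE.findall(probe_output)}


def drive_is_listed(probe_output, target):
    """True if the given SCSI target address appears in the probe output."""
    return target in listed_targets(probe_output)


# Abbreviated sample based on the listing quoted in this section.
sample = "Target 3\n  Unit 0   Disk   SEAGATE ST32550W\n"
print(drive_is_listed(sample, 3))
print(drive_is_listed(sample, 4))
```

If the target is absent, the manual's instruction still applies: run the appropriate diagnostics to identify the problem.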
30. 530-2151
4   SPARCstorage MultiPack   SPARCstorage MultiPack Service Manual   802-4430
    SCSI-2 cable             530-1804 or 530-1805

Connector Pinouts and Cabling A

A.1 SPARCstorage Array Fiber Optic Cables

Refer to Chapter 6 of the Ultra Enterprise 2 Cluster Hardware Planning and Installation Manual for information on connecting SPARCstorage Arrays to a node using the fiber optic cables.

A.2 Terminal Concentrator Ports

Refer to Chapter 6 of the Ultra Enterprise 2 Cluster Hardware Planning and Installation Manual to connect serial ports on the terminal concentrator to the system console and the serial ports on your system nodes.

A.2.1 RJ-45 Serial Port Connectors

Port 1 of the terminal concentrator is designated as the terminal concentrator console port. Ports 2 and 3 are designated for nodes 0 and 1, respectively. The connector configuration is shown in Figure A-1 and the pin allocations are given in Table A-1.

Figure A-1 Serial Port RJ-45 Receptacle

Table A-1 Serial Port Pinout and Signals

Pin Number   Signals, ports 1-6 (partial modem)   Signals, ports 7-8 (full modem)
1            No connection                        RTS
2            DTR                                  DTR
3            TXD                                  TXD
4            No connection                        CD
5            RXD                                  RXD
6            GND                                  GND
7            No connection                        DSR
8            CTS                                  CTS
32. A Cluster Hardware
Takeover Troubleshooting Flow Diagram
Both Nodes Have Errors on Same SPARCstorage Array
Multiple Disk Errors, One Node Only
LED Displays
Private Net 1 Failure
Private Net 1 Troubleshooting, Part 1
Private Net 1 Troubleshooting, Part 2
Indicator Locations
Troubleshooting Flow Diagram Overview
Branch A: Telnet to Terminal Concentrator Does Not Succeed
Branch A1: Terminal Concentrator Does Not Respond to Ping Command
Branch B: Terminal Concentrator Cannot Connect to a Node
Branch B.1: Single Node Not Responding
Figure 7-1 Server AC Power Switch
Figure 7-2 SPARCstorage Array AC Power Switch and AC Plug
Figure 7-3 LCD Display While Powering On the System
Figure 7-4 Terminal Concentrator Rear View
Figure 7-5 Terminal Concentrator Front View
Figure 9-1 Terminal Concentrator Connector and Power Switch Location
Figure 10-1 Ultra Enterprise 2 Cluster Server Main Components
Figure A-1
Figure A-2
Figure A-3
33. ARCstorage Array Model 100 Series Service Manual, Chapter 5, "Major Subassemblies" (802-2206): Controller, Disk drives, and the Power supply
Disk Drive Installation Manual for the SPARCstorage Array (801-2207)
SPARCstorage MultiPack Service Manual, Chapter 3, "Parts Replacement" (801-4430): SPARCstorage MultiPack disk drives

Hardware Troubleshooting 3

Prior to servicing components within a node that is joined in a cluster, the system administrator must perform certain tasks that are necessary in a high availability system; refer to the Solstice HA 1.3 User's Guide or the Ultra Enterprise Cluster PDB Administration Guide. The procedures in this chapter indicate when the system administrator's assistance is needed.

Before you attempt a reconfiguration reboot after hardware component replacement, read Section 3.1, "Solaris Reconfiguration Reboot."

The following table lists the procedures:

Solaris Reconfiguration Reboot
SPARCstorage Array and Optical Connections Faults
  Multidisk Errors from Both Nodes on the Same SPARCstorage Array
  Multiple
34. Cluster GUIs

Three Graphical User Interfaces (GUIs) enable the system administrator to facilitate troubleshooting: the Cluster Control Panel (ccp), the Cluster Console (cconsole), and the Cluster Monitor (clustmon). See Table 2-4 for a brief description of each GUI; refer to the Ultra Enterprise Cluster PDB Administration Guide for more detailed information.

Table 2-4 Graphical User Interfaces

GUI                     Description
Cluster Control Panel   Enables launching of the Cluster Console (cconsole, ctelnet, or crlogin), the Cluster Monitor (clustmon), and other administrative tools
Cluster Console         Enables execution of commands on multiple nodes simultaneously
Cluster Monitor         Enables monitoring the current status of all nodes in the cluster

2.7 Troubleshooting Flow in a PDB Cluster

The following troubleshooting procedures are based on console access for both nodes. Refer to the Ultra Enterprise 2 Cluster PDB Administration Guide for console access.

The troubleshooting presented in this section of the manual is based on error messages displayed on the system administration console, Cluster Monitor, or other sources. In addition, the Cluster Monitor GUI displays information and graphics that can be used to isolate faults. To maintain the system in high availability mode, troubleshooting should be accomplished in the following order:

1. Checking system Console or Cluster Monitor messages and troubleshooting instructions to determine princ
35. Complete Disk Array Shutdown
7.3.2 Complete Disk Array Startup
7.3.3 Single Drive and Tray Shutdown
7.3.4 Single Drive and Tray Startup
7.4 SPARCstorage MultiPack
7.4.1 Single Drive Shutdown
7.4.2 Complete MultiPack Shutdown
7.4.3 Complete MultiPack Startup
7.5 Terminal Concentrator
8. Internal Access
9. Major Subassemblies
9.1 Ultra Enterprise 2 Server
9.2 SPARCstorage Array
9.2.1 Disk Drives
9.2.2 Major Subassemblies
9.3 SPARCstorage MultiPack
9.3.1 Disk Drives
9.3.2 Power Supply
9.4 Terminal Concentrator
9.5 Cluster Cabling
10. Illustrated Parts Breakdown
A. Connector Pinouts and Cabling
A.1 SPARCstorage Array Fiber Optic Cables
A.2 Term
36. Disk Errors or Disk Access Error For One Node Only
  SPARCstorage Array Fails to Communicate
MultiPack and SCSI Connection Faults
  Multidisk Errors from Both Nodes on the Same SPARCstorage MultiPack
  Multiple Disk Errors or Disk Access Error For One Node Only
Node Failures
  System Board and Boot Disk
Network Failures
  Private Network Failure
  Public Network Failure
Terminal Concentrator and Serial Connection Faults
  System Indicators
  Serial Connections
Additional Troubleshooting Tips

3.1 Solaris Reconfiguration Reboot

Caution: If the controller in the SPARCstorage Array is replaced, the system administrator must reprogram the original World Wide Name (WWN) in the new controller. If this is not done correctly, the DiskSuite software will not recognize the new controller and the system administrator will not be able to return the node to the cluster. For WWN reprogramming procedures, refer to the Solstice HA 1.3 User's Guide or the Ultra Enterprise Cluster PDB Administration Guide, as applicable.

Note: It is not necessary to perform a reconfiguration reboot to add disks to an existing SPARCstorage Array or MultiPack. For this procedure, refer to the Solstice HA 1.3 User's Guide or the Ultra Enterprise Cluster PDB Administration Guide, as appli
37. E 0 too many continuation entries no unsolicited commands to get unknown status unsolicited Illegal state flags invalid fc_ioclass reset with resets disabled

B.5.2 pln Driver

pln_ ddi_dma_sync failed Invalid transport status Unknown state change Grouped disks not supported fr rsp scsi_pktfr ing fr packet
38. Warning: After the system starts, do not move or attempt to move the server while the system power is on. Failure to heed this warning can result in catastrophic disk drive failure. Always power the server off completely before you attempt to move it.

3. Watch the system console for possible error messages from the POST diagnostic program. POST tests subassemblies in the server and some interface paths between subassemblies.

4. If no faults exist at the conclusion of testing, the system boots. Following a successful boot, have the system administrator rejoin the node to the cluster.

If you want to run diagnostics again, or if the system hangs, try aborting the system. If that fails, power cycle the server.

7.2 Component Replacement without a Spare Ultra Enterprise 2 Server

If a spare Ultra Enterprise 2 Server is unavailable for service maintenance, the failed server and associated MultiPack can be shut down as described in this section. The procedures in this section assume that Node 0 is the failed node and MP 0 is the MultiPack attached to Node 0.

Caution: To avoid damaging internal circuits, do not connect or disconnect any cable while power is applied to the system, except the private network cables.

Table 7-1 Shutdown Procedure Summary

Replaceable Unit          Perform Steps
Ultra 2 Processor Board   1 to 9
CPU module or memory power Supply 1 and 2 and c
39. ID SUNWssa is implied and is not shown.

soc.link.6010: soc: port: Fibre Channel is ONLINE

Note that most disk drive and media related errors will result in messages from the ssd drivers. See the man pages for sd(7), pln(7), and soc(7) for information on these messages.

B.2 System Configuration Errors

This class of errors may occur because of insufficient system resources (for example, not enough memory to complete installation of the driver) or because of hardware restrictions of the machine into which the SPARCstorage Array host adapter is installed. This class of errors may also occur when your host system encounters a hardware error on the host system board, such as a failed SIMM.

B.2.1 soc Driver

soc.attach.4004: soc: attach failed: bad soft state
soc.attach.4010: soc: attach failed: unable to map eeprom
soc.attach.4020: soc: attach failed: unable to map XRAM
soc.attach.4030: soc: attach failed: unable to map registers
soc.attach.4040: soc: attach failed: unable to access status register
soc.attach.4050: soc: attach failed: unable to access hostadapter XRAM
soc.attach.4060: soc: attach failed: unable to install interrupt handl
soc.attach.4003: soc: attach failed: alloc soft state
soc.attach.4070: soc: attach failed: offline packet structure allocat

These messages indicate that the initialization of the soc drive
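The message identifiers quoted in this appendix follow a driver, category, number pattern (for example, soc link 6010 and soc attach 4004), with the SUNWssa prefix implied. When sorting console logs by driver or category, a small parser can split these fields. The Python sketch below is an assumption-laden illustration, not part of the driver software: the separator between fields is not preserved in the text above, so the sketch accepts either a period or white space, and the names `MessageId` and `parse_message_id` are hypothetical.

```python
import re
from collections import namedtuple

# Hypothetical container for the three visible fields of a message ID.
MessageId = namedtuple("MessageId", "driver category number")


def parse_message_id(message_id):
    """Split an ID such as 'soc.link.6010' into driver, category, number.

    Accepts '.' or white space between fields (an assumption) and drops an
    optional leading 'SUNWssa', which the manual says is implied.
    """
    parts = [p for p in re.split(r"[.\s]+", message_id.strip()) if p]
    if parts and parts[0] == "SUNWssa":
        parts = parts[1:]
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError("unrecognized message ID: %r" % (message_id,))
    return MessageId(parts[0], parts[1], int(parts[2]))


print(parse_message_id("soc.link.6010"))
print(parse_message_id("SUNWssa soc attach 4004"))
```

The sketch deliberately assigns no meaning to the numeric field, because the excerpts here do not define one.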
40. M 1048576 bytes is installed
PARITY option is not installed
Twisted Pair alternate interface installed
Number of ports 3

3.6.6 Resetting the Terminal Concentrator Configuration Parameters

You may need to reset the terminal concentrator configuration information to a known state. One specific case is if you need to recover from an unknown terminal concentrator administrative password.

You can reset the configuration information using the erase terminal concentrator ROM monitor command. The erase command resets all configuration information to default values; however, these defaults are not what were programmed when you initially received your terminal concentrator.

The following procedure shows how to reset all parameters to their defaults and then set the few parameters necessary for use in the Ultra Enterprise 2 environment. For more information, see the Terminal Concentrator General Reference Guide.

Before starting, you will need the following:

• A terminal, for example a Sun workstation running tip(1), located near the terminal concentrator
• The RJ-45 to DB-25 serial cable for connecting the terminal concentrator to your terminal
• An Ethernet connection to the terminal concentrator
• A system from which you can telnet(1) to the terminal concentrator

1. Connect the terminal concentrator console port to a suitable terminal connect
41. MICRO-XL-UX R7.0.1, 8 ports
admin: port 2
admin: set port mode slave
You may need to reset the appropriate port, Annex subsystem or reboot the Annex for the changes to take effect.
admin: reset 2
admin:

After you reset the port, it should be configured correctly. If not, refer to Section 3.6.6, "Resetting the Terminal Concentrator Configuration Parameters." For additional details on terminal concentrator commands, refer to the Terminal Concentrator General Reference Guide, part number 801-5972.

2.2 Troubleshooting Philosophy

Note: Ultra Enterprise 2 clusters have redundant online components, which can continue system operation even through failure, repair, and relocation of one assembly or device. However, to maintain a high level of availability, failed components should be replaced as soon as possible.

Ultra Enterprise 2 clusters have two identical system nodes joined into a cluster. You must take several service precautions to maintain cluster operation during maintenance procedures. For most hardware repair operations, the node with the faulty part must be removed from the cluster as indicated in Section 2.3, "Maintenance Authorization." Additionally, the system administrator may have to perform related software tasks before and following the removal of a node from the cluster. For example, instances of the database application on a node may have to be halted prior to removing a node from the clust
42. Replace the defective disk drive as described in the SPARCstorage Array Model 100 Series Service Manual.

3. Have the system administrator return the node to the cluster.

4. If disk drive errors still exist after the drive is replaced, proceed to the next section to isolate the problem.

3.2.3 SPARCstorage Array Fails to Communicate

If a SPARCstorage Array is not communicating with a node, do a physical inspection with the following steps:

1. Ensure that the SPARCstorage Array subsystem is connected to a working power outlet.
2. Check the power cord connection of the SPARCstorage Array power supply.
3. Check the power supply AC power switch.
4. Ensure that the fiber optic cable is connected properly at both ends.

If the node and the SPARCstorage Array subsystem are still not communicating, one of the following components is probably faulty:

• Fiber optic cable connecting the node to the SPARCstorage Array
• FC/S card or FC/OM module in the node
• FC/OM module in the SPARCstorage Array
• Array controller in the SPARCstorage Array

To determine if one of the preceding components has failed:

1. Ask the system administrator to prepare the node for troubleshooting, which requires shutting down the SPARCstorage Array.
2. Shut down the SPARCstorage Array as described in Section 7.3.1, "Complete Disk Array Shutdown."
3. Set the DIAG switch on the r
43. Reset Entering Monitor Mode
monitor::

7. Use the erase command to reset the EEPROM memory (configuration information).

Caution: Do not erase the FLASH memory (self-boot image). Doing so will require reloading of the self-boot image from the Sun network terminal server CD-ROM or from another terminal concentrator, which is beyond the scope of this manual. Alternatively, the entire terminal concentrator can be replaced.

monitor:: erase
Erase
  1) EEPROM (i.e. Configuration information)
  2) FLASH (i.e. Self boot image)
Enter 1 or 2 :: 1
Erase all non-volatile EEPROM memory? (y/n) [n]:: y
Erasing 32736 bytes of non-volatile memory. Please wait...
16K-> Data 0xff
16K-> Data 0x0
Initialized checksum record installed
Erasing 32736 bytes of non-volatile memory complete.
monitor::

8. Use the addr command to assign the IP address, subnet mask, and other network parameters to the terminal concentrator.

Some parameters are not critical to the SPARCcluster environment; just accept the defaults and enter the subnet mask appropriate for your network. The broadcast address is the IP address of the terminal concentrator with the host portion set to all ones. For example, for a standard class C IP address of 192.9.200.5, the broadcast address would be 192.9.200.255.

monitor:: addr
Enter Internet
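The broadcast address rule in step 8 (the terminal concentrator IP address with the host portion set to all ones) can be checked before typing values at the monitor prompt. The following Python sketch uses the standard `ipaddress` module; the function name `broadcast_address` is hypothetical, and the 255.255.255.0 mask is the class C default assumed by the manual's example.

```python
import ipaddress


def broadcast_address(ip, netmask):
    """Return the broadcast address: the IP with all host bits set to one."""
    # strict=False lets a host address (not only a network address) be given.
    network = ipaddress.ip_network("%s/%s" % (ip, netmask), strict=False)
    return str(network.broadcast_address)


# The manual's example: class C address 192.9.200.5
print(broadcast_address("192.9.200.5", "255.255.255.0"))  # 192.9.200.255
```

For a site that subnets differently, pass the actual subnet mask; the result then differs from the class C case.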
44. Figure 3-9 Terminal Concentrator Indicator Locations (callouts: system indicators, test indicator, test switch, status indicators)

3.6.1 System Indicators

Figure 3-9 shows the location of terminal concentrator system, test, and status indicators. The system indicators are:

• Power: ON if unit is receiving AC power and the internal DC power supply is working.
• Unit: ON if unit successfully passes its self-test.
• Net: ON when unit successfully transmits test data to and receives test data from the network.
• Attn: ON when unit requires operator attention. Flashing when unit encounters a problem.
• Load: ON when the unit is loading or dumping. Flashing when unit is trying to initiate a load.
• Active: FLASHING when unit successfully transmits data to and receives data from the network; flashing during diagnostics.

The test indicator is located next to the test switch. The indicator lights when the terminal concentrator enters test mode.

The status indicators, numbered 1 to 8, display serial port activity during normal operations. When the terminal concentrator is first configured during the SPARCcluster installation, the indicators should all be OFF. If any status indicator lights, there may be a hardware failure.

3.6.2 Serial Connections

Isolate serial connections between the terminal concentrator and each node by using the troubleshooting flow diagrams in Sect
45. Ultra Enterprise 2 Cluster Hardware Service Manual

Sun Microsystems Computer Company
A Sun Microsystems, Inc. Business
2550 Garcia Avenue, Mountain View, CA 94043 USA
415 960-1300, fax 415 969-9131

Part No.: 802-6316-12
Revision A, April 1997

Copyright 1997 Sun Microsystems, Inc. 2550 Garcia Avenue, Mountain View, California 94043-1100 U.S.A. All rights reserved.

This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.

Portions of this product may be derived from the UNIX system and from the Berkeley 4.3 BSD system, licensed from the University of California. UNIX is a registered trademark in the United States and in other countries and is exclusively licensed by X/Open Company Ltd. Third-party software, including font technology, in this product is protected by copyright and licensed from Sun's suppliers.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 and FAR 52.227-19.

Sun, Sun Microsystems, the Sun logo, Ultra Enterprise, AnswerBook, SunDocs, SunExpress, S
46. able internal disk SD1 tape drive CD Floppy SCI card SunSwift SBus card SCSI cable 1 to 6 SCI cable Can be replaced live

7.2.1 Server Shutdown

Note: If you will not be disconnecting any SCSI connection to the MultiPack, only perform steps 1 through 5.

1. Have the system administrator remove the node from the cluster and halt the failed node (Node 0) using the appropriate HA or PDB procedure. Wait for the system halted message and the boot monitor prompt.

2. Turn off the AC power switch on the back of the failed node (Figure 7-1).

3. Disconnect the private net cables (Figure 7-2).

4. Use the running node (Node 1) to detach one of the MultiPacks using the appropriate procedure:

a. For PDB, use the vxdiskadm command of the CVM or VxVM to detach the MultiPack from the failed node (Figure 7-9), as described in the Ultra Enterprise Cluster PDB Volume Manager Administration Guide.

b. For HA, prepare the MultiPack for service as described in the Solstice HA User's Guide.

Figure 7-9 Private Nets Detached (diagram: powered-down Node 0, running Node 1, private net, MultiPack detached with vxdiskadm)

5. Power off the detached MultiPack (Figure 7-14).

Note: If you are replacing a SunSwift card, install the new card and stop here.

6. Physically disconnect the SCSI cable that goes
47. address [<uninitialized>]:: terminal concentrator IP address
Internet address: terminal concentrator IP address
Enter Subnet mask [255.255.255.0]:: subnet mask
Enter Preferred load host Internet address [<any host>]:: <return>
Enter Broadcast address [0.0.0.0]:: broadcast address
Broadcast address: broadcast address
Enter Preferred dump address [0.0.0.0]:: <return>
Select type of IP packet encapsulation (ieee802/ethernet) [<ethernet>]:: <return>
Type of IP packet encapsulation: <ethernet>
Load Broadcast Y/N [Y]:: n
Load Broadcast: N
monitor::

9. Set the terminal concentrator to boot from itself instead of the network. To do this, use the sequence command at the monitor prompt and press Return after verifying the correct settings, as follows:

monitor:: seq
Enter a list of 1 to 4 interfaces to attempt to use for downloading code or upline dumping. Enter them in the order they should be tried, separated by commas or spaces. Possible interfaces are:
  Ethernet: net
  SELF: self
Enter interface sequence [net]:: self
Interface sequence: self
monitor::

10. Power cycle the terminal concentrator to reboot it. It takes a minute or two to boot and display the annex: prompt.

Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex:

11. Become the terminal concentrator
48. al cable Yes

For internal access procedures, refer to the service manuals that came with your system. Table 8-4 lists the applicable manuals.

Table 8-4 List of Service Manuals

Description | Part Number | Reference | Document Part Number
Ultra 2 Server | | Ultra 2 Series Service Manual | 801-5933
FC/S SBus card | 595-3213 | | 801-6316
FC/OM module | 595-3214 | | 801-6326
SQEC SBus card | 605-1520 | | 801-7123
SunSwift SBus card | 595-2345 | | 802-6021
SPARCstorage Array | | SPARCstorage Array Model 100 Series Service Manual | 802-2206
 | | SPARCstorage Array Model 200 Series Service Manual | 802-2028
Disk Drive | | Disk Drive Installation Manual for the SPARCstorage Array Model 100 Series | 801-2207
SPARCstorage MultiPack and disk drive | | SPARCstorage MultiPack Service Manual | 802-4430
System administration workstation or terminal | | Service manual provided with equipment |
Terminal concentrator | 370-1434 | See Ultra Enterprise Cluster PDB Hardware Planning and Installation Manual, Chapter 5, "Hardware Installation," for cabling details | 802-6313
Fiber optic and SCSI-2 cables | | See Ultra Enterprise Cluster PDB Hardware Planning and Installation Manual, Chapter 5, "Hardware Installation," for cable details | 802-6313
49. Major Subassemblies 9

This chapter supplies the information necessary to remove and reinstall the replaceable parts that are unique to Ultra Enterprise 2 Clusters. For non-unique replaceable parts, you will be referred to the appropriate service manual.

The following table lists the procedures:

Ultra Enterprise 2 Server
SPARCstorage Array
  Disk Drives
  Major Subassemblies
SPARCstorage MultiPack
Terminal Concentrator
Cluster Cabling

9.1 Ultra Enterprise 2 Server

1. Shut the server down as described in Section 7.1, "Ultra Enterprise 2 Server."
2. Once the server has been shut down, remove and replace the system board, any replaceable part on the system board, the boot disk, or the power supply by following the procedures described in the Ultra 2 Series Service Manual.
3. After parts replacement, power on the server as indicated in Section 7.1.3, "Server Startup."

9.2 SPARCstorage Array

9.2.1 Disk Drives

Replace the defective drive as described in the SPARCstorage Array Model 100 Series Service Manual.

9.2.2 Major Subassemblies

1. Shut the disk tray down as described in Section 7.3.1, "Compl
50. ations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
Copyright 1997 Sun Microsystems, Inc., 2550 Garcia Avenue, Mountain View, California 94043-1100, U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses that restrict its use, copying, and decompilation. No part of this product or its associated documentation may be reproduced in any form, by any means, without the prior written authorization of Sun and its licensors, if any.
Portions of this product may be derived from the UNIX system and from the Berkeley 4.3 BSD system, licensed from the University of California. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd. Third-party software, including font technology, is protected by copyright and licensed from Sun's suppliers.
Sun, Sun Microsystems, the Sun logo, Ultra Enterprise, AnswerBook, SunDocs, SunExpress, Solstice PDB, SunFDDI, SunFastEthernet, SunSwift, SunVTS, and Solaris are
51. cable
Avoid performing Solaris reconfiguration reboots when any hardware (especially a SPARCstorage Array, SPARCstorage MultiPack, or other disks) is powered off or otherwise inoperable. A reconfiguration reboot is performed using the OBP boot -r command or by creating the file /reconfigure on the server and then rebooting. The reconfiguration reboot changes the device special files in /devices and the symlinks in /dev/dsk and /dev/rdsk associated with the disk devices. A reconfiguration reboot may not restore the original controller minor unit numbering if the hardware configuration has changed (for example, if an FC/S card has been relocated) or if the WWN of a disk array controller is incorrect, thus causing Solstice DiskSuite to reject the disks. Once the original numbering is restored, Solstice DiskSuite will be able to access the associated metadevices.
3.2 SPARCstorage Array and Optical Connections Faults
System console messages indicate whether a node has a failed disk array controller or cable. First isolate the fault using the procedures in the following sections if the fault matches the section heading. Otherwise, go to Section 3.2.3, SPARCstorage Array Fails to Communicate, and proceed as directed.
3.2.1 Multidisk Errors from Both Nodes on the Same SPARCstorage Array
(Figure: Node 0 and Node 1, each showing errors for the same SPARCstorage Array)
52. cedures
Ordering Sun Documents
SunDocs is a distribution program for Sun Microsystems technical documentation. Easy, convenient ordering and quick delivery are available from SunExpress. You can find a full listing of available documentation on the World Wide Web: http://www.sun.com/sunexpress/
    Country           Telephone          Fax
    Belgium           02 720 09 09       02 725 88 50
    Canada            800 873 7869       800 944 0661
    France            0800 90 61 57      0800 90 61 58
    Germany           01 30 81 61 91     01 30 81 61 92
    Holland           06 022 34 45       06 022 34 46
    Japan             0120 33 9096       0120 33 9097
    Luxembourg        32 2 720 09 09     32 2 725 88 50
    Sweden            020 79 57 26       020 79 57 27
    Switzerland       0800 55 19 26      0800 55 19 27
    United Kingdom    0800 89 88 88      0800 89 88 87
    United States     1800 873 7869      1800 944 0661
Sun Welcomes Your Comments
Please use the Reader Comment Card that accompanies this document. We are interested in improving our documentation and welcome your comments and suggestions. If a card is not available, you can email or fax your comments to us. Please include the part number of your document in the subject line of your email or fax message.
    Email: smcc-docs@sun.com
    Fax: SMCC Document Feedback, 1-415-786-6443
1 Product Description
Ultra Enterprise 2 Clusters are configured to support the
53. connector Fibre Channel Optical Module
Terminal concentrator
    Reference: Section 3.6, Terminal Concentrator and Serial Connection Faults
Ultra Enterprise 2 Server
    Reference: Ultra 2 Series Service Manual, Chapter 2, SunVTS, and Chapter 3, Troubleshooting Procedures
    Part number: 802-2561
SBus Quad Ethernet Controller
    Reference: SBus Quad Ethernet Controller Manual, Appendix C, Running Diagnostics
    Part number: 801-7123
SunSwift SBus Adapter card
    Reference: SunSwift SBus Adapter Installation and User's Guide
    Part number: 802-6021
SCI
    Reference: SCI SBus Adapter User's Guide
    Part number: 802-7103
2.7.3 PDB Error Messages/Symptoms
Refer to the Ultra Enterprise PDB Cluster Error Messages Manual.
2.7.4 PDB Device Replacement Cross-Reference
Table 2-6 references devices to replacement procedures.
Table 2-6 PDB Device Replacement Cross-Reference
Ultra 2 Server
    Reference: Ultra 2 Series Service Manual
    Part number: 802-2561
    Power supply: Chapter 8, Major Subassemblies
    Boot disk: Chapter 9, Storage Devices
    System board, SBus card, DIMM, CPU module: System Board and Component Replacement
SBus Quad Ethernet Controller
    Reference: SBus Quad Ethernet Controller Manual
    Part number: 801-7123
SunSwift SBus Adapter card
    Reference: SunSwift SBus Adapter Installation and User's Guide
    Part number: 802-6021
Optical Module
    Reference: Fibre Channel Optical Module Installation Guide
    Part number: 801-6326
FC/S SBus card
    Reference: Fibre Channel SBus Adapter card Installation Guide
    Part number: 801-6313
SPARCstorage Array
    Reference: SP
54. d spun down all drives in the array trays, turn off the AC power switch on the array (Figure 7-12).
Figure 7-12 SPARCstorage Array AC Power Switch and AC Plug
7.3.2 Complete Disk Array Startup
Warning: Never move the SPARCstorage Array when the power is on. Failure to heed this warning can result in catastrophic disk drive failure. Always power the system off before moving the array.
1. Begin with a safety inspection.
    a. Ensure that the SPARCstorage Array AC power switch is off (Figure 7-12).
    b. Verify that the power cord is connected to the chassis and a wall socket.
2. Turn on the AC power switch on the chassis rear. You should hear the fans begin turning.
3. Watch the front panel LCD display. When powering on, the LCD displays the icons shown in Figure 7-13.
It may take some time for a SPARCstorage Array to boot, depending on the total number of disk drives. For example, a SPARCstorage Array with 18 disk drives may take several minutes to boot, while a SPARCstorage Array with 30 disk drives may take much longer.
55. d should remain plugged in to ensure proper grounding.
Warning: This equipment contains lethal voltages. Accidental contact can result in serious injury or death.
Caution: Improper handling by unqualified personnel can cause serious damage to this equipment. Unqualified personnel who tamper with this equipment may be held liable for any resulting damage to the equipment.
Persons who remove any of the outer panels to access this equipment must observe all safety precautions and ensure compliance with skill level requirements, certification, and all applicable local and national laws. All procedures contained in this document must be performed by qualified, service-trained maintenance providers.
Caution: Before you begin, carefully read each of the procedures in this manual. If you have not performed similar operations on comparable equipment, do not attempt to perform these procedures.
6.4 Tools Required
The following list represents the minimum tools and test equipment required to service the server:
    Screwdriver, Phillips #1
    Screwdriver, Phillips #2
    Screwdriver, slotted, 3/16 inch
    Sun ESD mat
    Grounding wrist strap
    Needlenose pliers
    Digital multimeter (DMM)
    SPARCstorage Array loopback connector, part number 130-2837-01
Shutdown and Restart Procedures
Perform
56. disk drive
    • Two Fibre Channel SBus (FC/S) cards, each equipped with one Fibre Channel Optical Module (FC/OM)
    • One SBus Quad Ethernet Controller (SQEC) card for HA, or two SunSwift cards for PDB
Two Sun Private Network cables
Two SPARCstorage Arrays (SSAs) with six disk drives in each array
    • Four fiber-optic cables
Terminal concentrator (supports up to three two-node clusters)
    • Three serial cables
    • Administration workstation
Ethernet cables
1.1.2 Ultra Enterprise 2 Cluster Optional Devices
SunFastEthernet (SFE) SBus card for the public network (HA only)
SunFDDI 5.0 SAS/DAS SBus card for the public network (HA only)
    • CD-ROM drive
    • Additional disk drives (second boot drive and disk drives in SPARCstorage Arrays)
Tape drive
SCI SBus Adapter card for the private net (PDB only)
1.2 Ultra Enterprise 2 Cluster Using SPARCstorage MultiPacks
The Ultra Enterprise 2 Cluster can be implemented on the Ultra Enterprise 2 Server platform using two to four six-drive or twelve-drive SPARCstorage MultiPacks. Figure 1-3 is a functional block diagram of the Ultra Enterprise 2 Cluster using two SPARCstorage MultiPacks and SunSwift (hme) connections for the Private Nets. This configuration supports both the HA 1.3 and PDB 1.2 software.
57. ear of the SPARCstorage Array to DIAG EXT. Setting the DIAG switch to DIAG EXT provides more thorough testing, but it also causes the array to take longer to boot.
Press the Reset switch to reset the SPARCstorage Array.
Check the front panel LCD display and see if there is a specific POST code for the SPARCstorage Array displayed in the alphanumeric portion of the LCD display. Figure 3-3 shows the location of the alphanumeric portion of the LCD, and Table 3-1 lists the SPARCstorage Array POST codes.
Figure 3-3 LCD Display on SPARCstorage Array (callout: alphanumeric display)
Table 3-1 POST Codes
    POST Code           Meaning                  Action
    01                  LCD failure              Replace fan tray
    08                  Fan failure              Replace fan tray
    09                  Power supply failure     Replace power supply
    30                  Battery failure          Replace battery module
    Any other number    Controller failure       Replace controller
    • If you do not see a SPARCstorage Array POST code displayed, set the DIAG switch back to DIAG, then go to step 6.
    • If you see a SPARCstorage Array POST code displayed, set the DIAG switch back to DIAG, then replace the indicated component as described in Chapter 5, Major Subassemblies, in the SPARCstorage Array Model 100 Series Service Manual. Notify the system administrator that the node is ready to be returned to the cluster following component replacement.
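The POST code table above is a simple lookup with a catch-all row, so it can be sketched in a few lines of Python. This is only an illustration: the function name post_action and the normalization of a one-digit entry such as "8" to "08" are assumptions made for the example, not part of the manual. The meanings and actions are transcribed from Table 3-1.

```python
# POST codes and actions transcribed from Table 3-1 of this manual.
POST_ACTIONS = {
    "01": ("LCD failure", "Replace fan tray"),
    "08": ("Fan failure", "Replace fan tray"),
    "09": ("Power supply failure", "Replace power supply"),
    "30": ("Battery failure", "Replace battery module"),
}

# Table 3-1 maps "any other number" to a controller failure.
DEFAULT_ACTION = ("Controller failure", "Replace controller")


def post_action(code):
    """Return (meaning, action) for a numeric POST code read from the LCD.

    Hypothetical helper: a one-digit entry such as "8" is padded to "08"
    before the lookup, which is an assumption of this sketch.
    """
    key = code.strip()
    if not key.isdigit():
        raise ValueError("POST code must be numeric: %r" % code)
    return POST_ACTIONS.get(key.zfill(2), DEFAULT_ACTION)


if __name__ == "__main__":
    for shown in ("01", "8", "30", "42"):
        meaning, action = post_action(shown)
        print(shown, "->", meaning, "/", action)
```

The default branch mirrors the last row of the table, so any numeric code that is not listed is treated as a controller failure.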
58. er to prevent the cluster operation from terminating. Or pertinent software tasks may have to be performed after replacing a disk drive or a controller, and prior to or after rejoining a node to the cluster. For these and other software-specific tasks, refer to the Solstice HA 1.3 User's Guide or the Ultra Enterprise 2 Cluster PDB System Administration Guide.
2.3 Maintenance Authorization
The site system administrator must be contacted to remove a node from the cluster and, after maintenance, to return the node to cluster membership. Additionally, the system administrator performs all necessary related software tasks. The procedures in this manual identify points where the system administrator must be contacted.
Note: The equipment owner's administrative requirements supersede the procedures contained in this document.
2.4 HA/PDB Differences
The disk access model depends on the type of cluster: disk access is shared in a PDB cluster and non-shared in an HA cluster. Additionally, the PDB cluster supports a Cluster Monitor GUI, whereas the HA cluster does not. Refer to Section 2.5, Troubleshooting Flow in an HA Cluster, or Section 2.7, Troubleshooting Flow in a PDB Cluster, depending upon the type of cluster you are troubleshooting.
2.5 Troubleshooting Flow in an HA Cluster
2.5.1 HA Node Takeover
The Solstice HA sof
59. erminal concentrator end. Telnet to the node that was alive. Is there a response? Depending on the answer, replace the serial cable or replace the terminal concentrator, then verify normal operation.
Figure 3-14 Branch B.1: Single Node Not Responding
3.6.5 Additional Troubleshooting Tips
3.6.5.1 Terminal Concentrator Indicators
After POST has passed, the eight status indicators on the terminal concentrator (Figure 3-9) indicate activity on the serial ports. Messages from the node should cause the appropriate port LEDs (2 and 3) to blink. Text entered into the administration workstation should also cause the LEDs to blink. This can be useful when trying to determine whether the terminal concentrator, node, or cable is bad.
3.6.5.2 Terminal Concentrator System Information
The ROM monitor command config enables you to verify the hardware and software revisions of the terminal concentrator.
1. Press the reset button and, after 5 seconds, press the test button.
The config command must be issued from a terminal connected to port 1 of the terminal concentrator.
2. When the monitor prompt appears, type:
    monitor:: config
    REVISION/CONFIGURATION INFORMATION
    Amount of memory 2 Meg
    Board ID 52 Serial Number 172743
    REV ROM Maj Rev 40 Min Rev 0
    ROM Software Rev 0601
    LB Type 8s V24 FMC 1
    EXPANSION Type None 15
    EEPROM size 32768 bytes
    FLASH PRO
60. A.4 SPARCstorage MultiPack SCSI-2 Cables
Refer to Appendix B of the SunSwift SBus Adapter Installation and User's Guide for information on the SCSI-2 connector signals, connector pinouts, and cabling.
B Firmware and Device Driver Error Messages
B.1 Message Formats
Error indications from the SPARCstorage Array drivers (pln and soc) are always sent to syslog (/var/adm/messages). Additionally, depending on the type of event that generated the message, it may be sent to the console. Console messages are limited to significant events such as cable disconnections. Messages sent to the console are in the form:
    WARNING instance <message>
The syslog messages may contain additional text. The message ID identifies the message, its producer, and its severity:
    ID SUNWssa soc messageid instance <message>
Some examples:
    soc3 Transport error Fibre Channel Online Timeout
    ID SUNWssa soc link 6010 soc1 port 0 Fibre Channel is ONLINE
In the Ultra Enterprise 2 Cluster PDB Error Messages Manual, messages are presented with the message ID and the message text, even though the message ID is not displayed on the console. The character implies a numeric quantity and implies a string of characters or numbers. The prefix
61. es
    • SCSI-2 cables and SunSwift SBus Adapters
Ultra Enterprise 2 Server faults
    • Boot disk(s)
    • System board
    • UltraSPARC processor module(s)
    • DIMMs
    • Power supply
    • Fibre Channel Optical Modules (FC/OM)
    • Fibre Channel SBus cards (FC/S)
    • SBus Quad Ethernet Controller card/interface (SQEC)
    • Public Net SBus card
Cluster faults
    • Private net cables and interfaces
    • Terminal concentrator and serial connections
    • Public network connections
Software faults
    • Application program crash
    • System crash (panic)
    • System hang (lock up)
    • Cluster-wide failures
All troubleshooting begins at the system console. The console should be checked regularly, as should any other source of operator information. For example, the output of hastat should be checked regularly. For more information on the hastat command, refer to the Solstice HA 1.3 User's Guide.
2.5.5 HA Device Troubleshooting Cross-Reference
Table 2-1 lists the system devices and corresponding troubleshooting manuals.
Table 2-1 HA Device to Troubleshooting Cross-Reference
Device column:
    SPARCstorage MultiPack
    SPARCstorage Array
    Controller
    Fiber-optic connector
    Fibre Channel Optical Module
    Ultra Enterprise 2 Server
    Terminal concentrator
    SBus Quad Ethernet Controller
    SunSwift SBus Adapter card
Reference column:
    SPARCstorage MultiPack Service Man
62. essages may also be caused by a failed host adapter, Fibre Channel Optical Module, fiber-optic cable, or array controller.
    soc link 4080 soc Connections via Fibre Channel Fabric are unsupported
The current SPARCstorage Array software does not support Fibre Channel fabric (switch) operation. This message indicates that the software has detected the presence of a fabric.
    soc login 5010 soc Fibre Channel login failed
    soc login 5020 soc fabric login failed
    soc login 5030 soc N PORT login not successful
    soc login 5040 soc N PORT login failure
These messages may occur if part of the Fibre Channel link initialization or login procedure fails. The login procedure is retried.
    soc login 6010 soc Fibre Channel login succeeded
The soc driver displays this message following a successful Fibre Channel login procedure (part of link initialization) if the link had previously gone from an operable to an inoperable state. The login succeeded message indicates that the link has again become fully functional.
    soc login 4020 soc login retry count exceeded for port
    soc login 4040 soc login retry count exceeded
These errors indicate that the login retry procedure is not working and the port or card associated with the message is terminating the login attempt. The associated SPARCstorage Array will be i
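As a rough aid for reading /var/adm/messages, the soc message IDs described above can be collected into a lookup. The following Python sketch is illustrative only: the function name describe_soc_message is hypothetical, and because the separator characters of the message ID did not survive in this copy of the manual, the parser simply accepts any non-alphanumeric separator (for example "soc login 5010" or "soc.login.5010"). Only IDs quoted in this section are listed.

```python
import re

# Summaries paraphrased from the descriptions in this appendix.
# Keys are (category, number) pairs taken from the quoted message IDs.
SOC_MESSAGES = {
    ("link", 4080): "Fabric detected; Fibre Channel fabric operation is unsupported",
    ("login", 5010): "Login failed during link initialization; login is retried",
    ("login", 5020): "Login failed during link initialization; login is retried",
    ("login", 5030): "Login failed during link initialization; login is retried",
    ("login", 5040): "Login failed during link initialization; login is retried",
    ("login", 6010): "Login succeeded; the link is fully functional again",
    ("login", 4020): "Login retry count exceeded; login attempt terminated",
    ("login", 4040): "Login retry count exceeded; login attempt terminated",
}


def describe_soc_message(message_id):
    """Map a soc message ID such as 'soc login 5010' to a short summary.

    Hypothetical helper. Any run of non-alphanumeric characters is
    treated as a separator, since the original punctuation is unknown.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", message_id)
    if "soc" not in tokens:
        raise ValueError("not a soc message ID: %r" % message_id)
    start = tokens.index("soc")
    if len(tokens) < start + 3 or not tokens[start + 2].isdigit():
        raise ValueError("incomplete soc message ID: %r" % message_id)
    key = (tokens[start + 1], int(tokens[start + 2]))
    return SOC_MESSAGES.get(key, "not described in this section")


if __name__ == "__main__":
    print(describe_soc_message("soc login 6010"))
    print(describe_soc_message("SUNWssa.soc.link.4080"))
```

Unknown IDs return a neutral string rather than a guess, since the rest of the appendix is not reproduced here.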
63. ete Disk Array Shutdown, and Section 7.3.3, Single Drive and Tray Shutdown.
2. Replace the defective subassembly as described in the SPARCstorage Array Model 100 Series Service Manual.
3. Bring up the disk tray as described in Section 7.3.2, Complete Disk Array Startup, and Section 7.3.4, Single Drive and Tray Startup.
9.3 SPARCstorage MultiPack
9.3.1 Disk Drives
Replace the defective drive as described in the SPARCstorage MultiPack User's Guide.
9.3.2 Power Supply
1. Shut down the MultiPack as described in Section 7.4.2, Complete MultiPack Shutdown.
2. Replace the defective subassembly as described in the SPARCstorage MultiPack User's Guide.
3. Bring up the MultiPack as described in Section 7.4.3, Complete MultiPack Startup.
9.4 Terminal Concentrator
1. Power off the terminal concentrator by using the AC power switch located on the back panel (Figure 9-1).
2. Remove the power, network, and serial cables from the terminal concentrator.
Figure 9-1 Terminal Concentrator Connector and Power Switch Location (callouts: serial connectors, network connectors, power switch)
3. Remove the defective terminal concentrator.
4. Install the new terminal concen
64. from the detached and powered-down MultiPack to the powered-down node (Figure 7-10).
Figure 7-10 First SCSI Cable Detached
7. If the remaining cable between the working Node 1 and the powered-down MultiPack is connected to the SCSI Out port, reconnect it to the SCSI In port (Figure 7-11). This allows the MultiPack to automatically terminate the SCSI bus in this single-host configuration.
Figure 7-11 SCSI Cable Moved from Out Port to In Port
8. Power up the detached MultiPack (MP0) and wait for all disks in the MultiPack to become ready.
9. Reattach the detached MultiPack (MP0) to the running node.
    a. For PDB, use the vxdiskadm command of the CVM or VxVM to attach the MultiPack to the running node, as described in the Ultra Enterprise Cluster PDB Volume Manager Administration Guide.
    b. For HA, prepare the MultiPack for cluster operation as described in the Solstice HA User's Guide.
10. Repeat steps 4 throug
65. ge MultiPack
3.3.2 Multiple Disk Errors or Disk Access Error For One Node Only
3.4 Node Failures
3.4.1 System Board and Boot Disk
3.4.2 Using the probe-scsi Command
3.5 Network Failures
3.5.1 Private Network Failure
3.5.2 Public Network Failure
3.6 Terminal Concentrator and Serial Connection Faults
3.6.1 System Indicators
3.6.2 Serial Connections
3.6.3 Intermittent Router Problems
3.6.4 Terminal Concentrator Flow Diagrams
3.6.5 Additional Troubleshooting Tips
3.6.6 Resetting the Terminal Concentrator Configuration Parameters
4 Software Troubleshooting
4.1 Troubleshooting Solstice HA 1.3 Software
4.2 Troubleshooting PDB Software
4.3 Software Faults
4.3.1 Operating System Failures
4.3.2 Solstice HA
4.3.3 PDB
66. h 9 for all MultiPacks attached to the system.
11. Repair the node.
7.2.2 Server Startup
After the failed node is repaired, reconnect it to the cluster as follows:
1. Perform steps 4 through 6 in reverse order to make sure the cables are connected to the correct In and Out ports on the MultiPack.
2. Reconnect the Private Net cables.
3. Power on and boot up the repaired node (node 0).
4. Have the system administrator rejoin the node to the cluster.
7.3 SPARCstorage Array
A SPARCstorage Array Model 100 contains three drive trays and a SPARCstorage Array Model 200 contains six drive trays; each tray contains up to 10 drives. To replace a single drive or tray in a SPARCstorage Array, you do not have to power down the array. Instead, you can spin down only the drives in the tray containing the drive to be replaced. See Section 7.3.3, Single Drive and Tray Shutdown.
7.3.1 Complete Disk Array Shutdown
Caution: Do not disconnect the power cord from the utility outlet when you work on the SPARCstorage Array. This connection provides a ground path that prevents damage from uncontrolled electrostatic discharge.
Prior to powering down a complete SPARCstorage Array, you must have the system administrator prepare the array for servicing, indicate which component is going to be replaced, and then spin down all drives in the array trays.
After the system administrator has prepared the array for servicing an
67. h nodes up
Figure 2-1 HA Node Takeover Troubleshooting Flow Diagram
2.5.2 HA Node Switchover
System administrators can manually direct one system to take over the data services for the other node. This is referred to as a switchover (refer to the Solstice HA 1.3 User's Guide).
2.5.3 HA Failures Without Takeover
For noncritical failures, no software takeover occurs. However, to continue providing HA data services, you should troubleshoot in the following order:
1. You will be contacted by the system administrator to replace a defective part or to further isolate a system problem to a failed part.
2. Have the system administrator prepare the applicable assembly containing the failed part for service.
3. Isolate the fault to the smallest replaceable part.
4. Shut down the assembly containing the defective part.
5. Replace the failed part.
6. Have the system administrator return the repaired assembly to the cluster.
2.5.4 HA Fault Classes and Principal Assemblies
Ultra 2 Cluster HA Server troubleshooting depends on several different principal assemblies and classes of faults. The fault classes and their associated assemblies are:
SPARCstorage Array faults
    • Data disks
    • Array controller
    • Fibre Channel Optical Modules (FC/OM)
    • Fibre Channel SBus cards (FC/S)
    • Fiber-optic cables and interfaces
    • Power supply
SPARCstorage MultiPack faults
    • Data disk driv
68. ift card
    Error message (/var/adm/messages): 12:04:52 ha jan unix hme1 Link Down cable problem
    Cluster service reference: for cabling details, see the Ultra 2 Server Hardware Planning and Installation Manual, Chapter 5, Hardware Installation.
Public Network
    Error message (/var/adm/messages): Apr 23 12:04:52 ha jan unix qe1 No carrier twisted pair cable problem or disabled hub link test
    Probable cause: public net SQEC or cable
    Cluster service reference: Section 3.5.2, Public Network Failure
    Error message (/var/adm/messages): Apr 23 12:04:52 ha jan unix hme0 Link Down cable problem
    Probable cause: onboard TPE interface, cable, or public network
    Cluster service reference: Section 3.4, Node Failures. For cabling details, see the Ultra 2 Server Hardware Planning and Installation Manual, Chapter 5, Hardware Installation.
Troubleshooting references listed for these entries: SBus Quad Ethernet Controller Manual; SunSwift SBus Adapter User's Guide; your public network documentation; Sun Ultra 2 Series Service Manual.
Table 2-2 HA Error Messages and Symptoms (Continued)
Columns: Error Message/Symptom; Probable Cause; Cluster Service Reference; Troubleshooting Reference
    Error message: soc link 5010 Fiber Channel is OFFLINE; c2t4d8a2 failed
    Probable cause: disk array cable
    Cluster service reference: Section 3.2, SPARCstorage
    Troubleshooting reference: SPARCstorage Array. See the PDB Error Messages Manual and SPARCstorage Array Messages for additional messages.
    Symptom: no messages from one of the nodes on the system console
    Symptom: no messages from either node on the system console
69. inal Concentrator Ports
A.2.1 RJ-45 Serial Port Connectors
A.2.2 Public Network Connector
A.3 Private Network Cables
A.4 SPARCstorage MultiPack SCSI-2 Cables
B Firmware and Device Driver Error Messages
B.1 Message Formats
B.2 System Configuration Errors
B.2.1 soc Driver
B.2.2 pln Driver
B.3 Hardware Errors
B.3.1 soc Driver
B.3.2 pln Driver
B.4 Informational Messages
B.4.1 soc Driver
B.4.2 pln Driver
B.5 Internal Software Errors
B.5.1 soc Driver
B.5.2 pln Driver
Figures
Figure numbers listed: 1-1, 1-2, 2-1, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12
Figure 1-1 Ultra Enterprise 2 Cluster HA Server Functional Block Diagram
Figure 1-2 Ultra 2 Server H
70. ing shutdown and startup tasks are necessary for subassembly removal and replacement procedures. These procedures are specifically structured for a high availability or parallel database system. At appropriate points, references indicate that the system administrator must be contacted, for example, to remove a node from a cluster in preparation for service, to rejoin a node to the cluster after servicing, or to perform necessary software tasks prior to maintenance of various system components. In this way, the database services are maintained.
The following table lists the locations of the procedures. If you are viewing this using the AnswerBook online documentation viewing system, place your cursor on the desired procedure or location and double-click the SELECT button on your mouse to go directly to the task.
    Ultra Enterprise 2 Server                                                            page 7-2
    Server Shutdown with SPARCstorage Arrays                                             page 7-2
    Server Shutdown with SPARCstorage MultiPacks and a Spare Ultra Enterprise 2 Server   page 7-3
    Server Startup                                                                       page 7-6
    Component Replacement without a Spare Ultra Enterprise 2 Server                      page 7-7
    To avoid damaging internal circuits, do not connect or disconnect any cable while power is applied to the system, except the private network cables   page 7-7
    Server Startup                                                                       page 7-10
    SPARCstorage Array                                                                   page 7-10
    Complete Disk Array Shutdown                                                         page 7-10
    Complete Disk Array Startup                                                          page 7-11
    Single Dri
71. install software and reconfigure the net addresses. Use the CLI version of the terminal concentrator command stats. Refer to the Terminal Concentrator Installation Notes and General Reference Guide.
Figure 3-11 Branch A: Telnet to Terminal Concentrator Does Not Succeed
The terminal concentrator loads software but does not respond to the ping command.
Verify that the Ethernet interface cable on the terminal concentrator is seated in its connector. If it is seated, verify that the software is loaded. Connect a serial cable between the administration workstation serial port A and port 1 of the terminal concentrator. Type tip a in a shell tool window. The terminal concentrator prompt (monitor) should be displayed.
Prompt displayed?
    Yes: Use the CLI command stats to verify the correct IP address. If it is correct and the terminal concentrator is still not responding, replace the terminal concentrator.
If the address is correct but the terminal concentrator still does not answer when pinged, replace the terminal concentrator and follow installation procedures. Use the CLI version of the terminal concentrator command stats. Refer to the Terminal Concentrator Installation Notes and General Reference Guide.
If ping doesn't work after the terminal concentrator has been replaced, troubleshoot the external network.
Figure 3-12 Branch A1: Terminal Concentrator Doe
72. ion 2.5, Troubleshooting Flow in an HA Cluster.
3.6.3 Intermittent Router Problems
If you experience either of the following conditions:
    • Terminal concentrator connections made via routers exhibit intermittent problems, while connections from hosts on the same network as the terminal concentrator continue to work normally.
    • The terminal concentrator shows no signs of rebooting.
establish a default route within the terminal concentrator and disable the routed feature. You must disable the routed feature to prevent the default route from being lost. To disable the routed feature:
1. Telnet to the terminal concentrator and log on as superuser:
    telnet ss-tc
    Trying terminal concentrator...
    Connected to ss-tc.
    Escape character is '^]'.
    Rotaries Defined:
        cli
    Enter Annex port name or number: cli
    Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
    annex: su
    Password:
    annex#
2. At the terminal concentrator prompt, enter:
    annex# edit config.annex
You should see the following as the first line of help text on a screen editor:
    Ctrl-W: save and exit   Ctrl-X: exit   Ctrl-F: page down   Ctrl-B: page up
    a. To establish a default route within the terminal concentrator, enter the following, where default_router is the IP address for your router:
        %gateway
        net default gateway default_router metric 1 hardwire
    b. Fo
73. ion in order to perform the following steps. If your terminal connection is a Sun workstation, use the Sun cable and connect the RJ-45 connector to the terminal concentrator console port (port 1) and the DB-25 connector to serial port A on the workstation.
2. If you are using a workstation and this step was not previously done, edit the /etc/remote file to add the following line:
    a:dv=/dev/term/a:br#9600:
This allows tip(1) to connect to serial port A at 9600 baud.
3. From the workstation, type the following command to connect the workstation's serial port A to terminal concentrator port 1:
    tip a
    connected
Note: Your administration workstation may have a combined serial port labeled SERIAL A/B. In this case, you cannot use the TTY B port without the appropriate splitter cable. See the documentation supplied with your workstation for more information.
4. Verify that the terminal concentrator power is on.
5. Reset the terminal concentrator. Depress the Test button (Figure 6-1) for three or more seconds until the Power LED blinks rapidly. Release the button.
6. Wait for the Test LED to turn off and, within 30 seconds, press the Test button again. Verify that the orange Test LED lights, indicating the unit is in test mode. The terminal concentrator performs a self-test that lasts about 30 seconds. Wait for the monitor prompt to appear.
System
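To make the /etc/remote line in step 2 easier to read, the short Python sketch below splits such an entry into its fields. It assumes the entry has the usual colon-separated form a:dv=/dev/term/a:br#9600: (the punctuation is not legible in every copy of this manual), where dv names the serial device and br gives the baud rate. The function name parse_remote_entry is hypothetical and the sketch handles only a single-line entry.

```python
def parse_remote_entry(entry):
    """Split a one-line /etc/remote entry into (system name, capabilities).

    Hypothetical helper for illustration. Assumes colon-separated fields:
    'key=value' for strings, 'key#number' for numbers, and a bare key
    for a boolean flag.
    """
    fields = [field for field in entry.strip().split(":") if field]
    if not fields:
        raise ValueError("empty remote entry")
    name, capabilities = fields[0], {}
    for field in fields[1:]:
        if "=" in field:
            key, value = field.split("=", 1)
            capabilities[key] = value          # string capability, e.g. dv
        elif "#" in field:
            key, value = field.split("#", 1)
            capabilities[key] = int(value)     # numeric capability, e.g. br
        else:
            capabilities[field] = True         # boolean capability
    return name, capabilities


if __name__ == "__main__":
    name, caps = parse_remote_entry("a:dv=/dev/term/a:br#9600:")
    print("system name:", name)
    print("device:", caps["dv"])
    print("baud rate:", caps["br"])
```

Read this way, the entry defines a system named a, which is the name given to tip in step 3, bound to serial port A at 9600 baud.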
74. iple assembly at fault Contacting system administrator to remove a node from the cluster Isolating fault to smallest replaceable component Shutting down specific disk tray system node or terminal concentrator Si ee ON Replacing defective component Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 No lll 6 Contacting system administrator to return node to cluster This troubleshooting flow is shown in Figure 2 2 If a system appears to be Refer to the Ultra Enterprise Cluster PDB Administration malfunctioning but the problem Guide and bring up the Cluster Monitor Front Panel is unknown proceed as follows Figure 2 4 The Cluster Monitor Front Panel displays the clus ter configuration highlighting in red components requiring at l tention as well as indicating the status of the PDB software You can then use the Follow Mouse Pointer facility to select Are error messages diss components of the system refer to the Ultra Enterprise Cluster played on the system PDB Administration Guide for this procedure which results in administrator s work the display of additional status information in the Item Properties station or other source window Figure 2 5 If the GUI display indicates a faulty com ponent see Chapter 3 for hardware troubleshooting of the com ponent or Chapter 4 for additional software troubleshooting Refer to the Ultra Enterprise Cluster PDB Administrati
75. istrator return the node to the cluster. 32. If the node still does not communicate with the SPARCstorage Array, have the system administrator prepare the node for replacement of a controller in a SPARCstorage Array. 33. Take down the SPARCstorage Array. See Section 7.3.1, Complete Disk Array Shutdown. Caution: If you replace the array controller, the system administrator must reprogram the new controller with the original World Wide Name (WWN). If this number is incorrect, the Solstice DiskSuite software will not recognize the new controller and the disk array cannot be rejoined to the cluster. For WWN reprogramming procedures, refer to the Solstice HA 1.3 User's Guide or the PDB Cluster Administration Guide, as applicable. 34. Replace the array controller. 35. Bring up the applicable disk array. See Section 7.3.2, Complete Disk Array Startup. 36. Have the system administrator return the node to the cluster. 3.3 MultiPack and SCSI Connection Faults. The Cluster Monitor messages indicate when a node has a failed MultiPack. Isolate the fault using the procedures in the following sections. In addition, refer to the SPARCstorage MultiPack Service Manual and the Solstice HA 1.3 User's Guide or Ultra Enterprise Cluster PDB Administration Guide. Note: Do not use the probe-scsi command, as this can cause the system to hang at the boot
76. A.2.2 Public Network Connector

The primary public Ethernet network connects to the AUI Ethernet transceiver port on the terminal concentrator. The port receptacle is shown in Figure A-2; pin allocations are given in Table A-2.

Figure A-2 15-pin Ethernet Receptacle

Table A-2 Ethernet Port Pinout and Signals

Pin Number   Signal
1            Chassis ground
2            Collision +
3            Transmit +
4            No connection
5            Receive +
6            Ground (for transceiver power)
7-8          No connection
9            Collision -
10           Transmit -
11           No connection
12           Receive -
13           +12 volts (for transceiver power)
14-15        No connection

A.3 Private Network Cables

The nodes in an HA configuration are connected via two private nets using two special Ethernet cables. The cables are twisted-pair, Category (Type) 5. For private net cabling information, refer to the Ultra Enterprise 2 Cluster Hardware Planning and Installation Guide. The pinout for these cables is shown in Figure A-3 and listed in Table A-3.

Figure A-3 Twisted-Pair Ethernet RJ-45 Receptacle

Table A-3 Private Ethernet Port Pinout and Signals

Pin number   Signal          Connects to pin number   Signal
1            Tx+             3                        Rx+
2            Tx-             6                        Rx-
3            Rx+             1                        Tx+
4            No connection
5            No connection
6            Rx-             2                        Tx-
7            No connection
8            No connection
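The pin mapping in Table A-3 describes a crossover cable: each transmit pin on one end lands on a receive pin on the other. The following is a minimal sketch of that mapping in Python, useful as a checklist when verifying a cable with a continuity tester. The names CROSSOVER, SIGNALS, and check_crossover are hypothetical and do not come from the manual.

```python
# Pin-to-pin wiring of the private net cable as listed in Table A-3.
# Pins 4, 5, 7, and 8 are not connected, so they are left out.
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2}

# Signal carried on each connected pin (polarity omitted).
SIGNALS = {1: "Tx", 2: "Tx", 3: "Rx", 6: "Rx"}


def check_crossover(wiring, signals):
    """Return a list of problems; an empty list means the wiring is consistent."""
    problems = []
    for near, far in wiring.items():
        # The mapping must read the same from either end of the cable.
        if wiring.get(far) != near:
            problems.append(f"pin {near} -> {far} is not mirrored")
        # A transmit pin must meet a receive pin, and vice versa.
        if signals[near] == signals[far]:
            problems.append(f"pin {near} meets the same signal on pin {far}")
    return problems


print(check_crossover(CROSSOVER, SIGNALS))
```

A straight-through mapping such as {1: 1, 2: 2, 3: 3, 6: 6} fails the second check on every pin, which is why an ordinary patch cable cannot stand in for a private net cable between two nodes.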
77. llow this with a carriage return and then press Control-W to save and exit. 3. Disable the routed feature using the set command: annex admin set annex routed n 4. Boot the terminal concentrator: annex boot 3.6.4 Terminal Concentrator Flow Diagrams. Telnet to the terminal concentrator does not succeed: this branch focuses on the ability of the terminal concentrator to talk on the net successfully. Telnet for one node only does not respond: this branch focuses on the failure of a terminal concentrator serial port. Figure 3-10 Terminal Concentrator Troubleshooting Flow Diagram Overview. Telnet to terminal concentrator does not succeed: Disconnect all serial cables from the rear of the terminal concentrator. Power cycle the terminal concentrator (TC). Watch the LEDs on the front panel during normal boot to see whether the operating system software loads successfully. All indicators should light briefly. If software is loaded, the Load light turns off and the Active light blinks once and then goes out. Does TC respond to ping? Does software load? Check power connection to terminal concentrator. Re-install serial cables. If software still cannot load, replace the terminal concentrator. Re
78. message Level 15 Interrupt or Data Access Exception then you must repeat the command given in Step 12 to reselect the FC S card 13 Enter the following at the ok prompt ok soc post e If you see the message passed go to Step 14 For example ok soc post SOC POST Test Passed e If you see the message failed replace the FC S card in that SBus slot according to the instructions in the processor service manual that came with your system Following replacement of the FC S card have the system administrator return the node to the cluster 14 Disconnect the fiber optic cable from the FC OM on the node 15 Install the loopback connector part number 130 2837 01 from the ship kit in the FC OM on the node Caution Do not run the loopback tests on a FC OM that is not looped back This action may cause disk errors or unpredictable results 16 Enter the following at the ok prompt ok 40 is frame dsize ok 1 is frame num ok 1 is sb burst size 17 Locate the FC OM s in the FC S card and determine whether the FC OM s are in slot A or B in the FC S card You should be able to see the letters A and B silk screened on the outside of the FC S card Do only steps18a and 18b in loopback mode Hardware Troubleshooting 3 9 lll Qo Note Due to a silk screening error the A and B on the outside of the FC S card are reversed so the command to test slo
79. naccessible by the system. Note that the fibre channel specification requires each device to attempt a login to a fibre channel fabric, even though one may not be present. A failure of the fabric login procedure due to link errors, even in a point-to-point topology, may result in the printing of fabric login failure messages even with no fabric present. Link errors detected: A number of retryable errors may have occurred on the fibre channel link. This message may be displayed if the number of link errors exceeds the allowable link bit error rate (1 bit per 10^12 bits). If you see this message, clean the fiber optic cable according to the instructions given in the SPARCstorage Array 100 Service Manual. If the problem still exists, replace either the fiber optic cable or the Fibre Channel Optical Module. B.5 Internal Software Errors. These messages may be printed by the driver in a situation where it has detected some inconsistency in the state of the machine. These may sometimes be the result of failed hardware, usually either the SPARCstorage Array host adapter or SBus hardware. These are not expected to occur under normal operation. B.5.1 soc Driver. Message IDs: soc driver 4010, soc driver 4030, soc driver 4080, soc link 3020, soc link 4050, soc link 4070, soc login 1010. soc Illegal state SOC_COMPLET
80. nformational Messages. Messages in this category will be used to convey some information about the configuration or state of various SPARCstorage Array subsystem components. B.4.1 soc Driver. soc driver 1010 soc host adapter fw date code: This string will be printed at boot time to indicate the revision of the microcode loaded into the host adapter. soc link 6010 soc port Fibre Channel is ONLINE; soc link 5010 soc port Fibre Channel is OFFLINE: Under a variety of circumstances, the fibre channel link may appear to the host adapter to have entered an inoperable state. Frequently, such a condition is temporary. The following are possible causes for the fibre channel link to appear to go offline: a temporary burst of errors on the fibre cable (in this case, the OFFLINE message should be followed by an ONLINE message shortly afterwards); unplugging of the fibre channel cable from either the host adapter or the SPARCstorage Array; powering off a connected SPARCstorage Array; failure of a Fibre Channel Optical Module in either the host adapter or the SPARCstorage Array; failure of an optical cable; failure of a SPARCstorage Array controller; failure of a host adapter card. Note that any pending I/O operations to the SPARCstorage Array will be held by the driver for a period of time (one to two minutes) following a link off-line, in case the link should ret
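The distinction drawn above, between an OFFLINE message that is soon followed by ONLINE and one that is not, can be sketched as a small log triage helper. This is an illustration only: the function name classify_offline_events, the event tuple layout, and the 120-second default (taken from the upper end of the "one to two minutes" hold period) are assumptions, not part of the soc driver.

```python
def classify_offline_events(events, hold_seconds=120):
    """Label each OFFLINE event as transient or persistent.

    events: list of (time_in_seconds, state) tuples sorted by time,
            where state is "ONLINE" or "OFFLINE".
    hold_seconds: assumed length of the period during which the driver
            holds pending I/O after the link goes offline.
    """
    results = []
    for index, (offline_time, state) in enumerate(events):
        if state != "OFFLINE":
            continue
        # Find the first ONLINE event that follows this OFFLINE event.
        recovered_at = next(
            (t for t, s in events[index + 1:] if s == "ONLINE"), None
        )
        if recovered_at is not None and recovered_at - offline_time <= hold_seconds:
            results.append((offline_time, "transient"))
        else:
            results.append((offline_time, "persistent"))
    return results


# Invented timestamps for illustration.
sample = [(0, "OFFLINE"), (5, "ONLINE"), (100, "OFFLINE"), (400, "ONLINE")]
print(classify_offline_events(sample))
```

A transient result corresponds to the burst-of-errors case in the list of causes; a persistent result points toward the cable, optical module, controller, or host adapter causes, which need hardware isolation.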
81. ng the output to a mail message to the support organization for their records. Compare the traffic conditions in the two netstat outputs for similar levels. In an HA cluster: System console messages or unlit green LEDs on the SQEC cards indicate that one of the private networks has failed. For example, the output of the hastat command will indicate if there are problems with the private networks. Also, the Message Log at the bottom of the hastat display output or the /var/adm/messages file should be checked for private network related error messages. The use of the hastat command and the /var/adm/messages file is described in the Solstice HA 1.3 User's Guide. For supplemental troubleshooting procedures, refer to the SBus Quad Ethernet Controller Manual and the SunSwift SBus Adapter Installation and User's Guide. Also see the following section. One or Both Nodes Up and Running in a Cluster. In the following example (Figure 3-6), both nodes are up and running in a cluster, private net 1 has failed, and the software continues to use private net 2. Caution: Do not replace a cable without first running netstat and saving the output. To confirm the designations for the private network ports on a node: Use the netstat -i command on each node to determine which private links are available. For example, for node 0 with private nets on hme0 and qe0: netstat -i Name Mtu Ne
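Because the caution above asks for the netstat output to be saved before any cable is replaced, the saved text can also be checked mechanically. The sketch below reports which of the expected private interfaces appear in saved netstat -i output. The sample text, its host names and counters, and the function name private_links_present are invented for illustration; only the interface names hme0 and qe0 come from the example above.

```python
# Hypothetical sample in the column layout printed by netstat -i;
# host names and packet counters are invented.
SAMPLE = """\
Name  Mtu   Net/Dest     Address      Ipkts  Ierrs  Opkts  Oerrs  Collis  Queue
lo0   8232  loopback     localhost    1280   0      1280   0      0       0
hme0  1500  node0-priv1  node0-priv1  20500  0      20400  0      0       0
"""


def private_links_present(netstat_output, expected=("hme0", "qe0")):
    """Map each expected interface name to True if netstat -i lists it."""
    listed = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the column header.
        if fields and fields[0] != "Name":
            listed.add(fields[0])
    return {name: name in listed for name in expected}


print(private_links_present(SAMPLE))
```

Being listed shows only that the interface is configured; it does not prove that the link is passing traffic, so the error counters and the hastat output still need to be reviewed.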
82. oduct Notes SunVTS 2 0 User s Guide Disk Drive Installation Manual for the SPARCstorage Array Model 100 Series SBus Quad Ethernet Controller Manual Fibre Channel SBus Card Installation Manual Fibre Channel Optical Module Installation Manual SunSwift SBus Adapter User s Guide 825 3834 802 6784 802 6785 802 6316 825 3783 802 6792 802 6793 825 2227 801 6127 801 5972 851 2369 802 4215 802 6724 802 7196 802 7221 801 2207 801 7123 801 6313 801 6326 802 6021 XX Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 Notes Cautions and Warnings A AN Warning This equipment contains lethal voltage Accidental contact can result in serious injury or death Caution Improper handling by unqualified personnel can cause serious damage to this equipment Unqualified personnel who tamper with this equipment may be held liable for any resultant damage to the equipment Individuals who remove any outer panels or open covers to access this equipment must observe all safety precautions and ensure compliance with skill level requirements certification and all applicable local and national laws Procedures contained in this document must be performed by qualified service trained maintenance providers Note Before you begin carefully read each of the procedures in this manual If you have not performed similar operations on comparable equipment do not attempt to perform these pro
83. ols 6 4 resetting terminal concentrator port 2 2 router problems intermittent 3 23 S safety precautions 6 1 system precautions 6 3 script pdbconf 5 1 serial port connector terminal concentrator A 2 server system shutdown 7 2 system startup 7 6 slave mode setting terminal concentrator port to 2 2 software troubleshooting 4 1 Solaris reconfiguration 3 2 Solstice HA on line serviceability 1 1 SPARCstorage Array 7 10 9 2 complete shutdown 7 10 complete startup 7 11 replacing major subassemblies 9 2 replacing trays and disk drives 9 2 single drive tray shutdown 7 13 single drive tray startup 7 13 SPARCstorage MultiPack 9 3 complete shutdown 7 14 complete startup 7 14 replacing disk drives 9 2 replacing major subassemblies 9 3 single drive shutdown 7 13 stats command 3 27 Subassemblies 9 1 SunVTS 5 1 swapping cables algorithm 3 29 switchover manual for HA 2 6 T takeover failures with 2 4 failures without 2 6 3 4 HA node 2 4 terminal concentrator Ethernet pinout A 3 indicator LEDs 3 29 port resetting 2 2 power on and off 7 15 replacement of 9 3 serial pinout A 1 setting port mode to slave 2 2 tip hardwire command 3 27 3 28 tools required 6 4 troubleshooting error messages HA 2 9 PDB 2 18 SPARCstorage Array B 1 device driver B 1 firmware B 1 failures operating system 4 2 PDB 4 2 Solstice HA 1 3 Failures 4 2 fault classes HA cluster 2
84. olstice PDB SunFDDI SunFastEthernet SunSwift SunVTS and Solaris are trademarks or registered trademarks of Sun Microsystems Inc in the United States and in other countries All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International Inc in the United States and in other countries Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems Inc The OPEN LOOK and Sun Graphical User Interfaces were developed by Sun Microsystems Inc for its users and licensees Sun acknowledges the pioneering efforts of Xerox Corporation in researching and developing the concept of visual or graphical user interfaces for the computer industry Sun holds a nonexclusive license from Xerox to the Xerox Graphical User Interface which license also covers Sun s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun s written license agreements XPM library Copyright 1990 93 GROUPE BULL Permission to use copy modify and distribute this software and its documentation for any purpose and without fee is hereby granted provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation and that the name of GROUPE BULL not be used in advertising or publicity pertaining to distribution of the software without specific written prior permission GROUPE BULL makes no represent
85. on Guide and bring up the Cluster Monitor Message Viewer Figure 2 3 If a similar message to that displayed on the console for the failed node gt is present select that message and observe the More Information display This display hasaSuggested Fix field which may Yes indicate applicable procedures to correct the condition indicated by the message y Is a procedure indicated in Suggested Fix field Perform indicated procedure Figure 2 2 PDB Cluster Troubleshooting Flow Diagram Troubleshooting Overview 2 13 Figure 2 3 PDB Cluster Monitor Message Viewer Window 2 14 Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 Cluster Monitor Front Panel fi es ces Figure 2 4 PDB Cluster Monitor Front Panel Troubleshooting Overview Graphical picture area Footer area 2 15 2 16 Cluster Monitor Item Properties Figure 2 5 PDB Cluster Monitor Item Properties Window 2 7 1 PDB Fault Classes and Principal Assemblies Ultra Enterprise 2 PDB Cluster troubleshooting is dependent on several different principal assemblies and classes of faults The fault classes and their associated assemblies are e SPARCstorage MultiPack faults e Data disk drives e SCSI 2 cables and SunSwift SBus Adapters e SPARCstorage Array faults e Data disks e Array controller e Fibre Channel Optical Modules FC OM Fibre Channel SBus cards FC S e Fiber optic cables and
86. ons 24. Swap the FC OM(s) from the SPARCstorage Array and the FC S card in the node. 25. Power up the disk array and node. 26. Install the loopback connector on the FC OM on the node. 27. Test only the slots that contain an FC OM. Caution: Do not run the loopback tests on an FC OM that is not looped back. This action may cause disk errors or unpredictable results. a. If slot A has an FC OM, enter the following at the ok prompt: ok soc txrx extb b. If slot B has an FC OM in the FC S card, enter the following at the ok prompt: ok soc txrx exta If you see the message passed, go to Step 28. If you see the message failed, replace the FC OM from the appropriate slot on the FC S card. Following replacement of the FC OM, have the system administrator return the node to the cluster. 28. Replace the fiber optic cable. Refer to Chapter 5, Major Subassemblies, in the SPARCstorage Array Model 100 Series Service Manual for cable replacement instructions. 29. Replace the cable and bring up the applicable disk array. See Section 7.3.2, Complete Disk Array Startup. 30. At the ok prompt, enter the following commands: ok false to diag-switch? ok false to fcode-debug? ok Ctrl-] telnet> send break ok reset 31. Have the system admin
87. ot surfaces Avoid contact Surfaces are hot and may cause personal injury if touched A terminal to which alternating current or voltage may be applied Protective earth conductor Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 6 FUSE REPLACEMENT For continued protection against risk i MARKING of fire and electric shock replace ONLY with same type and rating of fuse 6 3 System Precautions a a gt Prior to servicing this equipment ensure that you are familiar with the following precautions Ensure that the voltage and frequency of the power outlet to be used matches the electrical rating labels on the cabinet Wear antistatic wrist straps when handling any magnetic storage devices or system boards Only use properly grounded power outlets as described in the Ultra Enterprise 2 Cluster Hardware Planning and Installation Guide Caution DO NOT make mechanical or electrical modifications to the chassis Sun Microsystems is not responsible for regulatory compliance of modified cabinets Caution Power off the equipment as directed in Chapter 7 Shutdown and Restart Procedures before performing any of the procedures described in this book Caution Before servicing a power supply or power sequencer ensure that the chassis AC power cord is removed from the AC wall socket However when servicing low voltage circuitry such as a system board the AC power cor
88. prise 2 Cluster Hardware Service Manual April 1997 Preface This manual provides servicing instructions for the Ultra Enterprise 2 Clusters These instructions are designed for experienced and qualified maintenance personnel How This Book Is Organized Part 1 System Information Chapter 1 Product Description describes the clusters standard features system configurations and internal and external options Part 2 Troubleshooting Chapter 2 Troubleshooting Overview describes the overall architecture for troubleshooting the system Chapter 3 Hardware Troubleshooting provides procedures for the isolation of various faults relative to major system components Chapter 4 Software Troubleshooting describes software troubleshooting and provides references to lists of error messages generated by the software Chapter 5 Diagnostics describes on line diagnostics and scripts for verifying hardware installation XV Part 3 Preparing for Service Chapter 6 Safety and Tools Requirements provides safety precautions and a list of required tools Chapter 7 Shutdown and Restart Procedures contains procedures for shutting down and restarting the Ultra Enterprise 2 server SPARCstorage Array SPARCstorage MultiPack and the terminal concentrator Part 4 Subassembly Removal and Replacement Chapter 8 Internal Access provides a guide to the procedure
89. prise Cluster PDB Administration Guide or the Solstice HA 1.3 User's Guide. 1. Have the system administrator remove the node from the cluster. The server can then be shut down as indicated in the following procedure. Caution: To avoid damaging internal circuits, do not connect or disconnect any cable while power is applied to the system. Exceptions to this are the fiber optic and private net cables. 2. Halt the system using the appropriate HA or PDB commands. 3. Wait for the system halted message and the boot monitor prompt. 4. Turn off the AC power switch on the back of the server (Figure 7-1). 5. Disconnect the private net cables (Figure 7-2). 6. Use the running node to detach one of the MultiPacks. Use the vxdiskadm command of the CVM or VxVM to detach the MultiPack (Figure 7-2). Figure 7-2 First MultiPack Detached. 7. Power off the detached storage device (Figure 7-14). 8. Physically disconnect the SCSI cable that goes from the detached MultiPack to the powered-down node, at the powered-down node (Figure 7-3). Diagram labels: powered-down node, running node, new node with old root disk
90. r was unable to complete due to insufficient system virtual address mapping resources or kernel memory space for some of its internal structures The host adapter s associated with these messages will not be functional soc driver 4020 soc alloc of request queue failed soc driver 4040 soc DVMA request queue alloc failed soc driver 4050 soc alloc of response queue failed soc driver 4060 soc DVMA response queue alloc failed soc driver 4070 soc alloc failed soc driver 4090 soc alloc failed soc driver 4100 soc DMA address setup failed soc driver 4110 soc DVMA alloc failed These messages indicate there are not enough system DVMA or kernel heap resources available to complete driver initialization The associated host adapter s will be inoperable if any of these conditions occurs Firmware and Device Driver Error Messages B 3 lll s soc attach 4001 soc attach failed device in slave only slot soc attach 4002 soc attach failed hilevel interrupt unsupported soc driver 4001 soc Not self identifying The SBus slot into which the host adapter is installed cannot support the features required to operate the SPARCstorage Array The host adapter should be relocated to a different SBus slot If you see this error message it s possible that you are running an unsupported configuration for example you may have the SPARCstorage Array connected to a server that is not supported B 2 2 pln Driver
91. rdware again after you see this message. Transport error: FCP_RSP_SCSI_PORT_ERR. The firmware on the SPARCstorage Array controller has detected the failure of the associated SCSI interface chip. Any I/O operations to drives connected to this particular SCSI bus will fail. If you see this message, you may have to replace the array controller. Transport error: Fibre Channel Offline; soc link 6010 soc port Fibre Channel is ONLINE. If you see these messages together, the system was able to recover from the error, so no action is necessary. Transport error: Fibre Channel Offline; Transport error: Fibre Channel Online Timeout. If you see these messages together, an I/O operation to a SPARCstorage Array drive has failed because the fibre channel link has become inoperable. The driver will detect the transition of the link to an inoperable state and will then initiate a time-out period. Within the time-out period, if the link should become usable again, any waiting I/O operations will be resumed. However, if the time-out should expire before the link becomes operational, any I/O operations will fail. The time-out message means that the host adapter microcode has detected a time-out on a particular I/O operation. This message will be printed, and the associated I/O operation will fail, only if the retry count of the driver for this class
92. rocedures 7 11 lll N 4 After POST has completed ask the system administrator to restart all drive trays within the array and then rejoin the node to the cluster During the power on self test POST the Q o A H a POST and service icons are displayed in the it upper left corner of the LCD display The four _ alphanumeric LCDs display the code for the currently running POST test If problems are detected during POST an error code flashes continuously on the alphanumeric LCDs See Section 3 2 3 SPARCstorage Array Fails to Communicate for a listing and explanation of POST error After POST is completed the following 1 7 information will be displayed in this order R g e The last four digits of the World Wide Name for the particular SPARCstorage Array Two fiber icons which indicate the status of the fiber links e A drive icon solid bar for each installed drive in the drive trays During normal operation you should see the same icons solidly displayed on the front panel display Figure 7 13 LCD Display While Powering On the System 7 3 3 Single Drive and Tray Shutdown Note The procedure for a single disk is the same as that for a tray that is prior to replacing a disk within a tray you must first spin down all drives in the tray 7 12 Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 N lll 1 Have
93. roubleshooting 3 19 lll Qo Node 0 Node 1 hme0 Private net 1 hme0 qe0 qe0 snoop Private net 2 Figure 3 7 Private Net 1 Troubleshooting Part 1 4 Use the snoop command on node 1 nodel snoop d qe0 If the following string is returned by snoop most likely the onboard le0 port on node 1 is defective This message string indicates that the le0 port of node 0 and the cable for private net 1 cable are functional e In this instance request that the system administrator remove node 1 prior to replacing the related SBus card Once the card is replaced indicate to the system administrator that node 1 is ready to be returned to the cluster node0 priv1 gt nodel priv1 UDP D 6666 S 6666 LEN 120 3 20 Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 Qo lll Node 0 Node 1 hmed Private net 1 hmeO Private net 2 40 snoop qed Figure 3 8 Private Net 1 Troubleshooting Part 2 If the string indicated in step 5 is not returned by the snoop command then connect the private net 1 cable between the ge0 ports of both nodes 5 Following this continue using the snoop command on node 1 snoop will be run as initiated in step 5 until interrupted by a CTRL C If the message string indicated in step 5 is repeated then the le0 port on
94. trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun graphical user interfaces were developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox Corporation in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a nonexclusive license from Xerox to the graphical user interface, which license also covers Sun's licensees who implement OPEN LOOK graphical user interfaces and otherwise comply with Sun's written license agreements. XPM library: Copyright 1990-93 GROUPE BULL. Use, copying, modification, and distribution of this software and its documentation for any purpose are permitted free of charge, provided that the above copyright notice appears in all copies, that this notice and this permission appear in the supporting documentation, and that use of the name of GROUPE BULL for advertising or distribution purposes is subject in all
95. s Not Respond to ping Command Hardware Troubleshooting 3 27 lll Qo y 3 28 The terminal concentrator does not connect to a cluster node First check the serial cable connection between the cluster node and the terminal concentrator No Connection good Correct problem and verify proper operation Check if the port is being used Connect a serial cable from the administration workstation to port 1 of the terminal concentrator Type tip aina shell tool window Type who at the monitor prompt You should see a list of current users on each port Check to see whether another process is running on the port in question Yes Some other workstation is connected y to the port Contact the workstation owner to free up the port Is another proces running on the port No Figure 3 13 Branch B Terminal Concentrator Cannot Connect to a Node Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 Qo lll Switch the serial cable at the cluster node end with the serial cable from the cluster node that is alive Telnet to the node that was alive Is there a response from the previously alive node No Yes The problem is in the cluster node Repair the node Return the serial cables to their original positions The problem is the serial cable or the terminal concentrator Switch the same serial cables at the t
96. cases to prior written permission. GROUPE BULL makes no warranty as to the suitability of the software for any particular purpose. The software is provided as is, without express or implied warranty. THIS PUBLICATION IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT OF THIRD-PARTY PRODUCTS. Adobe PostScript

Contents
Preface
1. Product Description
1.1 Ultra Enterprise 2 Cluster Using SPARCstorage Arrays
1.1.1 Minimum Hardware Required for an Ultra Enterprise 2 Cluster Using SPARCstorage Arrays
1.1.2 Ultra Enterprise 2 Cluster Optional Devices
1.2 Ultra Enterprise 2 Cluster Using SPARCstorage MultiPacks
1.2.1 Minimum Hardware Required for an Ultra Enterprise 2 Cluster using SPARCstorage MultiPacks
1.2.2 Ultra Enterprise 2 Cluster Optional Devices
2. Troubleshooting Overview
2.1 Troubleshooting a Remote Site
2.2 Troubleshooting Philosophy
2.3 Maintenance Authorization
97. s necessary to access system components during removal and replacement Chapter 9 Major Subassemblies contains procedures for the removal and replacement of system subassemblies and parts Part 5 Illustrated Parts Breakdown Chapter 10 Illustrated Parts Breakdown provides illustrations of the major replaceable parts in a system and lists part numbers Part 6 Appendixes and Index Appendix A Connector Pinouts and Cabling provides a list of pinouts and cabling for Ultra Enterprise 2 Cluster Server specific items Appendix B SPARCstorage Array Firmware and Device Driver Error Messages provides a list of SPARCstorage Array error messages specific to the firmware and device driver When You Need Help with UNIX Commands xvi This manual may not include specific software commands or procedures Instead it may name software tasks and refer you to operating system documentation or the handbook that was shipped with your new hardware The type of information that you might need to use references for includes Shutting down the system Booting the system Ultra Enterprise 2 Cluster Hardware Service Manual April 1997 Configuring devices Other basic software procedures See one or more of the following e Solaris 2 x Handbook for SMCC Peripherals contains Solaris 2 x software commands e AnswerBook on line documentation system for the complete set of documentation suppor
98. sters
• Three serial cables
• Administration workstation
• Ethernet cables
Figure 1-4 Ultra Enterprise 2 Cluster Hardware (labels: Administration workstation, SPARCstorage MultiPacks)
1.2.2 Ultra Enterprise 2 Cluster Optional Devices
• CD-ROM drive
• Additional disk drives (second internal drive and disk drives in SPARCstorage MultiPacks)
• Tape drive
• SunFastEthernet (SFE) SBus card for the public net (HA only)
• SunFDDI 5.0 SAS/DAS SBus card for the public net (HA only)
• SCI SBus Adapter card for the private net (PDB only)
2. Troubleshooting Overview
An Ultra Enterprise 2 Cluster comprises redundant online components that continue to operate when an assembly or device fails. To maintain the high level of availability, failed components must be replaced as soon as possible. Usually, single-node cluster operation must continue during maintenance procedures.
2.1 Troubleshooting a Remote Site
Use telnet to communicate with either node in the cluster via the terminal concentrator. For example:
telnet terminal_concentrator_name
The normal response is:
Trying ip_address
Connected to tc_lm
Escape character is
If you get the following message:
telnet: connect: Connection refused
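The remote-site check above can be sketched as a small script that tells a refused connection apart from an unreachable terminal concentrator before a telnet session is attempted. This is only a minimal illustration, not part of the manual: the function name `probe_terminal_concentrator` is hypothetical, and port 23 (the standard telnet port) is assumed to be where the terminal concentrator listens.

```python
import socket


def probe_terminal_concentrator(host: str, port: int = 23, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to the terminal concentrator.

    Returns "connected", "refused", or "unreachable".
    Port 23 is an assumption (the standard telnet port).
    """
    try:
        # Open and immediately close a TCP connection; no telnet negotiation is done.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Corresponds to the "Connection refused" message that telnet reports.
        return "refused"
    except OSError:
        # Timeouts, name-resolution failures, and routing errors all land here.
        return "unreachable"
```

A call such as `probe_terminal_concentrator("tc_lm")` would use the sample host name from the response shown above; substitute the real terminal concentrator name or IP address.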
99. superuser and use the admin command to enter the administrative mode (indicated by the admin: prompt). The superuser password at this step is the IP address set using the addr command above, for example, 192.9.200.5.
annex: su
Password: (the password does not display)
annex# admin
Annex administration MICRO-XL-UX R7.0.1, 8 ports
admin:
12. Set the following port parameters.
Note: This command line is case sensitive. Be sure to enter this line exactly as shown.
admin: set port=1-8 mode slave type dial_in imask_7bits Y
You may need to reset the appropriate port, Annex subsystem or reboot the Annex for changes to take effect.
admin:
13. Quit the administrative mode and then reboot the terminal concentrator.
admin: quit
annex# boot
bootfile: <return>
warning: <return>
*** Annex (terminal concentrator IP address) shutdown message from port 1 ***
Annex (terminal concentrator IP address) going down IMMEDIATELY
Note: The terminal concentrator will not be available for a minute or two until it completes booting.
14. Quit the tip program by pressing Return, followed by a tilde and a period.
<return>~.
[EOT]
The return-tilde-period key sequence does not echo as entered; however, you will see the tilde after you enter the period.
100. t A will actually test the FC/OM in the slot labeled B, and vice versa.
18. Only test the slots that contain an FC/OM.
a. If slot A has an FC/OM, enter the following at the ok prompt:
ok soc txrx extb
b. If slot B has an FC/OM, enter the following at the ok prompt:
ok soc txrx exta
• If you see the message passed, go to Step 20.
• If you see the message failed, replace the FC/OM from the appropriate slot on the FC/S card according to the instructions given in the processor service manual that came with your system. Following replacement of the FC/S card, have the system administrator return the node to the cluster.
Note: The SPARCstorage Array diagnostics can only check the FC/OMs on the node. Therefore, in the following steps, you switch the FC/OMs from the SPARCstorage Array with the FC/OMs from the FC/S card on the node.
19. Repeat steps 8 through 18 for each FC/OM module. Steps 13, 14, and 16 can be skipped.
20. Remove the loopback connector from the FC/OM on the node.
21. Power down the node and disk array.
22. Remove the FC/OM(s) from the FC/S card in the node. For the necessary instructions, refer to the Ultra 2 Series Service Manual.
23. Remove the FC/OM(s) from the SPARCstorage Array, taking care to keep them separate from the FC/OM(s) that you just removed from the node. Refer to Chapter 5, Major Subassemblies, in the SPARCstorage Array Model 100 Series Service Manual for those instructi
101. t/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 1042674 0 1042674 0 0 0
hme0 1500 204.152.64.0 ha-lewis-priv1 564258 0 563153 719 59 0
qe0 1500 204.152.65.0 ha-lewis-priv2 248295 0 247619 1 0 0
qe1 1500 mpk17-network-75 ha-lewis 3723131 0 1345255 0 22784 0
qe1:1 1500 mpk17-network-75 relo-lewis 0 0 0 0 0 0
qe1:2 1500 mpk17-network-75 relo-martin 0 0 0 0 0 0
Figure 3-6 Private Net 1 Failure (Node 0: node0-priv1 on hme0, node0-priv2 on qe0; Node 1: node1-priv1 on hme0, node1-priv2 on qe0; private net 1 failed)
To troubleshoot private net 1 to a defective card or cable in an HA cluster:
1. Have the system administrator prepare a node for removal from the cluster.
Note: In this procedure, node 1 is removed from the cluster. When there is one node remaining in a cluster, the software will continue to send messages across the private nets. The following procedure uses these message packets to confirm communication between the nodes. For this example, assume that the software recovers on private net 2 (Figure 3-7).
2. Remove the private net 2 cable (the cable between the qe0 ports of both nodes).
3. Connect the private net 1 cable (the cable for the failed net) between the hme0 port of node 0 and the qe0 port of node 1.
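The interface statistics above can also be screened automatically. The following is a minimal sketch, not part of the manual: it assumes the ten-column layout of the listing (Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue), uses interface and host names as read from the sample, and applies a hypothetical threshold of 10 combined input and output errors to decide which interface deserves a closer look.

```python
from typing import Dict, List

# Sample listing; names and values are taken from the example output above.
SAMPLE = """\
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 1042674 0 1042674 0 0 0
hme0 1500 204.152.64.0 ha-lewis-priv1 564258 0 563153 719 59 0
qe0 1500 204.152.65.0 ha-lewis-priv2 248295 0 247619 1 0 0
qe1 1500 mpk17-network-75 ha-lewis 3723131 0 1345255 0 22784 0
"""


def parse_interface_stats(text: str) -> List[Dict[str, object]]:
    """Parse the listing into one dictionary per interface, skipping the header."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 10:
            continue  # ignore blank or malformed lines
        rows.append({
            "name": fields[0],
            "ierrs": int(fields[5]),
            "oerrs": int(fields[7]),
            "collis": int(fields[8]),
        })
    return rows


def suspect_interfaces(rows: List[Dict[str, object]], min_errors: int = 10) -> List[str]:
    """Return interface names whose input plus output errors reach min_errors.

    The threshold is an arbitrary assumption for illustration only.
    """
    return [r["name"] for r in rows if r["ierrs"] + r["oerrs"] >= min_errors]


print(suspect_interfaces(parse_interface_stats(SAMPLE)))
```

With the sample data only hme0 is flagged, because of its 719 output errors; this matches the private net 1 failure discussed in the procedure.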
102. the system administrator prepare the SPARCstorage Array containing the disk or tray for servicing and spin down all drives in the tray.
2. Once all drives in the tray have stopped, remove the tray to access individual drives for service.
7.3.4 Single Drive and Tray Startup
1. Ask the system administrator to start all drives in the tray.
2. Resynchronize the mirrors in that tray to put the disks in the tray back in service.
7.4 SPARCstorage MultiPack
Refer to the SPARCstorage MultiPack Service Manual.
7.4.1 Single Drive Shutdown
The SPARCstorage MultiPack contains up to 12 drives. To replace a drive, you do not have to power down the MultiPack.
1. Have the system administrator prepare the SPARCstorage MultiPack for servicing.
2. Replace the drive as directed in the SPARCstorage MultiPack Service Manual.
7.4.2 Complete MultiPack Shutdown
Caution: Do not disconnect the power cord from the utility outlet when you work on the SPARCstorage MultiPack. This connection provides a ground path that prevents damage from uncontrolled electrostatic discharge.
1. Prior to powering down a complete SPARCstorage MultiPack, you must first have the system administrator prepare the MultiPack for servicing; indicate which MultiPack is going to be replaced.
2. After the system administrator has prepared the MultiPack for servicing, turn off the AC power switch on the MultiPack (Figure 7-14).
103. 4.3.4 SPARCstorage Array Failures
4.3.5 SPARCstorage MultiPack Failures
4.3.6 NFS or Other Data Service Failures
5. Diagnostics
5.1 Failure Diagnosis and Confirmation of Component Repair Using SunVTS
5.2 Verify HA 1.3 Configuration Using the hacheck(1m) Command
5.3 Verify PDB Configuration
6. Safety and Tools Requirements
6.1 Safety Precautions
6.2 Symbols
6.3 System Precautions
6.4 Tools Required
7. Shutdown and Restart Procedures
7.1 Ultra Enterprise 2 Servers
7.1.1 Server Shutdown with SPARCstorage Arrays
7.1.2 Server Shutdown with SPARCstorage MultiPacks and a Spare Ultra Enterprise 2 Server
7.1.3 Server Startup
7.2 Component Replacement without a Spare Ultra Enterprise 2 Server
7.2.1 Server Shutdown
7.2.2 Server Startup
7.3 SPARCstorage Array
7.3.1
104. ting the Solaris 2.x operating environment
• Other software documentation that you received with your system
Typographic Conventions
The following table describes the typographic changes used in this book.
Typeface or Symbol: AaBbCc123
Meaning: The names of commands, files, and directories; on-screen computer output
Example: Edit your .login file. Use ls -a to list all files. machine_name% You have mail.
Typeface or Symbol: AaBbCc123
Meaning: What you type, contrasted with on-screen computer output
Example: machine_name% su, then Password:
Typeface or Symbol: AaBbCc123
Meaning: Command-line placeholder; replace with a real name or value
Example: To delete a file, type rm filename.
Typeface or Symbol: AaBbCc123
Meaning: Book titles, new words or terms, or words to be emphasized
Example: Read Chapter 6 in User's Guide. These are called class options. You must be root to do this.
Shell Prompts in Command Examples
Table P-1 shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.
Table P-1 Default System and Superuser Prompt
C shell prompt: machine_name%
C shell superuser prompt: machine_name#
Bourne shell and Korn shell prompt: $
Bourne shell and Korn shell superuser prompt: #
Related Documentation
Table P-2 lists the documents which contain information that may be helpful to the system administrator and service provider.
Table P-2 List of Related Documentation
Columns: Product Family, Title, Part Number
Ultra 2 Server Series: Sun Ultra 2 Series Hardware
105. tion Manual
Solstice HA 1.3 Software Programmer's Guide
Ultra Enterprise 2 Cluster Hardware Service Manual
Solstice HA 1.3 New Product Information
Ultra Enterprise 2 Cluster PDB Preparation Binder Set
Getting Started (roadmap)
Ultra Enterprise Cluster PDB Software Planning and Installation Guide
Ultra Enterprise 2 Cluster PDB Hardware Planning and Installation Manual
Part Number column, in source order: 802-2027, 802-2028, 801-2029, 802-2031, 802-4427, 802-4428, 802-4429, 802-4430, 825-3494, 802-6317, 805-0317, 802-6313, 805-0318, 802-6316, 805-0629, 825-3833, 805-0428, 802-6790, 802-6313
Table P-2 List of Related Documentation (Continued)
Columns: Product Family, Title, Part Number
Product Family column, in source order: Terminal Concentrator, Solstice DiskSuite, SunVTS Diagnostic, Other Referenced Manuals
Title column, in source order:
Ultra Enterprise 2 Cluster PDB System Binder Set
Ultra Enterprise Cluster PDB Administration Guide
Ultra Enterprise Cluster PDB Volume Manager Administration Guide
Ultra Enterprise 2 Cluster Hardware Service Manual
Ultra Enterprise Cluster Messages PDB Binder Set
Ultra Enterprise Cluster PDB Error Messages Manual
Ultra Enterprise PDB 1.2 Release Notes (shipped with Ultra Enterprise PDB 1.2 CD-ROM)
Terminal Concentrator Binder Set
Terminal Concentrator Installation Notes
Terminal Concentrator General Reference Guide
Solstice DiskSuite 4.1 Binder Set
Solstice DiskSuite 4.1 User's Guide
Solstice DiskSuite 4.1 Reference Guide
Solstice DiskSuite 4.1 Installation Pr
106. trator.
a. Connect the power, network, and serial cables to the terminal concentrator.
b. Power on the terminal concentrator by using the AC power switch located on the back panel (Figure 9-1).
9.5 Cluster Cabling
Refer to Chapter 7 of the Ultra Enterprise 2 Cluster Hardware Planning and Installation Guide for details on cabling the terminal concentrator, the private networks, and the SPARCstorage Array optical connections or SPARCstorage MultiPack SCSI-2 connections.
10. Illustrated Parts Breakdown
Figure 10-1 shows the main components of the Ultra Enterprise 2 Cluster. Table 10-1 lists the replaceable parts and the documents in which the replacement procedures are located.
Figure 10-1 Ultra Enterprise 2 Cluster Main Components
Note: HA servers use SPARCstorage Arrays and associated SBus adapters and cables. In addition to SPARCstorage Arrays, PDB servers can also use SPARCstorage MultiPacks and associated SBus adapters and cables.
Table 10-1 Replaceable Parts List and Documentation Cross-Reference
Columns: Key, Description, Part Number, Reference Document, Document Part Number
Key 1, Ultra 2 Server: reference document Ultra 2 Series Server Service Manual, document part number 802-2561
FC/S SBus card: part number 595-3213, document part number 801-6316
SQEC SBus card: part number 605-1520, document part number 801-7123
FC/OM module: part number 595-3214, document part number 801-6326
SunSwift SBus card: part number 605-1568, document part number 802-6021
SCI SBus Adapter: part number 530-2345, document part number 802-6313
107. tware enables one node to take over when a critical hardware or software failure is detected. When a failure is detected, an error message is sent to the system console. When a takeover occurs, the node assuming control becomes the I/O master for the diskset of the failed node and redirects the clients of the failed node to itself. The troubleshooting flow for a takeover is shown in Figure 2-1.
Figure 2-1 (flowchart; box labels in source order)
Solstice HA: Fault detected; Migrates diskset; Restores data service; Migrates logical node name; Acknowledges configuration; Migrates logical node name; Services requests and returns surviving node to client
Service provider: Service provider notified; Requests system administrator to prepare node for service; Isolates fault (for hardware, refers to Chapter 3, Hardware Troubleshooting; for software, refers to Chapter 4, Software Troubleshooting); Shuts down applicable assembly (refers to Chapter 7, Shutdown and Restart Procedures); Replaces faulty part using Chapter 9, Major Subassemblies; Requests system administrator to return node to cluster
System administrator: System administrator performs switchover; Cluster returns to HA bot
108. ual, Chapter 2, Diagnostics for Troubleshooting (part number 802-4430)
SPARCstorage Array Model 100 Series Service Manual, Chapter 2, Troubleshooting (part number 802-2206)
Ultra 2 Series Service Manual, Chapter 2, SunVTS, as well as Chapter 3, Troubleshooting Procedures (part number 802-2561)
Section 3.6, Terminal Concentrator and Serial Connection Faults (part number 802-6316)
SBus Quad Ethernet Controller Manual (part number 801-7123)
SunSwift SBus Adapter Installation User's Guide (part number 802-6021)
2.5.6 HA Error Messages Symptoms
Table 2-2 lists error messages or symptoms, probable cause, and troubleshooting references.
Table 2-2 HA Error Messages and Symptoms
Columns: Troubleshooting, Error Message or Symptom, Probable Cause, Cluster Service Reference, Reference
Processor/Node
Error message or symptom: Either node reboots; boot disk failure; loss of performance meter response from one node
Probable cause: Ultra 2 Server
Cluster service reference: Section 3.4, Node Failures
Reference: Ultra 2 Series Service
Private Net
Error message or symptom: /var/adm/messages: Apr 23 12:04:52 ha-jan unix: hme0: Link Down - cable problem
Probable cause: Cable
Reference: For cabling details, see Ultra 2 Server Hardware Planning and Installation Manual, Chapter 5, Hardware Installation
Error message or symptom: /var/adm/messages: Apr 23 12:04:52 ha-jan unix: qe0: No carrier - twisted pair cable problem or disabled hub link test
Probable cause: SQEC or cable
Cluster service reference: Section 3.5.1, Private Network Failure
Error message or symptom: /var/adm/messages: Apr 23
Probable cause: SunSw
109. uctions marked on the equipment.
• Ensure that the voltage and frequency rating of the power outlet you use matches the electrical rating label on the equipment.
• Only use properly grounded power outlets.
• Never push objects of any kind through openings in the equipment, as they may touch dangerous voltage points or short out components that could result in fire or electric shock.
• Refer servicing of equipment to qualified personnel.
To protect both yourself and the equipment, observe the precautions in Table 6-1.
Table 6-1 Safety Precautions
Columns: Item, Problem, Precaution
Wrist or foot strap (problem: ESD): Wear a conductive wrist strap or foot strap when handling printed circuit boards.
ESD mat (problem: ESD): An approved ESD mat provides protection from static damage when used with a wrist strap or foot strap. The mat also cushions and protects small parts that are attached to printed circuit boards.
Cover panels (problem: system damage and overheating): Re-install all cover panels after performing any service work on the system.
SBus slot covers (problem: system damage and overheating): Install SBus slot covers in all unused SBus slots.
6.2 Symbols
Symbols shown: WARNING, CAUTION, HOT SURFACE, AC, PROTECTIVE EARTH
WARNING: Hazardous voltages are present. To reduce the risk of electrical shock and danger to personal health, follow the instructions.
CAUTION: A risk of personal injury, data loss, or equipment damage exists. Follow the instructions.
HOT SURFACE: CAUTION: H
110. urn to an operable state so that pending operations can be completed. However, if sufficient time elapses following the transition of the link to off-line without a corresponding on-line transition, the driver will fail the I/O operations associated with the formerly connected SPARCstorage Array. It is normal to see the ONLINE message for each connected SPARCstorage Array when the system is booting.
soc link 1010 soc message
Peripheral devices on the Fibre Channel (like the SPARCstorage Array) can cause messages to be printed on the system console/syslog under certain circumstances. Under normal operation at boot time, the SPARCstorage Array will display the revision date of its firmware following a fibre channel login. This message will be of the form:
soc link 1010 soc message SSA EEprom date: Fri May 27 12:35:46 1996
Other messages from the controller may indicate the presence of warning or failure conditions detected by the controller firmware.
B.4.2 pln Driver
Transport error: Received P_RJT status, but no header
Transport error: Fibre Channel P_RJT
Transport error: Fibre Channel P_BSY
These messages indicate the presence of invalid fields in the fibre channel frames received by the host adapter. This may indicate a fibre channel device other than Sun's fibre channel device for the SPARCstorage Array. The m
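When scanning a console log or syslog file for the messages described above, a simple classifier can separate the routine firmware revision line from the pln transport errors. The sketch below is illustrative only: the function name `classify_ssa_message` and the category labels are hypothetical, and it matches on the substrings "SSA EEprom date", "Transport error", "P_RJT", and "P_BSY" rather than on the exact driver message format, which is assumed rather than known.

```python
from typing import Optional


def classify_ssa_message(line: str) -> Optional[str]:
    """Return a hypothetical category label for a console or syslog line.

    Only substrings quoted in the text above are matched; anything else
    yields None.
    """
    if "SSA EEprom date" in line:
        # Normal boot-time message reporting the firmware revision date.
        return "firmware-revision"
    if "Transport error" in line:
        if "P_BSY" in line:
            return "transport-error-p_bsy"
        if "P_RJT" in line:
            return "transport-error-p_rjt"
        return "transport-error-other"
    return None


for sample in (
    "SSA EEprom date: Fri May 27 12:35:46 1996",
    "Transport error: Fibre Channel P_RJT",
    "Transport error: Fibre Channel P_BSY",
):
    print(sample, "->", classify_ssa_message(sample))
```

Matching on substrings keeps the sketch tolerant of prefix differences (driver instance, timestamp, host name) that vary between systems.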
111. ve and Tray Shutdown
Single Drive and Tray Startup
SPARCstorage MultiPack
Single Drive Shutdown
Complete MultiPack Shutdown
Complete MultiPack Startup
Terminal Concentrator
7.1 Ultra Enterprise 2 Server
7.1.1 Server Shutdown with SPARCstorage Arrays
1. Have the system administrator remove the node from the cluster.
Caution: To avoid damaging internal circuits, do not connect or disconnect any cable while power is applied to the system. Exceptions to this are the fiber optic and private net cables.
2. Halt the system using the appropriate commands.
3. Wait for the system halted message and the boot monitor prompt.
4. Turn off the AC power switch on the back of the server (Figure 7-1).
Figure 7-1 Server AC Power Switch
7.1.2 Server Shutdown with SPARCstorage MultiPacks and a Spare Ultra Enterprise 2 Server
(Figure label: New node with old root disk)
For the procedure to remove the root disk from the node to be removed and to install the disk into the new node, refer to the Ultra Enter
112. ver.
2. Replace the defective SCSI controller. Refer to the Ultra 2 Series Service Manual.
3. Have the system administrator return the node to the cluster. See Section 7.1.3, Server Startup.
3.4 Node Failures
3.4.1 System Board and Boot Disk
For system board or boot disk failures, messages on the system console or Cluster Monitor identify the malfunctioning node. You can further isolate this class of faults by referring to the troubleshooting procedures in the Ultra 2 Series Service Manual.
After determining which part is defective, use the following procedure to replace the part:
1. Have the system administrator prepare the node for replacement of a processor part.
2. After the node has been removed from the cluster, you can shut down the server to replace a defective boot disk, system board, UltraSPARC processor module, SBus card, SIMM, and so forth. Use the server shutdown procedure to avoid interrupting other cluster components. See Section 7.1.1, Server Shutdown with SPARCstorage Arrays.
3. Replace the defective device as indicated in the Ultra 2 Series Service Manual.
4. Bring up the applicable server. See Section 7.1.3, Server Startup.
5. Have the system administrator return the node to the cluster.
3.4.2 Using the probe-scsi Command
Use this command to verify operation of a new or replaced SCSI-2 device