AlphaServer 1200 DIGITAL Ultimate Workstation 533 Service Manual

Contents

1. [Figure: StorageWorks drives and shelf (PKW0514-97)]

The StorageWorks drives are to the right of the system cage. Up to seven drives fit into the shelf. The system supports fast wide Ultra SCSI disk drives. The RAID controller is also supported. With an optional Ultra SCSI Bus Splitter Kit, the StorageWorks shelf can be split into two buses.

Chapter 2 Power-Up

This chapter describes system power-up testing and explains the power-up displays. The following topics are covered:

• Control Panel
• Power-Up Sequence
• SROM Power-Up Test Flow
• SROM Errors Reported
• XSROM Power-Up Test Flow
• XSROM Errors Reported
• Console Power-Up Tests
• Console Device Determination
• Console Power-Up Display
• Fail-Safe Loader

2.1 Control Panel

The control panel display indicates the likely device when testing fails.

Figure 2-1 Control Panel and LCD Display (PKW0510-97). The LCD shown reads: P0 TEST 11 CPU0

• When the On/Off button LED is on, power is applied and the system is running. When it is off, the system is not running but power may or may not b
2. [Figure: system block diagram (PKW0502-97) showing the system motherboard, the two system-bus-to-PCI-bus bridges (IOD0 and IOD1), the two 64-bit PCI buses with their PCI slots, the EISA slot, and the XBUS devices: real-time clock, NVRAM (8Kx8), flash, mouse, keyboard, I2C bus interface, parallel port, and floppy controller]

Note: When the EISA/ISA slot on PCI bus 0 is used, the last PCI slot on PCI bus 1 is not available.

Both systems use the Alpha chip for the CPU. The CPU, memory, and I/O devices connect to the system motherboard. On the system motherboard are:

• The system bus
• Two system-bus-to-PCI-bus chip sets that bridge two PCI buses to the system bus
• Two 64-bit PCI buses with three PCI option slots each
• One EISA/ISA bus bridged to one of the PCIs. If an EISA/ISA option is used, one PCI slot cannot be used.
• One CD-ROM controller built in to the other PCI
• One EISA/ISA-to-XBUS bridge to the built-in XBUS options

A fully configured system can have two CPUs, eight DIMM memory pairs, and a total of six I/O options. The I/O options can be all PCI options or a combination of PCI options and a single EISA/ISA option.

The system bus has a 144-bit data bus protected by 16 bits of ECC and a 40 bit c
3. [Figure: cover interlock circuit (PKW0503-97) showing the power supply (J30), the cover interlock switch, the push-button On/Off switch, the OCP switchpack, and the DC_ENABLE_L signal to the motherboard]

NOTE: The cover interlock must be engaged to enable power-up. To override the cover interlock, use a suitable object to close the interlock circuit. Disk damage will result if the system is run with the top cover off.

1.2 Operator Control Panel and Drives

The control panel includes the On/Off, Halt, and Reset buttons and an LCD display.

Figure 1-3 Control Panel Assembly (PKW0501-97; callouts: CD-ROM, floppy, OCP display)

OCP display: The OCP display is a 16-character LCD that indicates status during power-up and self-test. While the operating system is running, the LCD displays the system type. Its controller is on the XBUS.

CD-ROM: The CD-ROM drive is used to load software, firmware, and updates. Its controller is on PCI1 on the PCI backplane on the system motherboard.

Floppy disk: The floppy drive is used to load software and firmware updates. The floppy controller is on the XBUS on the PCI backplane on the system motherboard.

On/Off button: Powers t
4. [Figure: remote control logic (PKW0504C-97) showing the RCM, the four-position switchpack (1 EN RCM, 2 MODEM OFF, 3 RPD DIS, 4 SET DEF), and VAUX from the power supplies]

The system allows both local and remote control. A set of switches enables or disables remote control.

Table 1-2 Remote Control Switch Functions

Switch         Condition       Function
1 EN RCM       On (default)    Allows remote system control
               Off             Does not allow remote system control
2 Modem Off    On              Disables the RCM modem port
               Off (default)   Enables the RCM modem port
3 RPD DIS      On              Disables remote power down
               Off (default)   Enables remote power down
4 SET DEF      On              Resets the RCM microprocessor defaults
               Off (default)   Allows use of conditions set by the user

The default settings allow complete remote control. The user must change the switch settings to obtain any other desired control. See Appendix C for information on controlling the system remotely.

The remote console manager connects to a modem through the modem port on the bulkhead. The RCM uses VAUX power provided by the system power supplies.

The standard I/O ports (keyboard, mouse, COM1 and COM2 serial ports, and parallel ports) are on the same bulkhead.

1.8.5 Power Control Logic

The power control section of the motherboard controls power sequencing and monitors power supply voltage, system temperature, and fans.

Figure 1-13 Power Control Logic Syste
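The switch functions in Table 1-2 can be modeled as a small lookup. The following is a minimal sketch only: the class and function names are hypothetical and not part of any DIGITAL tool, and the defaults simply mirror the "(default)" entries in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RcmSwitchpack:
    """Hypothetical model of the four RCM switches (True = On).

    Defaults follow the "(default)" column of Table 1-2.
    """
    en_rcm: bool = True       # switch 1, On by default
    modem_off: bool = False   # switch 2, Off by default
    rpd_dis: bool = False     # switch 3, Off by default
    set_def: bool = False     # switch 4, Off by default


def describe(sw: RcmSwitchpack) -> list[str]:
    """Return the Table 1-2 function text for each switch, in switch order."""
    return [
        "Allows remote system control" if sw.en_rcm
        else "Does not allow remote system control",
        "Disables the RCM modem port" if sw.modem_off
        else "Enables the RCM modem port",
        "Disables remote power down" if sw.rpd_dis
        else "Enables remote power down",
        "Resets the RCM microprocessor defaults" if sw.set_def
        else "Allows use of conditions set by the user",
    ]


if __name__ == "__main__":
    for number, text in enumerate(describe(RcmSwitchpack()), start=1):
        print(f"Switch {number}: {text}")
```

With the default settings every line describes an enabled capability, which matches the statement that the defaults allow complete remote control.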
5. [Figure: power circuit (PKW0503A-97) showing the power supply (J30), the cover interlock, the push-button On/Off switch, the OCP switchpack, and the DC_ENABLE_L signal to the motherboard]

Figure 1-14 shows the distribution of power throughout the system. Opens in the circuit, or the RCM signal RCM_DC_EN_L, or a power-supply-detected power fault interrupt DC power applied to the system. The opens can be caused by the On/Off button or the cover interlock.

A failure anywhere in the circuit will result in the removal of DC power. A potential failure is the relay used in the remote control logic to control the RCM_DC_EN_L signal.

The cover interlock is located under the top cover between the system card cage and the storage area. To override the interlock, place a suitable object in the interlock switch that closes it.

1.10 Power Supply

Two power supplies provide system power.

Figure 1-15 Back of Power Supply and Location (PKW0513-97; callouts: power supply 0, power supply 1, current share, 5V return, 12V return, 3.4V return, miscellaneous signals)

Description

Two power supplies each provide 450 W to the system. Redundant power is not available at this time.

Power Supply Features

• 88-132 and 176-264 Vrms AC input
• 450 watts output

Output voltages are as follows:

Output Voltage Min Voltag
6. [Figure: removing the system motherboard (PKW0518-97)]

Removal

1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Remove both memory riser cards.
4. Remove all CPUs.
5. Remove all PCI and EISA options.
6. From the back of the cabinet, using a Phillips head screwdriver, unscrew the four screws holding the CPU and memory riser card brace from the system frame. Remove the brace.
7. Unplug all cables connected to the motherboard and clear access to all screws holding the motherboard in place.
8. Using a Phillips head screwdriver, unscrew the eleven screws holding the motherboard in place and remove it from the system. Note the two guide studs, one in the upper right corner and the other in the lower left corner, that protrude through holes in the motherboard.

Replacement

Reverse the steps in the Removal procedure.

Verification

Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the show device command at the console prompt to verify that the system sees all system options and peripherals.

6.9 PCI/EISA Option Removal and Replacement

Figure 6-8 Removing PCI/EISA Option (IP00225; callouts: slot cover screws, option card)

WARNING: To prevent fire, use only modules with current-limited outputs. See National Electrical
7. B-cache Error (CPU Error)

    TEST ERR on cpu0 FRU cpu0 err 2 tst 11
    exp 5555555555555555
    rcv aaaaaaaaaaaaaaaa
    adr f ff8

Callouts: CPU running the test; expected data; received data; B-cache location where the error occurred.

Memory Error (Memory Module Indicated)

    TEST ERR on cpu0 FRU MEM1L err c tst 21
    Memory testing complete on cpu0

Callouts: CPU running the test; low member of memory pair 1.

Memory Configuration Error (Operator Error)

    ERR mem_pair0 misconfigured
    ERR mem_pair1 card size mismatch
    ERR mem_pair6 card type mismatch
    ERR mem_pair1 EMPTY

FEPROM Failures (PCI Error)

    Sctr 1 PAL headr PTTRN fail
    Sctr 1 PAL headr CHKSM fail
    Sctr 1 PAL code CHKSM fail
    Sctr 3 CONSLE headr PTTRN fail
    Sctr 3 CONSLE headr CHKSM fail
    Sctr 3 CONSLE code CHKSM fail

2.7 Console Power-Up Tests

Once the SRM console is loaded, it tests each IOD further. Table 2-5 describes the IOD power-up tests and Table 2-6 describes the PCI power-up tests.

Table 2-5 IOD Tests

Test  Name                  Description
1     IOD CSR Access test   Read and write all CSRs in each IOD.
2     Loopback test         Dense space writes to the IOD's PCI dense space to check the integrity of ECC lines.
3     ECC test              Loopback tests similar to test 2 but with a varying pattern to create an ECC of 0s. Single and double bit errors are
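When a B-cache error reports expected (exp) and received (rcv) data, XORing the two values shows which data bits differ. The sketch below illustrates that general technique only; the helper name is hypothetical, and the 64-bit width is an assumption based on the 16 hexadecimal digits printed in the example above.

```python
def failing_bits(expected_hex: str, received_hex: str, width: int = 64) -> list[int]:
    """Return the bit positions (0 = least significant) where the
    expected and received data patterns differ."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Patterns taken from the exp/rcv fields of the B-cache error example.
    bits = failing_bits("5555555555555555", "aaaaaaaaaaaaaaaa")
    print(f"{len(bits)} of 64 bits differ")
```

In the example from the manual every bit is inverted (0x55... versus 0xaa...), which is the kind of pattern a reader might see when the whole data path, rather than one bit, is at fault. A single stuck bit would instead yield a one-element list.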
8. C.7 Modem Dialog Details

Index

Examples

2-1 SROM Errors Reported at Power-Up
2-2 XSROM Errors Reported at Power-Up
3-1 Test Command Syntax
3-2 Releasing/Reestablishing Secure Mode
3-3 Sample Test Command
3-4 Sample Test Memory Command
3-5 Sample Test Command for PCI
3-6 Show Power
3-7 Show Memory
3-8 Show FRU
4-1 MCHK 670
4-2 MCHK 670 CPU and IOD Detected Failure
4-3 MCHK 670 Read Dirty Failure
4-4 MCHK 660 IOD Detected Failure (System Bus Error)
4-5 MCHK 660 IOD Detected Failure (PCI Error)
9. [Figure: power-up sequence (PKW0432B-96); box labels: SROM tests execute; XSROM loaded into each CPU's S-cache; SRM console loaded into memory; SRM console tests execute; SRM console either remains in the system or loads AlphaBIOS console]

Definitions

SROM: The SROM is a 128-Kbit ROM on each CPU module. The ROM contains minimal diagnostics that test the Alpha chip and the path to the XSROM. Once the path is verified, it loads XSROM code into the Alpha chip and jumps to it.

XSROM: The XSROM, or extended SROM, contains back-up cache and memory tests, the I/O subsystem tests for embedded devices, and a fail-safe loader. The XSROM code resides in sector 0 of FEPROM 0 on the XBUS. Sector 2 of FEPROM 0 contains a duplicate copy of the code and is used if sector 0 is corrupt. Code for sizing DIMM memory resides in sector 1 of FEPROM 0 along with the PAL code.

FEPROM: Two 1-Mbyte programmable ROMs (FEPROMs) are on the XBUS on PCI0. FEPROM 0 contains two copies of the XSROM, the OpenVMS and DIGITAL UNIX PAL code, and the SRM console and decompression code. FEPROM 1 contains the AlphaBIOS and NT HAL code. See Figure 2-3. These two FEPROMs can be flash updated. Refer to Appendix A.

Figure 2-3 Contents of FEPROMs. FEPROM 0 sectors: XSROM and Fail-Safe Ldr; PAL Code, XSROM DIMM; XSROM and Fail-Safe Ldr; Decompress, SRM Console S Code
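The fallback described in the XSROM definition (use sector 0, and use the duplicate in sector 2 if sector 0 is corrupt) can be sketched as follows. This is an illustration of the selection logic only, not the SROM's actual code: the function name and the `sector_is_valid` callback are hypothetical stand-ins for whatever header and checksum checks the firmware performs.

```python
from typing import Callable, Optional

# Sector order taken from the text: primary copy first, duplicate second.
XSROM_SECTORS = (0, 2)


def select_xsrom_sector(sector_is_valid: Callable[[int], bool]) -> Optional[int]:
    """Return the first FEPROM 0 sector holding a valid XSROM copy,
    or None if both copies fail validation."""
    for sector in XSROM_SECTORS:
        if sector_is_valid(sector):
            return sector
    return None


if __name__ == "__main__":
    # Simulate a corrupt sector 0 with an intact duplicate in sector 2.
    corrupt = {0}
    print(select_xsrom_sector(lambda sector: sector not in corrupt))
```

Keeping the sector order in a single tuple makes the preference explicit: the duplicate is consulted only after the primary copy has been rejected.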
10. [Figure: removing a CPU module (IP00215)]

WARNING: CPU modules and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before touching any module.

Removal

1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Remove the memory riser card next to the CPU you are removing (see Section 6.6).
4. Loosen the two captive screws holding the module to the card cage.
5. The CPU is held in place with levers at both ends; simultaneously pull the levers away from the module handle and pull the CPU from the cage.

Replacement

Reverse the steps in the Removal procedure.

Verification: DIGITAL UNIX and OpenVMS Systems

1. Bring the system up to the SRM console by pressing the Halt button, if necessary.
2. Issue the show cpu command to display the status of the new module.

Verification: Windows NT Systems

1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
2. Using the arrow keys, select MC Bus Configuration to display the status of the new module.

6.5 CPU Fan Removal and Replacement

Figure 6-4 Removing CPU Fan (PKW0516-97)

Removal

1. Follow the CPU Removal and Replacement procedure.
2. Unplug the fan from the module.
3. Remove the four Phillips head screws holding the fan to the Alpha chip's heatsink
11. The show power command can be used to identify power, temperature, and fan faults.

Example 3-6 Show Power

    P00>>> show power
                            Status
    Power Supply 0          good
    Power Supply 1          good
    System Fans             good
    CPU Fans                good
    Temperature             good

    Current ambient temperature is 20 degrees C
    System shutdown temperature is set to 55 degrees C

    The system was last reset via a system software reset

    0 Environmental events are logged in nvram

The show memory command shows memory DIMMs and their starting addresses.

Example 3-7 Show Memory

    P00>>> show memory
    Slot   Type   MB    Base
    0      DIMM   256   0
    1      DIMM   256   20000000
    2      DIMM   256   40000000
    3      DIMM   256   60000000
    Total         1.2GB

The show fru command lists all FRUs in the system.

Example 3-8 Show FRU

    P00>>> show fru
    Digital Equipment Corporation
    AlphaServer 1200
    Console V5.0-2  OpenVMS PALcode V1.19-12, Digital UNIX PALcode V1.21-20

    Module               Part       Type   Rev    Name        Serial
    System Motherboard   25147-01   0      0000   mthrbrd0    NI72000047
    Memory 256 MB DIMM   N/A        0      0000   mem0        N/A
    Memory 256 MB DIMM   N/A        0      0000   mem1        N/A
    Memory 256 MB DIMM   N/A        0      0000   mem2        N/A
    Memory 256 MB DIMM   N/A        0      0000   mem3        N/A
    CPU (4MB Cache)      B3007-AA   3      0000   cpu         KA705TRVNS
    Bridge (IOD0/IOD1)   25147-01   600    0032   iod0/iod1   NI72000047
    PCI Motherboard      25147-01   a      0003   saddle0     NI72000047
    Bu
12. 5085553332

The dial-out string has the following requirements:

• The string cannot exceed 47 characters.
• Enclose the entire string following the set rem_dialout command in quotation marks.
• Enter the characters ATDT after the opening quotation marks. Do not mix case. Enter the character X after AT if the line to be used also carries voice mail.
• The valid characters for the dial-out string are the characters on a phone keypad: 0-9, *, and #. A comma (,) requests that the modem pause for 2 seconds, and a semicolon (;) is required to terminate the string.

The elements of the dial-out string are explained in Table C-3.

Table C-3 Elements of the Dial-Out String

Example string: ATXDT9,15085553333,,,,,,5085553332;

Element        Meaning
AT             Attention.
X              Forces the modem to dial "blindly" (not look for a dial tone). Enter X if the dial-out line modifies its dial tone when used for services such as voice mail.
D              Dial.
T              Tone (for touch-tone).
,              Pause for 2 seconds.
9,             In the example, 9 gets an outside line. Enter the number for an outside line if your system requires it.
15085553333    Dial the paging service.
,,,,,,         Pause for 12 seconds for the paging service to answer.
5085553332     Message, usually a call-back number for the paging service.
;              Return to command mode. Must be entered at end of string.

C.5 Using the RCM Switchpack

The RCM operating mode is controlled by a swi
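The dial-out string requirements can be checked mechanically before the string is entered. The sketch below is a hypothetical helper, not an RCM feature. It assumes that the 47-character limit applies to the text inside the quotation marks, that the keypad characters are 0-9, * and #, and that the optional X sits between AT and DT as in the Table C-3 example.

```python
import re

MAX_LENGTH = 47                      # limit stated in the requirements
KEYPAD = set("0123456789*#")         # assumption: phone keypad characters
PREFIX = re.compile(r"AT(X?)DT", re.IGNORECASE)


def check_dialout(string: str) -> list[str]:
    """Return a list of problems found in a dial-out string (empty if none)."""
    problems = []
    if len(string) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    prefix = PREFIX.match(string)
    if prefix is None:
        problems.append("must start with ATDT (or ATXDT)")
    letters = [c for c in string if c.isalpha()]
    if any(c.islower() for c in letters) and any(c.isupper() for c in letters):
        problems.append("mixes upper and lower case")
    if not string.endswith(";"):
        problems.append("must end with a semicolon")
    start = prefix.end() if prefix else 0
    end = len(string) - 1 if string.endswith(";") else len(string)
    bad = sorted({c for c in string[start:end] if c not in KEYPAD and c != ","})
    if bad:
        problems.append("invalid characters: " + "".join(bad))
    return problems


def pause_seconds(string: str) -> int:
    """Total modem pause requested: each comma is a 2-second pause."""
    return 2 * string.count(",")


if __name__ == "__main__":
    example = "ATXDT9,15085553333,,,,,,5085553332;"
    print(check_dialout(example), pause_seconds(example))
```

Returning a list of problems rather than a single pass/fail flag lets an operator fix every issue in one edit of the string.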
13. B.3 Halt Assertion

A halt assertion allows you to disable automatic boots of the operating system so that you can perform tasks from the SRM console.

Under certain conditions, you might want to force a "halt assertion." A halt assertion differs from a simple halt in that the SRM console "remembers" the halt. The next time you power up, the system ignores the SRM power-up script (nvram) and ignores any environment variables that you have set to cause an automatic boot of the operating system. The SRM console displays this message:

    Halt assertion detected
    NVRAM power-up script not executed
    AUTO_ACTION=BOOT/RESTART and OS_TYPE=NT ignored, if applicable

Halt assertion is useful for disabling automatic boots of the operating system when you want to perform tasks from the SRM console. It is also useful for disabling the SRM power-up script if you have accidentally inserted a command in the script that will cause a system problem. These conditions are described in the sections "Disabling Autoboot" and "Disabling the SRM Power-Up Script."

You can force a halt assertion using the Halt button, the RCM halt command, or the RCM haltin command. Observe the following guidelines for forcing a halt assertion.

Halt Assertion with Halt Button or RCM Halt Command

Press the Halt button on the local system (or enter the RCM halt command from a remote system) while the system is powering
14. (EI_STAT)
External Interface Address Register (EI_ADDR)
MC Error Information Register 0 (MC_ERR0, Offset 800)
MC Error Information Register 1 (MC_ERR1, Offset 840)
CAP Error Register (CAP_ERR, Offset 880)
PCI Error Status Register 1 (PCI_ERR1, Offset 1040)

Chapter 6 Removal and Replacement

6.1 System Safety
6.2 FRU List
6.3 System Exposure
6.4 CPU Removal and Replacement
6.5 CPU Fan Removal and Replacement
6.6 Memory Riser Card Removal and Replacement
6.7 DIMM Removal and Replacement
6.8 System Motherboard Removal and Replacement
6.9 PCI/EISA Option Removal and Replacement
6.10 Power Supply Removal and Replacement
6.11 Power Harness Removal and Replacement
6.12 System Fan
15. FEPROM Failures (PCI Motherboard Error)

Sector 0 failures (XSROM flash unload failure):

    Sctr 0 XSROM headr PTTRN fail
    Sctr 0 XSROM headr CHKSM fail
    Sctr 0 XSROM code CHKSM fail

Sector 2 failures (XSROM recovery flash unload failure):

    Sctr 2 XSROM headr PTTRN fail
    Sctr 2 XSROM headr CHKSM fail
    Sctr 2 XSROM code CHKSM fail

2.5 XSROM Power-Up Test Flow

Once the SROM has completed its tests and verified the path to the FEPROM containing the XSROM code, it loads the first 8 Kbytes of XSROM into the primary CPU's S-cache and jumps to it.

Figure 2-6 XSROM Power-Up Flowchart (box labels):

• Run memory tests; print trace to OCP/console device; print errors to OCP/console device; done message to console device
• Boot processor redetermination
• Primary verifies checksum of PAL/decomp/console code
• Pass: Primary unloads PAL/decompression code or fail-safe loader, depending upon results of checksum
• Fail: Fail-safe loader
• Primary jumps to PALcode and starts the console
• XSROM banner to OCP/console device
• Clear SC_FHIT (force hit)
• Enable all 3 S-cache banks
• Run B-cache tests; print errors to OCP/console device; done message to console device
• Boot processor redetermination
• Initialize B-cache and enable duplicate tag
• Size system memory through I squared C bus
• Print mem info to
16. MCHK 670 CPU detected failure 4 11 MCHK 670 read dirty failure 4 21 MCHK while in PAL 4 57 Memory 1 12 addressing 1 14 addressing rules 1 15 DIMM removal and replacement 6 14 DIMMs 1 15 operation 1 13 option configuration rules 1 13 variants 1 13 riser card removal and replacement 6 12 Memory DIMMs 1 12 6 3 Memory errors corrected read data error 4 53 read data substitute error 4 53 Memory pairs 1 13 Memory riser card 6 3 removal and replacement 6 12 Memory tests 2 14 2 21 Memory broken 4 53 Modem dial in procedure C 5 dialog details C 26 using in RCM C 3 Index 3 N Node IDs 4 56 NVRAM 2 3 2 8 O Operating the system remotely C 2 Operator control panel 1 4 removal and replacement 6 28 os_type environment variable SRM 2 7 2 23 P Page table entry invalid error 4 52 PALcode 2 23 PALcode described 4 57 PCI Error Status Register 1 5 14 PCI master abort 4 52 PCI parity error 4 52 PCI slot numbering 1 23 PCI system error 4 52 PCI EISA option removal and replacement 6 18 PCI_ERR Register 5 14 PIO buffer overflow error PIO_OVFL 4 51 Power circuit 1 28 failures 1 29 Power cords 6 4 Power error conditions 1 27 Power faults 1 33 Power harness removal and replacement 6 22 Power problems at power up 3 5 Power supply 1 30 fault protection 1 31 removal and replacement 6 20 voltages 1 31 Power system components 6 3 Power up down sequence 1 33 powero
17. Number of CPUs mpnum Event validity Event severity Entry type CPU Minor class Software Flags Active CPUs Hardware Rev System Serial Number lodule Serial Number lodule Type System Revision achine Check Reason EI STAT whip16 x00000016 AlphaServer 4000 1200 Series x00000002 CPU logging event mperr x00000000 1 by O S claims event is valid 3 High Priority 100 CPU Machine Check Errors 3 Bcache error 630 entry x00000000 x00000003 x00000000 C1563 x0000 x00000000 x0086 Alpha Chip Detected ECC Err From B Cache xFFFFFFF085FFFFFF DATA SOURCE IS BCACHE D ref fill EV56 Chip Rev 5 EI ADDRESS xFFFFFF00138D85EF FIL SYNDROME x00000000000800 4 ISR x0000000100200000 WHOAMI x00000000 Module Revision 0 MID 0 GID 0 Sys Environmental Regs x00000000 Base Addr of Bridge x00000000 Dev Type amp Rev Register x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg MC Error Info Register 0 x00000000 MC Bus Trans Addr lt 31 4 gt 0 MC Error Info Register 1 x00000000 MC bus trans addr lt 39 32 gt x00000000 CAP Error Register IPA Status Register MDPA Error Syndrome Reg PB Status Register MDPB Error Syndrome Reg PALcode Revision MC Command is Illegal Illegal Device ID 2 x00000000 x00000000 x00000000 MDPA Status Register Data Not Valid x0000
18. The poweroff command is equivalent to pressing the On/Off button on the control panel to the off position.

    RCM> poweroff

If the system is already powered off, or if switch 3 (RPD DIS) on the switchpack has been set to the on setting (disabled), this command has no immediate effect.

To power the system on again after using the poweroff command, you must issue the poweron command. If, for some reason, it is not possible to issue the poweron command, the local operator can start the system as follows:

1. Press the On/Off button to the off position and disconnect the power cord.
2. Reconnect the power cord and press the On/Off button to the on position.

poweron

The poweron command requests the RCM to power on the system. The poweron command is equivalent to pressing the On/Off button on the control panel to the on position. For the system power to come on, the following conditions must be met:

• AC power must be present at the power supply inputs.
• The On/Off button must be in the on position.
• All system interlocks must be set correctly.

The RCM exits command mode and reconnects the user's terminal to the system console port.

    RCM> poweron
    Focus returned to COM port

NOTE: If the system is powered off with the On/Off button, the system will not power up. The RCM will not override the "off" state of the On/Off button. If the system is already powered on, the poweron command has no effec
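The conditions attached to poweroff and poweron reduce to two boolean checks. The sketch below only restates those conditions in code; the function and parameter names are hypothetical and do not correspond to any RCM interface.

```python
def poweroff_takes_effect(system_is_on: bool, rpd_dis_switch_on: bool) -> bool:
    """poweroff has no immediate effect if the system is already off or
    if switch 3 (RPD DIS) is set to on (remote power down disabled)."""
    return system_is_on and not rpd_dis_switch_on


def poweron_can_start(ac_present: bool, button_in_on_position: bool,
                      interlocks_ok: bool) -> bool:
    """All three listed conditions must hold for system power to come on.
    The RCM does not override an On/Off button left in the off position."""
    return ac_present and button_in_on_position and interlocks_ok


if __name__ == "__main__":
    # Button left in the off position: poweron cannot start the system.
    print(poweron_can_start(ac_present=True, button_in_on_position=False,
                            interlocks_ok=True))
```

Writing the checks this way makes the NOTE explicit: a remote poweron cannot succeed while the local On/Off button is in the off position, whatever the other inputs are.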
19. monitors and controls the system remotely. The control logic resides on the system board.

The RCM is a separate console from the SRM and AlphaBIOS consoles. The RCM is run from a serial console terminal or terminal emulator. A command interface lets you reset, halt, and power the system on or off, regardless of the state of the operating system or hardware. You can also use RCM to monitor system power and temperature.

You can invoke the RCM either remotely or through the local serial console terminal. Once in RCM command mode, you can enter commands to control and monitor the system. Only one RCM session can be active at a time.

• To connect to the RCM remotely, you dial in through a modem, enter a password, and then type an escape sequence that invokes RCM command mode. You must set up the modem before you can dial in remotely.
• To connect to the RCM locally, you type the escape sequence at the SRM console prompt on the local serial console terminal.

When you are not monitoring the system remotely, you can use the RCM dial-out alert feature. With dial-out alerts enabled, the RCM dials a paging service to alert you about a power failure within the system.

CAUTION: Do not issue RCM commands until the system has powered up. If you enter certain RCM commands during power-up or reset, the system may hang. In that case you would have to disconnect the power cord at the power outlet. You can, however, use the RCM halt command during power-up to f
20. 00000000 00000000 00000000 00000000 000047cf ffffff00 0000 000 00000000 ff7fefff ftfftere 3 O4ffffff fffffff0O 000000a7 00000000 0004eaef ffffff00 0210 0214 0318 031c 0320 0324 0338 033c 0340 0344 0348 034c 0350 0354 0358 035c 0360 0364 0368 036c 0370 0374 0378 037c 0380 0384 0388 038c 0390 0394 0398 039c 03a0 03a4 03a8 O03ac 03b0 03b4 03b8 03be 03c0 03c4 03c8 03cc 03d0 03d4 03d8 03dc 03e0 03e4 03e8 03ec 03 0 03f4 03 8 03fc 0400 0404 0410 0414 0418 041c Eror Logs 4 53 Example 4 9 INFO 5Command P00 gt gt gt info 5 cpu00 per_cpu logout area mchk crd_flag mchkS crd_flag 4 mchkScrd_offsets mchkS crd_offsets 4 mchk crd_mchk_code mchk crd_mchk_code 4 mchkS crd_ei_stat k crd_ei_stat 4 k crd_ei_addr k crd_ei_addr 4 k crd_fill_syn kScrd_fill_syn 4 kScrd_isr kScrd_isr 4 mchk flag mchk flag 4 mchkS isr mchkS isr 4 mchkS icsr mchkSicsr 4 mchk ic_perr_stat mchk ic_perr_stat 4 mchk dc_perr_stat mchk dc_perr_stat 4 mchk va k va 4 k mm_stat k mm_stat 4 k sc_addr k sc_addr 4 k sc_stat k sc_stat 4 mchk bc_tag_addr mchk bc_tag_addr 4 mchk ei_addr mchk ei_addr 4 mchk fill_syn mchk fill_syn 4 mchk ei_stat mchk ei_stat 4 mchk 1d_lock mchk ld_lock 4 3 Q PRR yyy ORRO 3 Q poppy ap IOD 0 base address 9e0000000 WHOAMI 0000003a PCI_REV CAP_CTL 02490fb1 HAE_MEM INT_CTL 00000003
21. 1-1 PCI Motherboard Slot Numbering

Slot   PCI0                        PCI1
1      PCI to EISA/ISA bridge      Internal CD-ROM controller
2      PCI slot                    PCI slot
3      PCI slot                    PCI slot
4      PCI slot                    PCI slot

The logic for two PCI buses is on each PCI motherboard.

• PCI0 is a 64-bit bus with a built-in PCI to EISA/ISA bus bridge. PCI0 has three PCI slots and one EISA/ISA slot. When the EISA/ISA slot is used, PCI slot 4 on PCI bus 1 is not available. An 8-bit XBUS is connected to the EISA/ISA bus. On this bus there is an interface to the system I2C bus; mouse and keyboard support; an I/O combo controller supporting two serial ports, the floppy controller, and a parallel port; a real-time clock; two 1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.
• PCI1 is a 64-bit bus with a built-in CD-ROM SCSI controller, with three PCI slots.

Cable connectors to the CD-ROM, the floppy, and the OCP are on the motherboard. Connectors for the mouse, keyboard, two COM ports, the serial port, and a modem are on the system bulkhead. The bulkhead is part of the system motherboard.

1.8.4 Remote Control Logic

A section of the motherboard provides remote control operation of the system. A four-switch switchpack enables or disables remote control features.

Figure 1-12 Remote Control Logic System Motherboard
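The slot-numbering table and the EISA/ISA restriction can be expressed as a short lookup. This is a minimal sketch with a hypothetical function name; it encodes only what the table and the PCI0 bullet state (slots 2 through 4 on each bus, with slot 4 on PCI bus 1 lost when the EISA/ISA slot is in use).

```python
def usable_pci_slots(eisa_slot_in_use: bool) -> list[tuple[int, int]]:
    """Return the usable PCI option slots as (bus, slot) pairs."""
    # Slot 1 on each bus is taken by a built-in device, so options
    # occupy slots 2-4 on PCI0 and PCI1.
    slots = [(bus, slot) for bus in (0, 1) for slot in (2, 3, 4)]
    if eisa_slot_in_use:
        # Using the EISA/ISA slot makes PCI slot 4 on bus 1 unavailable.
        slots.remove((1, 4))
    return slots


if __name__ == "__main__":
    print(len(usable_pci_slots(eisa_slot_in_use=False)))
    print(len(usable_pci_slots(eisa_slot_in_use=True)))
```

Either way the system holds six I/O options in total: six PCI options, or five PCI options plus one EISA/ISA option.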
22. 1 System Documentation

Title                                                  Order Number
User and Installation Documentation Kit                QZ-011AA-GW
  AlphaServer 1200 User's Guide                        EK-AS120-UG
  AlphaServer 1200 Basic Installation                  EK-AS120-IG
User and Installation Documentation Kit                QZ-013AA-GW
  DIGITAL Ultimate Workstation 533 User's Guide        EK-UW120-UG
  DIGITAL Ultimate Workstation 533 Basic Installation  EK-UW120-IG
Service Information
  AlphaServer 1200 / DIGITAL Ultimate Workstation 533
  Service Manual                                       EK-AS120-SV

Information on the Internet

Using a Web browser, you can access the AlphaServer InfoCenter at:

    http://www.digital.com/info/alphaserver/products.html

Access the latest system firmware either with a Web browser or via FTP as follows:

    ftp://ftp.digital.com/pub/Digital/Alpha/firmware/

Interim firmware released since the last firmware CD is located at:

    ftp://ftp.digital.com/pub/Digital/Alpha/firmware/interim/

Chapter 1 System Overview

The DIGITAL AlphaServer 1200 and DIGITAL Ultimate Workstation 533 systems are made from the same base system unit. The base unit consists of up to two CPUs, up to 2 Gbytes of memory, 6 I/O slots, and up to 7 SCSI storage devices. Both systems are enclosed in pedestals. AlphaServer 1200 systems can be mounted in a standard 19-inch rack. AlphaServer 1200 systems support OpenVMS, DIGITAL UNIX, and Windows NT. Ultimate Workstation 533 systems support Windows NT and graphics.

Topics in this chapter include the following: S
23. 2 Summary of SRM Console Commands

Command          Function
alphabios        Loads and starts the AlphaBIOS console.
boot             Loads and starts the operating system.
clear envar      Resets an environment variable to its default value.
clear password   Sets the password to 0.
continue         Resumes program execution.
crash            Forces a crash dump at the operating system level.
deposit          Writes data to the specified address.
edit             Invokes the console line editor on a RAM file or on the nvram file (power-up script).
examine          Displays the contents of a memory location, register, or device.
halt             Halts the specified processor. (Same as stop.)
help             Displays information about the specified console command.
info num         Displays various types of information about the system:
                 Info shows a list describing the num qualifier.
                 Info 3 reads the impure area that contains the state of the CPU before it entered PAL mode.
                 Info 5 reads the PAL-built logout area that contains the data used by the operating system to create the error entry.
                 Info 8 reads the IOD and IOD1 registers.
initialize       Resets the system.
lfu              Runs the Loadable Firmware Update Utility.

Table B-2 Summary of SRM Console Commands (Continued)

Command          Function
login            Turns off secure mode, enabling access to all SRM console commands during the current session.
man              Displays information about the specif
24. 2-2 SROM Tests
2-3 XSROM Tests
2-4 Memory Tests
2-5 IOD Tests
2-6 PCI Motherboard Tests
4-1 Types of Error Log Events
4-2 DECevent Report Formats
4-3 CAP Error Register Data Pattern
4-4 System Bus ECC Error Data Pattern
4-5 System Bus Nonexistent Address Error Troubleshooting
4-6 Address Parity Error Troubleshooting
4-7 Cause of PIO_OVFL Error
4-8 ECC Syndrome Bits Table
4-9 Decoding Commands
4-10 Node IDs
5-1 External Interface Status Register
5-2 Loading and Locking Rules for External Interface Registers
5-3 MC Error Information Register 0
5-4 MC
25. <31:0> even for a PCI DAC cycle. When the PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined.

[Figure: register layout (PKW0551C-97); bits <31:0> hold the failing address, ADDR<31:0>]

Table 5-6 PCI Error Status Register 1

Name         Bits     Type   Initial State   Description
ADDR<31:0>   <31:0>   RO     0               Contains address bits <31:0> of the transaction on the PCI bus when an error is detected.

Chapter 6 Removal and Replacement

This chapter describes removal and replacement procedures for field-replaceable units (FRUs).

6.1 System Safety

Observe the safety guidelines in this section to prevent personal injury.

CAUTION: Wear an antistatic wrist strap whenever you work on a system.

WARNING: When the system interlocks are disabled and the system is still powered on, voltages are low in the system but current is high. Observe the following guidelines to prevent personal injury.

1. Remove any jewelry that may conduct electricity before working on the system.
2. If you need to access the system card cage, power down the system and wait 2 minutes to allow components in that area to cool.

6.2 FRU List

Figure 6-1 shows the locations of FRUs, and Table 6-1 lists the part numbers of all field-replaceable units.

Figure 6-1 System FRU Locations (callouts: Disks, OCP and Display, CD R
26. 4.1 Summary of SRM Environment Variables
    Environment variables pass configuration information between the console and the operating system. Their settings determine how the system powers up, boots the operating system, and operates. Environment variables are set or changed with the set envar command and returned to their default values with the clear envar command. Their values are viewed with the show envar command. The SRM environment variables are specific to the SRM console.
    Table B-3 Environment Variable Summary
    Environment Variable   Function
    auto_action            Specifies the console's action at power-up, a failure, or a reset.
    bootdef_dev            Specifies the default boot device string.
    boot_osflags           Specifies the default operating system boot flags.
    com*_baud              Changes the default baud rate of the COM1 or the COM2 serial port.
    console                Specifies the device on which power-up output is displayed (serial terminal or graphics monitor).
    cpu_enabled            Enables or disables a specific secondary CPU.
    ew*0_mode              Specifies the connection type of the default Ethernet controller.
    ew*0_protocols         Specifies network protocols for booting over the Ethernet controller.
    kbd_hardware_type      Specifies the default console keyboard type.
    kzpsa*_host_id         Specifies the default value for the KZPSA host SCSI bus node ID.
    language               Specifies the console keyboard layout.
27. 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals and clocks. The system bus is part of the system motherboard.
    Figure 1-9 System Bus Block Diagram (PKW0506-97; figure not reproduced). Recoverable labels: memory (DRAMs, address and control), MC bus control and arbiter, CPU0 and CPU1, MC ADR<39:4>, MC DATA<127:0>, MC-to-PCI bridges IOD0 and IOD1.
    The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, several control signals and clocks, and a bus arbiter. The bus requires that all CPUs have the same high-speed oscillator providing the clock to the Alpha chip. The system bus connects up to two CPUs, up to eight DIMM memory pairs on two riser cards, and two I/O bus bridges. The system bus clock is provided by an oscillator on the CPU in slot CPU0. This oscillator is adjusted to maintain the system bus at a 66 MHz speed, no matter what the speed of the CPU is. The system bus backplane initiates memory refresh transactions. Five-volt, 3.43-volt, and 12-volt power is provided directly to the motherboard from the power supplies.
    1.8.2 System Bus to PCI Bus Bridge
    The bridge
28. 6 x800ED800 MC bus trans addr lt 39 32 gt x00000000 MC_Command x00000018 Device Id x0000003B Eror Logs 417 CAP Error Register PCI Bus Trans Error Adr IPA Status Register DPA Error Syndrome Reg PB Status Register DPB Error Syndrome Reg IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Memory Host Addr Exten IO Host Addr Extension Interrupt Control Interrupt Request Interrupt Mask Register 0 Interrupt Mask Register 1 MC Error Info Register 0 MC Error Info Register 1 CAP Error Register PCI Bus Trans Error Adr IPA Status Register DPA Error Syndrome Reg MC er xC0000000 x000003FD x00000000 x00000000 x00000000 x00000000 x000000BF x000000FBEO x06008021 x46480FF1 ror info valid Uncorrectable ECC err det by MDPB MC error info latched MDPA Chip Revision x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 MDPB Chip Revision x00000000 MPDB Error Syndrome of uncorrectable read error x00000000 x00000000 x00000000 x00000000 Cycle Cycle Cycle Cycle 0 ECC Syndrome 1 ECC Syndrome 2 ECC Syndrome 3 ECC Syndrome IOD 1 Register Subpacket Device ID x0000003F Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 000000 CAP Chip Revision Host to PCI Revision x00000003 I O Backplane Rev
29. Code (NFPA 70) or Safety of Information Technology Equipment, Including Electrical Business Equipment (EN 60 950).
    Removal
    1. Shut down the operating system and power down the system.
    2. Expose the card cage side of the system (see Section 6.3).
    3. To remove the faulty option: disconnect cables connected to the option; remove cables to other options that obstruct the option you are removing; unscrew the small Phillips head screw securing the option to the card cage; slide the option from the system.
    Replacement
    Reverse the steps in the Removal procedure.
    Verification (DIGITAL UNIX and OpenVMS Systems)
    1. Power up the system (press the Halt button if necessary to bring up the SRM console) and run the ECU to restore EISA configuration data.
    2. Issue the show config command or show device command at the console prompt to verify that the system sees the option you replaced.
    3. Run any diagnostic appropriate for the option you replaced.
    Verification (Windows NT Systems)
    1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
    2. Using the arrow keys, select PCI Configuration or EISA Configuration to determine that the new option is listed.
    6.10 Power Supply Removal and Replacement
    Figure 6-9 Removing Power Supply. Figure labels: 4 rear screws (6-32 inch), Power Supply 1, Power Supply 0, 2 internal sc
30. DAC960 and the DEC_KZPSA. Either device could have caused the parity error. Since this is an MCHK 660, the IOD detected the error on the bus, and CPU0 is logging the error. CPU0 registers are not important in this case, since it is servicing the IOD interrupt. There are three devices that can put data on the system bus: CPUs, memory, or an IOD. The CAP Error Register for IOD1 saw a serious error, and the MC Error Info Register was not able to capture error information. The presence of PCI subpackets informs the diagnosis.
    NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs.
    Example 4-5 MCHK 660 IOD-Detected Failure (PCI Error)
    Log fields shown: Timestamp of occurrence, Host name, System type register, Number of CPUs (mpnum), CPU logging event (mperr), Event validity, Event severity, Entry type, CPU Minor class, Software Flags, Active CPUs, Hardware Rev, System Serial Number, Module Serial Number, Module Type, System Revision, MCHK 660 Regs, Flags, PCI Mask, Machine Check Reason, PAL SHADOW (repeated), PALTEMP 0, PALTEMP 1, PALTEMP 2, ..., PALTEMP 22, PALTEMP 23, Exception Address Reg, Exception Summary Reg, Exception Mask Reg, PAL
31. Data Pattern                                Most Likely Cause         Action
    1000 0000 000x xxx0 10xx xxxx xxxx xxxx     Data sourced by MID 2     Replace CPU0
    1000 0000 000x xxx0 11xx xxxx xxxx xxxx     Data sourced by MID 3     Replace CPU1
    1000 0000 000x xxx1 00xx xxxx xxxx xxxx     Data sourced by MID 4     Replace Mbrd
    1000 0000 000x xxx1 01xx xxxx xxxx xxxx     Data sourced by MID 5     Replace Mbrd
    4.4.4 PIO Buffer Overflow Error (PIO_OVFL)
    Step 5. Enter the value of the CAP_CTRL register bits <19:16> (Actual_PEND_NUM) in the following formula. Compare the results as indicated in Table 4-7 to determine the most likely cause of the error. When an IOD is implicated in the analysis of the error, replace the one that captured the error in its CAP Error Register.
    Expected_PEND_NUM = 12 - ((2 * (X - 1)) + Y)
    where X = number of PCIs and Y = number of CPUs.
    Table 4-7 Cause of PIO_OVFL Error
    Comparison                               Most Likely Cause           Action
    Actual_PEND_NUM = Expected_PEND_NUM      Broken hardware on IOD      Replace Mbrd
    Actual_PEND_NUM < Expected_PEND_NUM      Broken hardware on IOD      Replace Mbrd
    Actual_PEND_NUM > Expected_PEND_NUM      PEND_NUM setup incorrect    Fix the software
    4.4.5 Page Table Entry Invalid Error
    Step 6. This error is almost always a software problem. However, if the software is known to be good and the hardware is suspected, swap the motherboard.
    4.4.6 PCI Master Abort
    Step 7. Master aborts normally occur when the operating system is
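The Step 5 comparison can be sketched in a few lines. This is a minimal illustration, not DIGITAL's diagnostic code: it assumes the formula reads Expected_PEND_NUM = 12 - ((2 * (X - 1)) + Y) (the operators are damaged in this copy of the manual) and that the first row of Table 4-7 is an equality test. The function names are hypothetical.

```python
def expected_pend_num(num_pcis: int, num_cpus: int) -> int:
    """Expected PEND_NUM, assuming the formula 12 - ((2 * (X - 1)) + Y)."""
    return 12 - ((2 * (num_pcis - 1)) + num_cpus)


def actual_pend_num(cap_ctrl: int) -> int:
    """Extract Actual_PEND_NUM from CAP_CTRL bits <19:16>."""
    return (cap_ctrl >> 16) & 0xF


def diagnose_pio_ovfl(cap_ctrl: int, num_pcis: int, num_cpus: int) -> str:
    """Apply the Table 4-7 comparison (first row assumed to be '=')."""
    actual = actual_pend_num(cap_ctrl)
    expected = expected_pend_num(num_pcis, num_cpus)
    if actual > expected:
        return "PEND_NUM setup incorrect: fix the software"
    # actual == expected or actual < expected
    return "Broken hardware on IOD: replace the motherboard"


if __name__ == "__main__":
    # Hypothetical CAP_CTRL value with bits <19:16> = 8, two PCIs, two CPUs.
    print(diagnose_pio_ovfl(0x00080000, num_pcis=2, num_cpus=2))
```

Under these assumptions a system with two PCI buses and two CPUs gives an expected value of 8, which agrees with the "PEND_NUM Threshold 8" line in the error log examples if those logs come from such a system.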
32. Flt Sts Reg x0000000000005F10 If Err Reference Resulted in DTB Miss Fault Inst RA Field x000000000000001C Fault Inst Opcode x000000000000000B Scache Address Reg xFFFFFF0000018FEF Scache Status Reg x0000000000000000 Bcache Tag Address Reg xFFFFFF8061CDOFFF Last Bcache Access Resulted in a Miss Value of Parity Bit for Tag Control Status Bits Dirty Shared amp Valid is Clear Value of Tag Control Dirty Bit is Clear Value of Tag Control Shared Bit is Clear Value of Tag Control Valid Bit is Set Value of Parity Bit Covering Tag Store Address Bits is Clear Tag Address lt 38 20 gt Is x000000000000061C Ext Interface Address Reg xFFFFFFO06000050F Fill Syndrome Reg x0000000000000C0C Ext Interface Status Reg xFFFFFFFOO5FFFFFF Error Occurred During D ref Fill LD LOCK xFFFFFF00002006FF IOD SUBPACKET gt IOD 0 Register Subpacket WHOAMI x000002FA Module Revision 0 VCTY ASIC Rev 1 Bcache Size 4MB CPU 0 This Bus Bridge Phy Addr x000000F9E0000000 IOD 0 Dev Type amp Rev Register x0600A332 CAP Chip Revision x00000002 Host to PCI Revision x00000003 Command Register x46480FF1 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host Bus to PCI Bridg MC PCI Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address
33. Harness
    6-11 Removing System Fan
    6-12 Removing Cover Interlocks
    6-13 Removing OCP
    6-14 Removing CD-ROM
    6-15 Removing Floppy
    6-16 Removing StorageWorks Disk
    6-17 Removing StorageWorks Backplane
    6-18 Removing StorageWorks Ultra SCSI Bus Extender
    A-1 Running a Utility from a Graphics Monitor
    A-2 Starting LFU from the AlphaBIOS Console
    A-3 AlphaBIOS Setup Screen
    A-1 System Partition Not Defined
    C-1 RCM Connections
    C-2 Location of RCM Switchpack on System Board
    C-3 RCM Switches (Factory Settings)
    Tables
    1-1 PCI Motherboard Slot Numbering
    2-1 Control Panel Display
34. Ref resulted in DTB miss RA Field x0000000006 Eror Logs 426 Scache Address Reg Scache Status Reg Bcache Tag Address Reg Ext Interface Address Reg Fill Syndrome Reg Ext Interface Status Reg LD LOCK IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Memory Host Addr Exten IO Host Addr Extension Interrupt Control Interrupt Request Interrupt Mask Register 0 Interrupt Mask Register 1 MC Error Info Register 0 MC Error Info Register 1 Opcode Field x00000000000029 xFFFFFFO000024EAF 00000000 xXFFFFFF80FFED6FFF Parity for ds and v bits Cache block dirty Cache block valid Tag address lt 38 20 gt is x00000000000FFE xFFFFFFOOFCOOQOOO0F x0000000000C5D2 xFFFFFFFOO4FFFFFE Error occurred during D ref fill e xFFFFFF000020065F IOD 0 Register Subpacket x000000BA Device ID x0000003A Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 x000000F 9E0000000 x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg x46480FF1 Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions
35. Removal and Replacement
    Cover Interlock Removal and Replacement
    Operator Control Panel Removal and Replacement
    CD-ROM Removal and Replacement
    Floppy Removal and Replacement
    SCSI Disk Removal and Replacement
    StorageWorks Backplane Removal and Replacement
    StorageWorks Ultra SCSI Bus Extender Removal and Replacement
    Appendix A Running Utilities
    A.1 Running Utilities from a Graphics Monitor
    A.2 Running Utilities from a Serial Terminal
    A.3 Running ECU
    A.4 Running RAID Standalone Configuration Utility
    A.5 Updating Firmware with LFU
    A.5.1 Updating Firmware from the CD-ROM
    A.5.2 Updating Firmware from the Floppy Disk: Creating the Diskettes
    A.5.3 Updating Firmware from the Floppy Disk: Performing the Update
    A.5.4 Updating Firmware from a Network Device
    A.5.5 LFU Commands
36. a network device Following the examples is an LFU command reference Example A 2 Booting LFU from the CD ROM POO gt gt gt show dev ncr0O polling ncrO NCR 53C810 slot 1 bus 0 PCI hose 1 SCSI Bus ID 7 dka500 5 0 1 1 DKa500 RRD46 1645 POO gt gt gt boot dka500 boot dka500 5 0 1 1 flags 0 0 block 0 of dka500 5 0 1 1 is a valid boot block Jumping to bootstrap code The default bootfile for this platform is AS1200 AS1200_LFU EXE Hit lt RETURN gt at the prompt to use the default bootfile Running Utilities A 7 A 5 1 Updating Fimware from the CD ROM Insert the update CD ROM start LFU and select cda0 as the load device Example A 3 Updating Firmware from the CD ROM xxxx TLoadable Firmware Update Utility Select firmware load device cda0 dva0 ewa0 or Press lt return gt to bypass loading and proceed to LFU cda0 oO Pleas nter the name of the options firmware files list or Press lt return gt to use the default filename AS1200FW AS1200CP Copying AS1200CP from DKA500 5 0 1 1 Copying asl1200 TCREADME from DKA500 5 Copying asl1200 TCSRMROM from DKA500 5 0l x 0 1 1 Copying asl1200 TCARCROM from DKA500 5 0 1 1 Function Description Display Displays the system s configuration table Exit Done exit LFU reset List Lists the device revision firmware name and update revision Lfu Restarts LFU Readme Li
37. amp fe0v0x0s0 2 V 34 V FC FAX Managing the System Remotely C 27 command RCM C 11 A Achitecture block diagram 1 8 2 6 alert_clr command RCM C 8 alert_dis command RCM C 8 alert_ena command RCM C 8 Alpha 21164 microprocessor 1 8 Alpha chip composition 1 11 AlphaBIOS console 1 7 loading 2 7 upgrading A 25 auto_action environment variable SRM 2 23 B B3007 AA CPU module 1 11 B3007 CA CPU module 1 11 B cache 2 21 2 23 C CAP chip 1 21 CAP Error Register 5 11 CAP Error Register Data Pattern 4 47 CAP_ERR Register 5 11 CD ROM removal and replacement 6 30 COM1 port 2 19 Command codes 4 55 Command summary SRM B 2 Console Index SRM 2 23 Console commands show fru 3 15 show memory 3 14 show power 3 14 test 3 8 test memory 3 10 test pci 3 12 Console device determination 2 18 Console device options 2 19 Console device changing 2 19 console environment variable SRM 2 21 2 23 Console power up tests 2 16 Control panel 2 2 display 2 21 Halt assertion 1 5 Halt button 1 4 1 5 messages in display 2 3 Reset button 1 5 Controls Halt assertion 1 5 Halt button 1 5 On Off button 1 4 Reset button 1 5 Cover interlock 1 3 1 28 overriding 1 29 removal and replacement 6 26 CPU module 1 10 configuration rules 1 11 fan removal and replacement 6 10 removal and replacement 6 8 variants 1 11 CPU modules 1 9 6 3 Index 1 D Data path chip 1 21 DECeve
38. bit is cleared and the registers are not unlocked or cleared. Software must reexecute the IPR read sequence. On the second read operation, error bits are in the 0,1,x state, all the related IPRs are unlocked, and EI_STAT is cleared.
    5.3 MC Error Information Register 0 (MC_ERR0, Offset 800)
    The low-order MC bus (system bus) address bits are latched into this register when the system bus to PCI bus bridge detects an error event. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.
    Register layout (PKW0551-97): bits <31:4> hold the Failing Address, ADDR<31:4>; bits <3:0> read as 0.
    Table 5-3 MC Error Information Register 0
    Name          Bits      Type   Initial State   Description
    ADDR<31:4>    <31:4>    RO     0               Contains the address of the transaction on the system bus when an error is detected.
    Reserved      <3:0>     RO     0
    5.4 MC Error Information Register 1 (MC_ERR1, Offset 840)
    The high-order MC bus (system bus) address bits and error symptoms are latched into this register when the system bus to PCI bus bridge detects an error. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this regis
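As a small illustration of the MC_ERR0 layout in Table 5-3, the sketch below pulls the ADDR<31:4> field out of a raw register value and rebuilds the failing address. The function names are hypothetical, and the caller is assumed to supply address bits <39:32> from MC_ERR1 and the MC_ERR_VALID bit from the CAP Error Register, since their exact bit positions are not given in this excerpt.

```python
from typing import Optional

MC_ERR0_ADDR_MASK = 0xFFFFFFF0  # ADDR<31:4>; bits <3:0> are reserved


def mc_err0_addr_field(mc_err0: int) -> int:
    """Return the ADDR<31:4> field as a right-justified value."""
    return (mc_err0 & 0xFFFFFFFF) >> 4


def failing_address(mc_err0: int, addr_high: int = 0,
                    mc_err_valid: bool = True) -> Optional[int]:
    """Combine MC_ERR0 with address bits <39:32> (supplied by the caller).

    Returns None when MC_ERR_VALID is clear, because the register
    contents are undefined in that case.
    """
    if not mc_err_valid:
        return None
    return ((addr_high & 0xFF) << 32) | (mc_err0 & MC_ERR0_ADDR_MASK)


if __name__ == "__main__":
    # Value taken from the error log example: MC Error Info Register 0 = x4A26DBF0
    print(hex(mc_err0_addr_field(0x4A26DBF0)))
    print(hex(failing_address(0x4A26DBF0)))
```

The error log example reports "MC bus trans addr <31:4> x04A26DBF" for a register value of x4A26DBF0, which matches the right-justified field extracted here.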
39. console device and run AlphaBIOS as the console program.
    During power-up, the SROM and the XSROM always send progress and error messages to the OCP, and to the COM1 serial port if the SRM console environment variable (set with the set console command) is set to serial. If the console environment variable is set to graphics, no messages are sent to COM1. If the console device is connected to COM1, the SROM, XSROM, and console power-up messages are sent to it once it has been initialized. If the console device is a graphics device, console power-up messages are sent to it, but SROM and XSROM power-up messages are lost. No matter what the console environment variable setting, each of the three programs sends messages to the control panel display.
    Messages Sent By   Console Set to Serial   Console Set to Graphics
    SROM               COM1                    Lost, though a subset is sent to the OCP
    XSROM              COM1                    Lost, though a subset is sent to the OCP
    SRM console        COM1                    VGA, though a subset is sent to the OCP
    Changing Where the Console Output Is Displayed
    You can change where console output is displayed, assuming the SRM console has fully powered up and the os_type environment variable is set to openvms or unix. The following does not work if os_type is set to nt. If the console environment variable is set to serial and no serial terminal is attached to COM1, pressing a carriage return on a graphics monitor attached to the system makes it the console device, and the console prompt is sent
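The routing table above can be expressed as a small lookup. This is only an illustrative sketch of the table, with hypothetical names; it is not code from the console firmware.

```python
# Where each power-up program's messages go, keyed by the value of the
# SRM "console" environment variable. None means the messages are lost.
ROUTING = {
    "SROM": {"serial": "COM1", "graphics": None},
    "XSROM": {"serial": "COM1", "graphics": None},
    "SRM console": {"serial": "COM1", "graphics": "VGA"},
}


def message_destinations(program: str, console_envar: str) -> list:
    """Return the devices that receive power-up messages from a program.

    All three programs send (a subset of) their messages to the control
    panel display (OCP) regardless of the console setting.
    """
    destinations = ["OCP"]
    device = ROUTING[program][console_envar]
    if device is not None:
        destinations.append(device)
    return destinations


if __name__ == "__main__":
    for program in ROUTING:
        for setting in ("serial", "graphics"):
            print(program, setting, message_destinations(program, setting))
```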
40. Most Likely Cause             Action
    (cause cut off)               Go to Step 10
    (cause cut off)               Go to Step 10
    data from a CPU               Replace CPU(s)
    Bad dirty data from a CPU     Replace CPU(s)
    Bad data from MID 2           Replace CPU0
    Bad data from MID 3           Replace CPU1
    Bad data from MID 4           Replace Mbrd
    Bad data from MID 5           Replace Mbrd
    Bad data from MID 4           Replace Mbrd
    Bad data from MID 5           Replace Mbrd
    Bad data from MID 6           Replace Mbrd
    Bad data from MID 7           Replace Mbrd
    4.4.2 System Bus Nonexistent Address Error
    Step 3. Determine which node (if any) should have responded to the command/address identified in MC_ERR1. Perform the action indicated.
    Table 4-5 System Bus Nonexistent Address Error Troubleshooting
    MC_ERR1 Data Pattern                        Most Likely Cause                                      Action
    1000 0000 000x xxxx xxxx xxxx 0xxx xxxx     Software generated an MC ADDR > TOP_OF_MEM reg         Fix software
    1000 0000 0000 xxxx xxxx xxxx 1xxx 100x     PCI0 bridge did not respond                            Replace Mbrd
    1000 0000 0000 xxxx xxxx xxxx 1xxx 101x     PCI1 bridge did not respond                            Replace Mbrd
    1000 0000 0000 xxxx xxxx xxxx 1xxx 110x     PCI2 bridge did not respond                            Replace Mbrd
    1000 0000 0000 xxxx xxxx xxxx 1xxx 111x     PCI3 bridge did not respond                            Replace Mbrd
    4.4.3 System Bus Address Parity Error
    Step 4. Determine which node put the bad command/address on the system bus identified in MC_ERR1. Perform the action indicated.
    Table 4-6 Address Parity Error Troubleshooting
    MC_ERR1
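The data patterns in Table 4-5 are 32-bit masks in which x means "don't care". A minimal sketch of matching an MC_ERR1 value against them follows. The helper names are hypothetical, and the patterns are transcribed from a damaged copy of the table (the leading 1 of the last row's second-to-last group is restored by analogy with the rows above it), so treat the rows as assumptions to check against the printed manual.

```python
def matches(value: int, pattern: str) -> bool:
    """Return True if a 32-bit value matches a pattern of 0, 1 and x."""
    bits = pattern.replace(" ", "").lower()
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    for index, char in enumerate(bits):
        if char == "x":
            continue  # don't-care position
        bit = (value >> (31 - index)) & 1
        if bit != int(char):
            return False
    return True


# (pattern, most likely cause, action), transcribed from Table 4-5.
TABLE_4_5 = [
    ("1000 0000 000x xxxx xxxx xxxx 0xxx xxxx",
     "Software generated an MC address above TOP_OF_MEM", "Fix software"),
    ("1000 0000 0000 xxxx xxxx xxxx 1xxx 100x",
     "PCI0 bridge did not respond", "Replace motherboard"),
    ("1000 0000 0000 xxxx xxxx xxxx 1xxx 101x",
     "PCI1 bridge did not respond", "Replace motherboard"),
    ("1000 0000 0000 xxxx xxxx xxxx 1xxx 110x",
     "PCI2 bridge did not respond", "Replace motherboard"),
    ("1000 0000 0000 xxxx xxxx xxxx 1xxx 111x",
     "PCI3 bridge did not respond", "Replace motherboard"),
]


def diagnose_nxm(mc_err1: int):
    """Return (cause, action) for the first matching row, else None."""
    for pattern, cause, action in TABLE_4_5:
        if matches(mc_err1, pattern):
            return cause, action
    return None


if __name__ == "__main__":
    print(diagnose_nxm(0x8000008A))  # hypothetical MC_ERR1 value
```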
41. each event.
    Terse: Provides binary event information and displays register values and other ASCII messages in a condensed format.
    Summary: Produces a statistical summary of the events in the log.
    Fsterr: Produces a one-line-per-entry report for disk and tape devices.
    The syntax is:
    OpenVMS: DIAGNOSE/TRANSLATE <format>
    DIGITAL UNIX: > dia -o <format>
    4.3 Error Log Examples and Analysis
    The following sections provide examples and analysis of error logs.
    4.3.1 MCHK 670 CPU-Detected Failure
    The error log in Example 4-1 shows the following:
    1. A CPU logged the error in a system with two CPUs.
    2. During a D-ref fill, the External Interface Status Register logged an uncorrectable ECC error. When a CPU chip does not find the data it needs in any of its caches, it requests data from off the chip to fill its D-caches; that is, it performs a D-ref fill. Bit <30> is clear, indicating that the source of the error is the B-cache.
    Neither IOD CAP Error Register saw an error. The error was detected by a CPU, and the data was not on the system bus; otherwise, the IODs would have seen the error. Therefore, CPU1 is broken.
    NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs.
42. for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin 00000000 x00000000 x00000003 MC PCI Intr Enabled Device intr info enabled if en_int 1 x00800000 Interrupts asserted x00000000 Hard Error x00C50010 x00000000 x4A26DBFO MC bus trans addr lt 31 4 gt x04A26DBF x800ED600 MC bus trans addr lt 39 32 gt x00000000 MC_Command x00000016 O Device Id x0000003B MC error info valid CAP Error Register xA0000000 Uncorrectable ECC err det by MDPA MC error info latched 4 PCI Bus Trans Error Adr DPA Status Register DPA Error Syndrome Reg x00000000 x80000000 MDPA Chip Revision x00000000 MDPA Error Syndrome of uncorrectable read error x1E00001E Cycle 0 ECC Syndrome x0000000000001E Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 ErorLogs 427 PB Status Register DPB Error Syndrome Reg IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Memory Host Addr Exten IO Host Addr Extension Interrupt Control Interrupt Request Interrupt Mask Register 0 Interrupt Mask Register 1 x00000000 x00000000 x000000BA Cycle 3 ECC Syndrome x0000000000001E MDPB Chip Revision x00000000 Cycle 0 ECC Syndrome x000
43. formatted diskettes From an OpenVMS system copy files onto two ODS2 formatted diskettes as shown in Example A 4 Running Utilities A 13 Example A 4 Creating Update Diskettes on an OpenVMS System Console Update Diskette inquire ignore Insert blank HD floppy in DVAO then continue set verify set proc priv all init density hd index begin dva0 tcods2cp mount dva0 tcods2cp create directory dva0 copy tcreadme sys dva0 copy asl1200fw txt dva0 copy asl1200cp txt dva0 copy tcsrmrom sys dva0 copy tcarcrom sys dva0 dismount dva0 set noverify exit as1200 as1200 as1200 as1200 as1200 as1200 tcreadme sys as1200fw txt as1200cp txt tcsrmrom sys tcarcrom sys Ur Ur UY UY LU LA LNA OY OD I O Update Diskette inquire ignore Insert blank HD floppy in DVAO then continue set verify set proc priv all init density hd index begin dva0 tcods2io mount dva0 tcods2io create directory dva0 as1200 create directory dva0 options copy tcreadme sys dvaQ0 as1200 tcreadme sys copy asl200fw txt dva0 as1200 as1200fw txt copy asl200i0 txt dva0 as1200 as1200i0 txt copy cipca315 sys dva0 options cipca315 sys copy dfpaa310 sys dva0 options dfpaa310 sys copy kzpsaAl0 sys dva0 options kzpsaal0 sys dismount dva0 set noverify exit is OP aia OP Sa OP sia OP Sa OP Sia OP ia OP ia OP i OP tia OP ia OP OP ti OP tia CP a OP OP a Running Utilities A 14 A 5 3 Upd
44. indicates a data parity error.
    5.2 External Interface Address Register (EI_ADDR)
    The EI_ADDR register contains the physical address associated with errors reported by the EI_STAT register. It is unlocked by a read of the EI_STAT register. This register is meaningful only when one of the error bits is set.
    Address: FF FFF0 0148    Access: R
    Register layout (PKW0454-96; figure not reproduced). Recoverable labels: EI_ADDR<39:4>, with the remaining fields reading as all 1s.
    Table 5-2 Loading and Locking Rules for External Interface Registers
    Correctable   Uncorrectable   Second       Load       Lock             Action When
    Error         Error           Hard Error   Register   Register         EI_STAT Is Read
    0             0               Not possible No         No               Clears and unlocks all registers
    1             0               Not possible Yes        No               Clears and unlocks all registers
    0             1               0            Yes        Yes              Clears and unlocks all registers
    1 (*)         1               0            Yes        Yes              Clear bit (c) does not unlock. Transition to 0,1,0 state
    0             1               1            No         Already locked   Clears and unlocks all registers
    1 (*)         1               1            No         Already locked   Clear bit (c) does not unlock. Transition to 0,1,1 state
    (*) These are special cases. It is possible that when EI_ADDR is read, only the correctable error bit is set and the registers are not locked. By the time EI_STAT is read, an uncorrectable error is detected and the registers are loaded again and locked. The value of EI_ADDR read earlier is no longer valid. Therefore, for the 1,1,x case, when EI_STAT is read, the correctable error
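The locking rules in Table 5-2 imply that software may have to read EI_STAT twice before the registers unlock. The sketch below models only that behavior as described in the table; it is a simplified illustration with hypothetical names, not a model of the actual 21164 hardware.

```python
def read_ei_stat(state):
    """Model one read of EI_STAT.

    state is a tuple (correctable, uncorrectable, second_hard) of 0/1 bits.
    Returns (new_state, unlocked).
    """
    correctable, uncorrectable, second_hard = state
    if correctable and uncorrectable:
        # Special 1,1,x case: only the correctable bit is cleared and the
        # registers stay locked, leaving the 0,1,x state.
        return (0, 1, second_hard), False
    # Every other row of the table: clear and unlock all registers.
    return (0, 0, 0), True


def reads_to_unlock(state) -> int:
    """Count how many EI_STAT reads are needed before the registers unlock."""
    reads = 0
    unlocked = False
    while not unlocked:
        state, unlocked = read_ei_stat(state)
        reads += 1
    return reads


if __name__ == "__main__":
    for start in [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]:
        print(start, "->", reads_to_unlock(start), "read(s)")
```

In the 1,1,x case the loop takes two reads, which is why the text says software must reexecute the IPR read sequence.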
45. ncr810_diag: Tests the CD-ROM controller.
    For both IOD tests and PCI 0 and PCI 1 tests, trace and failure status is sent to the OCP. If any of these tests fail, a warning is sent to the SRM console device after the console prompt (or AlphaBIOS pop-up box). The IOD LEDs on the system motherboard are controlled by the diagnostics. If an LED is off, a failure occurred.
    2.8 Console Device Determination
    After the SROM and XSROM have completed their tasks, the SRM console program, as it starts, determines where to send its power-up messages.
    Figure 2-7 Console Device Determination Flowchart (PKW0434-96; figure not reproduced). Recoverable labels: Power-up, Reset, or P00>>> init; console envar = serial; console envar = graphics; enable COM port 1 and send messages as system is powering up; VGA becomes the console device; warning message sent if a VGA adapter is seen on PCI 1.
    Console Device Options
    The console device can be either a serial terminal or a graphics monitor. Specifically:
    - A serial terminal connected to COM1 off the bulkhead. The terminal connected to COM1 must be set to 9600 baud. This baud rate cannot be changed.
    - A graphics monitor off an adapter on PCI0.
    Systems running Windows NT must have a graphics monitor as the
46. port to start working.
    Table C-4 RCM Troubleshooting (continued)
    Symptom: RCM installation is complete, but system does not power up.
    Possible cause: RCM Power Control is set to DISABLE.
    Suggested solution: Invoke RCM and issue the poweron command.
    Symptom: You reset the system to factory defaults, but the factory settings did not take effect.
    Possible cause: AC power cords were not removed before you reset switch 4 on the RCM switchpack.
    Suggested solution: Refer to Section C.5.
    Symptom: The remote user sees a string on the screen.
    Possible cause: The modem is confirming whether the modem has really lost carrier. This occurs when the modem sees an idle time, followed by a 3, followed by a carriage return, with no subsequent traffic. If the modem is still connected, it will remain so.
    Suggested solution: This is normal behavior.
    Symptom: The message "unknown command" is displayed when the user enters a carriage return by itself.
    Possible cause: The terminal or terminal emulator is including a linefeed character with the carriage return.
    Suggested solution: Change the terminal or terminal emulator setting so that "new line" is not selected.
    Symptom: Cannot enable modem, or modem will not answer.
    Possible cause: The modem is not configured correctly to work with the RCM.
    Suggested solution: Modify the modem initialization and/or answer string as described in Section C.7.
    Possible cause: The modem has been disabled on the RCM switchpack.
    Suggested solution: Refer to Section C.5.
47. see Section 6.3.
    3. Remove the power and signal cables from the Ultra SCSI bus extender on the side of the StorageWorks shelf.
    4. Remove the power harness and all signal cables from the StorageWorks backplane.
    5. Using a short Phillips head screwdriver, remove the screws holding the backplane to the back of the shelf, and remove the backplane from the system.
    Replacement
    Reverse the steps in the Removal procedure.
    Verification
    Power up the system. Use the show device console command to verify that the StorageWorks shelf is configured into the system.
    6.19 StorageWorks Ultra SCSI Bus Extender Removal and Replacement
    Figure 6-18 Removing StorageWorks Ultra SCSI Bus Extender (PKW0522B-97; figure not reproduced). Figure labels: StorageWorks backplane, Ultra SCSI bus extender, optional Ultra SCSI bus extender.
    Removal
    1. Shut down the operating system and power down the system.
    2. Expose the card cage side of the system. See Section 6.3.
    3. Remove the power and signal cables from the Ultra SCSI bus extender on the side of the StorageWorks shelf.
    4. On early systems, the Ultra SCSI bus extender is stuck to the side of the StorageWorks enclosure with adhesive standoffs; in later systems, it is mounted on plastic standoffs to which it snaps. If the system has the adhesive, simply pry each corner of the extender free and
48. sizing the PCI bus. However, if the master abort occurs after the system is booted, read PCI_ERR1 and determine which PCI device should have responded to this PCI address. Replace this device.
    4.4.7 PCI System Error
    Step 8. For this error to occur, a PCI device asserted SERR. Read the error registers in all the PCI devices to determine which device. The PCI device that set SERR should have information logged in its error registers that should indicate a device.
    4.4.8 PCI Parity Error
    Step 9. Read PCI_ERR1 and determine which PCI device normally uses that PCI address space. Replace that device. Also read the error registers in all the PCI devices to determine which device was driving the PCI bus when the parity error occurred.
    4.4.9 Broken Memory
    Step 10. Refer to the following sections.
    For a Read Data Substitute Error (uncorrectable ECC error)
    When a read data substitute (RDS) error occurs, determine which memory module pair caused the error as follows:
    1. Run the memory diagnostic to see if it catches the bad memory. If so, replace the memory module that it reports as bad.
    2. At the SRM console prompt, enter the show mem command:
       P00>>> show mem
       This command displays the base address and size of the memory module pair for each slot.
       OR
       Read the configuration packet found in the error log to retrieve the base address and size of the memory module pair.
    Compare this address to the failing addres
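The comparison in the RDS procedure above amounts to a range check of the failing address against each pair's base address and size. A minimal sketch follows; the pair names and sizes echo the power-up display elsewhere in the manual, but the base addresses are made up for illustration and must come from show mem or the configuration packet on a real system.

```python
MB = 1024 * 1024


def find_memory_pair(failing_addr: int, pairs):
    """Return the name of the pair whose address range holds failing_addr.

    pairs is a list of (name, base_address, size_in_bytes) tuples.
    Returns None if no configured pair covers the address.
    """
    for name, base, size in pairs:
        if base <= failing_addr < base + size:
            return name
    return None


# Hypothetical configuration: contiguous base addresses are an assumption.
EXAMPLE_PAIRS = [
    ("mem_pair0", 0 * MB, 256 * MB),
    ("mem_pair1", 256 * MB, 256 * MB),
    ("mem_pair2", 512 * MB, 64 * MB),
    ("mem_pair3", 576 * MB, 64 * MB),
]

if __name__ == "__main__":
    print(find_memory_pair(0x12345678, EXAMPLE_PAIRS))
```

An address outside every configured range returns None, which points away from a DIMM pair and toward the nonexistent-address analysis instead.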
49. the RCM. Follow the steps below:
    1. Turn off the system.
    2. Unplug the AC power cords.
       NOTE: If you do not unplug the power cords, the reset will not take effect when you power up the system.
    3. Remove the system covers. See Section 6.3.
    4. Locate the RCM switchpack on the system board and set switch 4 to ON.
    5. Replace the system covers and plug in the power cords.
    6. Power up the system to the SRM console prompt. Powering up with switch 4 set to ON resets the escape sequence, password, and modem enable states to the factory defaults.
    7. Power down the system, unplug the AC power cords, and remove the system covers.
    8. Set switch 4 to OFF.
    9. Replace the system covers and plug in the power cords.
    10. Power up the system to the SRM console prompt and type the default escape sequence to invoke RCM command mode RCM
    11. Reset the modem password. Reset the escape sequence if desired, as well as any other states.
    C.6 Troubleshooting Guide
    Table C-4 is a list of possible causes and suggested solutions for symptoms you might see.
    Table C-4 RCM Troubleshooting
    Symptoms: The local console terminal is not accepting input. The console terminal is displaying garbage.
    Possible causes: Cables not correctly installed. Switch on switchpack set to disable. Modem session was not terminated with the hangup command. A remote RCM session is
50. the bus. In this case, multiple error log entries occur and must be analyzed together to determine the cause of the error. Error Logs 4-3 4.1.1 Hard Errors There are two categories of hard errors: • System-independent errors detected by the CPU. These errors are processor machine checks handled as MCHK 670 interrupts and are: internal EV5 or EV56 cache errors; CPU B-cache module errors. • System-dependent errors detected by both the CPU and IOD. These errors are system machine checks handled as MCHK 660 interrupts and are: CPU-detected external reference errors; IOD hard error interrupts. The IOD can detect hard errors on either side of the bridge. 4.1.2 Soft Errors There are two categories of soft errors: • System-independent errors detected and corrected by the CPU. These errors are CPU module correctable errors handled as MCHK 630 interrupts. • System-dependent errors that are correctable single-bit errors on the system bus and are handled as MCHK 620 interrupts. Error Logs 4-4 4.1.3 Error Log Events Several different events are logged by OpenVMS and DIGITAL UNIX. Windows NT does not log errors in this fashion. Table 4-1 Types of Error Log Events Error Log Event Description MCHK 670: Processor machine checks. These are synchronous errors that inform precisely what happened at the time the error occurred. They are detected inside the CPU chip and are fatal errors. MCHK 660: System machine checks. These are asynchronous
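The four machine check categories above lend themselves to a small lookup table. The sketch below only restates the hard/soft and system-independent/system-dependent split given in the text; the `MchkClass` type and `classify_mchk` name are hypothetical and are not part of any DIGITAL tool.

```python
from typing import NamedTuple


class MchkClass(NamedTuple):
    """Summary of one machine check code (illustrative type)."""
    severity: str      # "hard" or "soft"
    scope: str         # "system-independent" or "system-dependent"
    detected_by: str


# Contents restate sections 4.1.1 and 4.1.2 of the manual text.
MCHK_CLASSES = {
    670: MchkClass("hard", "system-independent", "CPU"),
    660: MchkClass("hard", "system-dependent", "CPU and IOD"),
    630: MchkClass("soft", "system-independent", "CPU"),
    620: MchkClass("soft", "system-dependent", "system bus single-bit check"),
}


def classify_mchk(code: int) -> MchkClass:
    """Look up a machine check code; raise KeyError for codes not covered here."""
    return MCHK_CLASSES[code]
```

For example, `classify_mchk(660).severity` is `"hard"`, which matches the description of system machine checks.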
51. to it. If the console environment variable is set to graphics and no graphics monitor is attached to the adapter, pressing a carriage return on a serial terminal attached to COM1 makes it the console device, and the console prompt is sent to it. In either case, power-up information is lost. Power Up 2-19 2.9 Console Power-Up Display The entire power-up display prints to a serial terminal if the console environment variable is set to serial, and parts of it print to the control panel display. The last several lines print to either a serial terminal or a graphics monitor. Example 2-3 Power-Up Display SROM V3.0 on cpu0 SROM V3.0 on cpu1 XSROM V5.0 on cpu0 XSROM V5.0 on cpu1 BCache testing complete on cpu1 BCache testing complete on cpu0 mem_pair0 256 MB mem_pair1 256 MB mem_pair2 64 MB mem_pair3 64 MB (test-number progress line) Memory testing complete on cpu0 Memory testing complete on cpu1 Power Up 2-20 At power-up or reset, the SROM code on each CPU module is loaded into that module's I-cache and tests the module. If all tests pass, the processor's LED lights. If any test fails, the LED remains off and power-up testing terminates on that CPU. The first determination of the primary processor is made, and the primary processor executes a loopback test to each PCI bridge. If this test passes, the bridge LED lights. If it fails, the LED remain
52. unlocks the EI_STAT register, subject to conditions given in Table 5-2, which defines the loading and locking rules for external interface registers. NOTE: If the first error is correctable, the registers are loaded but not locked. On the second correctable error, the registers are neither loaded nor locked. Registers are locked on the first uncorrectable error, except the second hard error bit. This bit is set only for an uncorrectable error that follows an uncorrectable error. A correctable error that follows an uncorrectable error is not logged as a second error. B-cache tag parity errors are uncorrectable in this context. Error Registers 5-3 Table 5-1 External Interface Status Register Name (Bits): COR_ECC_ERR <31>, EI_ES <30>, BC_TC_PERR <29>, BC_TPERR <28>, CHIP_ID <27:24>, <23:0>. COR_ECC_ERR (Type R): Correctable ECC Error. Indicates that fill data received from outside the CPU contained a correctable ECC error. EI_ES (Type R): External Interface Error Source. When set, indicates that the error source is fill data from main memory or a system address/command parity error. When clear, the error source is fill data from the B-cache. This bit is only meaningful when <COR_ECC_ERR>, <UNC_ECC_ERR>, or <EI_PAR_ERR> is set in this register. This bit is not defined for a B-cache tag error (BC_TPERR) or a B-cache tag control parity error (BC_TC_PERR). B C
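A decoder for the External Interface Status Register can be sketched from the bit positions given in the manual text. Bits <31> through <24> come from Table 5-1; bits <32> and <35> come from the Example 4-2 walkthrough elsewhere in the manual. The pairing of `UNC_ECC_ERR` with bit <32> and the label `SECOND_HARD_ERR` for bit <35> are assumptions for this illustration, as is the function name.

```python
# Single-bit flags of EI_STAT. Positions <28>..<31> follow Table 5-1;
# <32> and <35> follow the Example 4-2 description. The names attached to
# <32> and <35> are illustrative assumptions, not confirmed mnemonics.
EI_STAT_FLAGS = {
    "BC_TPERR": 28,
    "BC_TC_PERR": 29,
    "EI_ES": 30,
    "COR_ECC_ERR": 31,
    "UNC_ECC_ERR": 32,
    "SECOND_HARD_ERR": 35,
}


def decode_ei_stat(value: int) -> dict:
    """Split a 64-bit EI_STAT value into named flags plus the CHIP_ID field."""
    fields = {name: bool((value >> bit) & 1) for name, bit in EI_STAT_FLAGS.items()}
    fields["CHIP_ID"] = (value >> 24) & 0xF  # bits <27:24>
    return fields
```

Applied to the reading xFFFFFFF944FFFFFF that appears in one of the manual's MCHK 670 log excerpts, the sketch reports `EI_ES` and `UNC_ECC_ERR` as set, which agrees with the log's annotations "Error Source is Memory or System" and "UNCORRECTABLE ECC ERROR".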
53. x00000000 x00000000 x00000000 x04120000 x0C x01 x04 x00 Eror Logs 434 CONFIG Address Device and Vendor ID Command Register Status Register Revision ID Device Class Cache Line S Latency T Header Type Bist Base Address Base Address Base Address Base Address Base Address Base Address Code Register Register Register Register Register Register NDNOBWNHE x000000FBC0002000 Slot or Device Number 4 x00081011 DEC_KZPSA Fast Wide Differential SCSI Vendor ID x1011 Digital Equip Corp Device ID x00000008 x0107 I O Space Accesses Response Memory Space Accesses Response PCI Bus Master Capability Monitor for Special Cycle Ops Generate Mem Wrt Invalidate Cmds Parity Error Detection Response Wait Cycle Address Data Stepping SERR Sys Err Driver Capability Fast Back to Back to Many Target xA2CO Device is 33 Mhz Capable Enabled Enabled Enabled DISABLED DISABLED IGNORE DISABLED Enabled DISABLED 7 Device Supports User Defineable Features Fast Back to Back to Different Targets Is Supported in Target Device Device Select Timing Medium RECEIVED MASTER ABORT Master Sets When Its Transaction Terminated by MasterAbort DETECTED PARITY ERROR This Device Detected x00 x010000 Mass Storage SCSI Bus Controller x10 xFF x00 Single Function Device x80 x04128000 x00000000 x00100000 x04000000 x00000000 x00000000 Expansion Rom Base Addres x04100000 I
54. 0 Hard Error MC Bus Trans Addr lt 31 4 gt E0000000 MC bus trans addr lt 39 32 gt x000000FD MC Command is Read0 IO CPUO Master at Time of Error Device ID 2 x00000002 NoT VALID Serious error PCI error address reg locked MDPA Status Register Data Not Valid Eror Logs 432 DPA Error Syndrome Reg PB Status Register DPB Error Syndrome Reg PALcode Revision PCI SUBPACKET gt Node Qty CONFIG Address Device and Vendor ID x00000000 MDPA Syndrome Register Data Not Valid x00000000 MDPB Status Register Data Not Valid x00000000 MDPB Syndrome Register Data Not Valid Palcode Rev 1 21 20 PCI 1 Subpacket 4 x000000FBC0000800 Slot or Device Number 1 x00011000 NCR 53C810 NCR_810 SCSI Narrow SingleEnded Vendor ID x1000 NCR Device ID x00000001 Command Register x0147 I O Space Accesses Response Enabled Memory Space Accesses Response Enabled PCI Bus Master Capability Enabled Status Register Revision ID Device Class Code Cache Line S Latency T Header Type Bist Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Expansion Rom Base Addres Interrupt P1 Interrupt P2 Min Gnt Max Lat NO BPWNE CONFIG Address Device and Vendor ID Monitor for Special Cycle Ops DISABLED Generate Mem Wrt Invalidate Cmds DISABLED Parity Error Detection Response Normal Wait Cycle Address Data S
55. 0 220174080 ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046da7 memtest memory 0 0 404750336 404750336 000046e0 memtest memory 101 0 0 1058932480 1058932480 000046e9 memtest memory 1000 0 0 1047399552 1047399552 000046f2 memtest memory 999 0 0 1046351104 1046351104 000046fb memtest memory 38 0 0 398410240 398410240 Troubleshooting 3 10 ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046d7 memtest memory 1 0 0 583008256 583008256 000046e0 memtest memory 1456 0 0 1525491840 1525491840 000046e9 memtest memory 1446 0 0 1515007360 1515007360 000046f2 memtest memory 1444 0 0 1512910464 1512910464 000046fb memtest memory 550 0 0 575597952 575597952 ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046da7 memtest memory al 0 0 761266176 761266176 000046e0 memtest memory 1901 0 0 1992051200 1992051200 000046e9 memtest memory 1892 0 0 1982615168 1982615168 000046f2 memtest memory 1889 0 0 1979469824 1979469824 000046fb memtest memory 720 0 0 753834112 753834112 ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046da7 memtest memory 1 0 0 937426944 937426944 000046e0 memtest memory 2346 0 0 2458610560 2458610560 000046e9 memtest memory 2337 0 0 2449174528 2449174528 000046f2 memtest memory 2333 0 0 2444980736 2444980736 000046fb memtest memory 890 0 0 932070272 932070272 Memory test complete Test time has expired P00 gt gt gt Troubleshooting 3 11 3 5 2 Testing PCI The tes
56. 000 Hard Error MC Bus Trans Addr lt 31 4 gt 7FBF080 MC Error Info Register 1 x801E8800 MC bus trans addr lt 39 32 gt x00000000 CAP Error Register Sys Environmental Regs PCI Bus Trans Error Adr DPA Status Register DPA Error Syndrome Reg PB Status Register DPB Error Syndrome Reg IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Mem Host Address Ext Reg IO Host Adr Ext Register MC Command is Read0 Mem 6 Device ID 2 x00000002 6 MC bus error assoc w read dirtyO MC error info valid Uncorrectable ECC err det by MDPA xE0000000 Uncorrectable ECC err det by MDPB MC error info latched x00000000 x00000000 x00000000 MDPA Status Register Data Not Valid x00000000 MDPA Syndrome Register Data Not Valid x00000000 MDPB Status Register Data Not Valid x000D00D0 MDPB Syndrome Register Data Not Valid IOD 1 Register Subpacket x000000BA Module Revision 0 VCTY ASIC Rev 0 Bcache Size 2MB MID 2 GID 7 x000000FBE0000000 x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg x06480FF1 Module SelfTest Passed LED on Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled M
57. 0000 Interrupts asserted x00000000 x00C50001 x00000000 xE0000000 MC bus trans addr <31:4> x0E000000 x000E88FD MC bus trans addr <39:32> x000000FD MC_Command x00000008 Device Id x0000003A x00000000 no error seen xC0018B48 x00000000 MDPA Chip Revision x00000000 x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 x00000000 MDPB Chip Revision x00000000 x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 Palcode Rev 1 21 3 Error Logs 4-14 4.3.2 MCHK 670 CPU and IOD Detected Failure The error log in Example 4-2 shows the following: • CPU1 logged the error in a system with two CPUs. • The External Interface Status Register logged an uncorrectable ECC error during a D-ref fill. (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache. It performs a D-ref fill.) Bit <30> is set, indicating that the source of the error is memory or the system. Bits <32> and <35> are set, indicating an uncorrectable ECC error and a second external interface hard error, respectively. • Both IOD CAP Error Registers logged an error. • The command at the time of the error was a read. • The bus master at the time of the error was CPU1. • The Dirty bit bi
58. 0000 MDPA Syndrome Register Data Not Valid x00000000 MDPB Status Register Data Not Valid x00000000 MDPB Syndrome Register Data Not Valid Palcode Rev 1 21 3 Error Logs 4-37 4.3.7 MCHK 620 Correctable Error The MCHK 620 error is a correctable error detected by the IOD. The error log in Example 4-7 shows the following: • CPU0 logged the error in a system with two CPUs. • The External Interface Status Register is not valid. • The MC Error Info Registers 0 and 1 captured the error information. • The commander at the time of the error was CPU0. • The command at the time of the error was a write-back memory command. The IOD detected a recoverable error on the system bus. The MC command at the time of the error is a Write Back Mem Command x00000016. The system bus commander at the time of the error is CPU0. Since this is a write, the defective FRU is CPU0. NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs. Error Logs 4-38 Example 4-7 MCHK 620 Correctable Error Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number 32. Timestamp of occurrence 28-JUN-1997 19:45:42 Host name sect06 System type register x00000016 AlphaServer 4000/1200 Series Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Even
59. 00000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 IOD 1 Register Subpacket Device ID x0000003A Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 x000000FBE0000000 x06008021 x46480FF1 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium x00000000 x00000000 x00000003 x00800000 x00C50001 x00000000 MC Error Info Register 0 x4A26DBFO MC Error Info Register 1 x800ED600 CAP Error Register xA0000000 PCI Bus Trans Error Adr x00000000 IPA Status Register x80000000 DPA Error Syndrome Reg x1E00001E PB Status Register x00000000 DPB Error Syndrome Reg x00000000 PALcode Revision RM_TYPE Mem Rd Multiple Cmd Type ARB_MODE PCI Arbitration Long Round Robin MC PCI Intr Enabled Device intr info enabled if en_int 1 Interrupts asserted x00000000 Hard Error MC b
60. 00000000 XFFFFFF 8028 6F 7FFF External cache hit Parity for ds and v bits Cache block dirty Cache block valid Ext cache tag addr parity bit Tag address lt 38 20 gt is x00000000000286 xFFFFFF0028 681A8F x00000000004B00 xFFFFFFF 984FFFFFFE 2 Uncorrectable ECC error Error occurred during D ref fill Second external interface hard error XFFFFFF000020040F IOD 0 Register Subpacket x000000BF Device ID x0000003F Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 x000000F 9E0000000 x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg x46480FF1 Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin x00000000 00000000 x00000003 MC PCI Intr Enabled Device intr info enabled if en_int 1 x00810000 Interrupts asserted x00010000 Hard Error x00C50010 x00000000 x28681A80 MC bus trans addr lt 31 4 gt x028681A8
61. 000000000000D189 xFFFFFFF944FFFFFF 2 Error Source is Memory or System UNCORRECTABLE ECC ERROR Error Occurred During D ref Fill Error XFFFFFFO00O7FBFOOF IOD 0 Register Subpacket x000000BA Module Revision 0 VCTY ASIC Rev 0 Bcache Size 2MB MID 2 GID 7 x000000F 9E0000000 x06008021 CAP Chip Revision Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg x06480FF1 Module SelfTest Passed LED on Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled x00000001 ErorLogs 422 Mem Host Address Ext Reg IO Host Adr Ext Register Interrupt Ctrl Register Interrupt Request Interrupt Mask0 Register Interrupt Maskl Register MC Error Info Register 0 Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE Arbitration MC PCI Priority Mode x00000000 x00000000 x00000003 x00800000 x00C50010 x00000000 x07FBF080 HAE Sparse Mem Adr lt 31 27 gt x00000000 PCI Upper Adr Bits lt 31 25 gt x00000000 Write Device Interrupt Info Struct Enabled Interrupts asserted x00000
62. 00000020000 Bse Addr for PALcode x0000000000000008 Interrupt Summary Reg x0000000000200000 External HW Interrupt at IPL21 AST Requests 3 0 x0000000000000000 IBOX Ctrl and Status Reg x000000C160000000 Timeout Counter Bit Clear IBOX Timeout Counter Enabled ErorLogs 421 Icache Par Err Stat Reg Dcache Par Err Stat Reg Virtual Address Reg Memory Mgmt Flt Sts Reg Scache Address Reg Scache Status Reg Bcache Tag Address Reg Ext Interface Address Reg Fill Syndrome Reg Ext Interface Status Reg LD LOCK RK IOD SUBPACKET WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Floating Point Instructions will cause FEN Exceptions PAL Shadow Registers Enabled Correctable Error Interrupts Enabled ICACHE BIST Self Test Was Successful TEST_STATUS_H Pin Asserted x0000000000000000 x0000000000000000 x0000000000044000 x0000000000005D10 If Err Reference Resulted in DTB Miss Fault Inst RA Field x0000000000000014 Fault Inst Opcode x000000000000000B xFFFFFFO0000254BF x0000000000000000 xFFFFFF8007EE2FFF Last Bcache Access Resulted in a Miss Value of Parity Bit for Tag Control Status Bits Dirty Shared amp Valid is Set Value of Tag Control Dirty Bit is Clear Value of Tag Control Shared Bit is Clear Value of Tag Control Valid Bit is Clear Value of Parity Bit Covering Tag Store ddress Bits is Set Tag Address lt 38 20 gt Is x000000000000007E xFFFFFFO007FBFO8F x
63. 1 might appear interspersed, as in Example 2-3. This is normal behavior. Test 24 can take several minutes if the memory is very large. The message P0 TEST 24 MEM is displayed on the control panel display; the second asterisk rotates to indicate that testing is continuing. If a failure occurs, a message is sent to the COM1 port and to the control panel display. Each CPU sends a test completion message to COM1. Continued on next page Power Up 2-21 Example 2-3 Power-Up Display (Continued) starting console on CPU 0 sizing memory 0 256 MB DIMM 1 256 MB DIMM 64 MB DIMM 64 MB DIMM starting console on CPU 1 probing IOD1 hose 1 bus 0 slot 1 NCR 53C810 bus 0 slot 2 DECchip 21041-AA bus 0 slot 3 NCR 53C810 probing IOD0 hose 0 bus 0 slot 1 PCEB probing EISA Bridge, bus 1 bus 0 slot 2 S3 Trio64/Trio32 bus 0 slot 3 DECchip 21140-AA Configuring I/O adapters Ncr0 hose 1 bus 0 slot 1 Tulip0 hose 1 bus 0 slot 2 Ncr1 hose 1 bus 0 slot 3 Floppy0 hose 0 bus 1 slot 0 Mc0 hose 0 bus 0 slot 2 Tulip1 hose 0 bus 0 slot 3 System temperature is 31 degrees C AlphaServer 1200 Console V5.0 02-SEP-1997 18:18:26 P00>>> Power Up 2-22 The final primary CPU determination is made. The primary CPU unloads PALcode and decompression code from the FEPROM on PCI 0 to its B-cache. The primary CPU then jumps to th
64. 15 CPU Types There are several CPU variants, differentiated by CPU speeds. Figure 1-5 CPU Module Placement (figure labels: bulkhead connectors; power connectors; floppy connector; PCI 0 slots 2, 3, 4; PCI 1 slots 2, 3, 4; EISA/ISA slot; fan connectors; CPU 0; MEM L; CPU 1; RCM; MEM H; switch pack; OCP connector; LEDs; PCI bridges; internal SCSI connector; RCM power-down connector; speaker connector) PKW0504A 97 System Overview 1-10 Alpha Chip Composition The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default cache system is write-back, and when the module has an external cache, it is write-back. The Alpha chip used in these systems is the 21164. Chip Description Unit: Description. Instruction: 8-Kbyte cache; 4-way issue. Execution: 4-way execution; 2 integer units; 1 floating-point adder; 1 floating-point multiplier. Memory: Merge logic; 8-Kbyte write-through first-level data cache; 96-Kbyte write-back se
65. 3 4-6 MCHK 630 Correctable CPU Error, 4-42; 4-7 MCHK 620 Correctable Error, 4-45; 4-8 INFO 3 Command, 4-59; 4-9 INFO 5 Command, 4-61; 4-10 INFO 8 Command, 4-63; A-1 Starting LFU from the SRM Console, A-5; A-2 Booting LFU from the CD-ROM, A-6; A-3 Updating Firmware from the Internal CD-ROM, A-7; A-4 Creating Update Diskettes on an OpenVMS System, A-12; A-5 Updating Firmware from the Internal Floppy Disk, A-13; A-6 Selecting AS1200FW to Update Firmware from the Internal Floppy, A-16; A-7 Updating Firmware from a Network Device, A-17; C-1 Sample Remote Dial-In Dialog, C-5; C-2 Invoking and Leaving RCM Locally, C-6; C-3 Configuring the Modem for Dial-Out Alerts, C-16; C-4 Typical RCM Dial-Out Command, C-17. Figures 1-1 System Enclosures
66. 3 INT_REQ 00000000 NT_MASKO INT_MASK1 00000000 MC_ERRO e0000000 MC_ERR1 CAP_ERR 00000000 PCI_ERR 00000000 IPA_STAT DPA_SYN 00000000 PB_STAT 00000000 MDPB_SYN INT_TARG 0000003a INT_ADR 00006000 NT_ADR_EXT PERF_MON 00406ebf PERF_CONT 00000000 CAP_DIAG DIAG_CHKA 10000000 DIAG_CHKB 10000000 SCRATCH WO_BASE 00100001 WO_MASK 00000000 TO_BASE W1_BASE 00800001 W1_MASK 00700000 T1_BASE W2_BASE 8000000 W2_MASK 3 f 00000 T2_BASE W3_BASE 00000000 W3_MASK 1 00000 T3_BASE W_DAC 00000000 SG_TBIA 00000000 HBASE IOD 1 WHOAMI 0000003a PCI_REV 06000221 CAP_CTL 02490fb1 HAE_MEM 00000000 HAE_IO INT_CTL 00000003 INT_REQ 00000000 NT_MASKO INT_MASK1 00000000 MC_ERRO e0000000 MC_ERR1 CAP_ERR 00000000 PCI_ERR 00000000 IPA_STAT DPA_SYN 00000000 PB_STAT 00000000 MDPB_SYN INT_TARG 0000003a INT_ADR 00006000 NT_ADR_EXT PERF_MON 004e31a6 PERF_CONT 00000000 CAP_DIAG DIAG_CHKA 10000000 DIAG_CHKB 10000000 SCRATCH WO_BASE 00100001 WO_MASK 00000000 TO_BASE W1_BASE 00800001 W1_MASK 00700000 T1_BASE W2_BASE 80000001 W2_MASK 3 00000 T2_BASE W3_BASE 00000000 W3_MASK 1 00000 T3_BASE W_DAC 00000000 SG_TBIA 00000000 HBASE 00000000 00210000 000e88fd 00000000 00000000 00000000 00000000 21011131 00001000 00008000 00000000 0000b800 00000000 00000000 00000000 000e88fd 00000000 00000000 00000000 00000000 00000000 00001000 00008000 00000000 0000a000 00000000 Eror Logs 4 56 C
67. 420 Example 4 3 MCHK 670 Read Dirty Failure Logging OS 2 DIGITAL UNIX System Architecture 2 Alpha Event sequence number 4 Timestamp of occurrence 08 APR T997 10 20 37 Host name sect06 System type register x00000016 AlphaServer 4000 1200 Series Number of CPUs mpnum x00000002 CPU logging event mperr x00000000 1 Event validity 1 O S claims event is valid Event severity 1 Severe Priority Entry type 100 CPU Machine Check Errors CPU Minor class 1 Machine check 670 entry Software Flags x0000000300000000 IOD 0 Register Subpkt Pres IOD 1 Register Subpkt Pres Active CPUs x00000003 Hardware Rev x00000000 System Serial Number C1563 Module Serial Number Module Type x0000 System Revision x00000000 MCHK 670 Regs Flags x00000000 PCI Mask x0000 Machine Check Reason x0098 Fatal Alpha Chip Detected HardError PAL SHADOW REG 0 x0000000000000000 PAL SHADOW REG 1 x0000000000000000 PAL SHADOW REG 2 x0000000000000000 PAL SHADOW REG 3 x0000000000000000 PAL SHADOW REG 4 x0000000000000000 PAL SHADOW REG 5 x0000000000000000 PAL SHADOW REG 6 x0000000000000000 PAL SHADOW REG 7 x0000000000000000 PALTEMP 0 xFFFFFC0O0006CO0O0CO PALTEMP 1 x00000000000061A8 PALTEMP 2 xFFFFFCO0004E1E00 PALTEMP 22 xFFFFFC00006530E0 PALTEMP 23 x0000000003D2BA58 Exception Address Reg xFFFFFC000047395C Native mode Instruction Exception PC x3FFFFF000011CE57 Exception Summary Reg x0000000000000000 Exception Mask Reg x0000000000000000 PAL Base Address Reg x00000
68. 630 Correctable CPU Error, 4-41; 4.3.7 MCHK 620 Correctable Error, 4-44; 4.4 Troubleshooting IOD-Detected Errors, 4-47; 4.4.1 System Bus ECC Error, 4-48; 4.4.2 System Bus Nonexistent Address Error, 4-49; 4.4.3 System Bus Address Parity Error, 4-50; 4.4.4 PIO Buffer Overflow Error (PIO_OVFL), 4-51; 4.4.5 Page Table Entry Invalid Error, 4-52; 4.4.6 PCI Master Abort, 4-52; 4.4.7 PCI System Error, 4-52; 4.4.8 PCI Parity Error, 4-52; 4.4.9 Broken Memory, 4-53; 4.4.10 Command Codes, 4-55; 4.4.11 Node IDs, 4-56; 4.5 Double Error Halts and Machine Checks While in PAL Mode, 4-57; 4.5.1 PALcode Overview, 4-57; 4.5.2 Double Error Halt, 4-58; 4.5.3 Machine Checks While in PAL, 4-58. Chapter 5 Error Registers 5.1 5.2 5.3 5.4 5.5 5.6 External Interface Status Register
69. 64Kb 64Kb 64Kb 64Kb FEPROM 1 AlphaBIOS Code 1 Mbyte PKW0515 97 Power Up 2-5 For the console to run, the path from the CPU to the XSROM must be functional. The XSROM resides in FEPROM 0 on the XBUS, off the EISA bus, off PCI 0, off IOD 0. See Figure 2-4. This path is minimally tested by SROM. Figure 2-4 Console Code Critical Path (Block Diagram) (figure labels: memory pair; system bus with 128-bit data bus, 16 ECC, and 40-bit command/address bus; system-to-PCI bus bridges 0 and 1 (IOD 0, IOD 1); 64-bit PCI buses 0 and 1; PCI slots; EISA bus and EISA slot; XBUS transceivers; real-time clock; NVRAM 8Kx8; mouse; keyboard; parallel port; floppy controller; I2C bus interface. Note: When the EISA slot on PCI Bus 0 is used, the last PCI slot on PCI Bus 1 is not available.) PKW0502A 97 Power Up 2-6 The SROM contents are loaded into each CPU's I-cache and executed on power-up/reset. After testing the caches on each processor chip, it tests the path to the XSROM. Once this path is tested and deeme
70. AlphaServer 1200 DIGITAL Ultimate Workstation 533 Service Manual. Order Number: EK-AS120-SV A01. This manual is for anyone who services these systems. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units. Digital Equipment Corporation, Maynard, Massachusetts. First Printing, January 1998. Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description. The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document. The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license. No responsibility is assumed for the use or reliability of software or equipment that is not supplied by Digital Equipment Corporation or its affiliated companies. Copyright 1998 by Digital Equipment Corporation. All rights reserved. The following are trademarks of Digital Equipment Corporation: AlphaServer, OpenVMS, Storage
71. Base Address Reg Interrupt Summary Reg IBOX Ctrl and Status Reg 19 AUG 1997 12 53 41 sect04 x00000016 x00000002 x00000000 1 O S claims event is valid 1 Severe Priority 100 CPU Machine Check Errors AlphaServer 4000 1200 Series o 2 660 Entry x0000002300000000 IOD 0 Register Subpkt Pres IOD 1 Register Subpkt Pres PCI 1 Bus Snapshot Present x00000003 x00000000 GA12000000 x0000 x00000000 x00000000 x0002 x0202 IOD Detected Hard Error OR DTag Parity Error If Cached CPU x0000000000000000 x0000000000000000 x0000000000000000 x0000000000000000 x00000B6D00000000 x0000000000000000 x0000000000000000 x0000000000000000 x00000000000000B6 x0000000000000001 xFFFFFC00003B8B90 xFFFFFC000052E3A0 x0000000002729A38 x00000001200077F0 Native mode Instruction Exception PC x0000000048001DFC x0000000000000000 x0000000000000000 x0000000000014000 Base Addr for PALcode x0000000000200000 External HW Interrupt at IPL21 AST Requests 3 0 x0000000000000000 x000000C160020000 Timeout Counter Bit Clear IBOX Timeout Counter Enabled x0000000000000005 ErorLogs 430 Floating Point Instructions will Cause FEN Exceptions PAL Shadow Registers Enabled Correctable Error Interrupts Enabled ICACHE BIST Self Test Was Successful TEST_STATUS_H Pin Asserted Icache Par Err Stat Reg x0000000000000000 Dcache Par Err Stat Reg x0000000000000000 Virtual Address Reg x0000000140008000 Memory Mgmt
72. C Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE Arbitration MC PCI Priority Mode x00000000 x00000000 HAE Sparse Mem Adr lt 31 27 gt x00000000 PCI Upper Adr Bits lt 31 25 gt x00000000 Eror Logs 423 Interrupt Ctrl Register 00000003 Interrupt Request x00800001 Interrupt Mask0 Register x00C50001 Interrupt Maskl Register 00000000 MC Error Info Register 0 x07FBF080 MC Error Info Register 1 x801E8800 CAP Error Register xE0000000 Uncorrectable ECC err det by MDPB Sys Environmental Regs x00000000 PCI Bus Trans Error Adr x00000000 IPA Status Register 00000000 DPA Error Syndrome Reg x00000000 IPB Status Register x00000000 DPB Error Syndrome Reg x000D00D0 PALcode Revision Write Device Interrupt Info Struct Enabled Interrupts asserted x00000001 Hard Error MC Bus Trans Addr lt 31 4 gt 7FBF080 7 MC bus trans addr lt 39 32 gt x00000000 MC Command is Read0 Mem 6 Device ID 2 x00000002 MC bus error assoc w read dirty Mc error info valid Uncorrectable ECC err det by MDPA MC error info latched 4 DPA Status Register Data Not Valid DPA Syndrome Register Data Not Valid DPB Status Register Data Not Valid M M M MD
73. C bus address; Go to Step 3. 100x x01x x000 0000 0000 0000 000x xxxx: MC_ADR_PERR, MC bus address parity error; Go to Step 4. 100x x00x 1000 0000 0000 0000 000x xxxx: PIO_OVFL, PIO buffer overflow; Go to Step 5. 0000 0000 0000 0000 0000 0000 0001 1xxx: PTE_INV, Page table entry is invalid; Go to Step 6. 0000 0000 0000 0000 0000 0000 0001 x1xx: MAB, Master abort; Go to Step 7. 0000 0000 0000 0000 0000 0000 0001 xx1x: SERR, PCI system error; Go to Step 8. 0000 0000 0000 0000 0000 0000 0001 xxx1: PERR, PCI parity error; Go to Step 9. Error Logs 4-41 4.4.1 System Bus ECC Error (Step 2): Read the MC_ERR1 register and match the contents with the data pattern. Perform the action indicated. Table 4-4 System Bus ECC Error Data Pattern Columns: MC_ERR1 Data Pattern; Most Likely Cause; Action. For Memory Read: 1000 0000 0000 xxxx xxxx 10xx 0xxx xxxx; 1000 0000 0000 xxxx xxxx 111x 0xxx xxxx; 1000 0000 0001 xxxx xxxx 10xx 0xxx xxxx; 1000 0000 0001 xxxx xxxx 111x 0xxx xxxx. For Memory or I/O Write: 1000 0000 000x xxx0 10xx 011x xxxx xxxx; 1000 0000 000x xxx0 11xx 011x xxxx xxxx; 1000 0000 000x xxx1 00xx 011x xxxx xxxx; 1000 0000 000x xxx1 01xx 011x xxxx xxxx. For Memory Fill Transactions: 1000 0000 000x xxx1 00xx 110x xxxx xxxx; 1000 0000 000x xxx1 01xx 110x xxxx xxxx; 1000 0000 000x xxx1 10xx 110x xxxx xxxx; 1000 0000 000x xxx1 11xx 110x xxxx xxxx. Most likely causes: Bad nondirty data from memory (bad memory); Bad nondirty data from memory (bad memory); Bad dirty
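Matching a register value against the data patterns in these tables can be sketched as a bit-by-bit comparison in which `x` marks a don't-care position. The function names below are hypothetical, and the pattern strings are the CAP Error Register rows as repaired from the OCR text, so they should be checked against a clean copy of the manual before use.

```python
def matches(value: int, pattern: str) -> bool:
    """Return True if a 32-bit value fits a pattern of 0, 1, and x (don't care)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    for index, char in enumerate(bits):
        if char in "xX":
            continue
        if char not in "01":
            raise ValueError(f"unexpected pattern character: {char!r}")
        # The leftmost pattern character corresponds to bit <31>.
        if ((value >> (31 - index)) & 1) != int(char):
            return False
    return True


# Patterns transcribed from the table rows above (OCR-repaired; an assumption).
CAP_ERR_PATTERNS = [
    ("MC_ADR_PERR", "100x x01x x000 0000 0000 0000 000x xxxx"),
    ("PIO_OVFL",    "100x x00x 1000 0000 0000 0000 000x xxxx"),
    ("PTE_INV",     "0000 0000 0000 0000 0000 0000 0001 1xxx"),
    ("MAB",         "0000 0000 0000 0000 0000 0000 0001 x1xx"),
    ("SERR",        "0000 0000 0000 0000 0000 0000 0001 xx1x"),
    ("PERR",        "0000 0000 0000 0000 0000 0000 0001 xxx1"),
]


def classify_cap_err(value: int) -> list:
    """List every error name whose pattern the value satisfies, in table order."""
    return [name for name, pattern in CAP_ERR_PATTERNS if matches(value, pattern)]
```

Because several PCI-side bits can be set at once, the sketch returns a list rather than a single name; each name maps to the "Go to Step" action in the table.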
74. Lines Protected                      Device
ECC Protected
  System bus data lines                  IOD on every transaction; CPU when using the bus
  B-cache                                IOD on every transaction; CPU when using the bus
Parity Protected
  System bus command/address lines       IOD on every transaction; CPU when using the bus
  Duplicate tag store                    IOD on every transaction; CPU when using the bus
  B-cache index lines                    CPU
  PCI bus                                IOD
  EISA bus                               EISA bridge
As shown in Figure 4-1 and the accompanying table, the CPU chip is isolated by transceivers (XVER) from the data and command/address lines on the module. This allows the CPU chip access to the duplicate tag and B-cache while the system bus is in use. The CPU detects errors only when it is the consumer of the data. The IOD detects errors on each system bus cycle, regardless of whether it is involved in the transaction. System bus errors detected by the CPU may also be detected by the IOD, so check the IOD for errors any time there is a CPU machine check.
• If the CPU sees bad data and the IOD does not, the CPU is at fault.
• If both the CPU and the IOD see bad data on the system bus, either memory or a secondary CPU is the cause. In that case, check whether the Dirty bit (bit <20>) in the IOD MC_ERR1 Register is set or clear. If the Dirty bit is set, the source of the data is a CPU's cache, and the data was destined for a different CPU. If the Dirty bit is not set, memory caused the bad data on
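The isolation rules in this snippet can be sketched as a small decision function. This is only an illustration of the reasoning, not DECevent or firmware code: the function name and return labels are hypothetical, and the only register detail taken from the text is that the Dirty bit is bit <20> of the IOD MC_ERR1 register.

```python
# Dirty bit <20> of the IOD MC_ERR1 register, as stated in the text.
DIRTY_BIT = 1 << 20


def likely_source(cpu_saw_bad_data: bool, iod_saw_bad_data: bool, mc_err1: int) -> str:
    """Return a hypothetical label for the most likely source of bad bus data."""
    if cpu_saw_bad_data and not iod_saw_bad_data:
        # Only the CPU saw the error: the CPU itself is suspect.
        return "cpu"
    if cpu_saw_bad_data and iod_saw_bad_data:
        # Both saw it: Dirty set means the data came from another CPU's cache,
        # Dirty clear means it came from memory.
        return "other-cpu-cache" if mc_err1 & DIRTY_BIT else "memory"
    # The text does not cover the remaining combinations.
    return "undetermined"


if __name__ == "__main__":
    # MC_ERR1 values of the kind shown in the error log examples.
    print(likely_source(True, True, 0x801E8800))
    print(likely_source(True, True, 0x000E88FD))
```

The two sample values differ in bit <20>, so they exercise both branches of the Dirty-bit check.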
75. D MC Command is Read0 IO CPUO Master at Time of Error Device ID 2 x00000002 x00000000 4 x00000000 x00000000 MDPA Status Register Data Not Valid x00000000 MDPA Syndrome Register Data Not Valid x00000000 MDPB Status Register Data Not Valid x00000000 MDPB Syndrome Register Data Not Valid IOD 1 Register Subpacket x000002FA Module Revision 0 VCTY ASIC Rev 1 Bcache Size 4MB CPU 0 x000000FBE0000000 IOD 1 x06002332 CAP Chip Revision x00000002 Host to PCI Revision x00000003 I O Backplane Revision x00000003 Internal CAP Chip Arbiter Enabled Device Class Host Bus to PCI Bridg x46480FF1 Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type ARB_MODE PCI Arbitration x00000000 x00000000 x00000003 x00800000 x00C50111 x00000000 xE0000000 x000E88FD 00000012 xC157B5C0 x00000000 Long Round Robin HAE Sparse Mem Adr lt 31 27 gt x00000000 PCI Upper Adr Bits lt 31 25 gt x00000000 Write Device Interrupt Info Struct Enabled Interrupts asserted x0000000
76. Error Information Register 1
CAP Error Register
PCI Error Status Register 1
Field Replaceable Unit Part Numbers
AlphaBIOS Option Key Mapping
File Locations for Creating Update Diskettes on a PC
LFU Command Summary
Results of Pushing the Halt Button
Summary of SRM Console Commands
Environment Variable Summary
RCM Command Summary
RCM Status Command Fields
Elements of the Dial-Out String
RCM Troubleshooting
Preface
Intended Audience
This manual is written for the customer service engineer.
Document Structure
This manual uses a structured documentation design. Topics are organized into small sections for efficient online and printed reference. Each topic begins with an abstrac
77. INT_REQ INT_MASK1 00000000 MC_ERRO CAP_ERR 84000000 PCI_ERR IPA_SYN 00000000 PB_STAT IOD 1 base address fbe0000000 WHOAMI 0000003a PCI_REV 00004838 00000320 00000000 00000118 00001328 00980000 00000000 eba00003 4143040a d1200067 47 90416 eba00003 d1200068 7ec38000 63ff4000 00000320 00000000 00000000 00000000 60000000 000000c1 00000000 00000000 00000000 00000000 fF 8000a0 FEFEFE 00149d0 0000000 001904f fffffOO 0000000 0000000 f7fefff FEFEEEE 066bc3ef ffffff00 000000a7 00000000 O4ffffff fffffffO0 00005 b6f ffffffoo MmMrhoonooom 06008221 00000000 00800000 20000000 00000000 00000000 06000221 00000000 00010000 800e88fd 00000000 00000000 HAE_ IO INT_MASKO MC_ERR1 MDPA_STAT MDPB_SYN Eror Logs 4 54 CAP_CTL INT_CTL INT_MASK1 CAP_ERR JPA_SYN 02490fb1 00000003 00000000 84000000 00000000 PCI_ERR IPB_STAT 00000000 00800000 20000000 00000000 00000000 HAE_IO 00000000 INT_MASKO 00010000 MC_ERR1 800e88fd MDPA_STAT 00000000 MDPB_SYN 00000000 ErorLogs 455 Example 4 10 INFO 8 Command P00 gt gt gt info 8 IOD 0 WHOAMI 0000003a PCI_REV 06008221 CAP_CTL 02490fb1 HAE_MEM 00000000 HAE_IO INT_CTL 0000000
78. LFU utility by issuing the lfu command at the SRM console prompt or by selecting Upgrade AlphaBIOS in the AlphaBIOS Setup screen. LFU is part of the SRM console.
Example A-1 Starting LFU from the SRM Console
  P00>>> lfu
  Loadable Firmware Update Utility
  Select firmware load device (cda0, dva0, ewa0), or
  Press <return> to bypass loading and proceed to LFU: cda0
  UPD>
Figure A-2 Starting LFU from the AlphaBIOS Console [AlphaBIOS Setup screen, PK-0726A-96, listing: Display System Configuration, Hard Disk Setup, CMOS Setup, Install Windows NT, Utilities, About AlphaBIOS. Prompt: Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM. ESC=Exit]
Use the Loadable Firmware Update (LFU) utility to update system firmware. You can start LFU from either the SRM console or the AlphaBIOS console.
• From the SRM console, start LFU by issuing the lfu command.
• From the AlphaBIOS console, select Upgrade AlphaBIOS from the AlphaBIOS Setup screen (see Figure A-2).
A typical update procedure is:
1 Start LFU.
2 Use the LFU list command to show the revisions of modules that LFU can update and the revisions of update firmware.
3 Use the LFU update command to write the new firmware.
4 Use the LFU exit command to exit back to the console.
The sections that follow show examples of updating firmware from the local CD-ROM, the local floppy, and
79. NIX > dia -i disk=rz disk=ra92 cpu
The commands shown here create output using only the entries for RZ disks, RA92 disks, and CPUs. The EXCLUDE qualifier is used to create output for all devices except those named in the command.
  OpenVMS       $ DIAGNOSE/TRANSLATE/EXCLUDE=(MEMORY)
  DIGITAL UNIX  > dia -x mem
Use the BEFORE and SINCE qualifiers to select events before or after a certain date and time.
  OpenVMS       $ DIAGNOSE/TRANSLATE/BEFORE=15-JAN-1997:10:30:00
                or
                $ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1997:10:30:00
  DIGITAL UNIX  > dia -t s:15-jan-1997 e:20-jan-1997
If no time is specified, the default time is 00:00:00 and all events for that day are selected. The BEFORE and SINCE qualifiers can be combined to select a certain period of time.
  OpenVMS       $ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1997/BEFORE=20-JAN-1997
If no value is supplied with the SINCE or BEFORE qualifiers, DECevent defaults to TODAY.
4.2.3 Selecting Alternative Reports
Table 4-2 describes the DECevent report formats. Report formats are mutually exclusive; no combinations are allowed. The default format is Full.
Table 4-2 DECevent Report Formats
  Format  Description
  Full    Translates all available information for each event
  Brief   Translates key information for
80. OM SCSI Floppy Power Supplies PKW0521 97 Removal and Replacement 6 2 Table 6 1 Held Replaceable Unit Part Numbers CPU Modules B3007 AA 400 MHz CPU 4 Mbyte cache B3007 CA 533 MHz CPU 4 Mbyte cache Memory Modules 54 25084 DA 32 Mbyte DIMM synchronous 20 47405 D3 54 25092 DA 128 Mbyte DIMM synchronous 20 45619 D3 54 25149 01 Memory riser card System Bac kplane Display and support hardware 54 25 147 01 System motherboard RX23L AB Floppy CD ROM 54 23302 02 OCP assembly 70 31349 01 Speaker assembly Fans 70 31351 01 Cooling fan 120x120 70 31350 01 Cooling fan 92x92 12 24701 34 CPU fan Power System Components 30 43 120 02 Power supply SCSI Hardware 54 23365 01 SCSI backplane Ultra SCSI bus extender Removaland Replacement 63 Table 6 1 Held Replaceable Unit Part Numbers continued Power Cords BN26J 1K North America Japan 12V 75 inches long BN19H 2E Australia New Zealand 2 5m long BN19C 2E Central Europe 2 5m long BNI9A 2E UK Ireland 2 5m long BN19E 2E Switzerland 2 5m long BN19K 2E Denmark 2 5m long BN19Z 2E Italy 2 5m long BN19S 2E Egypt India South Africa 2 5m long BN18L 2E Israel 2 5m long Ultra SCSI Cables and Jumpers From To 17 04143 01 68 pin con cable SCSI controller Ultra SCSI bus extender 17 04022 03 68 pin con cable Ultra SCSI bus SCSI backpln signal c
81. PAL Shadow Registers Enabled Correctable Err Intrpts Enabled ICACHE BIST Successful TEST_STATUS_H Pin Asserted ErorLogs 412 Icache Par Err Stat Reg x00000000 Dcache Par Err Stat Reg x00000000 Virtual Address Reg XFFFFFFFE8F 63BD38 Memory Mgmt Flt Sts Reg x000000000166D1 Ref which caused err was a write Ref resulted in DTB miss RA Field x0000000000001B Opcode Field x0000000000002C Scache Address Reg xFFFFFF0O0000254BF Scache Status Reg x00000000 Bcache Tag Address Reg XFFFFFF80E98F7FFF External cache hit Parity for ds and v bits Cache block dirty Cache block valid Ext cache tag addr parity bit Tag address lt 38 20 gt is x00000000000E98 Ext Interface Address Reg xFFFFFFOOE984DBCF Fill Syndrome Reg x0000000000002B Ext Interface Status Reg xFFFFFFF104FFFFFF 2 Uncorrectable ECC error Error occurred during D ref fill LD LOCK xFFFFFF003797340F IOD SUBPACKET gt IOD 0 Register Subpacket WHOAMI x000000BB Device ID x0000003B Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 Base Address of Bridge x000000F9E0000000 Dev Type amp Rev Register x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg MC PCI Command Register x46480FF1 Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEP
82. PB Syndrome Register Data Not Valid Palcode Rev 1 21 3
4.3.4 MCHK 660 IOD-Detected Failure (System Bus Error)
The error log in Example 4-4 shows the following:
• CPU0 logged the error in a system with two CPUs.
• The External Interface Status Register does not record an error.
• Both IOD CAP Error Registers logged an error.
• The MC Error Info Registers 0 and 1 captured the error information.
• The commander at the time of the error was a CPU (known from MC_ERR1).
• The command on the bus at the time was a write-back memory command.
Since this is an MCHK 660, the IOD detected the error on the bus and CPU0 is logging the error. CPU0 registers are not important in this case, since CPU0 is only servicing the IOD interrupt. Three kinds of devices can put data on the system bus: CPUs, memory, or an IOD. From MC_ERR Register 1 we know that at the time of the error a CPU put bad data on the bus while writing to memory. See Section 4.4 for a procedure designed to help with IOD-detected errors.
NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs.
Example 4-4 MCHK 660 IOD-Detected Failure (System Bus Error)
Logging OS System Architecture Event sequence number Timestamp of occurrence Host name System
83. Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_ TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr lt 31 27 gt x00000000 IO Host Adr Ext Register x00000000 PCI Upper Adr Bits lt 31 25 gt x00000000 Interrupt Ctrl Register x00000003 Write Device Interrupt Info Struct Enabled Eror Logs 431 Interrupt Request Interrupt Mask0 Register Interrupt Maskl Register MC Error Info Register 0 MC Error Info Register 1 CAP Error Register PCI Bus Trans Error Adr IPA Status Register DPA Error Syndrome Reg PB Status Register DPB Error Syndrome Reg IOD SUBPACKET gt WHOAMI This Bus Bridge Phy Addr Dev Type amp Rev Register MC PCI Command Register Mem Host Address Ext Reg IO Host Adr Ext Register Interrupt Ctrl Register Interrupt Request Interrupt Mask0 Register Interrupt Maskl Register MC Error Info Register 0 MC Error Info Register 1 CAP Error Register PCI Bus Trans Error Adr IPA Status Register x00000000 Interrupts asserted x00000000 x00C50110 x00000000 xE0000000 MC Bus Trans Addr lt 31 4 gt E0Q000000 x000E88FD MC bus trans addr lt 39 32 gt x000000F
84. R bit are locked on hard errors. CAP_ERR remains locked until the CAP error is written to clear each individual error bit.
[Register layout figure PKW0551B-97: bits <31:00>, with fields reserved, PIO_OVFL, PERR, LOST_MC_ERR, SERR, MC_ADR_PERR, MAB, NXM, PTE_INV, CRDA, PCI_ERR_VALID, CRDB, RDSA, RDSB, MC_ERR_VALID]
Table 5-5 CAP Error Register
  Name          Bits  Type  Description
  MC_ERR_VALID  <31>  RO    Logical OR of bits <30:23> in this register. When set, MC_ERR0 and MC_ERR1 are latched.
  RDSB          <30>  RW1C  Uncorrectable ECC error detected by MDPB. Clear state in MDPB before clearing this bit.
  RDSA          <29>  RW1C  Uncorrectable ECC error detected by MDPA. Clear state in MDPA before clearing this bit.
  CRDB          <28>  RW1C  Correctable ECC error detected by MDPB. Clear state in MDPB_STAT before clearing this bit.
  CRDA          <27>  RW1C  Correctable ECC error detected by MDPA. Clear state in MDPA_STAT before clearing this bit.
  MC_ADR_PERR   <25>  RW1C
  NXM           <26>  RW1C  System bus master transaction status NXM (read with address bit <39> set but transaction not pended, or transaction target above the top of memory register). CPU will also get a fill er
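As a reading aid for Table 5-5, the following minimal Python sketch maps a raw CAP_ERR value to the names of the bits that are set. It covers only bits <31:25>, the ones whose positions appear in this snippet; the remaining fields are deliberately left out rather than guessed. The function name is hypothetical, and the sample values are CAP Error Register values of the kind that appear in the error log examples.

```python
from typing import List

# Bit positions taken from Table 5-5 (bits <31:25> only).
CAP_ERR_BITS = {
    31: "MC_ERR_VALID",
    30: "RDSB",
    29: "RDSA",
    28: "CRDB",
    27: "CRDA",
    26: "NXM",
    25: "MC_ADR_PERR",
}


def decode_cap_err(value: int) -> List[str]:
    """List the names of the known CAP_ERR bits set in value, high bit first."""
    return [
        name
        for bit, name in sorted(CAP_ERR_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    for raw in (0xE0000000, 0x84000000, 0x00000000):
        print(f"{raw:08X}: {decode_cap_err(raw) or 'no known error bits set'}")
```

Because the error bits are write-one-to-clear (RW1C), a decoder like this is read-only by design; it never suggests a value to write back.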
85. RM console finds the halt assertion flag set, the conditions of the environment variables auto_action (boot or restart) and os_type (NT) are ignored; the SRM console runs and prints the following message:
  Halt assertion detected
  NVRAM power-up script not executed
  AUTO_ACTION=BOOT/RESTART and OS_TYPE=NT ignored, if applicable
  P00>>>
B.2 Using the Halt Button
Use the Halt button to halt the DIGITAL UNIX or OpenVMS operating system when it hangs or when you want to use the SRM console. Use the Halt button to force Windows NT systems to bring up the SRM console rather than booting or halting in AlphaBIOS.
Using Halt to Shut Down the Operating System
You can use the Halt button if the DIGITAL UNIX or OpenVMS operating system hangs. Pressing the Halt button halts the operating system back to the SRM console firmware. From the console, you can use the crash command to force a crash dump at the operating system level. The Windows NT operating system does not support halts on this system. Pressing the Halt button during a Windows NT session has no effect.
Using Halt to Clear the Console Password
The SRM console firmware allows you to set a password to prevent unauthorized access to the console. If you forget the password, the Halt button, with the login command, lets you clear the password and regain control of the console. See Section 4.8 of your system User's Guide. Halts
86. Replacement
Reverse the above procedure.
Verification
If the system powers up, the CPU fan is working.
6.6 Memory Riser Card Removal and Replacement
CAUTION: Several different memory DIMMs work in these systems. Be sure you are replacing the broken DIMM with the same variant.
Figure 6-5 Removing Memory Riser Card
WARNING: CPU modules and memory riser cards have parts that operate at high temperatures. Wait 2 minutes after power is removed before touching any module.
Removal
1 Shut down the operating system and power down the system.
2 Expose the card cage side of the system (see Section 6.3).
3 There are two riser cards, one High and one Low. After you have determined which should be removed, loosen the two captive screws that secure the riser card to the card cage.
4 Lift the riser card from the card cage.
Replacement
Reverse the steps in the Removal procedure.
NOTE: Memory DIMMs are installed in pairs, and it is important that the pairs are the same size. When you replace a bad DIMM, be sure to replace it with the same size DIMM as the one you removed.
Verification (DIGITAL UNIX and OpenVMS Systems)
1 Bring the system up to the SRM console by pressing the Halt button, if necessary.
2 Issue the show memory command to display the status of the new memory.
3 Ve
87. SCSI sections
Figure 6-2 Exposing the System (callout: Top Cover Release Latch)
Exposing the System
CAUTION: Be sure the system On/Off button is in the off position before removing system covers.
1 Shut down the operating system.
2 Press the On/Off button to turn the system off.
3 Unlock and open the door that exposes the storage shelf.
4 Pull down the top cover latch shown in Figure 6-2 until it latches in the down position.
5 Grasp the finger groove at the rear of the top cover, pull it straight back about 2 inches, and then lift it off the cabinet.
6 Pull a side panel back a few inches, tilt the top away from the machine, and lift it off. Repeat for the other side.
Dressing the System
Reverse the steps in the exposure process.
6.4 CPU Removal and Replacement
CAUTION: Several different CPU modules work in these systems. Unless you are upgrading the system, be sure you are replacing the CPU you are removing with the same variant of CPU.
Figure 6-3 Removing CPU Module
88. SED DO NOT ABORT kzpsal Updating to All Verifying All PASSED DO NOT ABORT srmflash Updating to V6 0 3 Verifying V6 0 3 PASSED
  UPD> exit
The update command updates the device specified or all devices. In this example, the wildcard (*) indicates that all devices supported by the selected update file will be updated. Typically, LFU requests confirmation before updating each console's or device's firmware. The -all option removes the update confirmation requests. The exit command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).
A.5.5 LFU Commands
The commands summarized in Table A-3 are used to update system firmware.
Table A-3 LFU Command Summary
  Command  Function
  display  Shows the system physical configuration.
  exit     Terminates the LFU program.
  help     Displays the LFU command list.
  lfu      Restarts the LFU program.
  list     Displays the inventory of update firmware on the selected device.
  readme   Lists release notes for the LFU program.
  update   Writes new firmware to the module.
  verify   Reads the firmware from the module into memory and compares it with the update firmware.
These commands are described in the following pages.
display
The display command shows the system physical configuration. Display is equivalent to issuing the SRM console command show configuration. Because it shows the
89. Step 1 of the ECU provides online help. It is recommended that you select this step and become familiar with the utility before proceeding.
A.4 Running RAID Standalone Configuration Utility
The RAID Standalone Configuration Utility is used to set up RAID disk drives and logical units. The Standalone Utility is run from the AlphaBIOS Utility menu. These systems support the KZPSC-xx PCI RAID controller (SWXCR). The KZPSC-xx kit includes the controller, RAID Array 230 Subsystems software, and documentation.
1 Start AlphaBIOS Setup. If the system is in the SRM console, issue the command alphabios. (If the system has a graphics monitor, you can set the SRM console environment variable to graphics.)
2 At the Utilities screen, select Run Maintenance Program. Press Enter.
3 In the Run Maintenance Program dialog box, type swxcrmgr in the Program Name field.
4 Press Enter to execute the program. The Main menu displays the following options:
  01 View/Update Configuration
  02 Automatic Configuration
  03 New Configuration
  04 Initialize Logical Drive
  05 Parity Check
  06 Rebuild
  07 Tools
  08 Select SWXCR
  09 Controller Setup
  10 Diagnostics
Refer to the RAID Array Subsystems documentation for information on using the Standalone Configuration Utility to set up RAID drives.
A.5 Updating Firmware with LFU
Start the Loadable Firmware Update
90. TS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin Memory Host Addr Exten x00000000 IO Host Addr Extension x00000000 Interrupt Control x00000003 MC PCI Intr Enabled Device intr info enabled if en_int 1 Interrupt Request x00000000 Interrupts asserted x00000000 Interrupt Mask Register 0 x00C50010 Interrupt Mask Register 1 x00000000 MC Error Info Register 0 xE0000000 MC bus trans addr lt 31 4 gt x0E000000 MC Error Info Register 1 x000E88FD MC bus trans addr lt 39 32 gt x000000FD MC_Command x00000008 Device Id x0000003A CAP Error Register x00000000 no error seen PCI Bus Trans Error Adr x00000000 Eror Logs 413 DPA Status Register DPA Error Syndrome Reg PB Status Register DPB Error Syndrome Reg IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Memory Host Addr Exten IO Host Addr Extension Interrupt Control Interrupt Request Interrupt Mask Register 0 Interrupt Mask Register 1 MC Error Info Register 0 MC Error Info Register 1 CAP Error Register PCI Bus Trans Error Adr IPA Status Regis
91. Type Bist Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Expansion Rom Base Addres Interrupt Pl Interrupt P2 Min Gnt Max Lat NUBWNHE x0200 Device is 33 Mhz Capable 7 No Support for User Defineable Features Fast Back to Back to Different Targets Is Not Supported in Target Device Device Select Timing Medium x05 x010000 Mass Storage SCSI Bus Controller x10 xF8 x00 Single Function Device x00 x00101100 x04129000 x00000000 x00000000 x00000000 x00000000 x04110000 x08 x01 x00 x00 x000000FBC0001800 Slot or Device Number 3 x00011069 Mylex DAC960 KZPSC RAID Controller Vendor ID x1069 Mylex Device ID x00000001 x0107 I O Space Accesses Response Enabled Memory Space Accesses Response Enabled PCI Bus Master Capability Enabled Monitor for Special Cycle Ops DISABLED Generate Mem Wrt Invalidate Cmds DISABLED Parity Error Detection Response IGNORE Wait Cycle Address Data Stepping DISABLED SERR Sys Err Driver Capability Enabled Fast Back to Back to Many Target DISABLED x8200 Device is 33 Mhz Capable 7 No Support for User Defineable Features Fast Back to Back to Different Targets Is Not Supported in Target Device Device Select Timing Medium DETECTED PARITY ERROR This Device Detected x02 x010400 Mass Storage RAID Controller x10 XFF x00 Single Function Device x00 x00101000 x0412A000 x00000000
92. Update revision: The revision of the firmware update image.
readme
The readme command lists release notes for the LFU program.
update
The update command writes new firmware to the module. Then LFU automatically verifies the update by reading the new firmware image from the module into memory and comparing it with the source image. To update more than one device, you may use a wildcard but not a list. For example, update k* updates all devices with names beginning with k, and update * updates all devices. When you do not specify a device name, LFU tries to update all devices; it lists the selected devices to update and prompts before devices are updated. (The default is no.) The -all option removes the update confirmation requests, enabling the update to proceed without operator intervention.
CAUTION: Never abort an update operation. Aborting corrupts the firmware on the module.
verify
The verify command reads the firmware from the module into memory and compares it with the update firmware. If a module verified successfully when you updated it but later failed tests, you can use verify to tell whether the firmware has become corrupted.
A.6 Updating Firmware from AlphaBIOS
Insert the CD-ROM or diskette with the updated firmware and select Upgrade AlphaBIOS from the main AlphaBIOS Setup screen. Use the Loadable Firmware Update (LFU) utility to perform the update. The LFU exit command causes a system
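The device-selection rule for update (a single wildcard pattern selects devices, a list is not accepted) resembles ordinary shell-style pattern matching. The Python sketch below illustrates that selection logic only; it is not LFU code. The device names are hypothetical placeholders, and treating the wildcard character as `*` is an assumption.

```python
from fnmatch import fnmatchcase
from typing import List


def select_devices(devices: List[str], pattern: str) -> List[str]:
    """Return the devices whose names match one shell-style wildcard pattern."""
    # A single pattern is accepted; a comma-separated list is rejected,
    # mirroring the "wildcard but not a list" rule in the text.
    if "," in pattern or " " in pattern.strip():
        raise ValueError("use one wildcard pattern, not a list of devices")
    return [name for name in devices if fnmatchcase(name, pattern.strip())]


if __name__ == "__main__":
    updatable = ["kzpsa0", "srmflash", "pfi0"]  # hypothetical device names
    print(select_devices(updatable, "k*"))
    print(select_devices(updatable, "*"))
```

A pattern that matches nothing returns an empty list, which a caller could treat as "nothing to update" rather than as an error.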
93. Works VAX and the DIGITAL logo The following are third party trademarks Lifestyle 28 8 DATA FAX Modem is a trademark of Motorola Inc UNIX is a registered trademark in the U S and other countries licensed exclusively through X Open Company Ltd U S Robotics and Sportster are registered trademarks of U S Robotics Windows NT is a trademark of Microsoft Inc All other trademarks and registered trademarks are the property of their respective holders FCC Notice The equipment described in this manual generates uses and may emit radio frequency energy The equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of FCC Rules which are designed to provide reasonable protection against such radio frequency interference Operation of this equipment in a residential area may cause interference in which case the user at his own expense will be required to take whatever measures are required to correct the interference Shielded Cables If shielded cables have been supplied or specified they must be used on the system in order to maintain international regulatory compliance Warning This is a Class A product In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures Achtung Dieses ist ein Ger t der Funkst rgrenzwertklasse A In Wohnbereichen k nnen bei Betrieb dieses Ger tes Rundfunkst6rungen auftreten i
94. [Flowchart PKW0432-96, continued: XBUS for access; print to console device and OCP; initialize all S-cache banks; check integrity of XSROM; on pass, load first 8K of XSROM into S-cache and jump to XSROM overlay in S-cache; on fail twice, hang.]
The Alpha chip built-in self-test tests the I-cache at power-up and upon reset. Each CPU chip loads its SROM code into its I-cache and starts executing it. If the chip is partially functional, the SROM code continues to execute. However, if the chip cannot perform most of its functions, that CPU hangs and that CPU's pass/fail LED remains off. In these systems the CPU pass/fail LED is not visible.
If the system has more than one CPU and at least one passes both the SROM and XSROM power-up tests, the system will bring up the console. The console checks the FW_SCRATCH register, where evidence of the power-up failure is left. Upon finding the error, the console sends these messages to COM1 and the OCP:
• COM1 (or VGA): Power-up tests have detected a problem with your system
• OCP: Power-up failure
Table 2-2 lists the tests performed by the SROM.
Table 2-2 SROM Tests (Test Name, Logic Tested): D-cache RAM March test; D-cache Tag RAM March test; S-cache Data March test; S-cache Tag RAM March test; I-cache Parity Error test; D-cache Parity Error test; S-cache Parity Error test; IOD Ac
95. aaee cusedes A EEEREN EE E 1 2 1 2 Cover Interlock Circuits aleis oei aE aA O E Ni 1 3 1 3 Control Panel Assembly ano n ns e ien e esa e aes E EES EREE Eua EOS EKSi siS 1 4 1 4 Architectire Diastam arrn aoire aro raar E E EE AE EEE SASE 1 8 1 5 CPU Module Placements ioi a e aee ea e eea e a eaaa 1 10 1 6 Memory Placement ispisiri aiene aa eTa ai iTi 1 12 1 7 How Memory Addressing Is Calculated eee eeeseeeeseeeeeneeeeneeseeeeeee 1 14 1 8 System Motherboard t anes o as ene as Ea ESE TREE a 1 16 1 9 System Bus Block Diagram seeseeeeeeereesreesrrsrrserrrereserrssrrsereeeresereeerese 1 18 1 10 System Bus to PCI Bus Bridge Block Diagram ec eeseeeeeeeeneeeeeees 1 20 1 11 PCI Block Diapratn snoin fies eens favstes o ins tees ee ike tes Seve boe Stake dus Suede tiabee 1 22 1 12 Remote Control Logic cic sas Menta eeu aE Ra EE iNES cA 1 24 1 13 Power Control Logic a a iit wuld hil EEE E ENEE 1 26 1 14 Power Circuit Diagram asien a a a aiaei 1 28 1 15 Back of Power Supply and Location ssseeeseesseesreeersesreerresreseressreesreeeres 1 30 1 16 Power Up Down Sequence Flowchart esesseeeeseeseereeserrerserrrssererseerees 1 32 1 17 IC Bus Block Diagram 5 4 c d lt cicnciessiessiectanoninctescnteatevesanadarsuanieuenesndeeunls 1 34 1 18 StorageWorks Drive Location seeeeeeeeereeeeeesrserisereseressreerresereseresereee 1 36 2 1 Control Panel and LCD Display eeceeseeeseeceseeceseeeesseecsaeecseeeeseeeesae
96. ache Tag Control Parity Error. Indicates that a B-cache read transaction encountered bad parity in the tag control RAM.
B-Cache Tag Address Parity Error. Indicates that a B-cache read transaction encountered bad parity in the tag address RAM.
Chip Identification. Read as 5. Future update revisions to the chip will return new unique values.
All ones.
Table 5-1 External Interface Status Register (continued)
  Name          Bits     Description
  (none)        <63:36>  All ones.
  SEO_HRD_ERR   <35>     Second External Interface Hard Error. Indicates that a fill from B-cache or main memory, or a system address/command received by the CPU, has a hard error while one of the hard error bits in the EI_STAT register is already set.
  FIL_IRD       <34>     Fill I-Ref/D-Ref. When set, indicates that the error occurred during an I-ref fill. When clear, indicates that the error occurred during a D-ref fill. This bit has meaning only when one of the ECC or parity error bits is set. This bit is not defined for a B-cache tag parity error (BC_TPERR) or a B-cache tag control parity error (BC_TC_ERR).
  EI_PAR_ERR    <33>     External Interface Command/Address Parity Error. Indicates that an address and command received by the CPU has a parity error.
  UNC_ECC_ERR   <32>     Uncorrectable ECC Error. Indicates that fill data received from outside the CPU contained an uncorrectable ECC error. In parity mode this bit
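The four status bits in this part of Table 5-1 can be pulled out of a 64-bit External Interface Status Register value with a few shifts. The sketch below is a minimal Python illustration covering only bits <35:32>, whose positions are given in the table; the function names are hypothetical. The sample value is an Ext Interface Status Reg value of the kind that appears in the error log examples, where the bit-to-text translation reports an uncorrectable ECC error during a D-ref fill.

```python
from typing import List, Optional

# Bit positions from Table 5-1 (continued), bits <35:32> only.
EI_STAT_BITS = {
    35: "SEO_HRD_ERR",
    34: "FIL_IRD",
    33: "EI_PAR_ERR",
    32: "UNC_ECC_ERR",
}


def decode_ei_stat(value: int) -> List[str]:
    """List the names of bits <35:32> that are set, high bit first."""
    return [
        name
        for bit, name in sorted(EI_STAT_BITS.items(), reverse=True)
        if (value >> bit) & 1
    ]


def fill_kind(value: int) -> Optional[str]:
    """Return 'I-ref' or 'D-ref', or None when FIL_IRD has no meaning."""
    # FIL_IRD is meaningful only when an ECC or parity error bit is set.
    if not ((value >> 32) & 1 or (value >> 33) & 1):
        return None
    return "I-ref" if (value >> 34) & 1 else "D-ref"


if __name__ == "__main__":
    raw = 0xFFFFFFF104FFFFFF
    print(decode_ei_stat(raw), fill_kind(raw))
```

Returning None from fill_kind when neither error bit is set keeps a caller from reading meaning into FIL_IRD in cases where the table says it has none.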
97. C.7 Modem Dialog Details
This section is intended to help you reprogram your modem if necessary.
Default Initialization and Answer Strings
The modem initialization and answer command strings set at the factory for the RCM are:
  Initialization string: AT&F0EVS0=0S12=50<cr>
  Answer string: ATXA<cr>
NOTE: All modem commands must be terminated with a <cr> character (0x0d hex).
Modifying Initialization and Answer Strings
The initialization and answer strings are stored in the RCM's NVRAM. They come pre-programmed to support a wide selection of modems. With some modems, however, you may need to modify the initialization string, answer string, or both. The following SRM set and show commands are provided for this purpose.
To replace the initialization string:
  P00>>> set rcm_init new_init_string
To replace the answer string:
  P00>>> set rcm_answer new_answer_string
To display all the RCM strings that can be set by the user:
  P00>>> show rcm*
  rcm_answer   ATXA
  rcm_dialout
  rcm_init     AT&F0EVS0=0S12=50
  P00>>>
Initialization String Substitutions
The following modems require modified initialization strings.
  Modem Model                     Initialization String
  Motorola 3400 Lifestyle 28.8    at&f0e0v0x0s0=2
  AT&T Dataport 14.4/FAX          at&f0e0v0x0s0=2
  Hayes Smartmodem Optima 288     at
98. 6.14 Operator Control Panel Removal and Replacement
Figure 6-13 Removing the OCP (illustration PKW0501A-97)
Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. To remove the StorageWorks door:
a. Open the door slightly and grab the left edge of the door with your left hand and the right edge of the door with your right hand.
b. While pushing the door up, bend it by pulling it away from the system. The door compresses enough so its bottom post slips out of its retaining hole.
c. Once the bottom of the door is free, gently pull the top down to release it from the post on the door jamb and release it from the spring.
d. Put the door aside.
4. Using a Phillips head screwdriver, remove the nine screws holding the molded plastic front panel to the system. Six screws are accessed from the front of the system and three through the fan compartment of the system.
5. Tilt the front panel away from the system and disconnect all the cables fr
99. always set and the last three indicate the node. Bit-to-text translations give six-bit data, although only the last three bits define the node.
Table 4-10 Node IDs
Node ID <2:0>: 000 001 010 011 100 101 110 111
Six-Bit (Hex): 38 39 3A 3B 3C 3D NA NA
Node: Memory, CPU0, CPU1, IOD0 on Mbrd, IOD1 on Mbrd, NA, NA
4.5 Double Error Halts and Machine Checks While in PAL Mode
Two error cases require special attention. Neither double error halts nor machine checks while the machine is in PAL mode result in error log entries. Nevertheless, information is available that can help determine what error occurred.
4.5.1 PALcode Overview
PALcode (privileged architecture library code) is used to implement a number of functions at the machine level without the use of microcode. This allows operating systems to make common calls to PALcode routines without knowing the hardware specifics of each system the operating system is running on. PALcode routines handle:
• Instructions that require complex sequencing, such as atomic operations
• Instructions that require VAX-style interlocked memory access
• Privileged instructions
• Memory management
• Context swapping
• Interrupt and exception dispatching
• Power-up initialization and booting
• Console functions
• Emulation of instructions with no hardware support
4.5.2 Double Error Halt
A double error halt occurs under the foll
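Returning to the node ID field described with Table 4-10: because the three high-order bits are always set, the six-bit value reported by bit-to-text translation reduces to a three-bit node ID by masking. The sketch below, in Python, assumes only that rule; the function name is hypothetical and no mapping from node ID to module name is attempted.

```python
def node_id_from_six_bit(six_bit: int) -> int:
    """Return node ID <2:0> from the six-bit node field.

    Assumes, as the text states, that the three high-order bits of the
    six-bit field are always set, so valid values start at 0x38.
    """
    if not 0 <= six_bit <= 0x3F:
        raise ValueError("node field must fit in six bits")
    if (six_bit >> 3) != 0b111:
        raise ValueError("high-order three bits are expected to be set")
    return six_bit & 0b111
```

A translated value of 3B, for instance, yields node ID 3 (binary 011), which can then be looked up in Table 4-10.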
100. ame [AS1200IO,AS1200CP]:
(The function table displays, followed by the UPD> prompt. Console firmware can now be updated.)
UPD> exit (8)
The update command updates the device specified or all devices. For each device, you are asked to confirm that you want to update the firmware. The default is no. Once the update begins, do not abort the operation. Doing so will corrupt the firmware on the module. The lfu command restarts the utility so that console firmware can be updated. Another method is shown in Example A-6, where the user specifies the file AS1200FW and is prompted to insert the second diskette. The default update file, AS1200CP, is selected. The console firmware can now be updated, using the same procedure as for the I/O firmware. The exit command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).
Example A-6 Selecting AS1200FW to Update Firmware from the Floppy Disk
P00>>> lfu
***** Loadable Firmware Update Utility *****
Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0
Please enter the name of the firmware files list, or
Press <return> to use the default filename [AS1200IO,AS1200CP]: as1200fw
Copying AS1200FW from DVA0
Copying TCREADME from DVA0
Copying TCSRMROM from DVA0
[illegible line]
Copying TCARCROM fro
101. ariables. Copy it and record the settings for each system. Use the show command to list environment variable settings.
Table B-4 Environment Variables Worksheet
Columns: Environment Variable, System Name, System Name, System Name
Rows: auto_action, bootdef_dev, boot_osflags, com1_baud, com2_baud, console, cpu_enabled, ew*0_mode, ew*0_protocols, kbd_hardware_type, kzpsa*_host_id, language, memory_test, ocp_text, os_type, pci_parity, pk*0_fast, pk*0_host_id
Table B-4 Environment Variables Worksheet (Continued)
Rows: pk*0_soft_term, sys_model_num, sys_serial_num, sys_type, tga_sync_green, tt_allow_login
Appendix C Managing the System Remotely
This chapter describes how to manage the system from a remote location using the remote console manager (RCM). You can use the RCM from a console terminal at a remote location. You can also use the RCM from the local console terminal. Sections in this chapter are:
• RCM Overview
• First-Time Setup
• RCM Commands
• Dial-Out Alerts
• Using the RCM Switchpack
• Troubleshooting Guide
• Modem Dialog Details
C.1 RCM Overview
The remote console manager RCM
102. ating Firmware from the Floppy Disk
Performing the Update
Insert an update diskette (see Section A.5.2) into the floppy drive. Start LFU and select dva0 as the load device.
Example A-5 Updating Firmware from the Floppy Disk
***** Loadable Firmware Update Utility *****
Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0 (1)
Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS1200IO,AS1200CP]: AS1200IO (2)
Copying AS1200IO from DVA0
Copying TCREADME from DVA0
Copying CIPCA315 from DVA0
Copying DFPAA252 from DVA0
Copying KZPSAA11 from DVA0
(The function table displays, followed by the UPD> prompt, as shown in Example A-3.)
UPD> list
Device Current Revision Filename Update Revision
AlphaBIOS V5.12-3 arcrom Missing file
pfi0 2.46 dfpaa_fw 2.92
srmflash T3.2-21 srmrom Missing file
cipca_fw A315
kzpsa_fw A11
Continued on next page
(1) Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, the internal floppy disk is selected.
(2) Select the file that has the firmware update, or press Enter to select the default file. When the internal floppy disk is the load device, the file options are: AS1200CP (default) SRM console and AlphaBIOS c
103. bus. Note that they are affected by the commander in charge of the bus during the transaction. The command is a six-bit field in the command/address (bits <5:0>). Bit-to-text translations give six-bit data; the top two bits may or may not be relevant. Note that address bit <39> defines the command as being either a system space or an I/O command.
Table 4-9 Decoding Commands
Columns: MC_CMD <5:4> <3:0>, CMD in Hex, MCADR<39>, Description, IOD
XX 0000  X0  1  Mem Idle  Y
00 0010  02  1  Write Pend Ack  Y
XX 0011  X3  1  Mem Refresh
XX 0101  X4  0  Set Dirty
x0 0110  06/26  0  Write Thru Mem
x0 0110  06/26  1  Write Thru I/O
x1 0110  36/16  0  Write Back Mem
x1 0110  36/16  1  Write Intr I/O  Y
00 0111  07  0  Write Full Mem  Y
10 0111  27  0  Write Part Mem  Y
x0 0111  07/27  1  Write Mask I/O  Y
x0 0111  07/27  0  Write Merge Mem  Y
XX 1000  X8  0  Read0 Mem  Y
XX 1000  X8  1  Read0 I/O
XX 1001  X9  0  Read Mem  Y
XX 1001  X9  1  Read I/O
XX 1010  XA  0  Read Mod0 Mem  Y
Table 4-9 Decoding Commands (continued)
XX 1010  XA  1  Read Peer0 I/O  Y
XX 1011  XB  0  Read Mod1 Mem  Y
XX 1011  XB  1  Read Peer I/O  Y
10 1100  2C  1  FILL0 (due to Read0/Peer0)  Y
10 1101  2D  1  FILL1 (due to Read1/Peer1)  Y
XX 1110  XE  Read0 Mem
XX 1111  XF  Read1 Mem
4.4.11 Node IDs
The node ID is a six-bit field in the command/address (bits <38:33>). The high-order three bits are
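When working through Table 4-9 by hand, it helps to split the six-bit command into the two-bit and four-bit groups used in the table's first column, and to test address bit <39> separately. The following Python sketch assumes exactly the bit positions quoted above; the function names are hypothetical, and the reading that bit <39> set means I/O space is inferred from the Mem and I/O rows of the table rather than stated outright.

```python
def split_command(cmd: int) -> tuple[int, int]:
    """Split MC_CMD <5:0> into (bits <5:4>, bits <3:0>)."""
    if not 0 <= cmd <= 0x3F:
        raise ValueError("command must fit in six bits")
    return cmd >> 4, cmd & 0xF


def is_io_space(address: int) -> bool:
    """Return True when address bit <39> is set.

    Assumption: in Table 4-9 the I/O rows show MCADR<39> = 1 and the
    Mem rows show 0, so a set bit is treated as an I/O command here.
    """
    return bool((address >> 39) & 1)
```

For example, command 27 (hex) splits into 10 and 0111, which the table lists as Write Part Mem.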
104. cess test; D-cache access; D-cache data; D-cache address logic; D-cache tag store RAM; D-cache bank address logic; S-cache RAM cells; S-cache data path; S-cache address path; S-cache tag store RAM; S-cache bank address logic; I-cache parity error detection; ICSR register and error forcing logic; IC_PERR_STAT register and reporting logic; D-cache parity error detection; DC_MODE register and parity error forcing logic; DC_PERR_STAT register and reporting logic; S-cache parity error detection; SC_CTL register and parity error forcing logic; SC_STAT register and reporting logic; access to IOD CSRs; data path through CAP chip and MDP0 on each IOD; PCI0 A/D lines <31:0>.
2.4 SROM Errors Reported
The SROM reports machine checks, pending interrupt/exception errors, and errors related to corruption of FEPROM 0. If SROM errors are fatal, the particular CPU will hang and only the CPU self-test pass LEDs and/or the LEDs on the system motherboard will indicate the failure. The CPU self-test pass LED is not visible, but the IOD0 and IOD1 pass LEDs are.
Example 2-1 SROM Errors Reported at Power-Up
Unexpected Machine Check (CPU Error)
UNEX MCHK on CPU 0
EXC_ADR 42a9
EI_STAT fffffff004ffffff
EI_ADDR ffffff000000801F
SC_STAT 0
SC_ADDR FFFFFF0000005F2F
Pending Interrupt/Exception (CPU Error)
INT EXC on CPU0
ISR 400000
EI_STAT fffffff007ffffff
EI_ADDR ffffff7fffffffdf
FIL_SYN 631B
BCTGADR ffffffa7fffcafff
105. checked
4. Parity Error and Fill Error tests: Parity errors are forced on the address and data lines on system bus and PCI buses. A fill error transaction is forced on the system bus.
5. Translation Error test: A loopback test using scatter-gather address translation logic on each IOD.
6. Write Pending test: Runs test 2 with the write pending bit set and clear in the CAP chip control register.
7. PCI Loopback test: Loops data through each PCI on each IOD, testing the mask field of the system bus.
8. PCI Peer-to-Peer Byte Mask test: Tests that devices on the same PCI and on different PCIs can communicate.
9. Page Table Entry test 1 (CAP chip): Tests every PTE using scatter-gather translation and addressing.
10. Page Table Entry test 2 (CAP chip): Tests random PTEs, forcing use of all interesting tag and page registers.
Not run on power-up. These tests take approximately 30 seconds and are run in user mode.
Table 2-6 PCI Motherboard Tests
Columns: Test Number, Test Name, Diagnostic Name, Description
1. PCEB, pceb_diag: Tests the PCI to EISA bridge chip.
2. ESC, esc_diag: Tests the EISA system controller.
3. 8K NVRAM, nvram_diag: Tests the NVRAM.
4. Real-Time Clock, ds1287_diag: Tests the real-time clock chip.
5. Keyboard and Mouse, i8242_diag: Tests the keyboard/mouse chip.
6. Flash ROM, flash_diag: Dumps contents of flash ROM.
7. Serial and Parallel Ports and Floppy, combo_diag: Tests COM ports 1 and 2, the parallel port, and the floppy.
8. CD ROM
106. [illegible] A-22
A.6 Updating Firmware from AlphaBIOS A-25
A.7 Upgrading AlphaBIOS A-26
Appendix B Halts, Console Commands, and Environment Variables
Halt Button Functions B-2
Using the Halt Button B-3
Halt Assertion B-4
Summary of SRM Console Commands B-6
Summary of SRM Environment Variables B-8
Recording Environment Variables B-10
Appendix C Managing the System Remotely
C.1 RCM Overview C-2
C.2 First-Time Setup C-3
C.2.1 Configuring the Modem C-4
C.2.2 Dialing In and Invoking RCM C-5
C.2.3 Using RCM Locally C-6
C.3 RCM Commands C-7
C.4 Dial-Out Alerts C-16
C.5 Using the RCM Switchpack C-19
C.6 Troubleshooting Guide
107. cond level data cache bus interface unit
CPU Variants
Columns: Module Variant, Clock Frequency, Onboard Cache, Color
B3007-AA  400 MHz  4 Mbytes  Orange
B3007-CA  533 MHz  4 Mbytes  Violet
CPU Configuration Rules
• The first CPU must be in CPU slot 0 to provide the system clock.
• The second CPU should be installed in CPU slot 1.
• Both CPUs must have the same Alpha chip clock speed. The system bus may hang without an error message if the oscillators clocking the CPUs are different.
1.6 Memory
Memory consists of two riser cards and up to eight pairs of DIMMs. Each riser card receives one of the two DIMMs in the DIMM pair. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version.
Figure 1-6 Memory Placement (motherboard illustration PKW0504B-97; labels: power connectors, floppy connector, bulkhead connectors, PCI 0 slots 2 to 4, PCI 1 slots 2 to 4, EISA/ISA slot, fan connectors, CPU 0, MEM L, CPU 1, RCM switchpack, LEDs, PCI bridges, internal SCSI connector, RCM power-down connector, speaker connector, OCP connector)
108. console dev. Check for illegal memory config. Print warnings to console dev and OCP. Initialize all memory pairs. Secondaries alerted that console has started. They jump to and run PALcode, joining the console. Note: The XSROM can only print to the console device if the environment variable console is set to serial. It always sends output to the OCP. (Flowchart PKW0432A-96)
XSROM tests are described in Table 2-3. Failure indicates a CPU failure.
After jumping to the primary CPU's S-cache, the code then intentionally I-caches itself and is completely register based; no D-stream for stack or data storage is used. The only D-stream accesses are writes/reads during testing. Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests, memory tests, and a fail-safe loader. The second sector contains support for system memory and PALcode. The third sector contains a copy of the first sector. The remaining thirteen sectors contain the SRM console and decompression code.
NOTE: Memory tests are run during power-up and reset (see Table 2-4). They are also affected by the state of the memory_test environment variable, which can have the following values:
FULL: Test all memory
PARTIAL: Test up to the first 256 Mbytes
NONE: Test 32 Mbytes
Table 2-3 XSROM Tests
Columns: Test, Test Name, Logic Tested
11. B-cache Data March test: B-cache data RAMs, CPU chip B-cache control, CPU chip B-cache addre
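The effect of the memory_test environment variable on how much memory is exercised at power-up can be summarized in a few lines. The following Python sketch simply restates the three values listed in the note above; the function name is hypothetical, and capping the result at the installed size is an assumption for systems smaller than the stated limits.

```python
def memory_tested_mbytes(installed_mbytes: int, memory_test: str) -> int:
    """Return how many Mbytes power-up testing covers for a memory_test value.

    FULL tests all memory, PARTIAL tests up to the first 256 Mbytes,
    and NONE tests 32 Mbytes. Capping at the installed size is an
    assumption of this sketch.
    """
    mode = memory_test.upper()
    if mode == "FULL":
        return installed_mbytes
    if mode == "PARTIAL":
        return min(installed_mbytes, 256)
    if mode == "NONE":
        return min(installed_mbytes, 32)
    raise ValueError(f"unknown memory_test value: {memory_test!r}")
```

On a system with 512 Mbytes installed, PARTIAL therefore covers the first 256 Mbytes and NONE covers 32 Mbytes.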
109. d reliable layers of the XSROM are loaded sequentially into the processor chip on each CPU. None of the SROM or XSROM power-up tests are run from memory; all run from the caches in the CPU chip, thus providing excellent diagnostic isolation. Later power-up tests, run under the console, are used to complete testing of the I/O subsystem. There are two console programs: the SRM console and the AlphaBIOS console, as detailed in your system User's Guide. By default, the SRM console is always loaded and I/O system tests are run under it before the system loads AlphaBIOS. To load AlphaBIOS, the os_type environment variable must be set to NT and halt assertion must be clear. Otherwise, the SRM console continues to run.
2.3 SROM Power-Up Test Flow
The SROM tests the CPU chip and the path to the XSROM.
Figure 2-5 SROM Power-Up Test Flow (flowchart; recoverable steps in order): For each CPU: initialize CPU chip; turn off CPU LED; D-cache errors? (yes: hang); all 3 S-cache banks pass? (no: hang); duplicate tag or fill errors? (yes: hang); light CPU LED. Determine primary. Size IOD. Loopback on each IOD (pass: light IOD LEDs). Initialize PCI-EISA bridge chip. Read TOY NVRAM. Initialize combo chip on XBUS for access to COM port 1. Initialize OCP port to OCP display on
110. date on AlphaBIOS [Y/(N)] y (6)
DO NOT ABORT!
AlphaBIOS Updating to V6.40-1 Verifying V6.40-1 PASSED.
Confirm update on: srmflash [Y/(N)] y
DO NOT ABORT!
srmflash Updating to V6.0-3 Verifying V6.0-3 PASSED.
UPD> exit (7)
The update command updates the device specified or all devices. In this example, the wildcard indicates that all devices supported by the selected update file will be updated. For each device, you are asked to confirm that you want to update the firmware. The default is no. Once the update begins, do not abort the operation. Doing so will corrupt the firmware on the module. The exit command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).
A.5.2 Updating Firmware from the Floppy Disk: Creating the Diskettes
Create the update diskettes before starting LFU. See Section A.4.3 for an example of the update procedure.
Table A-2 File Locations for Creating Update Diskettes on a PC
Console Update Diskette: AS1200FW.TXT, AS1200CP.TXT, TCREADME.SYS, TCSRMROM.SYS, TCARCROM.SYS
I/O Update Diskette: AS1200IO.TXT, TCREADME.SYS, CIPCA315.SYS, DFPAA310.SYS, KZPSAA11.SYS
To update system firmware from floppy disk, you first must create the firmware update diskettes. You will need to create two diskettes: one for console updates and one for I/O.
1. Download the update files from the Internet.
2. On a PC, copy files onto two FAT
111. default of 9600. You must disable RCM to select a baud rate other than 9600.
• Switch 2 (MODEM OFF). Set this switch to ON (disable) if you want to prevent the use of the RCM for monitoring a system remotely. RCM commands can still be run from the local serial console terminal.
• Switch 3 (RPD DIS). Set this switch to ON (disable) if you want to disable the poweroff command. With poweroff disabled, the monitored system cannot be powered down from the RCM.
• Switch 4 (SET DEF). Set this switch to ON (enable) if you want to reset the RCM to the factory settings. See the section Resetting the RCM to Factory Defaults.
Changing a Switch Setting
The RCM switches are numbered on the system board. The default positions are shown in Figure C-3. To change a switch setting:
1. Turn off the system.
2. Unplug the AC power cords. NOTE: If you do not unplug the power cords, the new setting will not take effect when you power up the system.
3. Remove the system covers. See Section 6.3.
4. Locate the RCM switchpack on the system board and change the switch setting as desired.
5. Replace the system covers and plug in the power cords.
6. Power up the system to the SRM console prompt and type the escape sequence to enter RCM command mode, if desired.
Resetting the RCM to Factory Defaults
You can reset the RCM to factory settings, if desired. You would need to do this if you forgot the escape sequence for
112. e Max Voltage Max Current
5.0  4.90  5.25  52
3.43  3.400  3.465  37.4
12  11.5  12.6  17
-12  -13.2  -10.9  0.5
-5.0  -5.5  -4.6  0.2
Vaux  4.85  5.25  0.6
• Remote sense on 5.0V and 3.43V. 5.0V is sensed on the system motherboard. 3.43V is sensed on all CPUs in the system and the system bus motherboard.
• Current share on 5.0V, 3.43V, and 12V.
• 1% regulation on 3.43V.
• Fault protection (latched). If a fault is detected by the power supply, it will shut down. The power supply faults detected are: fan failure, overvoltage, overcurrent, power overload.
• DC_ENABLE_L input signal starts the DC outputs.
• SHUTDOWN_H input signal shuts the power supply off in case of a system fan or CPU fan failure.
• POK_H output signal indicates that the power supply is operating properly.
1.11 Power Up/Down Sequence
System power can be controlled manually by the On/Off button on the OCP or remotely through the RCM. The power up/down sequence flow is shown below.
Figure 1-16 Power Up/Down Sequence Flowchart (flowchart PKW0513A-97; nodes: Apply AC Power; Vaux on; On/Off Button; Assert DC_ENABLE_L; Power Supply Starts; Disable Outputs; Deassert POK; Assert SHUTDOWN; Assert POK; 30 Second Delay; Fan/Temp OK)
113. e PALcode to start the SRM console. The primary CPU prints a message indicating that it is running the console. Starting with this message, the power-up display is printed to the default console terminal, regardless of the state of the console environment variable. If console is set to graphics, the display from here to the end is saved in a memory buffer and printed to the graphics monitor after the PCI buses are sized and the graphics device is initialized.
The size and type of each memory pair is determined.
The console is started on each of the secondary CPUs. A status message prints for each CPU.
The PCI bridges (indicated as IODn) are probed and the devices are reported. I/O adapters are configured.
The SRM console banner and prompt are printed. The SRM prompt is shown in this manual as P00>>>. It can, however, be P01>>>. If the auto_action environment variable is set to boot or restart and the os_type environment variable is set to unix or openvms, the DIGITAL UNIX or OpenVMS operating system boots. If the system is running the Windows NT operating system (the os_type environment variable is set to nt), the SRM console loads and starts the AlphaBIOS console and does not print the SRM banner or prompt.
2.10 Fail-Safe Loader
The fail-safe loader is a software routine that loads the SRM console image from floppy. Once the console is running, you will want to run LFU to update FEPROM 0 with a new
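The way os_type and auto_action decide what runs at the end of power-up can be expressed as a small decision function. The Python sketch below is an illustration only: the function name and the returned strings are hypothetical, and the halt_asserted parameter reflects the requirement stated in the power-up sequence description that halt assertion be clear before AlphaBIOS loads. How halt assertion affects an automatic boot of DIGITAL UNIX or OpenVMS is not covered by the text, so the sketch leaves that branch unchanged.

```python
def end_of_power_up(os_type: str, auto_action: str,
                    halt_asserted: bool = False) -> str:
    """Return what takes over after the power-up display completes."""
    os_type = os_type.lower()
    auto_action = auto_action.lower()
    if os_type == "nt":
        # AlphaBIOS loads only when os_type is nt and halt assertion is
        # clear; otherwise the SRM console continues to run.
        return "SRM console" if halt_asserted else "AlphaBIOS console"
    if os_type in ("unix", "openvms") and auto_action in ("boot", "restart"):
        return "boot DIGITAL UNIX" if os_type == "unix" else "boot OpenVMS"
    # No automatic boot applies, so the SRM banner and prompt are printed.
    return "SRM console"
```

For example, a system with os_type set to openvms and auto_action set to restart boots OpenVMS, while the same system with any other auto_action value stops at the SRM prompt.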
114. e default event log file, SYS$ERRORLOG:ERRLOG.SYS, enter the following command:
OpenVMS: $ DIAGNOSE
DIGITAL UNIX: > dia -a
The DIAGNOSE command allows DECevent to use built-in defaults. This command produces a full report, directed to the terminal screen, from the input event file SYS$ERRORLOG:ERRLOG.SYS. The /TRANSLATE qualifier is understood on the command line.
To select an alternate input file:
OpenVMS: $ DIAGNOSE ERRORLOG.OLD
DIGITAL UNIX: > dia -a -f syserr-old.hostname
These commands select an alternate input file (ERRORLOG.OLD or syserr-old) as the event log to translate. The file name can contain the directory or path, if needed. Wildcard characters can be used.
To send reports to an output file:
OpenVMS: $ DIAGNOSE/OUTPUT=ERRLOG_OLD.TXT
DIGITAL UNIX: > dia -a > syserr-old.txt
These commands direct the output of DECevent to ERRLOG_OLD.TXT or syserr-old.txt.
To reverse the order of the input events:
OpenVMS: $ DIAGNOSE/TRANSLATE/REVERSE
DIGITAL UNIX: > dia -R
These commands reverse the order in which events are displayed. The default order is forward chronologically.
4.2.2 Filtering Events
/INCLUDE and /EXCLUDE qualifiers allow you to filter input event log files. The /INCLUDE qualifier is used to create output for devices named in the command.
OpenVMS: $ DIAGNOSE/TRANSLATE/INCLUDE=(DISK=RZ,DISK=RA92,CPU)
DIGITAL U
115. e of uncorrec Cycle 0 Cycle 1 Cycle 2 Cycle 3 Palcode ECC ECC ECC ECC Rev table read error Syndrome x00000000 Syndrome x00000000 Syndrome x00000000 Syndrome x00000000
4.3.3 MCHK 670 Read Dirty CPU-Detected Failure
The error log in Example 4-3 shows the following:
• CPU0 logged the error in a system with two CPUs.
• The External Interface Status Register records an uncorrectable ECC error from the system (bit <30> set).
• Both IOD CAP Error Registers logged an error.
• The MC Error Info Registers 0 and 1 have captured the error information.
• The commander at the time of the error was CPU0 (known from MC_ERR1).
• The command on the bus at the time was a read memory command.
• The address read was a memory address, not an I/O address.
• The data associated with the read was dirty.
From this information you know that CPU0 requested data that was dirty; therefore, memory did not provide it, nor did an I/O device. Only another CPU could have provided the data from its cache. There is only one other CPU in this system, and it is faulty. See Section 4.4 for a procedure designed to help with IOD-detected errors.
NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs.
116. e present. If the power supplies are receiving AC power, Vaux is present on the system motherboard regardless of the condition of the On/Off switch.
• When the Halt button LED is lit and the On/Off button LED is on, the system should be running either the SRM console or Windows NT.
Table 2-1 Control Panel Display
Columns: Field, Content, Display, Meaning
0. CPU number: P0, P1: CPU reporting status
2. Status: TEST: Tests are executing. FAIL: Failure has been detected. MCHK: Machine check has occurred. INTR: Error interrupt has occurred.
Test number
4. Suspected device:
CPU0-1: CPU module number
MEM0-7 and L, H, or [symbol missing]: Memory pair number and low DIMM, high DIMM, or either
IOD0: Bridge to PCI bus 0
IOD1: Bridge to PCI bus 1
FROM0: Flash ROM
COMBO: COM controller
PCEB: PCI-to-EISA bridge
ESC: EISA system controller
NVRAM: Nonvolatile RAM
TOY: Real-time clock
I8242: Keyboard and mouse controller
The potentiometer, accessible through the access hole just above the Reset button, controls the intensity of the LCD. Use a small Phillips head screwdriver to adjust.
On the system motherboard (54-25147-01).
2.2 Power-Up Sequence
Console and most power-up tests reside on the I/O subsystem, not on the CPU nor on any other module on the system bus.
Figure 2-2 Power-Up Flow (flowchart): Power-Up/Reset. XSROM tests execute. SROM code loaded into each CPU's I-cache
117. eature to notify you of a power failure within the system. When a dial-out alert is triggered, the RCM initializes the modem for dial-out, sends the dial-out string, hangs up the modem, and reconfigures the modem for dial-in. The modem must continue to be powered and the phone line must remain active for the dial-out alert feature to work. Also, if you are connected to the system remotely, the dial-out feature does not work.
Enabling Dial-Out Alerts
1. Enter the set rcm_dialout command, followed by a dial-out alert string, from the SRM console (see Example C-3). See the next topic for details on composing the modem dial-out string.
2. Invoke the RCM and enter the enable command to enable remote access dial-in. The RCM status command should display Remote Access: Enable (see Example C-3).
3. Enter the alert_ena command to enable outgoing alerts (see Example C-3).
Example C-3 Configuring the Modem for Dial-Out Alerts
P00>>> set rcm_dialout ATDTstring
RCM> enable
RCM> status
Remote Access: Enable
RCM> alert_ena
Composing the Dial-Out String
Enter the set rcm_dialout command from the SRM console to compose the dial-out string. Use the show command to verify the string. See Example C-4.
Example C-4 Typical RCM Dial-Out Command
P00>>> set rcm_dialout ATXDT9 15085553333 5085553332
P00>>> show rcm_dialout
rcm_dialout ATXDT9 15085553333
118. [illegible] 1-28
Power Supply 1-30
Power Up/Down Sequence 1-32
Maintenance Bus (I2C Bus) 1-34
StorageWorks 1-36
Chapter 2 Power-Up
2.1 Control Panel 2-2
2.2 Power-Up Sequence 2-4
2.3 SROM Power-Up Test Flow 2-8
2.4 SROM Errors Reported 2-11
2.5 XSROM Power-Up Test Flow 2-12
2.6 XSROM Errors Reported 2-15
2.7 Console Power-Up Tests 2-16
2.8 Console Device Determination 2-18
2.9 Console Power-Up Display 2-20
2.10 Fail-Safe Loader 2-24
Chapter 3 Troubleshooting
3.1 Troubleshooting with LEDs 3-2
3.2 Troubleshooting Power Problems 3-4
3.3 Running Diagnostics Test Command
119. C.2.2 Dialing In and Invoking RCM
To dial in to the RCM modem port, dial the modem, enter the modem password at the # prompt, and type the escape sequence. Use the hangup command to terminate the session. A sample dial-in dialog would look similar to the following:
Example C-1 Sample Remote Dial-In Dialog
ATQ0V1E1S0=0 (1)
OK
ATDT30167
CONNECT 9600
RCM V2.0
RCM>
Dialing In and Invoking RCM
1. Dial the number for the modem connected to the modem port. See Example C-1 for an example.
2. The RCM prompts for a password with a # character. See (2). Enter the password that you set with the setpass command. You have three tries to correctly enter the password. After three incorrect tries, the connection is terminated and the modem is not answered again for 5 minutes. When you successfully enter the password, the RCM banner is displayed. You are connected to the system COM1 port, and you have control of the SRM console.
NOTE: At this point no one at the local terminal can perform any tasks except for typing the RCM escape sequence. The local terminal displays any SRM console output entered remotely.
3. Type the RCM escape sequence (not echoed).
^]^]rcm
RCM>
NOTE: From RCM command mode, you can change the escape sequence for invoking RCM, if desired. Use the setesc command to change the sequence. Be sure to record the new escape sequence.
120. er card slot 0.
• Other memory pairs must be the same size or smaller than the first memory pair.
• Memory pairs must be installed in consecutive slots.
• Memory configurations that have a 64-Mbyte pair in riser card slot 0 are limited to two DIMM pairs, or 128 Mbytes, for the system. The reason for this restriction is that the bit map describing memory holes can grow larger than physical memory.
1.7 Memory Addressing
Memory addressing in these systems is fixed regardless of the size of the DIMMs. The address of a DIMM pair is fixed according to the slot in which the pair is placed. The starting address of each pair in each slot on the riser card is on a 512-Mbyte boundary.
Figure 1-7 How Memory Addressing Is Calculated (diagram PKW0505-97: address space in Gbytes against riser card slot; starting addresses 0 = 00000000, 0.5 = 20000000, 1.0 = 40000000, 1.5 = 60000000, 2.0 = 80000000, 2.5 = a0000000, 3.0 = c0000000, 3.5 = e0000000; top of address space 4.0)
The rules for addressing memory are as follows:
1. A memory pair consists of two DIMMs of the same size.
2. Memory pairs in riser cards may be of different sizes.
3. The memory pair in slot 0 must be the largest of all memory pairs. Other memory pairs may be as large, but none may be larger.
4. The physical starting address of each memory pair is N t
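The fixed addressing scheme and the configuration rules above lend themselves to a quick check. The following Python sketch is illustrative only: the function names are hypothetical, the slot numbering 0 to 7 is assumed from the eight starting addresses in Figure 1-7, and pair sizes are given in Mbytes (a pair of 32-Mbyte DIMMs is a 64-Mbyte pair).

```python
SLOT_STRIDE = 0x2000_0000  # 512 Mbytes between successive slot base addresses


def pair_base_address(slot: int) -> int:
    """Return the fixed starting address of the DIMM pair in a slot (0 to 7)."""
    if not 0 <= slot <= 7:
        raise ValueError("slot must be in the range 0 to 7")
    return slot * SLOT_STRIDE


def check_pairs(pair_sizes_mbytes: list[int]) -> list[str]:
    """Check a configuration given as pair sizes for slots 0, 1, 2, ...

    Listing the sizes consecutively from slot 0 models the rule that
    pairs occupy consecutive slots. Returns a list of rule violations.
    """
    if not pair_sizes_mbytes:
        return ["no memory pair in slot 0"]
    problems = []
    if len(pair_sizes_mbytes) > 8:
        problems.append("more than eight memory pairs")
    if any(size > pair_sizes_mbytes[0] for size in pair_sizes_mbytes[1:]):
        problems.append("a pair is larger than the pair in slot 0")
    if pair_sizes_mbytes[0] == 64 and len(pair_sizes_mbytes) > 2:
        problems.append("64-Mbyte pair in slot 0 limits the system to two pairs")
    return problems
```

For example, pair_base_address(3) gives 0x60000000, matching the 1.5-Gbyte mark in the figure, and check_pairs([64, 256]) reports that a later pair is larger than the pair in slot 0.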
121. errors that are recorded after the error has occurred. Data on exactly what was going on in the machine at the time of the error may not be known. They are fatal errors.
MCHK 630: Processor correctable errors
MCHK 620: System correctable errors
Last fail: Used to collect system bus registers prior to crashing
I/O error interrupt: IOD error interrupts
System environment: Used to provide status on power, fans, and temperature
Configuration: Used to provide system configuration information
4.2 Using DECevent
DECevent produces bit-to-text ASCII reports derived from system event entries or user-supplied event logs. The format of the reports is determined by commands, qualifiers, parameters, and keywords appended to the command. The maximum command line length is 255 characters. DECevent allows you to do the following:
• Translate event log files into readable reports
• Select alternate input and output files
• Filter input events
• Select alternative reports
• Translate events as they occur
• Maintain and customize your environment with the interactive shell commands
To access on-line help:
OpenVMS: $ HELP DIAGNOSE or $ DIA /INTERACTIVE, then DIA> HELP
DIGITAL UNIX: > man dia or > dia hlp
Privileges necessary to use DECevent:
• SYSPRV for the utility
• DIAGNOSE to use the /CONTINUOUS qualifier
4.2.1 Translating Event Files
To produce a translated event report using th
122. ff command RCM C 11 poweron command RCM C 12 Power up SROM and XSROM messages during 2 19 Power up display 2 20 Power up sequence 2 4 Processor determining primary 2 21 Processor correctable error 4 5 Processor machine checks 4 5 Q quit command RCM C 12 R RCM C 2 C 19 changing settings on switchpack C 20 command summary C 7 dial out alerts C 16 invoking and leaving command mode C 6 modem dialog details C 26 modem use C 3 remote dial in C 5 resetting to factory defaults C 22 switchpack C 19 switchpack defaults C 20 switchpack location C 19 troubleshooting C 23 typical dialout command C 17 RCM commands C 11 alert_clr C 8 alert_dis C 8 alert_ena C 8 disable C 9 enable C 9 halt C 10 haltin C 11 haltout C 11 hangup C 10 help C 11 poweroff C 11 poweron C 12 Index 4 quit C 12 reset C 12 setesc C 13 setpass C 13 status C 14 readme command LFU A 21 A 23 Registers 5 1 Remote console manager See RCM Remote control switch 1 25 Remote dial in RCM C 5 reset command RCM C 12 S Safety guidelines 6 1 SCSI cables 6 4 SCSI Disk removal and replacement 6 34 SCSI bus extender removal and replacement 6 38 Secure mode releasing 3 7 Serial ports 1 23 Serial terminal 2 19 setesc command RCM C 13 setpass command RCM C 13 Soft errors categories of 4 4 SRM console 1 7 2 23 SROM 2 21 defined 2 4 errors 2 11 power up test flow 2 8 test
123. > NOTE: The console prompt displays only after the entire power-up sequence is complete. This can take up to several minutes if the memory is very large.
AlphaBIOS Boot Menu: On systems running the Windows NT operating system, the Boot menu is displayed when the AlphaBIOS console is invoked.
[Figure: AlphaBIOS 5.32 boot menu for the AlphaServer 1200 family. It lists Windows NT Server 4.0 as the operating system to start, says to use the arrow keys to move the highlight and Enter to choose, and offers F2 to enter SETUP.]
SRM Console: The SRM console is a command line interface that is used to boot the DIGITAL UNIX and OpenVMS operating systems. It also provides support for examining and modifying the system state and for configuring and testing the system. The SRM console can be run from a serial terminal or a graphics monitor.
AlphaBIOS Console: The AlphaBIOS console is a menu-based interface that supports the Microsoft Windows NT operating system. AlphaBIOS is used to set up operating system selections, boot Windows NT, and display information about the system configuration. The EISA Configuration Utility and the RAID Standalone Configuration Utility are run from the AlphaBIOS console. AlphaBIOS runs on either a serial or graphics terminal. Windows NT requires a graphics monitor.
Environment Variables: Environment variables are software parameters that define, among other things, the system configurat
124. gt gt gt info 3 cpu00 per_cpu impure area 00004400 cns flag 00000001 0000 cns flag 4 00000000 0004 cns hlt 00000000 0008 cns hlt 4 00000000 000c Eror Logs 452 cns mchkflag cns mchkflagt 4 cnsS exc_addr cnsSexc_addr 4 cnsSpal_base cns pal_baset 4 cns mm_stat cnsSmm_stat 4 cns va cns va 4 cns icsr cnsSicsr 4 cnsSipl cnsS ipl 4 cns ps cns ps 4 cnsS itb_asn cnsSitb_asn 4 cnsSaster cnsSastert 4 cnsSastrr cnsSastrrt 4 cns isr cns isr 4 cnsSivptbr cnsS ivptbrt 4 cns mcsr cns mcsr 4 cns dc_mode cns dc_mode 4 cns maf_mode cns maf_modet 4 cns sirr cns sirr 4 cns fpcsr cns fpcsrt 4 cnsS icperr_stat cnsS icperr_stat 4 cns pmctr cns pmctr 4 cns exc_sum cnsSexc_sum 4 cns exc_mask cns exc_mask 4 cns intid cns intid 4 cns dcperr_stat cns dcperr_stat 4 cns sc_stat cns sc_stat 4 cns sc_addr cns sc_addr 4 ens sc_ctl cns sc_ctl 4 cnsS bc_tag_addr cnsS bc_tag_addr 4 cns ei_stat cns ei_stat 4 ens fill_syn ens fill_syn 4 cns 1d_lock cns 1d_lock 4 00000228 00000000 20000004 00000000 00000000 00000000 0000da10 00000000 00080000 00000002 40000000 000000c1 0000001f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00400000 00000000 00000000 00000002 00000000 00000000 00000001 00000000 00000080 00000000 00000000 00000000 00000000 900000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000016 00000000
125. haServer 4000 1200 Series x00000002 1 x00000001 1 O S claims event is valid 1 Severe Priority 100 CPU Machine Check Errors 1 Machine check 670 entry x0000000300000000 IOD 1 Register Subpkt Pres IOD 2 Register Subpkt Pres x00000002 x00000000 C1563 x0000 00000000 00000000 x0000 x0098 x00000000 x00000000 00000000 x00000000 x00000001401A7A90 x00000000000021 x00000000ECE77A58 x000000012005A8B4 Native mode instruction Exception PC x0000000048016A2D x00000000 00000000 x00000000020000 Base addr for palcode x0000000008 x00000000 AST requests 3 0 x00000000 x000000C164000000 Timeout Bit Not Set Floating Point Instr may be issued PAL Shadow Registers Enabled Correctable Err Intrpts Enabled ICACHE BIST Successful TEST_STATUS_H Pin Asserted x00000000 x00000000 Eror Logs 416 Virtual Address Reg Memory Mgmt Flt Sts Reg Scache Address Reg Scache Status Reg Bcache Tag Address Reg Ext Interface Address Reg Fill Syndrome Reg Ext Interface Status Reg LD LOCK IOD SUBPACKET gt WHOAMI Base Address of Bridge Dev Type amp Rev Register MC PCI Command Register Memory Host Addr Exten IO Host Addr Extension Interrupt Control Interrupt Request Interrupt Mask Register 0 Interrupt Mask Register 1 MC Error Info Register 0 MC Error Info Register 1 x00000001407D6000 x00000000011A10 Ref resulted in DTB miss RA Field x0000000008 Opcode Field x00000000000023 xFFFFFF0O0000254BF
126. Chapter 5 Error Registers
This chapter describes the registers used to hold error information. These registers include:
- External Interface Status Register
- External Interface Address Register
- MC Error Information Register 0
- MC Error Information Register 1
- CAP Error Register
- PCI Error Status Register 1
5.1 External Interface Status Register (EI_STAT)
The EI_STAT register is a read-only register that is unlocked and cleared by any PALcode read. A read of this register also unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers, subject to some restrictions. The EI_STAT register is not unlocked or cleared by reset.
Address: FF FFF0 0168. Type: R.
[Figure: EI_STAT bit layout, showing the fields CHIP_ID<3:0>, BC_TPERR, BC_TC_PERR, EI_ES, COR_ECC_ERR, UNC_ECC_ERR, EI_PAR_ERR, FIL_IRD, and SEO_HRD_ERR]
Fill data from B-cache or main memory could have correctable or uncorrectable errors in ECC mode. System address/command parity errors are always treated as uncorrectable hard errors, irrespective of the mode. The sequence for reading, unlocking, and clearing EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is as follows:
1. Read the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers in any order. This does not unlock or clear any register.
2. Read the EI_STAT register. This operation unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers. It also
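As an illustration of how a raw EI_STAT value might be picked apart, here is a minimal Python sketch. The bit positions are an assumption read from the damaged bit-layout figure (CHIP_ID in bits 27:24, the error flags in bits 28 through 35); the surviving text does not confirm them, so check them against the processor reference before relying on this. The function name decode_ei_stat is hypothetical.

```python
# Assumed EI_STAT flag positions, inferred from the damaged figure.
# These are NOT confirmed by the surviving text of the manual.
EI_STAT_FLAGS = {
    28: "BC_TPERR",
    29: "BC_TC_PERR",
    30: "EI_ES",
    31: "COR_ECC_ERR",
    32: "UNC_ECC_ERR",
    33: "EI_PAR_ERR",
    34: "FIL_IRD",
    35: "SEO_HRD_ERR",
}


def decode_ei_stat(value):
    """Return (chip_id, [names of flags that are set]) for a raw value."""
    chip_id = (value >> 24) & 0xF  # assumed CHIP_ID<3:0> in bits 27:24
    flags = [
        name
        for bit, name in sorted(EI_STAT_FLAGS.items())
        if (value >> bit) & 1
    ]
    return chip_id, flags


if __name__ == "__main__":
    # Made-up value for illustration: chip ID 5 with UNC_ECC_ERR set.
    print(decode_ei_stat((1 << 32) | (0x5 << 24)))
```

Because reading EI_STAT unlocks and clears it, a decoder like this would be applied to a value already captured in an error log, not to a live register.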
127. the system on or off. When the LED to the right of the button is lit, the power is on. The On/Off button is connected to the power supplies through the system interlock and the RCM logic.
Reset button: Initializes the system.
Halt button: The major function of the Halt button is to stop whatever the machine is doing and return the system to the SRM console. To get to the SRM console for systems running OpenVMS or DIGITAL UNIX, press the Halt button. To get to the SRM console for systems running Windows NT, press the Halt button and then press the Reset button. Pressing the Halt button when the system is running Windows NT causes a halt assertion flag to be set in the firmware. When Reset is pressed, the console reads the halt assertion flag and ignores environment variables that would cause the system to boot. The function of the Halt button is complex because it depends upon the state of the machine when the button is pressed. See Section B.1 for a full discussion of the Halt button.
1.3 System Consoles
There are two console programs: the SRM console and the AlphaBIOS console.
SRM Console Prompt: On systems running the DIGITAL UNIX or OpenVMS operating system, the following console prompt is displayed after system startup messages are displayed or whenever the SRM console is invoked: P00>>
128. this document. For complete reference information on other SRM commands and environment variables, see your system User's Guide.
NOTE: Keep a list of the environment variable settings for systems that you service, because you will need to restore certain environment variable settings after swapping modules. Refer to Table B-4 for a convenient worksheet.
B.1 Halt Button Functions
The Halt button causes the system to behave in different ways depending upon the system state at the time the button is pressed. Table B-1 describes the full function of the Halt button.
Table B-1 Results of Pressing the Halt Button
Machine State                  Result
OpenVMS running/hung           SRM console runs
DIGITAL UNIX running/hung      SRM console runs
Windows NT running/hung        Nothing
AlphaBIOS running/hung         Nothing
SRM console running            Sets halt assertion flag; the SRM console continues to run
SROM, 1-2 secs of pwr-up       Nothing
XSROM power-up                 Sets halt assertion flag; auto boot ignored
SRM console power-up           Sets halt assertion flag; auto boot ignored
A simple halt suspends a system that is hung or running DIGITAL UNIX or OpenVMS and starts the SRM console. The halt assertion flag is set in the TOY NVRAM; it is read and cleared by the console only during power-up or reset. When the S
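Table B-1 is effectively a lookup from machine state to outcome. A minimal Python sketch of that lookup follows; the dictionary keys and the function name halt_result are hypothetical labels chosen for illustration, while the results are transcribed from the table.

```python
# Results of pressing the Halt button, transcribed from Table B-1.
# Key strings are illustrative labels, not identifiers from the firmware.
HALT_RESULTS = {
    "OpenVMS": "SRM console runs",
    "DIGITAL UNIX": "SRM console runs",
    "Windows NT": "Nothing",
    "AlphaBIOS": "Nothing",
    "SRM console running": "Sets halt assertion flag; SRM console continues to run",
    "SROM power-up": "Nothing",
    "XSROM power-up": "Sets halt assertion flag; auto boot ignored",
    "SRM console power-up": "Sets halt assertion flag; auto boot ignored",
}


def halt_result(machine_state):
    """Look up what pressing Halt does in the given machine state."""
    try:
        return HALT_RESULTS[machine_state]
    except KeyError:
        raise ValueError(f"unknown machine state: {machine_state!r}") from None


if __name__ == "__main__":
    for state in HALT_RESULTS:
        print(f"{state:22} -> {halt_result(state)}")
```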
129. ied console command.
Command            Function
more               Displays a file one screen at a time.
prcache            Initializes and displays status of the PCI NVRAM.
set envar          Sets or modifies the value of an environment variable.
set host           Connects to an MSCP DUP server on a DSSI device.
set password       Sets the console password or changes an existing password.
set rcm_dialout    Sets a modem dialout string.
set secure         Enables secure mode without requiring a restart of the console.
show envar         Displays the state of the specified environment variable.
show config        Displays the configuration at the last system initialization.
show cpu           Displays the state of each processor in the system.
show device        Displays a list of controllers and their devices in the system.
show fru           Displays the serial number and revision level of all options.
show memory        Displays memory module information.
show network       Displays the state of network devices in the system.
show pal           Displays the version of the privileged architecture library code (PALcode).
show power         Displays information about the power supplies, system fans, CPU fans, and temperature.
show rcm_dialout   Displays the modem dialout string.
show version       Displays the version of the console program.
start              Starts a program previously loaded on the processor specified.
stop               Halts the specified processor. (Same as halt.)
test               Runs firmware diagnostics for the system.
130. image. NOTE: FEPROM 0 contains images of the SROM, XSROM, PAL, decompression, and SRM console code.
If the fail-safe loader loads, the following conditions exist on the machine:
- The SROM has passed its tests and successfully unloaded the XSROM. If the SROM fails to unload both copies of XSROM, it reports the failure to the control panel display and COM1 if possible, and the system hangs.
- The XSROM has completed its B-cache and memory tests but has failed to unload the PALcode in FEPROM 0 sector or the SRM console code.
- The XSROM reports the errors encountered and loads the fail-safe loader.
Chapter 3 Troubleshooting
This chapter describes troubleshooting during power-up and booting. It also describes the console test command and other useful commands. The following topics are covered:
- Troubleshooting with LEDs
- Troubleshooting Power Problems
- Running Diagnostics (Test Command)
- Releasing Secure Mode
- Testing an Entire System
- Other Useful Console Commands
3.1 Troubleshooting with LEDs
During power-up, reset, initialization, or testing, diagnostics are run on CPUs, memories, I/O bridges, and the PCI backplane and its embedded options. This section describes possible problems that can be identified by checking LEDs. The LEDs on the CPU module are not visible; the only visible LEDs are on the system motherboard. Figure 3-1 System Motherboard LEDs System Mothe
131. times 512 Mbytes (200 0000), where N is the slot number on the riser card.
5. Memory addresses are contiguous within each memory pair.
6. If memory pairs do not completely fill the 512-Mbyte space provided, memory holes occur in the physical address space.
7. Software creates contiguous virtual memory even though physical memory may not be contiguous.
1.8 System Motherboard
The system motherboard contains five major logic sections performing five major system functions.
[Figure 1-8 System Motherboard: power connectors, floppy connector, fan connectors, CPU 0, CPU 1, MEM L, MEM H, system bus backplane, system bus to PCI bus bridges, PCI 0 slots 2 to 4, PCI 1 slots 2 to 4, internal SCSI connector, PCI backplane and legacy I/O devices, speaker, EISA/ISA slot connector, OCP connector]
The five sections on the system motherboard are:
- The system bus, or the CPU and memory backplane
- The power control logic
- The remote control logic
- The system bus to PCI bus bridges
- The PCI backplane, containing two PCI buses, an EISA/ISA bus, a built-in CD-ROM controller, and an XBUS with several devices integral to the system
1.8.1 System Bus Backplane
The system bus consists of a
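The memory addressing rules at the top of this excerpt can be sketched in a few lines of Python. This assumes the truncated rule reads "the starting address of a memory pair is N times 512 Mbytes", with N the riser-card slot number; the function names are hypothetical.

```python
# Each memory pair is given a 512-Mbyte window of physical address space.
PAIR_WINDOW_BYTES = 512 * 2**20


def pair_base_address(slot):
    """Assumed rule: a pair in riser slot N starts at N times 512 Mbytes."""
    if slot < 0:
        raise ValueError("slot number must be non-negative")
    return slot * PAIR_WINDOW_BYTES


def hole_bytes(pair_size_bytes):
    """Size of the memory hole left when a pair does not fill its window."""
    if not 0 < pair_size_bytes <= PAIR_WINDOW_BYTES:
        raise ValueError("pair size must fit in the 512-Mbyte window")
    return PAIR_WINDOW_BYTES - pair_size_bytes


if __name__ == "__main__":
    # A 256-Mbyte pair in slot 1 leaves a 256-Mbyte hole before slot 2.
    print(hex(pair_base_address(1)), hole_bytes(256 * 2**20) // 2**20)
```

A hole computed this way is what rule 7 refers to: software maps around it so that virtual memory stays contiguous.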
132. in progress so the local console terminal is disabled System and terminal baud rate set incorrectly Check external cable installation Set switch 1 to ON Wait several minutes for the local terminal to become active again Wait for the remote session to be completed Disable RCM and set the system and terminal baud rates to 9600 baud Managing the System Remotely C 23 Table C 4 RCM Troubleshooting continued Symptom Possible Cause Suggested Solution RCM does not answer when the modem is called After the system and RCM are powered up the COM port seems to hang briefly Modem cables may be incorrectly installed RCM remote access is disabled RCM does not have a valid modem password set Switch setting incorrect The local terminal is currently attached to the RCM On power up the RCM defers initializing the modem for 30 seconds to allow the modem to complete its internal diagnostics and initialization Modem may have had power cycled since last being initialized or modem is not set up correctly This delay is normal behavior Check modem phone lines and connections Enable remote access Set password and enable remote access Set switch 1 to ON switch 2 to OFF Enter quit on the local terminal Wait 30 seconds after powering up the system and RCM before attempting to dial in Enter enable command from RCM Wait a few seconds for the COM
133. ing VGA alphanumeric mode only Starting background memory test affinity to all CPUs Starting processor cache thrasher on each CPU Starting processor cache thrasher on each CPU Testing SCSI disks read only No CD ROM present skipping embedded SCSI test Testing other SCSI devices read only Testing floppy drive dva0 read only Troubleshooting 38 ID Program Device Pass Hard Soft Bytes Written Bytes Read 00003047 memtest memory 1 0 0 134217728 134217728 00003050 memtest memory 205 0 0 213883392 213883392 00003059 memtest memory 192 0 0 200253568 200253568 00003062 memtest memory 192 0 0 200253568 200253568 00003084 memtest memory 80 0 0 82827392 82827392 000030d8 exer_kid dkb200 2 0 3 26 0 0 0 13690880 000030d9 exer_kid dkb400 4 0 3 26 0 0 0 13674496 0000310d exer_kid dva0 0 0 100 0 0 0 0 0 ID Program Device Pass Hard Soft Bytes Written Bytes Read 00003047 memtest memory 1 0 0 432013312 432013312 00003050 memtest memory 635 0 0 664716032 664716032 00003059 memtest memory 619 0 0 647940864 647940864 00003062 memtest memory 620 0 0 648989312 648989312 00003084 memtest memory 263 0 0 274693376 274693376 000030d8 exer_kid dkb200 2 0 3 90 0 0 0 47572992 000030d9 exer_kid dkb400 4 0 3 90 0 0 0 47523840 0000310d exer_kid dva0 0 0 100 0 0 0 0 327680 ID Program Device Pass Hard Soft Bytes Written Bytes Read 00003047 memtest memory 1 0 0 727711744 727711744 00003050 memtest memory 1054 0 0 1104015744 1104015744 00003059 memtest mem
134. ion. They are used to pass information to different pieces of software running in the system at various times. The os_type environment variable, which can be set to VMS, UNIX, or NT, determines which of the two consoles is used. The SRM console is always brought into memory, but AlphaBIOS is loaded if os_type is set to NT and the Halt LED is not lit. Refer to Appendix B of this guide for a list of the environment variables used to configure a system. Refer to your system User's Guide for information on setting environment variables.
Most environment variables are stored in the NVRAM that is placed in a socket on the system motherboard. Even though the NVRAM can be removed and replaced on a new system motherboard, keep a record of the environment variables for each system that you service. Some environment variable settings are lost when a module is swapped and must be restored after the new module is installed. Refer to Appendix B for a convenient worksheet for recording environment variable settings.
1.4 System Architecture
Alpha microprocessor chips are used in these systems. The CPU, memory, and the I/O modules are connected to the system motherboard.
[Figure 1-4 Architecture Diagram: CPU, memory pair, system bus with 128-bit data bus, 16 ECC bits, and 40-bit command/address bus]
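The console selection rule described above (SRM always loaded, AlphaBIOS loaded when os_type is NT and the Halt LED is not lit) can be written as a small Python sketch. The function name and the boolean parameter are hypothetical; only the rule itself comes from the text.

```python
VALID_OS_TYPES = ("VMS", "UNIX", "NT")


def consoles_loaded(os_type, halt_led_lit):
    """Return the consoles brought into memory, per the rule in the text."""
    os_type = os_type.upper()
    if os_type not in VALID_OS_TYPES:
        raise ValueError(f"os_type must be one of {VALID_OS_TYPES}")
    consoles = ["SRM"]  # the SRM console is always brought into memory
    if os_type == "NT" and not halt_led_lit:
        consoles.append("AlphaBIOS")
    return consoles


if __name__ == "__main__":
    print(consoles_loaded("NT", halt_led_lit=False))
    print(consoles_loaded("NT", halt_led_lit=True))
    print(consoles_loaded("VMS", halt_led_lit=False))
```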
135. is the physical interconnect between the system bus and the PCI bus.
[Figure 1-10 System Bus to PCI Bus Bridge Block Diagram: CAP (control and address), MDPA and MDPB (ECC and data), system bus data <63:0> and <127:64>, PCI AD<63:32> and AD<31:0>]
The system bus to PCI bus bridge module converts system bus commands and data addressed to I/O space to PCI commands and data, and converts PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The bridge has two major components:
- Command/address processor (CAP) chip
- Two data path chips (MDPA and MDPB)
There are two sets of these three chips, one set for each PCI. The interface on the system bus side of the bridge responds to system bus commands addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit <39> on the system bus address lines is set. The space so defined is 512 Gbytes in size. The first 448 Gbytes are reserved, and the last 64 Gbytes, when bits <38:36> are set, are mapped to the PCI I/O buses. The interface on the PCI side of the bridge responds to commands addressed to CPUs and memory on the system bus. On the PCI side, the bridge provides the interface to the PCIs. Each PCI bus is addressed separa
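The address decode described in this excerpt can be checked with a short Python sketch: bit <39> selects I/O space, and within it only the top 64 Gbytes (bits <38:36> all set) reach the PCI buses. The function name and the returned labels are hypothetical.

```python
IO_SPACE_BIT = 1 << 39          # bit <39> set means I/O space
PCI_SELECT_MASK = 0b111 << 36   # bits <38:36>
ADDRESS_LIMIT = 1 << 40         # 40-bit system bus address


def classify_address(addr):
    """Classify a 40-bit system bus address as memory, reserved, or pci."""
    if not 0 <= addr < ADDRESS_LIMIT:
        raise ValueError("address must fit in 40 bits")
    if not addr & IO_SPACE_BIT:
        return "memory"
    if addr & PCI_SELECT_MASK == PCI_SELECT_MASK:
        return "pci"        # last 64 Gbytes of I/O space
    return "reserved"       # first 448 Gbytes of I/O space


if __name__ == "__main__":
    for a in (0x1000, 0x80_0000_0000, 0xF0_0000_0000):
        print(hex(a), classify_address(a))
```

The sizes agree with the text: bit <39> spans 2**39 bytes (512 Gbytes), and fixing bits <38:36> leaves 2**36 bytes (64 Gbytes), so 448 Gbytes remain reserved.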
136. ision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg Module Self Test Passed LED On x00000001 Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYP E Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin 00000000 x00000000 x00000003 MC PCI Intr Enabled Device intr info enabled if en_int 1 x00800000 Interrupts asserted x00000000 x00C50001 x00000000 Hard Error 28681A80 MC bus trans addr lt 31 4 gt x028681A8 6 x800FD800 MC bus trans addr lt 39 32 gt x00000000 MC_Command x00000018 Device Id x0000003B MC error info valid xC0000000 Uncorrectable ECC err det by MDPB MC error info latched x00000000 x00000000 MDPA Chip Revision x00000000 x00000000 Cycle 0 ECC Syndrome x00000000 Eror Logs 418 DPB Status Register DPB Error Syndrome Reg PALcode Revision x00000000 x00000000 Cycl Cycl 1 2 Cycle 3 oo ECC ECC ECC Syndrome x00000000 Syndrome x00000000 Syndrome x00000000 MDPB Chip Revision x00000000 MPDB Error Syndrom
137. isk AlphaBIOS error conditions A 26 Hard errors categories of 4 4 help command LFU A 21 A 22 help command RCM C 11 l I squared C bus 1 34 INFO 3 command 4 59 INFO 5 command 4 61 Index 2 INFO 8 command 4 63 Initialization and answer strings default C 26 modifying for modem C 26 substitutions C 27 Interlock switches 6 26 IOD 2 23 IOD detected failure PCI error 4 32 system bus error 4 27 IOD error interrupts 4 5 IOD defined 4 2 L LEDs troubleshooting with 3 2 LFU exit command A 22 starting A 5 A 6 starting the utility A 5 typical update procedure A 6 update command A 23 updating firmware from CD ROM A 7 updating firmware from floppy disk A 11 A 13 updating firmware from network device A 17 lfu command LFU A 14 A 16 A 21 A 22 LFU commands display A 21 A 22 exit A 10 A 16 A 20 A 21 A 22 help A 21 A 22 Ifu A 14 A 16 A 21 A 22 list A 8 A 14 A 16 A 18 A 20 A 21 A 23 readme A 21 A 23 summary A 21 update A 10 A 21 A 23 verify A 21 A 23 list command LFU A 8 A 14 A 18 A 21 A 23 M Machine checks in PAL mode 4 58 Maintenance bus 1 34 Maintenance bus controller 1 34 MC Error Information Register 0 5 8 MC Error Information Register 1 5 9 MC_ERRO Register 5 8 MC_ERR1 Register 5 9 MCHK 620 correctable error 4 44 MCHK 630 correctable CPU error 4 41 MCHK 660 IOD detected failure 4 27 4 32 MCHK 670 CPU and IOD detected failure 4 16
138. l C stops the test. The console cannot be secure.
Example 3-1 Test Command Syntax
P00>>> help test
FUNCTION
SYNOPSIS
test [-q] [-t <time>] [option]
where option is: cpun, memn, pcin
where n = 0, 1, or * for CPUs and PCIs
where n = 0 through 7 or * for MEM
The entire system is tested by default if no option is specified.
NOTE: If you are running the Microsoft Windows NT operating system, switch from AlphaBIOS to the SRM console in order to enter the test command. From the AlphaBIOS console, press in the Halt button (the LED will light) and reset the system, or select DIGITAL UNIX (SRM) or OpenVMS (SRM) from the Advanced CMOS Setup screen and reset the system.
test [-t time] [-q] [option]
-t time   Specifies the run time in seconds. The default for system test is 600 seconds (10 minutes).
-q        Disables the display of status messages as exerciser processes are started and stopped during testing.
option    Either cpun, memn, or pcin, where n is 0, 1, or * for CPUs and PCIs, or where n is 0 through 7 or * for memory. If nothing is specified, the entire system is tested.
3.4 Releasing Secure Mode
The console cannot be secure for most SRM console commands to run. If the console is not secure, user mode console commands can be entered. See the system manager if the system is secure and you do not know the password.
Example 3-2 Releasing/Reestablishing Secure Mode
P00>>> login Plea
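To make the test command syntax concrete, the following Python sketch assembles a command line and rejects out-of-range options. It assumes the flags are written -t and -q (the hyphens did not survive extraction), accepts only numeric unit numbers, and leaves out any wildcard form. The helper name build_test_command is hypothetical and is not part of the SRM console.

```python
import re

# Highest unit number per option type, as given in the text:
# 0 or 1 for CPUs and PCIs, 0 through 7 for memory.
MAX_UNIT = {"cpu": 1, "pci": 1, "mem": 7}


def build_test_command(time_s=None, quiet=False, option=None):
    """Assemble an SRM 'test' command string (flag spelling is assumed)."""
    parts = ["test"]
    if time_s is not None:
        if time_s <= 0:
            raise ValueError("run time must be a positive number of seconds")
        parts += ["-t", str(time_s)]
    if quiet:
        parts.append("-q")
    if option is not None:
        match = re.fullmatch(r"(cpu|mem|pci)(\d)", option)
        if match is None or int(match.group(2)) > MAX_UNIT[match.group(1)]:
            raise ValueError(f"invalid option: {option!r}")
        parts.append(option)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_test_command())                 # whole system, default time
    print(build_test_command(120, True, "mem7"))
```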
139. m DVAO Copying CIPCA315 from DVAO Please insert next floppy containing the firmware Press lt return gt when ready Or type DONE to abort Copying CIPCA315 from DVAO Copying DFPAA310 from DVAO Copying KZPSAA10 from DVAO Running Utilities A 18 A 5 4 Updating Finmware from a Network Device Copy files to the local MOP server s MOP load area start LFU and select ewa0 as the load device Example A 7 Updating Firmware from a Network Device xx x x Loadable Firmware Update Utility Select firmware load device cda0 dva0 ewa0 or Press lt return gt to bypass loading and proceed to LFU ewa0 1 Pleas nter the name of the options firmware files list or Press lt return gt to use the default filename AS1200FW 2 Copying AS1200FW from EWAO Copying TCREADME from EWAO Copying TCSRMROM from EWAO eee eee eee cece eee eee Copying TCARCROM from EWAO Copying CIPCA315 from EWAO Copying DFPAA310 from EWAO Copying KZPSAA11 from EWAO The function table displays followed by the UPD gt prompt as shown in Example A 3 UPD gt list Device Current Revision Filename Update Revision AlphaBIOS V5 12 2 arcrom V6 40 1 kzpsa0 A10 kzpsa_fw A11 kzpsal A10 kzpsa_fw A11 srmflash V1 0 9 srmrom V6 0 3 cipca_fw A315 dfpaa_fw 2 46 Continued on next page Running Utilities A 19 Before starting LFU downl
140. [Figure: power control logic on the system motherboard]
The power control logic performs these functions:
- Monitors system temperature and powers down the system 30 seconds after it detects that the internal temperature of the system is above the value of the environment variable over_temp (default 55 C).
- Monitors the system and CPU fans at one-second intervals and powers down the system 30 seconds after it detects a fan failure.
- Provides some visual indication of faults through LEDs.
- Controls reset sequencing.
- Provides the I2C interface for fans, power supplies, and temperature signals: Power supply 0/1 present, Power supply 0/1 power OK, CPU fan 0/1 OK, CPU 1 present, Overtemp, Temp OK, System fan 0/1 OK, Fan Kit OK.
1.9 Power Circuit and Cover Interlock
Power is distributed throughout the system and mechanically can be broken by the On/Off switch, the cover interlock, or remotely through the RCM. Figure 1-14 Power Circuit Diagram
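The shutdown behavior of the power control logic can be modeled in a few lines. This Python sketch is only an illustration of the rules stated above (30-second delay, over_temp default of 55 C); the real logic is hardware, and the function names here are hypothetical.

```python
SHUTDOWN_DELAY_S = 30        # delay between fault detection and power-down
DEFAULT_OVER_TEMP_C = 55     # default value of the over_temp variable


def fault_reason(temp_c, fans_ok, over_temp_c=DEFAULT_OVER_TEMP_C):
    """Return the fault that would trigger a power-down, or None."""
    if temp_c > over_temp_c:
        return "overtemperature"
    if not all(fans_ok):
        return "fan failure"
    return None


def power_down_time(detected_at_s):
    """Time at which power is removed for a fault detected at detected_at_s."""
    return detected_at_s + SHUTDOWN_DELAY_S


if __name__ == "__main__":
    print(fault_reason(40, [True, True, True]))
    print(fault_reason(56, [True, True, True]), power_down_time(100))
```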
141. mand and ignores the extra characters.
- If you type an incorrect command and press Enter, the command fails with the message: *** ERROR unknown command
alert_clr
The alert_clr command clears an alert condition within the RCM. The alert enable condition remains active, and the RCM will again enter the alert condition if it detects a system power failure.
RCM> alert_clr
alert_dis
The alert_dis command disables RCM dial-out. It also clears any outstanding alerts. Dial-out remains disabled until the alert_ena command is issued. See also the enable and disable commands.
RCM> alert_dis
alert_ena
The alert_ena command enables the RCM to automatically dial out when it detects a power failure within the system. The RCM repeats the dial-out alert at 30-minute intervals until the alert is cleared. Dial-out remains enabled until the alert_dis command or the disable command is issued. See also the enable and disable commands.
RCM> alert_ena
Two conditions must be met for the alert_ena command to work:
- A modem dial-out string must be entered from the system console.
- Remote access to the RCM modem port must be enabled with the enable command.
If the alert_ena command is entered when remote access is disabled, the following message is displayed: *** error ***
disable
The disable command disables remote access to the RCM modem port. It also disables RCM dial ou
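The two preconditions for enabling dial-out alerts, and the 30-minute repeat interval, can be captured in a small Python model. This is a sketch of the rules as stated in the text, not RCM firmware behavior; both function names are hypothetical.

```python
ALERT_REPEAT_MINUTES = 30  # the RCM repeats the alert at this interval


def alert_ena_allowed(dialout_string, remote_access_enabled):
    """Both conditions from the text must hold for dial-out alerts to work."""
    return bool(dialout_string.strip()) and remote_access_enabled


def alert_schedule(first_alert_min, repeats):
    """Minutes at which the alert is sent if it is never cleared."""
    return [first_alert_min + i * ALERT_REPEAT_MINUTES for i in range(repeats)]


if __name__ == "__main__":
    print(alert_ena_allowed("", True))   # no dial-out string entered
    print(alert_schedule(0, 4))
```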
142. mmand LFU A 21 A 23 X XBUS 1 23 XSROM defined 2 4 errors 2 15 power up test flow 2 12 tests 2 13 Index 6
143. n of the system.
6. Unplug the cable connection to the floppy and, if applicable, to the optional device above the floppy. Bend the cable back over the power section of the system.
7. Unplug the cable connection to the CD-ROM.
8. Unplug the cable connection to the StorageWorks backplane.
9. Remove the power harness from the system.
Replacement: Reverse the steps in the Removal procedure.
Verification: Power up the system.
6.12 System Fan Removal and Replacement
[Figure 6-11 Removing System Fan: cable to Fan 0, cable to Fan 1, Fan 0 (17-31351-01), Fan 1 (17-31350-01)]
Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
Removing Fan 0
3. Remove the CPU module(s).
4. Remove memory.
5. Trace the wire from the fan to the motherboard to determine which power cord to unplug.
6. Unplug the power cord to fan 0 and pass it through the sheet metal to the fan compartment.
7. Remove the plastic module guides that interfere with access to the four Phillips head screws holding the fan in place.
8. Unscrew the fan from the frame and remove it from the system.
Removing Fan 1
3. Remove any PCI modules that prevent access to the four Phillips head screws that hold fan 1 in place.
4. Remove any plastic module g
144. n the same way as from a graphics monitor. The menus are the same, but some keys are different.
Table A-1 AlphaBIOS Option Key Mapping
AlphaBIOS Key    VTxxx Key
F1               Ctrl-A
F2               Ctrl-B
F3               Ctrl-C
F4               Ctrl-D
F5               Ctrl-E
F6               Ctrl-F
F7               Ctrl-P
F8               Ctrl-R
F9               Ctrl-T
F10              Ctrl-U
Insert           Ctrl-V
Delete           Ctrl-W
Backspace        Ctrl-H
Escape           Ctrl
A.3 Running ECU
The EISA Configuration Utility (ECU) is used to configure EISA options on these systems. The ECU can be run either from a graphics monitor or a serial terminal.
1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the command alphabios. If the system has a graphics monitor, you can set the SRM console environment variable to graphics.
2. From AlphaBIOS Setup, select Utilities, then select Run ECU from floppy from the submenu that displays, and press Enter.
NOTE: The EISA Configuration Utility is supplied on diskettes shipped with the system. There is a diskette for Microsoft Windows NT and a diskette for DIGITAL UNIX and OpenVMS.
3. Insert the correct ECU diskette for the operating system and press Enter to run it. The ECU main menu displays the following options:
EISA Configuration Utility
Steps in configuring your computer
STEP 1: Important EISA configuration information
STEP 2: Add or remove boards
STEP 3: View or edit details
STEP 4: Examine required details
STEP 5: Save and exit
NOTE
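Table A-1 lends itself to a lookup table, for example in a script that drives AlphaBIOS over a serial line. The Python sketch below transcribes the mapping and converts each Ctrl combination to its ASCII control byte. The "Ctrl-X" spelling is a notation chosen here, the Escape entry is omitted because its key did not survive extraction, and the function name is hypothetical.

```python
# AlphaBIOS key -> serial-terminal key, transcribed from Table A-1.
# Escape is left out: the letter after "Ctrl" was lost in the source.
VT_KEY_MAP = {
    "F1": "Ctrl-A", "F2": "Ctrl-B", "F3": "Ctrl-C", "F4": "Ctrl-D",
    "F5": "Ctrl-E", "F6": "Ctrl-F", "F7": "Ctrl-P", "F8": "Ctrl-R",
    "F9": "Ctrl-T", "F10": "Ctrl-U", "Insert": "Ctrl-V",
    "Delete": "Ctrl-W", "Backspace": "Ctrl-H",
}


def control_byte(alphabios_key):
    """Return the ASCII control byte a terminal sends for the mapped key."""
    letter = VT_KEY_MAP[alphabios_key].split("-")[1]
    # Standard ASCII rule: Ctrl-<letter> is the letter's code with the
    # upper bits cleared, so Ctrl-A is 0x01 and Ctrl-H is 0x08.
    return bytes([ord(letter) & 0x1F])


if __name__ == "__main__":
    for key in ("F1", "F10", "Backspace"):
        print(key, VT_KEY_MAP[key], control_byte(key))
```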
145. n welchen F llen der Benutzer fiir entsprechende Gegenma nahmen verantwortlich ist Avertissement Cet appareil est un appareil de Classe A Dans un environnement r sidentiel cet appareil peut provoquer des brouillages radio lectriques Dans ce cas il peut tre demand l utilisateur de prendre les mesures appropri es Chapter 1 Overview 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 8 1 1 8 2 1 8 3 1 8 4 1 8 5 1 9 1 10 1 11 1 12 1 13 SYSteMBNClOSULE o nee sevssctvsessentpseesededevevounseesbecedederouapesbedegedvsonapssveedes 1 2 Operator Control Panel and Drives eeccesseseneeseneeeeeeeeesseeesaeessneeenee 1 4 System Console Sunn cece sessed stew E TE E ak aston inted Gata te Eas 1 6 System Architecture irsi tn a a re ayee arise 1 8 D aa s eS A SE EN SEEE 1 10 Memorya a erties E E A TE E EE EEA ET 1 12 Memory Addressing ccssccccesssceeeesseeeeeeeeeceesneeeeeseeeeeeenaeeesseneeeeeeas 1 14 System Motherboard seicncnotai nin r Teisa 1 16 System Bus Backplane eeeesseeesseecsseecsseeceeessseeeeseeeesaeeesaeers 1 18 System Bus to PCI Bus Bridge cee eeeeeeeseceseecsseeeeseeeesneeseneeeeee 1 20 PEI T O Subsystetivn co ae a alee alates aan 1 22 Remote Control Logic eee eeeeeesseeesseeesneecsseeceseeeesaeeesaeecsaeessneeenes 1 24 Power Control Logic ceeeeseccessecesseecsneeceeeesseeeesaeeesaeecsaeessseeeses 1 26 Power Circuit and Cover Interlock eee eeeeeeseccesneeeeseecen
146. naged system. The halt command is equivalent to pressing the Halt button on the control panel and then immediately releasing it. The RCM firmware exits command mode and reconnects the user's terminal to the system COM serial port.
RCM> halt
Focus returned to COM port
The halt command can be used to force a halt assertion. See Section B.3 for information on halt assertion.
NOTE: If you are running Windows NT, the halt command has no effect.
haltin
The haltin command halts a managed system and forces a halt assertion. The haltin command is equivalent to pressing the Halt button on the control panel and holding it in. This command can be used at any time after system power-up to allow you to perform system management tasks. See Section B.3 for information on halt assertion.
NOTE: If you are running Windows NT, the haltin command does not affect the operating system session, but it does cause a halt assertion.
haltout
The haltout command terminates a halt assertion that was done with the haltin command. It is equivalent to releasing the Halt button on the control panel after holding it in, rather than pressing it once and releasing it immediately. This command can be used at any time after system power-up. See Section B.3 for information on halt assertion.
help or ?
The help or ? command displays the RCM firmware commands.
poweroff
The poweroff command requests the RCM to power off the system
147. nd the OCP should report the problem.
If Power Problem Occurs at Power-Up
If the system has a power problem on a cold start, the motherboard LEDs and the OCP display will indicate a problem. The console for systems running DIGITAL UNIX or OpenVMS will also indicate the problem. The console on systems running NT will not print an error message. Causes of power problems are:
- Broken system fan
- Broken CPU fan
- A power supply could be broken and the system could still power up momentarily. During power-up, an overcurrent condition occurs with two power supplies and is tolerated for a short period, but a persistent overcurrent is not.
- Power control logic on the motherboard could fail
- Interlock failure
- Wire problems
- Temperature problem (unlikely)
Recommended Order for Troubleshooting Failure at Power-Up
1. If the SRM console does not come all the way up, check the console test output on OpenVMS or DIGITAL UNIX systems. Restart the system if the system runs NT and watch for an error message on the OCP display. Replace the FRU indicated.
2. If you can get to the SRM console, use the show power command. It will show the last power fault.
3. If neither step 1 nor step 2 identifies a FRU, replace the motherboard.
3.3 Running Diagnostics (Test Command)
The test command runs diagnostics on the entire system, CPU devices, memory devices, and the PCI I/O subsystem. The test command runs only from the SRM console. Ctr
148. nned by the power control logic. There are two registers that the power control logic writes data to:
• One records the state of the fans and power supplies and is latched when there is a fault.
• The other causes an interrupt on the I2C bus when a CPU or system fan fails, an overtemperature condition exists, or power supplied to the system exhibits an overcurrent condition.
The interrupt, received by the I2C bus controller on PCI 0 and passed on to the IOD 0 chip set, alerts the system of imminent power shutdown. The controller has 30 seconds to read the two registers and store the information in the EEPROM on the motherboard. The SRM console command show power reads these registers.

Fault Display
The OCP display is written through the I2C bus.

Error State
Error state is stored for power, fan, and overtemperature conditions on the I2C bus.

Configuration Tracking
Each CPU and each logical section of the system motherboard (the PCI bridge, the PCI backplane, the power control logic, the remote console manager, and the system motherboard itself) has an EEPROM that contains information about the module. This information can be written and read over the I2C bus. All EEPROMs contain the following information:
• Module type
• Module serial number
• Hardware revision for the logical block
• Firmware revision

1.13 StorageWorks Drives
The system supports up to seven StorageWorks drives.
Figure 1-18 StorageWorks Drive Location
149. nt 4 6 report formats 4 10 DIAGNOSE command 4 7 Diagnostics test command 3 6 DIMMs 1 12 removal and replacement 6 14 disable command RCM C 9 display command LFU A 21 A 22 Double error halt 4 57 4 58 E ECC syndrome bits 4 54 ECU running A 4 EL_ADDR Register 5 6 EL_STAT Register 5 2 enable command RCM C 9 Environment variables SRM 1 7 auto_action 2 23 console 2 21 2 23 os_type 2 23 SRM console B 4 Error detector placement 4 2 Error log events 4 5 Error registers 5 1 Event files translating 4 7 Events filtering 4 8 exit command LFU A 10 A 16 A 20 A 21 A 22 External Interface Address Register 5 6 External Interface Registers loading and locking rules 5 7 External Interface Status Register 5 2 F Fail safe loader 2 24 Fan removal and replacement CPU chip 6 10 removal and replacement system 6 24 Fans 6 3 Fatal errors 4 5 FEPROM and XSROM test flow 2 13 contents 2 5 defined 2 5 Firmware RCM C 7 updating A 6 updating from AlphaBIOS A 24 updating from CD ROM A 7 updating from floppy disk A 11 A 13 updating from network device A 17 updating AlphaBIOS selection A 5 updating SRM command A 5 Floppy removal and replacement 6 32 FRU list 6 2 FRU part numbers 6 3 G Graphics monitor VGA 2 19 H halt command RCM C 10 haltin command RCM C 11 haltout command RCM C 11 Halts caused by power problem 3 4 hangup command RCM C 10 Hard d
150. nterrupt P1 Interrupt P2 Min Gnt Max Lat x10 x01 x08 x7E

4.3.6 MCHK 630 Correctable CPU Error
The error log in Example 4-6 shows the following:
1. CPU0 logged the error in a system with two CPUs.
2. During a D-ref fill, the External Interface Status Register shows no error but states that the data source is B-cache. (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache. It performs a D-ref fill.)
3. Both IOD CAP Error Registers logged no error.
4. The FIL Syndrome Register has a valid ECC code for the lower half of the data.

Machine check 630s are detected by CPUs either when they take data off the system bus or when they access their own B-cache. In this case the data did not come from the system bus; otherwise bit <30> would be set in the External Interface Status Register. CPU0 had a single-bit, ECC-correctable error.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The MC bus is the system bus. Refer to Table 4-9 for information on decoding commands, and refer to Table 4-10 for information on node IDs.

Example 4-6 MCHK 630 Correctable CPU Error
Logging OS 2 DIGITAL UNIX System Architecture 2 Alpha 4000 1200 Series Event sequence number 415 Timestamp of occurrence 15 JUN 1997 14 56 30 Host name System type register
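The bit test described above for the External Interface Status Register can be sketched in a few lines of Python. This is only an illustration, assuming the register value has already been read as an integer; the constant and function names are hypothetical, and only the meaning of bit <30> comes from the text.

```python
# Bit <30> of the External Interface Status Register is described in the
# text as being set when the fill data came from the system bus.
# The names below are illustrative and do not come from the manual.
SYSTEM_BUS_SOURCE_BIT = 30


def data_came_from_system_bus(ei_stat: int) -> bool:
    """Return True if bit <30> is set in the register value."""
    return bool((ei_stat >> SYSTEM_BUS_SOURCE_BIT) & 1)


if __name__ == "__main__":
    # Hypothetical register values, not taken from a real error log.
    for value in (0x00000000, 0x40000000):
        if data_came_from_system_bus(value):
            source = "system bus"
        else:
            source = "not the system bus"
        print(f"EI_STAT=0x{value:08X}: data source is {source}")
```

With bit <30> clear, as in the example log, the fill did not come from the system bus, which is why the text attributes the error to the CPU's own B-cache.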
151. oad the update files from the Internet (see Preface). You will need the files with the extension .SYS. Copy these files to your local MOP server's MOP load area.

Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, a network device is selected.

Select the file that has the firmware update, or press Enter to select the default file. The file options are:
AS1200FW (default)  SRM console, AlphaBIOS console, and I/O adapter firmware
AS1200CP            SRM console and AlphaBIOS console firmware only
AS1200IO            I/O adapter firmware only
In this example the default file, which has both console firmware (AlphaBIOS and SRM) and I/O adapter firmware, is selected.

Use the LFU list command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file. In this example, the resident firmware for each console (SRM and AlphaBIOS) and I/O adapter is at an earlier revision than the firmware in the update file.

Example A-7 Updating Firmware from a Network Device (Continued)
UPD> update all
WARNING: updates may take several minutes to complete for each device.
DO NOT ABORT!
AlphaBIOS   Updating to V6.40-1... Verifying V6.40-1... PASSED.
DO NOT ABORT!
kzpsa0      Updating to All Verifying All PAS
152. oard NVRAM.

NOTE: Be sure to record the new escape sequence. Although the factory defaults can be restored if you forget the escape sequence, this requires resetting the EN RCM switch on the RCM switchpack.

The following sample escape sequence consists of 5 iterations of the Ctrl key and the letter o:

RCM> setesc
^o^o^o^o^o
RCM>

If the escape sequence entered exceeds 15 characters, the command fails with the message ERROR. When changing the default escape sequence, avoid using special characters that are used by the system's terminal emulator or applications. Control characters are not echoed when entering the escape sequence. Use the status command to verify the complete escape sequence.

setpass
The setpass command allows the user to change the modem access password that is prompted for at the beginning of a modem session.

RCM> setpass
new pass> ********
RCM>

The maximum length for the password is 15 characters. If the password exceeds 15 characters, the command fails with the message ERROR. The minimum password length is one character, followed by a carriage return. If only a carriage return is entered, the command fails with the message ERROR illegal password. If you forget the password, you can enter a new password.

status
The status command displays the current state of the system sensors, as well as the current escape
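The length rules stated above for setesc and setpass can be summarized in a short Python sketch. This is not RCM firmware code; it only mirrors the limits given in the text (15 characters maximum, one character minimum for the password). The function names are hypothetical, and the error strings are the ones quoted in this section.

```python
from typing import Optional

MAX_LENGTH = 15  # limit stated in the text for both commands


def check_escape_sequence(sequence: str) -> Optional[str]:
    """Return the error message the text describes, or None if accepted."""
    if len(sequence) > MAX_LENGTH:
        return "ERROR"
    return None


def check_password(password: str) -> Optional[str]:
    """Return the error message the text describes, or None if accepted."""
    if len(password) == 0:
        # Only a carriage return was entered.
        return "ERROR illegal password"
    if len(password) > MAX_LENGTH:
        return "ERROR"
    return None


if __name__ == "__main__":
    print(check_password(""))                 # only a carriage return
    print(check_password("a" * 16))           # too long
    print(check_password("service1"))         # accepted
    print(check_escape_sequence("\x0f" * 5))  # five Ctrl/o characters
```

Control characters count toward the 15-character limit like any other character in this sketch; the text does not say otherwise, so treat that as an assumption.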
153. om the OCP.
6. Once the front panel is removed, unscrew the four screws holding the OCP to the front panel.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the OCP you installed is faulty, the system will not power up.

6.15 CD-ROM Removal and Replacement
Figure 6-14 Removing CD-ROM

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Loosen the two screws holding the CD-ROM to its bracket (see Figure 6-14).
4. Detach both the power and signal connectors at the rear of the CD-ROM.
5. Pull the CD-ROM forward out of the system.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. Use the following SRM console commands to test the CD-ROM:
P00>>> show dev ncr0
P00>>> HD buf dkannn
where nnn is the device number, for example, dka500.

6.16 Floppy Removal and Replacement
Figure 6-15 Removing Floppy

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Remove the two Phillips head screws holding the floppy in the system in Figu
154. ome Reg PB Status Register DPB Error Syndrome Reg PALcode Revision x00000000 x00000000 x00000000 x00000000 Correctable ECC err det by MDPA
DPA Status Register      Data Not Valid
DPA Syndrome Register    Data Not Valid
DPB Status Register      Data Not Valid
DPB Syndrome Register    Data Not Valid
Palcode Rev 0 0 1

4.4 Troubleshooting IOD-Detected Errors
Step 1: Read the CAP Error Registers on both PCI bridges (F9E0000880 and FBE0000880). If one or both of these registers shows an error, match the register contents with the data pattern and perform the action indicated.

Table 4-3 CAP Error Register Data Pattern
Data Pattern                             Most Likely Cause                                                          Action
110x x00x x000 0000 0000 0000 000x xxxx  RDSB: Uncorrectable ECC error detected on upper QW of MC bus (D<127:64>)   Go to Step 2
101x x00x x000 0000 0000 0000 000x xxxx  RDSA: Uncorrectable ECC error detected on lower QW of MC bus (D<63:0>)     Go to Step 2
111x x00x x000 0000 0000 0000 000x xxxx  RDS detected in both QWs                                                   Go to Step 2
1001 1000 x000 0000 0000 0000 000x xxxx  CRDB: Correctable ECC error detected on upper QW of MC bus (D<127:64>)     Go to Step 2
1000 0000 x000 0000 0000 0000 000x xxxx  CRDA: Correctable ECC error detected on lower QW of MC bus (D<63:0>)       Go to Step 2
1001 1000 x000 0000 0000 0000 000x xxxx  CRD detected in both QWs                                                   Go to Step 2
100x x 10x x000 0000 0000 0000 000x xxxx NXM Nonexistent M
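Matching a register value against the data patterns in Table 4-3 amounts to a bit comparison in which every x is a don't-care position. The Python sketch below illustrates this for the three uncorrectable-ECC (RDS) rows only, transcribed as they appear in the table. The function names are hypothetical, and reading the leftmost pattern character as bit <31> is an assumption.

```python
from typing import Optional

# RDS rows transcribed from Table 4-3. 'x' marks a don't-care bit.
# Assumption: the leftmost character of each pattern is bit <31>.
RDS_PATTERNS = [
    ("110x x00x x000 0000 0000 0000 000x xxxx",
     "RDSB: uncorrectable ECC error on upper QW, D<127:64>"),
    ("101x x00x x000 0000 0000 0000 000x xxxx",
     "RDSA: uncorrectable ECC error on lower QW, D<63:0>"),
    ("111x x00x x000 0000 0000 0000 000x xxxx",
     "RDS detected in both QWs"),
]


def matches(pattern: str, value: int) -> bool:
    """Compare a 32-bit value with a pattern of 0, 1, and x characters."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    for position, expected in enumerate(bits):
        actual = (value >> (31 - position)) & 1
        if expected != "x" and int(expected) != actual:
            return False
    return True


def classify_cap_error(value: int) -> Optional[str]:
    """Return the most likely cause for a CAP Error value, if any row matches."""
    for pattern, cause in RDS_PATTERNS:
        if matches(pattern, value):
            return cause
    return None


if __name__ == "__main__":
    # Hypothetical register values chosen to hit each row.
    for value in (0xC0000000, 0xA0000000, 0xE0000000, 0x00000000):
        print(f"0x{value:08X}: {classify_cap_error(value)}")
```

The remaining rows of the table can be added to the list in the same form once their patterns have been checked against the printed manual.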
155. ommand/address bus protected by parity. The bus speed is set to 66.6 MHz. The 40-bit address bus can create one terabyte of addresses (that's a million million). The bus connects CPUs, memory, and the system bus to PCI bus bridge(s).

There is a cache external to the CPU chip on CPU modules. The Alpha chip has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D-cache), and a 96-Kbyte write-back secondary data cache (S-cache). The cache system is write-back. The system supports up to two CPUs.

Memory on these systems is constructed of DIMM memory pairs placed onto two memory modules called riser cards. The riser cards are placed into the two memory slots on the system motherboard. One member of a DIMM pair is placed onto one riser card, and the other member is placed onto the other riser card. Each riser card drives half of the system bus along with the associated ECC bits. Memory pairs consist of two synchronous DIMMs of the same size and are placed into the same slot on each riser card.

The system bus to PCI bus bridge chip set translates system bus commands and data addressed to I/O space to PCI commands and data. It also translates PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The PCI bus is a 64-bit wide bus used for I/O.

Logic and sensors on the system motherboard monitor power status and the system environment (temperature and fan speeds).
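The "one terabyte" figure quoted above for the 40-bit address bus is simple arithmetic, shown here as a small Python sketch. The function name is illustrative only.

```python
ADDRESS_BITS = 40  # width of the address bus as stated in the text


def addressable_locations(bits: int) -> int:
    """Number of distinct addresses an n-bit bus can express."""
    return 2 ** bits


if __name__ == "__main__":
    total = addressable_locations(ADDRESS_BITS)
    print(total)             # 1099511627776
    print(total / 10 ** 12)  # about 1.0995 "million million"
```

The exact count is 2^40, slightly more than the round "million million" (10^12) that the text uses as an approximation.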
156. on extender 17 04021 01 68 pin con jumpr SCSI backpln SCSI backpln 17 04019 02 68 pin con cable External prt on Terminator SCSI backpln 12 41768 03 68 pin terminator End or 17 04019 02 Removaland Replacement 6 4 Table 6 1 Held Replaceable Unit Part Numbers continued System Cables and Jumpers From To 17 01495 01 Current share Current share Current share conn on cable conn on PSO PS1 17 03970 02 Floppy signal Floppy conn Floppy cable 34 pin on mbrd 17 03971 01 OCP signal OCP conn on OCP signal mbrd Twisted pair J2 RCM conn Power conn on OCP yellow and green on mbrd Twisted pair red OCP Interlock switch pigtail and black 70 31348 01 Interlock switch Interlock Twisted pair red and and pigtail cable switch assy black OCP DC enable pwr cable from OCP conn 17 04685 01 SCSI CD ROM CD ROM CD ROM sig conn sig cable conn on mbrd 70 37346 01 Power harness Power 3 conns On sys mbrd supply s CD ROM drv pwr Floppy pwr Optional drive above Flop Single ultra SCSI config StorageWorks backpln and pwr cable to Ultra SCSI bus extender Dual Ultra SCSI config two pwr cables to two SCSI bus extenders 17 04700 01 Power cable to Ultra SCSI bus Power harness Ultra SCSI bus extndr s pwr extndr s Y and StrWrks cable s backpln Removaland Replacement 6 5 6 3 System Exposure The system has three sheet metal covers one on top and one on each side The covers are removed to expose the system card cage and the power
157. ondition: blocked airflow; temperature in the room where the system is located is too high; the system card cage is open and air is not channeled properly over the system. Fix any of these conditions, if possible.

The overtemperature threshold is programmable and is controlled by the environment variable over_temp. Its default is 55 degrees C. After the system has cooled down and can be powered up, you can change the threshold. If you do this and the temperature inside the system gets too hot, it is likely that system errors will occur and the system may crash.

3.2 Troubleshooting Power Problems
Power problems can occur before the system is up or while the system is running.

Power Problem List
The system will halt for the following reasons:
1. A CPU fan failure
2. A system fan failure
3. An overtemperature condition
4. Power supply failure
5. Circuit breaker(s) tripped
6. AC problem
7. Interlock switch activation or failure
8. Environmental electrical failure or unrecoverable system fault with auto_action ev = halt or boot
9. Cable failure

Indication of failure:
1. LEDs indicate fan and overtemperature condition
2. The OCP display
3. Circuit breaker(s) tripped
There is no obvious indication for failures 7-10 from the power system.

Halt Caused by Power, Fan, or Overtemperature Condition
If a system is stopped because of a power, fan, or overtemperature problem, the console a
158. onsole firmware only
AS1200IO  I/O adapter firmware only

The default option in Example A-3 (AS1200FW) is not available, since the file is too large to fit on a 1.44 MB diskette. This means that when a floppy disk is the load device, you can update either console firmware or I/O adapter firmware, but not both in the same LFU session. If you need to update both, after finishing the first update, restart LFU with the lfu command and insert the floppy disk with the other file. In this example the file for I/O adapter firmware is selected.

Use the LFU list command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file. In this example, the update revision for console firmware displays as "Missing file" because only the I/O firmware files are available on the floppy disk.

Example A-5 Updating Firmware from the Floppy Disk (Continued)
UPD> update pfi0
WARNING: updates may take several minutes to complete for each device.
Confirm update on: pfi0 [Y/(N)] y
DO NOT ABORT!
pfi0    Updating to 3.10... Verifying to 3.10... PASSED.
UPD> lfu
***** Loadable Firmware Update Utility *****
Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0
Please enter the name of the options firmware files list, or
Press <return> to use the default filen
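The restriction described above (the combined file does not fit on a floppy, so console and I/O adapter firmware need separate LFU sessions) can be expressed as a small planning helper. This Python sketch is only an illustration: the function name and the "floppy", "cdrom", and "network" labels are hypothetical, and the file names follow the list in this appendix, with the I/O-only file read as AS1200IO.

```python
from typing import List

# File names as listed in this appendix (the I/O-only name is read
# here as AS1200IO). The device labels are illustrative.
ALL_FIRMWARE = "AS1200FW"   # consoles plus I/O adapters, too large for a floppy
CONSOLE_ONLY = "AS1200CP"   # SRM and AlphaBIOS consoles
IO_ONLY = "AS1200IO"        # I/O adapter firmware


def plan_lfu_sessions(load_device: str, console: bool, io: bool) -> List[str]:
    """Return one update file per LFU session needed for the request."""
    if not console and not io:
        return []
    if console and io:
        if load_device == "floppy":
            # Two sessions: restart LFU and swap diskettes in between.
            return [CONSOLE_ONLY, IO_ONLY]
        return [ALL_FIRMWARE]
    return [CONSOLE_ONLY] if console else [IO_ONLY]


if __name__ == "__main__":
    print(plan_lfu_sessions("floppy", console=True, io=True))
    print(plan_lfu_sessions("network", console=True, io=True))
    print(plan_lfu_sessions("floppy", console=False, io=True))
```

For a floppy load the helper returns two files, matching the advice to restart LFU with the lfu command and insert the diskette holding the other file.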
159. or Logs 411 Example 4 1 MCHK 670 Logging OS System Architecture Event sequence number Timestamp of occurrence Host name System type register Number of CPUs mpnum CPU logging event mperr Event validity Event severity Entry type CPU Minor class Software Flags Active CPUs Hardware Rev System Serial Number Module Serial Number Module Type System Revision MCHK 670 Regs Flags PCI Mask Machine Check Reason PAL SHADOW REG 0 PAL SHADOW REG 1 PAL SHADOW REG 6 PAL SHADOW REG 7 PALTEMP 0 PALTEMP 1 PALTEMP2 PALTEMP 22 PALTEMP 23 Exception Address Reg Exception Summary Reg Exception Mask Reg PAL BASE Interrupt Summary Reg IBOX Ctrl and Status Reg 2 DIGITAL UNIX 2 Alpha 4 04 APR 1997 17 20 04 whip16 x00000016 AlphaServer 4000 1200 Series x00000002 1 x00000001 1 O S claims event is valid 1 Severe Priority 100 CPU Machine Check Errors 1 Machine check 670 entry x0000000300000000 IOD 1 Register Subpkt Pres IOD 2 Register Subpkt Pres x00000003 00000000 C1563 x0000 x00000000 x00000000 x0000 x0098 x00000000 x00000000 x00000000 x00000000 x00000000E87C7A58 XFFFFFFFE8F 658000 xFFFFFC0O0003C9F40 xFFFFFC0O0004F9D60 x00000000E8709A58 xFFFFFCO0003BFB88 Native mode instruction Exception PC Xx3FFFFF00000EFEE2 x00000000 x00000000 x00000000020000 Base addr for palcode x0000000008 x00000000 AST requests 3 0 x00000000 x000000C160000000 Timeout Bit Not Set
160. orce a halt assertion. Refer to Section B.3 for information on halt assertion.

C.2 First-Time Setup
To set up the RCM to monitor a system remotely, connect the console terminal and modem to the ports at the back of the system, configure the modem port for dial-in, and dial in.
Figure C-1 RCM Connections

C.2.1 Configuring the Modem
The RCM requires a Hayes-compatible modem. The controls that the RCM sends to the modem are acceptable to a wide selection of modems. After selecting the modem, connect it and configure it.

Qualified Modems
The modems that have been tested and qualified with this system are:
• Motorola 3400 Lifestyle 28.8
• AT&T Dataport 14.4/FAX
• Hayes Smartmodem Optima 288 V-34/V.FC + FAX

Modem Configuration Procedure
1. Connect a Hayes-compatible modem to the RCM as shown in Figure C-1, and power up the modem.
2. From the local serial console terminal, type the following escape sequence to invoke the RCM:
   P00>>> ^]^]rcm
   The character ^] is created by simultaneously holding down the Ctrl key and pressing the ] key (right square bracket). The RCM prompt, RCM>, is displayed.
3. Use the setpass command to set a modem password.
4. Enable the modem port with the enable command.
5. Enter the quit command to leave the RCM.
6. You are now ready to dial in remotely.
161. ory 1039 0 0 1088289024 1088289024 00003062 memtest memory 1041 0 0 1090385920 1090385920 00003084 memtest memory 447 0 0 467607808 467607808 000030d8 exer_kid dkb200 2 0 3 155 0 0 0 81488896 000030d9 exer_kid dkb400 4 0 3 155 0 0 0 81472512 0000310d exer_kid dva0 0 0 100 0 0 0 607232 Testing aborted Shutting down tests Please wait System test complete CG P00 gt gt gt Troubleshooting 3 9 3 5 1 Testing Memory The test mem command tests individual memory devices or all memory The test shown in Example 3 4 runs for 2 minutes Example 3 4 Sample Test Memory Command P00 gt gt gt test memory Console is in diagnostic mode System test runtime 120 seconds Type C to stop testing Starting background memory test affinity to all CPUs Starting memory thrasher on each CPU Starting memory thrasher on each CPU ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046da7 memtest memory 0 0 48234496 48234496 000046e0 memtest memory 122 0 0 126862208 126862208 000046e9 memtest memory 11 0 0 115329280 115329280 000046f2 memtest memory 109 0 0 113232384 113232384 000046fb memtest memory 4 0 0 41937920 41937920 ID Program Device Pass Hard Soft Bytes Written Bytes Read 000046da7 memtest memory 0 0 226492416 226492416 000046e0 memtest memory 566 0 0 592373120 592373120 000046e9 memtest memory 555 0 0 580840192 580840192 000046f2 memtest memory 554 0 0 579791744 579791744 000046fb memtest memory 21 0 0 22017408
162. owing conditions:
• A machine check occurs.
• PAL completes its tasks and returns control to the operating system.
• A second machine check occurs before the operating system completes its tasks.

The machine returns to the console and displays the following message:

halt code = 6
double error halt
PC = 20000004
Your system has halted due to an irrecoverable error. Record the error halt code and PC and contact your Digital Services representative. In addition, type INFO 5 and INFO 8 at the console and record the results.

The info 5 command (Example 4-9) causes the SRM console to read the PAL-built logout area that contains all the data used by the operating system to create the error entry. The info 8 command (Example 4-10) causes the SRM console to read the IOD 0 and IOD 1 registers.

4.5.3 Machine Checks While in PAL
If a machine check occurs while the system is running PALcode, PALcode returns to the SRM console, not to the operating system. The SRM console writes:

halt code = 7
machine check while in PAL mode
PC = 20000004
Your system has halted due to an irrecoverable error. Record the error halt code and PC and contact your Digital Services representative. In addition, type INFO 3 and INFO 8 at the console and record the results.

The info 3 command (Example 4-8) causes the SRM console to read the impure area, which contains the state of the CPU before it entered PAL.

Example 4-8 INFO 3 Command
P00
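The two halt codes discussed above each call for a different pair of info commands. A lookup table keeps that pairing in one place, as in the Python sketch below. The table contents come from the console messages quoted in this section; the function name and data layout are illustrative.

```python
from typing import Dict, Tuple

# Halt code -> (console description, info commands to run and record).
# Contents taken from the console messages quoted in this section.
HALT_GUIDANCE: Dict[int, Tuple[str, Tuple[str, ...]]] = {
    6: ("double error halt", ("INFO 5", "INFO 8")),
    7: ("machine check while in PAL mode", ("INFO 3", "INFO 8")),
}


def commands_to_record(halt_code: int) -> Tuple[str, ...]:
    """Return the info commands to run for a halt code, or () if unknown."""
    entry = HALT_GUIDANCE.get(halt_code)
    return entry[1] if entry else ()


if __name__ == "__main__":
    for code in (6, 7):
        description, commands = HALT_GUIDANCE[code]
        print(f"halt code = {code} ({description}): run {', '.join(commands)}")
```

Halt codes other than 6 and 7 are outside this section, so the helper returns an empty tuple for them rather than guessing.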
163. rboard LEDs [figure labels: IOD0 Pass, IOD1 Pass, Fan Fault, Temp OK]

System Motherboard LEDs
You see the system motherboard LEDs by looking through the grate at the back of the machine. The normal state of the LEDs is shown in Figure 3-1.

If one of the IOD LEDs is off, the system bus to PCI bus bridge has failed. Replace the system motherboard.

If the Fan Fault LED is ON, at least one of the four fans is broken. If this condition occurs while the system is up and running, an error message identifying the FRU is printed to the console. If this condition occurs during a cold start, identifying which fan caused the fan fault depends upon which type of console the system has. If your console is a serial terminal (for OpenVMS or DIGITAL UNIX), the error identifying which fan failed is reported at the console. If your console is a graphics monitor (for Windows NT), reset the system and watch the OCP display. During the first 30 seconds, one of the following messages should occur:
• SYSx Fan Failed, where x = 0 or 1
• CPUx Fan Failed, where x = 0 or 1
Replace the failing FRU.

If the Temp OK LED is OFF, an overtemperature condition exists. Several things can cause this c
164. re 6-15.
4. Slide the floppy out the front of the system.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system; press the Halt button, if necessary, to bring up the SRM console. Use the following SRM console commands to test the floppy:
P00>>> show dev floppy
P00>>> HD buf dva0

6.17 SCSI Disk Removal and Replacement
Figure 6-16 Removing StorageWorks Disk

Removal
1. Shut down the operating system and power down the system.
2. Open the front door, exposing the StorageWorks disks.
3. Pinch the clips on both sides of the disk and slide it out of the shelf.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. Use the show device console command to verify that the system sees the disk you replaced.

6.18 StorageWorks Backplane Removal and Replacement
Figure 6-17 Removing StorageWorks Backplane [figure labels: StorageWorks Backplane, Ultra SCSI bus extender, optional Ultra SCSI bus extender]

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system
165. remove it. If the system has plastic mounts, pinch each with a pair of pliers, free the corner, and pull the bus extender from the enclosure.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. Use the show device console command to verify that the StorageWorks shelf is configured into the system.

Appendix A Running Utilities
This appendix provides a brief overview of how to load and run utilities. The following topics are covered:
• Running Utilities from a Graphics Monitor
• Running Utilities from a Serial Terminal
• Running ECU
• Running RAID Standalone Configuration Utility
• Updating Firmware with LFU
• Updating Firmware from AlphaBIOS
• Upgrading AlphaBIOS

A.1 Running Utilities from a Graphics Monitor
Start AlphaBIOS and select Utilities from the menu. The next selection depends on the utility to be run. For example, to run ECU, select Run ECU from floppy. To run RCU, select Run Maintenance Program.
Figure A-1 Running a Utility from a Graphics Monitor [AlphaBIOS Setup menu: Display System Configuration, Upgrade AlphaBIOS, Hard Disk Setup, CMOS Setup, Install Windows NT, Utilities > Run ECU from floppy, About AlphaBIOS, OS Selection Setup]

A.2 Running Utilities from a Serial Terminal
Utilities are run from a serial terminal i
166. reset.
Figure A-3 AlphaBIOS Setup Screen [AlphaBIOS Setup menu: Display System Configuration, Hard Disk Setup, CMOS Setup, Install Windows NT, Utilities, About AlphaBIOS; prompt: Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM; ESC=Exit]

A.7 Upgrading AlphaBIOS
It may become necessary to upgrade AlphaBIOS to work with new versions of Windows NT or when enhancements are made. Use this procedure to upgrade from an earlier version of AlphaBIOS:
1. Insert the diskette or CD-ROM containing the AlphaBIOS upgrade.
2. If you are not already running AlphaBIOS Setup, start it by restarting your system and pressing F2 when the Boot screen is displayed.
3. In the main AlphaBIOS Setup screen, select Upgrade AlphaBIOS and press Enter. The system is reset and the Loadable Firmware Update (LFU) utility is started. See Section A.5.5 for LFU commands.
4. When the upgrade is complete, issue the LFU exit command. The system is reset and you are returned to AlphaBIOS. If you press the Reset button instead of issuing the LFU exit command, the system is reset and you are returned to LFU.

Appendix B Halts, Console Commands, and Environment Variables
This appendix discusses halting the system and provides a summary of the SRM console commands and environment variables. The test command is described in Chapter 3 of t
167. rews 3.5 mm

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Unplug the power supply you are replacing.
4. Remove the four screws at the back of the system cabinet and the two screws at the back of the power supply that hold the power supply in place.
5. If you are removing power supply 0, slide the supply out the side of the cabinet. If you are removing power supply 1, lift the supply out the top of the cabinet.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system.

6.11 Power Harness Removal and Replacement
Figure 6-10 Removing Power Harness [figure labels: To Floppy and Optional Device; To CD-ROM and StorageWorks Shelf; Power Harness 70-31346-01; Current Share 17-01495-01]

Removal
1. Shut down the operating system and power down the system.
2. Remove the AC power cords.
3. Expose both the card cage section and the power section of the system (see Section 6.3).
4. Remove the cable clip between the two sections of the system.
5. Unplug the three cable connections to the motherboard and bend the cable back over the power sectio
168. rify the functioning of the new memory by issuing the command test memn, where n is 0, 1, 2, 3, or *.

Verification: Windows NT Systems
1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
2. Using the arrow keys, select Memory Configuration to display the status of the new memory.
3. Switch to the SRM console (press the Halt button in so that the LED on the button lights, and reset the system). Verify the functioning of the new memory by issuing the command test memn, where n is 0, 1, 2, 3, or *.

6.7 DIMM Removal and Replacement
Figure 6-6 Removing a DIMM from a Memory Riser Card [figure label: Largest DIMM goes here]

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Remove the memory riser card that has the broken memory DIMM (see Section 6.6).
4. There are prying/retaining levers on the connectors in each slot on the riser card. Press both levers in an arc away from the DIMM and gently pull the DIMM from the connector.

Replacement
Reverse the steps in the Removal procedure.

Verification
Follow the verification procedure recommended for the memory riser card (Section 6.6).

6.8 System Motherboard Removal and Replacement
Figure 6-7 Removing System Motherboard
169. ror on reads. Set when a system bus command/address parity error is detected.

Table 5-5 CAP Error Register (continued)
Name            Bits    Type  Initial State  Description
LOST_MC_ERR     <24>    RW1C  0              Set when an error is detected but not logged because the associated symptom fields and registers are locked with the state of an earlier error.
PIO_OVFL        <23>    RW1C  0              Set when a transaction that targets this system bus to PCI bus bridge is not serviced because the buffers are full. This is a symptom of setting the PEND_NUM field in CAP_CNTL to an incorrect value.
Reserved        <22:5>  RO    0
PCI_LERR_VALID  <4>     RO    0              Logical OR of bits <3:0> of this register. When set, the PCI error address register is locked.
PTE_INV         <3>     RW1C  0              Invalid page table entry on scatter/gather access.
MAB             <2>     RW1C  0              PCI master state machine detected PCI Target Abort (likely cause: NXM) (except Special Cycle). On reads, fill error is also returned.
SERR            <1>     RW1C  0              PCI target state machine observed SERR. CAP asserts SERR when it is master and detects target abort.
PERR            <0>     RW1C  0              PCI master state machine observed PERR.

5.6 PCI Error Status Register 1 (PCI_ERR1), Offset = 1040
PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address <31:0> pertaining to an error condition logged in CAP_ERR. This register always captures PCI address <
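The low-order CAP_ERR fields listed in this part of Table 5-5 can be decoded with a simple bit map. The Python sketch below covers only the bits shown in this portion of the table (the higher-order bits are described earlier in the table and are left out). The function names are hypothetical, and the consistency check simply restates the "logical OR of bits <3:0>" rule from the table.

```python
from typing import List

# Bit positions and names from this portion of Table 5-5 only.
CAP_ERR_FIELDS = {
    24: "LOST_MC_ERR",
    23: "PIO_OVFL",
    4: "PCI_LERR_VALID",
    3: "PTE_INV",
    2: "MAB",
    1: "SERR",
    0: "PERR",
}


def decode_cap_err(value: int) -> List[str]:
    """Return the names of the listed fields that are set, high bit first."""
    return [name
            for bit, name in sorted(CAP_ERR_FIELDS.items(), reverse=True)
            if (value >> bit) & 1]


def pci_lerr_valid_is_consistent(value: int) -> bool:
    """Bit <4> should equal the logical OR of bits <3:0>, per the table."""
    expected = (value & 0xF) != 0
    actual = bool((value >> 4) & 1)
    return expected == actual


if __name__ == "__main__":
    sample = 0x00000014  # hypothetical value: PCI_LERR_VALID and MAB set
    print(decode_cap_err(sample))
    print(pci_lerr_valid_is_consistent(sample))
```

Because the error bits are write-one-to-clear (RW1C), software would clear a logged error by writing a 1 to the same bit position; that write is not modeled here.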
170. s 2 10 status command RCM C 14 StorageWorks 1 36 backplane removal and replacement 6 36 disk removal and replacement 6 34 SCSI bus extender removal and replacement 6 38 System architecture 1 8 fully configured 1 9 System bus 1 9 System bus address parity error 4 50 System bus block diagram 1 18 System bus ECC error 4 48 System bus nonexistent address error 4 49 System bus to PCI bus bridge 1 9 1 20 System bus to PCI EISA bus bridge 1 9 System cabinet 1 2 System cables and jumpers 6 5 System components 1 3 System consoles 1 6 System correctable errors 4 5 System drawer remote operation C 2 System exposure 6 6 System FRU locations 6 2 System machine checks 4 5 System motherboard 1 16 PCI I O subsystem section 1 22 power control logic section 1 26 remote control logic section 1 24 removal and replacement 6 16 system bus section 1 18 system bus to PCI bus bridge section 1 20 System motherboard LEDs 3 2 T Test command for entire system 3 8 Test mem command 3 10 Test pci command 3 12 Troubleshooting failures at power up 3 5 IOD detected errors 4 47 power problems 3 4 using error logs 4 2 U Ultra SCSI 1 36 Index 5 cables and jumpers 6 4 update command LFU A 10 A 16 A 20 A 21 A 23 Updating firmware AlphaBIOS console A 24 from AlphaBIOS console A 5 from SRM console A 5 Utility programs running from graphics monitor A 2 V verify co
171. s 0  iod0 (PCI0)
Slot  Option Name       Type      Rev   Name
1     PCEB              4828086   0005  pceb0
2     S3 Trio64/Trio32  88115333  0054  vga0
3     DECchip 21041-AA  141011    0011  tulip0

Bus 1  pceb0 (EISA Bridge connected to iod0, slot 1)
Slot  Option Name       Type      Rev   Name

Bus 0  iod1 (PCI1)
Slot  Option Name       Type      Rev   Name
1     NCR 53C810        11000     0002  ncr0
4     QLogic ISP1020    10201077  0005  isp0

Chapter 4 Error Logs
This chapter provides information on troubleshooting with error logs. The following topics are covered:
• Using Error Logs
• Using DECevent
• Error Log Examples and Analysis
• Troubleshooting IOD-Detected Errors
• Double Error Halts and Machine Checks While in PAL Mode
Error registers are described in Chapter 5.

4.1 Using Error Logs
Error detection is performed by CPUs, the IOD, and the EISA to PCI bus bridge. (The IOD is the acronym used by software to refer to the system bus to PCI bus bridge.)
Figure 4-1 Error Detector Placement
172. s 2-2
2-2 Power-Up Flow 2-4
2-3 Contents of FEPROMs 2-5
2-4 Console Code Critical Path 1200 Block Diagram 2-6
2-5 SROM Power-Up Test Flow 2-8
2-6 XSROM Power-Up Flowchart 2-12
2-7 Console Device Determination Flowchart 2-18
vii
3-1 System Motherboard LEDs 3-2
4-1 Error Detector Placement 4-2
6-1 System FRU Locations 6-2
6-2 Exposing the System 6-6
6-3 Removing CPU Module 6-8
6-4 Removing CPU Fan 6-10
6-5 Removing Memory Riser Card 6-12
6-6 Removing a DIMM from a Memory Riser Card 6-14
6-7 Removing System Motherboard 6-16
6-8 Removing PCI/EISA Option 6-18
6-9 Removing Power Supply 6-20
6-10 Removing Power
173. s from the MC_ERR1 and MC_ERR0 Registers to determine which memory slot is failing. Replace both memory modules (high and low) for that slot. For an RDS error there is no way to know which memory module, high or low, is bad.

For a Corrected Read Data Error (CRD)

When a CRD error occurs, determine which memory module pair caused the error as follows:

1. At the SRM console prompt, enter the show mem command. This command displays the base address and size of the memory module pair for each slot.

   P00>>> show mem

2. Compare this address to the failing address from the MC_ERR1 and MC_ERR0 Registers to determine which memory slot is failing.

Error Logs 4-47

3. When you have isolated the failing memory pair, determine which of the two DIMMs is bad. You cannot do this if the operating system is Windows NT. Read the CPU FIL SYNDROME Register. If this register is nonzero, use the ECC syndrome bits in Table 4-8 to determine which DIMM had the single-bit error.

Table 4-8 ECC Syndrome Bits Table

CPU Syndrome Values for Low-Order Memory
01 02 04 08 10 20 40 80 CE CB D3 D5 D6 D9 DA DC 23 25
26 29 2C 31 13 19 4F 4A 52 54 57 58 5B 5D A2 A4 A8 B0

CPU Syndrome Values for High-Order Memory
2A 34 0E 0B 15 16 1A 1C E3 E5 E6 E9 EA EC F1 F4 A7 AB
AD B5 8F 8A 92 94 97 98 9B 9D 6B 6D 70 75 62 64 67 68

Error Logs 4-48

4.4.10 Command Codes

Table 4-9 shows the command codes for transactions on the system
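The Table 4-8 lookup described in step 3 can be sketched in Python as follows. This is only an illustration, not part of any DIGITAL tool: the function name `failing_dimm_half` and the returned labels are hypothetical, and the value 1A in the high-order list is an assumption, because the extracted table shows only "A" at that position.

```python
# Syndrome values copied from Table 4-8 (hexadecimal).
LOW_ORDER = {int(s, 16) for s in (
    "01 02 04 08 10 20 40 80 CE CB D3 D5 D6 D9 DA DC 23 25 "
    "26 29 2C 31 13 19 4F 4A 52 54 57 58 5B 5D A2 A4 A8 B0"
).split()}

# "1A" is assumed here; the extracted table prints only "A" in that position.
HIGH_ORDER = {int(s, 16) for s in (
    "2A 34 0E 0B 15 16 1A 1C E3 E5 E6 E9 EA EC F1 F4 A7 AB "
    "AD B5 8F 8A 92 94 97 98 9B 9D 6B 6D 70 75 62 64 67 68"
).split()}


def failing_dimm_half(syndrome):
    """Classify a syndrome byte against the two lists of Table 4-8.

    Returns None when the syndrome is zero (no single-bit error recorded),
    "low-order" or "high-order" when the value appears in the table, and
    "unknown" for any other value.
    """
    if syndrome == 0:
        return None
    if syndrome in LOW_ORDER:
        return "low-order"
    if syndrome in HIGH_ORDER:
        return "high-order"
    return "unknown"
```

A value that matches neither list is reported as "unknown" rather than guessed, since the table covers only single-bit errors.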
174. s off and power up continues The EISA system controller PCI to EISA bridge COM1 port and control panel port are all initialized thereafter Each CPU prints an SROM banner to the device attached to the COM port and to the control panel display The banner prints to COM1 if the console environment variable is set to serial If it is set to graphics nothing prints to the console terminal only to the control panel display until occurs Each processor s S cache is initialized and the XSROM code in the FEPROM on the PCI 0 is unloaded into them If the unload is not successful a copy is unloaded from a different FEPROM sector If the second try fails the CPU hangs Each processor jumps to the XSROM code and sends an XSROM banner to the COM1 port and to the control panel display The three S cache banks on each processor are enabled and then the B cache is tested If a failure occurs a message is sent to the COM1 port and to the control panel display Each CPU sends a B cache completion message to COM1 The primary CPU is again determined and memory is sized using code in sector 1 of FEPROM 0 The information on memory pairs is sent to COM1 If an illegal memory configuration is detected a warning message is sent to COM and the control panel display Memory is initialized and tested and the test trace is sent to COM and the control panel display Each CPU participates in the memory testing The numbers for tests 20 and 2
175. se enter password: xxxxxx
P00>>>

User mode SRM console commands are now available.

P00>>> set secure

The console command login clears secure.

If the password has been forgotten and the system is in secure mode, the procedure for regaining control is:

1. Enter the login command:

   P00>>> login

2. At the "please enter password" prompt, press the Halt button and then press the Return key.

The password is now cleared and the console is in user mode. A new password must be set to put the console into secure mode again. For a full discussion of securing the console, see your system User's Guide.

Troubleshooting 3-7

3.5 Testing an Entire System

A test command with no modifiers runs all exercisers for subsystems and devices on the system. I/O devices tested are supported boot devices. The test runs for 10 minutes.

Example 3-3 Sample Test Command

P00>>> test
Console is in diagnostic mode
System test, runtime 600 seconds
Type ^C to stop testing
Configuring system
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1 DKa500 RRD45 1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb200.2.0.3.1 DKb200 RZ29B 0007
dkb400.4.0.3.1 DKb400 RZ29B 0007
polling floppy0 (FLOPPY) PCEB XBUS hose 0
dva0.0.0.1000.0 DVA0 RX23
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1 08-00-2B-E5-B4-1A
Testing EWA0 network device
Test
176. 3-6
3.4 Releasing Secure Mode 3-7
3.5 Testing an Entire System 3-8
3.5.1 Testing Memory 3-10
3.5.2 Testing PCI 3-12
3.6 Other Useful Console Commands 3-14
Chapter 4 Error Logs
4.1 Using Error Logs 4-2
4.1.1 Hard Errors 4-4
4.1.2 Soft Errors 4-4
4.1.3 Error Log Events 4-5
4.2 Using DECevent 4-6
4.2.1 Translating Event Files 4-7
4.2.2 Filtering Events 4-8
4.2.3 Selecting Alternative Reports 4-10
4.3 Error Log Examples and Analysis 4-11
4.3.1 MCHK 670 CPU-Detected Failure 4-11
4.3.2 MCHK 670 CPU and IOD-Detected Failure 4-16
4.3.3 MCHK 670 Read Dirty CPU-Detected Failure 4-21
4.3.4 MCHK 660 IOD-Detected Failure (System Bus Error) 4-27
4.3.5 MCHK 660 IOD-Detected Failure (PCI Error) 4-32
4.3.6 MCHK
177. sequence and alarm information. The following is an example of the display.

RCM> status
Firmware Rev: V2.0
Escape Sequence: RCM
Remote Access: ENABLE
Alerts: DISABLE
Alert Pending: NO
Temp (C): 26.0
RCM Power Control: ON
RCM Halt: Deasserted
External Power: ON
Server Power: ON
RCM>

The status fields are explained in Table C-2.

Managing the System Remotely C-14

Table C-2 RCM Status Command Fields

Item                Description
Firmware Rev        Revision of RCM firmware.
Escape Sequence     Current escape sequence to invoke RCM.
Remote Access       Modem remote access state (ENABLE/DISABLE).
Alerts              Alert dial-out state (ENABLE/DISABLE).
Alert Pending       Alert condition triggered (YES/NO).
Temp (C)            Current system temperature in degrees Celsius.
RCM Power Control   Current state of RCM system power control (ON/OFF).
RCM Halt            Asserted indicates that halt has been asserted with the haltin command.
                    Deasserted indicates that halt has been deasserted with the haltout command
                    or by cycling power with the On/Off button on the control panel.
                    The RCM Halt field does not report halts caused by pressing the Halt button.
External Power      Current state of power to RCM. Always on.
Server Power        Indicates whether power to the system is on or off.

Managing the System Remotely C-15

C.4 Dial-Out Alerts

When you are not monitoring the system remotely, you can use the RCM dial-out f
178. slot for each module. Display can help you identify the location of a device.

exit

The exit command terminates the LFU program, causes system initialization and testing, and returns the system to the console from which LFU was called.

help

The help (or ?) command displays the LFU command list, shown below.

Function    Description
Display     Displays the system's configuration table.
Exit        Done exit LFU (reset).
List        Lists the device, revision, firmware name, and update revision.
Lfu         Restarts LFU.
Readme      Lists important release information.
Update      Replaces current firmware with loadable data image.
Verify      Compares loadable and hardware images.
? or Help   Scrolls this function table.

lfu

The lfu command restarts the LFU program. This command is used when the update files are on a floppy disk. The files for updating both console firmware and I/O firmware are too large to fit on a 1.44 MB disk, so only one type of firmware can be updated at a time. Restarting LFU enables you to specify another update file.

Running Utilities A-24

list

The list command displays the inventory of update firmware on the CD-ROM, network, or floppy. Only the devices listed at your terminal are supported for firmware updates. The list command shows three pieces of information for each device:

• Current Revision: The revision of the device's current firmware
• Filename: The name of the file used to update that firmware
179. ss decode INDEX_H<23:6> address bus

Test 12  B-cache Tag March test
  Logic tested: B-cache tag store RAMs; B-cache STAT store RAMs
Test 13  B-cache ECC Data Line test
  Logic tested: CPU chip ECC generation and checking logic; ECC lines from CPU chip to B-cache; B-cache ECC RAMs
Test 14  B-cache Tag Data Line test
  Logic tested: Access to B-cache tags; shorts between tag data and its status and parity bits
Test 15  B-cache Data Line test
  Logic tested: B-cache data lines to B-cache data RAMs; B-cache read/write logic
Test 16  B-cache ECC Data Line test
  Logic tested: CPU chip ECC generation and checking logic; ECC lines from CPU chip to B-cache; B-cache ECC RAMs

Power Up 2-13

Table 2-4 Memory Tests

Test 20  Memory Data test
  Logic tested: Data path to and from memory; data path on memory and RAMs
  Description: Test floats 1 and 0 across data and check bit data lines. Errors are reported for each DIMM memory card from MEM0_L to MEM7_H.
Test 21  Memory Address test
  Logic tested: Address path to and from memory; address path on memory and RAMs
  Description: Same as test 20.
Test 23  Memory Bitmap Building
  Logic tested: No new logic
  Description: Maps out bad memory by way of the bitmap. It does not completely fail memory.
Test 24  Memory March test
  Logic tested: No new logic
  Description: Maps out bad memory.

There is no test 22.

Power Up 2-14

2.6 XSROM Errors Reported

The XSROM reports B-cache test errors and memory test errors. It also reports a warning if memory is illegally configured.

Example 2-2 XSROM Errors Reported at Power-Up
180. stem Overview 1-32

When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed by the power control logic (PCL) section of the motherboard. If the On/Off button is On, the PCL asserts DC_ENABLE_L, starting the power supplies. If there is a hard fault on power-up, the power supplies shut down immediately; otherwise, the power system powers up and remains up until the system is shut off or the PCL senses a fault. If a power fault is sensed, the signal SHUTDOWN is asserted after a 30-second delay. Cycling the On/Off button can restore the power.

System Overview 1-33

1.12 Maintenance Bus (I2C Bus)

The I2C bus (referred to as the "I squared C bus") is a small internal maintenance bus used to monitor system conditions scanned by the power control logic, write the fault display, store error state, and track configuration information in the system. Although all system modules (not I/O modules) sit on the maintenance bus, only the I2C controller accesses it.

Figure 1-17 I2C Bus Block Diagram

System Overview 1-34

Monitor

The I2C bus monitors the state of system conditions sca
181. stem Overview 1-12

Memory Variants

Memory consists of two riser cards supporting eight DIMM pairs. There are two DIMM variants: a 32-Mbyte version and a 128-Mbyte version. Maximum memory using 32-Mbyte DIMMs is 128 Mbytes, and the maximum memory using 128-Mbyte DIMMs is 2 Gbytes. All memory is synchronous DRAM.

Option Size Module Type Number Size
MS300-BA 64MB 54-25084-DA Synch 18 4M x 72 20-47405-D3 32MB
MS300-DA 256MB 54-25092-DA Synch 18 16M x 72 20-45619-D3 128MB

Memory Operation

Each DIMM in the pair provides half the data, or 64 bits plus 8 ECC bits, of the octaword (16 bytes) transferred on the system bus. DIMMs are placed in slots on the riser cards, and the riser cards are placed in the slots designated MEM L and MEM H on the system motherboard.

NOTE: Memory in slot MEM L does not drive the lower 8 bytes, and memory in slot MEM H does not drive the higher 8 bytes, of the 16-byte transfer. Some bits originating from MEM L are high-order bits, and some bits originating from MEM H are low-order bits.

Memory drives the system bus in bursts. Upon each memory fetch, data is transferred in 4 consecutive cycles, transferring 64 bytes.

Memory Configuration Rules

In a system, memories of different sizes are permitted, but:

• DIMMs are installed and used in pairs. Both DIMMs in a memory pair must be of the same size.
• Each riser card receives one DIMM of the DIMM pair.
• The largest DIMM pair must be in ris
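The transfer sizes quoted under Memory Operation follow from a few multiplications. The sketch below simply redoes that arithmetic in Python; the constant and function names are illustrative and do not come from the manual.

```python
# Figures taken from the Memory Operation text.
DATA_BITS_PER_DIMM = 64   # each DIMM supplies half of the octaword
ECC_BITS_PER_DIMM = 8     # ECC bits supplied alongside the data
DIMMS_PER_PAIR = 2
CYCLES_PER_FETCH = 4      # a memory fetch is a burst of 4 consecutive cycles


def bytes_per_cycle():
    """Data bytes a DIMM pair puts on the system bus in one cycle."""
    return DATA_BITS_PER_DIMM * DIMMS_PER_PAIR // 8


def bytes_per_fetch():
    """Data bytes moved by one memory fetch (one burst)."""
    return bytes_per_cycle() * CYCLES_PER_FETCH


def bits_driven_per_cycle():
    """Data plus ECC bits a DIMM pair drives in one cycle."""
    return (DATA_BITS_PER_DIMM + ECC_BITS_PER_DIMM) * DIMMS_PER_PAIR
```

Two 64-bit halves give the 16-byte octaword, four such cycles give the 64-byte burst, and data plus ECC comes to 144 bits per cycle, which appears to line up with the 144-bit system bus figure given in the System Overview if that figure counts the ECC bits.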
182. sts important release information Update Replaces current firmware with loadable data image Verify Compares loadable and hardware images or Help Scrolls this function table UPD gt list 4 Device Current Revision Filename Update Revision AlphaBIOS V5 32 0 arcrom ve 40 1 srmflash V5 0 1 srmrom V6 0 3 Running Utilities A 8 Select the device from which firmware will be loaded The choices are the internal CD ROM the internal floppy disk or a network device In this example the internal CD ROM is selected Select the file that has the firmware update or press Enter to select the default file The file options are AS1200FW default SRM console AlphaBIOS console and I O adapter firmware AS1200CP SRM console and AlphaBIOS console firmware only AS120010 T O adapter firmware only In this example the file for console firmware AlphaBIOS and SRM is selected The LFU function table and prompt UPD gt display Use the LFU list command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file In this example the resident firmware for each console SRM and AlphaBIOS is at an earlier revision than the firmware in the update file Continued on next page Running Utilities A 10 Example A 3 Updating Firmware from the CD ROM Continued UPD gt update WARNING updates may take several minutes to complete for each device Confirm up
183. system

Disabling Autoboot

The system automatically boots the selected operating system at power-up or reset if the following environment variables are set:

• For DIGITAL UNIX and OpenVMS, the SRM environment variables os_type, auto_action, bootdef_dev, boot_file, and boot_osflags
• For Windows NT, the SRM os_type environment variable and the Auto Start selection in the AlphaBIOS Standard CMOS Setup screen

You might want to prevent the system from autobooting so you can perform tasks from the SRM console. Use one of the methods described previously to force a halt assertion. When the SRM console prompt is displayed, you can enter commands to configure or test the system. Chapter 4 of your system User's Guide describes the SRM console commands and environment variables.

Disabling the SRM Power-Up Script

The system has a power-up script (file) named nvram that runs every time the system powers up. If you accidentally insert a command in the script that will cause a system problem, disable the script by using one of the methods described previously to force a halt assertion. When the SRM console prompt is displayed, edit the script to delete the offending command. See Section 4.4 of your system User's Guide for more information on editing the nvram script.

Halts, Console Commands, and Environment Variables B-5

B.4 Summary of SRM Console Commands

The SRM console commands are used to examine or modify the system state. Table B
184. t

RCM> disable

When the modem is disabled, it remains disabled until the enable command is issued. If a modem connection is in progress, entering the disable command terminates it.

NOTE: If the modem has been disabled from the RCM switchpack on the motherboard, the enable command does not work. To enable the modem, reset switch 2 (MODEM OFF) on the switchpack to OFF (enabled). See Section C.5 for information on the switchpack.

enable

The enable command enables remote access to the RCM modem port. It can take up to 10 seconds for the enable command to be executed.

RCM> enable

When the modem is enabled, it remains enabled until the disable command is issued.

Managing the System Remotely C-9

The enable command can fail for the following reasons:

• No modem access password was set.
• The initialization string or the answer string might not be set properly. See Section C.7.
• The modem is not connected or is not working properly.
• The modem has been disabled from the RCM switchpack. To enable the modem, reset switch 2 (MODEM OFF) on the switchpack to OFF (enabled).

If the enable command fails, the following message is displayed:

ERROR enable failed

hangup

The hangup command terminates the modem session. When this command is issued, the remote user is disconnected from the server. This command can be issued from either the local or remote console.

RCM> hangup

halt

The halt command halts the ma
185. t

quit

The quit command exits the user from command mode and reconnects the serial terminal to the system console port. The following message is displayed:

Focus returned to COM port

The next display depends on what the system was doing when the RCM was invoked. For example, if the RCM was invoked from the SRM console prompt, the console prompt will be displayed when you enter a carriage return. Or, if the RCM was invoked from the operating system prompt, the operating system prompt will be displayed when you enter a carriage return.

reset

The reset command requests the RCM to reset the hardware. The reset command is equivalent to pressing the Reset button on the control panel.

RCM> reset
Focus returned to COM port

Managing the System Remotely C-12

The following events occur when the reset command is executed:

• The system restarts and the system console firmware reinitializes.
• The console exits RCM command mode and reconnects the serial terminal to the system COM serial port.
• The power-up messages are displayed, and then the console prompt is displayed or the operating system boot messages are displayed, depending on how the startup sequence has been defined.

setesc

The setesc command resets the default escape sequence for invoking RCM. The escape sequence can be any character string. A typical sequence consists of 2 or more characters, to a maximum of 15 characters. The escape sequence is stored in the module's on b
186. t followed by an illustration or example, and ends with descriptive text.

This manual has six chapters and three appendixes, as follows:

• Chapter 1, System Overview, introduces the DIGITAL AlphaServer 1200 and the DIGITAL Ultimate Workstation 533 systems. It describes each system component.
• Chapter 2, Power Up, provides information on how to interpret the power-up display on the operator control panel, the console screen, and system LEDs. It also describes how hardware diagnostics execute when the system is initialized.
• Chapter 3, Troubleshooting, describes troubleshooting during power-up and booting, as well as the test command.
• Chapter 4, Error Logs, explains how to interpret error logs and how to use DECevent.
• Chapter 5, Error Registers, describes the error registers used to hold error information.
• Chapter 6, Removal and Replacement, describes removal and replacement procedures for field-replaceable units (FRUs).
• Appendix A, Running Utilities, explains how to run utilities such as the EISA Configuration Utility and RAID Standalone Configuration Utility.
• Appendix B, Halts, Console Commands, and Environment Variables, summarizes the commands used to examine and alter the system configuration.
• Appendix C, Operating the System Remotely, describes how to use the Remote Console Manager (RCM) to monitor and control the system remotely.

xi

Documentation Titles

Table 1 lists books in the documentation set for both systems.

Table
187. t lt 20 gt in the MC_ERRI Register is clear indicating the data is clean and comes from memory The error was detected by a CPU and the data was on the system bus and is clean Therefore a memory module provided the wrong data If the Dirty bit had been set the data would have come from the cache of another CPU To determine which memory see Section 4 4 NOTE The error log example has been edited to decrease its size registers of interest are in bold type The MC bus is the system bus Refer to Table 4 9 for information on decoding commands and refer to Table 4 10 for information on node IDs ErorLogs 415 Example 4 2 MCHK 670 CPU and IOD Detected Failure Logging OS System Architecture Event sequence number Timestamp of occurrence Host name System type register Number of CPUs mpnum CPU logging event mperr Event validity Event severity Entry type CPU Minor class Software Flags Active CPUs Hardware Rev System Serial Number Module Serial Number Module Type System Revision MCHK 670 Regs Flags PCI Mask Machine Check Reason PAL SHADOW REG 0 PAL SHADOW REG 1 PAL SHADOW REG 6 PAL SHADOW REG 7 PALTEMP 0 PALTEMP 1 PALTEMP 23 Exception Address Reg Exception Summary Reg Exception Mask Reg PAL BASE Interrupt Summary Reg IBOX Ctrl and Status Reg Icache Par Err Stat Reg Dcache Par Err Stat Reg 2 DIGITAL UNIX 2 Alpha 6 08 APR 1997 11 27 55 whip16 x00000016 Alp
188. t pci command tests PCI buses and devices The test runs for 2 minutes Example 3 5 Sample Test Command for PCI P00 gt gt gt test pci Console is in diagnostic mode System test runtime 120 seconds Type C to stop testing Configuring all PCI buses polling ncrO NCR 53C810 slot 1 bus 0 PCI hose 1 SCSI Bus ID 7 dka500 5 0 1 1 DKa500 RRD45 1645 polling ncrl NCR 53C810 slot 3 bus 0 PCI hose 1 SCSI Bus ID 7 dkb200 2 0 3 1 DKb200 RZ29B 0007 dkb400 4 0 3 1 DKb400 RZ29B 0007 polling tulipO DECchip 21040 AA slot 2 bus 0 PCI hose 1 ewa0 0 0 2 1 08 00 2B E5 B4 1A polling floppy0 FLOPPY PCEB XBUS hose 0 dva0 0 0 1000 0 DVAO RX23 Testing all PCI buses Testing EWAO network device Testing VGA alphanumeric mode only Testing SCSI disks read only Testing floppy dva0 read only ID Program Device Pass Hard Soft Bytes Written Bytes Read 00002c29 exer_kid dkb200 2 0 3 27 0 0 0 14642176 00002c2a exer_kid dkb400 4 0 3 27 0 0 0 14642176 00002c5e exer_kid dva0 0 0 100 0 0 0 0 0 Troubleshooting 3 12 ID Program Device Pass 00002c29 exer_kid dkb200 2 0 3 92 00002c2a exer_kid dkb400 4 0 3 92 00002c5e exer_kid dva0 0 0 100 0 Testing aborted Shutting down tests Please wait Testing complete SC P00 gt gt gt Hard Soft Bytes Written Bytes Read 0 48689152 0 48689152 0 286720 Troubleshooting 3 13 3 6 Other Useful Console Commands There are several console commands that help diagnose the system
189. t validity 1 O S claims event is valid Event severity 5 Low Priority Entry type 100 CPU Machine Check Errors CPU Minor class 4 620 System Correctable Error Software Flags x0000000000000000 Active CPUs x00000003 Hardware Rev x00000000 System Serial Number C1563 Module Serial Number Module Type x0000 System Revision x00000000 Machine Check Reason x0204 IOD Detected Soft Error Ext Interface Status Reg x 0000000000000000 Not Valid for 620 System 2 Correctable Errors Ext Interface Address Reg x0000000000000000 Not Valid for 620 System Correctable Errors Fill Syndrome Reg x0000000000000000 Not Valid for 620 System Correctable Errors Interrupt Summary Reg x0000000000000000 Not Valid for 620 System Correctable Errors WHOAMI x00000000 Module Revision 0 MID 0 GID 0 Sys Environmental Regs x00000000 Base Addr of Bridge x000000FBE0000000 Dev Type amp Rev Register x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg MC Error Info Register 0 x122D5640 MC Bus Trans Addr lt 31 4 gt 122D5640 MC Error Info Register 1 x800E9600 MC bus trans addr lt 39 32 gt x00000000 MC Command is WriteBack Mem CPUO Master at Time of Error Device ID 2 00000002 4 MC error info valid CAP Error Register x89000000 Error Detected but Not Logged Eror Logs 439 MC error info latched IPA Status Register DPA Error Syndr
190. tchpack on the system board. Use the switches to enable or disable certain RCM functions, if desired.

Figure C-2 Location of RCM Switchpack on System Board

Managing the System Remotely C-19

Figure C-3 RCM Switches (Factory Settings)

Switch  Name        Description
1       EN RCM      Enables or disables the RCM. The default is ON (RCM enabled). The OFF setting disables RCM.
2       MODEM OFF   Enables or disables the modem. The default is OFF (modem enabled).
3       RPD DIS     Enables or disables remote poweroff. The default is OFF (remote poweroff enabled).
4       SET DEF     Sets the RCM to the factory defaults. The default is OFF (reset to defaults disabled).

Managing the System Remotely C-20

Uses of the Switchpack

You can use the RCM switchpack to change the RCM operating mode or disable the RCM altogether. The following are conditions when you might want to change the factory settings:

• Switch 1 (EN RCM). Set this switch to OFF (disable) if you want to reset the baud rate of the COM1 port to a value other than the system
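Three of the four switches are named for what they turn off, so the ON position does not always mean "enabled". The following Python sketch captures the switch table as data to make that explicit. It is an illustration only: the names `RCM_SWITCHES` and `describe_switchpack` are hypothetical, and the meanings of the non-default positions of switches 2, 3, and 4 are inferred from the stated defaults rather than quoted from the manual.

```python
# Switch number -> (name, factory position, effect when ON, effect when OFF).
# Effects for the non-default positions of switches 2-4 are inferred.
RCM_SWITCHES = {
    1: ("EN RCM", "ON", "RCM enabled", "RCM disabled"),
    2: ("MODEM OFF", "OFF", "modem disabled", "modem enabled"),
    3: ("RPD DIS", "OFF", "remote poweroff disabled", "remote poweroff enabled"),
    4: ("SET DEF", "OFF", "reset to defaults enabled", "reset to defaults disabled"),
}


def describe_switchpack(positions=None):
    """Return {switch name: effect} for the given switch positions.

    positions maps a switch number to "ON" or "OFF"; any switch left out
    is taken to be in its factory position.
    """
    positions = positions or {}
    effects = {}
    for number, (name, factory, when_on, when_off) in RCM_SWITCHES.items():
        position = positions.get(number, factory).upper()
        if position not in ("ON", "OFF"):
            raise ValueError(f"switch {number}: expected ON or OFF, got {position!r}")
        effects[name] = when_on if position == "ON" else when_off
    return effects
```

Calling `describe_switchpack()` with no arguments reproduces the factory settings; passing `{2: "ON"}` shows the case in which the enable command cannot bring the modem back.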
191. tely. The bridge does not respond to devices communicating with each other on the same PCI bus. However, should a device on one PCI bus address a device on the other PCI bus, commands, addresses, and data run through the bridge out onto the system bus and back through the bridge to the other PCI bus.

In addition to its bridge function, the system bus to PCI bus bridge module monitors every transaction on the system bus for errors. It monitors the data lines for ECC errors and the command/address lines for parity errors.

System Overview 1-21

1.8.3 PCI I/O Subsystem

The I/O subsystem consists of two 64-bit PCI buses. One has an embedded EISA/ISA bridge and three PCI option slots; the other has a built-in CD-ROM driver and three PCI option slots.

Figure 1-11 PCI Block Diagram

System Overview 1-22

Table
192. tely C-5

4. To terminate the modem connection, enter the RCM hangup command.

   RCM> hangup

If the modem connection is terminated without using the hangup command, or if the line is dropped due to phone line problems, the RCM will detect carrier loss and initiate an internal hangup command. If the modem link is idle for more than 20 minutes, the RCM initiates an auto hangup.

NOTE: Auto hangup can take a minute or more, and the local terminal is locked out until the auto hangup is completed.

C.2.3 Using RCM Locally

Use the default escape sequence to invoke the RCM mode locally for the first time. You can invoke RCM from the SRM console, the operating system, or an application. The RCM quit command reconnects the terminal to the system console port.

1. To invoke the RCM locally, type the RCM escape sequence. See (1) in Example C-2 for the default sequence. The escape sequence is not echoed on the terminal or sent to the system. At the RCM> prompt, you can enter RCM commands.
2. To exit RCM and reconnect to the system console port, enter the quit command (see (2)). Press Return to get a prompt from the operating system or system console.

Example C-2 Invoking and Leaving RCM Locally

P00>>> rcm        (1)
RCM>
RCM> quit         (2)
Focus returned to COM port

Managing the System Remotely C-6

C.3 RCM Commands

The RCM commands given in Table C-1 are used to control and monitor a system remotely.

Table C-1 RCM Command S
193. tepping DISABLED SERR Sys Err Driver Capability Enabled Fast Back to Back to Many Target DISABLED x0200 Device is 33 Mhz Capable 7 No Support for User Defineable Features Fast Back to Back to Different Targets Is Not Supported in Target Device Device Select Timing Medium x02 x010000 Mass Storage SCSI Bus Controller x00 xFF x00 Single Function Device x00 x00101200 x0412A100 x00000000 x00000000 x00000000 x00000000 x00000000 x04 x01 x00 x00 x000000FBC0001000 Slot or Device Number 2 x10201077 QLogic ISP_1020 Vendor ID x102B QLogic Device ID x00001020 Command Register x0107 I O Space Accesses Response Enabled Memory Space Accesses Response Enabled PCI Bus Master Capability Enabled Monitor for Special Cycle Ops DISABLED Generate Mem Wrt Invalidate Cmds DISABLED Parity Error Detection Response IGNORE Wait Cycle Address Data Stepping DISABLED SERR Sys Err Driver Capability Enabled Fast Back to Back to Many Target DISABLED Eror Logs 433 Status Register Revision ID Device Class Code Cache Line S Latency T Header Type Bist Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Base Address Register Expansion Rom Base Addres Interrupt Pl Interrupt P2 Min Gnt Max Lat NOBWNHE CONFIG Address Device and Vendor ID Command Register Status Register Revision ID Device Class Code Cache Line S Latency T Header
194. ter MDPA Error Syndrome Reg DPB Status Register DPB Error Syndrome Reg PALcode Revision x00000000 x00000000 MDPA Chip Revision x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 x00000000 MDPB Chip Revision x00000000 x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 IOD 1 Register Subpacket Device ID x0000003B Bcache Size 2MB VCTY ASIC Rev 0 Module Revision 0 x000000BB x000000FBE0000000 x06008021 CAP Chip Revision x00000001 Host to PCI Revision x00000003 I O Backplane Revision x00000003 PCI EISA Bus Bridge Present on PCI Device Class Host bus to PCI Bridg x46480FF1 Module Self Test Passed LED On Delayed PCI Bus Reads Protocol Enabled Bridge to PCI Transactions Enabled Bridge REQUESTS 64 Bit Data Transactions Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check Enabled MC Bus CMD Addr Parity Check Enabled MC Bus NXM Check Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold 8 RD_TYPE Memory Prefetch Algorithm Short RL_TYPE Mem Rd Line Prefetch Type Medium RM_TYPE Mem Rd Multiple Cmd Type Long ARB_MODE PCI Arbitration Round Robin x00000000 x00000000 x00000003 MC PCI Intr Enabled Device intr info enabled if en_int 1 x0000
195. ter. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.

Register layout, bits 31 to 00: VALID bit, reserved (0), Dirty bit, reserved (111), DEVICE_ID, MC Command <5:0>, Failing Address ADDR<39:32>

Error Registers 5-9

Table 5-4 MC Error Information Register 1

Fields are type RO. The initial state is given as 0.

Name          Bits      Description
VALID         <31>      Logical OR of bits <30:23> in the CAP_ERR Register. Set if MC_ERR0 and MC_ERR1 contain a valid address.
Reserved      <30:21>
Dirty         <20>      Set if the system bus error was associated with a Read/Dirty transaction. When set, the device ID field <19:14> does not indicate the source of the data.
Reserved      <19:17>   All ones.
DEVICE_ID     <16:14>   Slot number of bus master at the time of the error.
MC_CMD<5:0>   <13:8>    Active command at the time the error was detected.
ADDR<39:32>   <7:0>     Address bits <39:32> of the transaction on the system bus when an error is detected.

Error Registers 5-10

5.5 CAP Error Register (CAP_ERR) Offset 880

CAP_ERR is used to log information pertaining to an error detected by the CAP or MDP ASIC. If the error is a hard error, the register is locked. All bits except the LOST_MC_ER
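The MC_ERR1 field positions in Table 5-4 can be applied with a few shifts and masks. The Python sketch below is an illustration only: `McErr1` and `decode_mc_err1` are hypothetical names, and the bit ranges are taken from the Bits column of the table.

```python
from typing import NamedTuple


class McErr1(NamedTuple):
    valid: bool       # bit <31>
    dirty: bool       # bit <20>
    device_id: int    # bits <16:14>
    mc_cmd: int       # bits <13:8>
    addr_39_32: int   # bits <7:0>


def decode_mc_err1(value: int) -> McErr1:
    """Split a 32-bit MC_ERR1 value into the fields listed in Table 5-4."""
    return McErr1(
        valid=bool((value >> 31) & 0x1),
        dirty=bool((value >> 20) & 0x1),
        device_id=(value >> 14) & 0x7,
        mc_cmd=(value >> 8) & 0x3F,
        addr_39_32=value & 0xFF,
    )
```

For the value x800E9600 shown for MC Error Info Register 1 in the Chapter 4 system correctable error example, this gives VALID set, Dirty clear, device ID 2, command code 0x16, and address bits <39:32> of 00, which agrees with the "Device ID 2" and "MC bus trans addr <39:32> x00000000" annotations in that log. Bits <19:17> of the same value are all ones, as the table says.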
196. type register Number of CPUs mpnum CPU logging event mperr Event validity Event severity Entry type CPU Minor class Software Flags Active CPUs Hardware Rev System Serial Number Module Serial Number Module Type System Revision MCHK 660 Regs Flags PCI Mask Machine Check Reason PAL SHADOW REG 0 PAL SHADOW REG 7 PALTEMP 0 PALTEMP 23 Exception Address Reg Exception Summary Reg Exception Mask Reg PAL BASE Interrupt Summary Reg IBOX Ctrl and Status Reg Icache Par Err Stat Reg Dcache Par Err Stat Reg Virtual Address Reg Memory Mgmt Flt Sts Reg 2 DIGITAL UNIX 2 Alpha 6 04 APR 1996 17 20 04 whip16 x00000016 AlphaServer 4000 1200 Series x00000002 x00000000 1 1 O S claims event is valid 1 Severe Priority 100 CPU Machine Check Errors 2 660 Entry x0000000300000000 IOD 1 Register Subpkt Pres IOD 2 Register Subpkt Pres x00000003 x00000000 C1563 x0000 x00000000 x00000000 x0000 x0202 x00000000 x00000000 x0000000007 x00000000047FDA58 xFFFFFC000038D784 Native mode instruction Exception PC x3FFFFF00000E35E1 x00000000 x00000000 x00000000020000 Base addr for palcode x0000000008 x00000000200000 EXT HW interrupt at IPL21 AST requests 3 0 x00000000 x000000C160000000 Timeout Bit Not Set PAL Shadow Registers Enabled Correctable Err Intrpts Enabled ICACHE BIST Successful TEST_STATUS_H Pin Asserted x00000000 x00000000 xFFFFFFFFFF800130 x00000000014990
197. uides that prevent access to the Phillips head screws that hold fan 1 in place.
5. Trace the wire from the fan to the motherboard to determine which power cord to unplug. Unplug the power cord to fan 1 and pass it through the sheet metal to the fan compartment.
6. Unscrew the fan from the frame and remove it from the system.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the fan you installed is faulty, the system will not power up.

6.13 Cover Interlock Removal and Replacement

Figure 6-12 Removing Cover Interlock [callout: Interlock Switch; PKW0519A-97]

Removal
1. Shut down the operating system and power down the system.
2. Expose the card cage side of the system (see Section 6.3).
3. Loosen the screw that holds the CD-ROM bracket to the system in Figure 6-12.
4. Detach both the power and the signal connectors at the rear of the CD-ROM.
5. Pull the CD-ROM and the bracket a short distance toward the rear of the system and lift them out of the cabinet.
6. Unplug the interlock switch's pigtail cable from the cable it is connected to.
7. Remove the two screws holding the interlock in place and remove the interlock.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the switch is faulty, the system will not power up.

Remov
198. ummary

Command     Function
alert_clr   Clears alert flag, stopping dial-out alert cycle.
alert_dis   Disables the dial-out alert function.
alert_ena   Enables the dial-out alert function.
disable     Disables remote access to the modem port.
enable      Enables remote access to the modem port.
halt        Halts the server. Emulates pressing the Halt button and immediately releasing it.
haltin      Causes a halt assertion. Emulates pressing the Halt button and holding it in.
haltout     Terminates a halt assertion created with haltin. Emulates releasing the Halt button after holding it in.
hangup      Terminates the modem connection.
help or ?   Displays the list of commands.
poweroff    Turns off power. Emulates pressing the On/Off button to the off position.
poweron     Turns on power. Emulates pressing the On/Off button to the on position.
quit        Exits console mode and returns to system console port.
reset       Resets the server. Emulates pressing the Reset button.
setesc      Changes the escape sequence for invoking command mode.
setpass     Changes the modem access password.
status      Displays system status and sensors.

Command Conventions

• The commands are not case sensitive.
• A command must be entered in full.
• You can delete an incorrect command with the Backspace key before you press Enter.
• If you type a valid RCM command followed by extra characters and press Enter, the RCM accepts the correct com
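The first two conventions (commands are not case sensitive and must be entered in full) can be sketched as a lookup against the command names in the summary table. This is a hypothetical helper for illustration only, not RCM firmware behavior; the name `match_rcm_command` is an assumption, and the sketch does not model the rule about trailing extra characters.

```python
# Command names taken from the RCM command summary table.
RCM_COMMANDS = {
    "alert_clr", "alert_dis", "alert_ena", "disable", "enable",
    "halt", "haltin", "haltout", "hangup", "help",
    "poweroff", "poweron", "quit", "reset", "setesc", "setpass", "status",
}


def match_rcm_command(text: str):
    """Return the canonical command name, or None if the input is not a
    complete command. Matching ignores case; abbreviations are rejected."""
    word = text.strip().lower()
    return word if word in RCM_COMMANDS else None


if __name__ == "__main__":
    for typed in ("HALTIN", "halt", "hal"):
        print(repr(typed), "->", match_rcm_command(typed))
```

Using an exact set lookup rather than prefix matching keeps `halt`, `haltin`, and `haltout` distinct, which matters because they have different effects on the server.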
199. up or the SRM console is running. The system halts at the SRM console, and the halt status is saved. The next time the system powers up, the saved halt status is checked.

NOTE: Wait 5 seconds after the system begins powering up before pressing the Halt button or remotely entering the RCM halt command.

Halt Assertion with RCM Haltin Command

Enter the RCM haltin command at any time except during power-up. For example, enter haltin during an operating system session or when the AlphaBIOS console is running.

If you enter the RCM haltin command during a DIGITAL UNIX or OpenVMS session, the system halts back to the SRM console, and the halt status is saved. The next time the system powers up, the saved halt status is checked.

If you enter the RCM haltin command when Windows NT or AlphaBIOS is running, the interrupt is ignored. However, you can enter the RCM haltin command followed by the RCM reset command to force a halt assertion. Upon reset, the system powers up to the SRM console, but the SRM console does not load the AlphaBIOS console.

Clearing a Halt Assertion

Clear a halt assertion as follows:

• If the halt assertion was caused by pressing the Halt button or remotely entering the RCM halt command, the console uses the halt assertion once, then clears it.
• If the halt assertion was caused by entering the RCM haltin command, enter the RCM haltout command or cycle power on the local
200. us trans addr <31:4> x04A26DBF MC bus trans addr <39:32> x00000000 MC Command x00000016 6 Device Id x0000003B MC error info valid Uncorrectable ECC err det by MDPA MC error info latched e MDPA Chip Revision x00000000 MDPA Error Syndrome of uncorrectable read error Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 MDPB Chip Revision x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 Palcode Rev 1 21 3

4.3.5 MCHK 660 IOD Detected Failure PCI Error

The error log in Example 4-5 shows the following:

• CPU 0 logged the error in a system with two CPUs.
• The MCHK 660 register gives the reason for the machine check as an IOD detected hard error or a Dtag Parity Error if cached CPU.
• The External Interface Status Register records that the error occurred during a D-ref Fill but does not indicate what the error is.
• The CAP Error Register for IOD0 did not see an error.
• The CAP Error Register for IOD1, however, records a serious PCI error.
• The MC Error Info Registers 0 and 1 are not valid, since the valid bit <31> is not set. Exactly what was happening at the time of the error is not known.
• There is a PCI Subpacket from PCI1 with four nodes on it. Two devices on the PCI bus did not see an error; however, two did the Mylex
201. vironment Variables B-8

Table B-3 Environment Variable Summary (Continued)

Environment Variable   Function
memory_test            Specifies the extent to which memory will be tested. For DIGITAL UNIX systems only.
ocp_text               Overrides the default OCP display text with specified text.
os_type                Specifies the operating system and sets the appropriate console interface.
pci_parity             Disables or enables parity checking on the PCI bus.
pk*0_fast              Enables fast SCSI mode.
pk*0_host_id           Specifies the default value for a controller host bus node ID.
pk*0_soft_term         Enables or disables SCSI terminators on systems that use the QLogic ISP1020 SCSI controller.
sys_model_num          Displays the system model number and computes certain information passed to the operating system. Must be restored after a PCI motherboard is replaced.
sys_serial_num         Restores the system serial number. Must be set if the system motherboard is replaced.
sys_type               Displays the system type and computes certain information passed to the operating system. Must be restored after a PCI motherboard is replaced.
tga_sync_green         Specifies the location of the SYNC signal generated by the DIGITAL ZLXp-E PCI graphics accelerator option.
tt_allow_login         Enables or disables login to the SRM console firmware on other console ports.

B.5 Recording Environment Variables

This worksheet lists all environment v
202. ystem Enclosure
Operator Control Panel and Drives
System Consoles
System Architecture
CPU Types
Memory
Memory Addressing
System Motherboard
System Bus Backplane
System Bus to PCI Bus Bridge
PCI I/O Subsystem
Remote Control Logic
Power Control Logic
Power Circuit and Cover Interlock
Power Supply
Power Up/Down Sequence
Maintenance Bus (I2C Bus)
StorageWorks Drives

1.1 System Enclosure

The system has up to two CPU modules and up to 2 Gbytes of memory. A single fast wide or fast wide Ultra SCSI StorageWorks shelf provides storage.

Figure 1-1 System Enclosure [PKW0500-97]

The numbered callouts in Figure 1-1 refer to the system components:

• System card cage, which holds the system motherboard and the CPU, memory, and system I/O
• PCI/EISA section of the system card cage
• Operator control panel assembly, which includes the control panel, the LCD display, and the floppy drive
• CD-ROM drive
• Cooling section containing two fans
• StorageWorks shelf

Cover Interlock

The system has a single cover interlock switch tripped by the top cover.

Figure 1-2 Cover Interlock Circuit
