
SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic


Contents

1. [Figure residue: flowchart of system board status transitions between an original domain and a destination domain. Recoverable labels: reservation to delete the system board in the original domain; reboot of the original domain; process to change the domain configuration in the original domain (Assignment: assigned, Connectivity: disconnected, Configuration: unconfigured, then Assignment: unavailable after unassignment from the domain); registration for the destination domain (DCL registration status); request to add the system board to the destination domain; process to change the configuration of the destination domain (Assignment: assigned, Connectivity: disconnected, Configuration: unconfigured).]

SPARC Enterprise Mx000 Servers Dynamic Reconfiguration User's Guide, December 2010

2.4.3.4 Flowchart: Replacing System Board

The flow of DR operations and the transition of system board status when a system board has been replaced are described using the schematic flowchart. Each system board state indicated in FIGURE 2-8 is the main status that is changed. The sample status before and after replacement as shown in the
2. [Figure residue: XSB 00-0, XSB 01-0]

FIGURE 2-2 Example of Hardware Configuration with Quad-XSBs of Midrange Server [labels: CMU, IOU; XSB 00-0, XSB 00-1, XSB 00-2, XSB 00-3, XSB 01-0, XSB 01-1, XSB 01-2, XSB 01-3]

Chapter 2 What You Must Know Before Using DR

FIGURE 2-3 Example of a Hardware Configuration with Uni-XSBs of High-end Server [labels: CMU, IOU, four I/O devices; XSB 00-0]

FIGURE 2-4 Example of a Hardware Configuration with Quad-XSBs of High-end Server [labels: CMU, IOU; XSB 00-0, XSB 00-1, XSB 00-2, XSB 00-3, each with one I/O device]

2.1.1.1 CPU

Using DR to change a CPU configuration is easier than using it to change the configuration of memory or an I/O device. An added CPU is automatically recognized by the Oracle Solaris OS and becomes available for use.

A CPU to be deleted must meet the following conditions:
- No running process is bound to the CPU to be deleted. If a running p
3. [Figure residue: flowchart for moving a system board. Recoverable steps: confirm the move source and move destination domains and select an operation (DR operation possible, or domain configuration to be changed); check the status of the system board to be moved; check the device status; then either perform the move operation for the system board, or perform the reserve operation for moving a system board followed by power-on or restart of the move source domain. The result is either addition of the system board in the move destination domain or a status of reserved addition in the move destination domain.]

Chapter 4 Practical Examples of DR

4.1.4 Flow: Replacing a System Board

FIGURE 4-4 Flow: Replacing a System Board [Recoverable steps: check the operation and select a DR operation (operation status and configuration of the domain and system board, adjustment between other domains, configuration of the system board to be replaced); check the device status; perform the DR deletion operation, or the deletion reservation operation, for the system board in its domain; stop status of the domain.] here is a dom
4. Note - For diagnosis and management of a system board, memory must be mounted on the system board even if the omit-memory option is enabled. Enabling the omit-memory option reduces the available memory in the domain and may lower system performance, so consider the effect on jobs before using it.

The value of this option is true (omit memory) or false (do not omit memory). The default value is false.

Note - Enable the omit-memory option when the system board is in the system board pool or when the system board is not connected to the domain configuration.

2.2.2.4 Omit-I/O Option

The omit-I/O option disables the PCI cards, disk drives, and basic local area network (LAN) ports on a system board to prevent the target domain from using them.

Set this option to true if the domain needs to use only the system board's CPU and memory. Set this option to false if the domain needs to use the system board's PCI cards and I/O units. In this case, you must fully understand the restrictions on the use of these I/O components, and you must stop the software (e.g., application programs or daemons) that uses them before you attempt to delete or move the system board.

The value of this option is true (omit I/O units) or false (do not omit I/O units). The default value is false.

Note - Enable the omit-I/O option when the syst
5. Specifies the XSB number of the system board to be deleted. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The valid range depends on your server. To specify multiple system boards, list several XSB numbers separated by spaces.

Note 1 - The time required for system board deletion depends on the amount of hardware resources mounted on the target system board, so the command may take a long time to finish. If the system board contains kernel memory, the OS is suspended for a while.

Note 2 - If the DR processing executed by the deleteboard(8) command fails, the target system board cannot be restored to its previous status. Identify the cause of the failure from the error message output by the deleteboard(8) command and from the Oracle Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.

Note 3 - When a system board is forcibly deleted from a domain by the deleteboard(8) command with the -f option specified, a serious problem may occur in a process that is bound to the CPU or in accessing an I/O device. For this reason, avoid using the -f option for normal DR operations. When using the deleteboard(8) command with the -f opt
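The XSB operand format described above lends itself to a quick pre-check before a command is issued. The following is a minimal Python sketch, not part of XSCF: it assumes the hyphenated form XX-Y (the extraction dropped the separator), and the function names parse_xsb and parse_xsb_list are hypothetical. It checks only the overall range 00 to 15 and 0 to 3, not the narrower range of a particular server model.

```python
import re

# Assumed operand shape: two digits, a hyphen, one digit (for example "00-1").
_XSB_PATTERN = re.compile(r"(\d{2})-(\d)")


def parse_xsb(text):
    """Return (psb, index) for a well-formed XSB number, else raise ValueError."""
    match = _XSB_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not in XX-Y format: {text!r}")
    psb, index = int(match.group(1)), int(match.group(2))
    # XX ranges from 00 to 15 and Y from 0 to 3; the usable range depends on the server.
    if psb > 15 or index > 3:
        raise ValueError(f"out of range: {text!r}")
    return psb, index


def parse_xsb_list(operands):
    """Parse several XSB numbers delimited by spaces."""
    return [parse_xsb(item) for item in operands.split()]
```

For example, parse_xsb_list("00-0 01-3") yields two (psb, index) pairs, while "16-0" or "00-4" is rejected before any DR command would be run.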
6. Explanation: Invalid argument is passed to the driver, or there may be an inconsistency in the system.
Remedy: Repeat the action. If this error message appears again, please contact customer service.

dr_post_attach_cpu cpu_get failed for cpu X
Explanation: There may be an inconsistency in the system.
Remedy: Please contact customer service.

dr_pre_release_cpu thread(s) bound to cpu X
Explanation: A thread in the process is bound to the detached CPU X.
Remedy: Use the pbind(1M) command to check whether a process is bound to the CPU. If one exists, unbind it from the CPU and repeat the action.

dr_pre_release_mem unexpected kphysm_del_release return value
Explanation: There may be an inconsistency in the system.
Remedy: Please contact customer service.

dr_pt_ioctl invalid passthru args
Explanation: Invalid argument is passed to the driver, or there may be an inconsistency in the system.
Remedy: Repeat the action. If this error message appears again, please contact customer service.

dr_release_mem unexpected kphysm error code id 0xX
Explanation: There may be an inconsistency in the system.
Remedy: Please contact customer service.

dr_release_mem_done mem unit X Y deleted memory still found in phys_install
Explanation: There may be an inconsistency in the system.
Remedy: Please contact customer service.

dr_release_mem_done target
7. on page 1-5
- Section 1.3, Security, on page 1-7
- Section 1.4, Overview of DR User Interfaces, on page 1-7

1.1 DR

Dynamic Reconfiguration (referred to as DR in this document) enables hardware resources such as processors, memory, and I/O to be added and deleted even while the Oracle Solaris Operating System (referred to as Oracle Solaris OS in this document) is running.

DR has three basic functions (addition, deletion, and move), which can be used for the following purposes:
- Add system boards without stopping the Oracle Solaris OS of the domain, to improve business operations or handle higher system loads.
- Temporarily remove a faulty system board for parts replacement without stopping the Oracle Solaris OS of the domain, in the event of an error that causes the system board to become degraded.
- Move a resource from one domain to another while continuously operating the domains, without physically removing or inserting a system board. Resources can be moved to balance the loads on multiple domains or to share common I/O resources between domains.

SPARC Enterprise M4000/M5000/M8000/M9000 servers have a unique partitioning feature that can divide one physical system board (PSB) into one logical board (undivided status) or four logical boards. A PSB that is logically divided into one board (undivided status) is called a Uni-XSB, whereas a PSB that is logically divided into four boards is called a Quad X
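The Uni-XSB and Quad-XSB division described above can be illustrated by listing the XSB numbers that one PSB yields in each mode. This is a small Python sketch for illustration only: the function name xsbs_for_psb and the mode strings "uni" and "quad" are hypothetical, and the hyphenated XX-Y numbering is assumed from the figures elsewhere in the guide.

```python
def xsbs_for_psb(psb, mode):
    """List the XSB numbers produced by one PSB in the given division mode."""
    if not 0 <= psb <= 15:
        raise ValueError("PSB number must be between 00 and 15")
    if mode == "uni":
        count = 1  # undivided: one logical board
    elif mode == "quad":
        count = 4  # divided into four logical boards
    else:
        raise ValueError("mode must be 'uni' or 'quad'")
    return [f"{psb:02d}-{index}" for index in range(count)]
```

Under these assumptions, PSB 00 in Uni-XSB mode corresponds to the single board 00-0, and PSB 01 in Quad-XSB mode corresponds to 01-0 through 01-3, matching the hardware configuration figures.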
8. command, see Section 3.1.7, Deleting a System Board, on page 3-17. For details of the conditions and actions for executing the addboard(8) command, see Section 3.1.6, Adding a System Board, on page 3-15.

Note 1 - Before replacing a system board, you must know the division type of the replacement target PSB and the configurations and operation status of all domains to which the XSBs on the PSB belong. If the division type of the replacement target PSB is Quad-XSB and the XSBs on it belong to multiple domains, consult the administrators of all relevant domains in advance to coordinate how the system board will be replaced. If the division type of the replacement target PSB is Uni-XSB, its replacement does not affect any other domains. However, prior coordination may still be required when the replacement target system board is used as a floating board for multiple domains or when the hardware replacement work may affect other domains.

Note 2 - If the DR processing executed by the deleteboard(8) or addboard(8) command fails, the target system board cannot be restored to its previous status. Identify the cause of the failure from the error messages output by the commands and from the Oracle Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.

Note 3 - If a system board is forcibly deleted from a domain by t
9. mow n to stop it

DR operation canceled by operator
Explanation: The DR operation was canceled by the operator.

Domain DomainID X is not currently running
Explanation: The destination domain was not active when -c configure was specified.
Remedy: Execute the command again, specifying -c assign.

XSB#XX-X is already assigned to another domain
Explanation: The specified system board XSB#XX-X has already been assigned to another domain.
Remedy: Confirm the XSB by using showboards(8).

XSB#XX-X is not installed
Explanation: System board XSB#XX-X is not installed.
Remedy: The wrong XSB was specified. Confirm the XSB by using showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later
Explanation: Another operation is already being executed on the specified system board XSB#XX-X.
Remedy: DR or power-off is being executed in another session. Wait for a while, confirm the XSB status, and try again.

XSB#XX-X has not been registered in DCL
Explanation: System board XSB#XX-X is not registered in the DCL.
Remedy: Register the DCL information by using setdcl(8).

Another DR operation is in progress. Try again later
Explanation: An operation on the specified system board XSB#XX-X is already being executed by another session.
Remedy: DR operation is in progress by another session
10. 00 Running

3. Check the status of the move destination domain.
Execute the showdcl(8) command to display domain information, and then check the operation status of the move destination domain. Based on the operation status of the move source and move destination domains, determine whether to perform the DR operation or change the domain configuration.

    XSCF> showdcl -d 1
    DID   LSB   XSB   Status
    01                Running
          00    01-0
          01    00-1

Chapter 4 Practical Examples of DR

4. Check the status of the system board to be moved.
Execute the showboards(8) command to display system board information, and then check the status of the system board to be moved.

    XSCF> showboards 00-1
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-1 00(01)   Assigned    y    y    y    Passed  Normal

5. Move the system board.
Execute the moveboard(8) command to delete the system board from the move source domain and add it to the move destination domain.

    XSCF> moveboard -c configure -d 1 00-1

6. Check the status of the move source domain.
When the moveboard(8) command ends normally, execute the showdcl(8) command to display and check the operation status of the move source domain. If the moveboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, and then correct it.

    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    00-1

7. Check
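When the board status shown in step 4 needs to be checked from a script, one data row of showboards(8) output can be split into named fields. The sketch below is a hypothetical helper, not an XSCF interface. It assumes the row layout "XSB DID(LSB) Assignment Pwr Conn Conf Test Fault" with whitespace-separated columns, and it assumes that a board in the system board pool shows SP instead of DID(LSB), as in the reserved-addition example elsewhere in the guide.

```python
from typing import NamedTuple, Optional


class BoardStatus(NamedTuple):
    xsb: str
    domain: Optional[str]  # None when the board is in the system board pool (SP)
    lsb: Optional[str]
    assignment: str
    powered: bool
    connected: bool
    configured: bool
    test: str
    fault: str


def parse_showboards_row(row):
    """Split one data row of the assumed showboards layout into fields."""
    fields = row.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {row!r}")
    xsb, did_lsb, assignment, pwr, conn, conf, test, fault = fields
    if did_lsb.endswith(")") and "(" in did_lsb:
        domain, lsb = did_lsb[:-1].split("(", 1)
    else:
        domain = lsb = None  # for example "SP": not assigned to a domain
    return BoardStatus(xsb, domain, lsb, assignment,
                       pwr == "y", conn == "y", conf == "y", test, fault)
```

A caller could, for instance, refuse to issue a move unless the parsed row reports test == "Passed" and fault == "Normal"; that policy is an illustration, not a rule stated in the guide.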
11. Appendix A Message Meaning and Handling

opl_fc_ops_free_handle DMA seen
Explanation: A DMA resource was found in the resource list that is being freed while the board is unprobed.
Remedy: Please contact customer service.

opl_fc_ops_free unknown resource type <type>
Explanation: An unknown resource type was found in the resource list that is being freed while the board is unprobed.
Remedy: Please contact customer service.

VM viability test failed dr@0:SBX::memory
Explanation: There is not enough real memory to detach the memory on system board X.
Remedy: Check the amount of available real memory and repeat the action. If this error message appears again, please contact customer service.

DR parallel copy timeout
Explanation: An internal error occurred during kernel migration.
Remedy: Retry. If the problem persists, contact customer service.

SCF busy
Explanation: SCF was busy during kernel migration.
Remedy: Retry. If the problem persists, contact customer service.

SCF I/O Retry Error
Explanation: An internal error occurred during kernel migration.
Remedy: Please contact customer service.

FMEM command timeout
Explanation: An internal error occurred during kernel migration.
Remedy: Please contact customer service.

Hardware error
Explanation: An internal error occurred during kernel migration.
Remedy: Please contact customer service.

FMEM operation terminated
Explanation: In
12. The following shows examples of displays by the showdcl(8) command.

- Example 1: Display of information on domain 0

    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          04    01-0
          05    01-1
          06    01-2
          07    01-3
          08    02-0

- Example 2: Display of detailed information on domain 0

    XSCF> showdcl -v -d 0
    [Output garbled in extraction. The columns are DID, LSB, XSB, Status, No-Mem, No-IO, Float, and Cfg-policy. The listing covers domain 00 (Status: Running) with LSBs 00 through 15, True or False values for the No-Mem, No-IO, and Float columns, and a Cfg-policy of FRU.]

Displaying Domain Status

The showdomainstatus(8) command lists the domains in the system and their status. This command displays the same domain status information as the showdcl(8) command. Use the showdomainstatus(8) command to check domain status before and after a DR operation.

The following examples show the format and options of the showdomainstatus(8) command.

    showdomainstatus -a
    showdomainstatus -d domain_id
    showdomainstatus -h

Chapter 3 DR User Interface

TABLE 3-5 Options of the showdomainstatus Command

    Option         Description
    -a             Displays the status of all domains.
    -d domain_id   Displays information about the specified domain, where domain_id is the domain number (possibly 0 to 23, depending on your server). Only one domain ID can be s
13. [Rows carried over from the preceding test-status table; some status names were lost in extraction:]
- The system board is not mounted or cannot be recognized, perhaps because it is faulty.
- The system board is not being diagnosed.
- Testing
- Passed
- A system board error was detected, and the board has been deconfigured.

    Status         Description
    Unavailable    The system board is in the system board pool (not assigned to a domain), and its status is one of the following: not yet diagnosed, under diagnosis, or diagnosis error. All system boards that are not mounted are also shown as Unavailable.
    Available      The system board is in the system board pool, and its diagnosis has completed normally.
    Assigned       The system board is reserved or assigned to the domain.
    Disconnected   The system board is disconnected from the domain configuration and is in the system board pool.
    Connected      The system board is connected to the domain configuration.
    Unconfigured   The hardware resources of the system board have been deleted from the Oracle Solaris OS.
    Configured     The hardware resources of the system board have been added into the Oracle Solaris OS.

XSCF changes and configures system board status according to the conditions under which a system board is installed, removed, or registered in the DCL, or when a domain is started or stopped. System board status also changes when the system board is added, deleted, or moved by DR. To perform a DR operation for a system board, you must determine the method of DR operation according to the status of
14. command.

    showfru -a device
    showfru device location
    showfru -h

Chapter 3 DR User Interface

TABLE 3-11 Options of the showfru Command

    Option     Description
    -a         Specifies that the command display all configuration information on devices of the type specified by devtype.
    -h         Displays usage information.
    device     Specifies a device type (device name). Specify sb for DR.
    location   Specifies a physical system board (PSB) number. Specify a decimal number from 00 to 15 for PSB. To display information about multiple system boards, several PSB numbers can be specified by delimiting each with a space. The range of PSB numbers that can be specified varies depending on your server.

The table below lists the items displayed by the showfru(8) command.

TABLE 3-12 Items of System Board Configuration Information to be Displayed

    Display item         Description
    Device               Device type. sb is the corresponding device for DR.
    Location             Mounting location of a device. Displays a physical system board (PSB) number.
    XSB Mode             XSB division type. Uni: Uni-XSB (no-division mode). Quad: Quad-XSB (four-division mode).
    Memory Mirror Mode   Memory mirror mode. yes: memory mirror mode is enabled. no: memory mirror mode is disabled.

The following example shows a display of the showfru(8) command.

- Example: Display of configuration information on all system boards

    Device sb sb sb s
15. of Fujitsu Limited or of the affiliates of either entity. This document and the products and technologies it describes may include intellectual property rights of third parties that are protected by copyright and/or licensed by suppliers to Oracle and/or its affiliates and Fujitsu Limited, including software and font technologies. In accordance with the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available on request by the End User. Please contact Oracle and/or its affiliates or Fujitsu Limited. This distribution may include components developed by third parties. Parts of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited. All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing the SPARC trademark are based on architectures developed by
16. is made automatically to all output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.

    -n   Specifies that a response of "no" is made automatically to all output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.
    -f   Forcibly adds a system board that has not been diagnosed to a domain. Do not use this option for normal DR operations. A faulty system board, or a system board where a fault is detected, will not be forcibly added to the destination domain.
    -v   Displays the progress of this DR command. If this option is specified together with the -q option, the -v option is ignored.

Chapter 3 DR User Interface

TABLE 3-13 Options of the addboard Command (Continued)
[Options listed in this part of the table: -h, -c configure, -c assign, -c reserve, -d domain_id, xsb]

    Option         Description
    -h             Displays the usage information.
    -c configure   Specifies that the command add a system board to the domain. If no other -c option is specified, -c configure is the default.
    -c assign      Specifies that the command assign a system board to the domain. With this option specified, the command assigns the target system board to the domain. The assigned system board is added to the domain when the addboard(8) command with the -c configure option specified i
17. 2.5.2.2 Operation Management

This section describes the premises and the actions for DR operations.

I/O Device Management
Upon the addition of a system board, device information is reconfigured automatically. However, the addition of the system board and the reconfiguration of device information do not end at the same time. Sometimes a device link in the /dev directory is not automatically cleaned up by the devfsadmd(1M) daemon. You can clean up such a device link manually by using devfsadm(1M). See the devfsadm(1M) Oracle Solaris man page for details.

Swap Area
The size of available virtual memory is the sum of the size of the memory mounted in the system and the size of the swap area on the disk. You must ensure that the size of available memory is sufficient for all necessary operations.

Swap Area at System Board Addition
By default in Oracle Solaris, the swap area is also used to store a system crash dump. You should use a dedicated dump device instead. See the dumpadm(1M) Oracle Solaris man page. The default swap area used to store the crash dump varies in size according to the size of mounted memory. The dump device used to store the crash dump must be larger than the size of mounted memory. When a system board is added, thereby increasing the size of mounted memory, the dump device must be reconfigured as required. For details, see the dumpadm(1M) Oracle Solaris man page.

Swap Area at System Board Deletion
When you delete a
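The two sizing rules above reduce to simple arithmetic, sketched here in Python as a planning aid. The function names and the choice of gigabytes as the unit are hypothetical, and the sample figures in the usage note are illustrative, not taken from any server.

```python
def available_virtual_memory(mounted_gb, swap_gb):
    """Available virtual memory is mounted memory plus the swap area on disk."""
    return mounted_gb + swap_gb


def dump_device_needs_reconfiguration(dump_device_gb, mounted_gb_after_add):
    """The dump device must be larger than mounted memory.

    Returns True when the device is too small for the memory size that
    results from adding a system board.
    """
    return dump_device_gb <= mounted_gb_after_add
```

For instance, a domain with 64 GB of mounted memory and a 96 GB dump device satisfies the rule, but adding a board that brings mounted memory to 128 GB would make dump_device_needs_reconfiguration(96, 128) return True, signalling that the dump device should be resized (see dumpadm(1M)).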
18. [Figure residue: flowchart of system board addition processing. Recoverable status transitions: Assignment: assigned with Test: testing; diagnosis completed; domain configuration change process (Test: passed, Assignment: assigned, Connectivity: disconnected) for connection to the domain; request of addition into the OS and process of addition into the OS (Test: passed, Assignment: assigned, Connectivity: connected, Configuration: unconfigured, then Configuration: configured after incorporation into the OS).]

2.4.3.2 Flowchart: Deleting a System Board

The flow of DR operations and the transition of system board status when a system board has been deleted or reserved for deletion are described in the schematic flowchart below. Each system board status indicated in FIGURE 2-6 is the main status that is changed.

FIGURE 2-6 Flow of System Board Deletion Processing [Recoverable labels: reservation; Configuration: configured (status of addition into the OS) and Configuration: unconfigured (status of deletion from the OS); request of deletion; Test: passed; Assignment: assigned.]
19. If the addboard(8) command ends abnormally, identify the cause of the abnormality based on the messages output, and then take appropriate corrective action.

    XSCF> showboards -v 01-0
    XSB  R DID(LSB) Assignment  Pwr  Conn Conf Test    Fault   COD
    01-0   SP       Available   y    n    n    Passed  Normal  n

5. Stop or reboot the domain.
This operation executes the reserved deletion of the system board as a change in domain configuration.

Chapter 4 Practical Examples of DR

4.6.2 Example: Reserving a System Board Delete

FIGURE 4-11 Example: Reserving a System Board Delete [Before: Domain 0 with XSB 00-0 and XSB 01-0. After: Domain 0 with XSB 00-0; XSB 01-0 deleted.]

1. Log in to XSCF.

2. Check the status of the domain.
Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.

    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0

3. Check the status of the system board to be deleted.
Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted.

    XSCF> showboards 01-0
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test

4. Reserve the deletion of the system board.
Execute the deleteboard(8) command to res
20. Pwr Conn Conf Test Fault

    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 00(01)   Assigned    y    n    n    Passed  Normal
    01-1 00(02)   Assigned    y    n    n    Passed  Normal
    01-2 01(00)   Assigned    y    n    n    Passed  Normal
    01-3 01(01)   Assigned    y    n    n    Passed  Normal

7. Physically replace the system board.
Execute the replacefru(8) command, and then follow the displayed instructions to replace the system board per the Active Replacement procedure. For information about Active Replacement, see the Service Manual for your server.

    XSCF> replacefru

8. Check the status of the replaced system board.
Execute the showboards(8) command to display system board information, check the status of the system board to be added, and confirm its registration in the DCL. If you need to change the PSB configuration, use the setupfru(8) command. If the system board is not registered in the DCL, register it in the DCL for the target domain by using the setdcl(8) command.

    XSCF> showboards -a
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 00(01)   Assigned    y    n    n    Passed  Normal
    01-1 00(02)   Assigned    y    n    n    Passed  Normal
    01-2 01(00)   Assigned    y    n    n    Passed  Normal
    01-3 01(01)   Assigned    y    n    n    Passed  Normal

9. Check the status of all related domains.
Execute the showdcl(8) command to display domain information, and then check the op
21. SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes
        2.5.9.1 CPU Operational Modes

3. DR User Interface
    3.1 How To Use the DR User Interface
        3.1.1 Displaying Domain Information
        3.1.2 Displaying Domain Status
        3.1.3 Displaying System Board Information
        3.1.4 Displaying Device Information
        3.1.5 Displaying System Board Configuration Information
        3.1.6 Adding a System Board
        3.1.7 Deleting a System Board
        3.1.8 Moving a System Board
        3.1.9 Replacing a System Board
        3.1.10 Reserving a Domain Configuration Change
    3.2 Command Reference
    3.3 XSCF Web
    3.4 RCM Script

4. Practical Examples of DR
    4.1 Flow of DR Operation
        4.1.1 Flow: Adding a System Board
        4.1.2 Flow: Deleting a System Board
        4.1.3 Flow: Moving a System Board
        4.1.4 Flow: Replacing a System Board
    4.2 Example: Adding a System Board
    4.3 Example: Deleting a System Board
    4.4 Example: Moving a System Board
    4.5 Examples: Replacing a System Board
        4.5.1 Example: Replacing a Uni-XSB System Board
        4.5.2 Example: Replacing a Quad-XSB System Board
    4.6 Examples: Reserving Domain Configuration Changes
        4.6.1 Example: Reserving a System Board Add
        4.6.2 Example: Reserving a System Board Delete
        4.6.3 Example: Reserving a System Board Move

A. Message Meaning and Handling
22. You will be able to use DR to delete the bad SPARC64 VI boards so that you can remove them. However, you will not be able to use DR to add replacement or repaired SPARC64 VI boards until you change the domain from SPARC64 VII Enhanced Mode to SPARC64 VI Compatible Mode, which requires a reboot.

Setting cpumode to compatible in advance lets you avoid a possible failure of a later DR add operation and one or more reboots.

The SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide contains the above information as well as more detailed instructions.

CHAPTER 3 DR User Interface

This chapter describes the user interfaces for DR.
- Section 3.1, How To Use the DR User Interface, on page 3-1
- Section 3.2, Command Reference, on page 3-26
- Section 3.3, XSCF Web, on page 3-27
- Section 3.4, RCM Script, on page 3-27

3.1 How To Use the DR User Interface

XSCF provides two user interfaces for DR: the command-line interface of the XSCF shell and the browser-based user interface of XSCF Web. This section describes the main XSCF shell commands used for DR. For other related commands, see Section 3.2, Command Reference, on page 3-26. For XSCF Web, see Section 3.2, Command Reference, on page 3-26 and Section 3.3, XSCF Web, on page 3-27.

Note - If your server is configured with SPARC64 VII processors, some restrict
23. by operator

XSB#XX-X is not installed
Explanation: System board XSB#XX-X is not installed.
Remedy: The wrong XSB was specified. Confirm the XSB by using showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later
Explanation: Another operation is already being executed on the specified system board XSB#XX-X.
Remedy: DR or power-off is being executed in another session. Wait for a while, confirm the XSB status, and try again.

XSB#XX-X has not been registered to DCL
Explanation: System board XSB#XX-X is not registered in the DCL.
Remedy: Register the DCL information by using setdcl(8).

XSB#XX-X is the last LSB for DomainID X, and this domain is still running. Operation failed
Explanation: XSB#XX-X is the last LSB for domain X.
Remedy: Power off the domain by specifying -c reserve.

IP address of DSCP path is not specified
Explanation: DR cannot communicate with the domain because the DSCP IP address is not set up or not registered.
Remedy: Register the DSCP IP address.

An internal error has occurred. This may have been caused by a DR library error
Explanation: The DR processing failed on the domain OS. The error occurred in the DR library.
Remedy: Find the cause of the DR failure by referring to the monitoring message and the error log. Confirm the patch application status and the XCP version.

DR failed. Domain DomainID X cannot communicate via DSCP path
Explanation: D
24. bydevice byboard -d domain_id
    showdevices -h

Note - The showdevices(8) command only reports information about a running domain.

Note 2 - The showdevices(8) command will succeed only if the following Oracle Solaris Service Management Facility (SMF) services are active on that domain:
- Domain SP Communication Protocol (dscp)
- Domain Configuration Server (dcs)
- Oracle Sun Cryptographic Key Management Daemon (sckmd)

TABLE 3-9 Options of the showdevices Command

    Option        Description
    -v            Specifies that the command display information about all devices. Information about not only the management target devices but also other devices is displayed. However, the displayed information includes resource information only for the devices whose resources are managed, not for the devices whose resources are not managed.
    -p bydevice   Specifies that the command display information about the devices mounted on a system board (CPU, memory, and I/O devices), sorted by device. If neither -p bydevice nor -p byboard is specified, -p bydevice is the default.
    -p byboard    Specifies that the command display information about the devices mounted on system boards (CPU, memory, and I/O devices) by system board.
    -p query      Tests the detachability of the board by test-running the DR command without actually
25. configure or -c assign option specified is executed. For details about the moveboard(8) command, see Section 3.1.8, "Moving a System Board" on page 3-19.

3.2 Command Reference

This section lists the DR commands and other commands related to DR. For details of the commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual. For the DR commands, see Section 3.1, "How To Use the DR User Interface" on page 3-1.

Note 1: Use of each command is restricted to selected administrators. To use a command, you must have the appropriate administrator privileges. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

Note 2: This section does not list all commands related to DR. For other DR-related commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

TABLE 3-16 DR Display Commands
Command name       Function
showdcl            Displays the DCL and the domain status
showdomainstatus   Displays domain status
showboards         Displays system board information
showdevices        Displays information about the CPUs, memory, and I/O devices on system boards
showfru            Displays PSB configuration information

TABLE 3-17 DR Operation Commands
Command name       Function
setdcl             Updates and edits the DCL
setupfru           Sets the division type and memory mirror mode for PSB
addboa
26. confirm whether the system board of the PSB is assigned to the domain or not, and release the system board if it is in the assigned status.

SB XX is not installed.
    Explanation: The PSB could not be set because it is not installed.
    Remedy: Confirm that the hardware is installed, and then execute the command again.

Operation has completed. However, a configuration error was detected.
    Explanation: The PSB configuration was changed, but a configuration error occurred on the resulting system board. Confirm the CPU module and DIMM slot on the specified PSB and the status of memory mirror mode.
    Remedy: Confirm the CPU module and DIMM slot on the PSB and the status of memory mirror mode.

The specified parameter is not supported in this model.
    Explanation: A parameter that this server does not support was specified, so the command was canceled.
    Remedy: Confirm the specified parameter and your server, and execute the command again.

Invalid parameter.
    Explanation: There is an error in the specified argument or operand.
    Remedy: Confirm the specified argument or operand, and execute the command again.

Permission denied.
    Explanation: The user does not have the required privilege.
    Remedy: Confirm the user privilege and the command privilege. On high-end servers, also confirm whether the command was executed on the standby XSCF.

The current con
27. depends on your server if you can remove a kernel memory board or not.

Operator confirmation for quiesce is required: dr@0:SBX::memory
    Explanation: There is non-relocatable kernel memory on the board.
    Remedy: The target board with kernel memory cannot be disconnected by DR.
    Output: Console and Standard Output

Unexpected internal condition: drmach.c
    Explanation: There may be inconsistency in the system.
    Remedy: Please contact customer service.
    Output: Console and Standard Output

Unexpected internal condition: SBX
    Explanation: The attempt to call OBP failed.
    Remedy: Repeat the action. If this error message appears again, please contact customer service.
    Output: Console and Standard Output

Device busy: dr@0:SBX::cpuY
    Explanation: CPU Y on system board X is busy during the release operation.
    Remedy: Repeat the action. If this error message appears again, please contact customer service.
    Output: Console and Standard Output

Insufficient memory: dr@0:SBX::cpuY
    Explanation: A lack of memory resources was detected.
    Remedy: Check the size of available memory and detach the board. If the problem still exists, please contact customer service.
    Output: Console and Standard Output

Invalid argument: dr@0:SBX::cpuY
    Explanation: There may be inconsistency in the system.
    Remedy: Please contact customer service.
    Output: Console and Standa
28. movement of a system board as shown in the given configuration diagram.

4.6.1 Example: Reserving a System Board Add

FIGURE 4-10 Example: Reserve a System Board Add [figure: Domain 0 with XSB#00-0 and XSB#01-0, before and after]

1. Log in to XSCF.

2. Check the status of the system board to be added.
   Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and confirm its registration in the DCL. If you need to change the PSB configuration, use the setupfru(8) command. If the system board is not registered in the DCL, register the system board in the DCL for the target domain by using the setdcl(8) command.

    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
    XSCF> showboards 01-0
    XSB  DID(LSB) Assignment Pwr Conn Conf Test   Fault
    01-0 SP       Available  y   n    n    Passed Normal

3. Reserve the addition of the system board.
   Execute the addboard(8) command to reserve the addition of the system board.

    XSCF> addboard -c reserve -d 0 01-0

4. Check the status of the system board.
   When the addboard(8) command ends normally, execute the showboards(8) command to display system board information, and then check the status of the target system board and confirm that the addition of the target system board has been reserved
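When the status check in step 2 or step 4 is scripted, a showboards row can be split into named fields. The sketch below is an illustration only and assumes the eight-column layout of the example (XSB, DID(LSB), Assignment, Pwr, Conn, Conf, Test, Fault), that every value is a single whitespace-free token, and that XSB numbers are written as 01-0. The type and function names are hypothetical.

```python
from typing import NamedTuple


class BoardStatus(NamedTuple):
    """One data row of showboards output (column layout is an assumption)."""
    xsb: str          # for example "01-0"
    did_lsb: str      # "SP" or a value such as "00(01)"
    assignment: str   # for example "Available" or "Assigned"
    power: bool       # Pwr column is "y"
    connected: bool   # Conn column is "y"
    configured: bool  # Conf column is "y"
    test: str         # for example "Passed"
    fault: str        # for example "Normal"


def parse_showboards_row(row: str) -> BoardStatus:
    """Split one whitespace-separated data row into a BoardStatus."""
    fields = row.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {row!r}")
    xsb, did_lsb, assignment, pwr, conn, conf, test, fault = fields
    return BoardStatus(xsb, did_lsb, assignment,
                       pwr == "y", conn == "y", conf == "y", test, fault)
```

For the row shown in step 2, the parsed record would report the board as powered on but neither connected nor configured.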
29. of this document, servers described herein were shipping with XCP 1100 firmware installed. That might no longer be the latest available version, or the version now installed. Always see the Product Notes that apply to the firmware on your server, and those that apply to the latest firmware release.

This chapter includes the following sections:
- "Audience" on page x
- "Related Documentation" on page x
- "Text Conventions" on page xii
- "Syntax of the Command-Line Interface (CLI)" on page xii
- "Documentation Feedback" on page xiii

Audience

This guide is written for experienced system administrators with working knowledge of computer networks and advanced knowledge of the Oracle Solaris Operating System (Oracle Solaris OS).

Related Documentation

All documents for your server are available online. For the web location of these documents, refer to the getting started guide packaged with your server. Please check for the most recent version of the product notes for your server. Product Notes are available only online.

Note: For Sun Oracle software-related manuals (Oracle Solaris OS, and so on), go to http://docs.sun.com

Book Title                                                       Sun/Oracle   Fujitsu
SPARC Enterprise M4000/M5000 Servers Site Planning Guide         819-2205     C120-H015
SPARC Enterprise M8000/M9000 Servers Site Planning Guide         819-4203     C120-H014
SPARC Enterprise Equipment Rack Mounting Guide                   819-5367     C120-H016
SPARC Enterprise M4000 M5000 Se
30. or is suspended in the OpenBoot PROM (ok prompt) state.

Status             Description
Running            Oracle Solaris OS is running.
Shutdown Started   Oracle Solaris OS is being shut down.
Panic State        Oracle Solaris OS has panicked.

To perform a DR operation for a system board, you must determine the method of DR operation according to the status of the relevant domain. The domain status conditions under which DR operations are available are described in the individual sections of Chapter 3, "DR User Interface." For details of each method used for DR, see the relevant section.

System Board Status

XSCF manages system board status in units of XSB for the following management items.

TABLE 2-3 System Board Management Items
Management item   Description
Power             Power on/off status of system board
Test              Diagnostic status of system board
Assignment        Status of assignment to domain
Connectivity      Status of connection to domain
Configuration     Status of addition into Oracle Solaris OS

The table below lists the status types available for individual management items.

TABLE 2-4 System Board Management Items
Management item   Status      Description
Power             Power Off   The system board is powered off and cannot be used.
                  Power On    The system board is powered on.

TABLE 2-4 System Board Management Items (Continued)
Management items: Test, Assignment, Connectivity, Configuration
Status: Unmount, Unknown, Testing, Passed, Failed
31. pages to the swap area. Sufficient swap space must be available for this operation to succeed.

Locked Pages and ISM Pages

Some user pages are locked into memory and cannot be swapped out. These pages receive special treatment by DR. Intimate Shared Memory (ISM) pages are special user pages that are shared by all processes. ISM pages are permanently locked and cannot be swapped out as memory pages. ISM is usually used by database management system (DBMS) software to achieve better performance.

Although locked pages cannot be swapped out, the system automatically moves them to the memory on another system board to avoid any problem concerning the pages. Note, however, that the deletion of user memory fails if there is not enough free memory on the remaining system boards to hold the relocated pages.

This moving of memory, called save processing, takes a certain length of time, but system operations can continue during save processing because it is executed as a background task.

Note: Dynamic Intimate Shared Memory (DISM) is a feature that allows applications to dynamically resize their ISM segments. Some applications use RCM scripts to resize their DISM segments to assist DR. See the Oracle Solaris man page for rcmscript(4).

Deleting or moving a user memory board fails if either of the following statements is t
32. system board, the memory of the system board is swapped to the swap area of the disks. The available swap area is decreased by the memory size to be deleted. So, before you execute a delete board command, check the total swap area to verify that enough free swap space is available to hold the board's physical memory contents. Be aware that some of the total swap space may be supplied by disks that are attached to the board to be deleted. When making your assessment, be certain to also account for the swap space that will be lost.

- If the size of available swap area (e.g., 1.5 gigabytes) is larger than the size of deleted memory (e.g., 1 gigabyte), the total size of available swap area will be 0.5 gigabytes after deleting the system board.
- If the size of available swap area (e.g., 1.5 gigabytes) is smaller than the size of deleted memory (e.g., 2 gigabytes), the attempt to delete the system board will fail.

To determine the size of the currently available swap area, execute the swap -s command on the OS and check the size reported as available. For details, see the Oracle Solaris man page swap(1M).

Moreover, the size of physical memory of the system board to be deleted and information on the connected I/O devices can be confirmed by the showdevices(8) command. See Section 3.1.4, "Displaying Device Information" on page 3-10 or the showdevices(8) man page; see Appendix B for a more
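The two swap cases above reduce to one comparison and one subtraction. The following minimal sketch restates that arithmetic; the function name is hypothetical, and treating the case where both sizes are equal as a success with nothing left over is an assumption, since the text covers only "larger" and "smaller".

```python
def swap_after_delete(available_swap_gb: float, board_memory_gb: float) -> float:
    """Return the swap space (GB) left after the board's memory is swapped out.

    Raises ValueError when the available swap area is smaller than the
    memory to be deleted, which is the case where the delete would fail.
    """
    if available_swap_gb < board_memory_gb:
        raise ValueError(
            f"not enough swap: {available_swap_gb} GB available, "
            f"{board_memory_gb} GB needed"
        )
    return available_swap_gb - board_memory_gb
```

With the figures from the text, 1.5 GB available and 1 GB to delete leaves 0.5 GB, while 1.5 GB available and 2 GB to delete is rejected. Swap space supplied by disks attached to the board being deleted should be subtracted from the available figure before calling the function.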
33. the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide for further information.

CHAPTER 2 What You Must Know Before Using DR

This chapter provides information you must know to successfully use the DR functions. This chapter includes these sections:
- Section 2.1, "System Configuration" on page 2-1
- Section 2.2, "Conditions and Settings Using XSCF" on page 2-13
- Section 2.3, "Conditions and Settings Using Oracle Solaris OS" on page 2-16
- Section 2.4, "Status Management" on page 2-18
- Section 2.5, "Operation Management" on page 2-27

2.1 System Configuration

This section describes the conditions, premises, and actions for operating the DR functions to construct a system.

2.1.1 System Board Components

There are three types of system board components that can be added and deleted by DR: CPU, memory, and I/O device. FIGURE 2-1 and FIGURE 2-2 show examples of a system board of a midrange server that is divided into one Uni-XSB and into Quad-XSBs. FIGURE 2-3 and FIGURE 2-4 show examples of a system board of a high-end server that is divided into one Uni-XSB and into Quad-XSBs.

Note: Due to diagnostic requirements, the DR function works only on boards that have at least one CPU and memory.

FIGURE 2-1 Example of Hardware Configuration with Uni-XSB of Midrange Server
34. the setting of memory mirror mode by fully considering the requirements for the domain configuration and operations.

Capacity on Demand (COD)

DR works the same on COD boards as on other system boards, but standard COD restrictions still apply. For detailed information on COD boards, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User's Guide.

XSCF Failover

An XSCF reset or failover might prevent a DR operation from completing. Log in to the active XSCF to determine whether the DR operation succeeded. If not, try it again.

Kernel Memory Board Deletion

An XSCF reset or failover during the copy-rename phase of a deleteboard(8) or moveboard(8) operation might cause the domain to panic and display the following message:

    Irrecoverable FMEM error error_code

If the XSCF reset or failover results in a domain panic, check the active XSCF to determine whether the DR operation succeeded. If not, try it again.

Deletion of Board with CD-RW/DVD-RW Drive

To delete the system board to which the server's CD-RW/DVD-RW drive is connected, execute the following steps:

1. Stop the vold(1M) daemon by disabling the volfs service.

    # /usr/sbin/svcadm disable volfs

2. Execute the DR operation.

3. Restart the vold(1M) daemon by enabling the volfs service.

    # /usr/sbin/svcadm enable volfs

For details, see the vold(1M) Oracle
35. to be stopped for system board replacement.

    XSCF> showdcl -a
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
          02    01-1
    01                Running
          00    01-2
          01    01-3

3. Check the status of all related system boards.
   Execute the showboards(8) command to display system board information, and then check the status of all system boards related to the PSB to be replaced. The DR operation for replacement may not be possible if the board to be replaced does not support the DR delete operation.

    XSCF> showboards -a
    XSB  DID(LSB) Assignment Pwr Conn Conf Test   Fault
    00-0 00(00)   Assigned   y   y    y    Passed Normal
    01-0 00(01)   Assigned   y   y    y    Passed Normal
    01-1 00(02)   Assigned   y   y    y    Passed Normal
    01-2 01(00)   Assigned   y   y    y    Passed Normal
    01-3 01(01)   Assigned   y   y    y    Passed Normal

4. Delete all system boards related to the CMU to be replaced.
   Execute the deleteboard(8) command to delete the system boards, and then assign the boards to a domain that permits the DR operation.

    XSCF> deleteboard -c disconnect 01-0 01-1

5. Power off Domain 1 so the CMU can be replaced.
   Execute the poweroff(8) command so that the CMU being replaced will not be in use by domain 1.

    XSCF> poweroff -d 1

6. Check the status of all related system boards.
   Execute the showboards(8) command to display system board information, and then check the status of all related system boards.

    XSCF> showboards -a
    XSB  DID(LSB) Assignment
36. 0 Servers XSCF User's Guide.

Settings Using XSCF

The DR functions provide users with some options to avoid the complexities of reconfiguration and memory allocation with the Oracle Solaris OS, and to make DR operations smoother. You can set up these options using the XSCF shell or XSCF Web. This section describes the following options:
- Configuration policy option
- Floating board option
- Omit-memory option
- Omit-I/O option

These options are set using the setdcl(8) command. For details of how to set the options, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide or the setdcl(8) man page.

Configuration Policy Option

DR operations involve automatic hardware diagnosis to add or move a system board safely. Degradation of components occurs when the components are set according to the configuration of this option and a hardware error is detected. This option specifies the range of degradation. Moreover, this option can be used for initial diagnosis at domain startup in addition to DR operations.

The unit of degradation can be a component where a hardware error is detected, the system board (XSB) where the component is mounted, or a domain. Values that can be set and units of degradation are explained in TABLE 2-1. The default value of the configuration policy option is FRU.

Note: Enable the configuration policy optio
37. 3

2. Check the status of the domain.
   Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or replace the system board after stopping the domain.

    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0

3. Check the status of the system board to be replaced.
   Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted. The DR operation for replacement may not be possible if the board to be replaced does not support the DR delete operation.

    XSCF> showboards 01-0
    XSB  DID(LSB) Assignment Pwr Conn Conf Test   Fault

4. Delete the system board.
   Execute the deleteboard(8) command to delete the system board.

    XSCF> deleteboard -c disconnect 01-0

5. Check the status of the system board.
   Execute the showboards(8) command to display system board information, and then check the status of the system board.

    XSCF> showboards 01-0
    XSB  DID(LSB) Assignment Pwr Conn Conf Test   Fault
    01-0 00(01)   Assigned   y   n    n    Passed Normal

6. Physically replace the system board.
   Execute the replacefru(8) command, then follow the displayed instructions to replace the system board per the Active Replacement procedure. For information about Active Replacement, see the Service Manual fo
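The replacement steps above follow a fixed order, which can be captured as a generated checklist of XSCF command lines. This is a sketch for planning purposes only: the function name is hypothetical, it only builds strings and runs nothing, and the final addboard -c configure step is an assumption based on the description of replacement as a delete followed by an add, since the excerpt above stops at the physical replacement.

```python
def replacement_commands(xsb: str, domain_id: int) -> list:
    """Build the XSCF command lines for replacing one system board, in order.

    `xsb` is the board in NN-N form (for example "01-0") and `domain_id`
    is the domain the board is added back to. The addboard step is an
    assumed continuation of the excerpted procedure.
    """
    return [
        f"showdcl -d {domain_id}",                       # check the domain status
        f"showboards {xsb}",                             # check the board before deletion
        f"deleteboard -c disconnect {xsb}",              # delete the board from the domain
        f"showboards {xsb}",                             # confirm it is disconnected
        "replacefru",                                    # interactive physical replacement
        f"addboard -c configure -d {domain_id} {xsb}",   # assumed: add the board back
        f"showboards {xsb}",                             # confirm the final status
    ]
```

Printing the list for board 01-0 and domain 0 gives an operator a sequence to follow and compare against the showboards output at each stage.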
38.
A.1 Oracle Solaris OS Messages
    A.1.1 Transition Messages
    A.1.2 PANIC Messages
    A.1.3 Warning Messages
A.2 Command Messages
    A.2.1 addboard
    A.2.2 deleteboard
    A.2.3 moveboard
    A.2.4 setdcl
    A.2.5 setupfru
    A.2.6 showdevices
B. Example: Confirm Swap Space Size
Index

Preface

This guide describes the Dynamic Reconfiguration (DR) feature of SPARC Enterprise M4000/M5000/M8000/M9000 servers from Oracle and Fujitsu. DR enables users to add, remove, or exchange system boards in the M4000/M5000 midrange and M8000/M9000 high-end servers while the domains that contain these boards remain up and running. The M3000 server does not support DR.

Some references to server names and document names are abbreviated for readability. For example, if you see a reference to the M9000 server, note that the full product name is the SPARC Enterprise M9000 server. And if you see a reference to the XSCF Reference Manual, note that the full document name is the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

Before reading this document, you should read the overview guide for your server, the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide, and the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.

At publication
39.
    Explanation: Internal error during DR operation.
    Remedy: Please contact customer service.

drmach_node_ddi_get_parent NULL parent dip
    Explanation: Internal error during DR operation.
    Remedy: Please contact customer service.

Failed to remove CMP xx on board n
    Explanation: Internal error during DR operation.
    Remedy: Please contact customer service.

scf_fmem_cancel failed rv 0x<error code>
    Explanation: Internal error during kernel migration.
    Remedy: Please contact customer service.

scf_fmem_start error
    Explanation: SCF fails to start the FMEM operation. It is possible that there is a hardware error and there is no SCF path, or the SP is down.
    Remedy: Please contact customer service.

scf_fmem_cancel error
    Explanation: DR detects some error in the copy-rename process and informs SCF to cancel the operation. However, SCF fails to cancel the operation.
    Remedy: Please contact customer service.

Unknown cpu implementation
    Explanation: There may be inconsistency in the system.
    Remedy: Please contact customer service.

dr_mem_ecache_scrub address 0x lx not on page boundary
    Explanation: There may be inconsistency in the system.
    Remedy: Please contact customer service.

unexpected kcage_range_delete_post_mem_del return value
    Explanation: There may be inconsistency in the system.
    Remedy: Please contact customer service.
40. B#XX-X is not installed.
    Remedy: The wrong XSB was specified. Confirm the XSB by using showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later.
    Explanation: Another operation is already being executed on the specified system board XSB#XX-X.
    Remedy: A DR or power-off operation is being executed by another session. Wait a while, confirm the XSB status, and then try again.

XSB#XX-X has not been registered in DCL.
    Explanation: System board XSB#XX-X is not registered in the DCL.
    Remedy: Register the DCL information by using setdcl(8).

Another DR operation is in progress. Try again later.
    Explanation: Another session is already executing an operation on the specified system board XSB#XX-X.
    Remedy: A DR operation is in progress in another session. Wait a while, confirm the XSB status, and then try again.

XSB#XX-X is the last LSB for DomainID X, and this domain is still running. Operation failed.
    Explanation: XSB#XX-X is the last LSB for domain X.
    Remedy: Power off the domain by specifying -c reserve.

XSB#XX-X detected timeout by DR self test.
    Explanation: A timeout occurred during DR processing because the hardware diagnosis did not complete. There is something wrong with the hardware.
    Remedy: Find the cause of the DR failure by referring to the monitoring message and error log. Replace the failed component.
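Two of the messages above end with "Try again later" and share the remedy of waiting and retrying. The sketch below illustrates one way to script that remedy; it is an assumption-laden example, not an XSCF facility. The function name, the default of three attempts, and the 60-second wait are all hypothetical, and `run_once` stands for whatever caller-supplied function runs the DR command and returns its message text.

```python
import time
from typing import Callable

# Phrase shared by the retryable messages listed above.
RETRY_HINT = "Try again later"


def run_with_retry(run_once: Callable[[], str],
                   attempts: int = 3,
                   wait_seconds: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Call run_once until its message no longer asks for a retry.

    Returns the last message seen. Waits `wait_seconds` between
    attempts but not after the final one.
    """
    message = ""
    for attempt in range(attempts):
        message = run_once()
        if RETRY_HINT not in message:
            return message
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return message
```

Per the remedy text, the XSB status should also be confirmed (for example with showboards) between attempts; that check is left to `run_once` here. Messages that call for hardware replacement or a DCL change are returned immediately because they do not contain the retry phrase.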
41. Cryptographic Key Management Daemon (sckmd)

For details, see the Notes about SMF services in Section 3.1.4, "Displaying Device Information" on page 3-10; Section 3.1.6, "Adding a System Board" on page 3-15; Section 3.1.7, "Deleting a System Board" on page 3-17; and Section 3.1.8, "Moving a System Board" on page 3-19.

2.4 Status Management

The success of DR operations depends on the status of domains and system boards. This section describes the status information on the domains and system boards managed by XSCF, and the points to be noted for a better understanding of DR operation conditions.

2.4.1 Domain Status

XSCF manages the status of each domain. You can display and reference the status of each domain through a user interface provided by XSCF. For details of the user interface, see Chapter 3, "DR User Interface." XSCF manages the following aspects of domain status.

TABLE 2-2 Domain Status
Status                         Description
Powered Off                    Domain power is off.
Initialization Phase           POST processing or OpenBoot PROM initialization is in progress.
OpenBoot Executing Completed   Initialization of OpenBoot PROM is completed.

TABLE 2-2 Domain Status (Continued)
Status                         Description
Booting                        Oracle Solaris OS is being booted, or, due to the domain being shut down or reset, the system is in the OpenBoot PROM running state
42. DR operations. When using the moveboard(8) command with the option specified, be sure to check the status of the move source domain and application processes.

Note 4: You can execute the moveboard(8) command on a source domain or a destination domain that is not running. When the source domain is running, the moveboard(8) command with -c configure or -c assign will succeed only if the following Oracle Solaris Service Management Facility (SMF) services are active on that domain:
- Domain SP Communication Protocol (dscp)
- Domain Configuration Server (dcs)
- Oracle Sun Cryptographic Key Management Daemon (sckmd)

Replacing a System Board

Use the deleteboard(8) and addboard(8) commands to replace a system board. Use them to replace, add, or delete such hardware resources as the CPU, memory, and I/O devices, or to replace the PSB of a CMU or IOU.

Note: In a midrange server, you cannot use DR commands to replace a system board. Instead, turn off the power of all domains and then replace the target system board.

To replace a system board in a domain, first delete the target system board from the domain by using the deleteboard(8) command to make the PSB replaceable. Next, replace the PSB with a new one, and then add the target system board to the domain. For details of the conditions and actions for executing the deleteboard(8)
43. MED EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
44. Oracle and/or its affiliates. SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. Other names may be trademarks of their respective owners.

United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Oracle and/or its affiliates and Fujitsu Limited and the applicable provisions of the FAR and its supplements.

Disclaimer: The only warranties granted by Oracle and Fujitsu Limited, and/or any affiliate of either of them, in connection with this document or any product or technology described herein are those expressly set forth in the license agreement under which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, ORACLE OR FUJITSU LIMITED, AND THE AFFILIATES OF EITHER OF THEM, MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS. IN ADDITION, ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE
45. P setting is wrong, or the error occurs at the DSCP path.
    Remedy: Confirm whether the domain is powered off, the DSCP setting, and any DSCP error by referring to the monitoring message and error log.

XSB#XX-X could not be configured into DomainID X due to operating system error.
    Explanation: The DR library of the domain OS returned an error during DR processing. The error occurred in the configuration management of the domain OS.
    Remedy: Find the cause of the DR failure by referring to the monitoring message and console message. Try again after removing the cause.

Invalid parameter.
    Explanation: There is an error in the specified argument or operand.
    Remedy: Confirm the specified argument or operand, and execute the command again.

Permission denied.
    Explanation: The user does not have the required privilege.
    Remedy: Confirm the user privilege and the command privilege. On high-end servers, also confirm whether the command was executed on the standby XSCF.

The current configuration does not support this operation.
    Explanation: The command cannot be executed in the current configuration, or it is not supported.
    Remedy: Confirm the current hardware configuration and support status.

A hardware error occurred. Please check the error log for details.
    Explanation: A hardware error occurred. Please confirm the monitoring message and the error log.
    Remedy: Find out the cause of the DR failure referr
46. PUs, memory, I/O devices). A system board is replaced successively in stages. In the replace operation, the selected system board is deleted from the OS of the domain. Then the system board is removed when it is ready to be released from its domain. After field parts replacement or other such task, the system board is re-installed and added.

Note: You cannot use DR to replace a system board in a midrange server because doing so would replace an MBU. To replace a system board in a midrange server, you must turn off the power of all domains, then replace the board without using DR commands.

1.3 Security

DR operations are executed based on privileges. For information about privileges and user accounts, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Administration Guide.

1.4 Overview of DR User Interfaces

DR operations are performed through the command-line interface (CLI) within the XSCF shell or through the browser-based user interface (BUI) in the XSCF Web provided by the eXtended System Control Facility (XSCF). These operations are collectively managed by the XSCF. Furthermore, XSCF security management restricts DR operations to administrators who have the proper access privileges.

For details of XSCF shell commands provided for DR, see Section 3.1, "How To Use the DR User Interface" on page 3-1. XSCF Web is beyond the scope of this document. See
47. R processing cannot communicate with the domain. The possible reasons are that the domain is powered off, the DSCP setting is wrong, or an error occurred on the DSCP path.
    Remedy: Confirm whether the domain is powered off, the DSCP setting, and any DSCP error by referring to the monitoring message and error log.

XSB#XX-X could not be unconfigured from DomainID X due to operating system error.
    Explanation: The DR library of the domain OS returned an error during DR processing. The error occurred in the configuration management of the domain OS.
    Remedy: Find the cause of the DR failure by referring to the monitoring message and console message. Try again after removing the cause.

Invalid parameter.
    Explanation: There is an error in the specified argument or operand.
    Remedy: Confirm the specified argument or operand, and execute the command again.

Permission denied.
    Explanation: The user does not have the required privilege.
    Remedy: Confirm the user privilege and the command privilege. On high-end servers, also confirm whether the command was executed on the standby XSCF.

A hardware error occurred. Please check the error log for details.
    Explanation: A hardware error occurred. Please confirm the monitoring message and the error log.
    Remedy: Find the cause of the DR failure by referring to the monitoring message and error log. Replace the failed component.

An internal error has occurred. Please contact your system administr
48. SB. Each physical unit of a divided PSB is called an eXtended System Board (XSB). These XSBs can be combined freely to create domains. DR functions on these servers are performed on an XSB. This manual uses the term "system board" except when describing the physical units PSB and XSB specifically. For an explanation of each term, see TABLE 1-2.

Note: This document explains DR functions on system boards. Use the Oracle Solaris command cfgadm(1M) to execute DR on I/O devices, including PCI cards. For more information, please see the Service Manual for your server and the cfgadm(1M) and cfgadm_pci(1M) man pages.

FIGURE 1-1 Uni-XSB and Quad-XSB (Midrange Servers)

FIGURE 1-2 Uni-XSB and Quad-XSB (High-end Servers)

TABLE 1-1 and TABLE 1-2 list DR-related terms.

TABLE 1-1 Basic DR Terms
Term       Definition
Add        To connect a system board to a domain and configure it into the Oracle Solaris OS of the domain.
Delete     To unconfigure a system board from the Oracle Solaris OS of a domain and disconnect it from the domain.
Move       To disconnect a system board from a domain and then connect the system board to another domain.
Register   To register a system board in the domain compone
49. SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User's Guide
    Copyright 2007, 2010, FUJITSU LIMITED. All rights reserved. Oracle and/or its affiliates provided technical input and review on portions of this material. Oracle and/or its affiliates and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents, and other intellectual property laws and international treaties. This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Oracle and/or its affiliates and Fujitsu Limited, and their applicable licensors, if any. The furnishing of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Oracle or Fujitsu Limited, or any affiliate of either of them. This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and o
50. a variable or user-replaceable text. Example: See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
    Indicates names of chapters, sections, items, buttons, or menus. Example: See Chapter 2, "System Features."
    Syntax of the Command-Line Interface (CLI)
    The command syntax is as follows:
    - A variable that requires input of a value must be put in italics.
    - An optional element must be enclosed in [].
    - A group of options for an optional keyword must be enclosed in [] and delimited by |.
    Documentation Feedback
    If you have any comments or requests regarding this document, go to the following web sites:
    - For Oracle users: http://docs.sun.com
    - For Fujitsu users in U.S.A., Canada, and Mexico: http://www.computers.us.fujitsu.com/www/support_servers.shtml?support/servers
    - For Fujitsu users in other countries, refer to this SPARC Enterprise contact: http://www.fujitsu.com/global/contact/computing/sparce_index.html
    CHAPTER 1 Overview of Dynamic Reconfiguration
    This chapter provides an overview of Dynamic Reconfiguration, which is controlled by the eXtended System Control Facility (XSCF). This chapter includes these sections:
    - Section 1.1, "DR"
    - Section 1.2, "Basic DR Functions
51. Solaris man page
    SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes
    Note: This section applies only to M4000/M5000/M8000/M9000 servers that run or will run SPARC64 VII+ or SPARC64 VII processors.
    The M4000/M5000/M8000/M9000 servers support system boards that contain any mix of SPARC64 VII+, SPARC64 VII, and SPARC64 VI processors.
    Note: Supported firmware releases and Oracle Solaris releases vary based on processor type. For details, see the Product Notes that apply to the XCP release running on your server and the latest version of the Product Notes (no earlier than XCP version 1100).
    FIGURE 2-9 shows an example of a mixed configuration of SPARC64 VII and SPARC64 VI processors.
    FIGURE 2-9 CPUs on CPU/Memory Board Unit (CMU) and Domain Configuration (CMU#0 mounted with SPARC64 VII only, CMU#1 mounted with SPARC64 VI only, CMU#2 and CMU#3 with mixed CPU configurations; Domain 1 and Domain 2)
    Different types of processors can be mounted on a single CMU, as shown in CMU#2 and CMU#3 in FIGURE 2-9. A single domain can also be configured with different types of processors, as shown in Domain 2 in FIGURE 2-9.
    2.5.9.1 CPU Operational Modes
    An M4000/M5000/M8000/M9000 server domain runs in one
52. Try again after waiting a while and confirming the XSB status.
    Message: XSB#XX-X has been detected timeout by DR self test.
    Explanation: A timeout occurred during DR processing because the hardware diagnosis did not complete. There is something wrong with the hardware.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.
    Message: XSB#XX-X encountered a hardware error. See error log for details.
    Explanation: An error occurred during hardware diagnosis. There is something wrong with the hardware.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.
    Message: IP address of DSCP path is not specified.
    Explanation: DR cannot communicate with the domain because the DSCP IP address is not set up or registered.
    Remedy: Register the DSCP IP address.
    Message: An internal error has occurred. This may have been caused by a DR library error.
    Explanation: DR processing failed on the domain OS. The error occurred in the DR library.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Confirm the patch status and the XCP version.
    Message: DR failed. Domain DomainID X cannot communicate via DSCP path.
    Explanation: DR processing cannot communicate with the domain. The possible reasons are that the domain is powered off, the DSC
53. EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Oracle or Fujitsu Limited, and/or any of their affiliates, have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental, or consequential damages, even if advised of the possibility of such damages. DOCUMENTATION IS PROVIDED "AS IS" AND ALL OTHER EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES ARE FORMALLY EXCLUDED TO THE EXTENT PERMITTED BY THE LAW IN FORCE, INCLUDING IN PARTICULAR ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
    Contents
    Preface
    Overview of Dynamic Reconfiguration
    1.1 DR
    1.2 Basic DR Functions
    1.2.1 Adding a System Board
    1.2.2 Deleting a System Board
    1.2.3 Moving a System Board
    1.2.4 Replacing a System Board
    1.3 Security
    1.4 Overview of DR User Interfaces
    What You Must Know Before Using DR
    2.1 System Configuration
    2.1.1 System Board Components
    2.1.1.1 CPU
    2.1.1.2 Memory
    2.1.1.3 I/O Devi
54. Message: XSB#XX encountered a hardware error. See error log for details.
    Explanation: An error occurred during hardware diagnosis. There is something wrong with the hardware.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.
    Message: IP address of DSCP path is not specified.
    Explanation: DR processing cannot communicate with the domain because the DSCP IP address is not set up.
    Remedy: Register the DSCP IP address.
    Message: An internal error has occurred. This may have been caused by a DR library error.
    Explanation: DR processing failed on the domain OS. The error occurred in the DR library.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Confirm the patch status and the XCP version.
    Message: DR failed. Domain DomainID X cannot communicate via DSCP path.
    Explanation: DR processing cannot communicate with the domain. The possible reasons are that the domain is powered off, the DSCP setting is wrong, or an error occurred on the DSCP path.
    Remedy: Check whether the domain is powered off, and check the DSCP setting and any DSCP error by using the monitoring messages and the error log.
    Message: XSB#03-0 could not be unconfigured from DomainID 1 due to operating system error. or XSB#03-0 could not be configured into DomainID 0 due to operating system error.
    Explanation: An error occurred in the DR library of the domain OS during DR processing. The error o
55. a response of "no" is made automatically to output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.
    -f: Forcibly deletes a system board from the domain. Do not use this option for normal DR operations.
    -v: Displays the progress of this DR command. If this option is specified together with the -q option, the -v option is ignored.
    -h: Displays the usage information.
    -c disconnect: Deletes a system board from the domain and leaves it in the state where it is assigned to the domain. This is the default option.
    -c unassign: Deletes the board and adds it to the system board pool. The command unconfigures and disconnects the system board from the domain. If the board is in the state where it is assigned to the domain, the command unassigns the board from the domain and puts it in the system board pool. If the domain power is off, the command similarly puts the board in the system board pool.
    -c reserve: Reserves the deletion of a system board from a domain. The system board is deleted from the domain and placed in the system board pool when the domain power is turned off or the domain is rebooted. If the board is in the state where it is assigned to the domain, the command unassigns the board from the domain and places it in the system board pool. If the domain power is off, the command similarly places the board in the system board pool.
56. ain, execute the setdomainmode(8) command to change the cpumode setting from auto to compatible, then reboot the domain.
    DR operations work normally on domains running in SPARC64 VI Compatible Mode. You can use DR to add, delete, or move boards with any of the processor types, which are all treated as if they are SPARC64 VI processors.
    DR also operates normally on domains running in SPARC64 VII Enhanced Mode, with one exception: you cannot use DR to add or move into the domain a system board that contains any SPARC64 VI processors. To add a SPARC64 VI processor, you must power off the domain, change it to SPARC64 VI Compatible Mode, then reboot the domain.
    In an exception to the above rule, you can use the DR addboard(8) command with its -c reserve or -c assign option to reserve or register a board with one or more SPARC64 VI processors in a domain running in SPARC64 VII Enhanced Mode. The next time the domain is powered off then rebooted, it comes up running in SPARC64 VI Compatible Mode and can accept the reserved or registered board.
    Note: Change the cpumode from auto to compatible for any domain that has or is expected to have SPARC64 VI processors. If you leave the domain in auto mode and all the SPARC64 VI processors later fail, the Oracle Solaris OS will see only the SPARC64 VII+ and SPARC64 VII processors, because the failed SPARC64 VI processors will have been degraded, and it will reboot the domain in SPARC64 VII Enhanced Mode.
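    The rule above for dynamically adding or moving a board can be summarized in a short sketch. This is an illustrative Python model only, not XSCF code: the function name can_dr_add and the two mode constants are hypothetical labels chosen for this example, and the sketch covers only the immediate DR add or move case, not the -c reserve or -c assign exception.

```python
# Illustrative model of the DR rule described above; not part of XSCF.
# The function name and mode constants are hypothetical labels.

COMPATIBLE = "SPARC64 VI Compatible Mode"
ENHANCED = "SPARC64 VII Enhanced Mode"


def can_dr_add(domain_mode: str, board_has_sparc64_vi: bool) -> bool:
    """Return True if DR may add or move the board into a running domain."""
    if domain_mode == COMPATIBLE:
        # All processor types are treated as SPARC64 VI processors.
        return True
    if domain_mode == ENHANCED:
        # A board containing any SPARC64 VI processor cannot be added by DR.
        return not board_has_sparc64_vi
    raise ValueError(f"unknown CPU operational mode: {domain_mode!r}")
```

    For instance, can_dr_add(ENHANCED, True) is False, which corresponds to the case where the domain must first be powered off and changed to SPARC64 VI Compatible Mode.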
57. (Flowchart labels: a domain for which deletion has been reserved exists / there is no domain for which deletion has been reserved; DR addition operation; deletion reservation operation for the system board in its domain; deletion operation for the system board in its domain; start of domain; power-on of the relevant domain; state of the domain in operation)
    4.2 Example: Adding a System Board
    This section provides an example of the DR operation to add a system board to a domain. In the example, a procedure conforming to Section 4.1.1, "Flow: Adding a System Board" is used, and the system board shown in the figure is added by using the XSCF shell.
    FIGURE 4-5 Example: Adding a System Board (Domain 0 with XSB#00-0 and XSB#01-0)
    1. Log in to XSCF.
    2. Check the status of the domain.
    Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.
    XSCF> showdcl -d 0
    DID LSB XSB Status
    00 Running
    3. Check the status of the system board to be added.
    Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and con
58. alue depends on your server. To specify multiple system boards, specify several XSB numbers delimited by spaces.
    Note 1: The time required for system board deletion processing in the move source domain depends on the amount of hardware resources mounted on the target system board. Moreover, in the system board addition processing in the move destination domain, the system board to be added is first diagnosed and then added to the domain. For this reason, the command may take a long time to complete. The Oracle Solaris OS is suspended for a while when the system board includes kernel memory.
    Note 2: If the DR processing executed by the moveboard(8) command fails, the target system board cannot be restored to its previous status. If DR processing fails, identify the cause of failure based on the error message output by the moveboard(8) command and the Oracle Solaris OS messages in the move source and move destination domains, and then take appropriate corrective action. Note that some errors require one of the domains to be rebooted.
    Note 3: When a system board is forcibly deleted from the move source domain by the moveboard(8) command with the -f option specified, a serious problem may occur in a process that is bound to the CPU or in accessing an I/O device. For this reason, you should avoid using the option for normal
59. ase see Section 2.5.9, "SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes" on page 2-30.
    This chapter includes these sections:
    - Section 4.1, "Flow of DR Operation"
    - Section 4.2, "Example: Adding a System Board"
    - Section 4.3, "Example: Deleting a System Board"
    - Section 4.4, "Example: Moving a System Board"
    - Section 4.5, "Examples: Replacing a System Board"
    - Section 4.6, "Examples: Reserving Domain Configuration Changes"
    4.1 Flow of DR Operation
    This section provides the flows of basic DR operations to add, delete, move, and replace system boards, along with flow diagrams.
    4.1.1 Flow: Adding a System Board
    FIGURE 4-1 Flow: Adding a System Board (flowchart steps: hardware maintenance; checking operation and selecting a DR operation; operation status and configuration of a domain; judgment of whether the DR operation can be performed; checking the domain status; reserve operation for adding a system board; checking the status of the system board to be added; checking the device status; addition operation fo
60. ator.
    Explanation: DR failed. DR may have failed because of an internal error in XSCF.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Also confirm the XCP version.
    A.2.3 moveboard
    Message: XSB#XX-X will be moved from DomainID X to DomainID X immediately. Continue? [y|n]
    Explanation: Confirms whether the DR operation is to be executed. Enter y to execute it or n to stop it.
    Message: XSB#XX-X will be assigned to DomainID X immediately. Continue? [y|n]
    Explanation: Confirms whether the DR operation is to be executed. Enter y to execute it or n to stop it.
    Message: XSB#XX-X will be assigned to DomainID X after DomainID X restarts. Continue? [y|n]
    Explanation: Confirms whether the DR operation is to be executed. Enter y to execute it or n to stop it.
    Message: DR operation canceled by operator.
    Explanation: The DR operation was canceled by the operator.
    Message: Domain DomainID X is not currently running.
    Explanation: Destination domain X was not active when -c configure was specified.
    Remedy: Execute the command with -c assign specified.
    Message: XSB#XX-X cannot be moved due to System Board Pool.
    Explanation: An XSB in the system board pool cannot be moved.
    Remedy: Execute the addboard command.
    Message: XSB#XX-X is not installed.
    Explanation: System board XS
61. b
    XSCF> showfru -a sb
    Location XSB Mode Memory Mirror Mode
    00 Quad yes
    01 Quad yes
    02 Quad no
    03 Uni no
    3.1.6 Adding a System Board
    Use the addboard(8) command to add a system board to a domain, or to reserve the addition of a system board to a domain, based on the DCL. The system board must already be registered in the target domain's DCL. Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command.
    Before executing the addboard(8) command, check the status of the DR target domain and system board. You must determine whether you can perform the DR operation based on the status of the domain and system board.
    The following examples show the format and options of the addboard(8) command.
    addboard [[-q] -{y|n}] [-f] [-v] [-c configure] -d domain_id xsb
    addboard [[-q] -{y|n}] [-v] -c assign -d domain_id xsb
    addboard [[-q] -{y|n}] [-v] -c reserve -d domain_id xsb
    addboard -h
    TABLE 3-13 Options of the addboard Command
    -q: Suppresses the display of output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.
    -y: Specifies that a response of yes
62. ccurred in the configuration management of the domain OS.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and console messages. Try again after removing the cause.
    Message: Invalid parameter.
    Explanation: There is an error in the specified argument or operand.
    Remedy: Confirm the specified argument or operand and execute the command again.
    Message: Permission denied.
    Explanation: You do not have the required privilege.
    Remedy: Confirm the user privilege and the command privilege. On high-end servers, also confirm whether the command was executed on the standby XSCF.
    Message: The current configuration does not support this operation.
    Explanation: The command cannot be executed in the current configuration, or it is not supported.
    Remedy: Confirm the current hardware configuration and support status.
    Message: A hardware error occurred. Please check the error log for details.
    Explanation: A hardware error occurred. Check the monitoring messages and the error log.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.
    Message: An internal error has occurred. Please contact your system administrator.
    Explanation: DR failed. DR may have failed because of an internal error in XSCF.
    Remedy: Identify the cause of the DR failure by referring to the monitoring messages and the error log. Also confirm the XCP version.
63. ce
    2.1.2 System Board Configuration Requirements
    2.1.3 System Board Pool Function
    2.1.4 Checklists for System Configuration
    2.1.5 Reservation of Domain Configuration Changes
    2.2 Conditions and Settings Using XSCF
    2.2.1 Conditions Using XSCF
    2.2.2 Settings Using XSCF
    2.2.2.1 Configuration Policy Option
    2.2.2.2 Floating Board Option
    2.2.2.3 Omit-memory Option
    2.2.2.4 Omit-I/O Option
    2.3 Conditions and Settings Using Oracle Solaris OS
    2.3.1 I/O and Software Requirements
    2.3.2 Settings of Kernel Cage Memory
    2.3.3 Setting of Oracle Solaris Service Management Facility (SMF)
    2.4 Status Management
    2.4.1 Domain Status
    2.4.2 System Board Status
    2.4.3 Flow of DR Processing
    2.4.3.1 Flowchart: Adding a System Board
    2.4.3.2 Flowchart: Deleting a System Board
    2.4.3.3 Flowchart: Moving a System Board
    2.4.3.4 Flowchart: Replacing System Board
    2.5 Operation Management
    2.5.1 I/O Device Management
    2.5.2 Swap Area
    2.5.2.1 Swap Area at System Board Addition
    2.5.2.2 Swap Area at System Board Deletion
    2.5.3 Real-time Processes
    2.5.4 Memory Mirror Mode
    2.5.5 Capacity on Demand (COD)
    2.5.6 XSCF Failover
    2.5.7 Kernel Memory Board Deletion
    2.5.8 Deletion of Board with CD-RW/DVD-RW Drive
64. ces(8) command on the XSCF and the swap(1M) command on the Oracle Solaris OS.
    In this example, the system board to be deleted contains physical memory, and a disk has been attached to it to provide swap space. A disk that is attached to another system board provides additional swap space.
    This example is based on the following swap space size and physical memory size. Most of the swap space in the system is still available, and the system board can be safely deleted.
    - Swap area of the entire domain: 4GB
    - Swap area of the system board to be deleted: 1GB
    - Physical memory of the system board to be deleted: 2GB
    1. Execute the showdevices(8) command on the XSCF to show the resources of the system board (XSB#00-0) to be deleted. This command displays the total physical memory on the board and the I/O devices that are attached.
    XSCF> showdevices 00-0
    CPU:
    DID XSB id state speed ecache
    00 00-0 40 on-line 2048 4
    00 00-0 41 on-line 2048 4
    00 00-0 40 on-line 2048 4
    00 00-0 41 on-line 2048 4
    Memory:
    board perm base domain target deleted remaining
    DID XSB mem MB mem MB address mem MB XSB mem MB mem MB
    00 00-0 2048 0 0x0000000000000000 4096
    IO Devices:
    DID XSB device resource usage
    00 00-0 sd0 /dev/dsk/c0t3d0s1 swap area
    Notice in the Memory section that 2048 MB (2GB) of physical memory is on this board, and in the I/O Devices section that the /dev/dsk/c0t3d0s1 disk contains a configured swap space.
    2. On the
65. changed.
    - You want to avoid changing the current domain configuration settings and change the configuration immediately after the domain is rebooted, when it is necessary to delete a system board that has a driver or PCI card that does not support DR.
    - You want to assign a floating board to a specific domain beforehand to prevent the system board from being acquired by another domain.
    For how to reserve domain changes, see Section 3.1.10, "Reserving a Domain Configuration Change" on page 3-25.
    2.2 Conditions and Settings Using XSCF
    This section describes the operating conditions required for XSCF to start DR operations and the settings that are established by XSCF.
    2.2.1 Conditions Using XSCF
    The DR operation to add a system board cannot be executed when the system board has only been mounted. The DR operation is enabled by registering the system board in the DCL by using the XSCF shell or XSCF Web. You must confirm that the system board to be added is registered in the DCL before performing the DR operation.
    System boards to be deleted, moved, or replaced have necessarily been registered in the DCL already, so you need not confirm that these boards have been registered in the DCL.
    For details about the DCL and how to register system boards in the DCL and confirm registration, see the SPARC Enterprise M3000 M4000 M5000 M8000 M900
66. complete example.
    2.5.3 Real-time Processes
    The Oracle Solaris OS is temporarily suspended when a kernel memory board is deleted or moved. If your system has any real-time requirements, such as might be indicated by the presence of real-time processes, be aware that such a DR operation could significantly affect these processes.
    2.5.4 Memory Mirror Mode
    The memory mirror mode is a function used to duplex memory to ensure the hardware reliability of memory. When memory mirror mode is enabled, the domain can continue operation even if a fault occurs in a part of memory, provided that the fault is recoverable. Memory mirror mode cannot be set in some division types of PSB. For more information, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
    Enabling memory mirror mode does not restrict any DR functions. However, you must consider the domain configuration and operation when enabling memory mirror mode.
    For example, when a kernel memory board with memory mirror mode enabled is deleted or moved, kernel memory is moved from the kernel memory board to another system board. Kernel memory is moved normally even if memory mirror mode is disabled for the move destination system board. However, this operation lowers the reliability of memory on the new kernel memory board. You must properly plan and decide
67. ct customer service.
    Message: Irrecoverable FMEM error <error code>
    Explanation: Internal error during kernel migration.
    Remedy: Please contact customer service.
    Message: scf fmem request failed error code 0x<error code>
    Explanation: Internal error during kernel migration.
    Remedy: Please contact customer service.
    Message: scf_fmem_end failed rv 0x<error code>
    Explanation: Internal error during kernel migration.
    Remedy: Please contact customer service.
    Message: CPU nn hang during Copy Rename
    Explanation: A fatal hardware error was encountered during copy-rename.
    Remedy: Please contact customer service.
    A.1.3 Warning Messages
    Message: megabytes not available to kernel cage
    Explanation: Lack of memory resources.
    Remedy: Detach the board, then attach it again.
    Message: IKP init failed
    Explanation: The initial device tree walk to locate the nodes that are of interest to IKP failed.
    Remedy: Please contact customer service.
    Message: dr failed to alloc soft state
    Explanation: Failed to allocate soft state due to a lack of memory resources.
    Remedy: Repeat the action. If this error message appears again, please contact customer service.
    Message: dr module not yet attached
    Explanation: Failed to attach the DR driver.
    Remedy: Repeat the action. If this error message appears again, please contact customer service.
    Message: dr_add_memory_spans unexpected kphysm_add_memory_dynamic return value X basepfn Y npag
68. d System Boards on Which Kernel Memory is Loaded
    Before determining the hardware configuration and operations, you must understand how job processes are affected by DR operations on system boards on which CPUs, memory, and I/O devices are mounted.
    You can perform DR operations on system boards that contain kernel memory. When disconnecting a system board on which kernel memory is loaded, DR copies kernel memory into the memory on another system board. The copy operation is based on the premise that the copy destination system board does not already contain any kernel memory.
    When kernel memory is copied, the Oracle Solaris OS is temporarily suspended. Therefore, before determining system operations, you must understand how job processes are affected by the loss of network connections with remote systems and by other effects of the DR operation.
    2.1.5 Reservation of Domain Configuration Changes
    Besides letting you add, delete, or move system boards dynamically, DR also lets you schedule such reconfiguration to take place the next time the affected domains are powered on or off, or the domain is rebooted. Use the addboard(8), deleteboard(8), or moveboard(8) command with the -c reserve option to specify these actions.
    Some of the reasons you might want to reserve a domain change include:
    - A hardware resource cannot be dynamically reconfigured by DR for business or operational reasons.
    - Domain configuration settings should not be immediately
69. d ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the deleted system board. If the deleteboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, then correct it.
    XSCF> showdcl -d 0
    DID LSB XSB Status
    00 Running
    00 00-0
    01 01-0
    XSCF> showboards -a
    XSB DID(LSB) Assignment Pwr Conn Conf Test Fault
    00-0 00(00) Assigned y y y Passed Normal
    01-0 SP Available y n n Passed Normal
    4.4 Example: Moving a System Board
    This section provides an example of an operation to move a system board between domains. In the example, a procedure conforming to Section 4.1.3, "Flow: Moving a System Board" on page 4-5 is used, and the system board shown in the figure is moved using the XSCF shell.
    FIGURE 4-7 Example: Moving a System Board (a system board is moved between Domain 0 and Domain 1)
    1. Log in to XSCF.
    2. Check the status of the move source domain.
    Execute the showdcl(8) command to display domain information, and then check the operation status of the move source domain.
    XSCF> showdcl -d 0
    DID LSB XSB Status
70. dded system board. If the addboard(8) command completes abnormally or leaves the board in an unwanted status, see the output messages to identify the problem, then correct it.
    XSCF> showdcl -d 0
    DID LSB XSB Status
    00 Running
    00 00-0
    01 01-0
    XSCF> showboards 01-0
    XSB DID(LSB) Assignment Pwr Conn Conf Test Fault
    01-0 00(01) Assigned y y y Passed Normal
    4.5.2 Example: Replacing a Quad-XSB System Board
    FIGURE 4-9 Example: Replacing a Quad-XSB System Board (Domain 0 with XSB#00-0; XSB#01-0, XSB#01-1, XSB#01-2, and XSB#01-3 are deleted and then added)
    1. Log in to XSCF.
    2. Check the configurations and status of all domains to which the relevant system boards belong.
    Execute the showdcl(8) command to display domain information, and then check the configurations and operation status of all domains to which the relevant XSBs belong. Based on the configurations and operation status of the domains, determine whether to perform the DR operation or to replace the target system board after stopping the domains. If a domain consists only of XSBs in the PSB to be replaced, the DR operation for replacement is disabled, and the domain must be stopped for replacement. In this example, domain 1 has a configuration that requires it
71. domain, execute the swap(1M) command with its -l option specified to determine the size of the swap space configured on the disk.
    # swap -l
    swapfile dev swaplo blocks free
    /dev/dsk/c0t3d0s1 118,1 16 2097152 2097152
    /dev/dsk/c1t1d0s1 118,2 16 6291456 4109712
    Notice that /dev/dsk/c0t3d0s1, the disk to be deleted, contributes 2097152 blocks. Each block is 512 bytes, so this disk contributes 1GB of swap space. Moreover, the domain has additional swap space available from /dev/dsk/c1t1d0s1, a disk connected to another system board, which contributes 6291456 blocks (3GB). Thus the total swap space is 4GB.
    3. Execute the swap(1M) command with its -s option to determine the total amount of available swap space. This amount could have been determined in the previous step, but you can use the following command to get a brief summary of the details.
    # swap -s
    total: 40096k bytes allocated + 2200k reserved = 42296k used, 4152008k available
    Notice that most of the 4GB of total swap space is available. When the system board is deleted, 1GB of total swap space will be removed, and the remaining available swap space will be nearly 3GB. Therefore, there is enough remaining swap space to allow this system board to be deleted.
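    The arithmetic in this example can be checked with a short sketch. This is a minimal Python illustration that assumes the block counts and the available figure shown in the sample swap output above; the variable and function names are hypothetical and are not part of any Oracle Solaris or XSCF tool.

```python
# Worked check of the swap-space arithmetic in the example above.
# Block counts and the "available" figure are copied from the sample
# swap output; all names here are illustrative only.

BLOCK_BYTES = 512          # size of one block reported by swap -l
GIB = 1024 ** 3            # bytes per GB as used in the example
KIB_PER_GIB = 1024 ** 2    # kilobytes per GB


def blocks_to_gib(blocks: int) -> float:
    """Convert a count of 512-byte blocks to GB."""
    return blocks * BLOCK_BYTES / GIB


swap_blocks = {
    "/dev/dsk/c0t3d0s1": 2097152,  # disk on the board to be deleted
    "/dev/dsk/c1t1d0s1": 6291456,  # disk on another system board
}
available_kib = 4152008            # "available" value from swap -s

removed_gib = blocks_to_gib(swap_blocks["/dev/dsk/c0t3d0s1"])
total_gib = sum(blocks_to_gib(b) for b in swap_blocks.values())
remaining_available_gib = available_kib / KIB_PER_GIB - removed_gib

print(f"total swap:          {total_gib:.2f} GB")
print(f"removed with board:  {removed_gib:.2f} GB")
print(f"available afterward: {remaining_available_gib:.2f} GB")
```

    The sketch reproduces the 4GB total and the 1GB contribution of the disk to be deleted, and the remaining available figure comes out slightly under 3GB, in line with the "nearly 3GB" stated in the example.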
72. ds -v -a
    XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD
    00-0 00(00) Assigned y y y Passed Normal n
    00-1 00(01) Assigned y n n Passed Degraded n
    00-2 SP Available y n n Unknown Normal n
    00-3 01(15) Assigned y y y Passed Normal n
    Example 3: Display of information on the system board in the system board pool in domain 0
    XSCF> showboards -c sp -d 0
    XSB DID(LSB) Assignment Pwr Conn Conf Test Fault
    00-2 SP Available y n n Passed Normal
    3.1.4 Displaying Device Information
    Use the showdevices(8) command to display device information. The showdevices(8) command displays information about the physical devices, including CPUs, memory, and PCI cards, mounted on system boards, and displays the hardware resources usable with these devices in hardware resource format.
    The showdevices(8) command is used before a DR operation to confirm information about, and the status of, the hardware resources of the DR target system board, and to determine the processes that access the CPU and I/O devices.
    Resource management applications or subsystems provide information concerning use of the hardware resources. A showdevices(8) command offline query about management target resources estimates the effect of each DR operation applied to the system boards and displays the results.
    The following examples show the format and options of the showdevices(8) command.
    showdevices [-v] [-p bydevice|byboard|query|force] xs
    showdevices v p
73. e changed by using the XSCF command setupfru(8). Quad-XSB may be used to describe a PSB division type or status.
1.2 Basic DR Functions
This section describes the basic DR functions. FIGURE 1-3 shows DR processing.
[FIGURE 1-3: DR Processing Flow]
In the example shown in FIGURE 1-3, system board #2 is deleted from domain A and added to domain B. In this way, the physical configuration of the hardware (mounting locations) is not changed, but the logical configuration is changed for management of the system boards.
Adding a System Board
You can use DR to add a system board to a domain, provided that the board is installed in the system and not assigned to another domain. You can do so without stopping the Oracle Solaris OS running in the domain. A system board is added in stages: connect, then configure. In the add operation, the selected system board is connected to the target domain. Then the system board is configured to the Oracle Solaris OS of the domain. At this point, addition of the system board is completed.
Deleting a System Board
You can use DR to delete a sys
74. ease contact customer service.
  Output: Console and Standard Output
no error: dr@0:SBX::memory
  Explanation: There may be inconsistency in the system.
  Remedy: Repeat the action. If this error message appears again, please contact customer service.
  Output: Console and Standard Output
Unrecognized platform command
  Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system.
  Remedy: Repeat the action. If this error message appears again, please contact customer service.
  Output: Console and Standard Output
Bad address: dr@0:SBX::memory
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Cannot read property value: device node XXXXXX property name
  Explanation: Failed to get the property from OBP.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Cannot read property value: property scf-cmd-reg
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Cannot find mc-opl interface
  Explanation: DR cannot locate the suspend/resume interface of the mc-opl driver. mc-opl is probably not loaded, or an incorrect version is used.
  Remedy: Please contact customer service.
Cannot find scf_fmem interface
  Explana
75. em board is in the system board pool or when the system board is not connected to the domain configuration.
2.3 Conditions and Settings Using Oracle Solaris OS
This section describes the operating conditions and settings required for DR operations.
2.3.1 I/O and Software Requirements
As described in Section 2.1, "System Configuration," on page 2-1, all I/O device drivers and software installed in a domain where DR is to be used must support DR. The device drivers that support DR must also support the following DDI and DKI entries:
  attach(9E): DDI_ATTACH and DDI_RESUME
  detach(9E): DDI_DETACH and DDI_SUSPEND
If a device driver that does not support DR is present, the deletion of a system board might fail. Even if the DDI_DETACH interface is supported, DDI_DETACH processing fails when the relevant driver is in use. Before starting the deletion of a system board, you must stop using all devices on the system board to be deleted. Device drivers that do not support DR must be unloaded before a system board is deleted. To unload a device driver, first stop using all I/O devices controlled by that driver, and then use the Oracle Solaris command modunload(1M). Then you can reload the driver for the remaining instances and resume using those remaining instances after deleting the system boa
76. em board number
R: Reservation status of a system board. An asterisk (*) is displayed for a system board when the board is reserved for addition, deletion, or a move.
DID(LSB): Domain ID of the domain into which the system board is added, and logical system board number. SP is displayed for a system board that is in the system board pool.
Assignment: Status of assignment to domain configuration.
  Unavailable: The system board is in the system board pool (not assigned to a domain) and its status is one of the following: not yet diagnosed, under diagnosis, or diagnosis error. All system boards that are not mounted are also shown as Unavailable.
  Available: The system board is in the system board pool, and its diagnosis has completed normally.
  Assigned: The system board is assigned to the domain.
Pwr: Power-on/off status of the system board.
  n: Power-off status. The system board is powered off and cannot be used.
  y: Power-on status. The system board is powered on.
Conn: Status of connection to domain configuration.
  n: Disconnected status. The system board is disconnected from the relevant domain configuration or is in the system board pool.
  y: Connected status. The system board is connected to the relevant domain configuration.
Conf: Status of addition into the Oracle Solaris OS.
  n: Unconfigured status. The hardware resources of the system board have been deleted from the Oracle Solaris OS.
  y: Configured status. The hardware resources of the system board have been added into the Oracle S
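The status fields described in this excerpt can be read programmatically. The following Python sketch parses one row of non-verbose showboards(8) output; it is not part of the guide. It assumes whitespace-separated columns in the order XSB, DID(LSB), Assignment, Pwr, Conn, Conf, Test, Fault, with XSB written as XX-Y and DID(LSB) written as either SP or NN(NN). The names `parse_row` and `in_pool_and_ready` are hypothetical.

```python
import re

# Assumed row layout: XSB  DID(LSB)  Assignment  Pwr  Conn  Conf  Test  Fault
ROW = re.compile(
    r"^(?P<xsb>\d{2}-\d)\s+(?P<did>SP|\d{2}\(\d{2}\))\s+"
    r"(?P<assignment>\S+)\s+(?P<pwr>[yn])\s+(?P<conn>[yn])\s+"
    r"(?P<conf>[yn])\s+(?P<test>\S+)\s+(?P<fault>\S+)$"
)


def parse_row(line):
    """Parse one showboards data row into a dict (hypothetical helper)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("unrecognized showboards row: %r" % line)
    row = match.groupdict()
    # Convert the y/n flags to booleans for easier checks.
    for key in ("pwr", "conn", "conf"):
        row[key] = row[key] == "y"
    return row


def in_pool_and_ready(row):
    """True if the board is in the system board pool and diagnosed."""
    return row["did"] == "SP" and row["assignment"] == "Available"
```

For example, a row such as `00-2 SP Available y n n Passed Normal` parses to a board that is powered on, disconnected, unconfigured, and ready to be added from the system board pool.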
77. IKP destroy pci <board> <channel> <leaf> failed
  Explanation: The node was not destroyed.
  Remedy: Please contact customer service.
IKP destroy pseudo mc <board> failed
  Explanation: The node was not destroyed.
  Remedy: Please contact customer service.
IKP destroy chip <board> <chip> failed
  Explanation: The node was not destroyed.
  Remedy: Please contact customer service.
dr_del_mlist_query mlist NULL
  Explanation: The memory list to be deleted is NULL. This warning is also shown for a memoryless board.
  Remedy: Please ignore this message on memoryless boards. If DR failed after this message, please contact customer service.
dr_memlist_canfit memlist_dup failed
  Explanation: The system might have run out of memory, or there is a memoryless board.
  Remedy: Please ignore this message on memoryless boards. If DR failed after this message, please check whether the system has enough memory resources and repeat the action. If the error remains, please contact customer service.
Cannot get floating boards proplen
  Explanation: Failed to get property information of floating boards.
  Remedy: Please contact customer service.
Cannot get floating boards prop
  Explanation: Failed to get property information of floating boards.
  Remedy: Please contact customer service.
Device node 0x<dip> has invalid property value board <board>
  Explanation: The device node has invalid
78. epeat the action. If this error message appears again, please contact customer service.
  Output: Console and Standard Output
Invalid board number: X
  Explanation: Invalid board number.
  Remedy: Check the board number and repeat the action. If this error message appears again, please contact customer service.
  Output: Console and Standard Output
Kernel cage is disabled
  Explanation: The kernel cage memory feature is disabled.
  Remedy: Ensure /etc/system is edited to enable kernel cage memory.
  Output: Console and Standard Output
Memory operation failed: dr@0:SBX::memory
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Memory operation refused: dr@0:SBX::memory
  Explanation: The DR operation is refused.
  Remedy: Respond in the manner directed by the other message.
Memory operation cancelled: dr@0:SBX::memory
  Explanation: The DR operation is canceled.
  Remedy: Respond in the manner directed by the other message.
No device(s) on board: dr@0:SBX
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Non-relocatable pages in span: dr@0:SBX::memory
  Explanation: There is non-relocatable (kernel) memory on the system board.
  Remedy: The target board with kernel memory cannot be disconnected by DR. It
79. er of the move destination domain is turned on or the move destination domain is rebooted. The move operation from the move source domain is performed, and the system board is set to the state where it is assigned to the move destination domain, when the domain power is off in both the move source domain and the move destination domain or the Oracle Solaris OS is not running in both domains.
-c reserve: Specifies that the command reserve a system board move in the move source domain. The system board is deleted from the move source domain and assigned to the move destination domain when the power of the move source domain is turned off or the move source domain is rebooted. The assigned system board is added to the move destination domain when the addboard(8) command is executed in the move destination domain, the power of the move destination domain is turned on, or the move destination domain is rebooted. The move operation from the move source domain is performed, and the system board is set to the state where it is assigned to the move destination domain, when the domain power is off or the Oracle Solaris OS is not running in the move source domain.
-d domain_id: Specifies the domain ID of the move destination domain, where domain_id is the domain number, possibly 0 to 23 depending on your server. Only one domain ID can be specified.
xsb: Specifies the system board (XSB) number of the system board to be moved. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The v
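The operand ranges given in this excerpt (XSB as XX-Y with XX from 00 to 15 and Y from 0 to 3, domain ID from 0 to 23) can be checked before a command is issued. The Python sketch below is an illustration only: `parse_xsb` and `valid_domain_id` are hypothetical names, the hyphen separator is assumed from the command examples elsewhere in the guide, and the real upper limits depend on the server model.

```python
import re

XSB_RE = re.compile(r"^(\d{2})-(\d)$")  # assumed textual form: XX-Y


def parse_xsb(text):
    """Return (board, partition) for an XSB operand such as '01-0'.

    Raises ValueError if the text is not in XX-Y form or is out of the
    documented range (XX = 00 to 15, Y = 0 to 3).
    """
    match = XSB_RE.match(text)
    if match is None:
        raise ValueError("XSB must be in XX-Y format: %r" % text)
    board, part = int(match.group(1)), int(match.group(2))
    if not (0 <= board <= 15 and 0 <= part <= 3):
        raise ValueError("XSB out of range: %r" % text)
    return board, part


def valid_domain_id(domain_id, max_id=23):
    """True if domain_id is within 0..max_id (23 is the documented maximum)."""
    return 0 <= domain_id <= max_id
```

A caller could pass a smaller `max_id` for a server that supports fewer domains.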
80. eration status of all related domains. Based on the operation status of the domain, determine whether to perform the DR operation or reboot the domains.
    XSCF> showdcl -a
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
          02    01-1
    01                Powered Off
          00    01-2
          01    01-3
10. Add the new system board to the domain. Execute the addboard(8) command in the domain to add the new system board.
    XSCF> addboard -c configure -d 0 01-0 01-1
11. Check the status of the related domains and system boards. Execute the showdcl(8) command to check the operation status of related domains, and then execute the showboards(8) command to check the status of related system boards. In this example, domain 1 is booted by power-on in this stage.
    XSCF> poweron -d 1
    XSCF> showdcl -a
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
          02    01-1
    01                Running
          00    01-2
          01    01-3
    XSCF> showboards -a
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 00(01)   Assigned    y    y    y    Passed  Normal
    01-1 00(02)   Assigned    y    y    y    Passed  Normal
    01-2 01(00)   Assigned    y    y    y    Passed  Normal
    01-3 01(01)   Assigned    y    y    y    Passed  Normal
4.6 Examples: Reserving Domain Configuration Changes
This section provides examples of operations to reserve a change in domain configuration by DR. In the examples, the XSCF shell is used to reserve the addition, deletion, and
81. erve deletion of the system board.
    XSCF> deleteboard -c reserve 01-0
5. Check the reserved status of the system board. Execute the showboards(8) command with the -v option specified to display system board information, and then confirm that deletion of the system board has been reserved.
    XSCF> showboards -v 01-0
    XSB  R DID(LSB) Assignment  Pwr  Conn Conf Test    Fault    COD
    01-0 * 00(01)   Assigned    y    y    y    Passed  Normal   n
6. Stop or reboot the domain. This operation changes the domain's configuration and executes the reserved deletion of the system board.
4.6.3 Example: Reserving a System Board Move
[FIGURE 4-12: Example: Reserving a System Board Move]
1. Login to XSCF.
2. Check the status of the move source domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the move source domain.
    XSCF> showdcl -d 1
    DID   LSB   XSB   Status
    01                Running
          00    01-0
3. Check the status of the move destination domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the move destination domain. Based on the operation status of the
82. es Z
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
dr_cancel_cpu failed to disable interrupts on cpu X
  Explanation: Failed to disable interrupts on CPU X.
  Remedy: Disable interrupts on CPU X with psradm -i. If this command fails again, respond in the manner directed by the command message.
dr_cancel_cpu failed to online cpu X
  Explanation: Failed to bring CPU X online.
  Remedy: Repeat the action. If this error message appears again, please contact customer service.
dr_cancel_cpu failed to power on cpu X
  Explanation: Failed to power on CPU X.
  Remedy: Repeat the action. If this error message appears again, please contact customer service.
dr_copyin_iocmd (32bit) failed to copyin sbdcmd struct
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
dr_copyin_iocmd failed to copyin options
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
dr_copyin_iocmd failed to copyin sbdcmd struct
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
dr_copyout_errs (32bit) failed to copyout
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
dr_copyout_errs failed to copyout
  Explanation: There may be inconsiste
83. eserve move, 4-23
S: setdcl, 3-2; setdomainmode(8), 2-32; setdscp, 3-27; setupfru, 3-2; showboards, 3-2, 3-7; showdcl, 3-2; showdevices, 3-2, 3-10; showdomainstatus, 3-2, 3-5; showdscp, 3-27; showfru, 3-2, 3-13; Solaris OS, 2-16; SPARC64 VI Compatible Mode, 2-31; SPARC64 VII Enhanced Mode, 2-31; swap area, 2-12, 2-27; system board, 1-5; system board pool, 2-10; system board status, 2-19, 3-7; system configuration, 2-11
U: Unassign, 1-3; Unconfigure, 1-4; Uni-XSB, 1-5, 2-1, 2-10, 4-13; user memory board, 2-8
X: XSB, 1-4; XSCF, 2-13; XSCF Web, 3-27
84. executing it.
-p force: Tests the detachability of the board by test-running the DR command with the force flag, without actually executing it.
xsb: Specifies a system board (XSB) number. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on your server.
-d domain_id: Specifies the ID of the specified domain, where domain_id is the domain number, possibly 0 to 23 depending on your server. Only one domain ID can be specified.
TABLE 3-10 Domain Information Displayed by the showdevices command
CPU (CPU information)
  DID: Domain ID
  XSB: System board number
  id: CPU ID
  state: CPU status
  speed: CPU frequency (MHz)
  ecache: CPU cache size (MB)
  usage: Description of instance using resources
Memory (Memory information)
  DID: Domain ID
  XSB: System board number
  board mem: Size of memory on system board (MB)
  perm mem: Size of non-relocatable (kernel) memory on system board (MB)
  base address: Base physical address of memory on system board
  domain mem: Size of memory in domain (MB)
  target board: System board number of the system board whose kernel memory is drained
  deleted mem: Size of already deleted memory (MB)
  remaining mem: Size of remaining memory to be deleted (MB)
IO Devices (I/O device information)
  Display items: DID, XSB, device, resource, usage, query, usage/reason
  DID: Domain ID
  XSB: Syste
85. failed to read HWD header
  Explanation: The header of the hardware descriptor could not be read.
  Remedy: Please contact customer service.
IKP create cpu <board> <chip> <core> <cpu> failed
  Explanation: There was a problem creating the device node for a CPU.
  Remedy: Please contact customer service.
IKP create core <board> <chip> <core> failed
  Explanation: There was a problem creating the device node for a core.
  Remedy: Please contact customer service.
IKP create chip <board> <chip> failed
  Explanation: There was a problem creating the device node for a chip.
  Remedy: Please contact customer service.
IKP create pseudo mc <board> failed
  Explanation: There was a problem creating the pseudo-mc device node for the board.
  Remedy: Please contact customer service.
opl_claim_memory unable to allocate contiguous memory of size zero
  Explanation: A claim request with size zero was issued by the fcode interpreter.
  Remedy: If DR failed after this message, please contact customer service.
opl_claim_memory vhint is not zero vhint 0x<vhint> Ignoring Argument
  Explanation: A claim request with a nonzero hint came from the fcode interpreter.
  Remedy: If DR failed after this message, please contact customer service.
opl_claim_memory unable to allocate contiguous memory
  Explanation: Memory allocatio
86. figuration does not support this operation
  Explanation: The command cannot be executed in the current configuration, or it is not supported.
  Remedy: Confirm the current hardware configuration and support status.
An internal error has occurred. Please contact your system administrator.
  Explanation: DR failed, possibly because of an internal error in XSCF.
  Remedy: Identify the cause of the DR failure by referring to the monitoring messages and error log. Please also confirm the XCP version.
showdevices
XSB s is not currently running
  Explanation: The system was not able to get some parameter for the XSB.
  Remedy: Confirm the information for the XSB with the showboards command.
cannot get device information from DomainID
  Explanation: The system was unable to collect the requested information from the domain.
  Remedy: Confirm that the DSCP setting is correct and that the dsc process is running normally on the domain.
APPENDIX B Example: Confirm Swap Space Size
This example shows one way to analyze the physical memory on a system board in a SPARC Enterprise M4000/M5000/M8000/M9000 server from Oracle and Fujitsu to determine whether the system has enough swap space to support deletion of a board. It explains how to collect and analyze information using the showdevi
87. figure are explained below. The actual status after hardware replacement may not match the indicated status. For the flow of system board addition processing or deletion processing and the related system board status, see Section 2.4.3.1, "Flowchart: Adding a System Board," on page 2-21 or Section 2.4.3.2, "Flowchart: Deleting a System Board," on page 2-22, respectively. For details of hardware replacement operations, see the Service Manual for your server.
[FIGURE 2-8: Flow of System Board Replacement Processing. Stages shown: deletion process (deleting a system board), replacement process (hardware replacement and diagnosis), addition process (addition of system board). The board is handled either in DCL registration status (Assignment: assigned) or in the system board pool (Assignment: available), and returns with Test: passed after replacement.]
88. firm its registration in the DCL. If you need to change the PSB configuration, use the setupfru(8) command. If the system board to be added is not registered in the DCL, register the system board in the DCL of the target domain by using the setdcl(8) command.
    XSCF> showboards -a
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 SP       Available   y    n    n    Passed  Normal
4. Add the new system board. Execute the addboard(8) command to add the system board to the move destination domain.
    XSCF> addboard -c configure -d 0 01-0
5. Check the status of the domain and added system board. When the addboard(8) command ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the added system board. If the addboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, and then correct it.
    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
    XSCF> showboards -d 0
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 00(01)   Assigned    y    y    y    Passed  Normal
4.3 Example: Deleting a System Board
This section prov
89. he deleteboard(8) command with the -f option specified, a serious problem may occur in a process bound to the CPU or accessing an I/O device. For this reason, you should avoid using the -f option in normal DR operations. If you must use the deleteboard(8) command with the -f option specified, be sure to check the status of the domain and application processes before and after execution.
Note 4: To execute the addboard(8) command to add a system board by DR, the system board must already be registered in the DCL. Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command. To replace hardware, you must use the deleteboard(8) command to set the system board to the state where it is assigned to the domain or to the state where it is placed in the system board pool.
3.1.10 Reserving a Domain Configuration Change
Use the addboard(8), deleteboard(8), or moveboard(8) command to reserve a domain configuration change. A domain configuration change is reserved when a system board cannot be added, deleted, or moved immediately for operational reasons. The reserved addition, deletion, or move of the system board is executed when the power of the target domain is turned on or off or the domain is rebooted. If a system board
90. he DR functions classify system boards by memory usage into two types:
  Kernel memory board
  User memory board
Kernel Memory Board
A kernel memory board is a system board on which kernel memory (memory internally used by the Oracle Solaris OS and containing an OpenBoot PROM program) is loaded. Kernel memory cannot be removed from the system, but the location of kernel memory can be controlled, and kernel memory can be copied from one board to another.
  To control whether a system board contains kernel memory, use one or more of the following features, which are described below: kernel cage, floating boards, and kernel memory assignment.
  To copy kernel memory from one board to another, use the Copy-rename operation. Copy-rename makes it possible for you to perform DR operations on kernel memory boards.
Kernel Cage
The kernel cage function must be in use for DR operations on memory to succeed. Without the kernel cage, kernel memory could be assigned to all system boards, making it impossible to perform DR operations on memory. With the kernel cage, kernel memory is limited to a minimum set of system boards. For details on enabling this function, see Section 2.3.2, "Settings of Kernel Cage Memory," on page 2-17.
Floating Boards
A floating board is a system board that is designated to be moved easily to another domain. In general, kernel memory is not ass
91. her the system board status permits DR operations and to confirm the domain ID of the domain to which the target system board belongs. The showboards(8) command is also used after a DR operation to confirm system board status. To change domain settings or register a system board in the DCL, use the setdcl(8) command. To change PSB settings, use the setupfru(8) command. The following examples show the format and options of the showboards(8) command.
    showboards [-v] -a [-c sp]
    showboards [-v] -d domain_id [-c sp]
    showboards [-v] xsb
    showboards -h
TABLE 3-7 Options of the showboards Command
  -v: Displays detailed information about the system board.
  -a: Displays information about all mounted system boards.
  -h: Displays the usage information.
  -d domain_id: Displays information about the specified domain, where domain_id is the domain number, possibly 0 to 23 depending on your server. Only one domain ID can be specified.
  xsb: Displays information about the specified XSB. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on your server.
  -c sp: Displays information about system boards in the system board pool.
The table below lists the items displayed by the showboards(8) command.
TABLE 3-8 Items of System Board Information to be Displayed
Display items: XSB, R, DID(LSB), Assignment, Pwr, Conn, Conf
Description: Syst
92. his appendix explains the meaning and handling of DR-related messages. This appendix includes these sections:
  Section A.1, "Oracle Solaris OS Messages," on page A-1
  Section A.2, "Command Messages," on page A-24
A.1 Oracle Solaris OS Messages
This section explains the console messages printed by the DR driver. The output for messages that do not have an output field is the console.
A.1.1 Transition Messages
DR: PROM detach board X
  Explanation: Detach system board X.
OS configure dr@0:SBX::cpuY
  Explanation: Configure CPU Y on system board X.
OS configure dr@0:SBX::memory
  Explanation: Configure memory on system board X.
OS configure dr@0:SBX::pciY
  Explanation: Configure PCI Y on system board X.
OS unconfigure dr@0:SBX::cpuY
  Explanation: Unconfigure CPU Y on system board X.
OS unconfigure dr@0:SBX::memory
  Explanation: Unconfigure memory on system board X.
OS unconfigure dr@0:SBX::pciY
  Explanation: Unconfigure PCI Y on system board X.
suspending <device name> <device info> (aka <alias>)
  Explanation: Suspending the device.
suspending <device name> <device info>
  Explanation: Suspending the device.
resuming <device name> <device info> (aka <alias>)
  Explanation: Resuming the device.
resuming <device name> <device info>
  Explanation: Resuming the device.
DR: resuming kernel daemons
  Explanation: Resum
93. ications and releasing devices from applications. For details of how to register RCM scripts and script execution timing, see the Oracle Solaris man page for rcmscript(4).
Note 1: An RCM script can only automate actions performed to prepare for the deletion of a system board. When a system board is added to a domain, any actions required for use of the added resources must be performed manually.
Note 2: You should test the RCM scripts you create for DR before executing the DR operations. The RCM scripts may not be able to execute certain processing.
CHAPTER 4 Practical Examples of DR
This chapter provides examples of DR operations such as the addition, deletion, move, and replacement of system boards. Each example shows an operation procedure using the command-line interface of the XSCF shell. Similar procedures can also be applied to DR operations using the browser-based interface of the XSCF Web. Note that the sections below explain only procedures such as those for checking the status of parts and devices for DR operations, and not hardware operations (e.g., installing, removing, and replacing system boards). See the Service Manual for your server as needed.
Note: If your server is configured with SPARC64 VII processors, some restrictions regarding DR might apply. Ple
94. ides an example of an operation to delete a system board from a domain. In the example, a procedure conforming to Section 4.1.2, "Flow: Deleting a System Board," on page 4-4 is used, and the system board shown in the figure is deleted using the XSCF shell.
[FIGURE 4-6: Example: Deleting a System Board]
1. Login to XSCF.
2. Check the status of the domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.
    XSCF> showdcl -d 0
    DID   LSB   XSB   Status
    00                Running
          00    00-0
          01    01-0
3. Check the status of the system board to be deleted. Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted.
    XSCF> showboards -a
    XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
    00-0 00(00)   Assigned    y    y    y    Passed  Normal
    01-0 00(01)   Assigned    y    y    y    Passed  Normal
4. Delete the system board. Execute the deleteboard(8) command to delete the system board and place it in the system board pool.
    XSCF> deleteboard -c unassign 01-0
5. Check the status of the domain and deleted system board. When the deleteboard(8) comman
95. ified LSB has already been registered in DCL
  Remedy: Confirm the domain, LSB, and XSB. Set up the data correctly.
LSB 00 has not been registered in DCL yet
  Explanation: The domain and LSB were not set up when the DCL setting for no-mem, no-io, or floating board was changed.
  Remedy: Set up the domain and LSB. Try again.
DomainID X does not exist
  Explanation: No LSB was set up on the domain when the DCL setting for configuration policy was changed.
  Remedy: Set up the domain and LSB. Try again.
Invalid parameter
  Explanation: There is an error in the specified argument or operand.
  Remedy: Confirm the specified argument or operand and execute the command once again.
Permission denied
  Explanation: The user does not have the required privilege.
  Remedy: Confirm the user privilege and the command privilege. On high-end servers, also confirm whether the command was executed on the standby XSCF.
An internal error has occurred. Please contact your system administrator.
  Explanation: DR failed, possibly because of an internal error in XSCF.
  Remedy: Identify the cause of the DR failure by referring to the monitoring messages and error log. Please also confirm the XCP version.
A.2.5 setupfru
SB XX is currently in use
  Explanation: Because the system board of the PSB is running on the domain or is assigned, the PSB configuration cannot be changed.
  Remedy: Please
96. igned to a floating board unless absolutely necessary. However, kernel memory can be assigned to a floating board when one of the following is true:
  The total amount of space available among non-floating boards is not enough to hold the kernel memory.
  The deleteboard(8) command is used with its -f (force) option.
For details on enabling the floating board option for a system board, see Section 2.2.2.2, "Floating Board Option," on page 2-14. For further details, also see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide or the setdcl(8) man page.
Kernel Memory Assignment
When a domain is powered on, the Power-On Self-Test (POST) initially assigns an address space to each system board in that domain. The order in which address spaces are assigned depends on the LSB number and floating board option of each system board. The first address spaces are assigned to non-floating boards, in ascending order of LSB number. Then additional address spaces are assigned to floating boards, again in ascending order of their LSB numbers. When the kernel cage is enabled, kernel memory is assigned to system boards in the order of their address spaces. The kernel cage begins in the first address space, which initially corresponds to the non-floating board with the lowest LSB number. If the kernel requires more memory, then the kerne
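The address-space ordering described in this excerpt (non-floating boards first, then floating boards, each group in ascending LSB order) can be sketched in Python. This is an illustration of the stated rule only, not the POST implementation; the `Board` type and `address_space_order` function are hypothetical names, and the sample boards are made up for the example.

```python
from typing import NamedTuple


class Board(NamedTuple):
    xsb: str        # system board number, for example "01-0"
    lsb: int        # logical system board number within the domain
    floating: bool  # True if the floating board option is enabled


def address_space_order(boards):
    """Order boards as the excerpt describes address-space assignment.

    Non-floating boards sort before floating boards (False < True),
    and each group is sorted by ascending LSB number.
    """
    return sorted(boards, key=lambda b: (b.floating, b.lsb))


# Made-up domain: LSB 0 and LSB 3 are floating boards.
domain = [
    Board("00-0", 0, True),
    Board("01-0", 1, False),
    Board("01-1", 2, False),
    Board("02-0", 3, True),
]
ordered = address_space_order(domain)
first_cage_board = ordered[0]  # where the kernel cage would begin
```

In this made-up domain the kernel cage would begin on XSB 01-0, the non-floating board with the lowest LSB number, even though a floating board holds LSB 0.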
97. iled to start CPU Y on system board X.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Failed to stop CPU: dr@0:SBX::cpuY
  Explanation: Failed to stop CPU Y on system board X.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Firmware deprobe failed: SBX::cpuY
  Explanation: Failed to deprobe the CPU.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Firmware probe failed: SBX
  Explanation: Failed to probe the board.
  Remedy: Respond in the manner directed by the other message.
  Output: Console and Standard Output
Insufficient memory: dr@0:SBX::memory
  Explanation: A lack of memory resources was detected.
  Remedy: Check the size of memory, detach the board, and attach it again. If the problem still exists, please contact customer service.
  Output: Console and Standard Output
Internal error: dr.c
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Internal error: dr_mem.c
  Explanation: There may be inconsistency in the system.
  Remedy: Please contact customer service.
  Output: Console and Standard Output
Invalid argument: dr@0:SBX::memory
  Explanation: The memory board X is currently involved in another DR operation and cannot be detached.
  Remedy: R
98. ination domain and moved system board. Execute the showdcl(8) command to check the operation status of the move destination domain, and then execute the showboards(8) command to check the status of the system board and confirm that addition of the system board has been reserved in the move destination domain.
XSCF> showdcl -d 0
DID LSB XSB  Status
00           Running
    00  00-0
    01  00-1
    02  01-0
XSCF> showboards 01-0
XSB DID(LSB) Assignment Pwr Conn Conf Test Fault
9. Add the system board to the move destination domain. Execute the addboard(8) command to add the system board to the move destination domain. If the move destination domain is in stopped status, the system board will be added the next time the domain is booted.
XSCF> addboard -c configure -d 0 01-0
10. Check the status of the move destination domain and moved system board. Execute the showdcl(8) command to check the operation status of the move destination domain, and then execute the showboards(8) command to check the status of the moved system board.
XSCF> showdcl -d 0
DID LSB XSB  Status
00           Running
    00  00-0
    01  00-1
    02  01-0
XSCF> showboards 01-0
XSB  DID(LSB) Assignment Pwr Conn Conf Test   Fault
01-0 00(02)   Assigned   y   y    y    Passed Normal
APPENDIX A Message Meaning and Handling T
99. ing kernel daemons
DR: resuming user threads
Explanation: Resuming user threads.
DR: suspending user threads
Explanation: Suspending user threads.
DR: resume COMPLETED
Explanation: DR resume operation completed.
DR: checking devices
Explanation: Checking whether any DR-unsafe device drivers are loaded.
DR: dr_suspend invoked with force flag
Explanation: User command requests DR operation without checking for unsafe conditions.
DR: suspending drivers
Explanation: Suspending device drivers.
DR: in-kernel unprobe board <board>
Explanation: Unprobing the board.
A.1.2 PANIC Messages
URGENT_ERROR_TRAP is detected during FMA
Explanation: A fatal hardware error was encountered during copy-rename. Remedy: Please contact customer service.
Failed to remove CMP X LSB NN
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
drmach_copy_rename_fini: invalid op code <opcode>
Explanation: An internal error happened during kernel migration. Remedy: Please contact customer service.
Cannot locate source or target board
Explanation: Cannot locate source or target board during kernel migration. Remedy: Please contact customer service.
Could not update device nodes
Explanation: Could not update device nodes during kernel migration. Remedy: Please conta
100. ing monitoring message and error log. Replace the failure component.
An internal error has occurred. Please contact your system administrator.
Explanation: DR failed. There is a possibility that DR failed because of an internal error in XSCF. Remedy: Find out the cause of the DR failure by referring to the monitoring message and error log. Please also confirm the XCP version.
Timeout detected during self-test of XSB#XX-X.
Explanation: A timeout occurred because the hardware diagnosis in DR did not complete. There is a possibility that a hardware error occurred. Remedy: Find out the cause of the DR failure by referring to the monitoring message and error log. Replace the failure component.
deleteboard
XSB#XX-X will be unassigned from domain immediately. Continue? [y|n]
Explanation: Confirming whether the DR operation is going to be executed or not. Input y to execute it and n to stop it.
XSB#XX-X will be unconfigured from domain immediately. Continue? [y|n]
Explanation: Confirming whether the DR operation is going to be executed or not. Input y to execute it and n to stop it.
XSB#XX-X will be unassigned from domain after the domain restarts. Continue? [y|n]
Explanation: Confirming whether the DR operation is going to be executed or not. Input y to execute it and n to stop it.
DR operation canceled by operator.
Explanation: DR operation canceled
101. ion specified, be sure to check the status of the domain and application processes.
Note: You can execute the deleteboard(8) command on a domain that is not running. When the domain is running, the deleteboard(8) command with -c disconnect or -c unassign will succeed only if the following Oracle Solaris Service Management Facility (SMF) services are active on that domain: Domain SP Communication Protocol (dscp), Domain Configuration Server (dcs), and Oracle Sun Cryptographic Key Management Daemon (sckmd).
Moving a System Board
Use the moveboard(8) command to delete a system board from the move source domain and add it to the move destination domain, assign it to the move destination domain, or reserve it to be moved later. To execute the moveboard(8) command, the system board must have been configured in or assigned to the move source domain and must be registered in the DCL for the move destination domain. Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command. Before executing the moveboard(8) command, check the status of the move source and move destination domains and the move target system board, and the device usage status on the system board. You must determine whether you can perform the DR operation according to the status of the domains and system board and the device usage stat
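The SMF prerequisite in the note above amounts to a set-membership check. The sketch below assumes the caller has already collected the short names of the active services on the domain (for example from the domain's service listing); `missing_dr_services` is a hypothetical helper, and only the three service names come from the text.

```python
# Short names of the SMF services that the note lists as required.
REQUIRED_SMF_SERVICES = ("dscp", "dcs", "sckmd")


def missing_dr_services(active_services):
    """Return the required services that are not in active_services.

    active_services: iterable of short service names that are active
    on the running domain (how they are gathered is left to the caller).
    """
    active = set(active_services)
    return [name for name in REQUIRED_SMF_SERVICES if name not in active]


missing = missing_dr_services(["dscp", "dcs"])
if missing:
    print("deleteboard on a running domain is expected to fail; inactive:", missing)
```

An empty result means the service precondition is met; it does not by itself mean that the DR operation will succeed.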
102. ions regarding DR might apply. Please see Section 2.5.9, "SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes".
XSCF shell commands for DR operations are classified into two types: DR display commands and DR operation commands.
TABLE 3-1 DR Display Commands (command name: function)
showdcl: Display the DCL and domain status.
showdomainstatus: Display domain status.
showboards: Display system board information.
showdevices: Display information about the CPUs, memory, and I/O devices on system boards.
showfru: Display PSB configuration information.
TABLE 3-2 DR Operation Commands (command name: function)
setdcl: Update and edit the DCL.
setupfru: Set the division type and memory mirror mode for a PSB.
addboard: Add a system board to a domain.
deleteboard: Delete a system board from a domain.
moveboard: Move a system board between domains.
The sections below describe the DR display and DR operation commands in detail and show examples. For details of the options, operands, and usage of these commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
Note: Use of the user interfaces with XSCF shell and XSCF Web is restricted to selected administrators and requires administrator privileges for DR operations. When system boards are shared by multiple administrators, the administrators must carefully prepare and plan secure DR operati
103. is placed in the system board pool, a domain configuration change can be reserved to assign the system board to the intended domain in advance, preventing the system board from being acquired by another domain. To reserve the addition of a system board to a domain, use the addboard(8) command with the -c reserve option specified. The system board will be added to the domain when the domain power is turned on, the domain is rebooted, or the next time the addboard(8) command with the -c configure option specified is executed. For details about the addboard(8) command, see Section 3.1.6, "Adding a System Board". To reserve the deletion of a system board from a domain, use the deleteboard(8) command with the -c reserve option specified. The system board will be deleted from the domain when the domain power is turned off, the domain is rebooted, or the next time the deleteboard(8) command with the -c disconnect or -c unassign option specified is executed. For details about the deleteboard(8) command, see Section 3.1.7, "Deleting a System Board". To reserve the move of a system board from one domain to another domain, use the moveboard(8) command with the -c reserve option specified. The system board will be deleted from the move source domain and moved to the move destination domain when the power of the move source domain is turned off, the move destination domain is rebooted, or the next time the moveboard(8) command with the -c
104. l cage expands to the next address space, which initially corresponds to the non-floating board with the next lowest LSB number, and so on. The kernel cage extends into the address spaces of floating boards only if kernel memory is too large to fit in the address spaces of the non-floating boards.
Note: During a copy-rename operation, the address spaces initially assigned by POST are exchanged between system boards. The effects of this process persist through reboots of a domain. Therefore, kernel memory may be assigned in a seemingly different order until the domain has gone through a full poweroff(8) and poweron(8) cycle, as this pair of operations cancels the effects of copy-rename operations. For details on assigning LSB numbers to system boards, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide or the setdcl(8) man page.
Copy-rename
Kernel memory itself cannot be removed, but it can be transferred to another system board. A DR operation to delete a kernel memory board must first perform this transfer, which is called a copy-rename operation. The Oracle Solaris OS selects the target for the copy-rename operation from among the available user memory boards. The following selection and preference criteria are in effect: the copy destination board must not yet contain any kernel memory (it must be a user memory board); the copy destination board must not be a floating board unless
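The way the kernel cage grows across address spaces can be illustrated with a small, simplified model. It assumes the boards are already listed in address-space order and that the cage may use a board's whole memory, which is a simplification made here for illustration; `kernel_cage_span` and its tuple layout are hypothetical, and the model ignores the effects of earlier copy-rename operations.

```python
def kernel_cage_span(boards_in_order, kernel_mb):
    """Return the LSB numbers that the kernel cage would touch.

    boards_in_order: list of (lsb, mem_mb) tuples, already sorted in
        address-space order (non-floating boards first).
    kernel_mb: amount of kernel memory to place, in MB.
    """
    span = []
    remaining = kernel_mb
    for lsb, mem_mb in boards_in_order:
        if remaining <= 0:
            break  # the cage already fits in the boards visited so far
        span.append(lsb)
        remaining -= mem_mb
    return span


# LSB 0 and 2 are non-floating, LSB 1 is floating and therefore last.
ordered = [(0, 8192), (2, 8192), (1, 8192)]
print(kernel_cage_span(ordered, 4096))
print(kernel_cage_span(ordered, 20000))
```

In this model the floating board (LSB 1) is reached only when the two non-floating boards cannot hold the requested amount.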
105. m board number; instance name and number of I/O device; management resource name; description of resource usage; results of estimation with an offline query; description of resource usage and reason for the results of estimation with an offline query. The following example shows a display by the showdevices(8) command.
Example: Display of device information on XSB 00-0
XSCF> showdevices 00-0
CPUs:
DID XSB  id state   speed ecache
00  00-0 0  on-line 2048  4
00  00-0 2  on-line 2048  4
Memory:
         board  perm   base               domain target deleted remaining
DID XSB  mem MB mem MB address            mem MB XSB    mem MB  mem MB
00  00-0 8192   2048   0x000003c000000000 65536
I/O Devices:
DID XSB  device resource           usage
00  00-0 sd0    /dev/dsk/c0t0d0s0  mounted filesystem
00  00-0 sd0    /dev/dsk/c0t0d0s1  swap area
00  00-0 sd0    /dev/dsk/c0t0d0s1  dump device (swap)
00  00-0 bge0   SUNW_network/bge0  bge0 hosts IP addresses
3.1.5 Displaying System Board Configuration Information
Use the showfru(8) command to display system board configuration information. The showfru(8) command displays information about the PSB division type and memory mirroring mode settings in list format. To change the PSB configuration, use the setupfru(8) command. The following examples show the format and options of the showfru(8)
106. mem unit X Y deleted memory still found in phys_install
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_release_mem_done: unexpected kphysm_del_release return value
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_reserve_mem_spans: memory reserve failed. Unexpected kphysm_del_span return value basepfn npages
Explanation: The selected target board can no longer fit all the kernel memory of the source board since it was last selected. Remedy: Please repeat the action. If the problem remains, please contact customer service.
dr_release_mem_done: <device path> error <error code> noted
Explanation: An error was noted for a device while releasing memory. Remedy: Please contact customer service.
drmach_log_sysevent failed rv for SBX
Explanation: There may be a minor error in the system. Remedy: Please contact customer service.
unexpected kcage_range_add return value
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
unexpected kcage_range_delete return value
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_select_mem_target: no memlist for mem unit X board Y
Explanation: Detected inconsistency of the memory unit information in the DR driver
107. move source and move destination domains, determine whether to perform the DR operation or change the domain configuration.
XSCF> showdcl -d 0
DID LSB XSB  Status
00           Running
    00  00-0
    01  00-1
    02  01-0
4. Check the status of the system board to be moved. Execute the showboards(8) command to display system board information, and then check the status of the system board to be moved.
XSCF> showboards 01-0
XSB DID(LSB) Assignment Pwr Conn Conf Test
5. Reserve the move of the system board. Execute the moveboard(8) command to reserve deletion of the system board from the move source domain and addition of the system board to the move destination domain.
XSCF> moveboard -c reserve -d 0 01-0
6. Check the reserved status of the system board. Execute the showboards(8) command with the -v option specified to display system board information, and confirm that moving the system board to the move destination domain has been reserved.
XSCF> showboards -v 01-0
R DID(LSB) Assignment Conn Conf Test
7. Stop the move source domain. This operation executes the reserved deletion of the system board from the move source domain as a change in domain configuration, and the reservation of the addition of the system board to the move destination domain.
8. Check the status of the move dest
109. n failed for the fcode interpreter. Remedy: If DR failed after this message, please contact customer service.
opl_get_fcode: Unable to copy out fcode image
Explanation: Failed to copy out the fcode image to the efcode daemon. Remedy: If DR failed after this message, please contact customer service.
opl_get_hwd_va: Unable to copy out cmuch descriptor for <addr>
Explanation: Failed to copy out the cmuch HWD to the efcode daemon. Remedy: If DR failed after this message, please contact customer service.
opl_get_hwd_va: Unable to copy out pcich descriptor for <addr>
Explanation: Failed to copy out the pcich HWD to the efcode daemon. Remedy: If DR failed after this message, please contact customer service.
IKP: create leaf <board> <channel> <leaf> failed
Explanation: A device node was not created for a PCI device. Remedy: If DR failed after this message, please contact customer service.
IKP: Unable to probe PCI leaf <board> <channel> <leaf>
Explanation: The fcode interpreter returned a bad status for the probe. Remedy: If DR failed after this message, please contact customer service.
IKP: Unable to bind PCI leaf <board> <channel> <leaf>
Explanation: The driver binding fails after the leaf has been probed. Remedy: If DR failed after this message, please contact customer service.
110. n if configuration conditions have been met.
System Board Pool Function
The system board pooling function places a specific system board in a state where that board does not belong to any domain. This function can be effectively used to move a system board among multiple domains as needed. For example, a system board can be added from the system board pool to a domain where the CPU or memory is under high load. When the added system board becomes unnecessary, the system board can be returned to the system board pool. All system boards that are targets of DR operations must be registered in the target domain's Domain Component List (DCL). A domain's DCL, managed by XSCF, is a list of system boards that are, or are to be, attached to that domain. The DCL of each domain contains not only information about registered system boards but also domain information and option information for each system board. Moreover, a pooled system board can be assigned to a domain only when it is registered in that domain's DCL. Pooled system boards must be properly managed. You can add and delete system boards by combining the system board pooling function with the floating board, omit-memory, and omit-I/O options described in Section 2.2, "Conditions and Settings Using XSCF".
2.1.4 Checklists for System Configuration
This section describes the prerequisites a
111. n when the power supply of the domain is turned off.
TABLE 2-1 Unit of Degradation (value: unit of degradation)
FRU: Hardware is degraded in units of components such as CPU and memory.
XSB: Hardware is degraded in units of system boards (XSB).
System: Hardware is degraded in units of domains, or the relevant domain is stopped without degradation.
Floating Board Option
The floating board option controls kernel memory allocation. Upon deletion of a system board on which kernel memory is loaded, the OS is temporarily suspended. The suspended status affects job processes and may disable DR operations. To avoid this problem, use the floating board option to set the priority of kernel loading into the memory of each system board, which increases the likelihood of successful DR operations. To move a system board among multiple domains, this option can be enabled for the system board to facilitate the move. The value of this option is true to enable the floating board setting or false to disable it. The default is false. A system board with true set for this option is called a floating board. A system board with false set for this option is called a non-floating board. Kernel memory is allocated to the non-floating boards in a domain by priority, in ascending order of LSB n
112. ncy in the system. Remedy: Please contact customer service.
dr_copyout_iocmd: 32bit failed to copyout sbdcmd struct
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_copyout_iocmd: failed to copyout sbdcmd struct
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_status: failed to copyout status for board
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_status: unknown dev type
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service.
dr_dev2devset: invalid cpu unit
Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system. Remedy: Repeat the action. If this error message appears again, please contact customer service.
dr_dev2devset: invalid io unit
Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system. Remedy: Repeat the action. If this error message appears again, please contact customer service.
dr_dev2devset: invalid mem unit
Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system. Remedy: Repeat the action. If this error message appears again, please contact customer service.
dr_exec_op: unknown command
113. nd the checklists for configuring the system for DR.
1. Redundant Configuration of I/O Devices. Before a system board can be replaced, any I/O device connected to that board must be temporarily disconnected. You should use redundant configuration software to prevent any problem, caused by disconnection of an I/O device, that would affect a job process. You should also confirm that the driver and software support DR before performing a DR operation.
2. Selection of PCI Cards Supporting DR. All PCI cards and I/O device interfaces on a system board must support DR. If not, you cannot execute DR operations on that system board. You must turn off the power supply to the domain before performing maintenance and installation.
3. Confirmation of DR Compliance of Drivers and Other Software. You must confirm that all I/O device drivers and software installed in the system support DR and allow the I/O device operations of DR. You should also apply the latest patches to the drivers and other software before performing DR.
4. Allocation of Sufficient Memory and Distributed Swap Areas. You must allocate sufficient memory resources to be used when the memory on a system board is disconnected. Performing a DR operation with a high load already applied to memory may significantly lower job process performance and DR operability.
5. Consideration of Hardware Configuration an
114. nt list (hereinafter called DCL).
Release: To delete a registered system board from the DCL.
Assign: To assign a system board to a domain.
Unassign: To release a system board from a domain.
Connect: To connect a system board to a domain.
Disconnect: To disconnect a system board from a domain.
Configure: To configure a system board in the Oracle Solaris OS.
Unconfigure: To unconfigure a system board in the Oracle Solaris OS.
Reserve: To reserve a system board such that it is assigned to or unassigned from a domain on the next reboot or power cycle.
Install: To insert a system board into a system.
Remove: To remove a system board from a system.
Replace: To remove a system board and then mount it or a new system board, for system maintenance and inspection.
TABLE 1-2 Terms Related to Hardware Configurations (term: definition)
CPU/Memory board unit (CMU): Unit equipped with a CPU module and memory. (High-end servers only.)
Motherboard Unit (MBU): Unit for midrange servers. A CMU is mounted on this board. (Midrange servers only.)
I/O unit (IOU): Unit equipped with a PCI card and a disk drive unit.
Physical System Board (PSB): The PSB is made up of physical parts and can include 1 CMU and 1 IOU, or just 1 CMU. In midrange servers, the CMU is mounted on a MBU. A PSB also can be used to describe a physical unit for addition deletion exchange of ha
115. o available memory target: dr@0:SBX::memory
Explanation: The system board cannot be detached because it contains kernel memory and there is no available target memory board. Remedy: Add a new system board and then try the detach operation again. Output: Console and Standard Output.
Unsafe driver present: <driver name major>
Explanation: The DR driver found DR-unsafe drivers in the system. Remedy: Unload the unsafe drivers and try the DR operation again. Output: Console and Standard Output.
Device failed to resume: <driver name major>
Explanation: Devices on the list failed to resume. Remedy: Please contact customer service. Output: Console and Standard Output.
Device failed to suspend: <driver name major>
Explanation: Devices on the list failed to suspend. Remedy: Please contact customer service. Output: Console and Standard Output.
Operation not supported: ERROR
Explanation: Invalid operation. Remedy: Repeat the action. If this error message appears again, please contact customer service. Output: Console and Standard Output.
Cannot setup resource map opl-fcodemem
Explanation: Resource memory mapping cannot be set up. Remedy: Please contact customer service.
opl_cfg failed to load, error <errno>
Explanation: The opl_cfg module failed to load. Remedy: Please contact customer service.
IKP
116. of the following CPU operational modes:
SPARC64 VI Compatible Mode: All processors in the domain behave like, and are treated by the Oracle Solaris OS as, SPARC64 VI processors. The extended capabilities of SPARC64 VII+ and SPARC64 VII processors are not available in this mode. Domains 1 and 2 in FIGURE 2-9 correspond to this mode.
SPARC64 VII Enhanced Mode: All boards in the domain must contain only SPARC64 VII+ or SPARC64 VII processors. In this mode, the server utilizes the extended capabilities of these processors. Domain 0 in FIGURE 2-9 corresponds to this mode.
To check the CPU operational mode, execute the prtdiag(1M) command on the Oracle Solaris OS. If the domain is in SPARC64 VII Enhanced Mode, the output will display SPARC64-VII on the System Processor Mode line. If the domain is in SPARC64 VI Compatible Mode, nothing is displayed on that line. By default, the Oracle Solaris OS automatically sets a domain's CPU operational mode each time the domain is booted, based on the types of processors it contains. It does this when the cpumode variable, which can be viewed or changed by using the setdomainmode(8) command, is set to auto. You can override the above process by using the setdomainmode(8) command to change the cpumode from auto to compatible, which forces the OS to set the CPU operational mode to SPARC64 VI Compatible Mode on reboot. To do so, power off the dom
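The mode selection described above can be summarized as a small decision function. This is a sketch of the documented behavior, not the OS implementation: `cpu_operational_mode` is a hypothetical name, the processor type strings are illustrative labels, and treating any domain containing a SPARC64 VI processor as Compatible Mode under auto is an assumption drawn from the requirement that Enhanced Mode domains contain only SPARC64 VII+ or SPARC64 VII processors.

```python
COMPATIBLE = "SPARC64 VI Compatible Mode"
ENHANCED = "SPARC64 VII Enhanced Mode"

# Processor types that qualify a domain for Enhanced Mode.
ENHANCED_CAPABLE = {"SPARC64 VII", "SPARC64 VII+"}


def cpu_operational_mode(processor_types, cpumode="auto"):
    """Return the CPU operational mode expected at the next boot.

    processor_types: iterable of processor type labels in the domain.
    cpumode: "auto" (default) or "compatible".
    """
    if cpumode not in ("auto", "compatible"):
        raise ValueError("cpumode must be 'auto' or 'compatible'")
    if cpumode == "compatible":
        return COMPATIBLE  # forced, regardless of the processors present
    types = list(processor_types)
    if types and all(t in ENHANCED_CAPABLE for t in types):
        return ENHANCED
    return COMPATIBLE


print(cpu_operational_mode(["SPARC64 VII", "SPARC64 VII+"]))
print(cpu_operational_mode(["SPARC64 VII", "SPARC64 VI"]))
```

The model makes the DR consequence easy to see: a domain forced to compatible stays in Compatible Mode whatever boards it contains.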
117. olaris OS
TABLE 3-8 Items of System Board Information to be Displayed (Continued) (display item: description)
Test: Diagnostic status of system board.
  Unmount: The system board is not mounted or cannot be recognized because it is faulty.
  Unknown: The system board is not being diagnosed.
  Testing: The system board is being tested.
  Passed: The system board was tested and passed.
  Failed: The system board was tested and failed. The system board cannot be used or has been degraded.
Fault: Normal/abnormal status of system board.
  Normal: Normal.
  Degraded: Components have been degraded, but the system board is operating. Degraded here means that a component included in the corresponding system board is faulty.
  Failed: The system board cannot operate because of an error.
COD: Indication of whether the system board is a COD board.
  n: The board is not a COD board.
  y: The board is a COD board.
The following examples show displays of the showboards(8) command.
Example 1: Display of information on all system boards
XSCF> showboards -a
XSB  DID(LSB) Assignment Pwr Conn Conf Test    Fault
00-0 00(00)   Assigned   y   y    y    Passed  Normal
00-1 00(01)   Assigned   y   n    n    Passed  Degraded
00-2 SP       Available  y   n    n    Unknown Normal
00-3 01(15)   Assigned   y   y    y    Passed  Normal
Example 2: Display of detailed information on all system boards
XSCF> showboar
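When the status columns described above need to be checked by a script, the listing can be split into fields. The sketch below assumes the column layout shown in Example 1, with XSB numbers written as NN-N and the domain and LSB written as DID(LSB); that layout is reconstructed from a damaged extract and should be verified against real output. `parse_showboards` and `ROW` are hypothetical names.

```python
import re

# One data row: XSB, then either DID(LSB) or SP (system board pool),
# then Assignment, Pwr, Conn, Conf, Test, Fault.
ROW = re.compile(
    r"^(?P<xsb>\d{2}-\d)\s+"
    r"(?:(?P<did>\d{2})\((?P<lsb>\d{2})\)|(?P<pool>SP))\s+"
    r"(?P<assignment>\S+)\s+(?P<pwr>[yn])\s+(?P<conn>[yn])\s+(?P<conf>[yn])\s+"
    r"(?P<test>\S+)\s+(?P<fault>\S+)\s*$"
)


def parse_showboards(text):
    """Return one dict per board row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


SAMPLE = """\
XSB  DID(LSB) Assignment Pwr Conn Conf Test    Fault
00-0 00(00)   Assigned   y   y    y    Passed  Normal
00-1 00(01)   Assigned   y   n    n    Passed  Degraded
00-2 SP       Available  y   n    n    Unknown Normal
00-3 01(15)   Assigned   y   y    y    Passed  Normal
"""

boards = parse_showboards(SAMPLE)
pooled = [b["xsb"] for b in boards if b["pool"] == "SP"]
degraded = [b["xsb"] for b in boards if b["fault"] != "Normal"]
print(pooled, degraded)
```

Rows for pooled boards have no domain ID or LSB, so the `did` and `lsb` fields are `None` for them.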
118. [Flowchart residue: system board deletion, showing the transitions of Test, Assignment, and Connectivity status through deletion reservation, reboot of the domain, disconnection from the domain, and deletion from the DCL into the system board pool.]
2.4.3.3 Flowchart: Moving a System Board
The flow of DR operations and the transition of system board status when a system board has been moved or reserved for a move are described in the schematic flowchart below. Each system board status indicated in FIGURE 2-7 is the main status that is changed. For the flow of system board addition processing or deletion processing and the related system board status, see Section 2.4.3.1, "Flowchart: Adding a System Board", or Section 2.4.3.2, "Flowchart: Deleting a System Board", respectively.
FIGURE 2-7 Flow of System Board Move Processing (move process, move reservation process)
119. ons
Displaying Domain Information
The showdcl(8) command displays domain information, including the domain ID, configured system board numbers, and domain status, in list format. The showdcl(8) command is used before a DR operation to determine whether the domain status permits the DR operation and to confirm the registration of the DR target system board in the DCL. The showdcl(8) command is also used after a DR operation to confirm domain status and configuration. To change domain settings or register a system board in the DCL, use the setdcl(8) command. To change PSB settings, use the setupfru(8) command. The following examples show the format and specifiable options of the showdcl(8) command.
showdcl [-v] -a
showdcl [-v] -d domain_id [-l lsb [-l lsb] ...]
showdcl -h
TABLE 3-3 Options of the showdcl Command (option: description)
-a: Displays configuration information and status of all domains.
-v: Displays detailed domain configuration information.
-h: Displays usage information.
-d domain_id: Displays information about the specified domain, where domain_id is the domain number (possibly 0 to 23, depending on your server). Only one domain ID can be specified.
-l lsb: Displays information about the specified logical system board (LSB), numbered 00 to 15. For information about multiple LSBs, list board numbers separated by a space. Fo
120. or inconsistency in the system. Remedy: Please contact customer service.
Unable to detach last available TOD on board X
Explanation: Detaching the system board will result in detaching the last available Time of Day (TOD) clock. Remedy: Attach another system board before detaching.
Device in fatal state
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service. Output: Console and Standard Output.
I/O error: dr@0:SBX::memory
Explanation: There may be inconsistency in the system. Remedy: Please contact customer service. Output: Console and Standard Output.
Invalid argument
Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system. Remedy: Repeat the action. If this error message appears again, please contact customer service. Output: Console and Standard Output.
Invalid argument
Explanation: An invalid argument was passed to the driver. Remedy: Repeat the action. If this error message appears again, please contact customer service. Output: Console and Standard Output.
Invalid CPU/core state
Explanation: DR found a faulty CPU that fails to power on. Remedy: Please contact customer service.
No error
Explanation: An invalid argument was passed to the driver, or there may be inconsistency in the system. Remedy: Repeat the action. If this error message appears again, pl
121. ource domain and move it to the move destination domain. Do not use this option for normal DR operations. A faulty system board, or a system board where a fault is detected, will not be forcibly added to the destination domain.
-v: Displays messages about the progress of this DR operation. If this option is specified together with the -q option, the -v option is ignored.
-h: Displays the usage information.
TABLE 3-15 Options of the moveboard Command (Continued) (options listed: -c configure, -c assign, -c reserve, -d domain_id, xsb)
-c configure: Specifies that the command delete a system board from the move source domain and add it to the move destination domain. If no other -c option is specified, -c configure is the default. The move operation from the move source domain is performed when the domain power is off or the Oracle Solaris OS is running in the move source domain. However, if the domain power is off or the Oracle Solaris OS is not running in the move destination domain, the move operation from the move source domain is not performed, and DR processing terminates with an error.
-c assign: Specifies that the command delete a system board from the move source domain and assign it to the move destination domain. The assigned system board is added to the move destination domain when the addboard(8) command is executed in the move destination domain, the pow
122. pecified Displays usage information. The table below lists the items displayed by the showdomainstatus(8) command.
TABLE 3-6 Items of Domain Information to be Displayed
DID: Domain ID
Status: Domain status, one of the following:
  Powered Off: Domain power is off.
  Initialization Phase: POST processing or OpenBoot PROM initialization is in progress.
  OpenBoot Executing Completed: Initialization by OpenBoot PROM is completed.
  Booting/OpenBoot PROM prompt: Oracle Solaris OS is being booted, or, due to the domain shutdown or reset, the system is in the OpenBoot PROM running state or is suspended in the OpenBoot PROM (ok prompt) state.
  Running: Oracle Solaris OS is running.
  Shutdown Started: Oracle Solaris OS is being shut down.
  Panic State: Oracle Solaris OS panic occurred.
The following example shows a display of the showdomainstatus(8) command.
Example: Display of information on all domains
XSCF> showdomainstatus
DID 00 01 02 03
Domain Status Running Powered Off Running
3.1.3 Displaying System Board Information
The showboards(8) command displays system board information, including the domain ID of the domain to which the target system board belongs and various kinds of system board status, in list format. Use the showboards(8) command before a DR operation to determine whet
123. property value. Remedy: Please contact customer service.
DR IKP initialization failed
  Explanation: IKP initialization failed. Remedy: Please contact customer service.
I/O callback failed in pre-release
  Explanation: I/O callback failed in pre-release. Remedy: Please contact customer service.
I/O callback failed in post-attach
  Explanation: I/O callback failed in post-attach. Remedy: Please contact customer service.
Kernel Migration fails 0x x
  Explanation: Internal error happened during kernel migration. Remedy: Please contact customer service.
Failed to add CMP d on board d
  Explanation: CPU failed to power on during DR attach. Remedy: Please contact customer service.
FMEM error 0x<error code>
  Explanation: DR detects error during the copy-rename operation. Remedy: Please contact customer service.
Cannot proceed Board is configured or busy
  Explanation: Board cannot be disconnected because its status is busy. Remedy: Repeat the action. If the problem still exists, please contact customer service.
drmach parameter is not a valid ID
  Explanation: ID parameter for status command is not a valid ID. Remedy: Correct the format of the ID parameter.
drmach parameter is inappropriate for operation
  Explanation: Parameter(s) for DR command specified incorrectly. Remedy: Correct the parameter(s).
drmach_node_ddi_get_parent NULL dip SP
124. [Flowchart labels, continued from the preceding figure: the system board; Power on or restart of the domain; Addition processing of the system board; Change operation for the domain configuration]
4.1.2 Flow: Deleting a System Board
FIGURE 4-2 Flow: Deleting a System Board
[Flowchart labels: Stop status of a domain; Checking operation and selecting a DR operation; Operation status and configuration of a domain; Judgment of whether the DR operation can be performed; DR operation possible; DR operation not possible, or domain configuration to be changed; Checking the domain status (the domain is operating); Checking the status of the system board to be deleted; Checking the device status; Reserve operation for deleting a system board; Deletion operation for the system board; Power on or restart of the domain; Deletion processing of the system board; Change operation for the domain configuration]
Flow: Moving a System Board
FIGURE 4-3 Flow: Moving a System Board
[Flowchart labels: Checking operation and selecting a DR operation; Operation status and configuration of the move source domain; Operation status and configuration of the move destination domain; Judgment of whether the DR operation can be performed]
125. r licensed from the suppliers to Oracle and or its affiliates and Fujitsu Limited including software and font technology Per the terms of the GPL or LGPL a copy of the source code governed by the GPL or LGPL as applicable is available upon request by the End User Please contact Oracle and or its affiliates or Fujitsu Limited This distribution may include materials developed by third parties Parts of the product may be derived from Berkeley BSD systems licensed from the University of California UNIX is a registered trademark in the U S and in other countries exclusively licensed through X Open Company Ltd Oracle and Java are registered trademarks of Oracle and or its affiliates Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited All SPARC trademarks are used under license and are registered trademarks of SPARC International Inc in the U S and other countries Products bearing SPARC trademarks are based upon architectures developed by Oracle and or its affiliates SPARC64 is a trademark of SPARC International Inc used under license by Fujitsu Microelectronics Inc and Fujitsu Limited Other names may be trademarks of their respective owners United States Government Rights Commercial use U S Government users are subject to the standard government user license agreements of Oracle and or its affiliates and Fujitsu Limited and the applicable provisions of the FAR and its supplements Disclaimer The only
126. r example showdcl 1 00 1 01
TABLE 3-4 Items of Domain Information to be Displayed
DID: Domain ID
LSB: Logical system board number
XSB: System board number
Status: Domain status, one of the following:
  Powered Off: Domain power is off.
  Initialization Phase: POST processing or OpenBoot PROM initialization is in progress.
  OpenBoot Executing Completed: Initialization of OpenBoot PROM is completed.
  Running: Oracle Solaris OS is running.
  Shutdown Started: Oracle Solaris OS is being shut down.
  Panic State: Oracle Solaris OS panic occurred.
No-mem: Setting of the omit-memory option. true: Enabled (Oracle Solaris OS does not use memory). false: Disabled (Oracle Solaris OS uses memory).
No-IO: Setting of the omit-I/O option. true: Enabled (Oracle Solaris OS does not use I/O device). false: Disabled (Oracle Solaris OS uses I/O device).
Float: Setting of the floating board option. true: Enabled (board is designated as a floating board). false: Disabled (board is not designated as a floating board).
Cfg-policy: Setting of the configuration policy. FRU: Degradation in units of components. XSB: Degradation in units of XSB. System: Stopping of domain without degradation.
The table below lists the items displayed by the showdcl(8) command. 3.1.2
127. r your server
XSCF> replacefru
7. Check the status of the replaced system board. Execute the showboards(8) command to display system board information, and then check the status of all related system boards and confirm their registration in the DCL. If it is necessary to change the system board configuration (e.g., number of divisions), do so by using the setupfru(8) command. If the system board is not registered in the DCL, register it in the DCL for the target domain by using the setdcl(8) command.
XSCF> showboards 01-0
XSB DID LSB Assignment Pwr Conn Conf Test Fault
8. Check the status of the domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or reboot the domains.
XSCF> showdcl -d 0
DID LSB XSB Status
00 Running
00 00-0
01 01-0
9. Add the new system board to the domain. Execute the addboard(8) command to add the system board to the move destination domain.
XSCF> addboard -c configure -d 0 01-0
10. Check the status of the domain and added system board. When the addboard(8) command ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the a
128. A.2.4
Timeout detected during self test of XSB#XX-X
  Explanation: Because the hardware diagnosis in DR did not complete, a timeout occurred. There is a possibility that a hardware error occurred. Remedy: Find out the cause of the DR failure by referring to the monitoring message and error log. Replace the failed component.
XSB#XX-X will be assigned to DomainID X. Continue? [y|n]
  Explanation: Confirms whether the DR operation is to be executed. Input "y" to execute it and "n" to stop it.
XSB#XX-X will be configured into DomainID X. Continue? [y|n]
  Explanation: Confirms whether the DR operation is to be executed. Input "y" to execute it and "n" to stop it.
XSB#XX-X could not be configured into DomainID X due to operating system error
  Explanation: An error occurred in the DR library of the domain OS during the configuration process. The error occurred in configuration management of the domain OS. Remedy: Find out the cause of the DR failure by referring to the monitoring message and console message. Try again after resolving the cause.
setdcl
XSB is already assigned to an LSB in a running Domain DomainID X
  Explanation: The system board of the specified LSB has already been registered in DCL. Remedy: Power off the domain or move the XSB to the system board pool. Try again.
LSB 00 is already registered in DCL
  Explanation: The system board of the spec
129. rd Settings of Kernel Cage Memory
Kernel cage memory is a function used to minimize the number of system boards to which kernel memory is allocated. Kernel cage memory is enabled by default in the Oracle Solaris 10 OS. If the kernel cage is disabled, the system may run more efficiently, but kernel memory will be spread among all boards and DR operations will not work on memory.
To determine whether kernel cage memory is enabled after the system has been rebooted, check for the following message in the /var/adm/messages file:
NOTICE: DR kernel Cage is ENABLED
If the kernel cage is disabled, the message will be:
NOTICE: DR kernel Cage is DISABLED
In most cases the kernel cage should be enabled. However, you must consider actual operations before changing the setting. If you do not need to perform DR operations, you do not need to enable the kernel cage.
To enable kernel cage memory, remove or comment out the following setting in the /etc/system file:
set kernel_cage_enable=0
The OS must be rebooted to make the new setting effective.
2.3.3 Setting of Oracle Solaris Service Management Facility (SMF)
Certain DR operations succeed only when the following Oracle Solaris Service Management Facility (SMF) services are active on the domain:
- Domain SP Communication Protocol (dscp)
- Domain Configuration Server (dcs)
- Oracle Sun
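The kernel cage check described in excerpt 129 can be sketched in Python. This is a minimal illustration, not part of the manual: the function names are hypothetical, the message text is matched case-insensitively because its exact capitalization is not certain from the extracted source, and treating lines that start with `*` or `#` as comments in /etc/system is an assumption.

```python
import re
from typing import Optional

# Message text taken from the excerpt; matched case-insensitively (assumption).
_CAGE_RE = re.compile(r"DR kernel cage is (ENABLED|DISABLED)", re.IGNORECASE)
# The setting that disables the cage, as it would appear in /etc/system.
_SETTING_RE = re.compile(r"^set\s+kernel_cage_enable\s*=\s*0\b")


def kernel_cage_state(messages_text: str) -> Optional[bool]:
    """Return True/False for the most recent cage notice, or None if absent."""
    state = None
    for line in messages_text.splitlines():
        match = _CAGE_RE.search(line)
        if match:
            # Later lines override earlier ones, so the last boot wins.
            state = match.group(1).upper() == "ENABLED"
    return state


def cage_disabled_in_system_file(system_text: str) -> bool:
    """Return True if an uncommented 'set kernel_cage_enable=0' is present."""
    for line in system_text.splitlines():
        stripped = line.strip()
        # Assumption: '*' and '#' introduce comment lines in /etc/system.
        if stripped.startswith(("*", "#")):
            continue
        if _SETTING_RE.match(stripped):
            return True
    return False
```

In practice the two functions would be fed the contents of /var/adm/messages and /etc/system read on the domain; a reboot is still required before a changed setting takes effect.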
130. rd: Adds a system board into a domain.
deleteboard: Deletes a system board from a domain.
moveboard: Moves a system board between domains.
TABLE 3-18 DR-related Commands
poweron: Turns on the power of all domains or a specified domain.
poweroff: Turns off the power of all domains or a specified domain.
setdscp: Configures the DSCP network.
showdscp: Displays the DSCP network configuration.
addfru: Installs a Field Replaceable Unit (FRU).
deletefru: Removes a Field Replaceable Unit (FRU).
replacefru: Replaces a Field Replaceable Unit (FRU).
showhardconf: Displays all components mounted in the server.
showstatus: Lists degraded components.
showlog: Displays an error log, power log, event log, console log, panic log, IPL log, temperature/humidity log, and monitoring message log.
3.3 XSCF Web
XSCF Web lets you execute DR functions from a browser. XSCF Web is beyond the scope of this document. For details, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User's Guide.
3.4 RCM Script
Reconfiguration Coordination Manager (RCM) is a framework used to manage the dynamic disconnection of system components. RCM provides script functions that enable you to write your own scripts for dynamic reconfiguration. Using RCM scripts enables you to avoid complicated DR operations, e.g., stopping appl
131. rd Output
Invalid state transition dr 0 SBX cpuY
  Explanation: Invalid state transition of cpu Y on system board X. Remedy: Repeat the action. If the problem still exists, please contact customer service. Output: Console and Standard Output.
Invalid state transition dr 0 SBX memory
  Explanation: Invalid state transition of memory on system board X. Remedy: Repeat the action. If the problem still exists, please contact customer service. Output: Console and Standard Output.
Invalid state transition dr 0 SBX pciY
  Explanation: Invalid state transition of pci Y on system board X. Remedy: Repeat the action. If the problem still exists, please contact customer service. Output: Console and Standard Output.
No such device dr 0 SBX cpuY
  Explanation: There may be inconsistency in the system. Remedy: Please contact customer service. Output: Console and Standard Output.
Operation already in progress dr 0 SBX cpuY
  Explanation: The operation on cpu Y on system board X is in progress. Remedy: Repeat the action. If the problem still exists, please contact customer service. Output: Console and Standard Output.
dr_move_memory failed to quiesce OS for copy rename
  Explanation: There is a task not suspended in the process. Remedy: Repeat the action. If this error message appears again, please contact customer service. Output: Console and Standard Output.
N
132. rdware. The PSB can be used in one of two methods: one complete unit (undivided status) or divided into four subunits.
eXtended System Board (XSB): The XSB is made of physical parts. In the XSB, the PSB can be either one complete unit (undivided status) or divided into four subunits. The XSB is a unit used for domain construction and identification, and also can be used as a logical unit.
Logical System Board (LSB): A logical unit name assigned to an XSB. Each domain has its own set of LSB assignments. LSB numbers are used to control how resources such as kernel memory get allocated within domains.
TABLE 1-2 Terms Related to Hardware Configurations (Continued)
System board: The hardware resources of a PSB or an XSB. A system board is used to describe the hardware resources for operations such as domain construction and identification. In this manual, this refers to the XSB.
Uni-XSB: One of the division types of a PSB. Uni-XSB is a name for when a PSB is logically only one unit (undivided status). It is the default setting for the division type of a PSB. The division type can be changed by using the XSCF command setupfru(8). Uni-XSB may be used to describe a PSB division type or status.
Quad-XSB: One of the division types of a PSB. Quad-XSB is a name for when a PSB is logically divided into four parts. The division type can b
133. rocess is bound to the target CPU, you must unbind or stop the process.
- The CPU to be deleted does not belong to any processor set. If the target processor belongs to a processor set, you must delete the CPU from the processor set by using the psrset(1M) command.
- If the resource pools facility is in use by the domain, the CPU cannot be deleted unless the minimum processor set sizes can otherwise be maintained. Use the Oracle Solaris commands pooladm(1M) and poolcfg(1M) to check these parameters and, if necessary, adjust the sizes of the domain's resource pools.
Note: These conditions also apply to movement of a system board.
If any of the above conditions are not met, the DR operation is stopped and a message is displayed. However, if you specify the deleteboard(8) command with the -f (force) option, these protections are ignored and DR continues the deletion process.
Note: Exercise care when using the -f (force) option, as doing so introduces risk of domain failure.
To avoid this problem and automate the operations for CPUs, the Oracle Solaris OS provides the Reconfiguration and Coordination Manager (RCM) script function. For details of RCM, see Section 3.4, "RCM Script" on page 3-27.
For information about mixed configurations of SPARC64 VII+ or SPARC64 VII and SPARC64 VI processors, see Section 2.5.9, "SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes" on page 2-30.
Memory T
134. rs XSCF 821 2797 C120 E332 User s Guide SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Servers XSCF Varies per Varies per Reference Manual release release SPARC Enterprise M4000 M5000 M8000 M9000 Servers Dynamic 821 2796 C120 E335 Reconfiguration DR User s Guide SPARC Enterprise M4000 M5000 M8000 M9000 Servers Capacity on 821 2795 C120 E336 Demand COD User s Guide SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Servers Product Varies per Varies per Notest release release SPARC Enterprise M4000 M5000 Servers Product Notes Varies per Varies per release release SPARC Enterprise M8000 M9000 Servers Product Notes Varies per Varies per release release External I O Expansion Unit Product Notes 819 5324 C120 E456 SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Servers Glossary 821 2800 C120 E514 This is a printed document t Beginning with the XCP 1100 release Preface xi Text Conventions This manual uses the following fonts and symbols to express specific types of information Font symbol Meaning Example AaBbCc123 What you type when contrasted XSCF gt adduser jsmith with on screen computer output This font represents the example of command input in the frame AaBbCc123 The names of commands files and XSCF gt showuser P directories on screen computer User Name jsmith output Privileges useradm This font represents the example of auditadm command output in the frame Italic Indicates the name of a reference
135. rue
- The swap area does not have sufficient free space to save data from the user memory to be deleted.
- There are too many locked or ISM pages to be covered by the memory on other system boards.
I/O Device
1. Adding an I/O Device
The device driver processing executed by the Oracle Solaris OS is based on the premise that all device drivers dynamically recognize newly added devices. In the domain where DR is performed, all device drivers must support the addition of devices by DR. Upon the addition of an I/O device by DR, the I/O device is reconfigured automatically. The path name of a device file under /dev is configured as the path name of the newly added I/O device to make the I/O device accessible.
2. Deleting an I/O Device
An I/O device can be deleted when both of the following conditions are met:
- The device to be deleted is not in use in the domain where the DR operation is to be performed.
- The device drivers in the domain where the DR operation is to be performed support DR.
In most cases, the device to be deleted is in use. For example, the root file system or any other file systems requisite for operation cannot be unmounted. To solve this problem, you can configure the system by using redundant configuration software to make the access path to each requisite I/O device redundant. For a disk drive unit, you can make the unit redundant b
136. rvers Getting Started Guide 821 3045 C120 E345 SPARC Enterprise M8000 M9000 Servers Getting Started Guide 821 3049 C120 E323 SPARC Enterprise M4000 M5000 Servers Overview Guide 819 2204 C120 E346 SPARC Enterprise M8000 M9000 Servers Overview Guide 819 4204 C120 E324 SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Servers Important 821 2098 C120 E633 Legal and Safety Information SPARC Enterprise M4000 M5000 Servers Safety and Compliance Guide 819 2203 C120 E348 SPARC Enterprise M8000 M9000 Servers Safety and Compliance Guide 819 4201 C120 E326 External I O Expansion Unit Safety and Compliance Guide 819 1143 C120 E457 SPARC Enterprise M4000 Server Unpacking Guide 821 3043 C120 E349 x SPARC Enterprise Mx000 Servers Dynamic Reconfiguration User s Guide December 2010 Book Title Sun Oracle Fujitsu SPARC Enterprise M5000 Server Unpacking Guide 821 3044 C120 E350 SPARC Enterprise M8000 M9000 Servers Unpacking Guide 821 3047 C120 E327 SPARC Enterprise M4000 M5000 Servers Installation Guide 819 2211 C120 E351 SPARC Enterprise M8000 M9000 Servers Installation Guide 819 4200 C120 E328 SPARC Enterprise M4000 M5000 Servers Service Manual 819 2210 C120 E352 SPARC Enterprise M8000 M9000 Servers Service Manual 819 4202 C120 E330 External I O Expansion Unit Installation and Service Manual 819 1141 C120 E329 SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Servers 821 2794 C120 E331 Administration Guide SPARC Enterprise M3000 M4000 M5000 M8000 M9000 Serve
137. s executed and then the domain power is turned on or the domain rebooted.
-c reserve: Specifies that the command reserve the addition of a system board to the domain. With this option specified, the command executes the same processing as for the -c assign option, and it assigns the target system board to the domain. The assigned system board is added to the domain when the addboard(8) command with the -c configure option specified is executed and then the domain power is turned on or the domain is rebooted.
-d domain_id: Specifies the domain ID of the domain to add a system board, where domain_id is the domain number (possibly 0 to 23, depending on your server). Only one domain ID can be specified.
xsb: Specifies the system board (XSB) number of the system board to be added. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on your server. To specify multiple system boards, several XSB numbers can be specified by delimiting each with a space.
Note 1: In the system board addition processing executed by this command, a diagnosis of the system board to be added is performed first, and then the system board is added to the target domain. For this reason, much time may be required for the command to complete its operation.
Note 2: If DR processing by the addboard(8) command fails, the target system board cannot be restored to its previous status. You must identify the cause of failure based on the error message outpu
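The operand rules in excerpt 137 (XSB numbers in XX-Y form with XX from 00 to 15 and Y from 0 to 3, a single domain ID from 0 to 23) lend themselves to a small pre-check before a command is sent to the XSCF shell. The following Python sketch is illustrative only: `parse_xsb` and `build_addboard_args` are hypothetical helper names, and the `-c` and `-d` option spellings are reconstructed from the extracted text. The real limits depend on the server model.

```python
import re
from typing import List, Sequence, Tuple

_XSB_RE = re.compile(r"(\d{2})-(\d)")
_MODES = ("configure", "assign", "reserve")


def parse_xsb(xsb: str) -> Tuple[int, int]:
    """Parse 'XX-Y' into (board, partition), enforcing the documented ranges."""
    match = _XSB_RE.fullmatch(xsb)
    if match is None:
        raise ValueError(f"XSB must be in XX-Y format: {xsb!r}")
    board, part = int(match.group(1)), int(match.group(2))
    if board > 15 or part > 3:
        raise ValueError(f"XSB out of range (XX 00-15, Y 0-3): {xsb!r}")
    return board, part


def build_addboard_args(domain_id: int, xsbs: Sequence[str],
                        mode: str = "configure") -> List[str]:
    """Build an addboard argument list after validating every operand."""
    if mode not in _MODES:
        raise ValueError(f"unknown -c mode: {mode!r}")
    if not 0 <= domain_id <= 23:
        raise ValueError(f"domain ID out of range (0-23): {domain_id}")
    if not xsbs:
        raise ValueError("at least one XSB number is required")
    for xsb in xsbs:
        parse_xsb(xsb)  # raises ValueError on a malformed operand
    return ["addboard", "-c", mode, "-d", str(domain_id), *xsbs]
```

For example, `build_addboard_args(0, ["01-0"])` yields the same tokens as the `addboard -c configure -d 0 01-0` invocation shown in the replacement procedure.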
138. s internal data. Remedy: Please contact customer service.
FAILED to suspend <device name> <device info>
  Explanation: Device suspension failed. Remedy: Repeat the action. If the message persists, please contact customer service.
FAILED to resume <device name> <device info>
  Explanation: The device cannot be resumed. Remedy: Please contact customer service.
dr_stop_user_threads failed to stop thread process <name> pid
  Explanation: Cannot stop the user thread. Remedy: Please contact customer service.
Cannot stop user thread <pid> <pid>
  Explanation: The DR driver cannot stop all the user processes in the list. Remedy: Please contact customer service. Output: Console and Standard Output.
Cannot setup memory node
  Explanation: DR is unable to read the HW information for the memory device. Remedy: Please contact customer service.
Kernel Migration fails 0xX
  Explanation: Kernel data migration failed as a result of DR detach. Remedy: Please contact customer service.
TOD on board X has already been attached
  Explanation: Time of Day clock on board X has been attached. This may be a minor inconsistency in the system. Remedy: Please contact customer service.
TOD on board X has already been removed
  Explanation: Time of Day clock on board X has been removed. This may be a min
139. t by the addboard(8) command and Oracle Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.
Note 3: If a system board has been forcibly added to a domain by the addboard(8) command with the -f option specified, normal operation of all added hardware resources may be disabled. For this reason, you should avoid using the option for normal DR operations. After adding a system board by using the addboard(8) command with the -f option specified, be sure to check the status of the added system board and the devices on the system board.
Note 4: You can execute the addboard(8) command on a domain that is not running. When the domain is running, the addboard(8) command with -c configure will succeed only if the following Oracle Solaris Service Management Facility (SMF) services are active on that domain:
- Domain SP Communication Protocol (dscp)
- Domain Configuration Server (dcs)
- Oracle Sun Cryptographic Key Management Daemon (sckmd)
3.1.7 Deleting a System Board
Use the deleteboard(8) command to delete a system board from a domain and assign it to the system board pool. If you specify the -c reserve option, the action takes place the next time the domain is powered off or rebooted. Before executing the deleteboard(8) command, check the status of
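Note 4 in excerpt 139 names three SMF services that must be active before addboard with -c configure can succeed on a running domain. A minimal Python sketch of that precondition follows. It assumes the caller has already collected each service's state (for example from the domain's service listing) into a dictionary keyed by the short names used in the excerpt, and that "online" is the state string meaning active; both the helper names and that input shape are hypothetical.

```python
from typing import Dict, List

# Short service names as given in the excerpt.
REQUIRED_SERVICES = ("dscp", "dcs", "sckmd")


def missing_dr_services(states: Dict[str, str]) -> List[str]:
    """Return the required services that are not reported as online."""
    # Assumption: an active SMF service is reported with the state "online".
    return [name for name in REQUIRED_SERVICES if states.get(name) != "online"]


def can_configure_on_running_domain(states: Dict[str, str]) -> bool:
    """True only when every required service is online."""
    return not missing_dr_services(states)
```

Returning the list of missing services, rather than a bare boolean, lets an operator see which service to restore before retrying the DR operation.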
140. tem board from a domain without stopping the Oracle Solaris OS running in that domain. A system board is deleted in such stages as unconfigure and disconnect. If the board must be assigned to another domain, the delete operation must also include an unassign step. In the delete operation, the selected system board is unconfigured from its domain by the Oracle Solaris OS. Then the board is disconnected from the domain. At this point, deletion of the system board is completed.
Moving a System Board
You can use DR to reassign a system board from one domain to another without stopping the Oracle Solaris OS running in either domain. This move function can change the configurations of both domains without physical removal and remounting of the system board. The move operation for a system board is a serial combination of the delete and add operations. In other words, the selected system board is deleted from its domain and then added to the target domain.
1.2.4 Replacing a System Board
You can use DR to remove a system board from a domain and either add it back later or replace it with another system board, provided both boards satisfy DR requirements as described in this document. You can do so without stopping the Oracle Solaris OS running in either domain. You can replace a system board in the case of exchanging hardware resources such as C
141. ternal error happened during kernel migration. Remedy: Please contact customer service.
Memory copy error
  Explanation: Memory copy error happened during kernel migration. Remedy: Retry, and if the problem persists, contact customer service.
SCF error
  Explanation: Internal error happened during kernel migration. Remedy: Please contact customer service.
Cannot add SPARC64 VI to domain booted with all SPARC64 VII CPUs
  Explanation: System board with SPARC64 VI cannot be added into a domain booted with all SPARC64 VII CPUs when the domain's CPU mode is set as auto via XSCF. Remedy: The system board that failed to be added is assigned to the target domain. Please delete the system board to restore the status as available.
SCF OFFLINE
  Explanation: XSCF failure or failover occurred during kernel migration. Remedy: Log in to XSCF again to check the status and repeat the action.
A.2 Command Messages
A.2.1 addboard
XSB#XX-X will be assigned to DomainID X. Continue? [y|n]
  Explanation: Confirms whether the DR operation is to be executed. Input "y" to execute it and "n" to stop it.
XSB#XX-X will be configured into DomainID X. Continue? [y|n]
  Explanation: Confirms whether the DR operation is to be executed. Input "y" to execute it and
142. the force option is used with the deleteboard(8) command.
- The copy destination board must contain at least as much physical memory as the system board being deleted.
- If more than one system board satisfies all the selection criteria to the same degree, the one with the lowest LSB number is selected as the copy destination board.
Note: If no system boards meet the selection criteria, the DR operation to delete the kernel memory board will fail.
Once the copy destination board has been selected, the Oracle Solaris OS performs a memory deletion on the selected user memory board. Then the kernel memory on the system board to be deleted is copied into memory on the selected copy destination system board. The system is suspended while the copying is in progress. After all the memory is copied, the address space of the copy destination board is renamed to that of the kernel memory board being deleted.
Note: If the address space of a system board is renamed by a copy-rename operation, the change will persist across reboots of the domain. A poweroff(8)/poweron(8) cycle of the domain will reset the address space assignments and remove the effects of one or more copy-rename operations.
2. User Memory Board
A user memory board is a system board on which no kernel memory is loaded. Before deleting user memory, the system attempts to swap out the physical
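The two selection rules visible in excerpt 142 (the destination must hold at least as much physical memory as the board being deleted, and ties go to the lowest LSB number) can be sketched as follows. This Python example is a simplified illustration: the `Board` type and `select_copy_destination` function are hypothetical, and the earlier selection criteria that are cut off in the excerpt (for example, the handling of floating boards) are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Board:
    lsb: int         # logical system board number within the domain
    memory_mb: int   # physical memory on the board


def select_copy_destination(deleted: Board,
                            candidates: Iterable[Board]) -> Optional[Board]:
    """Pick a copy destination using only the rules visible in the excerpt.

    Returns None when no candidate qualifies, which corresponds to the
    DR delete operation on the kernel memory board failing.
    """
    eligible = [
        board for board in candidates
        if board.lsb != deleted.lsb and board.memory_mb >= deleted.memory_mb
    ]
    if not eligible:
        return None
    # Among equally suitable boards, the lowest LSB number wins.
    return min(eligible, key=lambda board: board.lsb)
```

A caller would treat a `None` result as a reason to stop before attempting the deletion, since the manual states the operation fails when no board meets the criteria.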
143. the status of the move destination domain and moved system board. Execute the showdcl(8) command to check the operation status of the move destination domain, and then execute the showboards(8) command to check the status of the moved system board.
XSCF> showdcl -d 1
DID LSB XSB Status
01 Running
00 01-0
01 00-1
XSCF> showboards 00-1
XSB DID LSB Assignment Pwr Conn Conf Test Fault
4.5 Examples: Replacing a System Board
This section provides examples of operations to replace a system board in a domain. The examples illustrate replacement of a system board in a Uni-XSB environment and a system board in a Quad-XSB environment. In each sample operation, a procedure conforming to Section 4.1.4, "Flow: Replacing a System Board" on page 4-6 is used, and the system board shown in each figure is replaced using the XSCF shell.
Note: You cannot use DR to replace a system board in a midrange server, because replacing a system board replaces an MBU. To replace a system board in a midrange server, you must turn off the power for all domains, then perform a hardware replacement.
4.5.1 Example: Replacing a Uni-XSB System Board
FIGURE 4-8 Example: Replacing a Uni-XSB System Board
[Figure labels: Domain 0; XSB 00-0; XSB 01-0; Add]
1. Log in to XSCF.
144. the target domain and system board and the device usage status on the system board. You must determine whether you can perform the DR operation according to the status of the domains and system board and the device usage status on the system board. You must also stop the processes that are bound to the CPU and the accessing of I/O devices to prepare for system board deletion. If the system board to be deleted is a kernel memory board, check the status and memory size of the system board to which kernel memory is to be moved.
The following examples show the format and options of the deleteboard(8) command:
deleteboard y n v c disconnect xsb xsb
deleteboard y n f v c unassign xsb xsb
deleteboard y n f v c reserve xsb xsb
deleteboard
TABLE 3-14 Options of the deleteboard Command
Options listed in the table: -q, -c disconnect, -c unassign, -c reserve, xsb.
-q: Specifies the suppression of output message display. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.
-y: Specifies that a response of "yes" is made automatically to output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed with the -q option or displayed.
Specifies that
145. the target system board. You can display and reference the status of each system board via a user interface provided by XSCF. For details of the user interface, see Chapter 3, "DR User Interface."
2.4.3 Flow of DR Processing
This section describes the flow of DR processing and the changes in system board status during individual DR operations.
2.4.3.1 Flowchart: Adding a System Board
The flow of DR operations and the transition of system board status when a system board has been added or reserved for addition are described in the schematic flowchart below. Each system board status indicated in FIGURE 2-5 is the main status that is changed.
FIGURE 2-5 Flow of System Board Addition Processing
[Flowchart labels: System board pool; DCL registration status; DCL registration process; Test: passed; Assignment: available; Addition or reservation operation; Assignment: assigned; Request to add system board; Request to add system board or domain reboot after reservation; Diagnosis; Error status; Error found; Test: fail; Assignment: assigned]
146. tion: DR cannot locate SCF driver's FMEM interface functions. SCF is probably not loaded or incorrect version is used.
Remedy: Please contact customer service.

Device busy: dr@0:SBX::pciY
Explanation: Some devices are still referenced.
Remedy: Confirm that all devices in this pci slot are not in use and repeat the action. If this error message appears again, please contact customer service.
Output: Console and Standard Output

Device driver failure: path
Explanation: The device driver failed in attach or detach operation.
Remedy: Repeat the action. If this error message appears again, please contact customer service.
Output: Console and Standard Output

Error setting up FMEM buffer
Explanation: DR fails to allocate enough memory to perform copy-rename.
Remedy: Retry, and if the problem persists, contact customer service.

Failed to off-line: dr@0:SBX::cpuY
Explanation: Failed to off-line CPU Y on board X.
Remedy: Repeat the action. If this error message appears again, please contact customer service.
Output: Console and Standard Output

Failed to on-line: dr@0:SBX::cpuY
Explanation: Failed to online CPU Y on system board X.
Remedy: Bring the CPU online with psradm -n. If this command also fails, respond in the manner directed by the command message.
Output: Console and Standard Output

Failed to start CPU: dr@0:SBX::cpuY
Explanation: Fa
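Several messages in excerpt 146 name their target as a board X and a CPU or PCI slot Y. When scanning console logs, a small parser can pull those numbers out. The sketch below assumes X and Y are decimal numbers and tolerates either punctuation or spaces between the fields; `DrTarget` and `parse_dr_target` are hypothetical names, not part of any Oracle or Fujitsu tool.

```python
import re
from typing import NamedTuple, Optional


class DrTarget(NamedTuple):
    board: int   # X in SBX
    kind: str    # "cpu" or "pci"
    unit: int    # Y in cpuY / pciY


# Assumed shape: "dr", "0", "SB<board>", "<cpu|pci><unit>", with any
# non-word separators (spaces, "@", ":") between the fields.
_TARGET = re.compile(r"\bdr\W0\W+SB(\d+)\W+(cpu|pci)(\d+)")


def parse_dr_target(message: str) -> Optional[DrTarget]:
    """Return the board and unit named in a DR message, or None."""
    match = _TARGET.search(message)
    if match is None:
        return None
    return DrTarget(int(match.group(1)), match.group(2), int(match.group(3)))


print(parse_dr_target("Failed to off-line: dr@0:SB1::cpu2"))
# prints: DrTarget(board=1, kind='cpu', unit=2)
```

Messages that carry no board reference, such as "Error setting up FMEM buffer", yield `None`, so callers can still fall back to the plain text.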
147. umber. When only floating boards are set in the domain, one of them is selected and used as a kernel memory board. In that case, the status of the board is changed from floating board to non-floating board.

When copy-rename is performed as part of system board deletion or removal, and only a floating board can be used because no non-floating board is available, specify the -f (force) option. The floating board option setting does not change when the force option is used.

Note: Enable the floating board option when the system board is in the system board pool or when the system board is not connected to the domain configuration.

Omit-memory Option

When the omit-memory option is enabled, the memory on a system board cannot be used in the domain. Even when a system board actually has memory, this option enables you to make the memory on the system board unavailable through a DR operation to add or move the system board. This option can be used when the target domain needs only the CPU, and not the memory, of the system board to be added.

If a domain has a high load on memory, an attempt to delete a system board from the domain may fail. This failure results if a timeout occurs in memory deletion processing (saving the memory of the system board to be disconnected onto a disk by paging) when many memory pages are locked because of high load. To prevent this situation, you can enable the omit-memory option to facilitate the DR operation beforehand
148. us on the system board. You must also stop any processes that are bound to the CPU and any that are accessing I/O devices to prepare for system board deletion. If the system board to be deleted is a kernel memory board, check the status and memory size of the system board to which kernel memory is to be moved.

The following examples show the format and options of the moveboard(8) command:

    moveboard [-q] [-y|-n] [-v] [-c configure] -d domain_id xsb [xsb...]
    moveboard [-q] [-y|-n] [-f] [-v] -c assign -d domain_id xsb [xsb...]
    moveboard [-q] [-y|-n] [-f] [-v] -c reserve -d domain_id xsb [xsb...]
    moveboard -h

TABLE 3-15 Options of the moveboard Command

Option and description:
-q: Specifies the suppression of output message display. The -y or -n option determines how output messages are automatically answered, whether or not the messages themselves are suppressed with the -q option or displayed.
-y: Specifies that a response of "yes" is made automatically to output messages. The -y or -n option determines how output messages are automatically answered, whether or not the messages themselves are suppressed with the -q option or displayed.
-n: Specifies that a response of "no" is made automatically to output messages. The -y or -n option determines how output messages are automatically answered, whether or not the messages themselves are suppressed with the -q option or displayed.
-f: Forcibly deletes a system board from the move s
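The moveboard forms in excerpt 148 differ from deleteboard mainly in requiring a destination domain. A minimal Python sketch of that difference follows; `moveboard_command`, the domain ID `1`, and the XSB value `00-0` are illustrative assumptions, and the function only formats a command string.

```python
import shlex
from typing import Optional, Sequence

# Modes named after the -c option in the moveboard synopsis.
MOVE_MODES = ("configure", "assign", "reserve")


def moveboard_command(domain_id: int, xsbs: Sequence[str],
                      mode: str = "configure",
                      answer: Optional[str] = None,
                      quiet: bool = False) -> str:
    """Hypothetical helper: format a moveboard command line."""
    if mode not in MOVE_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if answer not in (None, "y", "n"):
        raise ValueError("answer must be 'y', 'n', or None")
    if domain_id < 0 or not xsbs:
        raise ValueError("a destination domain and at least one XSB are required")
    args = ["moveboard"]
    if quiet:
        args.append("-q")
    if answer:
        args.append(f"-{answer}")
    # The destination domain (-d) is mandatory in every moveboard form.
    args += ["-c", mode, "-d", str(domain_id), *xsbs]
    return shlex.join(args)


print(moveboard_command(1, ["00-0"], mode="assign", answer="y"))
# prints: moveboard -y -c assign -d 1 00-0
```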
149. warranties granted by Oracle and Fujitsu Limited, and/or any affiliate of either of them, in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, ORACLE OR FUJITSU LIMITED, AND/OR THEIR AFFILIATES, MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Oracle or Fujitsu Limited, and/or any of their affiliates, have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages. DOCUMENTATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAI
150. y using disk mirroring software. If a device driver that does not support DR is used in the domain, all access to I/O devices controlled by the device driver must be stopped, and the device driver must be unloaded by using the modunload(1M) command.

Note: Do not move a device that is part of a redundant configuration from one domain to another domain. The consequences of two domains simultaneously accessing the same device through different paths could be disastrous, such as data corruption.

System Board Configuration Requirements

XSCF enables the Uni-XSB or Quad-XSB setting according to the configuration conditions to determine the division type. If the CPU or memory configuration does not meet the configuration conditions, neither Uni-XSB nor Quad-XSB can be set as the division type. For the CPU configuration and memory configuration conditions set for the division types, see the System Overview for your server.

The setting of division type may be changed for DR operation if a domain operation requirement dictates changing of a necessary hardware resource when a system board is added to the domain. In such cases, the CPU configuration and memory configuration conditions for changing the division type are the same as described above. For the conditions, see the System Overview for your server.

Note: Changing the division type before a DR operation may not be possible depending on the system board status or DR operation eve
