UltraSPARC IV Processor
Contents
5 Reset, RED_state, and Error_state 37
5.1 Machine States After Reset 37
6 Performance Instrumentation 43
7 Assembly Language 45
7.1 Prefetch Instruction 45
UltraSPARC IV Processor User's Manual • April 2004
8 Memory Controller 47
8.1 SDRAM Timing Control 47
8.2 Chip Kill DIMM Support 49
9 IEEE 754-1985 Standard 51
9.1 Introduction 51
9.1.1 Floating-Point Operations 51
9.1.2 Rounding Mode 52
9.1.3 Nonstandard Floating-Point Operating Mode 52
9.1.4 Memory and Register Data Images 52
9.1.5 Subnormal Operations 52
9.1.6 FSR_CEXC and FSR_AEXC Updates 53
9.1.7 Prediction Logic 53
9.2 Floating-Point Numbers 53
9.2.1 Floating-Point Number Line 55
9.3 IEEE Operations 55
9.3.1 Addition 56
9.3.2 Subtraction 57
9.3.3 Multiplication 58
9.3.4 Division 59
9.3.5 Square Root 60
9.3.6 … 60
9.3.7 Precision Conversion 61
9.3.8 Floating-Point to Integer Number Conversion
10.1.1.3 L2 Cache Error Enable Register
Three bits are added to the L2 Cache Error Enable register (ASI_ESTATE_ERROR_EN_REG, ASI 0x4B, VA 0x00) to enhance RAS capability. TABLE 10-2 defines these bits. Bits 18:0 of this register are the same as those in the UltraSPARC III Cu processor.

TABLE 10-2 L2 Cache Error Enable Register Format
Bits | Field | RW | Use
22 | FPPE | RW | Force CPORT data parity error on the data parity bit. When this bit is set to 1, the datapty_n signal is toggled before it is driven.
21 | FDPE | RW | Force CPORT data parity error on the data LSB. When this bit is set to 1, the data_n[0] signal is toggled before it is driven.
20 | FSAPE | RW | Force Fireplane address parity error on the parity bit. When this bit is set to 1, the addrpty_n signal is toggled before it is driven.
19 | Reserved | | Reserved field.

Note: This private register is accessed through ASI_ESTATE_ERROR_EN_REG. Its settings affect only that particular logical processor.

Note: FPPE, FDPE, and FSAPE affect outgoing transactions to other chips as well as inter-logical-processor transactions; so do FMT (bit 18) and FMD (bit 13).

10.1.2 Shared Resource Error Reporting
An error not specific to any one logical processor is handled in a special way. When an error not rela
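As a minimal sketch of how software might compose a value for this register before writing it, the C fragment below sets or clears the three new injection bits from TABLE 10-2 while passing bits 18:0 through unchanged. The macro and function names are hypothetical, and the actual write (a privileged STXA to ASI 0x4B, VA 0x00) is not shown because it cannot be expressed in portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the new fields in TABLE 10-2 (names are hypothetical). */
#define ESTATE_FPPE  (UINT64_C(1) << 22) /* force CPORT data parity error, parity bit */
#define ESTATE_FDPE  (UINT64_C(1) << 21) /* force CPORT data parity error, data LSB   */
#define ESTATE_FSAPE (UINT64_C(1) << 20) /* force Fireplane address parity error      */
#define ESTATE_RSVD  (UINT64_C(1) << 19) /* reserved                                  */
#define ESTATE_LEGACY_MASK ((UINT64_C(1) << 19) - 1) /* bits 18:0, as on UltraSPARC III Cu */

/* Return `current` with the requested injection bits set and reserved
 * bit 19 cleared; legacy bits 18:0 pass through unchanged. */
static uint64_t estate_set_injection(uint64_t current, uint64_t inject)
{
    assert((inject & ~(ESTATE_FPPE | ESTATE_FDPE | ESTATE_FSAPE)) == 0);
    return (current | inject) & ~ESTATE_RSVD;
}

/* Return `current` with all three injection bits (and bit 19) cleared. */
static uint64_t estate_clear_injection(uint64_t current)
{
    return current & ~(ESTATE_FPPE | ESTATE_FDPE | ESTATE_FSAPE | ESTATE_RSVD);
}
```

Because the register is private, a value computed this way affects only the logical processor that writes it.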
FSUB: rs1 − rs2 → rd
Masked Exception (TEM = 0): Destination Register Written (rd) | Flag(s)
Enabled Exception (TEM = 1): Destination Register Written (rd) | Flag(s) | Trap
0 0 0 no 0 no
0 0 0 no 0 no
0 0 0 no 0 no
0 0 0 no 0 no
0 Normal Normal no Normal no
0 Normal Normal no Normal no
0 Infinity Infinity no Infinity no
0 Infinity Infinity no Infinity no
uf Normal Infinity Infinity fa no set nvc ieee trap set nva set ufc set nvc
0 Normal Infinity Infinity 1 no set nvc set ufa ieee trap set nva
Normal Normal May overflow (see 9.5.3) | May overflow (see 9.5.3)
Normal Normal Normal no Normal no
Normal Normal May underflow (see 9.5.4) | May underflow (see 9.5.4)
Normal Normal May underflow (see 9.5.4) | May underflow (see 9.5.4)
Infinity Normal Infinity no Infinity no
Infinity 0 Normal Infinity no Infinity no
Infinity Infinity QNaN set nvc set nva | no set nvc ieee trap
Infinity Infinity Infinity no Infinity no
Infinity Infinity Infinity no Infinity no
Infinity Infinity QNaN set nvc set nva | no set nvc ieee trap
IEEE 754-1985 Standard 9-57

9.3.3 Multiplication
TABLE 9-5 Floating-point Multiplication
RESULT from the operation includes one or more of the following:
• Number in f register (see Trap Event note, page 66)
MULTIPLICATION Instruction
• Exception bit set s
5:0 | Target ID
The register has only one 6-bit field, which encodes the LP ID. When an error in a shared resource is detected, the AFSR/AFAR of the logical processor whose LP ID matches the one specified in the CMT Error Steering register is updated and, if enabled, a trap is triggered. If that logical processor is suspended, the trap is taken after the logical processor enters the running state. The Target ID identifies the logical processor whose LP ID is equal in value to the Target ID.

Note: It is the responsibility of software to make sure that the CMT Error Steering register identifies an appropriate logical processor. If the register identifies a logical processor that is not enabled, an error not specific to any one logical processor may update the EESR and AFSR/AFAR of that disabled logical processor. However, the error is not reported to, and thus has no effect on, either of the enabled logical processors.

Although an UltraSPARC IV processor always sets bits 5:1 to 0, software should always program these bits to 0 for future compatibility.

Reporting Shared Resource Errors
Before a trap can be generated for a shared resource error, the error must be recorded. Shared resource errors are recorded in the asynchronous error reporting mechanism of the logical processor specified by the CMT Error Steering register. The same asynchronous error
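The following sketch illustrates the software responsibility described above: choose a target LP, refuse one that is not enabled, and keep bits 5:1 at 0. The enabled-LP bitmask parameter and all names are assumptions made for illustration; how software obtains that mask and writes the register (a privileged ASI store) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STEER_TARGET_MASK UINT64_C(0x3F)   /* 6-bit Target ID field, bits 5:0 */

/* Build a CMT Error Steering value that targets lp_id. Fails if the target
 * is not set in the (hypothetical) enabled-LP bitmask. On UltraSPARC IV only
 * LP 0 and LP 1 exist, so bits 5:1 of the result are always 0. */
static bool make_error_steering(unsigned lp_id, uint64_t enabled_mask,
                                uint64_t *value_out)
{
    if (lp_id > 1)                                   /* would set bits 5:1 */
        return false;
    if ((enabled_mask & (UINT64_C(1) << lp_id)) == 0)
        return false;                                /* target must be enabled */
    *value_out = (uint64_t)lp_id & STEER_TARGET_MASK;
    return true;
}
```

Refusing a disabled target up front avoids the case in the note where shared-resource errors would be logged only in a disabled logical processor.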
FIGURE 9-1 Floating-point Number Line

9.3 IEEE Operations
The response of each operation to operands that are 0, Normal, Infinity, or NaN is described in this section. The response to Subnormal numbers is described in section 9.8, Subnormal Operations, on page 73.

Each operation concludes with one of the following outcomes:
• A number is written to the destination f register (rd).
• A number is written to the destination register and an IEEE flag is set.
• An IEEE flag is set and an IEEE trap is generated; rd is unchanged.

Each instruction is defined with one or more operands. Most instructions generate a result. The FCMP and FCMPE instructions do not generate a result; instead, they set the fccN bits.

9.3.1 Addition
TABLE 9-3 Floating-point Addition
RESULT from the operation includes one or more of the following:
• Number in f register (see Trap Event note, page 66)
• Exception bit set (see TABLE 9-12)
• Trap occurs (see abbreviations in TABLE 9-12)
• Underflow/Overflow may occur
FADD: rs1 + rs2 → rd
Masked Exception (TEM = 0): Destination Register Written (rd) | Flag(s)
Enabled Exception (TEM = 1): Destination Register Written (rd) | Flag(s) | Trap
0 0 0 no 0 no
0 0 0 (FSR.RD = 0, 1, 2) | 0 (FSR.RD = 0, 1, 2)
0 (FSR.RD = 3) | 0 (FSR.RD = 3)
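The two FSR.RD rows for zero operands follow the IEEE 754-1985 rule for exact-zero sums: opposite-signed zeros sum to +0 in every rounding direction except round toward −∞, where the sum is −0. A minimal model of that rule is sketched below, assuming the SPARC V9 encoding of FSR.RD (3 = round toward −∞); the names are hypothetical, and the function models only the sign of the result, not the FADD datapath.

```c
#include <assert.h>
#include <stdbool.h>

/* FSR.RD encodings; 2 and 3 are assumed from SPARC V9. */
enum fsr_rd { RD_NEAREST = 0, RD_ZERO = 1, RD_POS_INF = 2, RD_NEG_INF = 3 };

/* Sign of FADD's result when both operands are zero (true means -0). */
static bool fadd_zeros_gives_neg_zero(bool rs1_neg, bool rs2_neg, enum fsr_rd rd)
{
    if (rs1_neg == rs2_neg)      /* like-signed zeros keep their sign */
        return rs1_neg;
    return rd == RD_NEG_INF;     /* opposite signs: -0 only when rounding toward -infinity */
}
```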
TABLE 2-1 Enhancements to the UltraSPARC IV Processor's Core
Feature
• New write cache indexing (hashing) feature
• Hardware support for rare corner cases in floating-point add/sub operations (avoids unfinished_FPop traps)
• More optimal software prefetch semantics (hardware response to the prefetch instruction)
1 Dual Inline Memory Module (DIMM)

TABLE 2-2 Changes Due to CMT Enhancement
Feature
• Some resources, such as some MCU registers, some pins, and some Sun Fireplane Interconnect registers, are shared.
• One new shared MCU Timing Control register is added to support a broader range of SDRAM timing.
• New registers have been added to support the Sun Standard CMT model.
• Certain processor registers have been mapped to allow CMT operation.
• Each logical processor has an associated CESR_ID register for enhanced error diagnostics and recovery in tightly clustered systems.

Note: In the UltraSPARC IV processor, applications can access shared registers. If applications executing on separate logical processors try to read or write the same shared register at the same time, the UltraSPARC IV processor arbitrates and sequences the requests. However, the order is not guaranteed. To obtain a deterministic result, software must serialize such accesses, for example by using mutex semantics.

RAS Architecture
The Ul
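One way to get the mutex semantics the note recommends is to wrap every read-modify-write of a shared register in a lock. The sketch below uses a C11 atomic_flag spinlock around a plain variable that stands in for the shared register; the register variable, the lock, and the function names are all hypothetical, and a real kernel would use its own locking primitive together with privileged ASI accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static volatile uint64_t shared_reg;               /* stand-in for a shared register */
static atomic_flag shared_reg_lock = ATOMIC_FLAG_INIT;

static void lock_acquire(void)
{
    /* Spin until this caller is the one that flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&shared_reg_lock, memory_order_acquire))
        ;
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&shared_reg_lock, memory_order_release);
}

/* Read-modify-write the shared register as one critical section, so an
 * update from the other logical processor cannot interleave with it. */
static uint64_t shared_reg_set_bits(uint64_t bits)
{
    lock_acquire();
    uint64_t updated = shared_reg | bits;
    shared_reg = updated;
    lock_release();
    return updated;
}
```

The lock does not change how the hardware arbitrates individual accesses; it only ensures that each software read-modify-write sequence completes before another begins.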
UltraSPARC IV Processor User's Manual Supplement
Version 1.0, April 2004
Copyright 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054 U.S.A. All rights reserved.
9.1.3 Nonstandard Floating-Point Operating Mode
The processor supports a nonstandard floating-point mode to facilitate the handling of Subnormals by the hardware, avoiding a trap to supervisor software. The floating-point operating mode is controlled by the FSR.NS bit: when FSR.NS = 1, nonstandard mode is selected. However, when GSR.IM = 1 (interval arithmetic rounding mode is selected), the processor is in standard mode regardless of the FSR.NS bit.

9.1.4 Memory and Register Data Images
Floating-point values are represented in the f registers in the same way that they are represented in memory. Any conversions for ALU operations are completed within the floating-point execution unit. Load and store operations do not modify the register value. VIS instructions (logical and move/copy operations) can be used with values generated by the floating-point unit.

9.1.5 Subnormal Operations
Subnormal operations include operations with Subnormal number operands and situations where an operation without Subnormal number operands generates a Subnormal number result. The floating-point unit response to Subnormal numbers is described in section 9.8, Subnormal Operations, on page 73.

9.1.6 FSR_CEXC and FSR_AEXC Updates
The current exception (cexc) and accrued exception (aexc) fields in the FSR are described in section 9.5, IEEE Traps, on p
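The mode-selection rule above reduces to a single predicate. The sketch assumes the SPARC V9 position of FSR.NS (bit 22) and the UltraSPARC III Cu position of GSR.IM (bit 27); neither position is stated in this section, so both are assumptions to check against the architecture manuals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions (SPARC V9 FSR, UltraSPARC III Cu GSR). */
#define FSR_NS_BIT 22   /* FSR.NS: nonstandard floating-point mode   */
#define GSR_IM_BIT 27   /* GSR.IM: interval arithmetic rounding mode */

/* Nonstandard mode is in effect only when FSR.NS = 1 and GSR.IM = 0;
 * GSR.IM = 1 forces standard mode regardless of FSR.NS. */
static bool fp_nonstandard_mode(uint64_t fsr, uint64_t gsr)
{
    bool ns = (fsr >> FSR_NS_BIT) & 1;
    bool im = (gsr >> GSR_IM_BIT) & 1;
    return ns && !im;
}
```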
0 0 0 no 0 no
0 Normal Normal no Normal no
0 Normal Normal no Normal no
0 Infinity Infinity no Infinity no
0 Infinity Infinity no Infinity no
set ofc set ofc set ofa Normal Infinity Infinity no set nvc set nvc ieee trap set nva
set ofc set ofc set ofa Normal Infinity Infinity no set nvc set nvc ieee trap set nva
Normal Normal May overflow (see 9.5.3) | May overflow (see 9.5.3)
Normal Normal Normal Normal Normal
Normal Normal Normal Normal Normal
Normal Normal May underflow (see 9.5.4) | May underflow (see 9.5.4)
Infinity Infinity Infinity no Infinity no
Infinity Infinity QNaN set nvc set nva | no set nvc ieee trap
Infinity Infinity QNaN set nvc set nva | no set nvc ieee trap
Infinity Infinity Infinity no Infinity no

9.3.2 Subtraction
TABLE 9-4 Floating-point Subtraction
SUBTRACTION Instruction
RESULT from the operation includes one or more of the following:
• Number in f register (see Trap Event note, page 66)
• Exception bit set (see TABLE 9-12)
• Trap occurs (see abbreviations in TABLE 9-12)
• Underflow/Overflow may occur
Masked Exception (TEM = 0) | Enabled Exception (TEM = 1)
UPPERCASE: Items in uppercase are acronyms, instruction names, or writable register fields (for example, FLUSH and RETRY). Some common acronyms are listed in the UltraSPARC III Cu Processor User's Manual.
Note: Names of some instructions contain both uppercase and lowercase letters.
Underbar characters join words in register, register field, exception, and trap names.
Note: Such words can be split across lines at the underbar without an intervening hyphen.
For example, in "This is true whenever the integer_condition_code field...", the name integer_condition_code shows how underbar characters are used.

Notational Conventions
The following notational conventions are used:
• Square brackets [ ] indicate a numbered register in a register file. For example, r[0] translates to register 0.
• Angle brackets < > indicate a bit number or a colon-separated range of bit numbers within a field. Bits FSR<29:28> and FSR<12> are
• Curly braces { } indicate textual substitution. For example, the string ASI_PRIMARY{_LITTLE} expands to ASI_PRIMARY and ASI_PRIMARY_LITTLE.
• If a bar | is used with the curly braces, it represents multiple substitutions. For example, the string ASI_DMMU_TSB_{8KB|64KB|DIRECT}_PTR_REG expands to ASI_DMMU_TSB_8KB_PTR_REG, ASI_DMMU_TSB_64KB_PTR_REG, and ASI_DMMU_TSB_DIRECT_PTR_REG.
• The symbol ∥ designates concatenation of bit vectors. A comma on the left side of an assignment separates quantities that are concatenated for the purpose of as
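The curly-brace substitution notation is mechanical enough to automate. The sketch below expands a single {...} group, treating {X} as optional text and {A|B|C} as alternatives. It is an illustrative helper with hypothetical names, handles only one group per pattern, and is not part of any Sun tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EXPANSIONS 8
#define MAX_NAME 64

/* Write prefix + alt + suffix into out[idx]; return idx + 1, or -1 on overflow. */
static int emit(char out[][MAX_NAME], int idx, int max,
                const char *prefix, int prefix_len,
                const char *alt, int alt_len, const char *suffix)
{
    if (idx >= max)
        return -1;
    int n = snprintf(out[idx], MAX_NAME, "%.*s%.*s%s",
                     prefix_len, prefix, alt_len, alt, suffix);
    return (n < 0 || n >= MAX_NAME) ? -1 : idx + 1;
}

/* Expand one brace group: "{X}" gives "" and "X"; "{A|B|C}" gives A, B, C.
 * Returns the number of names written, or -1 if malformed or too long. */
static int expand_braces(const char *pattern, char out[][MAX_NAME], int max)
{
    const char *open = strchr(pattern, '{');
    if (open == NULL)
        return emit(out, 0, max, pattern, (int)strlen(pattern), "", 0, "");
    const char *close = strchr(open, '}');
    if (close == NULL)
        return -1;

    const char *body = open + 1;
    int prefix_len = (int)(open - pattern);
    int body_len = (int)(close - body);
    const char *suffix = close + 1;
    int count = 0;

    if (memchr(body, '|', (size_t)body_len) == NULL) {
        /* No bar: the braced text is optional, giving two names. */
        count = emit(out, count, max, pattern, prefix_len, "", 0, suffix);
        if (count < 0)
            return -1;
        return emit(out, count, max, pattern, prefix_len, body, body_len, suffix);
    }

    /* One name per bar-separated alternative. */
    const char *alt = body;
    while (alt <= close) {
        const char *bar = memchr(alt, '|', (size_t)(close - alt));
        const char *end = bar ? bar : close;
        count = emit(out, count, max, pattern, prefix_len, alt, (int)(end - alt), suffix);
        if (count < 0)
            return -1;
        alt = end + 1;
    }
    return count;
}
```

For example, expanding "ASI_PRIMARY{_LITTLE}" yields "ASI_PRIMARY" followed by "ASI_PRIMARY_LITTLE".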
6 | PRB_MH | IERR | Multiple-way probe hits | Specific to a LP
7 | ST_MH | IERR | Multiple-way store hits | Specific to a LP

TABLE 10-7 System Bus Protocol Error Data
Bit | Description | Comment
8 | Undefined DTransID. Read Tx: incoming DTransID does not match any outstanding ATransID. Write Tx: incoming DTransID does not match any outstanding TargID. | Not specific to a LP
9 | Undefined TTransID: incoming TTransID does not match any outstanding ATransID. | Not specific to a LP
10 | Multiple TargetID issued for the same write transaction | Not specific to a LP
11 | Unexpected DTransID grant | Not specific to a LP
12 | UTG (PERR): Unexpected TargetID/TTransID grant | Not specific to a LP

TABLE 10-8 Internal Errors of the DPCTL
Bit | Field | Error Type | Description | Comment
13 | LWQ_OV | IERR | Local Write Queue Overflow | Not specific to a LP
14 | LWQ_UF | IERR | Local Write Queue Underflow | Not specific to a LP
15 | FRDQ_OV | IERR | Foreign Read Queue Overflow | Specific to a LP
16 | FRDQ_UF | IERR | Foreign Read Queue Underflow | Specific to a LP
17 | C2MS_WER | IERR | Overwrite a valid C2MS entry by trying to update the valid entry of a local write transaction | Not specific to a LP
18 | C2MS_IR | IERR | Request to invalidate an unoccupied C2MS entry | Not specific to a LP
19 | S2M_WER | IERR | Overwrite a valid S2M entry | Not specific to a LP
20 | FRARB_OV | IERR | Forei
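Because each table row ties a bit position to whether the error is specific to one logical processor, a decoder is naturally table-driven. This sketch transcribes only the rows above that carry a field name and treats its input as a hypothetical 64-bit error status word, since this excerpt does not name the register that holds these bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct err_bit {
    unsigned bit;        /* bit position in the (hypothetical) status word */
    const char *name;    /* field name from the table                      */
    int lp_specific;     /* 1 = specific to a logical processor            */
};

/* Rows with a field name, transcribed from the tables above. */
static const struct err_bit err_bits[] = {
    {  6, "PRB_MH",   1 }, {  7, "ST_MH",    1 },
    { 12, "UTG",      0 },
    { 13, "LWQ_OV",   0 }, { 14, "LWQ_UF",   0 },
    { 15, "FRDQ_OV",  1 }, { 16, "FRDQ_UF",  1 },
    { 17, "C2MS_WER", 0 }, { 18, "C2MS_IR",  0 },
    { 19, "S2M_WER",  0 },
};
#define N_ERR_BITS (sizeof err_bits / sizeof err_bits[0])

/* Return the entry for a bit position, or NULL if it is not transcribed. */
static const struct err_bit *err_lookup(unsigned bit)
{
    for (size_t i = 0; i < N_ERR_BITS; i++)
        if (err_bits[i].bit == bit)
            return &err_bits[i];
    return NULL;
}

/* Count set bits that are shared-resource errors ("Not specific to a LP"). */
static int count_shared_errors(uint64_t status)
{
    int n = 0;
    for (size_t i = 0; i < N_ERR_BITS; i++)
        if (((status >> err_bits[i].bit) & 1) && !err_bits[i].lp_specific)
            n++;
    return n;
}
```

Errors counted by count_shared_errors are the ones routed through the CMT Error Steering register rather than to the logical processor that observed them.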
Index
A
address space identifiers 9
aexc field of FSR 53
ASI
  _ECACHE_TAG 34
  _ECACHE_W 31
ASI_ECACHE_R 31
B
bit vector concatenation xii
C
cexc field of FSR 53
Chip Multithreading 7
Chip Kill 6
CMT 1, 8
concatenation of bit vectors xii
conventions
  font xii
  notational xii
D
Data Cache Unit Control Register 25
E
ECACHE_W EC_addr 32
ECC check vector 32
F
fp_exception_other exception 53, 64, 65, 73
FPops 51
FSR
  aexc field 53
  cexc field 53
I
implementation note xiii
Implementation Registers 22
L
L2 cache 27
LRU 3
M
MCU timing 47
Multithreading 7
N
note
  implementation xiii
  programming xiii
P
Prefetch 45
programming note xiii
Q
quiet NaN (not-a-number) 71
R
RED_state 37
S
Subnormal operations 2
T
Thread 8
trap handler, user 70
U
underflow mask (UFM) bit of TEM field of FSR 70
underflow operation 69
unfinished_FPop exception 73
user trap handler 70
W
W-cache 25
0, and bit 1 represents LP 1.
Chip Multithreading (CMT) 3-13

If a bit position in the register is asserted (1), the corresponding logical processor is implemented and functional in the CMT processor. If a bit position is not asserted (0), the corresponding logical processor is not implemented or was permanently disabled at manufacturing time. An implemented logical processor is a logical processor that can be enabled and used. In the UltraSPARC IV processor, this register always reads 0x3.

TABLE 3-4 shows the format of the LP Available register. Each bit represents one logical processor: bit 0 for LP 0, bit 1 for LP 1, and so on. If a logical processor is available (implemented), the hardware sets the corresponding bit to 1; otherwise, the hardware sets the bit to 0. In the UltraSPARC IV processor, bits 1 and 0 are set to 1, and bits 63:2 are always 0.

TABLE 3-4 LP Available Register (Shared)
Bit | Field | Description
63:2 | Reserved | Reserved; 0 when read
1 | LP 1 | This bit represents LP 1.
0 | LP 0 | This bit represents LP 0.

Enabling and Disabling Logical Processors
The CMT programming model allows logical processors to be enabled and disabled. Enabling or disabling a logical processor is a heavyweight operation that requires a system reset for the update to take effect. Disabled logical processors produce no architectural effects observable by other logical proce
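Software probing this register typically counts the set bits. The sketch below does that for a value already read from ASI_CORE_AVAILABLE (the privileged LDXA itself is not shown) and checks the UltraSPARC IV rule that only bits 1:0 can be set. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the logical processors marked available and report the highest
 * LP number present (-1 when no bit is set). */
static int lp_available_count(uint64_t avail, int *highest_lp)
{
    int count = 0;
    *highest_lp = -1;
    for (int lp = 0; lp < 64; lp++) {
        if ((avail >> lp) & 1) {
            count++;
            *highest_lp = lp;
        }
    }
    return count;
}

/* On UltraSPARC IV only bits 1:0 may be set; bits 63:2 read as 0. */
static bool usiv_avail_is_consistent(uint64_t avail)
{
    return (avail & ~UINT64_C(0x3)) == 0;
}
```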
TABLE 3-11 UltraSPARC IV Processor Shared Registers
ASI Value | ASI Name | Access | Description
0x41 | ASI_CORE_AVAILABLE | Read | LP Available register
0x41 | ASI_CORE_ENABLE_STATUS | | LP Enable Status register
0x41 | ASI_CORE_ENABLE | Read/Write | LP Enable register
0x41 | ASI_XIR_STEERING | Read/Write | XIR Steering register
0x41 | ASI_CORE_RUNNING_RW | Read/Write | LP Running register
0x41 | ASI_CORE_RUNNING_W1S | Write One to Set | LP Running register
0x41 | ASI_CORE_RUNNING_W1C | Write One to Clear | LP Running register
0x41 | ASI_CORE_RUNNING_STATUS | | LP Running Status register
0x41 | ASI_CMT_ERROR_STEERING | Read/Write | CMT Error Steering register

Note: ASI accesses to these registers must use the LDXA, STXA, LDDFA, or STDFA instructions. Using any other type of load or store instruction causes a data_access_exception trap with SFSR.FT = 8 (illegal ASI value, VA, RW, or size). An attempt to access these registers in nonprivileged mode causes a privileged_action trap with SFSR.FT = 1 (privilege violation). A nonaligned access causes a mem_address_not_aligned trap. If the instruction is LDDFA or STDFA and the address is aligned to a 32-bit boundary but not to a 64-bit boundary, the trap type is LDDF_mem_address_not_aligned or STDF_mem_address_not_aligned, respectively.
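The access rules in the note can be summarized as a classification function. This is a sketch with hypothetical enum and function names; the note does not say which check takes precedence when several apply, so the order used here (instruction type, then privilege, then alignment) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum asi_insn { INSN_LDXA, INSN_STXA, INSN_LDDFA, INSN_STDFA, INSN_OTHER };

enum asi_trap {
    TRAP_NONE,
    TRAP_DATA_ACCESS_EXCEPTION,        /* SFSR.FT = 8: illegal ASI value, VA, RW, or size */
    TRAP_PRIVILEGED_ACTION,            /* SFSR.FT = 1: privilege violation                */
    TRAP_MEM_ADDRESS_NOT_ALIGNED,
    TRAP_LDDF_MEM_ADDRESS_NOT_ALIGNED,
    TRAP_STDF_MEM_ADDRESS_NOT_ALIGNED
};

/* Classify an access to one of the shared CMT registers (check order assumed). */
static enum asi_trap classify_shared_reg_access(enum asi_insn insn,
                                                bool privileged, uint64_t va)
{
    if (insn == INSN_OTHER)            /* only LDXA/STXA/LDDFA/STDFA are legal */
        return TRAP_DATA_ACCESS_EXCEPTION;
    if (!privileged)
        return TRAP_PRIVILEGED_ACTION;
    if (va % 8 != 0) {
        /* 4-byte-aligned LDDFA/STDFA get their own trap types. */
        if (va % 4 == 0 && insn == INSN_LDDFA)
            return TRAP_LDDF_MEM_ADDRESS_NOT_ALIGNED;
        if (va % 4 == 0 && insn == INSN_STDFA)
            return TRAP_STDF_MEM_ADDRESS_NOT_ALIGNED;
        return TRAP_MEM_ADDRESS_NOT_ALIGNED;
    }
    return TRAP_NONE;
}
```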
7.1 Prefetch Instruction
The UltraSPARC III Cu processor implements ten prefetch functions, whose function codes are 0, 1, 2, 3, 4, 16, 20, 21, 22, and 23. The UltraSPARC IV processor makes the following changes:
1. Prefetch with fcn = 3 now performs the same as prefetch with fcn = 2.
2. Prefetch with fcn = 23 now performs the same as prefetch with fcn = 22.
3. Prefetch with fcn = 17 is added; its behavior is the same as prefetch with fcn = 3 in the UltraSPARC III Cu processor.

TABLE 7-1 summarizes the prefetch instruction behavior.

TABLE 7-1 Prefetch Instruction
Prefetch Function | Description | Modified/New in the UltraSPARC IV Processor
Several Reads (fcn = 0, 20) | 64 bytes of data from the specified target address are prefetched by means of an RTS transaction and installed in both the E-cache and the P-cache. |
One Read (fcn = 1, 21) | 64 bytes of data from the specified target address are prefetched by means of an RTS transaction and installed in the P-cache. |
Several Writes (fcn = 2, 22) / One Write (fcn = 3, 23) | 64 bytes of data from the specified target address are prefetched and installed in the L2 cache. If the ASI_ECACHE_CTRL.pf2_RTO_en bit is set, an RTO transaction is issued for the prefetch; otherwise, an RTS is issued. |
Read to Nearest Unified Cache | 64 bytes of data from the specified target address are prefetched by means of an RTS transaction and ins
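For code that must reason about function codes on this processor, the UltraSPARC IV behavior classes can be captured in a small switch. The sketch covers only the codes discussed in this section. The enum and function names are hypothetical, and fcn 4 and 16 are deliberately reported as "not covered" because their descriptions fall outside this excerpt.

```c
#include <assert.h>
#include <string.h>

/* Behavior classes for the prefetch function codes listed in this section. */
enum pf_kind {
    PF_SEVERAL_READS,  /* fcn 0, 20: RTS; install in E-cache and P-cache         */
    PF_ONE_READ,       /* fcn 1, 21: RTS; install in P-cache                     */
    PF_WRITE_TO_L2,    /* fcn 2, 22 (and 3, 23 on UltraSPARC IV): install in L2  */
    PF_FCN17,          /* fcn 17: new; acts like UltraSPARC III Cu fcn 3         */
    PF_NOT_COVERED     /* any other code: outside this sketch                    */
};

static enum pf_kind usiv_prefetch_kind(unsigned fcn)
{
    switch (fcn) {
    case 0: case 20:                  return PF_SEVERAL_READS;
    case 1: case 21:                  return PF_ONE_READ;
    case 2: case 3: case 22: case 23: return PF_WRITE_TO_L2;
    case 17:                          return PF_FCN17;
    default:                          return PF_NOT_COVERED;
    }
}

/* Bus transaction for the L2-installing prefetches, selected by
 * ASI_ECACHE_CTRL.pf2_RTO_en. */
static const char *usiv_l2_prefetch_txn(int pf2_rto_en)
{
    return pf2_rto_en ? "RTO" : "RTS";
}
```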
62
9.3.9 Integer to Floating-Point Number Conversion 63
9.3.10 Copy/Move Operations 63
9.3.11 f Register Load/Store Operations 64
9.3.12 VIS Operations 64
9.4 Traps and Exceptions 64
9.4.1 Summary of Exceptions 66
9.4.2 Trap Event 66
9.4.3 Trap Priority 67
9.5 IEEE Traps 67
9.5.1 IEEE Trap Enable Mask (TEM) 67
9.5.2 IEEE Invalid (nv) Trap 67
9.5.3 IEEE Overflow (of) Trap 67
9.5.4 IEEE Underflow (uf) Trap 68
9.5.5 IEEE Divide-by-Zero 68
9.5.6 IEEE Inexact 68
9.6 Underflow Operation 69
9.6.1 Trapped Underflow 69
9.6.2 Untrapped Underflow 70
9.7 IEEE NaN Operations 70
9.7.1 Signaling and Quiet NaNs 71
9.7.2 SNaN to QNaN Transformation 71
9.7.3 Operations with NaN Operands 71
9.7.4 NaN Results from Operands without NaNs 73
9.8 Subnormal Operations 73
9.8.1 Response to Subnormal Operands 73
9.8.2 Subnormal Number Generation
63
0 = internal banking disabled; 1 = internal banking enabled.

rfr_mrs_pcall_spread (Memory_Timing1_CTL bit 56)
This bit determines whether the refresh, mode register setting, and precharge-all commands to a CK DIMM are spread into two consecutive commands: 0 = no spread; 1 = spread. When turning on rfr_mrs_pcall_spread, software must also add 2 clkr cycles to the auto_rfr_cycle value that was set with the feature off; otherwise, unexpected behavior may result.

TABLE 8-2 summarizes the settings of these three additional bits. When the NG DIMM is selected, the hardware ignores the int_bank_enable and rfr_mrs_pcall_spread bits; in this case, neither internal banking nor command spreading is allowed.

TABLE 8-2 CK DIMM Mode Settings
UltraSPARC IV Processor MCU Operation Mode | DIMM Type (mem_tim5_ctl[21]) | Internal Banking (mem_addr_ctl[63]) | rfr_mrs_pc_spread (mem_tim1_ctl[56])
NG DIMM | 0 | X | X
CK DIMM, internal bank enabled, spread enabled | 1 | 1 | 1
CK DIMM, internal bank enabled, spread disabled | 1 | 1 | 0
CK DIMM, internal bank disabled, spread enabled | 1 | 0 | 1
CK DIMM, internal bank disabled, spread disabled | 1 | 0 | 0

Note: Only banks 0 and 1 are available when the CK DIMM is used and internal banking is disabled.

Note: The other bits of the Memory Address Control register are not changed and maintain their behavior as in the UltraSPARC
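The settings in TABLE 8-2, together with the 2-clkr adjustment to auto_rfr_cycle, can be applied as in the sketch below. It assumes that CK DIMM is encoded as 1 in mem_tim5_ctl[21] (the table gives 0 for NG DIMM). It also models auto_rfr_cycle as a separate integer counted in clkr cycles, because its bit position is not given here, and clears the ignored bits for NG DIMM purely for tidiness. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from TABLE 8-2. */
#define MEM_TIM5_CK_DIMM_BIT    21  /* mem_tim5_ctl[21]: DIMM type (CK = 1 assumed) */
#define MEM_ADDR_INT_BANK_BIT   63  /* mem_addr_ctl[63]: int_bank_enable            */
#define MEM_TIM1_RFR_SPREAD_BIT 56  /* mem_tim1_ctl[56]: rfr_mrs_pcall_spread       */

struct mcu_regs {
    uint64_t mem_tim1_ctl, mem_tim5_ctl, mem_addr_ctl;
    unsigned auto_rfr_cycle;        /* modeled separately, in clkr cycles */
};

/* Apply one row of TABLE 8-2. base_auto_rfr is the auto_rfr_cycle value
 * used with spreading off; spreading on adds 2 clkr cycles. */
static void mcu_set_ck_mode(struct mcu_regs *r, bool ck_dimm, bool int_bank,
                            bool spread, unsigned base_auto_rfr)
{
    const uint64_t ck  = UINT64_C(1) << MEM_TIM5_CK_DIMM_BIT;
    const uint64_t bnk = UINT64_C(1) << MEM_ADDR_INT_BANK_BIT;
    const uint64_t spr = UINT64_C(1) << MEM_TIM1_RFR_SPREAD_BIT;

    if (!ck_dimm) {                 /* NG DIMM: hardware ignores the other two bits */
        int_bank = false;
        spread = false;
    }
    r->mem_tim5_ctl = ck_dimm  ? (r->mem_tim5_ctl | ck)  : (r->mem_tim5_ctl & ~ck);
    r->mem_addr_ctl = int_bank ? (r->mem_addr_ctl | bnk) : (r->mem_addr_ctl & ~bnk);
    r->mem_tim1_ctl = spread   ? (r->mem_tim1_ctl | spr) : (r->mem_tim1_ctl & ~spr);
    r->auto_rfr_cycle = base_auto_rfr + (spread ? 2u : 0u);
}
```

Tying the auto_rfr_cycle adjustment to the same call that sets the spread bit makes it harder to enable spreading without the extra 2 clkr cycles.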
74
9.9 Conditions for Software Trapping 76
10 Error Handling 77
10.1 Error Handling in UltraSPARC IV Processors 77
10.1.1 Error Reporting Specific to a Logical Processor 77
10.1.2 Shared Resource Error Reporting 79
10.1.3 … 81

List of Tables
TABLE 2-1 Enhancements to the UltraSPARC IV Processor's Core 4
TABLE 2-2 …
TABLE 3-1 …
TABLE 3-2 LP Interrupt ID Register Fields 12
TABLE 3-3 CESR ID Register 13
TABLE 3-4 LP Available Register (Shared) 14
TABLE 3-5 LP Enable Status Register (Shared) 15
TABLE 3-6 LP Enable Register (Shared) 15
TABLE 3-7 LP Running Register (Shared) 17
TABLE 3-8 LP Running Status 19
TABLE 3-9 XIR Steering Register (Shared) 21
TABLE 3-10 UltraSPARC IV Processor Private Registers 22
TABLE 3-11 UltraSPARC IV Process
III Cu processor.

CHAPTER 9
IEEE 754-1985 Standard
This chapter describes the implementation of the floating-point unit for the standard and nonstandard operating modes. It contains the following sections:
Chapter Topics
• Introduction on page 51
• Floating-Point Numbers on page 53
• IEEE Operations on page 55
• Traps and Exceptions on page 64
• IEEE Traps on page 67
• Underflow Operation on page 69
• IEEE NaN Operations on page 70
• Subnormal Operations on page 73
• Conditions for Software Trapping on page 76

9.1 Introduction

9.1.1 Floating-Point Operations
Floating-point operations (FPops) include the algebraic operations and usually do not include the specially treated floating-point load/store, FBfcc, or VIS instructions. The FABS, FNEG, and FMOV instructions are also treated separately from the algebraic operations.

9.1.2 Rounding Mode
The rounding mode of the floating-point unit is determined either by the FSR.RD field in standard rounding mode or by the GSR.IRND field in interval arithmetic rounding mode. The rounding direction affects the result after any underflow or overflow condition is detected. Underflow is detected before rounding.

TABLE 9-1 FSR.RD Options
FSR.RD | Round Toward
0 | Nearest (even if tie)
1 | 0
2 |
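The rounding directions in TABLE 9-1 can be illustrated with a pure-integer model that rounds a fixed-point value, num / 2^frac_bits, to an integer. Encodings 2 (toward +∞) and 3 (toward −∞) are assumed from SPARC V9 because the table is cut off after row 1. The function is a teaching model with hypothetical names, not a description of the FPU datapath.

```c
#include <assert.h>
#include <stdint.h>

/* FSR.RD encodings: 0 and 1 from TABLE 9-1; 2 and 3 assumed from SPARC V9. */
enum fsr_rd { RD_NEAREST = 0, RD_ZERO = 1, RD_POS_INF = 2, RD_NEG_INF = 3 };

/* Round num / 2^frac_bits to an integer in the given FSR.RD direction. */
static int64_t round_fixed(int64_t num, unsigned frac_bits, enum fsr_rd rd)
{
    assert(frac_bits < 62);
    const int64_t scale = INT64_C(1) << frac_bits;
    int64_t q = num / scale;             /* C division truncates toward zero */
    int64_t r = num % scale;             /* remainder has the sign of num    */
    int64_t step = (r < 0) ? -1 : 1;     /* one unit away from zero          */

    if (r == 0)
        return q;                        /* exact: every mode agrees */
    switch (rd) {
    case RD_ZERO:
        return q;
    case RD_POS_INF:
        return (r > 0) ? q + 1 : q;
    case RD_NEG_INF:
        return (r < 0) ? q - 1 : q;
    case RD_NEAREST:
    default: {
        int64_t twice = 2 * (r < 0 ? -r : r);
        if (twice < scale)
            return q;
        if (twice > scale)
            return q + step;
        return (q % 2 == 0) ? q : q + step;   /* tie: choose the even neighbor */
    }
    }
}
```

For example, 2.5 (num = 5, frac_bits = 1) rounds to 2 under Nearest and toward 0, and to 3 toward +∞, while −2.5 rounds to −3 toward −∞.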
The LRU bit is not covered by the ECC. The ECC value of an all-zero L2 cache tag is also 0. Thus, after STXA 0x40, all lines have correct ECC values and are in the INVALID state.

CHAPTER 5
Reset, RED_state, and Error_state
This chapter supplements Chapter 18 of the UltraSPARC III Cu Processor User's Manual and contains additional information for the UltraSPARC IV processor.
Chapter Topics
• Machine States After Reset on page 37

5.1 Machine States After Reset
TABLE 5-1 and TABLE 5-2 list the states of the newly added registers and fields at hard POR and at system reset (soft POR). These newly added registers and fields are unchanged after a Watchdog Reset (WDR), an Externally Initiated Reset (XIR), a Software-Initiated Reset (SIR), or entry into RED_state.

Register ASR 18

TABLE 5-1 UltraSPARC IV Processor Newly Defined Private Registers: Reset Machine State
No. | New Register | Field | Hard_POR State | System Reset (Soft POR) State | Comments
1 | ASI_ECACHE_CTRL (ASI 0x75, VA 0x00) | All | 0 | Unchanged | Defaults to direct-mapped L2 cache; in both LPs
2 | ASI_ECACHE_CTRL2 (ASI 0x75, VA 0x08) | All | Undefined | Unchanged | Unused in the UltraSPARC IV processor; in both LPs
3 | ASI_CORE_ID | Max_LP_ID | 000001 | Unchanged | 2 LPs per UltraSPARC IV processor (in LP0)
  | | | 000000 | Unchanged |
  | ASI_CORE_ID | Max_LP_ID | 000001 | Unchanged | 2 LPs per L
Name: ASI_CESR_ID
ASI: 0x63, VA<63:0>: 0x40
Access: Read/Write, Privileged

TABLE 3-3 CESR_ID Register
Bit | Field | Description
63:8 | Reserved | Reserved
7:0 | CESR_ID | The CESR_ID field is the 8-bit CESR ID used in bus transactions. For an RBIO or WBIO transaction, CESR_ID[7:0] is encoded appropriately.

Note: CESR_ID affects only the Sun Fireplane Interconnect RBIO and WBIO transactions. It does not affect other types of Sun Fireplane Interconnect transactions.

3.4 Disabling and Suspending Logical Processors
The CMT programming model provides the ability to disable or temporarily suspend logical processors. This section describes the interface for probing which logical processors are available, enabled, and not suspended. It also describes the interface for enabling, disabling, and suspending running logical processors. The registers described in this section are shared between logical processors.

3.4.1 LP Available Register (ASI_CORE_AVAILABLE)
The LP Available register is a shared register that indicates the number of logical processors implemented in a CMT processor and which logical processor numbers are assigned to them.
Name: ASI_CORE_AVAILABLE
ASI: 0x41, VA<63:0>: 0x00
Access: Read-only, Privileged

The LP Available register is a read-only register in which each bit position corresponds to a logical processor. Bit 0 represents LP
… a local RS on the bus with PTA state dI | Specific to a LP
48 | RTSR_ER | IERR | Detect a local RTSR on the bus with PTA state dT or dO | Specific to a LP
49 | RTOR_ER | IERR | Detect a local RTOR with PTA state dT | Specific to a LP
50 | RSR_ER | IERR | Detect a local RSR on the bus with PTA state dI | Specific to a LP

TABLE 10-11 Snoop Result Errors
Bit | Error | Type | Description | Comment
51 | RTS_SE | PERR | Local RTS Shared with Error: SharedIn = 0 and OwnedIn = 1 | Specific to a LP
52 | RTO_NDE | | Local RTO no data and SharedIn = 0 | Specific to a LP
53 | RTO_WDE | PERR | Local RTO wait data with SharedIn = 1 | Specific to a LP

TABLE 10-12 MTag Errors
Bit | Error Field | Type | Description | Comment
54 | SSM_MT | | MTag gM in non-SSM mode | Specific to a LP
55 | SSM_URT | | Unexpected remote transaction in non-SSM mode | Not specific to a LP
56 | SSM_URE | | Unexpected reissued transaction from an SSM device (transactions that are not initiated by the UltraSPARC IV processor) | Not specific to a LP
57 | SSM_IMT | PERR | Illegal MTag on returned data: MTag gI for RTSR/RSR, or MTag gI/gS for RTOR | Not specific to a LP

TABLE 10-13 Internal Errors on the PENDQ and QCTL
Bit | Field | Error Type | Description | Comment
58 | CPBK_MH | IERR | Multiple hits in the fast copyback buffer | Specific to a LP
59 | PTA_OV | IERR | Too many transactions hit the same PTA entry (attempt to increment PTA c
end of the reset should be set to run by the master logical processor at the proper time in the booting process.

LP Running Status Register (ASI_CORE_RUNNING_STATUS)
Because there is a delay between the time a logical processor is directed to suspend and the time it actually becomes suspended, the LP Running Status register is provided to indicate when a logical processor has actually become suspended. The LP Running Status register is a shared, read-only register in which each bit indicates whether the corresponding logical processor is active.

In the UltraSPARC IV processor, a logical processor is considered successfully suspended when all of the following conditions are satisfied:
1. No instructions are in the instruction queue and logical processor.
2. No I-cache fetch, D-cache load, D-cache store, P-cache load, or W-cache eviction requests are pending.
3. No requests are in the Store Queue.

Note: A D-cache load is considered finished when the D-cache has received the data.

Name: ASI_CORE_RUNNING_STATUS
ASI: 0x41, VA<63:0>: 0x58
Access: Read-only, Privileged; JTAG accessible

TABLE 3-8 LP Running Status Register (Shared)
Bit | Field | Description
63:2 | Reserved | Reserved; must be 0 when read
1 | LP 1 | This bit represents LP 1.
0 | LP 0 | This bit represents LP 0.

As shown in TABLE 3-8, the LP Running Status register is a 64-bit register. Each bit of the register represents one logica
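The delay described above implies a suspend-then-poll pattern. The sketch polls a status-reading callback until the target LP's bit reads 0, assuming that 1 means running (active) and 0 means suspended, and gives up after a bounded number of reads. The callback type and the fake register used to exercise it are hypothetical. On hardware the read would be a privileged LDXA from ASI 0x41, VA 0x58, and the suspend request itself (for example, through ASI_CORE_RUNNING_W1C) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor standing in for a read of ASI_CORE_RUNNING_STATUS. */
typedef uint64_t (*read_status_fn)(void *ctx);

/* Poll until the status bit for `lp` reads 0 (suspended), giving up after
 * max_polls reads. Returns true once suspension has been observed. */
static bool wait_for_suspend(read_status_fn read_status, void *ctx,
                             unsigned lp, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (((read_status(ctx) >> lp) & 1) == 0)
            return true;
    }
    return false;
}

/* Simulated status register for trying the sketch off-hardware: LP 1
 * reports running for the first `delay` reads, then suspended. */
struct fake_status { unsigned reads, delay; };

static uint64_t fake_read_status(void *ctx)
{
    struct fake_status *f = ctx;
    f->reads++;
    return (f->reads <= f->delay) ? UINT64_C(0x3) : UINT64_C(0x1);
}
```

The bounded loop reflects the fact that suspension completes only after the pending-request conditions listed above drain, which takes an unspecified amount of time.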
for directed read/write:
EC_way | Way
0 | Way 0
1 | Way 1

If the disp_flush field is set, a displacement flush is performed. If it is clear, an L2 cache tag access is performed.

Note: A displacement flush invalidates the line and causes a writeback if the line is dirty. In this case, the data returned from the EMU is undefined.

Note: For a displacement flush, use only LDXA (STXA has NOP behavior). Because the EMU returns garbage data to the MS pipeline, the instruction format ldxa [reg_addr] ASI_ECACHE_TAG, %g0 is recommended.

TABLE 4-12 4 MB L2 Cache Tag/State Access Data Format
Bit | Field
63:43 | Reserved
42 | LRU
41:21 | EC_tag
20:3 | Reserved
2:0 | EC_state

TABLE 4-13 8 MB L2 Cache Tag/State Access Data Format
Bit | Field
42 | LRU
41:21 | EC_tag
20:6 | Reserved
5:3 | EC_state1
2:0 | EC_state

In TABLE 4-13, the LRU field is a 1-bit LRU bit. The EC_tag field is a 21-bit physical tag field:
EC_tag[41:21] = PA[41:21] of the associated data for 4 MB
EC_tag[41:22] = PA[41:22] of the associated data for 8 MB

Note: In the UltraSPARC IV and UltraSPARC III Cu processors, PA[42] is removed from all cache tags because, in all UltraSPARC III Cu processor-based platforms, PA[42] is always 0 for cacheable address space.

Note: When writing the L2 cache tag using direct
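Field extraction for the 4 MB format in TABLE 4-12, plus the tag comparison implied by EC_tag[41:21] = PA[41:21], can be written as below. The helper names are hypothetical, and the sketch does not interpret EC_state values because their encodings are not given in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG21_MASK ((UINT64_C(1) << 21) - 1)

/* Field extraction for the 4 MB L2 cache tag/state data format. */
static unsigned l2tag4m_lru(uint64_t d)   { return (unsigned)((d >> 42) & 1); }
static uint64_t l2tag4m_tag(uint64_t d)   { return (d >> 21) & TAG21_MASK; }   /* bits 41:21 */
static unsigned l2tag4m_state(uint64_t d) { return (unsigned)(d & 0x7); }      /* bits 2:0  */

/* In the 4 MB configuration EC_tag[41:21] holds PA[41:21], so a tag match
 * compares those physical address bits directly. */
static bool l2tag4m_matches(uint64_t d, uint64_t pa)
{
    return l2tag4m_tag(d) == ((pa >> 21) & TAG21_MASK);
}
```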
more cores.

Chip Multithreading (CMT): A processor capable of executing two or more software threads simultaneously without resorting to a software context switch. Chip multithreading may be achieved through multiple processor cores, multiple threads per core, or a combination of these strategies.

General CMT Behavior
In general, each logical processor of a CMT processor behaves functionally, from the viewpoint of software visibility, as if it were an independent unit. This is an important aspect of CMT because user code running on a logical processor need not know whether that logical processor is part of a CMT device. The operating system exploits logical processors to schedule multiple threads of execution simultaneously. Various low-level software (boot, error, and diagnostic software, among others) must be aware of multiple logical processors. This chapter mainly describes the interface between low-level software and multiple logical processors.

Logical processors obey the same memory model semantics as if they were independent processors. All multiprocessing libraries, thread libraries, and code can operate on multiple logical processors without any modification.

Note: All previous documentation, including the UltraSPARC III Cu Processor User's Manual and The SPARC Architecture Manual, Version 9, uses the term processor. When these ear
26. rounding is summarized in TABLE 9-15. Define a few terms:
• u is the unrounded, exact value of the result.
• r is the rounded value of u.
• … occurs when there is no trap generated.
• Underflow is when 0 < |u| < smallest Normal number.

TABLE 9-15 Underflow Exception Summary
Columns: (1) underflow enabled, UFM = 1, inexact don't care (NXM = x); (2) underflow masked, UFM = 0, inexact enabled, NXM = 1; (3) underflow masked, UFM = 0, inexact masked, NXM = 0.

u = r (exact result)
  r is minimum Normal: (1) none; (2) none; (3) none
  r is Subnormal: (1) set ufc, ieee trap; (2) none; (3) none
  r is Zero: (1) none; (2) none; (3) none
u ≠ r (inexact result)
  r is minimum Normal: (1) set ufc, ieee trap; (2) set nxc, ieee trap; (3) set ufc, set ufa
  r is Subnormal: (1) set ufc, ieee trap; (2) set nxc, ieee trap; (3) set ufc, set ufa
  r is Zero: (1) set ufc, ieee trap; (2) set nxc, ieee trap; (3) set ufc, set ufa

set nxc means FSR.cexc.nxc is set to 1.
set ufc means FSR.cexc.ufc is set to 1.
set ufa means FSR.aexc.ufa is set to 1.
ieee trap means fp_exception_ieee_754.

IEEE NaN Operations
When a NaN operand appears or a NaN result is generated and the invalid (nv) trap is enabled (FSR.TEM.NVM = 1), the fp_exception_ieee_754 occurs. If the invalid (nv) trap is masked (FSR.TEM.NVM = 0), a signalling NaN operand is transformed into a quiet NaN. A quiet NaN operand propagates to the destination register. NaN operations are described in TABLE 9-16, Results from NaN Operands, on page 72.

Whenever a NaN i
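TABLE 9-15 can be read as a small decision procedure. The Python sketch below applies it to double precision, using exact rational arithmetic for the unrounded value u and assuming round-to-nearest; the function name and the returned strings are illustrative, not a hardware interface.

```python
from fractions import Fraction

# Smallest positive Normal double-precision number, 2^-1022.
SMALLEST_NORMAL = Fraction(1, 2 ** 1022)


def underflow_action(u: Fraction, ufm: int, nxm: int) -> str:
    """Classify an exact (unrounded) result u per TABLE 9-15, double precision.

    Assumes round-to-nearest; the returned strings mirror the table entries.
    """
    if not 0 < abs(u) < SMALLEST_NORMAL:
        return "no underflow"             # u is not tiny
    # int / int true division in Python is correctly rounded to the nearest double.
    r = u.numerator / u.denominator
    exact = Fraction(r) == u
    if ufm:                               # underflow trap enabled
        return "set ufc, ieee trap"       # tininess alone signals underflow
    if exact:                             # masked and exact: nothing is signalled
        return "none"
    if nxm:                               # underflow masked, inexact trap enabled
        return "set nxc, ieee trap"
    return "set ufc, set ufa"             # both traps masked
```

With the underflow trap enabled, tininess alone selects the trap; with it masked, an exact tiny result signals nothing, which matches the u = r rows of the table.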
27. rs operand. The value of the constants depends on the precision type (see TABLE 9-17).

TABLE 9-17 Subnormal Handling Constants per Destination Register Precision
Destination Register Precision | Number of Bits in Exponent Field | Exponent Bias (E_bias) | Max Exponent (E_max) | Gross Exponent Underflow (E_gur)
Double | 11 | 1023 | 2047 | 53

• For FMULs and FMULd: $E_{rb} = E_{rs1} + E_{rs2} - E_{bias}$
• For FDIVs and FDIVd: $E_{rb} = E_{rs1} - E_{rs2} + E_{bias} - 1$

When two Normal operands of FMULs/d and FDIVs/d generate a Subnormal result, E_rb is calculated using the algorithm shown in CODE EXAMPLE 9-1.

CODE EXAMPLE 9-1 Normal Operands Generating a Subnormal Result (Pseudocode)
If fraction_msb overflows (i.e., fraction_msb > 1) … ELSE …

• For FdTOs: $E_{rb} = E_{rs} - E_{bias}(P_{rs}) + E_{bias}(P_{rd})$, where P_rs is the larger precision of the source and P_rd is the smaller precision of the destination.

Even though 0 < E_rs1, E_rs2 < 255 for each single-precision biased operand exponent, the computed biased exponent result E_rb can fall outside 0 < E_rb < 255, or can even be negative. For example, for the FMULs instruction:
• If E_rs1 = E_rs2 = 127, then $E_{rb} = 127 + 127 - 127 = 127$.
• If E_rs1 = E_rs2 = 0, then $E_{rb} = 0 + 0 - 127 = -127$.

Overflow Result
If the appropriate trap enable masks are not set (FSR.OFM = 0 and FSR.NXM = 0), then set FSR.ae
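The exponent formulas can be checked numerically. This Python sketch evaluates them as written in this section, assuming a single-precision bias of 127 (taken from the FMULs example) and a double-precision bias of 1023 (TABLE 9-17). It deliberately omits the fraction_msb adjustment of CODE EXAMPLE 9-1, and the function names are hypothetical.

```python
# Exponent biases: 127 from the FMULs example, 1023 from TABLE 9-17.
E_BIAS = {"single": 127, "double": 1023}


def erb_fmul(e_rs1: int, e_rs2: int, precision: str = "single") -> int:
    """Biased result exponent for FMULs/FMULd, before the fraction_msb adjustment."""
    return e_rs1 + e_rs2 - E_BIAS[precision]


def erb_fdiv(e_rs1: int, e_rs2: int, precision: str = "single") -> int:
    """Biased result exponent for FDIVs/FDIVd, before the fraction_msb adjustment."""
    return e_rs1 - e_rs2 + E_BIAS[precision] - 1


def erb_fdtos(e_rs: int) -> int:
    """Biased result exponent for FdTOs: rebias from double (P_rs) to single (P_rd)."""
    return e_rs - E_BIAS["double"] + E_BIAS["single"]


def outside_single_range(e: int) -> bool:
    """True when E_rb falls outside 0 < E_rb < 255, the case the text warns about."""
    return not 0 < e < 255
```

For example, `erb_fmul(0, 0)` evaluates to -127, the negative biased exponent from the second FMULs example.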
28. when the logical processor is set to run. If, however, no interrupt buffer is available, the interrupt is NACKed.

The STICK and TICK counters continue to count while a logical processor is suspended. Suspending logical processors is intended for critical diagnostic and recovery code, so interference with performance monitors that use the TICK or STICK counters should not be a general issue. Using the TICK or STICK counter to detect the suspension of a logical processor is not recommended.

LP Running Register (ASI_CORE_RUNNING)
The LP Running register is a shared register used by software to suspend and run selected logical processors. When a logical processor is suspended, it stops executing new instructions and does not initiate transactions, except in response to a coherency transaction initiated by another logical processor. There may be an arbitrarily long, but bounded, delay from when the LP Running register is updated until the corresponding logical processor(s) actually suspend or are set to run.

The LP Running register, described in TABLE 3-7, is used by software to suspend selected logical processors. It is accessed through three ASI addresses:

Name: ASI_CORE_RUNNING_RW; ASI: 0x41; VA[63:0]: 0x50; Privileged Read/Write; JTAG accessible
Name: ASI_CORE_RUNNING_W1S; ASI: 0x41; VA[63:0]: 0x60; Privileged Write-only; Write One to Set
Name: ASI_CORE_RUNNING_W1C; ASI: 0x41; VA[63:0]: 0x68; Privileged Write-only; Write One to Clear

TABLE 3
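The three ASI_CORE_RUNNING aliases differ only in how the written data is combined with the current value. The Python model below is a software sketch, not driver code: it assumes that a set bit means the logical processor is running, ignores the hardware rule that prevents both logical processors from being suspended, and uses hypothetical names. On real hardware the write would be an STXA to ASI 0x41 at the listed VA.

```python
ASI_CORE_RUNNING = 0x41              # ASI shared by all three aliases
VA_RW, VA_W1S, VA_W1C = 0x50, 0x60, 0x68
LP_BITS = 0b11                       # bit 0 = LP 0, bit 1 = LP 1; bits 63:2 reserved


def core_running_after_write(current: int, va: int, data: int) -> int:
    """Model the register value after a write through one of the three aliases.

    Does not model the rule that keeps at least one LP running.
    """
    data &= LP_BITS                             # reserved bits are ignored
    if va == VA_RW:
        return data                             # plain read/write alias
    if va == VA_W1S:
        return (current | data) & LP_BITS       # write one to set: run selected LPs
    if va == VA_W1C:
        return current & ~data & LP_BITS        # write one to clear: suspend selected LPs
    raise ValueError(f"VA {va:#x} is not an ASI_CORE_RUNNING alias")
```

The set/clear aliases let software change one logical processor's bit without a read-modify-write of the shared register.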
29. 0  LP_ID  An LP_ID field that represents this logical processor's number as assigned by the hardware. The LP_ID is encoded in 6 bits. In the UltraSPARC IV processor, one logical processor has a value of 6'b000000 and the other logical processor has a value of 6'b000001.

LP Interrupt ID Register (ASI_INTR_ID)
The LP Interrupt ID register, described in TABLE 3-2, is added to support the Sun Fireplane Interconnect interrupt transaction. This register is used to differentiate to which logical processor the interrupt is sent.

This private register is used by software to assign a 10-bit interrupt ID to a logical processor that is unique within the system. This is important to enable logical processors to receive interrupts. The ID in this register is used by other logical processors and other bus agents to address interrupts to this specific logical processor. It is also used by this logical processor to identify the source of interrupts it issues to other logical processors and bus agents. It is expected to be changed only at boot or reconfiguration time.

Name: ASI_INTR_ID; ASI: 0x63; VA[63:0]: 0x00; Read/Write, Privileged Access

Note: The UltraSPARC IV processor sets the Sun Fireplane MID[9:5] to SID_U and MID[4:0] to SID_L. The source of MID[9:0] is the ASI_INTR_ID[9:0] of the logical processor issuing the INT

TABLE 3-2 LP Interrupt ID Register Fields Bi
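Because MID[9:0] is taken directly from ASI_INTR_ID[9:0], splitting an interrupt ID into its SID_U and SID_L halves is a shift and a mask. A minimal Python sketch, with a hypothetical function name:

```python
def fireplane_mid_fields(intr_id: int) -> tuple:
    """Split a 10-bit ASI_INTR_ID value into (SID_U, SID_L).

    MID[9:0] comes from ASI_INTR_ID[9:0]; MID[9:5] is SID_U and MID[4:0] is SID_L.
    """
    if not 0 <= intr_id < (1 << 10):
        raise ValueError("ASI_INTR_ID holds a 10-bit interrupt ID")
    sid_u = (intr_id >> 5) & 0x1F    # MID[9:5]
    sid_l = intr_id & 0x1F           # MID[4:0]
    return sid_u, sid_l
```

For example, an interrupt ID of 547 (binary 10001 00011) yields SID_U = 17 and SID_L = 3.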
30. TABLE 10-15 Internal Errors of the ECU
Bit | Field | Error Type | Description | Comment
75 | MPT_ERR | IERR | Miss request protocol error: the handshaking protocol (ec_si_rq / si_ec_req_ack) between the SIU and the ECU is broken | Specific to an LP
76 | | | Multiple hits for any Etag access | Specific to an LP
77 | EC_ILL_WAY | | Illegal way-select info when the ECU allocates for a new Etag entry | Specific to an LP
78 | EC_ILL_CAM_HIT | IERR | Illegal CAM hit on the new ECache miss request | Specific to an LP

TABLE 10-16 lists the new errors introduced in the UltraSPARC IV processor. When one of these errors happens, the IERR bit in the AFSR is set.

TABLE 10-16 UltraSPARC IV Processor New Internal Errors in TOB
Bit | Field | Error Type | Description | Comment
79 | CA_FSM_ILL | IERR | CA FSM encounters an illegal state | Not specific to an LP
80 | CA_GNT_ERR | IERR | Both LPs are getting grant | Not specific to an LP
81 | XAID_REQ_ILL | IERR | Simultaneous xaid request from both LPs | Not specific to an LP
82 | AID_TBL_CNFT | IERR | Same AID shared by both LPs | Not specific to an LP
83 | LP_AID_TAB_ILL | IERR | Main AID table is free, yet individual LP AID tables are allocated | Not specific to an LP
84 | ARB_SYNC_ERR | IERR | Fireplane address arbiter out of sync | Not specific to an LP
85 | XACTN_OE_ILL | IERR | xactn output enable enabled by both LPs | Not specific to an LP
31. 1 New MCU Timing Control Register
Bit     Field
63:23   Reserved
22      addr_le_hold
21      dimm_type
20      addr_le_pw[3]
19      cmd_pw[4]
18      Reserved
17      rd_msel_dly[6]
16      rdwr_rd_ti_dly[6]
15      Reserved
14      rdwr_rd_ti_dly[6]
13      Reserved
12      wr_wr_ti_dly[6]
11      rdwr_rd_pi_more_dly[5]
10      sdram_ctl_dly[4]
9       sdram_ctl_dly[3]
8       auto_rfr_cycle[7]
7       rd_wait[5]
6:1     Reserved
0       rfr_int[9]

Except for bits 21 and 22, all other parameters have the same meaning as those in the UltraSPARC III Cu processor, except that their maximum values are two times those of the UltraSPARC III Cu processor. Bit 22 is defined as follows:

addr_le_hold: Address hold time to Address Latch Enable. 0 = 2 processor clock cycles (default); 1 = 3 processor clock cycles.

The reserved bits have no effect when written and return 0 when read.

Note: The UltraSPARC IV processor supports 0, 1, and 2 wait states. It does not support 3 wait states.

Note: There is only one copy of the MCU registers, including those in the UltraSPARC III Cu processor and the new one defined in this section. These registers can be accessed by using ASI or PIO. However, ASI access is only available to the logical processors on the same die as these registers, and PIO access is only available to foreign UltraSPARC IV processor agents. The UltraSPARC III Cu processor MCU
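Two parts of this register lend themselves to a short sketch. addr_le_hold is a single bit selecting 2 or 3 processor clocks. The fields listed with a bit index in the table (for example rd_wait with index 5) appear to be extra most-significant bits for the UltraSPARC III Cu fields, which would explain the doubled maximum values. `widen_field` below encodes that reading and is an assumption, as are any legacy field widths a caller passes in; all names are hypothetical.

```python
def bit(value: int, n: int) -> int:
    """Return bit n of value."""
    return (value >> n) & 1


def addr_le_hold_cycles(reg: int) -> int:
    """Bit 22, addr_le_hold: 0 = 2 processor clock cycles (default), 1 = 3."""
    return 3 if bit(reg, 22) else 2


def widen_field(legacy_value: int, legacy_width: int, extra_msb: int) -> int:
    """ASSUMPTION: treat a new-register bit as the MSB above a legacy field.

    One extra bit above a legacy_width-bit field roughly doubles its maximum value.
    """
    low = legacy_value & ((1 << legacy_width) - 1)
    return ((extra_msb & 1) << legacy_width) | low
```

Under this reading, a 5-bit legacy field at its maximum of 31 combined with the new bit 7 set would give 63.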
32. TABLE 9-16 Results from NaN Operands
The RESULT from the operation includes one or more of the following: the number in the f register (see the Trap Event note on page 66); the exception bit set (see TABLE 9-12); the trap that occurs (see the abbreviations in TABLE 9-12); underflow/overflow may occur.
Columns: Operation | Operand(s) | Masked exception (TEM.NVM = 0): rd or fcc written; flag(s) set | Enabled exception (TEM.NVM = 1): rd or fcc written; flag(s) set; trap

One operand (rs2 → rd) | QNaN | QNaN (see note); no | QNaN (see note); no
One operand (rs2 → rd) | SNaN | QNaN (see note); set nvc, set nva | set nvc; ieee trap
Two operand (rs1, rs2 → rd) | QNaN1, QNaN2 | QNaN2; no | QNaN2; no
Two operand (rs1, rs2 → rd) | QNaN, anything except SNaN and QNaN | QNaN; no | QNaN; no
Two operand (rs1, rs2 → rd) | SNaN1, SNaN2 | QNaN (see note); set nvc, set nva | set nvc; ieee trap
Two operand (rs1, rs2 → rd) | SNaN, anything except SNaN | QNaN (see note); set nvc, set nva | set nvc; ieee trap
FCMPEs/d | SNaN or QNaN, anything | fcc = 3 (unordered); set nvc, set nva | set nvc; ieee trap
FCMPs/d | SNaN, anything | fcc = 3 (unordered); set nvc, set nva | set nvc; ieee trap
FCMPs/d | QNaN, anything except SNaN | fcc = 3 (unordered); no | fcc = 3 (unordered); no

1. For the Fs/dTOs/d and other instructions, see Section 9.7.2, SNaN to QNaN Transformation, on page 71.

Note: Notice from TABLE 9-16 that the compare and cause exception if unordered instruction, FCMPEs/d, will cause an invalid (nv) e
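The one-operand rows of TABLE 9-16 can be modeled on double-precision bit patterns. This Python sketch assumes the SNaN/QNaN distinction from TABLE 9-2 (fraction MSB 0 for SNaN, 1 for QNaN) and assumes that quieting an SNaN sets that MSB while keeping the sign and the remaining fraction bits; the function names are hypothetical.

```python
EXP_MASK = 0x7FF0_0000_0000_0000     # double-precision exponent field
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF    # 52-bit fraction field
QUIET_BIT = 1 << 51                  # fraction MSB: 1 = QNaN, 0 = SNaN


def is_nan(bits: int) -> bool:
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits: int) -> bool:
    return is_nan(bits) and not bits & QUIET_BIT


def one_operand_nan(bits: int, nvm: int):
    """Return (rd_bits or None, flags, trap) for a one-operand FPop on a NaN.

    None means rd is left unchanged because the trap is taken (TABLE 9-13).
    """
    if not is_nan(bits):
        raise ValueError("operand is not a NaN")
    if not is_snan(bits):
        return bits, set(), False                     # QNaN propagates, no flag
    if nvm:
        return None, {"nvc"}, True                    # invalid trap enabled
    return bits | QUIET_BIT, {"nvc", "nva"}, False    # SNaN quieted (assumed rule)
```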
33. Summary of Exceptions

TABLE 9-12 Floating-point Unit Exceptions
Description | IEEE Flag | Trap Type (Abbreviation) | FSR.FTT | Trap Vector
Floating-point unit disabled | none | fp_disabled (disable trap) | none | 020₁₆
Floating-point operation invalid (IEEE) | nv | fp_exception_ieee_754 (ieee trap) | 1 | 021₁₆
Floating-point operation overflow (IEEE) | of | fp_exception_ieee_754 (ieee trap) | 1 | 021₁₆
Floating-point operation underflow (IEEE) | uf | fp_exception_ieee_754 (ieee trap) | 1 | 021₁₆
Floating-point operation division by zero (IEEE) | dz | fp_exception_ieee_754 (ieee trap) | 1 | 021₁₆
Floating-point operation inexact (IEEE) | nx | fp_exception_ieee_754 (ieee trap) | 1 | 021₁₆

Trap Event
When a floating-point exception causes a trap, the trap is precise. The response to traps is described in TABLE 9-13.

TABLE 9-13 Response to Traps
Exception events: fp_disabled; fp_exception_other (unimplemented_FPop); fp_exception_other (unfinished_FPop); fp_exception_ieee_754.
Resulting action:
• The address of the instruction that caused the trap is put in the PC and pushed onto the trap stack.
• The destination f register (rd) is unchanged from its state prior to the execution of the instruction that caused the trap.
• The floating-point condition codes (fccN) are unchanged.
• The FSR.aexc field is unchanged.
• The FSR.cexc field is unchanged, except for fp_exception_ieee_754, where the appropriate bit is set to 1.
• The FSR.ftt field is unchanged for fp_disabled and is set to 3 for unimplemented_FPop, 2 for unfinished_FPop, and 1 for fp_exception_ieee_754.

9.4.3 Trap Prior
34. 7 LP Running Register (Shared)
Bit    Field      Description
63:2   Reserved   Reserved. Must be 0 when read.
1      LP1        This bit represents LP 1.
0      LP0        This bit represents LP 0.

The LP Running register is a 64-bit register. Each bit of the register represents one logical processor, with bit 0 representing LP 0 and bit 1 representing LP 1.

Once a logical processor is set to suspend, it stops fetching instructions, completes the instructions already in the logical processor and the instruction buffers, and then becomes idle. When the logical processor is set to run, it continues execution from the point at which it was suspended.

A logical processor is allowed to suspend itself. A logical processor that suspends itself should follow the ASI write with a FLUSH instruction. This satisfies the ASI writing rules and guarantees that, if the logical processor is successfully suspended, no instructions following the FLUSH are executed. The FLUSH instruction itself may be retired before or after the logical processor is suspended.

Note: The UltraSPARC IV processor does not allow software to set both logical processors to be suspended. On an update to the LP Running register that would cause both logical processors to become suspended, the logical processor making the update is automatically set to run by hardware. T
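The hardware rule in the note above can be expressed as a post-processing step on any update. A minimal Python sketch, assuming a set bit means running (bit 0 = LP 0, bit 1 = LP 1) and using a hypothetical function name:

```python
LP_BITS = 0b11   # bit 0 = LP 0, bit 1 = LP 1


def lp_running_after_update(requested: int, writer_lp: int) -> int:
    """Value the LP Running register takes when writer_lp requests `requested`.

    Hardware never lets both LPs be suspended: an update that would clear both
    bits leaves the writing logical processor running.
    """
    if writer_lp not in (0, 1):
        raise ValueError("the UltraSPARC IV processor has LP 0 and LP 1 only")
    value = requested & LP_BITS
    if value == 0:
        value |= 1 << writer_lp
    return value
```

A logical processor can still suspend itself while the other keeps running, for example LP 0 writing 0b10.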
35. ASI access, the correct L2 cache tag ECC bits are also automatically generated and written to the L2 cache tag ECC array. To intentionally inject errors, the ECC value can be changed using a direct ASI write (see Section 4.3).

Note: Each UltraSPARC IV logical processor contains 32K LRU bits. They are addressable by VA[20:6] (4 MB) or VA[21:7] (8 MB). The EC_way field has no effect on accessing the LRU bits. In direct-mapped mode, normal L2 cache accesses do not update the LRU bits; hence, an ASI_ECACHE_TAG read should return 0 unless the LRU bits have been updated by an ASI_ECACHE_TAG write.

4.3 ASI Access to L2 Cache Tag ECC Bits
ASI 0x4E, VA[63:24] = 0x0, VA[23] = 0x1
For a direct-mapped L2 cache: VA[21:6] = EC_tag_addr for 4 MB; VA[22:7] = EC_tag_addr for 8 MB.
For a 2-way L2 cache: VA[21] = EC_way and VA[20:6] = EC_tag_addr for 4 MB; VA[22] = EC_way and VA[21:7] = EC_tag_addr for 8 MB.
VA[5:0] = 0

TABLE 4-14 4 MB and 8 MB L2 Cache Tag State Access Data Format
Bit    Field       Description
63:8   Reserved    Reserved
7:0    ECC_value   The ECC_value field is an 8-bit ECC value written to or read from the L2 cache tag ECC RAM.

Note: The UltraSPARC IV processor uses the same algorithm as the UltraSPARC III Cu processor to generate the L2 cache tag ECC. The signals covered by the L2 cache tag ECC include the tag and the coherence states
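Building the VA for the tag ECC access is a matter of placing fields at the positions listed for ASI 0x4E. The Python sketch below only computes the address; the actual access would be an LDXA or STXA with ASI 0x4E, which Python cannot issue. The function and parameter names are hypothetical, and VA[5:0] is left at zero.

```python
ECC_SELECT = 1 << 23   # VA[23] = 1 selects the tag ECC bits; VA[63:24] = 0


def tag_ecc_va(tag_addr: int, size_mb: int, two_way: bool, way: int = 0) -> int:
    """VA for an ASI 0x4E access to the L2 cache tag ECC bits."""
    if size_mb == 4:
        lo = 6                       # EC_tag_addr starts at VA[6]
    elif size_mb == 8:
        lo = 7                       # EC_tag_addr starts at VA[7]
    else:
        raise ValueError("only 4 MB and 8 MB L2 caches are described")
    if two_way:
        width = 15                   # VA[20:6] or VA[21:7]
        way_bit = lo + width         # VA[21] (4 MB) or VA[22] (8 MB)
        index = (tag_addr & ((1 << width) - 1)) << lo
        return ECC_SELECT | ((way & 1) << way_bit) | index
    width = 16                       # VA[21:6] or VA[22:7]
    return ECC_SELECT | ((tag_addr & ((1 << width) - 1)) << lo)
```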
36. C errors 82
Internal Errors of the MCU 82
Internal Error of the Write Cache 82
System Bus Protocol Error (Data) 83
Internal Errors of the DPCTL 83
System Bus Protocol Errors (Transaction) 84
Cache Consistency Errors 85
Snoop Result Errors 86
Internal Errors on the PENDQ and OCT 86
Internal Errors of the TOB 87
Internal Errors of the ECU 87
UltraSPARC IV Processor New Internal Error in TOB 88

List of Figures
FIGURE 3-1 CMT Register Changes During Reset 24
FIGURE 9-1 Floating-point Number Line 55

Preface
This book contains information about the architecture and programming of the UltraSPARC IV processor, one of Sun Microsystems' family of SPARC V9-compliant processors. This document is a supplement to the UltraSPARC III Cu Processor User's Manual and should be read in conjunction with that document. This document extends the material in the UltraSPARC III C
37. Exception (TEM = 0)

Floating-point to 32-bit integer: when the source operand is not between 23 1 and 2, the result is inexact.
  Result: Integer number. Masked (TEM = 0): nx. Unmasked (TEM = 1): nx, ieee trap.
Floating-point to 64-bit integer: when the source operand is not between 2 1 and 263, the result is inexact.
  Result: Integer number. Masked (TEM = 0): nx. Unmasked (TEM = 1): nx, ieee trap.
Integer to floating-point: when the 32-bit integer source operand magnitude is not exactly representable in single precision (23-bit fraction).
  Result: Single-precision Normal. Masked (TEM = 0): nx. Unmasked (TEM = 1): nx, ieee trap.

TABLE 9-14 Floating-Point/Integer Conversions that Generate Inexact Exceptions (Continued)
Integer to floating-point: when the 64-bit integer source operand magnitude is not exactly representable in single precision (23-bit fraction).
  Result: Single-precision Normal. Masked (TEM = 0): nx. Unmasked (TEM = 1): nx, ieee trap.
FxTOd, integer to floating-point: when the 64-bit integer source operand magnitude is not exactly representable in double precision (52-bit fraction).
  Result: Double-precision Normal. Masked (TEM = 0): nx. Unmasked (TEM = 1): nx, ieee trap.

1. Even if the operand is too large to fit in the destination fraction, it may still be exactly representable if enough of its trailing bits are zeros.
2. Even if the operand is too large to fit in the destination fraction, it may still be exactly repre
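Whether an integer-to-floating-point conversion is exact depends only on how many significant bits remain after trailing zeros are removed, which is why the footnotes mention trailing zeros. This Python sketch checks that condition for a given fraction width (23 for single, 52 for double, each plus the implicit leading 1) and cross-checks single precision with the standard `struct` module; the function names are hypothetical.

```python
import struct


def exactly_representable(n: int, fraction_bits: int) -> bool:
    """True if integer n is exact in a binary format with `fraction_bits` stored bits.

    Exponent range is not checked; every 64-bit integer is within range of
    both single and double precision.
    """
    m = abs(n)
    if m == 0:
        return True
    m >>= (m & -m).bit_length() - 1            # drop trailing zero bits
    return m.bit_length() <= fraction_bits + 1  # +1 for the implicit leading 1


def single_round_trip_is_exact(n: int) -> bool:
    """Cross-check by converting through an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">f", float(n)))[0] == n
```

For example, 2^24 + 1 needs 25 significant bits and so would raise nx when converted to single precision, while 3 × 2^40 needs only 2 and converts exactly.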
38. Multithreading (CMT)
The UltraSPARC IV processor supports Sun's new software interface and registers for logical processor identification, reset, diagnostics, and error reporting. These CMT registers can be classified as private or shared.

Chapter Topics
• Introduction on page 7
• Accessing CMT Registers on page 9
• Private Processor Registers on page 10
• Disabling and Suspending Logical Processors on page 13
• Reset Handling on page 20
• Private and Shared Registers Summary on page 22
• CMT Register Changes Due to Reset on page 24

Introduction
This chapter corresponds to Sun's common interface between hardware and software and addresses issues common to CMT processors.

CMT Definition
A CMT processor is defined by its externally visible nature and not by its internal organization. The following section provides background terminology, followed by a description of the CMT definition.

3.1.1.1 Background Terminology
Thread: The basic unit of program execution; a stream of computer instructions that is in control of a process.
Logical Processor (LP): The abstraction of a processor's architecture that maintains the state and management of an executing thread.
Core: A hardware unit that instantiates one or more logical processors.
Processor: A single piece of silicon that interprets and executes operating system functions and other software tasks. A processor is implemented by one or
39. NING and ASI_CMP_ERROR_STEERING registers are preserved. In other words, apart from their initial states after system reset, these two registers remain unchanged unless overwritten through JTAG.

CHAPTER 6 Performance Instrumentation
This chapter supplements Chapter 14 of the UltraSPARC III Cu Processor User's Manual and contains additional information for the UltraSPARC IV processor.

TABLE 6-1 lists the counters that count differently in the UltraSPARC IV processor compared with the UltraSPARC III Cu processor.

TABLE 6-1 Counter Behavior Differences
Counter: EC_ref (PICL, SL encoding 001100)
  UltraSPARC IV processor behavior: Total L2 cache references, excluding non-cacheable and speculative load accesses.
  UltraSPARC III Cu processor behavior: Total L2 cache references, excluding non-cacheable accesses but including speculative load accesses.

CHAPTER 7 Assembly Language
This chapter supplements Appendix B of the UltraSPARC III Cu Processor User's Manual and contains additional information for the UltraSPARC IV processor.

Chapter Topics
• Prefetch Instruction on page 45

7.1
40. ORE_AVAILABLE) 13
3.4.2 Enabling and Disabling Logical Processors 14
3.4.3 Suspending and Running Logical Processors 16
3.5 Reset Handling 20
3.5.1 Private Resets (SIR and WDR Resets) 20
3.5.2 Full CMT Resets (System Reset) 20
3.5.3 Partial CMT Resets (XIR Reset) 20
3.6 Private and Shared Registers Summary 22
3.6.1 Implementation Registers 22
3.7 CMT Register Changes Due to Reset 24
Caches and Cache Coherency 25
4.1 Write Cache (W-cache) 25
4.2 External L2 Cache 27
4.2.1 L2 Cache Control Register 27
4.2.2 Shared L2 Cache Configuration and Timing Control Register 29
4.2.3 Secondary L2 Cache Control Register 30
4.2.4 2-Way Support in L2 Cache Data ECC Fields (R/W) 30
4.2.5 Direct L2 Cache Tag Bank Access and Displacement Flush 32
4.3 ASI Access to L2 Cache Tag ECC Bits 35
Reset, RED_state, and Error_state
41. P1 (UltraSPARC IV processor): 000001 | Unchanged | LP_ID
4 | ASI_INTR_ID | All | Undefined | Unchanged | Undefined for both LPs
5 | ESTATE_ERR_EN_REG | | | |
6 | CESR_ID | | | Unchanged |
7 | DCU_CONTROL_REGISTER | | 0 | | Default to use PA[8:6] to index the W-cache
8 | Dispatch Control | OBS[11:6] | | Unchanged |

TABLE 5-2 System Reset: UltraSPARC IV Defined Shared Registers
No. | New Register | Field | Hard POR | Soft POR | Comments
1 | ASI_ECACHE_CFG_TIMING_CTRL (ASI 0x73, VA 0x00) | EC_assoc | | Unchanged | Default to direct mapped
  | | trace_out | | Unchanged | Default to 6 cycles (6-6-5)
  | | trace_in | | Unchanged | Default to 5 cycles (6-6-5)
  | | EC_clock | | Unchanged | Default to 6:1 L2 cache clock ratio
  | | EC_size | | Unchanged | Default to 8 MB L2 cache
  | | EC_turn_rw | | Unchanged | Default to 2 cycles
  | | Others | | Unchanged |
2 | New Sun Fireplane Interconnect Clock Ratio in SAFARI_CONFIG | CLK | | Unchanged | Default to 6:1 system clock ratio
3 | SAFARI_CONFIG | ASI_INTR_ID | ASI_INTR_ID[9:0] of LP 0 | ASI_INTR_ID[9:0] of LP 0 | Default to reflect LP 0's INTR_ID
4 | Mem_Timing5_CTL | | Undefined | Unchanged |
5 | Mem_Address_CTL | | Undefined | Unchanged | Default to disable internal banking
6 | ASI_CORE_AVAILABLE (ASI 0x41, VA 0x00) | [63:0] | 3 (decimal) | 3 (decimal) | UltraSPARC IV processor hardware always sets this register to 3 (decimal)
42. U Timing Control Register 48
CK DIMM mode set… 50
FSR.RD bit options
Floating-point Numbers 53
Floating-point Addition 56
Floating-point Subtraction 57
Floating-point Multiplication 58
Floating-point Division 59
Number Compare 61
Floating-point to Integer Number Conversion 62
Integer to Floating-point Number Conversion 63
Floating-point Unit Exceptions 66
Response to Traps 66
Floating-Point Integer Conversions that Generate Inexact Exceptions 68
TABLE 9-15 Underflow Exception Summary 70
TABLE 9-16 Results from NaN Operands 72
TABLE 9-17 Subnormal Handling Constants per Destination Register Precision 74
TABLE 10-1 EMU Error Mask Register Additional Bits 78
TABLE 10-2 L2 cache Error Enable Register Format 79
TABLE 10-3 CMT Error Steering Register (Shared) 80
TABLE 10-4 Etag EC
43. age 67.

In general, only floating-point operations (FPops) update cexc, and only when an exceptional condition is detected. All other instructions leave cexc unchanged. When an exception is detected but the trap is masked, the FPop updates the appropriate aexc field of the FSR.

Prediction Logic
Prediction logic is used by the hardware to predict overflow, underflow, and inexact traps. Prediction always errs on the side of providing correct results: the hardware produces the result when it can do so and generates an exception when it cannot or is not sure. Prediction of inexact occurs unless one of the operands is a Zero, NaN, or Infinity. When prediction occurs and the exception is enabled, system software properly handles these cases and resumes program execution. If the exception is not enabled, the result status is used to update the FSR.aexc and FSR.cexc bits.

9.2 Floating-Point Numbers
The floating-point number types and their abbreviations are shown in TABLE 9-2. In general, the IEEE 754-1985 Standard reserves exponent field values of all 0s and all 1s to represent special values in the standard's floating-point scheme.

TABLE 9-2 Floating-point Numbers
Number Type | Abbreviation | Sign | Exponent | Fraction
Zero | 0 | 0 or 1 | 000...000 | 0.000...000
Subnormal | SbN | 0 or 1 | 000...000 | 0.000...001 to 0.111...111
Normal | Normal | 0 or 1 | 000...001 to 111...110 | 1.000...000 to 1.111...111
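The cexc/aexc behavior described above can be sketched as an FSR update. This Python model assumes the SPARC V9 FSR layout (cexc in bits 4:0, aexc in bits 9:5, TEM in bits 27:23, with nv, of, uf, dz, nx from high to low), which this excerpt does not restate, and it follows the text in leaving the FSR untouched when no exception is detected. When a trap is enabled it records every detected bit in cexc, a simplification; the function name is hypothetical.

```python
# Assumed SPARC V9 FSR layout: cexc = bits 4:0, aexc = bits 9:5, TEM = bits 27:23.
NV, OF, UF, DZ, NX = 1 << 4, 1 << 3, 1 << 2, 1 << 1, 1 << 0
CEXC_SHIFT, AEXC_SHIFT, TEM_SHIFT = 0, 5, 23
FIELD = 0x1F


def fpop_exception_update(fsr: int, detected: int):
    """Return (new_fsr, trap_taken) after an FPop detects the exceptions in `detected`."""
    if detected == 0:
        return fsr, False                        # no exceptional condition: FSR untouched
    tem = (fsr >> TEM_SHIFT) & FIELD
    fsr = (fsr & ~(FIELD << CEXC_SHIFT)) | (detected << CEXC_SHIFT)
    if detected & tem:
        return fsr, True                         # fp_exception_ieee_754; aexc unchanged
    return fsr | (detected << AEXC_SHIFT), False  # masked: accumulate into aexc
```

Because aexc is only ever OR-ed, masked exceptions accumulate across FPops until software clears them.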
44. al processors. Each logical processor has access to the same size external cache as the UltraSPARC III Cu processor; however, the UltraSPARC IV processor's caches have smaller lines for less contention and optimal Least Recently Used (LRU) replacement.

The primary design goal for the UltraSPARC IV processor is to improve performance on commercial applications such as databases and web servers. The following three key techniques are used to improve the UltraSPARC IV processor's performance:
• Integrated two cores on a single processor: This technique significantly increases throughput per cubic foot, per watt, and per dollar.
• Improved L2 cache configuration: Each logical processor has access to an 8 MB, 2-way set-associative cache. The line size is also reduced from 512 bytes to 128 bytes to reduce the extra contention of sub-blocked caches. In addition, a more optimal cache replacement policy (LRU) is used.
• Enhanced Floating-Point Unit and Write Cache: The write cache is enhanced with a hashed index to reduce conflict misses, especially in the case of multiple write streams. This enhancement helps codes such as high-radix Fast Fourier Transforms (FFTs).

Executing applications share the address and data bus when accessing the L2 cache data, the Memory Control Unit (MCU), and the Sun Fireplane Interconnect port. The bus to the L2 cache and the physical SRAM modules containing the L2 cache is shared. The two L2 cac
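The cache geometry behind these numbers follows from simple arithmetic. The Python helper below (a hypothetical name) computes the number of sets and the index bit range for a power-of-two cache; for 8 MB, 2-way, 128-byte lines it gives 32,768 sets indexed by bits 21:7.

```python
def l2_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Sets, offset width, and (hi, lo) index bit range for a set-associative cache."""
    for v in (size_bytes, ways, line_bytes):
        if v <= 0 or v & (v - 1):
            raise ValueError("sizes must be powers of two")
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1     # log2(line size)
    index_width = sets.bit_length() - 1           # log2(number of sets)
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": (offset_bits + index_width - 1, offset_bits),
    }
```

The (21, 7) range matches the VA[21:7] EC_tag_addr field listed for the 8 MB 2-way tag access, and 32,768 equals the 32K LRU bits quoted per logical processor, consistent with one LRU bit per set.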
45. ased upon architecture developed by Sun Microsystems, Inc.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Table of Contents
Preface xi
Introducing the UltraSPARC IV Processor 1
1.1 Overview 1
Architectural … 3
2.1 Introduction 3
2.2 New Features in the UltraSPARC IV Processor 4
2.3 RAS Architecture 5
Chip Multithreading (CMT) 7
3.1 Introduction 7
CMT Definition 7
3.1.2 General CMT Behavior 8
3.2 Accessing CMT Registers 9
3.2.1 Types of CMT Registers 9
3.2.2 Accessing CMT Registers Through ASI Interface 10
3.3 Private Processor Registers 10
3.3.1 LP ID Register (ASI_CORE_ID) 11
3.3.2 LP Interrupt ID Register (ASI_INTR_ID) 11
3.3.3 CESR (Cluster Error Status Register) ID Register 12
3.4 Disabling and Suspending Logical Processors 13
3.4.1 LP Available Register (ASI_C
46. ation relevant to the previous SPARC V8 architecture.

Note: This highlights a useful note regarding important and informative processor architecture or functional operation. This may be used for purposes not covered in one of the other notes.

CHAPTER 1 Introducing the UltraSPARC IV Processor
Chapter Topics
• Overview on page 1

1.1 Overview
The UltraSPARC IV processor is derived from Sun Microsystems' high-end UltraSPARC III processor, providing the same fundamental features and offering the advantage of high throughput by utilizing Chip Multithreading (CMT) technology. The UltraSPARC IV processor features two cores, each based on the UltraSPARC III processor. From the software perspective, the UltraSPARC IV processor appears as two software-visible logical processors.

It implements both the full 64-bit SPARC V9 architecture and version 2.0 of Sun Microsystems' VIS™ instruction set. The VIS instruction set provides a wide range of Single Instruction Multiple Data (SIMD) acceleration functions for working with 8-, 16-, and 32-bit data values: pixel manipulation, 2D image processing, 3D graphics, data compression, and other specialized performance-critical operations.

In common with all other members of the UltraSPARC III processor family, the UltraSPARC IV processor is a 4-way superscalar processor, mean
47. cache data is divided into two parts: one for LP 0, the other for LP 1. If it is accessed by LP 0, then ex_addr[22] is always equal to 0; if it is accessed by LP 1, then ex_addr[22] is always equal to 1.

The UltraSPARC IV processor supports 6-6-5 and 6-6-6 L2 cache modes in addition to the UltraSPARC III Cu processor modes.

L2 Cache Control Register
As mentioned before, the L2 cache Control register, described in TABLE 4-2, is the same as the register accessed by ASI_ECACHE_CTRL in the UltraSPARC III Cu processor, except that the EC_assoc, addr_setup, trace_out, trace_in, EC_turn_rw, EC_early, EC_size, and EC_clock fields are removed. The bits for these fields are reserved in the UltraSPARC IV processor. Writing to bits 23:11 has no effect; reading them returns an undefined value. The other fields (bits 63:25 and 10:0) have the same definitions and access restrictions as in the UltraSPARC III Cu processor.

Bit 24, EC_FIXED_PRE_ARB, is a newly defined bit in the UltraSPARC IV processor that indicates which priority scheme should be employed in the L2 cache unit pre-arbiter for each logical processor. Each logical processor has multiple request queues to access the L2 cache. The arbitration between these request queues for each logical processor is decided by the pre-arbiter for that logical processor. If the EC_FIXED_PRE_ARB bit
48. ception (TEM = 1): destination register written (rd), flag(s), and trap
0: masked 0, no flag; enabled 0, no flag
0: masked set nvc, set nva; enabled set nvc, ieee trap
May underflow or overflow (see 9.5), masked or enabled
Masked invalid result: QNaN with sign = 0, exponent = 111...111, fraction = 111...111 (set nvc, set nva); enabled: set nvc, ieee trap
Infinity: masked Infinity, no flag; enabled Infinity, no flag

Compare
Two f registers are compared. The result of the compare is reflected in the fccN bits of the FSR. The FCMPE version of the instruction differs in its handling of NaN operands; see TABLE 9-16, Results from NaN Operands, on page 72.

TABLE 9-8 Floating-Point NUMBER COMPARE Instruction
The RESULT from the operation includes one or more of the following: the exception bit set (see TABLE 9-12); the trap that occurs (see the abbreviations in TABLE 9-12); the fcc bits set.
FCMP(E) rs1 | rs2 | Masked exception (TEM = 0): condition code setting (fccN); flag(s) | Enabled exception (TEM = 1): condition code setting (fccN); flag(s); trap
0 | 0 | fccN = 0 (rs1 = rs2); no | fccN = 0 (rs1 = rs2); no
0 | 0 | fccN = 0 (rs1 = rs2); no | fccN = 0 (rs1 = rs2); no
0 | Normal, Infinity | fccN = 1 (rs1 < rs2); no | fccN = 1 (rs1 < rs2); no
0 | Normal, Infinity | fccN = 0 (rs1 = rs2); no | fccN = 0 (rs1 = rs2); no

TABLE 9-8 Floating-Point NUMBER COMPARE Instruction (Continued)
FCMP(E) rs1, rs2 Number Com
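The fccN outcomes used in these tables (0 equal, 1 less, 2 greater, 3 unordered) can be sketched in Python. The model assumes the SPARC V9 fcc encoding and, because Python floats do not portably preserve the SNaN/QNaN distinction, treats every NaN as a QNaN; the function name is hypothetical.

```python
import math


def fcmp(rs1: float, rs2: float, signalling: bool, nvm: int):
    """Return (fcc or None, flags, trap) for FCMP (signalling=False) or FCMPE (True).

    fcc encoding: 0 = equal, 1 = rs1 < rs2, 2 = rs1 > rs2, 3 = unordered.
    Every NaN is treated as a QNaN; a real FCMP on an SNaN would also signal invalid.
    """
    if math.isnan(rs1) or math.isnan(rs2):
        if not signalling:
            return 3, set(), False            # FCMP on QNaN: unordered, no flag
        if nvm:
            return None, {"nvc"}, True        # trap taken; fccN left unchanged
        return 3, {"nvc", "nva"}, False       # FCMPE masked: unordered plus flags
    if rs1 == rs2:
        return 0, set(), False                # +0 and -0 compare equal
    return (1 if rs1 < rs2 else 2), set(), False
```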
49. configuration and timing is controlled by the L2 Cache Configuration and Timing Control register, described in TABLE 4-3. Therefore, both logical processors in the UltraSPARC IV processor have the same L2 cache configuration and timing. In this register, writing to the reserved bits has no effect, and reading them returns 0. Software should not program a field with reserved values; doing so results in undefined hardware behavior.

Name: ASI_ECACHE_CFG_TIMING_CTRL; ASI: 0x73 (newly assigned); VA[63:0]: 0x00; Read/Write

TABLE 4-3 L2 Cache Configuration and Timing Control Register
Bits   | Field      | Description
63:25  | Reserved   | Reserved
24     | EC_assoc   | 0 = Direct-mapped L2 cache; 1 = 2-way L2 cache
23     | addr_setup | Address setup cycles prior to the SRAM rising clock edge: 0 = 1 cycle; 1 = 2 cycles
22:21  | trace_out  | Address trace-out cycles: 00 = Reserved; 01 = 4 cycles; 10 = 5 cycles; 11 = 6 cycles
20     | Reserved   | Reserved
19:17  | trace_in   | Data trace-in cycles: 000 = 2 cycles; 100 = 3 cycles; 001 = 4 cycles; 010 = 5 cycles; 011 = 6 cycles; 101, 110, 111 = Reserved
16     | EC_turn_rw | 0 = 1 SRAM cycle between read and write; 1 = 2 SRAM cycles between read and write (default)
15     | EC_early   | Reserved
14:13  | EC_size    | 00 = Reserved; 01 = 4 MB L2 cache size; 10 = 8 MB L2 cache size; 11 = Reserved
12:11  | EC_clock   | 00 = Reserved; 01 = Reserved; 10 = Selects 5:1 L2 cache clock ratio; 11 = Selects 6:1 L2 cache clock ratio
10:0   | Reserved   | Reserved

Note: At hard POR and system reset (soft POR), all L2 cache mode settings default to 6-6-5; that is, trace_out = 6 cycles (2'b11), EC_clock selects 6:1 (2'b11), and trace_in = 5 cycles (3'b010).

Note: As in the UltraSPARC III and UltraSPARC III Cu processors, specifying a 1-cycle EC_turn_rw time may cause contention on the SRAM data bus for some L2 cache modes.

Secondary L2 Cache Control Register
The UltraSPARC IV processor does not support the secondary L2 cache Control register, because the UltraSPARC IV processor does not support low-power modes and this register is used solely for the 1/2 and 1/32 low-power modes. Writing to this register has no effect; reading it returns undefined data.

2-Way Support in L2 Cache Data ECC Fields (R/W)
TABLE 4-4, TABLE 4-5, TABLE 4-6, and TABLE 4-7 explain the L2 cache data access address format.

Note: Due to the new L2 cache organization, the address and data formats for the ASI accesses discussed in Section 4.2.4, Section 4.2.5, and Section 4.3 may differ from those of the UltraSPARC III Cu processor.
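TABLE 4-3 translates into a small decoder. The Python sketch below rejects reserved encodings, in line with the warning that reserved values lead to undefined behavior; the names are hypothetical. `RESET_DEFAULT` combines the reset defaults listed for this register (direct-mapped, 8 MB, 2-cycle EC_turn_rw, 6-6-5 mode) and assumes addr_setup = 0, since its default is not stated.

```python
TRACE_OUT = {0b01: 4, 0b10: 5, 0b11: 6}                        # 00 reserved
TRACE_IN = {0b000: 2, 0b100: 3, 0b001: 4, 0b010: 5, 0b011: 6}   # 101/110/111 reserved
EC_SIZE_MB = {0b01: 4, 0b10: 8}                                 # 00/11 reserved
EC_CLOCK_RATIO = {0b10: 5, 0b11: 6}                             # n:1; 00/01 reserved

# Reset defaults; addr_setup = 0 is an assumption.
RESET_DEFAULT = (0b11 << 21) | (0b010 << 17) | (1 << 16) | (0b10 << 13) | (0b11 << 11)


def _field(value: int, hi: int, lo: int) -> int:
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def _lookup(table: dict, code: int, name: str) -> int:
    if code not in table:
        raise ValueError(f"reserved {name} encoding {code:#b}")
    return table[code]


def decode_ecache_cfg(reg: int) -> dict:
    """Decode ASI_ECACHE_CFG_TIMING_CTRL (ASI 0x73, VA 0x00) per TABLE 4-3."""
    return {
        "ways": 2 if _field(reg, 24, 24) else 1,
        "addr_setup_cycles": 2 if _field(reg, 23, 23) else 1,
        "trace_out": _lookup(TRACE_OUT, _field(reg, 22, 21), "trace_out"),
        "trace_in": _lookup(TRACE_IN, _field(reg, 19, 17), "trace_in"),
        "turn_rw_cycles": 2 if _field(reg, 16, 16) else 1,
        "size_mb": _lookup(EC_SIZE_MB, _field(reg, 14, 13), "EC_size"),
        "clock_ratio": _lookup(EC_CLOCK_RATIO, _field(reg, 12, 11), "EC_clock"),
    }


def mode_name(cfg: dict) -> str:
    """Mode string in the trace_out-EC_clock-trace_in order used by the notes."""
    return f"{cfg['trace_out']}-{cfg['clock_ratio']}-{cfg['trace_in']}"
```

Decoding `RESET_DEFAULT` yields the 6-6-5 mode named in the reset note.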
TABLE 9-2 Floating-point Numbers Data Representation
Number Type      Abbreviation   Sign     Exponent   Fraction
Infinity         ±Infinity      0 or 1   111…111    000…000
Signalling NaN   SNaN           0 or 1   111…111    0xx…xxx (nonzero)
Quiet NaN        QNaN           0 or 1   111…111    1xx…xxx

Zero
Zero is not directly representable if the straight format is followed, because of the assumed leading 1. For the number zero to have the value zero, the fraction (mantissa) must be exactly zero with no implied leading 1. Therefore, zero is special-cased with exponent and fraction fields of zero. Note also that +0 and -0 are distinct values, although they compare as equal.

SubNormal
If the exponent field is all 0s and the fraction field is nonzero, the value is a subnormal (denormalized) number. These numbers do not have an assumed leading 1 before the binary point. In single precision, a subnormal represents $(-1)^s \times 0.f \times 2^{-126}$; in double precision, it represents $(-1)^s \times 0.f \times 2^{-1022}$. In both cases, s is the sign bit and f is the fraction. Exponent and fraction fields of all 0s are the special representation of zero; from this point of view, zero can be considered a subnormal.

Infinity
The values +infinity and -infinity are represented with an exponent field of all 1s and a fraction field of all 0s. The sign bit distinguishes between positive and negative inf
e field specified in Section 4.2.2, "Shared L2 Cache Configuration and Timing Control Register," on page 4-29.

4.2.5 Direct L2 Cache Tag Bank Access and Displacement Flush

TABLE 4-8, TABLE 4-9, TABLE 4-10, and TABLE 4-11 explain the L2-cache tag access address format.

TABLE 4-8 4 MB Direct Mapped
Bit     Field
63:25   Reserved
24      disp_flush
23      Mandatory value
22      Reserved
21:6    EC_tag_addr
5:3     Reserved
2:0     Mandatory value (should be 0)

TABLE 4-9 4 MB 2-Way Direct Mapped
Bit     Field
63:25   Reserved
24      disp_flush
23      Mandatory value (should be 0)
22      Reserved
21      EC_way
20:6    EC_tag_addr
5:3     Reserved
2:0     Mandatory value (should be 0)

TABLE 4-10 8 MB Direct Mapped
Bit     Field
63:25   Reserved
24      disp_flush
23      Mandatory value (should be 0)
22:7    EC_tag_addr
6:3     Reserved
2:0     Mandatory value (should be 0)

TABLE 4-11 8 MB 2-Way Direct Mapped
Bit     Field
63:25   Reserved
24      disp_flush
23      Mandatory value (should be 0)
22      EC_way
21:7    EC_tag_addr
6:3     Reserved
2:0     Mandatory value (should be 0)

Name: ASI_ECACHE_TAG (ASI 0x4E)

The EC_way field is an L2 cache way select
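The four tag-access layouts differ only in where EC_way and EC_tag_addr sit. The Python sketch below assembles an ASI_ECACHE_TAG virtual address for a given configuration, optionally with disp_flush (VA<24>) set, as in the W-cache flush procedure that computes an L2-cache index and applies a displacement flush. It assumes, as the bit positions suggest but the tables do not state outright, that EC_tag_addr is taken from the same bit positions of the physical address; `LAYOUTS` and `tag_access_va` are hypothetical names.

```python
# Hypothetical helper: build an ASI_ECACHE_TAG (ASI 0x4E) virtual address.
# (size_mb, ways) -> (EC_tag_addr high bit, low bit, EC_way bit or None),
# transcribed from TABLE 4-8 through TABLE 4-11.
LAYOUTS = {
    (4, 1): (21, 6, None),
    (4, 2): (20, 6, 21),
    (8, 1): (22, 7, None),
    (8, 2): (21, 7, 22),
}
DISP_FLUSH_BIT = 24


def tag_access_va(pa: int, size_mb: int, ways: int, way: int = 0,
                  disp_flush: bool = False) -> int:
    hi, lo, way_bit = LAYOUTS[(size_mb, ways)]
    index_mask = ((1 << (hi - lo + 1)) - 1) << lo
    # Assumption: EC_tag_addr comes from the same PA bit positions.
    va = pa & index_mask
    if way_bit is not None:
        va |= (way & 1) << way_bit
    elif way != 0:
        raise ValueError("a direct-mapped L2 cache has only way 0")
    if disp_flush:
        va |= 1 << DISP_FLUSH_BIT
    # Bit 23 (mandatory 0), the reserved bits, and bits 2:0 remain 0.
    return va


print(hex(tag_access_va(0x12345680, 8, 2, way=1, disp_flush=True)))  # 0x1745680
```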
e on page 27.

Write Cache (W-cache)

To reduce W-cache miss rates for certain classes of applications, such as radix-8 FFT, the UltraSPARC IV processor adds an option that uses a hashed index to access the W-cache. This feature is controlled on a per-logical-processor basis by the WIH bit in the Data Cache Unit Control Register (ASI 0x45, VA 0x00), illustrated in TABLE 4-1.

Name: ASI_CONTROL_REGISTER
ASI 0x45, VA<63:0> = 0x00
Read/Write

TABLE 4-1 Data Cache Unit Control Register
Bit     Field
63:50   Reserved
49      CP
48      CV
47      ME
46      RE
45      PE
44      HPE
43      SPE
42      SL
41      WE
40:33   PM
32:25   VM
24      PR
23      PW
22      VR
21      VW
20:5    Reserved
4       WIH
3       DM
2       IM
1       DC
0       IC

The WIH bit (bit 4) selects the W-cache index as follows:
• 0: Use PA<8:6> for index selection.
• 1: Use the hash function PA<8:6> ⊕ PA<11:9> ⊕ PA<14:12> ⊕ PA<17:15> for index selection, where ⊕ is bitwise exclusive OR.

Note: WIH is used only if WE is set.

Note: The W-cache and the store buffer must be flushed before the WIH setting is changed. This may require disabling interrupts and issuing a MEMBAR before and after the instruction that changes WIH.

Note: The followin
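To make the WIH selection concrete, this Python sketch computes the 3-bit W-cache index from a physical address with and without the hash, using the bit fields listed for WIH = 0 and WIH = 1; the function name is hypothetical. The two example addresses differ only in PA<9>: they share an index without the hash but are spread apart with it.

```python
def wcache_index(pa: int, wih: bool) -> int:
    """Return the 3-bit W-cache index selected by the WIH bit."""
    if not wih:
        return (pa >> 6) & 0b111                    # PA<8:6>
    # PA<8:6> ^ PA<11:9> ^ PA<14:12> ^ PA<17:15>; XOR is bitwise, so the
    # low three bits of the combined value are the XOR of the four fields.
    return ((pa >> 6) ^ (pa >> 9) ^ (pa >> 12) ^ (pa >> 15)) & 0b111


a, b = 0x040, 0x240          # differ only in PA<9>
print(wcache_index(a, False), wcache_index(b, False))  # 1 1
print(wcache_index(a, True), wcache_index(b, True))    # 1 0
```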
e queue in company with instructions fetched either earlier or later, depending on the specific instruction mix and the availability of the necessary functional units.

The UltraSPARC IV processor is supported by Sun's popular Solaris operating system, providing access to the more than eight thousand applications that have been developed for the SPARC/Solaris platform over the years. Comprehensive sets of programs are available for many fields, including engineering, manufacturing, telecommunications, financial services, health, retail, e-commerce, and a variety of other industry segments. Additional operating systems available for use with UltraSPARC processors include Linux and leading real-time operating systems. A robust set of software development tools can also be readily acquired, either from Sun Microsystems or from independent software vendors.

CHAPTER 2 Architectural Overview

This chapter supplements Chapter 3 of the UltraSPARC III Cu Processor User's Manual and contains additional information for the UltraSPARC IV processor.

Chapter Topics:
• Introduction on page 3
• New Features in the UltraSPARC IV Processor on page 4
• RAS Architecture on page 5

2.1 Introduction

The UltraSPARC IV processor features two cores, each based on the UltraSPARC III processor. From the software perspective, the UltraSPARC IV processor appears as two software-visible logic
ee TABLE 9-12)
• Trap occurs (see abbreviations in TABLE 9-12)
• Underflow/overflow may occur

FMUL: rs1 × rs2 → rd

Operands               Masked Exception (TEM = 0)                   Enabled Exception (TEM = 1)
                       rd written / flag(s)                         rd written / trap
0 with 0 or Normal     0 / no flags                                 0 / no trap
0 with Infinity        QNaN / nvc set, nva set                      not written / nvc set, ieee trap
Normal × Normal        Normal (may underflow or overflow; see 9.5)  Normal (may underflow or overflow; see 9.5)
Normal × Infinity      Infinity / no flags                          Infinity / no trap
Infinity × Infinity    Infinity / no flags                          Infinity / no trap

9.3.4 Division

TABLE 9-6 Floating-point Division
RESULT from the operation includes one or more of the following:
• Number in f register (see Trap Event note, page 66)
• Exception bit set (see TABLE 9-12)
• Trap occurs (see abbreviations in TABLE 9-12)
• Underflow/overflow may occur

DIVISION
Masked Exception (TEM = 0)    Enabled E
g lists a way of flushing the W-cache:
1. Use ASI_WCACHE_TAG (ASI 0x3A, VA 0x0) to get the W-cache line addresses.
2. For each W-cache line, calculate its L2-cache index and apply an L2-cache displacement flush (ASI 0x4E, VA<24> = 1) to that index.

4.2 External L2 Cache

The external L2-cache changes described here are due to the following:
• The UltraSPARC IV processor supports a high processor clock rate.
• The UltraSPARC IV processor L2 cache uses an LRU replacement strategy.
• The two software-visible logical processors in an UltraSPARC IV processor share the same physical SRAM modules, that is, the same physical address and data buses.

The L2-cache tag array ECC protection mechanism, ECC algorithm, and error-reporting method in the UltraSPARC IV processor are the same as those in the UltraSPARC III Cu processor.

Because the two software-visible logical processors in an UltraSPARC IV processor share the same physical L2-cache data memory, only one copy of the cache configuration and timing control parameters is needed. These parameters are EC_assoc, addr_setup, trace_out, trace_in, EC_turn_rw, EC_early, EC_size, and EC_clock. The UltraSPARC IV processor defines a new shared register, accessed through ASI_ECACHE_CFG_TIMING_CTRL, for these parameters; as a result, those fields in the register accessed through ASI_ECACHE_CTRL become unused.

Note: In the UltraSPARC IV processor, the physical memory for
g-point or integer number precision. This can happen in many different cases, as listed in the tables of this section.

9.5.4 IEEE Underflow (uf) Trap

When a Normal number underflows, the inexact flag is also set. Underflow is detected before rounding. The underflow condition leads to a Subnormal result unless gross underflow is detected; in that case, the result is 0 and the inexact flag is raised. Underflow is discussed in detail in Section 9.6, "Underflow Operation," on page 69.

9.5.5 IEEE Divide-by-Zero (dz) Trap

When a number is divided by zero, the divide-by-zero flag is asserted and an ieee_exception is generated if enabled. The dz flag and trap can be generated only by the FDIV instruction.

9.5.6 IEEE Inexact (nx) Trap

When an inexact condition occurs, that is, whenever the rounded result of an operation differs from the precise result, the processor sets the FSR.aexc.nxa and/or FSR.cexc.nxc bits. The inexact (nx) flag is asserted for most overflow and underflow conditions. The inexact trap is caused when the ideal result cannot fit into the destination format, as in:
• most square root operations
• some add, subtract, multiply, and divide operations
• some number and precision conversion operations

TABLE 9-14 Floating-Point to Integer Conversions that Generate Inexact Exceptions
Instruction    Conversion Description    Masked Exception    Unmasked Exception (TEM = 1)
gent and apply to an arbitrary subset of logical processors within a CMT processor. The subset may be anything from all logical processors to no logical processors. In addition to a system reset, UltraSPARC IV processors have an externally initiated reset called an XIR. This reset is intended to reset a specific processor in a system, primarily for diagnostic and recovery purposes. Future processors may have multiple resets that replace the single XIR reset of current processors.

For this class of resets, there must be a mechanism to specify which subset of logical processors should be reset. There are two possible ways to specify the subset. The first is to have a steering register that is set up ahead of time to specify the subset of logical processors; for systems using an XIR reset, the XIR Steering register described in Section 3.5.3.1, "XIR Steering Register (ASI_XIR_STEERING)," should be used. The second is to specify the subset concurrently with delivering the reset across the interface used for communicating the reset. This method requires that the interface used for communicating resets support sending packets of information along with the resets.

3.5.3.1 XIR Steering Register (ASI_XIR_STEERING)

The XIR reset can be steered only to specific logical processors under the control of t
gn Read Arbitration Queue Overflow (not specific to a LP)
21 FRARB_UF (IERR, not specific to a LP): Foreign Read Arbitration Queue Underflow
22 M2SARB_OV (IERR, not specific to a LP): M2S Arbitration Queue Overflow
23 M2SARB_UF (IERR, not specific to a LP): M2S Arbitration Queue Underflow
24 LWARB_OV (IERR, not specific to a LP): Local Write Arbitration Queue Overflow
25 LWARB_UF (IERR, not specific to a LP): Local Write Arbitration Queue Underflow
26 WRD_UE (IERR, not specific to a LP): Unexpected write data request (write data check): write data request for an unissued TargID
27 RDR_UE (IERR, not specific to a LP): Unexpected read data ready
28 DROB_WER (IERR, not specific to a LP): Overwrite of a valid DROB entry
29 DROB_IR (IERR, not specific to a LP): Request to invalidate an invalid DROB entry

TABLE 10-9 System Bus Protocol Errors (Transaction)
Each entry: Bit Field (Error Type, Comment): Description
30 USC (PERR, not specific to a LP): Undefined system bus command
31 CPQ_TO (PERR, specific to a LP): CPQ system bus time-out
32 NCPQ_TO (PERR, specific to a LP): NCPQ system bus time-out
33 WQ_TO (PERR, not specific to a LP): Write transaction time-out
34 TID_TO (PERR, not specific to a LP): TargetID time-out. When the UltraSPARC IV sends out a valid TargetID but no data arrives after t
he XIR Steering register described in TABLE 3-9.

Name: ASI_XIR_STEERING
ASI 0x41, VA<63:0> = 0x30
Privileged Read/Write
JTAG Accessible

TABLE 3-9 XIR Steering Register (Shared)
Bit    Field      Description
63:2   Reserved   Reserved. Must be 0 when read.
1      LP1        This bit represents LP 1.
0      LP0        This bit represents LP 0.

The XIR Steering register is a 64-bit register, of which only bits 1:0 are used in the UltraSPARC IV processor. Each bit of the register represents one logical processor, with bit 0 representing LP 0 and bit 1 representing LP 1. An XIR is blocked to a logical processor if the corresponding bit is 0. Hardware forces a 0 for unimplemented logical processors.

State After Reset
At the end of a system reset or equivalent reset, the value of the XIR Steering register is equal to the value of the LP Enable Status register, which in turn is equal to the value of the LP Enable register.

3.6 Private and Shared Registers Summary

The UltraSPARC IV processor implements the following private and shared registers.

3.6.1 Implementation Registers

TABLE 3-10 and TABLE 3-11 summarize the private and shared registers, respectively.

TABLE 3-10 UltraSPARC IV Processor Private Registers
ASI Value   ASI Name      JTAG Accessible
0x63        ASI_INTR_ID   No
0x63        ASI_CORE_ID   Yes
0x63        ASI_CESR_ID   No

TABLE 3
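The XIR Steering rules reduce to mask arithmetic on two bits. The Python sketch below models them (bits 63:2 forced to 0, a 0 bit blocks the XIR to that logical processor, and the post-reset value copied from LP Enable Status); the function names are hypothetical and do not correspond to any real API.

```python
IMPLEMENTED_LPS = 0b11   # UltraSPARC IV: LP 0 and LP 1


def write_xir_steering(value: int) -> int:
    """Value held after a write: bits for unimplemented LPs are forced to 0."""
    return value & IMPLEMENTED_LPS


def xir_steering_after_reset(lp_enable_status: int) -> int:
    """At the end of a system reset the register mirrors LP Enable Status."""
    return lp_enable_status & IMPLEMENTED_LPS


def xir_targets(steering: int) -> list[int]:
    """LP IDs that receive an XIR; a 0 bit blocks the XIR to that LP."""
    return [lp for lp in range(2) if (steering >> lp) & 1]


print(xir_targets(write_xir_steering(0xFF)))   # [0, 1]
print(xir_targets(0b10))                        # [1]
```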
he specified timeout period.
35 AID_LK (PERR, specific to a LP): ATransID leakage error. A remote transaction (R_) is issued by the processor, but the reissued transaction is unable to complete.
36 CPQ_OV (PERR, specific to a LP): CPQ overflows after PauseOut is asserted
37 NCPQ_OV (IERR, specific to a LP): NCPQ overflows after PauseOut is asserted
38 CPQ_UF (IERR, specific to a LP): CPQ underflow
39 NCPQ_UF (IERR, specific to a LP): NCPQ underflow
40 ORQ_OV (PERR, specific to a LP): ORQ overflows after PauseOut is asserted
41 ORQ_UF (IERR, specific to a LP): ORQ underflow. Incoming is asserted when ORQ is empty and HBM mode is set.
42 HBM_CON (PERR, not specific to a LP): HBM mode contention. Incoming asserts 2 cycles after PreReq.
43 HBM_ERR (PERR, not specific to a LP): HBM mode error. PreReq or Incoming is asserted while HBM mode is not set.

TABLE 10-10 Cache Consistency Errors
Each entry: Bit Field (Error Type, Comment): Description
44 RTS_ER (IERR, specific to a LP): Detect a local RTS on the bus with PTA state dl
45 RTO_ER (IERR, specific to a LP): Detect a local RTO on the bus with either L2-cache state M or PTA state dT
46 WB_ER (IERR, specific to a LP): Detect a local WB with PTA state dT
47 RS_ER (IERR): Detect
hes are split across 2 SRAM modules in such a fashion that both modules are used by each cache.

This document describes only the changes in the UltraSPARC IV processor with respect to the UltraSPARC III Cu processor. Section 2.2 summarizes all of the feature changes of the UltraSPARC IV processor. These changes may be due to enhancing processor performance or to adopting CMT technology.

2.2 New Features in the UltraSPARC IV Processor

This section summarizes the UltraSPARC IV processor changes with respect to the UltraSPARC III Cu processor in TABLE 2-1 and TABLE 2-2. TABLE 2-1 lists changes that include the clock rate increase and the new cache organization. TABLE 2-2 lists changes resulting from the adoption of CMT technology.

TABLE 2-1 Enhancements to the UltraSPARC IV Processor's Core
Feature
• Each logical processor has access to 8 MB of L2 cache with a 128-byte line size (2 sub-blocks per line), or 4 MB with a 64-byte line size (no sub-blocks).
• The L2 cache employs an LRU replacement strategy to increase cache hit rates.
• Supports L2-cache modes 5-5-2, 5-5-3, 5-5-4, 5-5-5, 6-6-5, and 6-6-6.
• Supports higher system frequency ratios, up to 10:1.
• Low-power mode is not supported.
• Chip Kill DIMM support allows detection and correction of DRAM chip failure.
• Internal banking support allows more optimal DIMM scheduling (available only when CK DIMMs are used).
• L2-cache address bus error detection for all system platforms
ical processor. Suspended logical processors can be set to run later. Logical processors can be suspended and run at arbitrary points in time and, unlike disabling a logical processor, this does not require a system reset. There may be an arbitrarily long but bounded delay from when a logical processor is directed to suspend until the change takes effect. The LP Running Status register can be used to determine whether a logical processor has completed the process of becoming suspended.

A suspended logical processor does not execute instructions and does not initiate any transactions on its own. It does, however, remain coherent with the system: it fully participates in cache coherency and can generate transactions in response to coherency requests from other logical processors on the same or a different CMT processor.

When a logical processor is set to run, it continues execution with the instruction that was next to be executed when it was suspended. Whether a logical processor was ever suspended is transparent to the software running on it.

An interrupt to a suspended logical processor behaves the same as if the logical processor were too busy to accept the interrupt. For example, if an interrupt buffer is available, the interrupt is ACK-ed and a trap is taken only
ing, it attempts to fetch four instructions at a time from the L1 instruction cache and, given the appropriate instruction mix, is capable of sustaining an execution rate of four instructions per clock cycle. Each instruction is processed through a 14-stage pipeline that starts with address generation and ends with the final retirement of any valid execution result.

A 16-entry instruction queue decouples instruction fetch from instruction issue, buffering any discrepancies between these two rates. Thus, if more instructions are fetched than can be issued over repeated cycles, an empty instruction queue gradually fills. Conversely, if the next instruction fetch misses in the L1 cache, a filled instruction queue can hide this break in the flow of instructions through the pipeline by continuing to supply the execution units with instructions for the several clock cycles needed to retrieve the missing block of instructions from the integrated L2 cache.

To enhance throughput, instructions enter and exit the instruction queue in strict program order but can complete execution out of order. For example, if a short-latency instruction such as an integer add follows a long-latency instruction such as an integer divide in the pipeline, the fast operation does not need to wait for the slow one to finish. Instructions fetched together enter the queue in parallel, but within the constraints imposed by program order, they may exit th
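The buffering effect of the 16-entry instruction queue can be illustrated with a toy Python model: each cycle the issue logic drains up to a given number of instructions, then fetch adds four instructions without exceeding 16 entries, unless fetch is stalled (for example, by an L1 miss). This is not cycle-accurate; apart from the 16-entry depth and 4-wide fetch, the per-cycle ordering and stall patterns are assumptions, and `simulate_queue` is a hypothetical name.

```python
def simulate_queue(issue_per_cycle, fetch_ok, capacity=16, fetch_width=4,
                   occupancy=0):
    """Return the queue occupancy at the end of each simulated cycle."""
    history = []
    for issue, fetched in zip(issue_per_cycle, fetch_ok):
        occupancy -= min(issue, occupancy)                 # issue drains
        if fetched:                                        # fetch refills
            occupancy = min(capacity, occupancy + fetch_width)
        history.append(occupancy)
    return history


# Fetch outpaces issue (4 in, 2 out), so an empty queue gradually fills:
print(simulate_queue([2] * 8, [True] * 8))   # [4, 6, 8, 10, 12, 14, 16, 16]
# A full queue keeps issue supplied through a 3-cycle fetch miss:
print(simulate_queue([4, 4, 4], [False] * 3, occupancy=16))   # [12, 8, 4]
```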
inities. The infinity representation is important because it allows operations to continue past overflow. Operations on infinities are well defined by the IEEE 754-1985 Standard.

Not a Number
The value NaN (Not a Number) is used to represent values that do not represent real numbers. The NaN exponent field is all 1s and the fraction field is nonzero. There are two categories of NaN: QNaN (quiet NaN) and SNaN (signalling NaN). A QNaN is a NaN with the most significant fraction bit set. A QNaN is allowed to propagate freely through most arithmetic operations; this NaN tends to appear when an operation produces a mathematically undefined result. An SNaN has the most significant fraction bit clear. An SNaN is used to signal an exception when it appears as an operand of an operation being executed. Semantically, a QNaN can be considered to denote an indeterminate operation, while an SNaN indicates an invalid operation.

9.2.1 Floating-Point Number Line

The floating-point number line in FIGURE 9-1 represents the floating-point numbers used in the processor.

[FIGURE 9-1 Floating-Point Number Line: the negative side (sign bit 1) and positive side (sign bit 0), each with Infinity, Normal, and Subnormal ranges; the zero encodings 800…000 (negative) and 000…000 (positive); and the SNaN and QNaN encodings (exponent all 1s).]
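The encodings in TABLE 9-2 and the descriptions of zero, subnormals, infinity, and NaNs can be checked with a small Python sketch that classifies an IEEE 754 single-precision bit pattern and evaluates the subnormal formula. It uses only the standard `struct` module to obtain bit patterns from host floats; the function names are illustrative.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_single(w: int) -> str:
    sign = w >> 31
    exp = (w >> 23) & 0xFF
    frac = w & 0x7FFFFF
    if exp == 0xFF:                                  # exponent all 1s
        if frac == 0:
            return "-Infinity" if sign else "+Infinity"
        return "QNaN" if frac & (1 << 22) else "SNaN"  # fraction MSB
    if exp == 0:                                     # exponent all 0s
        if frac == 0:
            return "-Zero" if sign else "+Zero"
        return "Subnormal"
    return "Normal"


def subnormal_value(w: int) -> float:
    """(-1)^s * 0.f * 2^-126 for a single-precision subnormal."""
    sign = w >> 31
    frac = w & 0x7FFFFF
    return (-1) ** sign * (frac / 2**23) * 2.0**-126


for w in (0x7F800000, 0x7F800001, 0x7FC00000, 0x80000000, 0x00000001):
    print(hex(w), classify_single(w))
```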
is set to 1, a fixed-priority scheme is selected by the pre-arbiter for that logical processor. If EC_FIXED_PRE_ARB is set to 0 (default), a round-robin scheme is used.

A simple distributed fair arbitration algorithm is used between the two software-visible logical processors of the UltraSPARC IV processor to ensure that each logical processor gets access to the L2 cache. A token is passed between the two logical processors. If a logical processor has the token and the other logical processor has pending requests, the logical processor with the token completes its current request, if any, and hands the token to the requesting logical processor. In this way, if only one logical processor has requests, it holds the token and completes its requests. If both logical processors have requests, the token bounces back and forth, with each logical processor completing a single request when it receives the token.

Name: ASI_ECACHE_CTRL
ASI 0x75, VA<63:0> = 0x0
Read/Write

TABLE 4-2 L2 Cache Control Register
Bit     Field
63:27   Reserved
26      pf2_RTO_en
25      EC_TCC_en
24      EC_FIXED_PRE_ARB
23:11   Reserved
10      EC_ECC_en
9       EC_ECC_force
8:0     EC_check

4.2.2 Shared L2 Cache Configuration and Timing Control Register

The UltraSPARC IV processor L2 cache
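The token-passing rule can be simulated to see how requests from the two logical processors interleave. In this Python sketch, each loop iteration is one arbitration step rather than a clock cycle; the choice of LP 0 as the initial token holder and the function name are assumptions made for illustration.

```python
from collections import deque


def arbitrate(lp0_requests, lp1_requests, token=0):
    """Return the L2 service order as (lp, request) pairs."""
    queues = [deque(lp0_requests), deque(lp1_requests)]
    order = []
    while queues[0] or queues[1]:
        if queues[token]:
            # The token holder completes its current request, if any ...
            order.append((token, queues[token].popleft()))
        other = 1 - token
        if queues[other]:
            # ... and hands the token over if the other LP is waiting.
            token = other
    return order


print(arbitrate(["a", "b", "c"], ["x"]))
# [(0, 'a'), (1, 'x'), (0, 'b'), (0, 'c')]
```

When only one logical processor has requests, it keeps the token and is served back to back; when both do, service alternates one request at a time.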
ister. Each bit of the register represents one logical processor, with bit 0 representing LP 0 and bit 1 representing LP 1. A bit set to 1 means that the logical processor should be enabled after the next system reset, and a bit set to 0 means that it should be disabled after the next reset. Bits 63:2 are forced to 0 because their corresponding logical processors are not implemented in the UltraSPARC IV processor.

If a bit in the LP Available register is 0 (unavailable), hardware forces the corresponding bit in the LP Enable register to 0 and ignores attempts to write 1 to that bit. Because the UltraSPARC IV processor always has both logical processors available, this scenario does not occur in the UltraSPARC IV processor.

Note: A disabled logical processor in the UltraSPARC IV processor does not respond to any transaction issued to it. The sender should encounter an unmapped reply or a time-out error.

Note: In the UltraSPARC IV processor, if both bits 1 and 0 are set to 0, both logical processors are disabled after a Hard or Soft POR.

State After Reset
The value of the LP Enable register is set to the value of the LP Available register at the assertion of a power-on reset. The value of the LP Enable register remains unchanged during all other resets, including system resets or equivalent resets.

Suspending and Running Logical Processors
Suspending is a way to temporarily suspend the operation of a log
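The LP Enable write rules (bits 63:2 forced to 0, unavailable logical processors forced to 0) and their effect at the next reset can be summarized in a short Python sketch. The function names are hypothetical; on the UltraSPARC IV the LP Available value is always 0b11.

```python
IMPLEMENTED = 0b11


def write_lp_enable(requested: int, lp_available: int) -> int:
    """Value held in LP Enable after a write."""
    return requested & lp_available & IMPLEMENTED


def enabled_after_next_reset(lp_enable: int) -> list[int]:
    """LPs that will be enabled once the next system reset is deasserted."""
    return [lp for lp in range(2) if (lp_enable >> lp) & 1]


print(write_lp_enable(0xF, lp_available=0b11))   # 3
print(enabled_after_next_reset(0b00))            # [] (both LPs disabled)
```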
it value (FsTOd) for use with double-precision instructions. A load double floating-point (LDDF) instruction writes to a pair of adjacent 32-bit f registers aligned to an even boundary, and it can write to a 64-bit register. This value must be converted to a 32-bit value (FdTOs) for use with single-precision instructions.

Two LDF instructions can be used to load a 64-bit value when alignment of the memory address to 64 bits is not guaranteed. Similarly, two STF instructions can be used to store a 64-bit value when 64-bit alignment is not guaranteed.

VIS Operations
VIS instructions are unaffected by floating-point modes. However, the floating-point unit must be enabled; VIS instructions do not generate traps unless the floating-point unit is disabled.

9.4 Traps and Exceptions

There are three trap vectors defined for floating-point operations:
• fp_disabled
• fp_exception_ieee_754 (see Section 9.5, "IEEE Traps," on page 67)
• fp_exception_other

fp_disabled Trap
The floating-point unit can be either enabled or disabled.

fp_exception_other Trap
The fp_exception_other trap occurs when a floating-point operation cannot be completed by the processor (unfinished_FPop) or when an operation is requested that is not implemented by the processor (unimplemented_FPop).
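The two-instruction alternative for a misaligned 64-bit value works because a double occupies two 32-bit words, with the more significant word at the lower address on big-endian SPARC (the word that an LDDF places in the even-numbered register of the pair). The Python sketch below shows that split and rejoin using only `struct`; the function names are hypothetical and nothing here issues real LDF or STF instructions.

```python
import struct


def split_double(x: float) -> tuple[int, int]:
    """Return (word at addr, word at addr + 4) for a big-endian double."""
    hi, lo = struct.unpack(">II", struct.pack(">d", x))
    return hi, lo


def join_words(hi: int, lo: int) -> float:
    """Reassemble a double from two 32-bit words, as two LDF loads would."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]


hi, lo = split_double(1.0)
print(hex(hi), hex(lo))          # 0x3ff00000 0x0
```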
ity
The traps generated by floating-point exceptions (fp_disabled, fp_exception_ieee_754, and fp_exception_other) are prioritized.

9.5 IEEE Traps

The Underflow, Overflow, Inexact, Division-by-zero, and Invalid IEEE traps are supported in standard and nonstandard modes. They are listed in TABLE 9-12, "Floating-point Unit Exceptions," on page 66 and operate according to the IEEE 754-1985 Standard.

9.5.1 IEEE Trap Enable Mask (TEM)

Individual IEEE traps (nv, of, uf, dz, and nx) are masked by the FSR.TEM bits. When a trap is masked and an exception is detected, the appropriate FSR.cexc bit(s) are set and the destination register is written with the data shown in TABLE 9-3, TABLE 9-4, TABLE 9-5, TABLE 9-6, TABLE 9-7, TABLE 9-8, and TABLE 9-9.

9.5.2 IEEE Invalid (nv) Trap

The IEEE invalid exception (nv) is generated when a source operand is a NaN (signalling or quiet) or when the result cannot fit in the integer format. The nv trap for an invalid case can be masked using FSR.TEM.NVM.

9.5.3 IEEE Overflow (of) Trap

When an overflow occurs, the inexact flag is also set. If an overflow occurs and the IEEE Overflow (of) and Invalid (nv) traps are enabled (FSR.TEM.NVM = 1), an fp_exception_ieee_754 is generated. If the Overflow trap is masked and the operation is valid, the destination register (rd) receives Infinity. The Overflow trap is caused when the result of an arithmetic operation exceeds the range supported by the floatin
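A simplified Python model of the masking rule: the exceptions raised by an FPop are compared against FSR.TEM; if any raised exception is enabled, an fp_exception_ieee_754 trap is taken, otherwise cexc is set and the exceptions accumulate into aexc. The field positions (TEM<27:23>, aexc<9:5>, cexc<4:0>, each ordered nv, of, uf, dz, nx from high bit to low) are taken from the SPARC V9 FSR layout, which this section does not restate. The sketch ignores the special overflow/underflow/inexact interactions and the destination-register write, and `fp_update` is a hypothetical name.

```python
# 5-bit exception vector order (SPARC V9 FSR): nv, of, uf, dz, nx.
NV, OF, UF, DZ, NX = 1 << 4, 1 << 3, 1 << 2, 1 << 1, 1 << 0
TEM_SHIFT, AEXC_SHIFT = 23, 5


def fp_update(fsr: int, raised: int) -> tuple[int, bool]:
    """Return (new FSR, trap taken) after an FPop raises `raised`."""
    tem = (fsr >> TEM_SHIFT) & 0x1F
    fsr = (fsr & ~0x1F) | raised        # cexc reflects this FPop only
    if raised & tem:
        return fsr, True                # fp_exception_ieee_754; aexc unchanged
    return fsr | (raised << AEXC_SHIFT), False


# Masked overflow: of and nx are set together and accumulate; no trap.
print(bin(fp_update(0, OF | NX)[0]))              # 0b100101001
# The same overflow with TEM.OFM = 1 traps and leaves aexc alone.
print(hex(fp_update(OF << TEM_SHIFT, OF | NX)[0]))  # 0x4000009
```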
l processor, with bit 0 representing LP 0 and bit 1 representing LP 1. For any bit set to 1 in the LP Running register, the corresponding bit needs to be set in the LP Running Status register.

Note: For one suspend command to a logical processor, the corresponding bit of the specified logical processor in the LP Running Status register has only one transition, from 1 to 0.

Note: The LP Enable, LP Running, and LP Running Status registers are mainly used to support debug and diagnostics. The LP Running register is also used to support booting.

State After Reset
The value of the LP Running Status register is the same as the value of the LP Running register at the end of a system reset.

3.5 Reset Handling

Each reset is handled differently in a CMT processor. Some resets apply to all the logical processors, some apply to an individual logical processor, and some apply to an arbitrary subset. The following sections address how each type of reset is handled with respect to having multiple logical processors integrated into a package. In general, the reset nomenclature used is consistent with UltraSPARC IV processors. Future processors may have a different classification of resets; if so, they should extend this model appropriately.

3.5.1 Private Resets (SIR and WDR Resets)

The only resets that are limited to a single logical processor are the private resets internally generated by a logical processor. An UltraSPARC IV processor has a number of resets of this class. These resets are generated by an individual logical processor and are not propagated to the other logical processors on a CMT processor.

3.5.2 Full CMT Resets (System Reset)

There is a class of resets that are generated by an external agent and apply to all the logical processors in a CMT processor. These include any reset that can be associated with fundamentally reconfiguring the CMT processor. Current SPARC processors have a system reset, of which power-on reset is a special case. This reset is required for certain reconfigurations of the processor. Future processors may have multiple resets that replace the single system reset of current processors.

The power-on and system resets, or their equivalents in future processors, are sent to all logical processors in a CMT processor. All logical processors except the lowest-numbered enabled logical processor are set by default to suspended at the beginning of system reset. The logical processor that is set to run is the default master logical processor, which should arbitrate for the boot bus if multiple CMT processors share the same boot bus. The master logical processor should run the other logical processors at the proper time in the boot process.

3.5.3 Partial CMT Resets (XIR Reset)

There is a class of resets that are generated by an external a
lier documents are read in conjunction with this supplement, replace the term "processor" with "logical processor" to read them in the context of the UltraSPARC IV processor.

3.2 Accessing CMT Registers

A key part of the CMT Programming Model is a set of specific privileged registers. This section covers how these registers are organized and accessed. These registers can be accessed by software running on each of the logical processors.

The CMT-specific registers, private or shared, can be accessed by privileged software running on one of the logical processors as ASI-mapped registers. The SPARC instruction set provides a convenient way to map additional architectural state through the use of address space identifiers (ASIs). This state is accessible through special load and store instructions that provide an ASI value and an address (virtual address). Certain ASI values are used to access main memory, but with behaviors that differ from the default semantics of normal load and store operations. Other ASI values are used to access special state for configuration, diagnostics, or other uses. The CMT Programming Model defines a number of ASIs specifically for accessing the CMT-specific registers.

3.2.1 Types of CMT Registers

The two main classes of CMT-specific registers are private registers and shared registers.
• Private registers: a private copy of the register is associated with each logical processor.
• Shared registe
nts

11  ASI_CORE_RUNNING_STATUS (ASI 0x41, VA 0x58)
    Reset values: ASI_CORE_RUNNING<1:0>
    Comments: Bits 63:2 are not implemented. A bit is 0 when the corresponding LP is successfully suspended.

12  ASI_CMP_ERROR_STEERING (ASI 0x41, VA 0x40)
    Reset values (at deassertion of reset): 0 if LP 0 is running; 1 otherwise
    Comments: Bits 63:2 are not implemented. By default, this register encodes the lowest running LP after reset; however, the JTAG controller can overwrite the default value.

1. Except for the Sun Fireplane Interconnect clock ratio, SAFARI CONFIG 2 has the same reset values as SAFARI CONFIG in the UltraSPARC III Cu processor.
2. Except for the INT_ID field, SAFARI CONFIG has the same reset values as the SAFARI CONFIG 2 register.

Note: AFAR2 (ASI 0x4C, VA 0x8) has an unknown state after Hard POR and is unchanged after all other types of resets.

Note: The following UltraSPARC IV processor implementation details may cause different behavior regarding the initial state after reset for some CMT registers:
1. The final states after reset of some CMT registers are determined by the ASI_CORE_ENABLED register. However, the UltraSPARC IV processor requires a system reset to propagate the value of the ASI_CORE_ENABLE register to ASI_CORE_ENABLED, even when ASI_CORE_ENABLE is programmed while reset is asserted.
2. After the assertion of Hard POR, changes to the ASI_CORE_RUN
o CMT registers. An attempt to access a CMT register with any other instruction results in a data_access_exception trap.

3.3 Private Processor Registers

There are three private registers used for logical processor identification.

3.3.1 LP ID Register (ASI_CORE_ID)

The LP ID register is a read-only private register that holds the ID value assigned by hardware to each implemented logical processor. The ID value is unique within the CMT processor. The LP ID value corresponds to a bit offset in the bit-mask CMT registers, such as the LP Enable register. Many of the CMT-specific registers provide a bit mask wherein each bit corresponds to an individual logical processor; for these registers, the LP ID field indicates which bit of the bit mask corresponds to a specific logical processor.

Name: ASI_CORE_ID
ASI 0x63, VA<63:0> = 0x10
Read-Only, Privileged Access
JTAG Accessible

As described in TABLE 3-1, the LP ID register has two fields.

TABLE 3-1 LP ID Register
Bit     Field       Description
63:22   Reserved    Reserved
21:16   MAX_LP_ID   Max LP ID, which gives the logical processor ID value of the highest-numbered implemented (but not necessarily enabled) logical processor in this CMT processor. For the UltraSPARC IV processor, the value of this field is 1 because there are two logical processors.
15:6    Reserved    Reserved
5
75. …o minimize the need for synchronization between logical processors when writing to this register, separate virtual addresses are provided to set and to clear bits of this register. Combined with the reset setting, this means that no special interlocking on the register is necessary.
When writing to this register, software can choose between writing an exact value and modifying individual bits. When a logical processor suspends itself, it should write to the clear-bit VA. When a logical processor wants to become the only active logical processor, it is more appropriate to write the desired value directly to the direct-access VA. A direct write eliminates the need to perform a set and a clear operation to write a specific value to the register.
State After Reset
On assertion of power-on reset or system reset (Soft POR), the LP Running register is initialized such that all the logical processors are suspended except the lowest-numbered logical processor that is marked enabled in the LP Enable Status register. This provides an integrated boot master logical processor for systems without a System Controller (SC), reducing bootbus contention. In systems with an SC, the value of the LP Running register can be changed using JTAG. In this way, the SC, which is the boot master in these systems, can set the proper logical processor running before removing the reset signal. The logical processor that is suspended at the…
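The three write paths and the reset rule described above can be modeled in software to reason about how they interact. This is a hypothetical C model, not driver code: TABLE 5-2 lists VAs 0x50, 0x60, and 0x68 for ASI_CORE_RUNNING, but this excerpt does not say which VA performs which operation, so the model names operations rather than addresses. It also assumes that a reset with no enabled LP leaves no LP running.

```c
#include <stdint.h>

/* Hypothetical software model of the LP Running register (2 LPs). */
typedef struct { uint64_t running; } lp_running_reg;

/* Direct-access write: the register takes the exact value. */
static void lpr_write_direct(lp_running_reg *r, uint64_t v) { r->running = v & 0x3; }

/* Set-bit write: 1 bits in v become running, others unchanged. */
static void lpr_write_set(lp_running_reg *r, uint64_t v) { r->running |= v & 0x3; }

/* Clear-bit write: 1 bits in v become suspended, others unchanged. */
static void lpr_write_clear(lp_running_reg *r, uint64_t v) { r->running &= ~v & 0x3; }

/* Reset: only the lowest-numbered enabled LP is left running. */
static void lpr_reset(lp_running_reg *r, uint64_t enable_status) {
    uint64_t en = enable_status & 0x3;
    r->running = en & (~en + 1);   /* isolate lowest set bit; 0 if none enabled */
}
```

A logical processor suspending itself corresponds to lpr_write_clear with only its own bit set, which cannot undo a concurrent set of another LP's bit; avoiding that read-modify-write race is the point of the separate set and clear addresses.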
76. …ocessor, bit 0 and bit 1 are defined for LP 0 and LP 1, respectively. Bits 63:2 are reserved and read as 0.
TABLE 3-5 LP Enable Status Register (Shared)
Bits 63:2, Reserved: Must be 0 when read.
Bit 1: This bit represents LP 1.
Bit 0: This bit represents LP 0.
A logical processor disabled by programming the LP Enable register is considered not enabled (a power-on reset or system reset is required for updates to the LP Enable register to take effect). A logical processor suspended for debug or diagnostics is considered enabled.
State After Reset
The LP Enable Status register changes only at system reset or power-on reset. At the deassertion of reset, hardware sets the LP Enable Status register to the value of the LP Enable register.
LP Enable Register (ASI_CORE_ENABLE)
The LP Enable register, illustrated in TABLE 3-6, is used by software to enable or disable logical processors. The enable/disable action takes effect only when a power-on reset or a system reset (Soft POR) is deasserted.
Name: ASI_CORE_ENABLE
ASI: 0x41, VA[63:0]: 0x20
Access: Privileged read/write; JTAG accessible
TABLE 3-6 LP Enable Register (Shared)
Bits 63:2, Reserved: Reserved. Must be 0 when read.
Bit 1: This bit represents LP 1.
Bit 0: This bit represents LP 0.
The LP Enable register is a 64-bit reg…
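The deferred effect of the LP Enable register can be illustrated with a small software model (all names hypothetical): writes land in ASI_CORE_ENABLE immediately, but ASI_CORE_ENABLE_STATUS, which determines whether an LP is enabled, changes only when reset is deasserted.

```c
#include <stdint.h>

/* Hypothetical model of the LP Enable / LP Enable Status pair (2 LPs). */
typedef struct {
    uint64_t enable;         /* ASI_CORE_ENABLE: written by software, pending */
    uint64_t enable_status;  /* ASI_CORE_ENABLE_STATUS: read-only, latched */
} lp_enable_model;

static void lpe_write_enable(lp_enable_model *m, uint64_t v) {
    m->enable = v & 0x3;     /* no effect on enable_status yet */
}

/* Power-on or system reset deassertion copies LP Enable into LP Enable Status. */
static void lpe_reset_deassert(lp_enable_model *m) {
    m->enable_status = m->enable;
}

static int lpe_is_enabled(const lp_enable_model *m, unsigned lp) {
    return (int)((m->enable_status >> lp) & 1u);
}
```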
77. …or Shared Registers 22
Data Cache Unit Control 25
L2 Cache Control 28
L2 Cache Configuration and Timing Control Register 29
4 MB Direct Mapped 31
4 MB 2-Way Direct Mapped 31
8 MB Direct Mapped 31
8 MB 2-Way Direct Mapped 32
TABLE 4-8 4 MB Direct Mapped 32
TABLE 4-9 … 32
TABLE 4-10 8 MB Direct Mapped 33
TABLE 4-11 8 MB 2-Way Direct Mapped 33
TABLE 4-12 4 MB L2-cache Tag/State Access Data Format 34
TABLE 4-13 8 MB L2-cache Tag/State Access Data Format 34
TABLE 4-14 4 MB and 8 MB L2-cache Tag/State Access Data Format 35
TABLE 5-1 UltraSPARC IV Processor New Defined Private Register Field Reset Machine State 38
TABLE 5-2 UltraSPARC IV Defined Shared Registers Field Reset Machine State 39
TABLE 6-1 Counter Behavior Difference 43
TABLE 7-1 Prefetch Functions 46
TABLE 8-1 New MC…
78. …ounter 23.
Bit 60, PIA_UDS (IERR): Undefined PTA state. Specific to a LP.
TABLE 10-14 Internal Errors of the TOB (Bit, Field, Error Type, Description, Comment)
Bit 61, AID_ERR (IERR): Trying to retire an inactive AID. Specific to a LP.
Bit 62, AID_ILL: Illegal AID transaction with AID 0. Specific to a LP.
Bit 63, AID_UD: Undefined AID for a retry transaction request (a request for a retry transaction with an inactive AID). Specific to a LP.
Bit 64, WB_FSM_ILL: Write Back state machine encounters an illegal state. Specific to a LP.
Bit 65, WBAR_OV: WBAR queue overflow. Specific to a LP.
Bit 66: Retry queue overflow. Specific to a LP.
Bit 67: Multiple retire requests for the same transaction. Specific to a LP.
Bit 68: Multiple Pull Flag requests for the same transaction. Specific to a LP.
Bit 69: USB buffer overflow. Specific to a LP.
Bit 70, CWBB_UE (IERR): Unexpected write back or copyback request for data from the CWBB. Specific to a LP.
Bit 71, CUSB_UE (IERR): Unexpected data request for the non-cached data buffer. Specific to a LP.
TABLE 10-15 Internal Errors of the ECU
Bit 72, CAM_OV: Overflow condition for the blocking CAM in the miss block. Specific to a LP.
Bit 73, WBE_UF: Underflow condition for a write back entry (a WB entry is retired multiple times). Specific to a LP.
Bit 74, …ERR: Illegal miss request (src, src idx, or size is not legal). Specific to a LP.
79. …ow; may overflow (see Section 9.4), in both the masked and the enabled case.
FsTOd, FdTOs: ±Infinity → ±Infinity; no trap (masked and enabled).
9.3.8 Floating-point to Integer Number Conversion
TABLE 9-10 Floating-point to Integer Number Conversion
Instructions (single operand): FsTOi rs2 → rd, FsTOx rs2 → rd, FdTOi rs2 → rd, FdTOx rs2 → rd.
The RESULT from the operation includes one or more of the following: the number in the f register (see the Trap Event note, page 66); an exception bit set (see TABLE 9-12); a trap (see the abbreviations in TABLE 9-12); underflow or overflow may occur.
Columns: Masked Exception (TEM.NVM = 0): destination register written (rd), flag(s). Enabled Exception (TEM.NVM = 1): destination register written (rd), flag(s), trap.
±0: masked: 000…000, no flags; enabled: 000…000, no trap.
+Infinity: masked: 011…111, set nva; enabled: rd not written, set nvc, IEEE trap.
-Infinity: masked: 100…000, set nva; enabled: rd not written, set nvc, IEEE trap.
FsTOi, FdTOi (32-bit):
Normal ≥ 2^31: masked: 011…111, set nva; enabled: rd not written, set nvc, IEEE trap.
Normal in range: masked and enabled: integer representation of the Normal number; set nxc if the conversion is inexact.
Normal below -2^31: masked: 100…000, set nva; enabled: rd not written, set nvc, IEEE trap.
FsTOx, FdTOx (64-bit):
Normal in range: integer representation of the No…
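A minimal C sketch of the masked-exception behavior for the FdTOi case: round toward zero (SPARC floating-point-to-integer conversions ignore FSR.RD), saturate out-of-range values to 011…111 or 100…000 with nv, and report nx when a fraction is discarded. The flag struct and function name are hypothetical, and NaN handling is only a placeholder (treated like positive overflow) because this excerpt does not show the NaN rows.

```c
#include <stdint.h>

/* Hypothetical per-conversion flags (current-exception style). */
typedef struct { int nv; int nx; } conv_flags;

/* Sketch of FdTOi with invalid/inexact masked: returns the value rd would hold. */
static int32_t fdtoi_masked(double x, conv_flags *f) {
    f->nv = 0;
    f->nx = 0;
    if (x != x) {                   /* NaN: placeholder, treated as +overflow */
        f->nv = 1;
        return INT32_MAX;
    }
    if (x >= 2147483648.0) {        /* +Infinity or too large: 011...111 */
        f->nv = 1;
        return INT32_MAX;
    }
    if (x <= -2147483649.0) {       /* -Infinity or too small: 100...000 */
        f->nv = 1;
        return INT32_MIN;
    }
    int32_t r = (int32_t)x;         /* C casts truncate toward zero */
    if ((double)r != x)
        f->nx = 1;                  /* fraction discarded: inexact */
    return r;
}
```

The lower bound is -2^31 - 1 rather than -2^31 because values such as -2147483648.5 truncate to -2^31, which is representable.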
80. …pare (Continued)
The RESULT from the operation includes one or more of the following: the fcc bits set; an exception bit set (see TABLE 9-12); a trap (see the abbreviations in TABLE 9-12).
Columns: Masked Exception (TEM = 0): condition code setting (fccN), flag(s). Enabled Exception (TEM = 1): condition code setting (fccN), flag(s), trap.
0 vs 0: fccN = 0; no trap.
rs1 ±0 or Normal, rs2 +Infinity: fccN = 1 (rs1 < rs2); no trap (masked and enabled).
rs1 ±0 or Normal, rs2 -Infinity: fccN = 2 (rs1 > rs2); no trap (masked and enabled).
Normal vs Normal: fccN set for > or <; no trap (masked and enabled).
9.3.7 Precision Conversion
TABLE 9-9 Precision Conversion
Instructions (single operand): FsTOd rs2 → rd, FdTOs rs2 → rd.
The RESULT from the operation includes one or more of the following: the number in the f register (see the Trap Event note, page 66); an exception bit set (see TABLE 9-12); a trap (see the abbreviations in TABLE 9-12); underflow or overflow may occur.
Columns: Masked Exception (TEM = 0): destination register written (rd), flag(s). Enabled Exception (TEM = 1): destination register written (rd), flag(s), trap.
Examples:
FsTOd 7FD1 0000 → 7FFA 2000 0000 0000
FsTOd FFD1 0000 → FFFA 2000 0000 0000
FdTOs 7FFA 2000 0000 0000 → 7FD1 0000
FdTOs FFFA 2000 0000 0000 → FFD1 0000
FsTOd, FdTOs: ±0 → ±0; no trap (masked and enabled).
FsTOd: Normal → Normal; no trap (masked and enabled).
FdTOs: Normal → Normal; may underflow; may overfl…
81. …reporting mechanism is used as for reporting logical-processor-specific errors. This reporting mechanism may require extending the logical processor's asynchronous error reporting mechanism so that it can record a larger set of errors.
Asynchronous errors may be defined as logical-processor-specific. If the same error can also occur in a shared resource, it must be split into two different errors for reporting purposes.
The type of trap sent to the logical processor to handle a shared-resource error is implementation specific. A logical processor can use the same trap type used for the corresponding logical-processor-specific asynchronous errors, or it can use a new trap type.
10.1.3 Listing of CMT Errors
The following tables, TABLE 10-4 through TABLE 10-16, list the various errors reported in the EMU Error Status register described in Section 10.1.1.1, EMU Error Status Register. A logical processor's own errors are reported to its AFSR/AFAR. All other errors are serviced by the logical processor whose ID is in the Error Steering register.
TABLE 10-4, TABLE 10-5, and TABLE 10-6 describe the Etag ECC errors, the internal errors of the MCU, and the internal errors of the Write Cache, respectively. TABLE 10-7 and TABLE 10-8 describe the System Bus Protocol Errors (Data) and the internal errors of the DPCTL, respectively. TABLE 10-9 and TABLE 10-10 describe the System Bus Protocol Errors (Transaction) and Cache Consistenc…
82. …ption generated. Standard mode: no unfinished_FPop. Nonstandard mode: no FSR.NX.
• Result already generates an exception (divide by zero or invalid operation):
  ◦ FSQRT(number less than zero) → invalid.
• Result is Infinity:
  ◦ Subnormal + Infinity = Infinity, no exception generated. Standard mode: no unfinished_FPop. Nonstandard mode: no FSR.nx.
  ◦ Standard mode: Subnormal × Infinity = Infinity.
  ◦ Nonstandard mode: Subnormal × Infinity = QNaN with nv exception (the Subnormal is flushed to zero).
• Result is zero:
  ◦ Subnormal × 0 = 0, no exception generated. Standard mode: no unfinished_FPop. Nonstandard mode: no FSR.nx.
9.9 Conditions for Software Trapping
The following special case generates traps to software:
• Floating-point conversions from fixed-point to floating-point format where there are more significant bits in the fixed-point representation than bits of mantissa in the floating-point representation.
CHAPTER 10 Error Handling
This chapter describes processor behavior for programmers writing operating system and service processor diagnosis and recovery code for the UltraSPARC IV processor. It discusses only asynchronous errors; synchronous error reporting is the same as in the UltraSPARC III Cu processor.
Chapter Topics
• Error Handling in UltraSPARC IV Processors on page 77
10.1 Error Handling in UltraSPARC IV Proce…
83. …r > …: Integer is rounded to 52 msb and converted. Masked: set nxc. Enabled: set nxc, IEEE trap.
DP, Integer that converts exactly: Normal, no flags (masked and enabled).
DP, Integer < …: Integer is rounded to 52 msb and converted. Masked: set nxc. Enabled: set nxc, IEEE trap.
9.3.10 Copy/Move Operations
Apart from the sign-bit change made by FABS and FNEG, floating-point numbers are not modified by the copy and move instructions (FMOV, FABS, and FNEG). The copy/move instructions do not generate an unfinished_FPop or unimplemented_FPop exception, but they do generate the fp_disabled exception if the floating-point unit is disabled. The processor performs the appropriate sign-bit transformation but does not cause an invalid exception and does not perform an SNaN-to-QNaN transformation. These are single-operand instructions that use the rs2 register as the source operand.
FMOV
• f register to f register move.
• No change to any bit, regardless of register content. Useful with VIS instructions.
FABS
• Changes the sign bit to positive if needed.
• No change to any other bit, regardless of register content.
FNEG
• Changes the sign bit: if 0, then 1; if 1, then 0.
• No change to any other bit, regardless of register content.
f Register Load/Store Operations
A load single floating-point (LDF) instruction writes to a 32-bit register. This must be converted to a 64-b…
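Because FMOV, FABS, and FNEG act only on the sign bit (or on nothing at all), they can be sketched as pure bit operations on a single-precision register image. The helper names are hypothetical. Note that a signaling NaN stays signaling, matching the statement that these instructions perform no NaN transformation and raise no invalid exception.

```c
#include <stdint.h>

/* Hypothetical bit-level models of FMOVs, FABSs, and FNEGs on an f register image. */
static uint32_t fmovs_bits(uint32_t rs2) { return rs2; }                         /* copy all bits */
static uint32_t fabss_bits(uint32_t rs2) { return rs2 & UINT32_C(0x7FFFFFFF); }  /* clear sign */
static uint32_t fnegs_bits(uint32_t rs2) { return rs2 ^ UINT32_C(0x80000000); }  /* flip sign */
```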
84. …registers from different logical processors is not defined, but the hardware enforces a number of rules. The hardware guarantees that accesses to a shared register from the same logical processor follow sequential semantics. The hardware also guarantees that if multiple logical processors attempt to store to the register at the same time, after the updates the register contains the value from one of those stores. That is, stores to these registers must be performed atomically on all bits of the register.
All the CMT registers are 64-bit registers, although some bits of individual registers can be reserved or defined to a fixed value. Software should always write reserved register fields with the values previously read from that register, or with zeros; hardware should return zero when they are read. Software intended to run on future versions of CMTs should not assume that these fields will read as 0 or any other particular value. This software convention makes future expansion of the interface easier.
Only the LDXA, LDDFA, STXA, and STDFA instructions can be used to access the CMT registers. Only the Load Extended from Alternate Space (LDXA) or Load Double Floating-Point Register from Alternate Space (LDDFA) instructions can be used to read CMT registers. Only the Store Extended into Alternate Space (STXA) and the Store Double Floating-Point Register into Alternate Space (STDFA) instructions can be used to store t…
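The reserved-field convention amounts to a read-modify-write in which only the bits that software means to change are replaced. A minimal sketch with a hypothetical helper; in real code the first argument would come from an LDXA of the CMT register and the result would be written back with STXA.

```c
#include <stdint.h>

/* Hypothetical read-modify-write helper for CMT registers.
 * previously_read: value obtained from the register
 * defined_mask:    1 bits = fields software intends to change
 * new_bits:        desired values for those fields
 * Reserved bits keep the values previously read. */
static uint64_t cmt_rmw(uint64_t previously_read, uint64_t defined_mask, uint64_t new_bits) {
    return (previously_read & ~defined_mask) | (new_bits & defined_mask);
}
```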
85. …registers include 4 Memory Timing Control registers, 4 Memory Address Decoding registers, and 1 Memory Address Control register.
Note: Using the PIO method to access the MCU registers from either of the 2 logical processors that are on the same die as these registers results in undefined behavior.
Note: Because the UltraSPARC IV processor does not support low-power modes, writing to the Mem_Timing3_CTL and Mem_Timing4_CTL registers and to bits 55:37 of the Memory Address Control register has no effect; reading from these registers returns undefined data.
Note: The UltraSPARC IV processor requires that the Mem_Timing5_CTL register be programmed before all other MCU Timing Control registers.
8.2 Chip Kill DIMM Support
In addition to NG DIMMs, the UltraSPARC IV processor can also support Chip Kill SDRAM DIMMs (CK DIMMs). A CK DIMM uses only x4 SDRAM parts. Each bit of an SDRAM is protected by its own ECC code word; therefore, the system can correct errors resulting from one failed SDRAM.
When CK DIMMs are used, SDRAM internal banking can be enabled to increase memory bandwidth. Moreover, the refresh, mode register setting, and precharge-all commands to one CK DIMM can be spread into two consecutive commands to minimize the peak SDRAM power. Three bits are added to support these features:
• dimm_type (Memory_Timing5_CTL bit 1): 0 = NG DIMM is used; 1 = CK DIMM is used.
• int_bank_enable (Memory Address Control register bit…
86. …rmal number; set nxc if the conversion is inexact.
Normal ≥ 2^63: masked: 011…111, set nva; enabled: rd not written, set nvc, IEEE trap.
Normal in range: masked and enabled: integer representation of the Normal number; set nxc if inexact.
Normal …: masked: 100…000, no flags; enabled: 100…000, no trap.
9.3.9 Integer to Floating-point Number Conversion
TABLE 9-11 Integer to Floating-point Number Conversion
Instructions (single operand): FiTOs rs2 → rd, FxTOs rs2 → rd, FxTOd rs2 → rd.
The RESULT from the operation includes one or more of the following: the number in the f register (see the Trap Event note, page 66); an exception bit set (see TABLE 9-12); a trap (see the abbreviations in TABLE 9-12); underflow or overflow may occur.
Columns: Masked Exception (TEM.NXM = 0): destination register written (rd), flag(s). Enabled Exception (TEM.NXM = 1): destination register written (rd), flag(s), trap.
SP (FiTOs, FxTOs):
0: 0, no flags (masked); 0, no trap (enabled).
Integer that converts exactly: Normal, no flags (masked and enabled).
Integer that does not convert exactly (positive or negative): Integer is rounded to 23 msb and converted. Masked: set nxc. Enabled: set nxc, IEEE trap.
DP (FxTOd):
Integer < …: Normal, no flags (masked and enabled). Intege…
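Whether an integer-to-single conversion would set nxc can be checked by converting and comparing, because a double holds every 32-bit integer exactly. A sketch assuming the host's default round-to-nearest mode (the hardware uses FSR.RD); the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if an FiTOs-style int32 -> single
 * conversion is inexact (nx), 0 otherwise; *out receives the result. */
static int fitos_inexact(int32_t i, float *out) {
    float f = (float)i;               /* rounds to 24 significant bits */
    *out = f;
    return (double)f != (double)i;    /* double represents every int32 exactly */
}
```

For example, 16777217 (2^24 + 1) is the smallest positive integer whose conversion is inexact; it rounds to 16777216.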
87. …rs, a single copy of each register is shared by all the logical processors.
Both private and shared registers can be accessed as ASI-mapped registers by privileged software running on one of the logical processors. Software can access the private registers as well as the shared registers. Each logical processor can access only its own private registers; it cannot access the private registers of another logical processor, because there is no way to address those registers. The specific semantics for accessing the CMT registers through the ASI interface are described in Section 3.2.2, Accessing CMT Registers Through ASI Interface.
3.2.2 Accessing CMT Registers Through ASI Interface
Each CMT-specific register is accessible through an ASI address: a combination of an address space identifier value and a virtual address. All CMT registers are mapped into ASI values that are accessible only in privileged mode. The specific ASI number and virtual address of each CMT register are covered later in this document.
Each logical processor can access the private registers associated with that logical processor. Accesses to these registers follow the standard semantics for accessing ASI-mapped internal registers.
Each logical processor can access all the shared registers. An update to a shared register from one logical processor is visible to all other logical processors. The ordering of accesses to shared…
88. …rted in the EESR if their corresponding mask bits are 0 in the EMU Error Mask Register (EEMR). EESR content can be updated only when there is no prior fatal error logged in the AFSR; therefore, only the first fatal error is logged, and subsequent errors are ignored. Multiple errors can be reported if they occur in the same cycle. Once an error is logged in the EESR, the corresponding bit (PERR, IERR, or TUE) in the AFSR is also set, and the error signal is asserted. Errors logged in the EESR can be cleared when their associated field in the AFSR is cleared by software. The EESR is reset to 0 only during power-on reset; other resets have no effect on this register.
10.1.1.2 EMU Error Mask Register
Each logical processor has its own EMU Error Mask Register (EEMR). The EEMR is used to disable error generation for certain error conditions. Each bit in the EEMR controls a group of errors in the EESR or the AFSR. Once a bit is set in the EEMR, error logging for the affected fields in the EESR or the AFSR is disabled, and the processor's error output pin is not asserted for these events. For the UltraSPARC IV processor, one new bit was added to this register.
TABLE 10-1 EMU Error Mask Register Additional Bits
Bit 20, M_TOB1: When this bit is set to 1, none of the errors listed in TABLE 10-16 are reported to EESR[85:79] or to the AFSR IERR bit…
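The logging rules above (first fatal error only, same-cycle errors recorded together, masked errors dropped) can be captured in a small behavioral model. This is a hypothetical C sketch: error groups are represented as bits of a 64-bit word that do not correspond to the real EESR or EEMR bit positions, and clearing is simplified to clearing everything at once.

```c
#include <stdint.h>

/* Hypothetical behavioral model; bit positions are illustrative only. */
typedef struct {
    uint64_t eesr;      /* logged fatal errors */
    int fatal_logged;   /* models "a prior fatal error is logged in the AFSR" */
} eesr_model;

/* errs: fatal errors detected this cycle; mask: 1 = suppressed by the EEMR */
static void eesr_cycle(eesr_model *m, uint64_t errs, uint64_t mask) {
    if (m->fatal_logged)
        return;                     /* only the first fatal error is logged */
    uint64_t visible = errs & ~mask;
    if (visible) {
        m->eesr |= visible;         /* all same-cycle errors are recorded */
        m->fatal_logged = 1;
    }
}

/* Software clears the AFSR field, which also clears the logged EESR errors. */
static void eesr_clear(eesr_model *m) {
    m->eesr = 0;
    m->fatal_logged = 0;
}
```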
89. …s created from non-NaN operands, the nv flag is set.
Signaling and Quiet NaNs
SNaN and QNaN numbers are unsigned; the sign bit is an extension of the NaN's fraction field. SNaN operands propagate to the destination register as a QNaN result when the nv exception is masked. All operations with NaN operands keep the sign bit unchanged, including the FSQRT operation. NaNs are generated for the conditions shown in Section 9.7.4, NaN Results from Operands without NaNs, on page 73.
SNaN-to-QNaN Transformation
The signaling-to-quiet NaN transformation causes the following:
• The most significant bits of the operand fraction are copied to the most significant bits of the result's fraction.
• In conversion to a narrower format, excess low-order bits of the operand fraction are discarded.
• In conversion to a wider format, unwritten low-order bits of the result fraction are set to 0.
• The quiet bit (the most significant bit of the result fraction) is set to 1; the NaN transformation produces a QNaN.
• The sign bit is copied from the operand to the result without modification.
Operations with NaN Operands
Operations with NaN operands may assert the IEEE invalid trap flag (nv). These operations are listed in TABLE 9-16. If the invalid trap is enabled (FSR.TEM.NVM = 1), a trap event occurs as described in Section 9.4.2, Trap Event, on page 66.
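The transformation rules can be checked against the FsTOd and FdTOs examples listed with TABLE 9-9 (for instance, FsTOd 7FD1 0000 gives 7FFA 2000 0000 0000). The sketch below works on raw bit patterns. It assumes that the caller has already determined that the operand is a NaN, and it does not model the nv flag or the trap. The function names are hypothetical.

```c
#include <stdint.h>

/* FsTOd on a NaN: copy fraction msbs, zero the new low bits, set quiet bit, keep sign. */
static uint64_t nan_s_to_d(uint32_t s) {
    uint64_t sign = (uint64_t)(s >> 31) << 63;
    uint64_t frac = (uint64_t)(s & UINT32_C(0x7FFFFF)) << 29;  /* 23 -> 52 bits */
    return sign | (UINT64_C(0x7FF) << 52) | frac | (UINT64_C(1) << 51);
}

/* FdTOs on a NaN: keep the 23 fraction msbs, discard 29 low bits, set quiet bit, keep sign. */
static uint32_t nan_d_to_s(uint64_t d) {
    uint32_t sign = (uint32_t)(d >> 63) << 31;
    uint32_t frac = (uint32_t)((d & UINT64_C(0xFFFFFFFFFFFFF)) >> 29);
    return sign | UINT32_C(0x7F800000) | frac | (UINT32_C(1) << 22);
}
```

For an SNaN such as 7F80 0001, nan_s_to_d yields 7FF8 0000 2000 0000: the original fraction bit survives in the upper part of the wider fraction, and the quiet bit is now set.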
90. …sentable.
9.6 Underflow Operation
Underflow occurs when the result of an operation, before rounding, is smaller in magnitude than the smallest Normal number. After rounding, the tiny (underflowed) number is usually represented by a Subnormal number, but it may equal the smallest Normal number if the unrounded result is just below the range of Normal numbers and the rounding mode specified in FSR.RD moves it into the Normal number range. The underflow result is zero, a Subnormal, or the smallest Normal value.
Compatibility Note: The floating-point unit does not support exponent wrapping for underflow or overflow.
9.6.1 Trapped Underflow
The floating-point unit traps on underflow if the FSR.TEM.UFM bit is set to 1. Because tininess is detected before rounding, trapped underflow occurs when the exact unrounded result has a magnitude between zero and the smallest representable Normal number in the precision of the destination format. When underflow is trapped, the destination and other registers are left unchanged (see Section 9.4.2, Trap Event, on page 66).
9.6.2 Untrapped Underflow
The floating-point unit does not generate an underflow trap when an underflow occurs if the FSR.TEM.UFM bit is set to 0. If the result underflows and the result after rounding is exact, the floating-point unit does not generate an inexact trap. Tininess detection before…
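Tininess before rounding can be illustrated for a single-precision multiply. The product of two floats is exact in double precision (at most 48 significant bits), so it can be compared with the smallest Normal single before it is rounded. The sketch below uses hypothetical names. It reports tininess and inexactness and leaves the trap decision (FSR.TEM.UFM) to the caller.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical result record for an FMULs-style underflow check. */
typedef struct { int tiny; int inexact; float result; } fmuls_uf;

static fmuls_uf fmuls_underflow_check(float a, float b) {
    fmuls_uf r;
    double exact = (double)a * (double)b;           /* exact: no rounding yet */
    double mag = exact < 0 ? -exact : exact;
    r.tiny = mag != 0.0 && mag < (double)FLT_MIN;   /* tininess before rounding */
    r.result = (float)exact;                        /* round to the destination */
    r.inexact = (double)r.result != exact;
    return r;
}
```

FLT_MIN × 0.5 is tiny but exact (it is representable as a Subnormal); this is the case in which the text says no inexact trap is generated.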
91. …signment. For example, if X, Y, and Z are 1-bit vectors and the 2-bit vector T equals 11_2, then assigning 0 concatenated with T to X, Y, and Z results in X = 0, Y = 1, and Z = 1.
A mod B means A modulus B, where the calculated value is the remainder when A is divided by B.
Notation for Numbers
Numbers throughout this specification are decimal (base 10) unless otherwise indicated. Numbers in other bases are followed by a numeric subscript indicating their base (for example, 1001_2, FFFF 0000_16). In some cases, numbers may be preceded by 0x to indicate hexadecimal (base 16) notation (for example, 0xFFFF 0000). Long binary and hexadecimal numbers within the text have spaces or periods inserted every four characters to improve readability. The notation 7'h1F indicates a hexadecimal number of 1F_16 with 7 binary bits of width.
Informational Notes
This guide provides several different types of information in notes, as follows:
Programming Note: Programming notes contain incidental information about programming the UltraSPARC IV processor, unless otherwise restricted to a particular processor in the family.
Implementation Note: Implementation notes contain information that is specific to the UltraSPARC IV processor implementation, compared to other UltraSPARC processors.
Compatibility Note: Compatibility notes contain inform…
92. …ssors
• Errors within a logical processor are reported using that logical processor's error reporting mechanism. These errors are considered specific to a logical processor.
• An error in a shared structure is, whenever possible, reported to the logical processor that initiated the request that caused or detected the error. These errors are considered specific to a logical processor.
• Some errors in a shared structure cannot be attributed to a logical processor and are therefore not specific to any one logical processor.
10.1.1 Error Reporting Specific to a Logical Processor
Errors specific to a logical processor are reported using only that logical processor's error reporting mechanism. These errors consist of both synchronous and asynchronous errors. They also include errors that occur in shared structures. It is the responsibility of the error handling software to recognize the implication of errors in shared structures and take appropriate action.
The EMU Error Status Register (EESR) contains information to identify errors. Other error registers are strictly specific to logical processors, and therefore their behavior is identical to that of the registers in the UltraSPARC III Cu processor. Those error registers are not described in this chapter.
10.1.1.1 EMU Error Status Register
Each logical processor has its own EMU Error Status Register (EESR). Fatal hardware errors that belong to the PERR, IERR, and TUE error types are repo…
93. …ssors and do not participate in cache coherency. Any transaction issued to a disabled logical processor, such as an interrupt, results in an unmapped reply or a time-out.
LP Enable Status Register (ASI_CORE_ENABLE_STATUS)
The LP Enable Status register is a shared register that indicates whether each logical processor is currently enabled. It is a read-only register with a single 64-bit field (assuming a maximum of 64 logical processors per CMT processor) in which each bit corresponds to a possible logical processor. The UltraSPARC IV processor has only two software-visible logical processors.
Name: ASI_CORE_ENABLE_STATUS
ASI: 0x41, VA[63:0]: 0x10
Access: Read-only, privileged; JTAG accessible
Bit 0 and bit 1 represent LP 0 and LP 1, respectively. If a bit in the register is asserted (1), the corresponding logical processor is implemented and enabled. A logical processor not implemented in a CMT device (indicated as not available in the LP Available register) cannot be enabled, and its corresponding bit in this register will be 0. A logical processor that is suspended is still considered enabled.
TABLE 3-5 shows the format of the LP Enable Status register. Each bit represents one logical processor: a bit set to 1 indicates that the corresponding logical processor is enabled, and a bit set to 0 indicates that it is not. In the UltraSPARC IV pr…
94. …t generally flushes Subnormal operands to 0 with the same sign as the Subnormal (SbN) number and proceeds to use that value in the operation. Subnormal results (those that would otherwise cause an unfinished_FPop) are also flushed to 0 in Nonstandard mode.
If the higher-priority invalid operation (nv) or divide-by-zero (dz) condition occurs, the corresponding conditions are flagged in the FSR.cexc field. If the trap is enabled (FSR.TEM), an fp_exception_ieee_754 trap occurs. If the trap is disabled, the corresponding conditions are also flagged in the FSR.aexc field.
If neither the invalid nor the divide-by-zero condition occurs, an inexact condition plus any other detected floating-point exception conditions are flagged in the FSR.cexc field. If an IEEE trap is enabled (FSR.TEM), an fp_exception_ieee_754 trap occurs. If the trap is disabled, the corresponding conditions are also flagged in the FSR.aexc field.
9.8.2 Subnormal Number Generation
Handling of the FMULs, FMULd, FDIVs, FDIVd, and FdTOs instructions requires further explanation. Define:
• Sign = sign of the result
• RTZ = round to nearest (effectively truncating) or round toward zero
• RP = round toward +Infinity
• RM = round toward -Infinity
• RND = FSR.RD
• Er = biased exponent of the result
• Ey = biased exponent of the result before rounding
• Ers = biased exponent of the rs operand
• P… = precision of the…
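Operand flushing in Nonstandard mode, as described at the start of this passage, is a bit-level rule: a Subnormal (zero exponent field, nonzero fraction) becomes a zero of the same sign. A sketch on single-precision register images; the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical Nonstandard-mode operand flush for single precision. */
static uint32_t ns_flush_single(uint32_t bits) {
    int is_subnormal = (bits & UINT32_C(0x7F800000)) == 0 &&
                       (bits & UINT32_C(0x007FFFFF)) != 0;
    return is_subnormal ? (bits & UINT32_C(0x80000000))  /* keep only the sign */
                        : bits;                          /* Normal, zero, Inf, NaN unchanged */
}
```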
95. …talled in the E-cache (fcn = 17).
Page (fcn = 4): implemented as a NOP.
Prefetch Invalidate (fcn = 16): a line in the P-cache is invalidated if the specified target address is found in the P-cache. A prefetch invalidate instruction must be followed by a MEMBAR #Sync instruction.
CHAPTER 8 Memory Controller
This chapter enhances the material described in Chapter … of the Secondary Document to UltraSPARC III Cu Processor User's Manual.
Chapter Topics
• SDRAM Timing Control on page 47
• Chip Kill DIMM Support on page 49
8.1 SDRAM Timing Control
In the UltraSPARC III Cu processor, some of the MCU timing settings were based on the processor clock rate. Because of the clock rate increase, the UltraSPARC IV processor needs one more bit (the most significant bit) for each of the following 12 fields: sdram_ctl_dly, sdram_clk_dly, rd_wait, auto_rfr_cycle, rfr_int, rd_msel_dly, rdwr_rd_ti_dly, rd_wr_ti_dly, wr_wr_ti_dly, rdwr_rd_pi_more_dly, addr_le_pw, and cmd_pw.
The UltraSPARC IV processor adds another MCU timing control register to hold these bits. This register has the same access constraints as the other MCU timing control registers.
Name: Mem_Timing5_CTL
ASI: 0x72, VA[63:0]: 0x48
PIO Addr: SAFARI_ADDRESS_REG + 0x400048
Access: Read/Write, shared register
TABLE 8…
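One reading of the added most significant bit is that each timing field's effective value becomes the new Mem_Timing5_CTL bit concatenated above the field's existing bits, which doubles the field's range. The sketch below illustrates only that arithmetic. The field widths and the helper are hypothetical, because this excerpt gives neither the per-field widths nor the bit positions.

```c
/* Hypothetical: combine a field's legacy bits (legacy_width wide) with
 * the extra most significant bit held in Mem_Timing5_CTL. */
static unsigned mcu_field_value(unsigned legacy, unsigned legacy_width, unsigned msb) {
    unsigned low = legacy & ((1u << legacy_width) - 1u);  /* keep only legacy bits */
    return ((msb & 1u) << legacy_width) | low;             /* new bit on top */
}
```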
96. …tate and Error state.
TABLE 5-2 UltraSPARC IV Defined Shared Registers Field Reset Machine State
Columns: No., New Register, Hard POR, System Reset (Soft POR), Comments.
No. 7: ASI_CORE_ENABLE_STATUS (ASI 0x41, VA 0x10). Hard POR: value of ASI_CORE_ENABLE[1:0] at the time of reset deassertion. System Reset (Soft POR): value of ASI_CORE_ENABLE[1:0] at the time of reset deassertion. Comments: Bits 63:2 are not implemented.
No. 8: ASI_XIR_STEERING (ASI 0x41, VA 0x30). Hard POR: value of ASI_CORE_ENABLED[1:0] at the time of reset deassertion. System Reset (Soft POR): value of ASI_CORE_ENABLED[1:0] at the time of reset deassertion. Comments: Bits 63:2 are not implemented.
No. 9: ASI_CORE_ENABLE (ASI 0x41, VA 0x20). Hard POR: … System Reset (Soft POR): Unchanged. Comments: Bits 63:2 are not implemented. Both LPs are enabled by default. During reset, this register can be overwritten by the JTAG controller.
No. 10: ASI_CORE_RUNNING (ASI 0x41, VA 0x50, 0x60, 0x68). Hard POR: at deassertion, 01 if LP 0 is enabled, 10 otherwise. System Reset (Soft POR): at deassertion, 01 if LP 0 is enabled, 10 otherwise. Comments: Bits 63:2 are not implemented. By default, only the lowest enabled LP is running after reset. The JTAG controller can overwrite this default setting; however, only enabled LPs can become running.
TABLE 5-2 UltraSPARC IV Defined Shared Registers Field Reset Machine State (continued): No., New Register, Hard POR, System Reset (Soft POR), Comme…
97. …ted to a logical processor occurs, it must be recorded, and a logical processor must be trapped to deal with the error. Where to record the error and which logical processor to trap are addressed in the following subsections. By definition, errors not associated with a logical processor are asynchronous errors that occur in shared resources (if they could be identified with an instruction, they could be identified with a logical processor).
Error Steering
When an error occurs in a shared resource, the error must be reported to one of the logical processors that share that resource. Error steering registers are used to determine which logical processor handles the error. They are software-configurable registers in which software specifies which logical processor should handle an error: the error steering register defines the logical processor to which the error is reported, and that logical processor is trapped to handle the error.
The CMT Error Steering register, described in TABLE 10-3, tells the hardware which logical processor's AFAR/AFSR is used to report an error not specific to any one logical processor.
Name: ASI_CMP_ERROR_STEERING
ASI: 0x41, VA[63:0]: 0x40
Access: Privileged read/write; JTAG accessible
TABLE 10-3 CMT Error Steering Register (Shared)
Bits 63:6, Reserved.
Bits 5:1, Mandatory Value: Should be 0 s…
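A minimal sketch for composing and decoding the steering register. From TABLE 10-3 (bits 63:6 reserved, bits 5:1 mandatory zero) and the ASI_CMP_ERROR_STEERING row of TABLE 5-2 (bits 63:2 not implemented), it assumes that bit 0 selects the target LP. Following the CMT reserved-field convention, reserved bits are preserved from a previous read. Helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit 0 = target LP; bits 5:1 must be 0; bits 63:6 reserved. */
static unsigned steering_target_lp(uint64_t reg) {
    return (unsigned)(reg & 1u);
}

/* Build a value to write: keep reserved bits 63:6 as read, zero bits 5:1, set bit 0. */
static uint64_t steering_encode(uint64_t previously_read, unsigned lp) {
    return (previously_read & ~UINT64_C(0x3F)) | (uint64_t)(lp & 1u);
}
```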
98. …traSPARC IV processor inherits all of the RAS (Reliability, Availability, and Serviceability) features implemented in the UltraSPARC III Cu processor, with the following differences and enhancements.
The UltraSPARC IV Processor Adds Chip Kill DIMM Support
In addition to NG DIMMs, the UltraSPARC IV processor also supports Chip Kill SDRAM DIMMs (CK DIMMs). The CK DIMM employs x4 SDRAM parts. Each bit of an SDRAM is protected by different Error Correction Code bits; therefore, the system can correct errors resulting from one failed SDRAM.
The UltraSPARC IV Processor Adds L2-cache Address Bus Error Detection Capability
In the UltraSPARC III Cu processor, two sets of address and control signals are used to read and write the L2-cache data: one for the lower 16 bytes of data and its corresponding ECC, and the other for the upper 16 bytes of data and its corresponding ECC. The UltraSPARC IV processor keeps the same two sets of address and control signals. However, the set of signals that accesses the lower 16 bytes of data now accesses the ECC of the upper 16 bytes, and the set of signals that accesses the upper 16 bytes of data now accesses the ECC of the lower 16 bytes. By splitting the ECC this way, the address buses used to access the L2 cache are implicitly protected.
CHAPTER 3 … Chip…
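The benefit of crossing the ECC over to the other address bus can be demonstrated with a toy model: if the two buses ever carry different addresses, each half's data is checked against ECC fetched from a different line, and in this example the check fails. Everything here is hypothetical, including the XOR-based stand-in for the real ECC and the 64-bit stand-ins for the 16-byte halves; a real ECC would also locate and correct bit errors, which this toy does not attempt.

```c
#include <stdint.h>

#define L2_TOY_LINES 4u   /* callers must keep line numbers below this */

/* Toy check code: a stand-in for the real L2-cache ECC, not the actual code. */
static uint8_t toy_ecc(uint64_t x) {
    uint8_t e = 0xA5;
    for (int i = 0; i < 8; i++)
        e ^= (uint8_t)(x >> (8 * i));
    return e;
}

typedef struct {
    uint64_t lo[L2_TOY_LINES], hi[L2_TOY_LINES];          /* lower/upper data halves */
    uint8_t ecc_lo[L2_TOY_LINES], ecc_hi[L2_TOY_LINES];   /* their check codes */
} l2_toy;

/* Fault-free write: both halves and both check codes go to the same line. */
static void l2_toy_write(l2_toy *c, unsigned line, uint64_t lo, uint64_t hi) {
    c->lo[line] = lo;
    c->hi[line] = hi;
    c->ecc_lo[line] = toy_ecc(lo);
    c->ecc_hi[line] = toy_ecc(hi);
}

/* Bus A fetches lower data and upper ECC; bus B fetches upper data and lower ECC. */
static int l2_toy_read_ok(const l2_toy *c, unsigned addr_a, unsigned addr_b) {
    int lo_ok = toy_ecc(c->lo[addr_a]) == c->ecc_lo[addr_b];
    int hi_ok = toy_ecc(c->hi[addr_b]) == c->ecc_hi[addr_a];
    return lo_ok && hi_ok;
}
```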
Bits | Field | Description
63:10 | Reserved | Reserved
9:0 | Int_ID | The Int_ID is used as the source or target logical processor identity in a Sun Fireplane Interconnect INT transaction. In such a transaction, the source logical processor identity is placed in Sun Fireplane Interconnect address bus bits 38:29, and the target logical processor identity is placed in address bus bits 23:14.

Note: If the Int_IDs of the two logical processors in an UltraSPARC IV processor are not unique in a system, then the behavior of the logical processor when an interrupt specifying that ID is sent or received is undefined.

CESR (Cluster Error Status Register) ID Register

The CESR ID register, summarized in TABLE 3-3, provides support for a tightly clustered system. This register contains an 8-bit field, CESR_ID, which uniquely identifies a logical processor in a tightly clustered system. Certain transactions append this value to the transaction, which allows software at a remote node or within the cluster switch to associate the initiating logical processor with the transaction. The CESR ID register should be used only with the appropriate cluster interconnect and the corresponding cluster-specific software support. The specific value to encode in the CESR ID register is platform-specific. When not used in a cluster architecture, this register should always be programmed to zero.
u Processor User's Manual. Any material that is not referred to in this supplement remains unchanged for the UltraSPARC IV processor.

Target Audience

This user's manual is mainly targeted at programmers who write software for the UltraSPARC IV processor. This supplement contains a repository of information that is useful to operating system programmers, application software programmers, logic designers, and third-party vendors who are trying to understand the architecture and operation of the UltraSPARC IV processor. It is both a guide and a reference manual for low-level programming of the processor.

Prerequisites

This user's manual is a companion to the UltraSPARC III Cu Processor User's Manual. Readers should be familiar with the contents of the UltraSPARC III Cu Processor User's Manual.

Textual Usage

Fonts

Fonts are used as follows:
Italic font is used for emphasis, assembly language terms, book titles, and the first instance of a word that is defined. It is also used for exception and trap names. Examples include: The privileged_action exception; fp_exception_ieee_754; unfinished_FPop.
Courier font is used for register names, named bits, software examples, instruction fields, and instruction names. Examples include: The rs1 field contains...; PSTATE.RED; RED_state; NWINDOWS; PREFETCH; assign rand out lfsr reg 1 &
ASI 0x76 (writing) or 0x7E (reading), VA<63:23> = 0
Name: ASI_ECACHE_W (0x76), ASI_ECACHE_R (0x7E)

TABLE 4-4 4 MB Direct Mapped
Bits | Field | Description
63:22 | Reserved | Reserved
21:5 | EC_addr | A 17-bit index (bits 21:5) used to read and write a 32-byte field between the L2 cache and the L2 cache Data Staging registers
4:0 | Mandatory value | Should be 0s

TABLE 4-5 4 MB 2-Way Direct Mapped
Bits | Field | Description
63:22 | Reserved | Reserved
21 | EC_way | Way select
20:5 | EC_addr | A 16-bit index (bits 20:5) which, together with the way select, is used to read and write a 32-byte field between the L2 cache and the L2 cache Data Staging registers
4:0 | Mandatory value | Should be 0s

TABLE 4-6 8 MB Direct Mapped
Bits | Field | Description
63:23 | Reserved | Reserved
22:5 | EC_addr | An 18-bit index (bits 22:5) used to read and write a 32-byte field between the L2 cache and the L2 cache Data Staging registers
4:0 | Mandatory value | Should be 0s

TABLE 4-7 8 MB 2-Way Direct Mapped
Bits | Field | Description
63:23 | Reserved | Reserved
22 | EC_way | Way select
21:5 | EC_addr | A 17-bit index (bits 21:5) which, together with the way select, is used to read and write a 32-byte field between the L2 cache and the L2 cache Data Staging registers
4:0 | Mandatory value | Should be 0s

The size of EC_addr is determined by the EC_siz
xc and FSR.cexc overflow and inexact flags (FSR.ofa = 1, FSR.nxa = 1, FSR.ofc = 1, and FSR.nxc = 1). No trap is generated.

If either or both of the appropriate trap enable masks are set (FSR.OFM = 1 or FSR.NXM = 1), then only an IEEE overflow trap is generated (FSR.ftt = 1). The particular FSR.cexc bit that is set follows the SPARC V9 architecture:
• If FSR.OFM = 0 and FSR.NXM = 1, then FSR.nxc = 1.
• If FSR.OFM = 1 (independent of FSR.NXM), then FSR.ofc = 1 and FSR.nxc = 0.

Gross Underflow (zero result)

Result = 0 with correct sign.

If the appropriate trap enable masks are not set (FSR.UFM = 0 and FSR.NXM = 0), then set the FSR.aexc and FSR.cexc underflow and inexact flags (FSR.ufa = 1, FSR.nxa = 1, FSR.ufc = 1, and FSR.nxc = 1). A trap is not generated.

If either or both of the appropriate trap enable masks are set (FSR.UFM = 1 or FSR.NXM = 1), then only an IEEE underflow trap is generated (FSR.ftt = 1). The particular FSR.cexc bit that is set diverges from previous UltraSPARC implementations to follow the SPARC V9 architecture:
• If FSR.UFM = 0 and FSR.NXM = 1, then FSR.nxc = 1.
• If FSR.UFM = 1 (independent of FSR.NXM), then FSR.ufc = 1 and FSR.nxc = 0.

Subnormal Handling Override (result is a QNaN or SNaN)
• Subnormal with SNaN: QNaN result; invalid exception generated. Standard mode: no unfinished_FPop. Nonstandard mode: no FSR.NX.
• Subnormal with QNaN: QNaN result; no exce
FDIV rs1, rs2 -> rd
rs1 | rs2 | Masked Exception (TEM = 0): Destination Register Written (rd) | Flag(s) | Enabled Exception (TEM = 1): Destination Register Written (rd) | Flag(s) | Trap
±0 | ±0 | QNaN (sign = 0, exponent = 111...111, fraction = 111...111) | set nvc, set nva | no | set nvc | IEEE trap
±0 | Normal | ±0 | (none) | ±0 | (none) | no
±0 | ±Infinity | ±0 | (none) | ±0 | (none) | no
Normal | ±0 | ±Infinity | set dzc, set dza | no | set dzc | IEEE trap
Normal | Normal | Normal (may underflow/overflow; see 9.5) | | Normal (may underflow/overflow; see 9.5) | |
±Infinity | ±Infinity | QNaN | set nvc, set nva | no | set nvc | IEEE trap
±Infinity | Normal | ±Infinity | (none) | ±Infinity | (none) | no

9.3.5 Square Root

TABLE 9-7 SQUARE ROOT
Instruction: FSQRT rs2 -> rd (floating-point square root of rs2)

The RESULT from the operation includes one or more of the following:
• Number in f register (see Trap Event note, page 66)
• Exception bit set (see TABLE 9-12)
• Trap occurs (see abbreviations in TABLE 9-12)
• Underflow/Overflow may occur

Masked Exception (TEM = 0) | Destination Register | Enabled Ex
xception if either operand is a quiet or signalling NaN. The FCMP instruction causes an exception for signalling NaNs only.

9.7.4 NaN Results from Operands without NaNs

The following operations generate NaNs (see Section 9.3, "IEEE Operations," on page 55 for details):
• FSQRT (−Normal or −Infinity)
• FDIV (±0 / ±0)

9.8 Subnormal Operations

The handling of Subnormals differs between standard and nonstandard floating-point modes. The handling of operands and of results is described separately in the following sections.

9.8.1 Response to Subnormal Operands

The floating-point unit responds to Subnormal operands and results either in hardware or by generating an fp_exception_other exception with FSR.ftt = 2 (unfinished_FPop). The response depends on the operating mode of the floating-point unit, which is controlled by the FSR.NS bit.

Standard Mode

In Standard mode, the floating-point unit generally traps when a Subnormal operand is detected or a Subnormal result is generated. In this situation, system software must perform or complete the operation. The floating-point unit supports the following in Standard mode:
• Some cases of Subnormal operands are handled in hardware.
• Gross underflow results are supported in hardware for the FdTOs, FMUL(s,d), and FDIV(s,d) instructions.

Nonstandard Mode

In Nonstandard mode, the floating-point uni
y Errors, respectively. TABLE 10-11, TABLE 10-12, and TABLE 10-13 explain the Snoop result errors, Mtag errors, and Internal errors on the PENDQ and QCTL, respectively. TABLE 10-14 and TABLE 10-15 describe the Internal Errors of the TOB and the ECU, respectively.

In addition, the UltraSPARC IV processor adds three bits to the L2 cache error enable register (ASI_ESTATE_ERROR_EN_REG). Each bit forces one type of parity error so that software can test the error reporting mechanism.

TABLE 10-4 Etag ECC errors
Error Bit | Field | Type | Description | Comment
0 | TSUE | Uncorrectable | Etag ECC error due to DCache or ICache access | Specific to an LP
1 | TSNPU | Uncorrectable | Etag ECC error due to foreign snoop request | Specific to an LP
2 | THUE | Uncorrectable | Etag error due to other Etag accesses (PCache, WCache write-back, etc.) | Specific to an LP

TABLE 10-5 Internal errors of the MCU
Error Bit | Field | Type | Description | Comment
3 | CANCL_NH | IERR | Request to cancel a transaction that has never entered the MCU queues | Not specific to an LP
4 | REFSH | IERR | Refresh starvation on one of the SDRAM banks | Not specific to an LP
5 | MO_OV | PERR | Memory controller backing queue overflows after PauseOut is asserted | Not specific to an LP

TABLE 10-6 Internal Error of the Write Cache
Error Bit | Type | Description | Comment
3.7 CMT Register Changes Due to Reset

FIGURE 3-1 shows the changes in CMT registers during reset.

[FIGURE 3-1 CMT Register Changes During Reset. Stages shown: set at time of manufacture; state of running processor before reset; set at beginning of reset; during reset; set at end of reset. Registers shown: LP_AVAILABLE, LP_ENABLE, LP_ENABLE_STATUS, LP_RUNNING, LP_RUNNING_STATUS, XIR_STEERING, LP_ERROR_STEERING. Recoverable notes: on a non-POR system reset, the LP_ENABLE and LP_RUNNING values are unchanged by the processor but may be changed by an external agent; on a non-POR system reset, the least significant 1 bit remains 1 and the others are set to 0; some values are set to the encoded index of the least (or most) significant 1 bit. If modification of a value by an external agent causes it to be incompatible with other logical processor states, logical processor behavior after reset is undefined.]

CHAPTER 4
Caches and Cache Coherency

This chapter supplements Chapter 10 of the UltraSPARC III Cu Processor User's Manual and contains additional information for the UltraSPARC IV processor. All registers described in this chapter are private unless otherwise specified.

Chapter Topics
Write Cache (W-cache) on page 25
External L2 Cach