Home

MIPS R10000 Microprocessor User's Manual

image

Contents

1. R10000 PECL differential 3 SysCIk ciput system clock SysCI ME M L SysCikRet L __SysClkRet Termination resistors poe e l pL SCCIk 5 0 clock generators r SCCIk 5 0 SRAM Replicated HSTL differential PCIk output clocks pa es Ee pet c 4 Figure 7 1 R10000 System and Secondary Cache Clock Interface Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 8 Initialization This section describes initialization of the R10000 processor including initialization of logical registers Initialization of the processor occurs during a reset sequence The processor supports three separate reset sequences e Power on reset e Cold reset e Soft reset These sequences are described in this chapter Also described are the mode bits MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997159 160 Chapter 8 8 1 Initialization of Logical Registers After a power on or cold reset sequence all logical registers both in the integer and the floating point register files must be written before they can be read Failure to write any of these registers before reading from them will have an unpredictable result 8 2 Power On Reset Sequence The Power on Reset sequence is used to reset the processor after the initial power on or whenever power or SysClk are interrupted The Power on R
2. sss 139 System Interface Coherency ti iae a tere a E apa o ea arei easa Dee aaaea Eei tenerent 141 External Intervention Shared Request ccccccccssesesseteeecseesesessenenesssesesneteeseeceeeseseseeenes 141 External Intervention Exclusive Request sss 141 External Invalidate Rejuest eni ann nn entente 141 External Coherency Request ActiOn ccccccccesssesesssneesescesesescssenenesssesesnesneseecseessesseananes 142 Coherency Conflicts regrei e ra a e EE EE RE tereti te eH Fe ee ire ESE EE 143 Internal Coherency Conflicts c ccccscscscecseesesssescscesesescssenseseseseenesesceeecesesssesnananenens 143 External Coherency Conflicts ese hengst nesesito seert pekena iE a ira nias 144 External Coherency Request Latency sse 146 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xvi Table of Contents SysGblPerf Signal 23 2 tette eie mede eiie Ee ie dri lee ie 148 Cluster B s Operil Ner eista a ee a eE a ae n tenente tenente R EAS 148 Support for Q rie iiem ie iibi mi ite pi i e e er dt dS 152 Support for External Duplicate Tags ssssssssseseseee eene eren enne 152 Support for a Directory Based Coherency Protocol sese 153 Support for Uncached Attribute sssssssssseeeeeeeeeeeneen nennen 153 Support for Hardware Emulation sese een enne 154 Version 2 0 of January 29 1997 MIPS R10
3. um AINO 3ON33343M N04 NMOHS TVLIA VIA 9 eee NOI93M M31N39 OSOO00 L 1d3O0X3 OSOSZ Z 30lS WOLLOG INTEL OSOSZ Z IAS dOl EE E SAIN SNOIO3M 100d334 LNINOdWOD G Q310N SV 1d30X3 OSOGZ Z 13400S VOT HIV3N38 G3llBIHOMd ASYN M3010S v IN S3HONI OMOI OS M3AO NV S3HONI OMOIW OF NILYId Ehe J S00 T 33V S3ONVM3101 Q3ijiO3dS 3SIMN3HIO SS3INn c 0071 S3HONI NI 33v SNOISN3NIQ I OSO oso H KH SALON SC S alv s eoo ee 14M S60 oO is 310H Q3lv1d NON dau ZION a N 00 9680 6 TN 438 VIA GSHSINIS 6009 i KT Jaa TUNG LLO fe 6 JAldWVX4 d o i c Re MI 2 Be go E Noisy SIHL olo Cs ee Oo pt NIHLIM Q3MOTV 339 eyo e E ASYN Y3C10S ae YON a em S 080011 7 43M VIA O3HSINIJ 109 D ee e j34 TUNG 9109 o Pa AN NMOHS j J o eT te n ne Be re RELL XT L dIdWvX3 4 10H 310H Q31Vld NON ozz Q31V1d NON 00 6T601 6 vec o a v coo e e Ae oed 0091 Z00 630 0 9 OI93M 19VINOO X668 AVI30 ONVI Figure 13 3 599LGA PWB Footprint Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Chapter 13 v Ay NOILdO SMOON3A N ISIQ Nld G Y97669 00r g XVN M 2 6 0 VS VI3HL 3ONVISIS3H TWWYSHL v YNISLV3H ooz z WANINM
4. Table 14 15 Config Register Field Definitions Field Bits Name Hardwired Values Coherency algorithm 0002 reserved 001 reserved 0105 uncached KO 2 0 011 cacheable noncoherent 1005 cacheable coherent exclusive 1015 cacheable coherent exclusive on write 1105 gt reserved 111 uncached accelerated DN 4 3 Device number CT 5 CohPrcReqTar PE 6 PrcElmReq PM 8 7 PrcReqMax EC 12 9 SysClkDiv SB 13 SCBlkSize SK 14 SCCorEn BE 15 MemEnd SS 18 16 SCSize SC 21 19 SCCIkDiv 2522 Reserved 0 DC 28 26 Primary data cache size hardwired to 0112 32 Kbytes IC 31 29 Primary instruction cache size hardwired to 011 32 Kbytes 31 29 28 26 25 24 3 3 1 2 23 22 21 19 SD SC 1 3 Config Register 18 16 15 14 19 12 98 7 6 5 4 32 0 3 1 1 1 4 2 1 1 2 3 Figure 14 16 Config Register Format MIPS R10000 Microprocessor User s Manual Coprocessor 0 257 14 15 Load Linked Address LLAddr Register 17 Physical addresses for Load Link instructions are no longer written into this register LLAddr is implemented as a read write scratch register used for NT compatibility Figure 14 17 shows the format of the LLAddr register LLAddr Register R W NT 32 Figure 14 17 LLAddr Register Format MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 258 Chapter 14 14 16 WatchLo 18 and WatchHi 19 Registers WatchHi and WatchLo are 32 bit
5. ssss 324 64 bit Kernel Mode Physical Spaces xkphys sss 324 64 bit Kernel Mode Kernel Space xkseg sss 326 64 bit Kernel Mode Compatibility Spaces ckseg1 0 cksseg ckseg3 326 Address Space Access Privilege Differences Between the R4400 and R1000 326 Virtual Address Translation oreet tht inte terre rre rere andes ae 328 Vartual Pages second m Gott mi doa cm sei des 328 Virtual Page Size Encodihg Seisin ao iede e E Ee 328 Using the TEBs etenim eie metet eoo de E e etit ge ut 329 Cache Algorithni Field ane eate e eed tet Eget e rige e e Poe aed 329 Format of a TEB Entty uiae eet nti entente decet yu ee ener es tuer 329 Address Translation issnin ane sosser AE ea ean on e trae etnia 330 Address Space Identification ASID see nennen 330 Global Processes r 7 5 o p a CR pete eee e 330 Avoiding TEB Conflict tren ente nente iri dieere re Pre Peine 330 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xxviii Table of Contents 17 CPU Exceptions Causing and Returning from an Exception 332 Exception Vector Locations eere 332 TEB Refill Vector Selections eene Ie ete e n 333 PriOrity Of Exceptions oie en etae em e Pe Fei ern n n Pere Pen e arcades 335 Cold Reset Exception eiue nitet e te tet ellen itia rise te EE 336 Soft Reset Exception uses ns bete iae tte seed hal Or hee Dh p AI eti
6. Sysemd 11 0 gt 4 SysCmdPar 4 SysAD 63 SC A B Addr 18 0 P SysADChk 7 0 4 h SysVal No gt a 4 SysState 2 0 SC A B DWay Addr SysStatePar o lt SysStateVal SCData 127 0 Data OO SCDataChk 9 0 pe w SysResp 4 0 SC A B DWr Wr o SysRespPar SC A B DCS CS SysRespVal SC A B DOE OE 4 SysReq SCTWI Wr LL SysGnt SCTCS CS 4 SysRel R10000 SCTOE OE a SCTag 25 0 Data Sos I SysRdRgy SCTagChk 6 0 gt a I SysWrRdy SCTWay Addr SCTagLSBAddr 4 Syscmd 11 0 4 1 SysCmdPar 4 n SysAD 63 0 SC A B Addr 18 0 a re SysADChk 7 0 n SysVal lt i SysState 2 0 SC A B DWay Addr 4 t SysStatePar n 4 1 SysStateVal SCData 127 0 Data AU SCDataChk 9 0 E S SysResp 4 0 SC A B DWr wr SysRespPar SC A B DCS CS SysRespVal SC A B DOE OE MIPS R10000 Microprocessor User s Manual System Interface Connections for Multiprocessor using Dedicated External Agents System Interface Operations Multiprocessor System Using the Cluster Bus Mem I O Figure 6 4 System Interface Connectio
7. SysCyc Version 2 0 of January 29 1997 System cycle The external agent may use this signal to define a virtual System interface clock in a hardware emulation environment MIPS R10000 Microprocessor User s Manual Input Interface Signal Descriptions 43 3 4 Test Interface Signals Table 3 4 presents the R10000 processor test interface signals Errata PLLDis and SelDVCO signal descriptions are revised in Table 3 4 Table 3 4 Test Interface SignalsPLLDis Signal Name Description Type JTAG Signals JTAG serial data input ae Serial data input Input JT AG serial data output pe Serial data output JT AG clock TICK Clock input JTAG mode select TES Mode select input Miscellaneous Test Signals TCA Testability control A for manufacturing test only Input This signal must be tied to Vss through a 100 ohm resistor P TCB Testability control B for manufacturing test only Mout This signal must be tied to Vss through a 100 ohm resistor P PLL disable for manufacturing test only FLLDis This signal must be tied to Vss through a 100 ohm resistor DE PLLRC PLL Control Node for manufacturing test only There must be no connection made to this signal PLLSpare 1 4 These four pins must be tied to Vss Spare 1 3 These two pins must be tied to Vss through a 100 ohm resistor Tristate Control TriState The system asserts this signal to tristate all outputs and input Input output p
8. Reserved Uncached accelerated MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 54 Chapter 4 Descriptions of the Cache Algorithms Uncached This section describes the cache algorithms listed in Table 4 1 Loads and stores under the Uncached cache algorithm bypass the primary and secondary caches They are issued directly to the System interface using processor double single partial word read or write requests Cacheable Noncoherent Under the Cacheable noncoherent cache algorithm load and store secondary cache misses result in processor noncoherent block read requests External agents containing caches need not perform a coherency check for such processor requests Cacheable Coherent Exclusive Under the Cacheable coherent exclusive cache algorithm load and store secondary cache misses result in processor coherent block read exclusive requests Such processor requests indicate to external agents containing caches that a coherency check must be performed and that the cache block must be returned in an Exclusive state Cacheable Coherent Exclusive on Write Version 2 0 of January 29 1997 The Cacheable coherent exclusive on write cache algorithm is similar to the Cacheable coherent exclusive cache algorithm except that load secondary cache misses result in processor coherent block read shared requests Such processor requests indicate to external agents containing caches that a coherency
9. Whenever the processor writes the secondary cache data array it drives the proper ECC on SCDataChk 8 0 and even parity on SCDataChk 9 MIPS R10000 Microprocessor User s Manual Error Protection and Handling 177 Data Array in Correction Mode The secondary cache operates in correction mode when the SCCorEn mode bit is asserted Whenever the processor reads the secondary cache data array in correction mode the data is sent through a data corrector If a correctable error is detected in line correction is automatically made without affecting latency The processor informs the external agent that a correctable error was detected and corrected by asserting SysCorErr for one SysClk cycle If an uncorrectable error is detected the secondary cache unit posts a Cache Error exception and initializes the D and SIdx fields in the local CacheErr register see Chapter 14 CacheErr Register 27 for more information In correction mode secondary to primary cache refill latency is increased by two PCIK cycles Multiple processors operating in a lock step fashion remain synchronized in the presence of secondary cache data array correctable errors Table 9 5 presents the ECC matrix for the secondary cache data array MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 L66L 67 fiavnuv fo g z uois42A unuvjw s 42sp1 40s92204d0421AE OOOOLM SdIIA 101111111 0111 1111 1111 1111 1111 Number of ones
10. K KSU 0 Kernel mode EXL 0 ERL 0 EXL EXL 1 ERL 0 The following rules apply e The performance counter registers may be preloaded with an MTCO instruction and counting is enabled by asserting one or more of the count enable bits in the performance control registers e The interrupt enable bit must be asserted to cause IP 7 To determine the cause of the interrupt the interrupt handler routine must query the following the performance counter register the interrupt enable bit of the associated performance control register of both counters If neither of the counters caused the interrupt IP 7 must be the result of the CPO Count register matching the CPO Compare register MIPS R10000 Microprocessor User s Manual Coprocessor 0 14 21 ECC Register 26 The R10 273 000 processor implements a 10 bit read write ECC register which is used to read and write the secondary cache data ECC or the primary cache data parity bits Tag ECC and parity are loaded to and stored from the TagLo register Unlike the R4400 the only CacheOps that use ECC register are Index Load Data and Index Store Data In the R4400 both the primary instruction and data caches are parity byte protecte d In the R10000 processor the following protection schemes are used The primary instruction cache is word protected where one word contains 36 bits and one parity bit is used for each instruction word IP in F
11. MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 170 Chapter 9 9 3 Propagation of Uncorrectable Errors Version 2 0 of January 29 1997 The processor assists the external agent in limiting the propagation of uncorrectable errors in the following manner During external block data response cycles if the data quality indication on SysCmd 5 is asserted or if an uncorrectable ECC error is encountered on the system address data bus while the ECC check indication on SysCmd 0 is asserted the processor intentionally corrupts the ECC of the corresponding secondary cache quadword after receiving an external ACK completion response During processor data cycles the processor asserts the data quality indication on SysCmd 5 if the data is known to contain uncorrectable errors The System interface ECC is never intentionally corrupted the SysCmd 5 bit is used to indicate corrupted data If an uncorrectable cache tag error is detected the processor asserts SysUncErr for one SysClk cycle An external coherency request that detects a secondary cache tag array uncorrectable error asserts the secondary cache block tag quality indication on SysState 2 during the corresponding processor coherency state response If an external coherency request requires a processor coherency data response and a primary data cache tag parity error is encountered during the primary cache interrogation or a secondary cache tag
12. The processor may only issue a processor request when the System interface is in master state If the System interface is in master state the processor may issue a processor request immediately Processor requests may occur in adjacent SysClk cycles If the System interface is not in master state the processor must first assert SysReq and then wait for the external agent to relinquish mastership of the System interface bus by asserting SysGnt and SysRel When multiple nonconflicting processor requests and or coherency data responses are ready and meet all issue requirements the processor uses the following priority e block read and upgrade requests have the highest priority followed by processor coherency data responses e processor eliminate and typical block write requests e processor double single partial word read write and uncached accelerated block write requests which have the lowest priority MIPS R10000 Microprocessor User s Manual System Interface Operations 113 Processor Block Read Request Protocol Errata A processor block read request results from a cached instruction fetch load store or prefetch that misses in the secondary cache Before issuing a processor block read request the processor changes the secondary cache state to Invalid Additionally if the secondary cache block former state was DirtyExclusive a write back is scheduled Note that if the processor block read request receives an external
13. The processor reads each subset block from the primary instruction cache If ITag STag and IState 1 Valid then the block is invalidated by writing the IState bit to 0 Invalid and the IState parity bit to 0 Read each subset block from the primary data cache If DTag STag and DState is not equal to 00 Invalid then write the DState bits 00 Invalid the StateMod bits 001 Normal the SCWay bit 0 and the DState parity bit 0 If the original block is DState 11 Dirty and StateMod 0105 Inconsistent also write this block back to the secondary cache using the DTag and the SCWay bit from the primary data tag array Set the state of the secondary cache block to 00 Invalid Since the secondary cache is designed so all tag bits must be written at once the Tag VA and ECC bits are also written The tag is written with the PA and VA 13 12 virtual index of the original CACHE instruction address The ECC is generated If the secondary cache block s original State bits were 11 Dirty the block is written back to the system interface unit If the block s State was Shared or CleanExclusive the system interface unit is notified with a Tag Invalidation request that the block has been deleted The MRU bit is set to point away from the block invalidated unless the line was already invalid MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 194 Chapter 10 10 5 Index Load Tag I Index Load
14. tete er ree ere Dre edid e peter itead 195 Index Store Tag I2 5 nig c tete dire Pie rures iE HE D Eee HI Pe iude ade 196 Index Store Tag 5 4 5 55 eb eee bb es 196 Fit Invalid te Ds ito nep HERR POR HDI OPERE ied t ls 197 Hit Inyalidate DJ eiat toe ede en cede eR ee e ER NERA Lee Cond Len 197 FH tInvalid te 5 insit er ee dde tte der re e RR 198 Cache Bartietizc eee erem entree nec eee ote eee oe tele eee ee eie ees 198 Hit Writeback Invalidate D epe e a eer er tbe rte ertet ts 199 Hit WriteBack Invalidate S sse seen entente teen trennen aE enne 200 Index Load Data I nhe ix tenerte repete ex br e bre Fen eor ete coe 201 Index Load Data QD iiio hates EAR AA A AS ARE A ASEA 201 inder Load Data S henor A O E ele Rt E O 201 Index Store Data Lyrics etii teet t eet Ras Oba i ttes 202 Index Store Data D reete rii ettet etae eni etn een O RNEER EERE ATEA EATA 202 index Store Dara S ennan A E O BI AE eed ea eet 202 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents xxi 11 JT AG Interface Operation Test Access Port EAD heo etie dede po ERE ebd eb epis 204 TAP Controller Input meret tire trae prie i e ete e Ee ead 204 Instruction Registe inani tee na EE eE AEE RA EA e AEE RA EA EEA tete tents A SS 205 Bypass Register i erie eee tp ee n e ener He een E HH RH E ria eE S EE iit ea 205 Boundary Scan Registers gisper e a E A 206 MIPS R10000 Mic
15. 31 26 25 24 65 0 CO 0 010000 0000000000000000000 000010 6 1 19 6 COPO TLBWI Format TLBWI Description The G bit of the TLB is written with the logical AND of the G bits in the EntryLoO and EntryLol registers The TLB entry pointed at by the contents of the TLB Index register is loaded with the contents of the EntryHi and EntryLo registers The operation is invalid and the results are unspecified if the contents of the TLB Index register are greater than the number of TLB entries in the processor In the R4400 this instruction had to be executed in unmapped spaces and in the R10000 processor it can be executed in unmapped spaces without any hazard There is no hazard to executing a TLB write in mapped space unless the write affects those instructions that have been fetched and buffered by the processor If necessary a flush to the instruction fetch pipeline such as execution of a jump register instruction after a TLB write can avoid this hazard In the R4400 processor a TLB write instruction is used to write the whole page frame number from the EntryLo registers to the TLB entry Depending on the page size specified in the corresponding PageMask register the lower bits of PEN may not be used for address translation In the R10000 processor the lower bits not used for address translation are forced to zeroes during a TLB write This does not affect TLB address translation however a TLB read
16. During a reset of the TAP controller the value 1111 is loaded into the parallel output of the instruction register thus selecting the bypass register as the default During the Shift IR state of the TAP controller data is shifted serially into the instruction register from JTDI and the LSB of the instruction register is shifted out onto JTDO During the Update IR state the current state of the instruction register is shifted to its parallel output for decoding 11 3 Bypass Register The bypass register is 1 bit wide When the bypass register is selected and the TAP controller is in the Shift DR state data on JTDI is shifted into the bypass register and the output of the bypass register is shifted out onto JTDO MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 206 Chapter 11 11 4 Boundary Scan Register Version 2 0 of January 29 1997 The bypass register is 1 bit wide The boundary scan data register is selected by loading 0000 into the instruction register The Shift DR Update DR and Capture DR states of the TAP controller are used to operate the boundary scan register according to the IEEE 1149 1 standard specifications The boundary scan register provides serial access to each of the processor interface pins as shown in Figure 11 1 Hence the boundary scan register can be used to load and observe specific logic values on the processor pins Integrated Circuit IC package pin Boun
17. For example beq 12 r3 L1 nop A comparison of r2 and 13 is made to determine whether the branch is taken or not If the branch prediction is correct the branch instruction is graduated Otherwise the processor must back out of the instruction stream decoded after this branch and inform the IFetch to fetch the correct instructions The evaluation is made in the ALU for integer branches and in the Graduation Unit for floating point branches A single integer branch can be evaluated during any cycle but there may be up to 4 condition codes waiting to be evaluated for floating point branches Once the condition code is evaluated all dependant FP branches can be evaluated during the same cycle Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 31 Other Performance Issues Table 1 2 shows execution times within the functional units only Performance may also be affected by instruction fetch times and especially by the execution of conditional branches In an effort to keep the execution units busy the processor predicts branches and speculatively executes instructions along the predicted path When the branch is predicted correctly this significantly improves performance for typical programs branch prediction is 85 to 90 correct When a branch is mispredicted the processor must discard instructions which were speculatively fetched and executed Usually this effort uses
18. The two software interrupts are accessible as bits 9 8 of the Cause register as shown in Figure 6 5 An MTCO instruction is used to write these bits The timer interrupt is accessible as bit 15 of the Cause register IP 7 as shown in Figure 6 5 This bit is set when one of the following occurs the Count register is equal to the Compare register either one of the two performance counters overflows A nonmaskable interrupt is accessible to an external agent as the SysNMI signal To post a nonmaskable interrupt an external agent asserts SysNMI for at least one SysCIk cycle The processor recognizes the nonmaskable interrupt on the first SysClk cycle that SysNMI is asserted After the nonmaskable interrupt is serviced an external agent may post another nonmaskable interrupt by first negating SysNMI for at least one SysClk cycle and reasserting SysNMI for at least one SysCIk cycle MIPS R10000 Microprocessor User s Manual System Interface Operations 6 15 Protocol Abbreviations 107 The following abbreviations are used in the System interface protocols SysCmd 11 0 Abbreviations Cmd BIkRd RdShd RdExc DSPRd BlkWr DSPWr Ugd Elm IvnShd IvnExc Alc Ivd Int ExtCoh ReqDat RspDat ReqLst RspLst Empty Unspecified command Block read request command Coherent block read shared request command Coherent block read exclusive request command Double single partial word read command Block write request command
19. e driving the data type indication on SysCmd 4 3 e driving the cache block state on SysCmd 2 1 e driving the ECC check indication on SysCmd 0 e driving the data on SysAD 63 0 e asserting SysVal The first 7 or 15 data cycles have a response data type indication and the last data cycle has a response last data type indication The external agent may negate SysVal between data cycles of an external block data response MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 128 Chapter 6 External block data response data must be supplied in subblock order beginning with the quadword aligned address specified by the corresponding processor request External block data responses for processor coherent block read shared or noncoherent block read requests may indicate a state of Shared CleanExclusive or DirtyExclusive External block data responses for processor coherent block read exclusive or upgrade requests may indicate a state of CleanExclusive or DirtyExclusive Figure 6 17 depicts two processor block read requests and the corresponding external block data responses Cycle 42 Be 450 8958 7 208 779 290 711532 A ee 5746 747 SysCik gtarn aca twee ne a oe Pos od metodos Master D Po Po Pp Po Po i fh Po Po i EA JA EA CEA Py SysReq EE nO aa c EX et UM i SysGnt m TW NE NE SysRel Lr
20. e load or store a doubleword that is not aligned on a doubleword boundary load fetch or store a word that is not aligned on a word boundary e load or store a halfword that is not aligned on a halfword boundary This exception is not maskable The common exception vector is used for this exception The AdEL or AdES code in the Cause register is set indicating whether the instruction caused the exception with an instruction reference load operation or store operation shown by the EPC register and BD bit in the Cause register When this exception occurs the Bad VAddr register retains the virtual address that was not properly aligned or that referenced protected address space The contents of the VPN field of the Context XContext and EntryHi registers are undefined as are the contents of the EntryLo register The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot If it is in a branch delay slot the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set as indication The process executing at the time is handed a UNIX SIGSEGV segmentation violation signal This error is usually fatal to the process incurring the exception MIPS R10000 Microprocessor User s Manual CPU Exceptions 341 TLB Exceptions Three types of TLB exceptions can occur e TLB Refill occurs when there is no TLB entry
21. 12 Status Processor Status Register 13 Cause Cause of the last exception taken 14 EPC Exception Program Counter 15 PRId Processor Revision Identifier 16 Config Configuration Register secondary cache size etc 17 LLAddr Load Linked memory address 18 WatchLo Memory reference trap address low bits Adr 39 32 19 WatchHi Memory reference trap address high bits Adr 31 3 20 XContext Pointer to kernel virtual PTE table in 64 bit addressing mode 21 FrameMask Mask the physical addresses of entries which are written into the TLB 22 BrDiag Branch Diagnostic register 23 Undefined Undefined 24 Undefined Undefined 25 PC Performance Counters 26 ECC Secondary cache ECC and primary cache parity 27 CacheErr Cache Error and Status register 28 TagLo Cache Tag register low bits 29 TagHi Cache Tag register high bits 30 ErrorEPC Error Exception Program Counter Coprocessor 0 instructions are enabled if the processor is in Kernel mode or if bit 28 CUD is set in the Status register Otherwise executing one of these instructions generates a Coprocessor 0 Unusable exception Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 237 14 1 Index Register 0 The Index register is a 32 bit read write register containing six bits to index an entry in the TLB The high order bit of the register shows the success or failure of a TLB Probe TLBD instruction The Index register also specifies the TLB entry affec
22. 31 26 25 21 20 16 15 11 10 0 COPO MT rt rd 0 010000 00100 000 0000 0000 6 5 5 5 11 Format MTCO rt rd Description The contents of general register rt are loaded into coprocessor register rd of CPO Operation 32 64 T data GPR rt T 1 CPR O rd data Exceptions Coprocessor unusable exception Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 297 14 34 TLBP Instruction TLBP Probe TLB For Matching Entry TLBP 31 26 25 24 65 0 COPO 0 TLBP 010000 0000000000000000000 001000 6 19 6 Format TLBP Description The Index register is loaded with the address of the TLB entry whose contents match the contents of the EntryHi register If no TLB entry matches the high order bit of the Index register is set to 0x80000000 as it is in the R4400 processor The architecture does not specify the operation of memory references associated with the instruction immediately after a TLBP instruction nor is the operation specified if more than one TLB entry matches Operation 32 Indexe 1 02 undefined for i in 0 TLBEntries 1 if TLB i os 77 EntryHis 42 and TLB i zg or TLB i z4 64 EntryHiz 9 then Index 0 6 i5 o endif endfor 64 Indexe 1 0 2 undefined for i in 0 TLBEntries 1 if TLB i 471 141 and not 0 TLB ilts 205 EntryHi4s 13 and not 075 TLB i o16 205 and TLB i 4o or TLB i 135 12g E
23. 4 SCData 61 SCData 60 SCData 59 T SCData 58 8 SCData 57 s SCData 56 10 SCData 55 SCData 54 SCData 53 13 SCData 52 14 SCData 51 15 SCData 50 16 SCData 49 SCData 48 SCData 47 19 SCData 46 20 SCData 45 21 SCData 44 22 SCData 43 SCData 42 SCData 41 25 SCData 40 26 SCData 39 27 SCData 38 28 SCData 37 SCData 36 SCData 35 31 SCData 34 32 SCData 33 33 SCData 32 34 SysADIO0 SysAD 1 SysAD 2 37 SysAD 3 38 SysAD 4 39 SysAD 5 40 SysAD 6 SysAD 7 SysAD 8 43 SysAD 9 44 SysAD 10 45 SysAD 11 46 SysAD I2 SysAD 13 SysAD 14 49 SysAD 15 50 SCData 0 51 SCData 1 52 SCData 2 SCData 3 SCData 4 55 SCData 5 56 SCData 6 57 SCData 7 58 SCData 8 SCData 9 SCData 10 61 SCData 11 62 SCData 12 63 SCData 13 64 SCData 14 SCData 15 SCData 16 67 SCData 17 68 SCData 18 69 SCData 19 70 SCData 20 SCData 21 SCData 22 73 SCData 23 74 SCData 24 75 SCData 25 76 SCData 26 SCData 27 SCData 28 79 SCData 29 80 SCData 30 81 SCData 31 82 SCDataChk 0 SCAAddr 18 SCAAddr 17 85 SCAAddr 16 86 SCAAddr 15 87 SCAAddr 14 88 SCAAddr 13 SCAAddr 12 SCAAddr 11 9 SCAAddr 10 92 SCAAddr 9 93 SCDataChk 2 94 SCDataChk 4 SCData 64 SCData 65 97 SCData 66 98 SCData 67 99 SCData 68 100 SCData 69 SCData 70 SCData 71 103 SCDataChk 9 104 SysCyc 105 SysAD 32 106 SysAD 33 SysAD 34 SysAD 35 109 SysAD 36 110 SysAD 37 111 SysAD 38 112 SysAD 39 SysAD 40 SysAD 41
24. 80000000 00000000 TO BERPEFENF EREIEERER 40001000 00000000 TO 7FFFFFFF FFFFFFFF 40000000 00000000 TO 40000FFF FFFFFFFF 00001000 00000000 TO 3FFFFFFF FFFFFFFF 00000000 80000000 TO 00000FFF FFFFFFFF AddrErr AddrErr AddrErr 00000000 00000000 TO 00000000 7FFFFFFF OK AddrErr AddrErr AddrErr AddrErr AddrErr if UX 0 OK OK OK OK AddrErr AddrErr if SX 0 AddrErr if UX 0 AddrErr AddrErr if SX 0 AddrErr OK OK t For data references the upper 32 bits of the virtual addresses are cleared before checking access privilege and TLB translation MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 328 Chapter 16 16 3 Virtual Address Translation Errata Virtual Pages Programs can operate using either physical or virtual memory addresses e physical addresses correspond to hardware locations in main memory e virtual addresses are logical values only and do not correspond to fixed hardware locations Virtual addresses must first be translated finding the physical address at which the virtual address points before main memory can be accessed This translation is essential for multitasking computer systems because it allows the operating system to load programs anywhere in main memory independent of the logical addresses used by the programs This translation also implements a memory protection scheme
25. Based on these processor coherency state responses the cluster coordinator allows R10000 to become master of the System interface so that it may provide a processor coherency data response which will be observed as an external block data response by R100009 Finally the cluster coordinator issues an external ACK completion response to forward the external block data response and to free the request number Figure 6 28 presents an example of a processor upgrade request with four R10000 processors residing on the cluster bus The CohPrcReqTar mode bit is asserted for a snoopy based coherency protocol R100009 issues a processor upgrade request observed as an external invalidate request by R10000 R100005 and R100003 R10000 and R100003 provide Shared processor coherency state responses R10000 provides an Invalid processor coherency state response Based on these processor coherency state responses the cluster coordinator issues an external ACK completion response for the processor upgrade request to indicate that the request was successful and to free the request number MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 150 Chapter 6 e A o N Cycle 1 2 SysClk Master Po Po SysReq0 SysGnt0 SysReq1 SysGnt1 SysReq2 SysGnt2 SysReq3 SysGnt3 SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysResp 4 0 SysRespPar SysRespVal SysStatePa
26. Cache Operation see also CACHE instructions 285 cache test mode entry 361 exit 362 cacheable coherent exclusive on write cache algorithm 53 cacheable coherent exclusive cache algorithm 5 3 54 cacheable noncoherent cache algorithm 53 54 cached request buffer 89 CacheErr register 171 172 174 175 274 CacheOp see also CACHE instructions 187 287 capacitors decoupling 218 cause bits FPU 310 Cause register 105 106 244 252 254 Cause field FP 310 CE bit 190 249 250 252 CH bit 190 250 285 chip revisions R10000 255 ckseg0 space 326 ckseg1 space 326 ckseg3 space 326 cksseg space 326 CLGA ceramic land grid array 220 electrical characteristics 221 layout 220 mechanical characteristics 220 package 220 pinout 224 thermal characteristics 222 clock domain in secondary cache 157 internal processor clock domain 155 secondary cache clock domain 155 System interface clock domain 155 signal PCIk 156 SCCIKk 157 SysClk 155 SysCIkRET 156 signals overview of 41 Version 2 0 of January 29 1997 Index clock divisor system interface 80 360 cluster bus 36 82 operation 148 cluster coordinator 81 82 cluster request buffer 89 coherency conflicts 143 coherency protocol directory based 153 coherency request external 138 140 coherency schemes 36 coherency System interface external intervention exclusive request 141 external intervention shared request 141 external invalidate request 14
27. Coherency Conflicts 143 Coherency conflicts arise when a processor request and an external request target the same secondary cache block Coherency conflicts may be categorized as either internal or external and are described in this section Internal Coherency Conflicts Processor Request Pending Issue A processor request is considered to be pending issue when it is buffered in the processor and has not yet been issued to the System interface bus Internal coherency conflicts occur when the processor has a processor request pending issue and a conflicting external coherency request is received Internal coherency conflicts are unavoidable and cannot be anticipated by the external agent since it cannot anticipate when the processor will have processor requests pending issue Table 6 28 describes the manner in which the processor resolves internal coherency conflicts Table 6 28 Internal Coherency Conflict Resolution Conflicting External Coherency Request Resolution Coherent block read Intervention shared Intervention exclusive Invalidate The processor allows the conflicting external coherency request to proceed and provides an Invalid processor coherency state response The processor stalls the processor coherent block read request until the conflicting external coherency request has received an external completion response Upgrade Intervention shared Intervention exclusive Invalidate T
28. E EM LL SysCmd 11 0 BERK BERK 5 naa Da GEEET Dalasi SyscmidPar XX 5 M s gt SysAD 63 0 XAK KATK gt K Da0 XY Kri4X Datis X Dato XY jid Da5 SysADChk 7 0 CEOE gt lt X a OE SysVal hee y SysRdRdy T EM BEEN SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 Figure 6 17 External Block Data Response Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 129 External Double Single Partial Word Data Response Protocol Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal MIPS R10000 Microprocessor User s Manual jose 5 M AMI ee it An external agent may issue an external double single partial word data response in response to a processor double single partial word read request An external agent issues an external double single partial word data response with a single data cycle the data cycle consists of e asserting SysCmd 11 e driving the request number associated with the corresponding processor request on SysCmd 10 8 e driving the data quality indication on SysCmd 5 e driving the response last data type indication on SysCmd 4 3 e driving the ECC check indication on SysCmd
29. Each external coherency request that targets an R10000 results in a processor coherency state response Additionally each external intervention request that targets the R10000 and hits a DirtyExclusive secondary cache block results in a processor coherency data response External coherency requests and the corresponding processor coherency state responses are handled in FIFO order External Intervention Shared Request An external intervention shared request is used by an external agent to obtain a Shared copy of a cache block If the desired block resides in the processor cache it is marked Shared If the secondary cache block s former state was DirtyExclusive the processor issues a processor coherency data response External Intervention Exclusive Request An external intervention exclusive request is used by an external agent to obtain an Exclusive copy of a cache block If the desired block resides in the processor cache it is marked Invalid If the secondary cache block s former state was DirtyExclusive the processor issues a processor coherency data response External Invalidate Request An external invalidate request is used by an external agent to invalidate a cache block If the desired block resides in the processor cache it is marked Invalid Under normal circumstances the secondary cache block former state should not be CleanExclusive or DirtyExclusive MIPS R10000 Microprocessor User s Manual Version 2 0 of
30. Intervention shared hit JE 7 Read hit Intervention shared hit Intervention shared hit N Write hit and Upgrade ACK Subset enforcement Write miss Read miss obtained DirtyExclusive CACHE Index Store Tag D Read miss obtained Shared CACHE Index Store Tag D Legend Internally initiated action Externally initiated action S Secondary cache D Data cache Figure 4 5 Primary Data Cache State Diagram Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Cache Organization and Coherency 51 4 3 Secondary Cache The R10000 processor must have an external secondary cache ranging in size from 512 Kbytes to 16 Mbytes in powers of 2 as set by the SCSize mode bit The SCBIkSize mode bit selects a block size of either 16 or 32 words The secondary cache is two way set associative that is two cache blocks are assigned to each set as shown in Figure 4 6 with an LRU replacement algorithm The secondary cache uses a write back protocol which means a cache store writes data into the cache instead of writing it directly to memory Some time later this data is independently written to memory The secondary cache is indexed with a physical address and tagged with a physical address Way 0 256 Kbytes to 8 Mbytes Way 1 256 Kbytes to 8 Mbytes Word Data 0 Word Word Data 1 Word Tag 0 0 75 Tag 1 0 7h5 Figure 4 6 Organization of Second
31. SCCIk lt 2 gt AA 31 SCCIk lt 3 gt AM 15 SCCIk lt 4 gt WY SIR 1 SCCIk lt 5 gt E uns 1 SCCIk 0 Eius 15 SCCIk 1 Briss 26 SCCIk lt 2 gt AB 33 SCCIk 3 AR 14 SCCIk lt 4 gt Wasis 4 SCCIk 5 i 4 SCData lt 0 gt Raus 31 SCData lt 1 gt IN eee 34 SCData lt 2 gt Pistia 33 SCData lt 3 gt Mss 35 SCData lt 4 gt D une 32 SCData lt 5 gt Mies 34 SCData lt 6 gt IN setius 33 SCData lt 7 gt Iv 35 SCData lt 8 gt Neia 31 SCData lt 9 gt Eiee 33 SCData lt 10 gt Missus 32 SCData lt 11 gt K uns 34 SCData lt 12 gt M 31 SCData lt 13 gt Poorer 35 SCData lt 14 gt Liars 32 SCData lt 15 gt eee 34 SCData lt 16 gt Iss 33 SCData lt 17 gt Herra 35 SCData lt 18 gt Keress 31 SCData lt 19 gt Ges 34 SCData lt 20 gt teres 32 SCData lt 21 gt ain 33 SCData lt 22 gt Jedes 31 SCData lt 23 gt xs 35 SCData lt 24 gt Fee 32 SCData lt 25 gt Tas 34 SCData lt 26 gt Gees 31 SCData lt 27 gt Beside 35 SCData lt 28 gt ees 32 SCData lt 29 gt Ds 34 SCData lt 30 gt Basses 31 SCData lt 31 gt een 32 SCData lt 32 gt AA wee 32 SCData lt 33 gt AB 35 SCData lt 34 gt AC Es 34 SCData lt 35 gt AB 32 SCData lt 36 gt AD 35 SCData lt 37 gt AC uus 33 SCData lt 38 gt AD 34 SCData lt 39 gt AC 31 SCData lt 40 gt AE 35 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Packaging 225 Table 13 3 cont Signal Location Signal Location Signal Loc
32. SysCmd T11 Encoding nsa use e eatenus 95 SysCmd 10 0 Address Cycle Encoding sss 95 SysCmd 10 0 Data Cycle Encoding sss 99 SysCcmda T EO Map pete etui m tier Pret teste tlds Pond ed 101 SYSAD 63 0 Encoding sssssssssssesseeeeeeenenenenenntnnet eene nnne nennen 102 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents xu SysAD 63 0 Address Cycle Encoding sse 102 SysAD 63 0 Data Cycle Encoding sssssssssseesseeennee 104 SysState 2 0 Encoding ettet tet edic e rni ied 104 SysResp 4 0 Encoding niet tete reet ttti e dena karte ein 105 Irit rtupts utineeth getan eto efte Bei iie set Pleite te d e nei ta TEn ipe 105 Hardware Interrupts ccccccccscsesseseesesseseseseecesesesesnsnsneseseeceeseseecenenesesesnsenssesesceeessesseananes 105 Software Interrupts 5 enne nn e e e ed eee 106 Amer Intertupltczos c ee SERRE RR Rae epe a si erp pe E ape ed 106 Nonmaskable Interrupts rae ee erea aaraa aE E eE a oaa EA aae Raae E Eais 106 Protocol Abbreviations a5 von tie eit reget e Ka ded dr eid rnit eset 107 System Interface Arbitration nete nennen nennen nennen 108 System Interface Arbitration Rules sss 109 Uniprocessor System hisa Gerris aiai a eanet Eae he SRE Siea nRT tenete nene nter nennen 110 Multiprocessor System Using Cluster Bus sss 111 System Interface Re
33. TBD TBD Symbol Parameter Minimum Maximum Units Conditions VOL Output low voltage Vee VecQ min VIH Input high voltage N A VIL Input low voltage x N A ILeak I O leakage current TBD Errata All the JTAG output drivers are push pull CMOS TTL compatible with Vcc core as the supply independent of VccO SC Sys All the TAG inputs require full CMOS swings as given by the DC specifications in the Table 12 3 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Electrical Specifications 215 12 2 AC Electrical Specification This section describes the following AC electrical characteristics of the R10000 processor e maximum operating conditions e test specification e secondary cache and system interface timing e enable output delay setup hold time e asynchronous inputs Maximum Operating Conditions The R10000 chip clamps signals that overshoot the DC limits established for input logic levels These limits are published as part of the fabrication process characterization The R10000 chip provides silicon diode clamps on all signal pins Test Specification HSTL test conditions are based on the JEDEC Standard conditions Secondary Cache and System Interface Timing Timing measurements are referenced from the mid swing point of the input signal to the crossing point of the SysCIk and SysClk input clocks All input signals maintain
34. To service this exception the contents of the Context or XContext register are used as a virtual address to fetch memory locations containing the physical page frame and access control bits for a pair of TLB entries The two entries are placed into the EntryLo0 EntryLo1 register the EntryHi and EntryLo registers are written into the TLB Itis possible that the virtual address used to obtain the physical address and access control information is on a page that is not resident in the TLB This condition is processed by allowing a TLB refill exception in the TLB refill handler This second exception goes to the common exception vector because the EXL bit of the Status register is set MIPS R10000 Microprocessor User s Manual CPU Exceptions 343 TLB Invalid Exception Cause Processing Servicing The TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid TLB valid bit cleared This exception is not maskable The common exception vector is used for this exception The TLBL or TLBS code in the ExcCode field of the Cause register is set This indicates whether the instruction as shown by the EPC register and BD bit in the Cause register caused the miss by an instruction reference load operation or store operation When this exception occurs the BadVAddr Context XContext and EntryHi registers contain the virtual address that failed address translation The EntryHi register also
35. When the SCCorEn mode bit is negated the secondary cache operates in noncorrection mode Whenever the processor reads the secondary cache data array in noncorrection mode it checks for even parity on SCDataChk 9 Ifa parity error is detected it is assumed that a correctable error has occurred and the secondary cache block is again read through a data corrector During this re read the processor checks the SCDataChk 8 0 bus for the proper ECC If a correctable error is detected correction is automatically performed in line To inform the external agent that a correctable error had been detected and corrected the processor asserts SysCorErr for one SysClk cycle If an uncorrectable error is detected the secondary cache unit posts a Cache Error exception and initializes the D and SIdx fields in the local CacheErr register Secondary cache data array correctable errors are monitored with Performance Counter 0 Tag Array The 26 bit wide secondary cache tag array is protected by a 7 bit wide ECC Table 9 6 presents the ECC matrix for the secondary cache tag array Table 9 6 ECC Matrix for Secondary Cache Tag Array Check Bit 0 12 34 56 Data Bit 2p22 P2 1 5432 H0 P 1 n 8 f6 11 0 0100 1000 1000 0001 1111 1000 1000 1000 13 0 1000 0100 0100 0010 1111 1111 0000 0100 11 1 0010 1000 0001 1000 0000 1111 0100 0010 umber of ones per row column Whenever the processor reads the secondary cac
36. block aligned address An instruction or data access that misses in the secondary cache but targets an entry in the outgoing buffer is stalled until the outgoing buffer entry is issued as a processor block write request or coherency data response to the System interface bus External coherency requests probe the four typical outgoing buffer entries with the following results e If an external intervention request hits a typical entry that entry is converted from a processor block write request to a processor coherency data response e If an external invalidate request hits a typical outgoing buffer entry that entry is deleted If an external intervention request does not hit a typical outgoing buffer entry but hits a DirtyExclusive block in the secondary cache the special outgoing buffer entry is used to buffer the processor coherency data response A typical outgoing buffer entry containing a block write is ready for issue to the System interface bus when the first quadword is received from the secondary cache The processor allows data to stream from the secondary cache to the System interface bus through the outgoing buffer Errata An outgoing buffer entry containing a coherency data response is ready for issue to the System interface bus when the quad word specified by the corresponding external intervention request is received from the secondary cache The processor then allows the data to stream from the secondary cache
37. e Prefetch instruction is not included here The latency for multiplication and division depends upon the next instruction e An instruction using register Lo can be issued one cycle earlier than one using Hi e For floating point instructions CP1 branches are evaluated in the Graduation Unit e CTC1 and CFC are not included in this table e The repeat pattern for the CVT S W L is IIxxIIxx the repeat rate given here 2 is the average e The latency for MADD instructions is 2 cycles if the result is used as the operand specified by fr of the second MADD instruction e Load Linked and Store Conditional instructions LL LLD SC and SCD do not implicitly perform SYNC operations in the R10000 Any of the following events that occur between a Load Linked and a Store Conditional will cause the Store Conditional to fail an exception execution of an ERET a load a store a SYNC a CacheOp a prefetch or an external intervention invalidation on the block containing the linked address Instruction cache misses do not cause the Store Conditional to fail e Up to four branches can be evaluated at one cycle For more information about implementations of the LL SC and SYNC instructions please see the section titled R10000 Specific CPU Instructions in this chapter Only one branch can be decoded at any particular cycle Since each conditional branch is predicted the real direction of each branch must be evaluated
38. e replacement of a DirtyExclusive secondary cache block due to a load store or prefetch secondary cache miss e a CACHE Index WriteBack Invalidate S or Hit WriteBack Invalidate S instruction acompletely gathered uncached accelerated block As shown in Figure 6 12 the processor issues a processor block write request with a single address cycle followed by 8 or 16 data cycles The address cycle consists of the following e negating SysCmd 11 e driving the block write command on SysCmd 7 5 e driving the write cause indication on SysCmd 4 3 e driving the target indication on SysAD 63 60 e driving the physical address on SysAD 39 0 e asserting SysVal Errata If the processor block write request results from the writeback of a secondary cache block the Dirty Exclusive secondary cache block former state is driven on SysAD 2 1 the secondary cache block way is driven on SysAD 57 and SysCmd 0 is asserted If the processor block write request results from a completely gathered uncached accelerated block the uncached attribute is driven on SysAD 59 58 and SysCmd 0 is negated Each data cycle consists of the following e asserting SysCmd 11 driving the data quality indication on SysCmd 5 e driving the data type indication on SysCmd 4 3 e driving the data on SysAD 63 0 e asserting SysVal The first 7 or 15 data cycles have a request data type indication and the last data cycle has a reque
39. the SysCyc input should be permanently asserted Figure 6 29 depicts the use of SysCyc to create a virtual SysClk of one third the normal SysCIk frequency Cycle HE 3i4i5i6i7i 8 9 10 11 12 13 14 15 46 17 SysCIk ge egtatae stat ae ae et pt at CINE NSIC NEIN Master Po Py Po Po Poi Py Poi Po Py Poi 1 LEA T EAS EA SysReq l MEME pee MUN i MK UPS SysGnt i SysRel 1 1 3 ij ij L i i1 soman mmi icm a 1 ic immi SyscmdPar X ae oe LLLL EL SysAD 20 ee ic Em syao I EC M XA X ELEC LLL E SysRdRdy i i i i i i SysWrRdy SysState 2 0 l SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal SysCyc Virtual SysClk Version 2 0 of January 29 1997 Figure 6 29 Hardware Emulation Protocol MIPS R10000 Microprocessor User s Manual 7 Clock Signals The R10000 processor has differential PECL clock inputs SysClk and SysClk from which all processor internal clock signals and secondary cache clock signals are derived Three major clock domains are in the processor e the System interface clock domain which operates at the system clock frequency and controls the System interface signals e the internal processor clock domain which controls the processor core logic e the secondary cache clock domain which controls si
40. the load is prevented from graduating When it is ready to graduate the entire pipeline is flushed and the processor is restored to the state it had before the load was decoded An uncached or uncached accelerated load or store instruction is executed when the instruction is ready to graduate This guarantees strong ordering for uncached accesses Since the R10000 processor behaves as if it implemented strong ordering a suitable system design allows the processor to be used to create a shared memory multiprocessor system with strong ordering MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 16 Chapter 1 An Example of Strong Ordering Given that locations X and Y have no particular relationship that is they are not in the same cache block an example of strong ordering is as follows e Processor A performs a store to location X and later executes a load from location Y e Processor B performs a store to location Y and later executes a load from location X The two processors are running asynchronously and the order of the above two sequences is unknown For the system to be strongly ordered either processor A must load the new value of Y or processor B must load the new value of X or both processors A and B must load the new values of Y and X respectively under all conditions If processors A and B both load old values of Y and X respectively under any conditions the system is not strongly or
41. the processor asserts the corresponding incoming buffer Uncorrectable Error flag if the data quality indicator SysCmd 5 is asserted or if an uncorrectable ECC error is encountered on the system address data bus and the ECC check indication on SysCmd 0 is asserted When the processor forwards block data from an incoming buffer entry after receiving an external ACK completion response the associated incoming buffer Uncorrectable Error flags are checked and if any are asserted a single Cache Error exception is posted When the processor forwards double single partial word data from an incoming buffer entry after receiving an external ACK completion response the associated incoming buffer Uncorrectable Error flag is checked and if asserted a Bus Error exception is posted MIPS R10000 Microprocessor User s Manual System Interface Operations 91 Outgoing Buffer The System interface contains a five entry outgoing buffer to provide buffering for the following e DirtyExclusive blocks that are cast out of the secondary cache because of a block replacement e various CACHE instructions anexternal intervention request Four 32 word typical entries are associated with the four possible outstanding processor cached requests allowed by the processor One 32 word special entry is reserved for external intervention requests only The data is stored in each entry of the outgoing buffer in sequential order beginning with a secondary cache
42. which is usually fatal Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual CPU Exceptions 347 Integer Overflow Exception Cause An Integer Overflow exception occurs when an ADD ADDI SUB DADD DADDI or DSUB instruction results in a 2 s complement overflow This exception is not maskable Processing The common exception vector is used for this exception and the OV code in the Cause register is set The EPC register contains the address of the instruction that caused the exception unless the instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set Servicing The process executing at the time of the exception is handed a UNIX SIGFPE FPE INTOVF TRAP floating point exception integer overflow signal This error is usually fatal to the current process MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 348 Trap Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 The Trap exception occurs when a TGE TGEU TLT TLTU TEQ TNE TGEI TGEUI TLTI TLTUI TEOI or TNEI instruction results in a TRUE condition This exception is not maskable The common exception vector is used for this exception and the Tr code in the Cause register is set The EPC register contains the address of the instruction causing the exception
43. 0 e driving the data on SysAD 63 0 e asserting SysVal Figure 6 18 depicts a processor double single partial word read request and the corresponding external double single partial word data response 2 3i4 5i86i7 8 5 9 510 11 312 13 14 1 15 16 17 1 Py ioc EA Po Po 1 Po Po Po Po 1 Po Py 1808 L Figure 6 18 External Double Single Partial Word Data Response Protocol Version 2 0 of January 29 1997 130 Chapter 6 External Completion Response Protocol Version 2 0 of January 29 1997 An external agent issues an external completion response to provide an acknowledge error or negative acknowledge to an outstanding request and to free the associated request number An external agent issues an external completion response by driving the response on SysResp 4 0 and asserting SysRespVal for one SysClk cycle SysResp 4 2 contains the request number associated with the corresponding outstanding request and SysResp 1 0 contains an acknowledge error or negative acknowledge indication as described below The external agent issues an external ACK completion response for a processor read or upgrade request to indicate that the request was successful An external ACK completion response may only be issued for a processor read request if a corresponding external data response is coincidentally or previously issued The external agent issues an external ERR completion response for a processor read or upgrad
44. 1 in the Status register and bits 63 62 of the 64 bit virtual address are 00 selection of the xkuseg virtual address space is dependent upon the UX and ERL bits e if UX 1 and ERL 0 the entire space from 0x0000 0000 0000 0000 through 0000 OFFF FFFF FFFF 16 Tbytes is selected e If UX 0 or ERL 1 the address space 0x0000 0000 0000 0000 through 0000 0000 7FFF FFFF 2 Gbytes is selected Addressing the space ranging from 0000 0000 8000 0000 through 0000 OFFF FFFF FFFF will cause an address error Moreover if ERL 1 the selected 2 Gbyte address space becomes unmapped and uncached The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address 64 bit Kernel Mode Current Supervisor Space xksseg In Kernel mode when KX 1 in the Status register and bits 63 62 of the 64 bit virtual address are 019 selection of the xksseg virtual address space is dependent upon the SX bit e if SX 1 the entire space from 0x4000 0000 0000 0000 through 4000 OFFF FFFF FFFF 16 Tbytes is selected e If 5X 0 access to any address in the space ranging from 0x4000 0000 0000 0000 through 4000 OFFF FFFF FFFF causes an address error The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address 64 bit Kernel Mode Physical Spaces xkphys Version 2 0 of January 29 1997 In Kernel mode when KX 1 in the Status register and bits 63 62 of the 64 bit vir
45. 115 121 125 164 precise exceptions 15 prediction branch 374 prediction secondary cache way 63 prefetch instruction 25 primary data cache see also cache primary data 9 primary instruction cache see also cache primary instruction Probe TLB for Matching Entry instruction 285 processor block read request protocol 113 processor block write request 94 protocol 117 processor coherency data response 94 protocol 139 processor coherency state response protocol 138 processor double single partial word read request protocol processor double single partial word write request protocol processor eliminate request protocol 123 processor request 80 86 flow control protocol 125 protocol 112 processor response 80 87 protocols 137 Processor Revision Identifier PRId register 255 processor upgrade request 131 protocol 121 processor specific instructions 26 program order 13 dynamic execution 13 instruction completion 371 instruction decoding 371 instruction execution 371 instruction fetching 371 instruction graduation 371 instruction issue 371 protection ECC 173 memory 328 parity 173 SECDED 173 sparse encoding 173 protocol arbitration System interface 108 Version 2 0 of January 29 1997 Index error handling 185 write back 45 write invalidate cache coherency 45 PTE page table entry 241 PTEBase field 241 259 Q queue address 6 instruction 17 integer 6 R R region field 245 259 R bit 258
46. 15 This section describes the R10000 processor specific implementations of the following FPU instructions e CVT L fmt e moves and conditional moves e CFCI CTCICVT L fmt The CVT L fmt instruction has a slight change in the R10000 processor implementation The R4400 processor allows conversion from a single or a double up to a 53 bit long integer If the result is greater than 53 bits after the conversion an Unimplemented Operation exception is taken A back conversion from a 53 bit long integer to single double also takes an Unimplemented Operation exception In the R10000 the conversion allows only up to 51 bits otherwise an Unimplemented Operation exception is taken The back conversion from a 51 bit long integer to single double no longer takes an Unimplemented Operation exception MIPS R10000 Microprocessor User s Manual Floating Point Unit 313 Moves and Conditional Moves The only legal formats for the move and conditional move instructions are single and double precision The move instructions do not trap if their operands are either denormalized or NaNs which is consistent with the R4400 implementation Execution of floating point move and conditional move instructions do not affect the Cause field of the floating point Status register unless they take an Unimplemented Operation exception because an illegal formats was used The upper 32 bits of the destination registers are undefined in architecture for all the float
47. 5 VecQSC Kem 1 VccQSC uu 35 VccQSC Missus 3 VccQSC Mi 33 VecQSC P2 31 VecQSC Pade 5 VecQSC RA 2 VecQSC Tess 1 VccOSys Acus 16 VecOSys Ausus 20 VccOSys AB 5 VecQSys AD 3 VccOSys AF 1 VccOSys AH 3 VccOSys AE 5 VccOSys AK 3 VccOSys AG iss 22 VecQSys AM 1 VcecOSys AN 24 VccOSys AR 16 VecQSys AR 20 VecOSys AR 26 VecQSys T seien 35 VccOSys bem 1 VecQSys Voss 35 VrefSC AA wee 35 VrefSys por 33 Vss A ais 13 Vss JA uet asus 27 Vss AA acis 29 Vss V CEN 3 Vss Aveda de 33 Vss Asi diens 35 Vss AEA 7 Vss AA wee 3 Vss AA 33 Vss AC seus 1 Vss AC uuu 32 Vss AQ uu 35 Vss AQ 4 Vss AE 2 Vss AE 31 Vss AE 34 Vss AE 5 Vss AG 3 Vss A Eus 33 Vss AJ 1 Vss AJ 32 Vss AJ uu 35 Vss AJ 4 Vss AL 2 Vss AL 34 Vss AM 13 Vss AM 23 Vss AM 29 Vss AM 32 Vss AM 4 Vss AM 7 Vss AN 1 Vss AN 15 Vss AN 21 Vss AN 27 Vss AN 35 Vss AN 9 Vss AP auus 11 Vss AP Sens 17 Vss AP ies 19 Vss AP hisses 2 Vss AP 25 Vss AP ous 31 Vss AP au 34 Vss AP usus 5 Vss AR 1 Vss AR 13 Vss AR 23 Vss AR 29 Vss AR 3 Vss AR 33 Vss AR 35 Vss AR 7 Vss Bu 11 Vss Bisse 17 Vss Biss 19 Vss Buon 2 Vss Biss 31 Vss Biz ks 34 Vss Biss 5 Vss
48. 64 Bit Bs e age 9 1010 No No 32 1 0 10 0 0 srned No Yes 32 oL e eats eats eo oa ica ee Yes No 64 1 1 10 0 0 Yes Yes 64 0 01 0 0 S i n d No Yes 32 1 01 0 0 be a ae TINEO Yes Yes 64 0 00 0 0 ae aaa Yes Yes 32 EV eo ee ie C MM RE SM io Yes Yes 64 0 0 1 j Yes Yes 32 sulle sels 0 1 Exception Level Yes Yes 64 0 1 X Yes Yes 32 7 1 f 1 X Error Level Yes Yes 6A i No means the ISA is disabled Yes means the ISA is enabled Dashes are don t care Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Memory Management 317 Addressing Modes The processor s addressing mode determines whether it generates 32 bit or 64 bit memory addresses Refer to Table 16 1 for the following addressing mode encodings e In Kernel mode the KX bit allows 64 bit addressing all instructions are always valid e In Supervisor mode the SX bit allows 64 bit addressing and the MIPS III instructions MIPS IV ISA is enabled all the time in Supervisor mode e In User mode the UX bit allows 64 bit addressing and the MIPS III instructions the XX bit allows the new MIPS IV instructions 16 2 Virtual Address Space The processor uses either 32 bit or 64 bit address spaces depending on the operating and addressing modes set by the Status register Table 16 1 lists the decoding of these modes The processor uses the following addresses e virtual addre
49. Branch on Coprocessor 0 instructions 285 BRCH field 261 BRCV field 261 BRCW field 261 Breakpoint exception 350 BSIdx field 261 buffer cached request 89 cluster request 80 incoming 89 90 outgoing 89 91 uncached 89 92 bus SysAD 102 SysCmd 95 SysResp 105 SysState 104 Bus Error exception 346 busy bit table 372 bypass register JTAG 205 C C coherency attribute bit 230 cache 4 algorithms 53 and processor requests 57 cacheable coherent exclusive on write description of 54 cacheable coherent exclusive description of 54 cacheable noncoherent description of 54 fields encoding of 53 for kseg0 address space 53 for mapped address space 53 for xkphys address space 53 MIPS R10000 Microprocessor User s Manual Index uncached accelerated description of 55 uncached description of 54 where specified 53 associativity 45 block ownership 58 misses 25 nonblocking 23 25 ordering constraints 15 pages 328 primary 4 primary data 9 block size 48 changing states 49 description of 48 diagram state 50 error handling 175 index and tag 49 interleaving 32 refill 31 state diagram 50 states 49 subset of secondary cache 49 write back protocol 48 primary instruction 9 block size 46 description of 46 diagram state 47 error handling 174 error protection 174 index and tag 46 refill 31 state diagram 47 states 46 rules ownership of a cache block 58 secondary 4 associativity 10 51 block size 51 block state 67 block
50. DMTOO rt rd Description The contents of general register rt are loaded into coprocessor register rd of the CPO This operation is defined for the R10000 operating in 64 bit mode or in 32 bit kernel mode Execution of this instruction in 32 bit user or supervisor mode causes a reserved instruction exception All 64 bits of the coprocessor 0 register are written from the general register source The operation of DMTCO on a 32 bit coprocessor 0 register is undefined Because the state of the virtual address translation system may be altered by this instruction the operation of load instructions store instructions and TLB operations immediately prior to and after this instruction are undefined Operation T data GPR rt T 1 CPR O rd data Exceptions Coprocessor unusable exception R10000 in 32 bit user mode R10000 in 32 bit supervisor mode MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 292 Chapter 14 14 30 ERET Instruction E R ET Exception Return E R ET 31 26 25 24 65 0 COPO CO 0 ERET 010000 1 000 0000 0000 0000 0000 011000 6 1 19 6 Format ERET Description ERET is the R10000 instruction for returning from an interrupt exception or error trap Unlike a branch or jump instruction ERET does not execute the next instruction ERET must not itself be placed in a branch delay slot If the processor is servicing an error trap SR 1 then lo
51. Dedicated External Agents sss 35 Multiprocessor Systems Using a Cluster Bus sse eee 36 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents xi 3 Interface Signal Descriptions Power Interface Signals thee ne ad EE ee e E ER ERI a Rein odes 38 Secondary Cache Interface Signals seer oheen roesini R E E AEE eee 39 System Interface Sio nals raes ne E AE E A O E AE E AA A anO ES 41 Test Interface SignalSn roiie resa E EEEE EON eS PEE E EARSTE TEE T ENE N 43 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xii Table of Contents 4 Cache Organization and Coherency Primary l struction Cache ep ten et e HR ae de ha eie AR 46 Primary Data Cache eem Rp eR Ie en ek Eee e a ihe 48 Secondary Caches eee dette RAIN ESNEARER 51 Cache Algorithtis ederet ertet eite te hee nre ede a tars 53 Descriptions of the Cache Algorithms sse 54 Uricached nisi nu ete e Oe etre itis er iste i ERR er Pep ke its Biden 54 Cacheable Noncoherent ie nir ce ipee ern e tre ted 54 Cacheable Coherent Exclusivess inete rinrin ean a eaae A E aS 54 Cacheable Coherent Exclusive on Write 54 Uncached Accelerated 5 iine dub eene bee derer 55 Relationship Between Cached and Uncached Operations 56 Cache Algorithms and Processor Requests sssssssssssssseneeeeenenneeenene 57 Cache Block Ownership
52. Double single partial word write request command Upgrade request command Eliminate request command Intervention shared request command Intervention exclusive request command Allocate request number command Invalidate request command Interrupt request command External coherency request command Request data Response data Request last Response last Empty SysCmd 11 0 and SysAD 63 0 are undefined SysAD 63 0 Abbreviations Adr Dat Dat n Physical address Unspecified data Doubleword n of a block SysState 2 0 Abbreviations State Ivd Shd ClnExc DrtExc MIPS R10000 Microprocessor User s Manual Unspecified state Invalid Shared Clean Exclusive DirtyExclusive Version 2 0 of January 29 1997 108 Chapter 6 SysResp 4 0 Abbreviations Rsp Unspecified completion response ACK Acknowledge completion response ERR Error completion response NACK Negative acknowledge completion response Master Abbreviations EA External agent Pn R10000 processor whose device number is n Dead cycle 6 16 System Interface Arbitration Version 2 0 of January 29 1997 The processor supports a simple System interface arbitration protocol which relies on an external arbiter This protocol is used in uniprocessor systems multiprocessor systems using dedicated external agents and multiprocessor systems using the cluster bus System interface arbitration is handled by the SysReq SysGnt and SysRel signals
53. ESA P B SEA noonnmnnpnssunnonnu nnonnnnnanonunoo v by lb db i b OoOo000000000000000po0000000000000000 2a OS ij oooocoooooooo0o000002gb 000200020000000000 9 OoOoo0000000000000ho0000000000000000 a OOds 2A BoOooooo0000000000mo00000000000000000 3 i googo ooooo a E i nooo f Goooe 4 A 00000 Ooooo BOA ooooo ooooo SSA E nonon Ooooo SSA NS ooooo ooooo n A 00000 00000 N N y 33A N 00000 00000 a oO cA 33A 00000 ooooo oO A oooo0 Ooooo 2 O 00000 00000 a O Oo o ooooo D0000 m Oe 00000 nonna A Oo o EEN EAI ooooo ooooo w gt S oooo00 Oooo 8v SSA SSA ooooo BDBBD fov x N 00000 ooooo ov on ooooo 00000 a SASDOPA SO PA 00000 Ooooo wv fo E nonon DBODO foy nonon nnonn Y DDDDO M J 95000 Ns Jd nonnn coooo Oooo 00bo0000000000go0000000000d000000 a S099 39A S SO92A 39 G gBgBBBBSHBBSUOBOSRRBHHRBBSSHBHBSRB v SSA SSA Ooooooboooo0000000po000000000q000000 w mi S R10000 nonnmnmpnunnonounn nnnnnonounnonnnnn ev M sOOo0ooo000000000po0000000000qo00008 4 ons an N an XVWOZ 0 H I 00 0 9L C R I XVNWOO0 6c e 9nls LoL 2 nx SCOMGS m GCO FOS LY Bf OTIS v Joro T 8 Figure 13 1 R10000 599CLGA Package Outline Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 224 Chapter 13 599CLGA Pinout Table 1
54. Idx Version 2 0 of January 29 1997 264 Chapter 14 14 20 Performance Counter Registers 25 Version 2 0 of January 29 1997 The R10000 processor defines two performance counters and two associated control registers which are mapped into CPO register 25 An encoding in the MTCO MFCO instructions on register 25 indicates which counter or control register is used Each counter is a 32 bit read write register and is incremented by one each time the countable event specified in its associated control register occurs Each counter can independently count one type of event at a time The counter asserts an interrupt IP 7 when its most significant bit bit 31 becomes one the counter overflows and the associated performance control register enables the interrupt The counting continues after counter overflow whether or not an interrupt is signalled The format of the control registers are shown in Figure 14 22 31 9 8 5 4 32 5 0 Do o Per E ful s xf ew 23 4 1 1 1 1 1 Figure 14 22 Control Register Format MIPS R10000 Microprocessor User s Manual Coprocessor 0 265 The fields of the Control register are e The Event field specifies the event to be counted listed in Table 14 18 Table 14 18 Counter Events Event Counter 0 Counter 1 0 Cycles Cycles Instructions issued Instructions graduated Load prefetch sync CacheOp issued Load prefetch sync CacheOp graduated Stores including
55. January 29 1997 142 External Coherency Request Action Chapter 6 Table 6 27 indicates the action taken for external coherency requests that target the processor Table 6 27 Action Taken for External Coherency Requests that Target the R10000 Processor Secondary Cache Block Former State Type of External Request Secondary Cache Block New State Processor Coherency State Response SysState 1 0 Processor Coherency Data Response Required Processor Coherency Data Response State SysCmd 2 1 Invalid Shared Intervention shared Intervention exclusive Invalidate Intervention shared Intervention exclusive Invalidate Invalid Invalid Invalid Shared Invalid Invalid ZZZ s ZZZ CleanExclusive DirtyExclusive t This should not occur under normal circumstances Intervention shared Intervention exclusive Invalidate Intervention shared Intervention exclusive Invalidate Shared Invalid Invalid Shared Invalid Invalid WWWINNNIRPRRI OOO The processor coherency data response must be written back to memory OO 0 0 O OJ0 0 ZZZ Shared DirtyExclusive N A t These actions are taken in cases where there are no internal coherency conflicts For exceptions due to internal coherency conflicts please refer to Table 6 28 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations
56. P EA EA EA EA EA EA EA EA EA EA EA arene Gru 5 Nc X X Ac X Al X qu cum mimm m m gt lt gt lt EO KOKU K Figure 6 21 External Allocate Request Number Request Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 135 External Invalidate Request Protocol An external agent issues an external invalidate request to invalidate a secondary cache block Anexternal agent issues an external invalidate request with a single address cycle This address cycle consists of the following e negating SysCmd 11 driving a request number on SysCmd 10 8 e driving the invalidate command on SysCmd 7 5 e driving the ECC check indication on SysCmd 0 e driving the target indication on SysAD 63 60 e driving the physical address on SysAD 39 0 e asserting SysVal An external agent may only issue an external invalidate request address cycle when the System interface is in slave state typically a free request number is specified An external agent may have as many as eight external invalidate requests outstanding on the System interface at any given time Figure 6 22 depicts three external invalidate requests Since the System interface is initially in master state the external agent must first negate SysGnt and then wait until the processor relinquishes mastership of the System interface by asserting SysRel Cycle SysClk Master SysRe
57. Previous MIPS processors had linear pipeline architectures an example of sucha linear pipeline is the R4400 superpipeline shown in Figure 1 2 In the R4400 superpipeline architecture an instruction is executed each cycle of the pipeline clock PCycle or each pipe stage af i 1 PCycle IF IS RF EX DF DS TC WB IF IS RF EX DF DS TC WB Instruction 2 IF IS RF EX DF DS TC WB Figure 1 2 R4400 Pipeline The structure of 4 way superscalar pipeline is shown in Figure 1 3 At each stage four instructions are handled in parallel Note that there is only one EX stage for IF instruction fetch ID instruction decode and dependency IS instruction issue EX execution 1 only WB write back integers Instruction 1 IF ID IS EX WB ee NNNM Instruction 2 IF ID IS EX WB ee NENM SeSsemm Instruction 3 IF ID IS EX WB ee NENMNMMMZMZZMMAMN Instruction 4 IF ID IS EX WB ee NENMNMMEMMZZMMZMAMN pu R H mE E LEH Instruction 8 IF ID IS EX WB T Figure 1 3 4 Way Superscalar Pipeline MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 Chapter 1 1 3 What is an R10000 Microprocessor The R10000 processor is a single chip superscalar RISC microprocessor that is a follow on to the MIPS RISC processor family that includes chronologically the R2000 R3000 R6000 R4400 and R8000 The R10000 processor uses
58. Protocol Errata Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 A processor coherency state response results from an external coherency request that targets the processor The processor issues a processor coherency state response by driving the secondary cache block tag quality indication on SysState 2 driving the secondary cache block former state on SysState 1 0 and asserting SysStateVal for one SysClk cycle The processor coherency state responses are issued in an order designated by the external coherency requests and will always be issued before an associated processor coherency data response Note that processor coherency state responses can be pipelined ahead of the associated processor coherency data responses and processor coherency data responses can be returned out of order These cases typically arise from external coherency requests hitting outgoing buffer entries Figure 6 24 depicts two external coherency requests and the resulting processor coherency state responses 1 2 3 74 15 6 75 8 9 10 11 12 13 14 15 16 17 MM Po Po Po Po Po EA EA EA EA JA EA A EA EA EA EA KimEx E j X Adr JC Adr X 7
59. R10000 System Interface System Interface Cluster Bus Gu a e a Eh es S Cluster Coordinator Directory Structure To Other System Resources Figure 2 3 Multiprocessor System Organization Using the Cluster Bus Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 3 Interface Signal Descriptions This chapter gives a list and description of the interface signals The R10000 interface signals may be divided into the following groups e Power interface e Secondary Cache interface e System interface e Test interface The following sections present a summary of the external interface signals for each of these groups An asterisk indicates signals that are asserted as a logical 0 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 199737 38 Chapter 3 3 1 Power Interface Signals Table 3 1 presents the R10000 processor power interface signals Table 3 1 Power Interface Signals Signal Name Description Type Vcc core is Vcc for the core circuits np VecQSC Vcc output driver secondary cache Vcc for the secondary cache interface output drivers Vcc output driver system Mosen Vcc for the System interface output drivers VrefSC Voltage reference secondary cache Voltage reference for the secondary cache interface input receivers VrefSvs Voltage reference system y Voltage reference for the System interface inp
60. RR REA 173 Sparse Encoding o edu eo reiner t ee e Ree HE retirer hain OR 173 BGGQ optet sedie e REID teo dd en ttr EE Un 173 Primary Instruction Cache Error Protection and Handling sss 174 BrrorxPr otecti n zc aste e REA om SN nit tas 174 Error Handlitg issue heroin edt Miele eddy tie ind ben er 174 Primary Data Cache Error Protection and Handling 175 ErrOnProtecHOn 53 5nd i toii n d ent tarta te i ete i fies salt des 175 Error Handlings eee ane e ee HORE i e SPARER ETIE EaP 175 Secondary Cache Error Protection and Handling sss 176 Error Protections ior eee et reg s ERO HH a EC E ac deb to CEU 176 Error Handling eon sree tede RA ERA iS 176 Dath Atrae ysu ended en eth agde fete rare rer oomen 176 Tag vri SEa 179 System Interface Error Protection and Handling esse 180 Error Protection nie eren En T ENES uh e ERPOO A sasssassdnasgaes sheds i eed 180 Error Handling 32 onenouisau une ee ne Ro nuda nadie eid tied 181 SysCmd 11 0 Bus 3 e ei tee eee prie re i eene 181 Sys AD 63 0 Dis neta eani voee Stan otters Rn dts 182 Sys5tate 2 0 Bus srein d eee Pete e e endet n reges 184 SysR sp 4 0 Bus eroe nte tie tete ieri ra ite tieniti ee ro 184 Protocol Observation 4 ied etui te Toth se te AS ee dig Ie Hg HI He red Ro dedere 185 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xx Table of Contents 10 CACHE Ins
61. SCDataChk lt 4 gt SCDataChk lt 5 gt SCDataChk lt 6 gt SCDataChk lt 7 gt SCDataChk lt 8 gt SCDataChk lt 9 gt SCTCS SCTag lt 0 gt SCTag lt 1 gt SCTag lt 2 gt SCTag lt 3 gt SCTag lt 10 gt SCTag lt 11 gt SCTag lt 12 gt SCTag lt 13 gt SCTag lt 14 gt SCTag lt 15 gt MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 226 Chapter 13 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table 13 3 cont Signal Location Signal Location Signal Location SCTag lt 16 gt Tree 1 SCTag lt 17 gt Kostis 3 SCTag lt 18 gt Forss 2 SCTag lt 19 gt Kirepi 5 SCTag lt 20 gt Hips 1 SCTag lt 21 gt Joers 4 SCTag lt 22 gt Gines 2 SCTag lt 23 gt Jines 5 SCTag lt 24 gt Gissi 3 SCTag lt 25 gt Hes 4 SCTagChk lt 0 gt Voon 4 SCTagChk lt 1 gt Wisin 3 SCTagChk lt 2 gt Morres 2 SCTagChk lt 3 gt Varr 5 SCTagChk lt 4 gt V sss 1 SCTagChk lt 5 gt D ess 3 SCTagChk lt 6 gt jo 1 SCTOE Gus 5 SCTWay Jessie 3 SCTWr Boouvm 2 SCTagLSBAddr June 5 SeIDVCO Ese 21 Sparel Een 5 Spare3 Ue 4 SysAD lt 0 gt rins 34 SysAD 1 Wess 32 SysAD lt 2 gt Musis 35 SysAD lt 3 gt Musis 31 SysAD lt 4 gt Veenines 34 SysAD lt 5 gt BI mr 33 SysAD 6 Mene 32 SysAD lt 7 gt Us 32 SysAD 8 jun 35 SysAD lt 9 gt LE 33 SysAD 10 Jens 34 SysAD 11 isa 31 SysAD lt 12 gt Rises 35 SysAD 13 osos 32 SysAD lt 14 gt Riyas 34 SysAD lt 15 gt P
62. Special During the address cycle of processor read requests SysCmd 4 3 contain the read cause indication as shown in Table 6 5 This information is useful in handling the associated external response Table 6 5 Encoding of SysCmd 4 3 for Processor Read Requests SysCmd 4 3 Read Cause Indication 0 Instruction access 1 Data typical access 2 Data LL LLD access 3 Data prefetch access During the address cycle of processor write requests SysCmd 4 3 contain the write cause indication as shown in Table 6 6 This information is useful in handling the associated write data Table 6 6 Encoding of SysCmd 4 3 for Processor Write Requests SysCmd 4 3 Write Cause Indication 0 Reserved 1 Data typical access 2 Data uncached accelerated sequential access 3 Data uncached accelerated identical access MIPS R10000 Microprocessor User s Manual System Interface Operations 97 During the address cycle of processor upgrade requests SysCmd 4 3 contain the upgrade cause indication as shown in Table 6 7 This information useful in handling the associated external response Table 6 7 Encoding of SysCmd 4 3 for Processor Upgrade Requests SysCmd 4 3 Upgrade Cause Indication Reserved Data typical access 0 1 Z DwaSC SCDanes UU 3 Dwaprdekhaxem OOOO During the address cycle of processor special requests SysCmd 4 3 contain the processor special cause indication as shown in Tab
63. SysCmd 5 is asserted during an external data cycle for a processor read or upgrade request originated by the processor the R10000 asserts the corresponding incoming buffer uncorrectable error flag When the processor forwards block data from an incoming buffer entry after receiving an external ACK completion response the associated incoming buffer uncorrectable error flags are checked and if any are asserted the System interface unit posts a single Cache Error exception and initializes the EE D and SIdx fields in the local CacheErr register MIPS R10000 Microprocessor User s Manual Error Protection and Handling 183 When the processor forwards double single partial word data from an incoming buffer entry after receiving an external ACK completion response the associated incoming buffer uncorrectable error flag is checked and if asserted the System interface unit posts a Bus Error exception Table 9 8 presents the ECC matrix for the System interface address data bus This ECC matrix is identical to that used by the R4X00 System interface Table 9 8 ECC Matrix for System Interface Address Data Bus Check Bit 43 52 70 61 Data Bit Number of ones per row umber of ones per column MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 184 SysState 2 0 Bus SysResp 4 0 Bus Errata Version 2 0 of January 29 1997 Chapter 9 The 3 bit wide system state bus SysState 2 0
64. SysRel lMIK gt SysReq SysReq1 R10000 SysGnt BM SysGntt SysRel IK gt External Agent SysReq SysReq2 R100005 SysGnt i SysGn2 SysRel MI SysReq SysReq3 R100003 SysGnt Bll SysGnt3 SysRel MI V pe SysRel MIPS R10000 Microprocessor User s Manual Figure 6 8 Arbitration Signals for Multiprocessor System Using the Cluster Bus Figure 6 9 is an example of the System interface arbitration in a four processor system using the cluster bus Cycle 4 1 1 9 5 4 1 8 8 1 2 8 2 971 30 11 32 33 14 15 16 17 SysCIk NST NET INET tcl ole Ne NT SENS cl Master i EA i EA i EA EA Po i Po EAS Pa Po EA i EA EA EA SysReq0 IA i 7 SysGnt0 i i SysReq1 IA SysGnt1 E r T SysReq2 SysGnt2 SysReq3 SysGnt3 l SysRel i SysCmd 11 0 SysVal Figure 6 9 Arbitration Protocol for Multiprocessor System Using the Cluster Bus Version 2 0 of January 29 1997 112 Chapter 6 6 17 System Interface Request and Response Protocol The following sections detail the System interface request and response protocol A 32 word secondary cache block size is assumed in the examples below Processor Request Protocol Version 2 0 of January 29 1997 A processor request is generated when the R10000 processor requires a system resource
65. TLB Entry TLBWR Write Random TLB Entry The processor detects most of the pipeline hazards in hardware including CPO hazards and load hazards No NOP instructions are required to correct instruction sequences On the R4400 processor CacheOps that hit in the specified cache set the CH bit in the Diagnostic field of the CPO Status register bit 18 Though it was undocumented this bit could be tested by the Branch on Coprocessor 0 instructions bcOt bcOf bcOtL bcOfT The R10000 processor also implements the CH bit but it is not associated with a Coprocessor 0 condition Instead execution of a branch on Coprocessor 0 instruction takes a Reserved Instruction exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 286 Chapter 14 14 26 CPO Move Instructions The R10000 processor implements Coprocessor 0 move instructions MTCO MFCO DMTCO and DMFCO exactly the same as in the R4400 processor even though some operations are undefined during certain conditions The exact operations of CPO move instructions on 32 64 bit CPO registers are summarized Table 14 26 Table 14 26 CPO Move Instructions Instruction GEQ i pus i T Operation MECO rt rd 32 or 64 Don t care rt lt rd3 Il rd44 9 MTCo rtrd 32 Don t care rd lt rt41 9 32 Yes undefined rt lt 0 7Il rd4 o 32 Yes undefined rd lt rt3 o DMTCO rtrd 64 Yes rd lt rtg3 o The returned value of MFC0 DMFCO
66. Table 9 2 Primary Instruction Cache Array Error Protection Array Width Error Protection Tag Address 27 bit Even parity Tag State 1 bit Even parity Data 36 bit Even parity LRU 1 bit None All primary instruction cache errors are uncorrectable If an error is detected the instruction cache unit posts a Cache Error exception and initializes the D TA TS and PIdx fields in the local CacheErr register see Chapter 14 CacheErr Register 27 for more information If an error is detected on the tag address or state array the processor informs the external agent that an uncorrectable tag error was detected by asserting SysUncErr for one SysClk cycle MIPS R10000 Microprocessor User s Manual Error Protection and Handling 175 9 10 Primary Data Cache Error Protection and Handling This section describes error protection and error handling schemes for the primary data cache Error Protection The primary data cache arrays have the following error protection schemes as listed in Table 9 3 Table 9 3 Primary Data Cache Array Error Protection Array Width Error Protection Tag Address 28 bit Even parity Tag State 3 bit Even parity Tag Mod 3 bit Sparse encoding Data 8 bit Even parity LRU 1 bit None Error Handling All primary data cache errors are uncorrectable If an error is detected the data cache unit posts a Cache Error exception and initializes the EE D TA TS TM and Pldx fiel
67. The Watch exception must then be reenabled The faulting instruction can be executed either by interpretation or by setting breakpoints An MTCO to the WatchLo register clears a delayed Watch exception MIPS R10000 Microprocessor User s Manual CPU Exceptions 355 Interrupt Exception Cause The Interrupt exception occurs when one of the eight interrupt conditions is asserted The significance of these interrupts is dependent upon the specific system implementation Each of the eight interrupts can be masked by clearing the corresponding bit in the Interrupt Mask IM field of the Status register and all of the eight interrupts can be masked at once by clearing the IE bit of the Status register Processing The common exception vector is used for this exception and the Int code in the Cause register is set The IP field of the Cause register indicates current interrupt requests It is possible that more than one of the bits can be simultaneously set or even no bits may be set if the interrupt is asserted and then deasserted before this register is read On Cold Reset an R4400 processor can be configured with IP 7 either as a sixth external interrupt or as an internal interrupt set when the Count register equals the Compare register There is no such option on the R10000 processor IP 7 is always an internal interrupt that is set when one of the following occurs e the Count register is equal to the Compare register
68. X X IL PTET TERE oe es rf e Rr co a Figure 6 24 Processor Coherency State Response Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 139 Processor Coherency Data Response Protocol A processor coherency data response results from an external intervention request that targets the processor and hits a DirtyExclusive secondary cache block The processor issues a processor coherency data response with a single empty cycle followed by either 8 or 16 data cycles The empty cycle consists of negating SysVal for a single SysClk cycle The data cycles consist of the following e asserting SysCmd 11 e driving the request number associated with the corresponding external coherency request on SysCmd 10 8 e driving the data quality indication on SysCmd 5 e driving the data type indication on SysCmd 4 3 driving the state of the cache block on SysCmd 2 1 e asserting SysCmd 0 e driving the data on SysAD 63 0 e asserting SysVal The first 7 or 15 data cycles have a response data type indication and the last data cycle has a response last data indication The processor may negate SysVal between data cycles of a processor coherency data response only if the SCCIK frequency is less than half of the SysClk frequency The processor may only issue a processor coherency data response when the System interface is in master state and SysWrRdy was asserted two SysClk cycles previou
69. a 0 op 1 2 2 2 9 1 2 Figure 14 21 Diagnostic Register Format MIPS R10000 Microprocessor User s Manual Coprocessor 0 Errata 263 There are two ways to read the branch prediction state from the Branch Prediction Table BPT Place an mfcO rx CO_Diag a Move From Diagnostic register to GPR rx in the delay slot of the conditional branch This read of the Diagnostic register returns the next predicted state from the branch stacks before the BPT is updated Move the Index and the BPT read operation into the Idx and BPOp field MIPS R10000 Microprocessor User s Manual of the Diagnostic register This mtc0 into CPO_Diag graduates as soon as the write is completed however there could be a significant delay in transferring the data from BPT to CPO Diag This delay occurs because CO Diag has a lower priority to access the BPT as compared to the accesses by IFETCH and other processes Thus the prediction state read from the CO Diag may not reflect the content of the BPT Use the code sequence shown below to get the correct prediction state from the BPT li rx rx has index and BPT read for Idx and BPOp respectively mtcO rx CO Diag Set the Diagnostic register for reading the BPT la ry label ry r31 la could be replaced by a dla for 64 bits jt ry This gives priority for CO Diag to access BPT mfcO rz CO Diag rz holds the state from BPT entry pointed by
70. and in the User mode when the RE bit is 0 Setting the RE bit to 1 inverts the User mode endianness The 9 bit Diagnostic Status DS field is used for self testing and checks the cache and virtual memory system This field is described in Table 14 11 and Figure 14 12 The 8 bit Interrupt Mask IM field controls the enabling of eight interrupt conditions Interrupts must be enabled before they can be asserted and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register The processor mode is undefined if the KSU field is set to 3 115 The R10000 processor implements this as User mode Status Register 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 N I o gt gt 12 3 i m x XX Biles ra RP FR RE 0 0 Hu TS SRNMI CH CE DE IM 8 bits KX SX UX KSU m MS IE V A Coprocessor Usable Diagnostic Status Fields MIPS R10000 Microprocessor User s Manual Figure 14 11 Status Register Version 2 0 of January 29 1997 248 Status Register Fields Chapter 14 Table 14 10 describes the Status register fields Table 14 10 Status Register Fields Field XX Description Enables execution of MIPS IV instructions in User mode 1 MIPS IV instructions usable 0 MIPS IV instructions unusable CU Controls the usability of each of the four coprocessor unit numbers CPO is alwa
71. are inevitable the data reference should be scheduled as early as possible so that the data can be fetched in parallel with other unrelated operations As a further antidote to cache miss stalls the R10000 processor supports prefetch instructions which serve as hints to the processor to move data from memory into the secondary and primary caches when possible Because prefetches do not cause dependency stalls or memory management exceptions they can be scheduled as soon as the data address can be computed without affecting exception semantics Indiscriminate use of prefetch instructions can slow program execution because of the instruction issue overhead but selective use of prefetches based on compiler miss prediction can yield significant performance improvement for dense matrix computations t In compiler parlance block a dominates block b if and only if every time block b is executed block a is executed first Note that block a does not have to immediately precede block b in execution order some other block may intervene MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 26 Chapter 1 1 8 R10000 Specific CPU Instructions PREF Version 2 0 of January 29 1997 This section describes the processor specific implementations of the following instructions e PREF e LL SC e SYNC Chapter 14 the section titled CPO Instructions describes the CPO specific instructions and C
72. array uncorrectable error is encountered during the secondary cache interrogation the processor asserts the data quality indication on SysCmd 5 for all doublewords of the corresponding processor coherency data response MIPS R10000 Microprocessor User s Manual Error Protection and Handling 171 9 4 Cache Error Exception The processor indicates an uncorrectable error has occurred by asserting a Cache Error exception The following four internal units detect and report uncorrectable errors e instruction cache e data cache e secondary cache e System interface Each of these four units maintains a unique local CacheErr register A Cache Error exception is imprecise that is it is not associated with a particular instruction When any of the four units post a Cache Error exception completed instructions are graduated before the Cache Error exception is taken If there are Cache Error exceptions posted from more than one of the units the exceptions are prioritized in the following order l instruction cache 2 datacache 3 secondary cache 4 System interface The corresponding local CacheErr register is transferred to the CPO CacheErr register and the CPO Status register ERL bit is asserted Instruction fetching begins from 0xa0000100 or Oxbfc00300 depending on the CPO Status register BEV bit The CPO ErrorEPC register is loaded with the virtual address of the next instruction that has not been graduated so that execution can
73. asserted two SysClk cycles previously the maximum number of outstanding processor requests specified by the PrcReqMax mode bits is not exceeded there is a free request number A single processor may have a maximum of one processor double single partial word read request outstanding on the System interface at any given time MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 116 Cycle 1 2 Chapter 6 Figure 6 11 depicts a processor double single partial word read request Since the System interface is initially in slave state the processor must first assert SysReq and then wait until the external agent gives up mastership of the System interface by asserting SysGnt and SysRel 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 SysCIk Master SysReq SysGnt BSEC SIUS EA EA EA EA N 7 N SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 11 Processor Double Single Partial Word Read Request Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 117 Processor Block Write Request Protocol A processor block write request results from the following
74. bits 310 flag bits 310 inexact result 310 invalid operation 310 load exceptions 311 loading the FSR 311 overflow 310 RM round to minus infinity 311 RN round to nearest representable value 311 RP round to plus infinity 311 RZ round toward zero 311 underflow 310 unimplemented operation 310 functional unit 9 branch 10 Version 2 0 of January 29 1997 I 8 floating point adder 9 floating point multiplier 9 instruction decode and rename 10 integer ALU 9 iterative 9 Load Store Unit 9 G G Global bit in TLB 240 330 gathering data in identical mode 92 gathering data in sequential mode 92 global processes G bit in TLB 330 graduation definition of 373 of an instruction 371 Grant parking 108 H hardware emulation support for 154 hardware interrupts 105 hazards CPO 285 Hit Writeback Invalidate CACHE instruction 199 hold times AC electrical 216 I I O signals DC characteristics 214 I O support for 152 IC instruction cache size field 256 IE interrupt enable bit 249 IE bit 265 IM interrupt mask field 247 implementation number R10000 processor 255 incoming buffer 89 90 Index Hit Invalidate CACHE instruction 197 Index Invalidate CACHE instruction 192 Index Load Data CACHE instruction 201 Index Load Tag CACHE instruction 194 195 197 198 199 201 202 Index Load Tag instruction 72 Index register 237 Index Store Data CACHE instruction 74 202 Index Store Tag CACHE instructio
75. cont Status Register Fields Mode bits 11 Undefined implemented as User mode KSU 10 User 01 Supervisor 00 Kernel Error Level set by the processor when Reset Soft Reset NMI or Cache Error exception are taken 0 normal 1 error ERL Exception Level set by the processor when any exception other than Reset Soft Reset NMI or Cache Error exception EXL are taken 0 normal 1 gt exception Interrupt Enable IE 0 disable all interrupts 1 enables all interrupts Diagnostic Status Field The 9 bit Diagnostic Status DS field is used for self testing and checks the cache and virtual memory system This field is described in Table 14 11 and shown Figure 14 12 Some of the important DS fields include e Inthe R4400 the TS bit of the diagnostic field indicates a TLB shutdown has occurred due to matching of multiple virtual page entries during address translation In the R10000 processor the TS bit indicates a TLB write has introduced an entry that would allow matching of more than one virtual page entry during translation In this case the TLB entries that allow the multiple matches even in the Wired area are invalidated before the new TLB entry is written This prevents multiple matches during address translation The TS bit is updated for each TLB write It can also be read and written by software in the R4400 the TS bit is read only to clear the TS bit one needs to write
76. content Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocess Errata or 0 279 Primary Instruction Cache Operation If the CacheOp is an Index Load Store Tag for a primary instruction cache operation the fields of the TagHi and TagLo registers are defined as follows PTag0 contains physical address bits 35 12 stored in the cache tag PState contains the primary instruction cache state for the line as follows 1 Valid 0 Invalid LRU indicates which way is the least recently used of the set SP state even parity bit for the PState field TP tag even parity bit PTag1 contains physical address bits 39 36 stored in the cache tag Figure 14 28 shows the fields of the TagHi and TagLo registers 31 8 7 6 5 4 3 2 1 0 PTag0 o PState 0 j LRU SP 0 TP M TagLo 24 1 1 2 1 d 4 1 81 43 p 0 PTag1 TagHi 28 4 Figure 14 28 TagHi Lo Register Fields in Primary Instruction Cache When CacheOp is Index Load Store Tag 0 Reserved Must be written as zeroes and returns zeroes when read Primary Data Cache Operation If the CacheOp is an Index Load Store Tag for primary data cache operations the fields of the TagHi and TagLo registers are defined as follows State Modifier holds the status of the line as follows 0015 neither refilled or written 010 this line may have been written and inconsistent from the secondary cache W bit 100 this line is bei
77. describes error handling on the system command bus system address data bus system state bus and system response bus The 12 bit wide system command bus SysCmd 11 0 is protected by odd parity Whenever the processor is in master state and it asserts SysVal to indicate that it is driving valid information on the SysCmd 11 0 bus it also drives odd parity on the SysCmdPar signal Whenever the processor is in slave state and an external agent asserts SysVal to indicate that it is driving valid information on the SysCmd 11 0 bus the processor checks the SysCmdPar signal for odd parity If a parity error is detected the processor ignores the SysCmd 11 0 and SysAD 63 0 buses for one SysCIk cycle The System interface unit posts a Cache Error exception and sets the SC bit in the local CacheErr register Additionally the processor informs the external agent by asserting SysUncErr for one SysClk cycle Caution By ignoring the SysCmd 11 0 and SysAD 63 0 buses the processor to become unsynchronized with other processors or the external agent on the cluster bus MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 182 SysAD 63 0 Bus Chapter 9 The 64 bit wide system address data bus SysAD 63 0 is protected by an 8 bit wide ECC Processor in Master State Whenever the processor is in master state and it asserts SysVal to indicate it is driving valid information on the SysAD 63 0 bus i
78. due to changes in CacheOps and cache architecture R10000 formats depend on the type of CacheOp executed and the cache to which it is applied The reserved fields are read as zeroes after executing an Index Load Tag or an Index Load Data CacheOp and ignored when executing an Index Store Tag or an Index Store Data CacheOp To ensure NT kernel compatibility the TagLo register is implemented as a 32 bit read write register The value written by an MICO instruction can be retrieved by a MFCO instruction unless an intervening CACHE instruction has modified the content This section gives the TagLo and TagHi register formats for the following CacheOp and cache combinations e CacheOp is Index Load Store Tag primary instruction cache operation primary data cache operation secondary cache operation e CacheOp is Index Load Store Data primary instruction cache operation primary data cache operation secondary cache operation CacheOp is Index Load Store Tag This section describes the three states of the TagLo and TagHi registers when the CacheOp is an Index Load Store Tag for the following operations e primary instruction cache operation e primary data cache operation e secondary cache operation To ensure NT kernel compatibility the TagLo register is implemented as a 32 bit read write register The value written by a MTCO instruction can be retrieved by a MFCO instruction unless intervening CACHE instructions modify the
79. e driving the read data on SysAD 39 0 e asserting SysVal for one SysClk cycle In the following SysClk cycle the processor reverts to Slave state Auto increment reads have a repeat rate of 17 PCIK cycles Figure 18 6 depicts two cache test mode auto increment reads 3 121739 475376171879 40 11 42 19 14 15 18 17 CRANE NENTS NINE Neil NONE Rad Ne A EA EA EA Py EA I i Ka 5 3 EE NEC ET rcp Gn qu ED MC CUEES s E REE Figure 18 6 Cache Test Mode Auto Increment Read Protocol MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 368 Chapter 18 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual A Glossary The following terms are defined in this Glossary MIPS R10000 Microprocessor User s Manual superscalar processor pipeline pipeline latency pipeline repeat rate out of order execution dynamic scheduling instruction fetch decode issue execution completion and graduation active list free list and busy registers register renaming and unnaming nonblocking loads and stores speculative branching logical and physical registers register files ANDES architecture Version 2 0 of January 29 1997 A 369 A 370 Appendix A A 1 Superscalar Processor A 2 Pipeline A 3 Pipeline Latency A superscalar processor is one that can fetch execute and complete more than one instruction in pa
80. either one of the two performance counters overflows Software needs to poll each source to determine the cause of the interrupt which could come from more than one source at a time For instance writing a value to the Compare register clears the timer interrupt but it may not clear IP 7 if one of the performance counters is simultaneously overflowing Performance counter interrupts can be disabled individually without affecting the timer interrupt but there is no way to disable the timer interrupt without disabling the performance counter interrupt Servicing If the interrupt is caused by one of the two software generated exceptions described in Chapter 6 the section titled Software Interrupts the interrupt condition is cleared by setting the corresponding Cause register bit IP 1 0 to 0 Software interrupts are imprecise Once the software interrupt is enabled program execution may continue for several instructions before the exception is taken Timer interrupts are cleared by writing to the Compare register The Performance Counter interrupt is cleared by writing a 0 to bit 31 the overflow bit of the counter Cold Reset and Soft Reset exceptions clear all the outstanding external interrupt requests IP 2 to IP 6 If the interrupt is hardware generated the interrupt condition is cleared by correcting the condition causing the interrupt pin to be asserted MIPS R10000 Microprocessor User s Manual Version 2 0 of January
81. for branch prediction algorithm control 005 2 bit counter scheme 015 all conditional branches are predicted not taken 105 all conditional branches are predicted taken 115 forward conditional branches are predicted not taken and backward conditional branches are predicted taken The default mode is 00 on processor reset MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 262 Errata Version 2 0 of January 29 1997 Chapter 14 BPState this field contains the new 2 bit state for a conditional branch after it is verified It is also used to hold the 2 bit state to read write when a branch prediction table read write operation is executed BPIdx this field contains the index to the Branch Prediction Table BPT for BPT read write initialization operations and should contain VA 11 3 of the branch for BPT read write operations The upper six bits of the BPIdx field contain the line address for BPT line initialization operations the lower three bits of BPIdx are ignored BPOp this field indicates the following BPT operations 005 BPT read 015 BPT write 105 initializes BPT line to all zeroes strongly not taken 11 initializes BPT line to all ones strongly taken 0 Reserved Must be written as zeroes and returns zeroes when read Figure 14 21 shows the format of the Diagnostic register 63 52 51 48 47 32 12 4 16 17 16 15 14 13 1211 3 2 1 0 us BG up 0 B bet ods Sus OT
82. in the R4400 The XContext register is for use with the XTLB refill handler which loads TLB entries for references to a 64 bit address space and is included solely for operating system use The operating system sets the PTE base field in the register as needed Normally the operating system uses the Context register to address the current page map which resides in the kernel mapped segment kseg3 Figure 14 19 shows the format of the XContext register Table 14 17 describes the XContext register fields XContext Register 63 37 36 35 34 4 3 0 PTEBase R BadVPN2 0 27 2 31 4 Figure 14 19 XContext Register Format The 31 bit BadVPN2 field holds bits 43 13 of the virtual address that caused the TLB miss bit 12 is excluded because a single TLB entry maps to an even odd page pair For a 4 Kbyte page size this format may be used directly to address the pair table of 8 byte PTEs For other page and PTE sizes shifting and masking this value produces the appropriate address Errata The 0 field in Table 14 17 is revised Table 14 17 XContext Register Fields The Bad Virtual Page Number 2 field is written by hardware on a miss It contains the VPN of the PRU most recent invalidly translated virtual address The Region field contains bits 63 62 of the virtual address R 00 user 01 supervisor 11 kernel 0 Reserved Must be written as zeroes and returns zeroes when read The Page Table Entry Base r
83. index a byte within a selected way or 19 bits to index a quadword within a way Consequently the processor supplies PA 22 4 on SC A B Addr 18 0 to index a quadword within a way The processor selects a secondary cache data way with the SC A B DWay signal Table 5 1 presents the secondary cache data array index for each secondary cache size for instance a 4 Mbyte cache uses the 17 address bits PA 20 4 on SC A B Addr 16 0 concatenated with the way bit SC A B DWay to index a quadword within a 2 Mbyte way Table 5 1 Secondary Cache Data Array Index SCSize Seconda Physical Mode M Secondary Cache Data Array Index Address Bits Cache Size Bits Used 512 Kbyte SC A B DWay SC A B Addr 13 0 PA 17 4 SC A B DWay SC A B Addr 14 0 PA 18 4 SC A B DWay SC A B Addr 15 0 PA 19 4 SC A B DWay SC A B Addr 16 0 PA 20 4 SC A B DWay SC A B Addr 17 0 PA 21 4 SC A B DWay SC A B Addr 18 0 PA 22 4 MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 63 Indexing the Tag Array The processor supplies the secondary cache tag array s least significant index bit on SCTagLSBA ddr to support two block sizes without system hardware changes This signal functions normally as a least significant index bit when the secondary cache block size is 16 words However when the secondary cache block size is 32 words this signal is always negated since only half as many tags are req
84. individually enable when set to a 1 or disable when set to a 0 exceptions when the corresponding Cause bit is set Flag Bits 6 2 One of the five Flag bits is set when a floating point arithmetic instruction graduates if the corresponding Cause bit is set The Flag bits are sticky and remain set until the FSR is written Thus the Flag bits indicate the status of all floating point instructions graduated since the FSR was last written The Flag bits are not modified for any instructions which cause an exception to be taken Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Floating Point Unit 311 Round Mode 1 0 RM bits select one of the four IEEE rounding modes Most floating point results cannot be precisely represented by the 32 bit or 64 bit register formats and must be truncated and rounded to a representable value The modes selected by the RM bit values are 0 RN round to nearest representable value If two values are equally near set the lowest bit to zero 1 RZ round toward zero Round to the closest value whose magnitude is not greater than the result 2 RP round to plus infinity Round to the closest value whose magnitude is not less than the result 3 RM round to minus infinity Round to the closest value whose magnitude is not greater The Round and Enable bits only change when the FSR is written by a CTC1 Move To Coprocessor 1 Control Register instruction Each CTC1 instruction
85. internal processor clock domain 156 Interrupt exception 355 interrupt mask bit 244 Interrupt register 105 interrupt request external 105 interrupts 105 hardware 105 nonmaskable 106 software 106 timer 106 invalid operation FP 310 invalidate request external 135 invalidation and CACHE instructions 190 IP interrupt pending bit 252 2773 ISA Instruction Set Architecture MIPSI2 MIPS II 2 MIPS III 2 MIPS IV 2 303 issue dynamic 371 issuing an instruction 371 iterative execution units 375 ITLB instruction TLB 330 ITLBM field 261 J JTAG MIPS R10000 Microprocessor User s Manual 1 9 boundary scan register 206 bypass register 205 Capture DR state 206 instruction register 205 interface 203 instruction register 205 JTCK signal 204 JTDI signal 204 JTDO signal 204 JTMS signal 204 Tap controller 204 test access port 204 Shift DR state 205 206 signals 43 Update DR state 206 Update IR state 205 JTCK signal 43 204 213 JTDL signal 43 204 205 213 JTDO signal 43 204 205 JTLB joint TLB 330 JTMS signal 43 204 213 K KO field 256 Kernel mode 316 address mapping 322 ckseg0 space 326 ckseg1 space 326 ckseg3 space 326 cksseg space 326 kseg0 space 323 kseg1 space 323 kseg3 space 323 ksseg space 323 kuseg space 323 operations 322 xkphys space 324 xkseg space 326 xksseg space 324 xkuseg space 324 kseg0 space 323 Kseg0CA mode bits 164 kseg1 space 323 kseg3 space
86. kept every time a prediction is made allowing the R10000 processor to recover from a mispredicted branch in a single cycle Simulations have shown the R10000 branch prediction algorithm to be over 90 accurate MIPS R10000 Microprocessor User s Manual A 375 A 13 Logical and Physical Registers Register renaming described above distinguishes between logical registers which are referenced within instruction fields and physical registers which are actually located in the hardware register file The programmer is only aware of logical registers the implementation of physical registers is entirely transparent Logical register numbers are dynamically mapped onto physical register numbers This mapping uses mapping tables which are updated after each instruction is decoded each new result is written into a new physical register This value is temporary and the previous contents of each logical register can be restored if its instruction must be aborted following an exception or a mispredicted branch Register renaming simplifies dependency checks Logical register numbers can be ambiguous when instructions are executed out of order since a succession of different values may be assigned to the same register But physical register numbers uniquely identify each result making dependency checking unambiguous The queues and execution units use physical register numbers Integer and floating point registers are implemented with separ
87. mapped to physical registers Integer and floating point registers are renamed independently A logical register is mapped to a new physical register whenever that logical register is the destination of an instruction Thus when an instruction places a new value in a logical register that logical register is renamed mapped to a new physical register while its previous value is retained in the old physical register As each instruction is renamed its logical register numbers are compared to determine if any dependencies exist between the four instructions decoded during this cycle After the physical register numbers become known the Physical Register Busy table indicates whether or not each operand is valid The renamed instructions are loaded into integer or floating point instruction queues Only one branch instruction can be executed during stage 2 If the instruction register contains a second branch instruction this branch is not decoded until the next cycle The branch unit determines the next address for the Program Counter if a branch is taken and then reversed the branch resume cache provides the instructions to be decoded during the next cycle The processor checks only the instruction cache during an instruction fetch it does not check the data cache MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 18 Chapter 1 Stage 3 In stage 3 decoded instructions are written into the queues
88. must be used to ensure that no other branches speculate into the protected block However if a block that dominates the fall through block can be shown to be protected this may be sufficient Thus if block a dominates block b and block b is the fall through block shown above and block a is the immediately previous block in the program i e only the single conditional branch that is being replaced intervenes between a and b then ensuring that a is protected by serial instruction means a branch likely can safely be used as protection for b Nonblocking Caches As processor speed increases the processor s data latency and bandwidth requirements rise more rapidly than the latency and bandwidth of cost effective main memory systems The memory hierarchy of the R10000 processor tries to minimize this effect by using large set associative caches and higher bandwidth cache refills to reduce the cost of loads stores and instruction fetches Unlike the R4400 the R10000 processor does not stall on data cache misses instead defers execution of any dependent instructions until the data has been returned and continues to execute independent instructions including other memory operations that may miss in the cache Although the R10000 allows a number of outstanding primary and secondary cache misses compilers should organize code and data to reduce cache misses When cache misses
89. n ee p rete vot Per P re E ns 297 TEBRInsttUuCHO ci ii dele li essen reo reet e Eee de eee E i pene teres 298 T BWILITnstr ction 522 iere natn ioter een Febr teet Thee ere etos ach eaae ei eel Ge en ee E 299 TEBWRJInSit ctor he e e n etn 300 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xxvi Table of Contents 15 Floating Point Unit Floating Point Unit Operations sss nennen entente nenne 302 Floating Point Unit Control n dee pete i tet ei ee m eet ee ine e dde 303 Floating Point General Registers FGRs ssssssssseseeeeeeneeene nennen 303 32 and 64 Bit Operations tenente nnne nennen nennen 304 Load and Store Operations iusiose i e a e e E e E E nentes 305 Floating Point Control Registers eara a ear rE E E SE nenne 308 Floating Point Implementation and Revision Register ssssssssssss 308 Floating Point Status Register FSR ssssssssssssssseeeeneneneenreetn 309 Bit Descriptions of the FSR ee eed ec ege te teet tit net eat 310 Loading the POR i ueber eh eH rer tire eR e eR Seia 311 EPU Instt cti ns rere e eee dert nien eI I Rees 312 CV Te E a ties ie Ueupeim ae qi ete listen ie ean mie periere tone 312 Moves and Conditional Moves sse eene 313 CECT CTC T Aneirin need medie tied edi ede edited obti P agen 313 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Cont
90. number of quad words 9 10 5 11 12 13 14 15 16 47 Figure 5 5 16 or 32 Word Read Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 72 Tag Read Sequence Cycle SCCIk SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE scTwr SCTCS Version 2 0 of January 29 1997 Chapter 5 A tag read sequence is performed when the state of a secondary cache block is required but it is not necessary to access the data array This sequence is used for the CACHE Index Load Tag S instruction Figure 5 6 depicts a secondary cache tag read sequence 9 11 17 i30 42 i13 144 15 16 Figure 5 6 Tag Read Sequence MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 73 5 7 Write Sequences There are five basic write sequences e a4 word write e an 8 word write e a 16 word write e a 32 word write e a tag write Errata The SCCIK referred in the secondary cache read and write timing diagrams is an internal SCClk The relationship between this internal SCClk and the external SCCIk 5 0 SCCIk 5 0 can be programmed during boot time by setting the SCCIKTap mode bits see the section titled Mode Bits in Chapter 8 for detail on mode bits MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 74 Chapt
91. occurred Once the CPO CacheErr register EW bit is set it can only be cleared by a reset sequence 9 6 CPO Status Register DE Bit Asserting the CPO Status register DE bit suppresses the posting of future Cache Error exceptions All local CacheErr registers are also prevented from being updated Unlike the R4400 processor architecture when the DE bit is asserted cache hits are not inhibited when an uncorrectable error is detected Correctable errors are handled normally when the DE bit is set NOTE Be careful when setting this bit since it may cause erroneous data and or instructions to be propagated 9 7 CACHE Instruction Version 2 0 of January 29 1997 Uncorrectable error protection is suppressed for the Index Load Tag Index Store Tag Index Load Data and Index Store Data CACHE instruction variations These four variations may be used within a Cache Error exception handler to examine the cache tags and data without the occurrence of further uncorrectable errors MIPS R10000 Microprocessor User s Manual Error Protection and Handling 173 9 8 Error Protection Schemes Used by R10000 Error protection schemes used in the R10000 processor are e parity sparse encoding e ECC These schemes are described in this section and listed in Table 9 1 Table 9 1 Error Protection Schemes Used in the R10000 Processor Error Detection Used What is Protected Primary caches Parity Secondary cache data System interface bu
92. one store can graduate per cycle Event 4 for Counter 0 Store Conditional Issued This counter is incremented on the cycle after a store conditional instruction is issued to the address calculation unit Note that an SC can only be counted as having been issued once even though it may actually be issued more than once due to DCache Tag being busy or there already being four load store cache misses waiting in the SCTP logic Event 4 for Counter 1 Store Conditional Graduation At most one store conditional can graduate per cycle This counter is incremented on the cycle following the graduation of a store conditional instruction Event 5 for Counter 0 Failed Store Conditional This counter is incremented when a store conditional instruction fails Event 5 for Counter 1 Floating Point Instruction Graduation This counter is incremented by the number of FP instructions which graduated on the previous cycle Any instruction that sets the FP Status register bits EVZOUI is counted as a graduated floating point instruction There can be 0 to 4 such instructions each cycle Event 6 for Counter 0 Conditional Branch Resolved Version 2 0 of January 29 1997 This counter is incremented when a conditional branch is determined to have been resolved Note that when multiple floating point conditional branches are resolved ina single cycle this counter is still only incr
93. operands or results These cases are indicated by setting one of six specific Cause bits The Cause bits indicate the status of the floating point arithmetic instruction which graduated most recently or caused an exception to be taken The FSR is not modified by load store or move instructions All cause bits except E have corresponding Enable and Flag bits in the FSR E Unimplemented operation the execution unit does not perform the specified operation This exception is always enabled V Invalid operation this operation is not valid for the given operands Z Division by zero divide unit only the result of division by zero is not defined O Overflow the result is too large in magnitude to be correctly represented in the result format U Underflow the result is too small in magnitude to be correctly represented in the result format I Inexact Result the result cannot be represented exactly NOTE The FSR is modified only for instructions issued by the floating point queue Move From MFC or DMFC instructions never set the Cause field status bits from the functional unit multiplier must be ignored Move or Move Conditional instructions can set the Unimplemented Operation exception only in the Cause field Load and store instructions are issued by the address queue The functional units generate the Cause bits and send them to the graduation unit when the operation is completed Enable Bits 11 7 The five Enable bits
94. operation Then the following side effect exists if a noncoherent line is changed to Clean Exclusive and this line is also present in noncoherent space the noncoherent data could be modified by an external component and the processor would then have stale data Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 23 Workarounds for Noncoherent Cached Systems The suggestions presented below are not exhaustive the solutions and trade offs are system dependent Any one or more of the items listed below might be suitable in a particular system and testing and simulations should be used to verify their efficacy 1 The external agent can reject a processor block read request to any I O location in which a speculative load would cause an undesired affect Rejection is made by returning an external NACK completion response A serializing instruction such as a cache barrier or a CPO instruction can be used to prevent speculation beyond the point where speculative stores are allowed to occur This could be at the beginning of a basic block that includes instructions that can cause a store with an unsafe pointer Stores to addresses like stack relative global pointer relative and pointers to non I O memory might be safe Speculative loads can also cause a side effect To make sure there is no stale data in the cache as a result of
95. per row 0010 0000 0100 001010000000 1000 0000 00010010 1111 1100 1111 1011 1111 1100 0011 1111 1111 1101 0110 0010 0000 0010 0010 0010 1000 0101 0000 0100 1100 01000000 1000 000100000010 0001 0100 0000 01000011 1111 1111 1000 0000 1000 0000 1000 0100 1111 1111 1111 1111 0010 0001 0000 0001 00000001 1111 1111 1100 0010 0000 0010 1000 0100 0000 1000 0001 000000100011 0010 0000 1000 0001 0100 0100 0100 0000 0100 0110 1011 1111 1111 1100 0011 1111 1101 1111 0011 1111 0100 1100 0000 0001 0000 0001 0100 0010 0000 01000000 1010 0001 0010 0001 0010 0001 1000 0010 0011 00100000 0010 0001 0000 0000 1011 1010 1111 1111 1100 0000 1010 0000 1101 0000 1111 111 1111 1111 1100 0001 0000 0010 0000 01000000 1001 0000 1001 010001000001 0011 0001 0010 0001 00000000 1000 0100 1101 1100 0000 1110 0000 1111 1111 1110 0000 1100 0000 1111 1111 1111 1111 000010001 0000 0001 00000100 00000100 0010 0000 0000 1011 0000 1101 0000 1011 0100 0111 0000 0100 1110 0000 1111 1111 0110 0000 1100 0011 1111 1111 1100 0000 1111 1111 1111 3333 3355 3333 3355 3333 3355 3333 3355 3333 3555 3333 3355 3333 3555 3355 5555 5555 55335553 3333 5533 3333 5553 3333 5533 3333 5533 3333 5533 3333 5533 3333 fiv44V vju aov fiavpuooag sof xiyyjg DOF GS 6 qu L 8ZI 6 4aydvyD Error Protection and Handling 179 Data Array in Noncorrection Mode
96. processor does not cache any lines of DMA area B If the processor and the DMA operations are performed in sequence the following could occur 1 Dueto speculative execution at the exit of the code loop the line of data beyond the end of the memory area A that is the starting line of memory area B is refilled to the cache This cache line is then marked Dirty 2 The DMA operation starts writing noncoherent data into memory area B 3 Acacheline replacement is caused by later activities of the processor in which the cache line is written back to the top of area B Thus the first line of the DMA area B is overwritten by old cache data resulting in incorrect DMA operation and data The OS can restrict the writable pages for each user process and so can prevent a user process from interfering with an active DMA space The kernel on the other hand retains xkphys and kseg0 addresses in registers There is no write protection against the speculative use of the address values in these registers User processes which have pages mapped to physical spaces not in RAM may also have side effects These side effects can be avoided if DMA is coherent Speculative Instruction Fetch The change in a cache line s state due to a speculative instruction fetch is not reversed if the speculation is aborted This does not cause any problems visible to the program except during a noncoherent memory
97. processor with the DCOK pin until the processor has stabilized NOTE The input voltage required for the DCOK is 3 3V in either the CMOS TTL or the HSTL configuration Both DCOk pins must be tied together externally Maximum Operating Conditions Table 12 1 shows the maximum conditions under which the processor operates Table 12 1 Maximum Operating Conditions Parameter Symbol Value Core Supply Voltage Vcc 3 6 volts Output Supply VecQ HTSL 1 6 volts Voltage VecQ CMOS TTL 3 6 volts Case Temperature Tc 20 to85 C Applied Input Voltage Vin 0 5 to Vec 0 5 volts Maximum Power PR10000 30 watts PCIk Frequency f 200 MHz Errata Revised Case Temperature in Table 12 1 above MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 212 Chapter 12 Input Signal Level Sensing Mode Definitions Vref SC Sys Version 2 0 of January 29 1997 The processor input signals are all received by CMOS receivers that are compatible with either HSTL or CMOS TTL logic levels The I O levels are defined by VrefSC and VrefSys according to the appropriate logic family HSTL or CMOS TIL The mode bit ODrainSys is provided to select the characteristics of the pad ring When asserted this mode bit tristates the PMOS pullup devices to select system interface output drivers This mode is included to allow for multiprocessor systems to use a GTL like open drain configuration wit
98. program order execution 370 outgoing buffer 89 91 92 outstanding requests 87 overflow FP 310 P package configuration 219 package see CLGA PAddr0 field 258 PAddri field 258 page address 328 offset 328 size code 328 defined 328 virtual 328 page table entry PTE array 241 PageMask register 242 328 329 parity protection 173 PCIK signal 61 80 366 367 MIPS R10000 Microprocessor User s Manual I 11 PE bit 256 performance branch prediction 31 cache 31 R10000 28 31 Performance Counter interrupt 244 Performance Counter register 264 permanent register 372 PEN bits 240 fields in EntryLo registers 240 phase locked loop 158 physical address 187 188 physical memory addresses 328 physical page frame number 239 physical register see also logical register 375 PIdx primary cache index 67 pipeline 17 definition of 370 fetch 6 17 floating point 7 floating point multiplier 6 integer ALU 6 latency 370 Load Store Unit 6 out of order execution 370 repeat rate 370 sequence 370 stage definition 370 stage 1 17 18 stage 2 17 stages 4 6 18 stalls 13 PLL 158 PLLDis signal 43 213 PLLRC capacitor 221 PLLSpare signals 213 PM field 256 power interface signals see also individual signals 38 power supply levels DC 210 regulation 217 power on reset 159 sequence 160 PrcElmReq mode bit 123 153 164 198 Version 2 0 of January 29 1997 I 12 PrcReqMax mode bits 93 113
99. request grant and release As described earlier in this chapter the System interface resides in either master or slave state the processor enters slave state during all of the reset sequences When mastership of the System interface changes there is always one dead SysClk cycle during which the bidirectional signals are not driven the processor ignores all bidirectional signals during this dead SysClk cycle The protocol supports overlapped arbitration which allows arbitration to occur in parallel with requests and responses This results in fewer wasted cycles when mastership of the System interface changes Grant parking is also supported allowing a device to retain mastership of the System interface as long as no other device requests the System interface In multiprocessor systems using the cluster bus the external arbiter typically implements a round robin priority scheme MIPS R10000 Microprocessor User s Manual System Interface Operations 109 System Interface Arbitration Rules The rules for the System interface arbitration are listed below MIPS R10000 Microprocessor User s Manual If the System interface is in slave state and a processor request or coherency data response is ready for issue and the required resources are available e g a free request number SysRdRdy asserted etc the processor asserts SysReq The processor will not assert SysReq unless all of the above conditions are met The proce
100. request address cycle when the following are true e the System interface is in master state e SysRdRdy was asserted two SysClk cycles previously the maximum number of outstanding processor requests specified by the PrcReqMax mode bits is not exceeded e there is a free request number e the processor is not the target of a conflicting outstanding external coherency request A single processor may have as many as four processor upgrade requests outstanding on the System interface at any given time MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 122 Chapter 6 Figure 6 14 depicts four processor upgrade requests Since the System interface is initially in slave state the processor must first assert SysReq and then wait until the external agent relinquishes mastership of the System interface by asserting SysGnt and SysRel Cycle 425590 4 55 8 5 0 7 9 129 10 11512 13 014 45 16 17 SysCIk AAAA AAAA EQ T MO XNe NA Tm Master EA EA EA EA Po Pg Po Py Po Po Pg Po Po Po Po Po SysReq SysGnt SS cmm Re me Syshel re eee na XD AI am SysCmd 11 0 So Mug XCUgd CUgd Kh i X 3 SysCmdPar es ee es SysAD 63 0 i Adr X i Adr X Adr X Adr X SysADChk 7 0 X X K K K X SysVal M y a8 ae o M EE SysRdRdy SysWrRdy SysState 2 0 SysStat
101. shared processor coherency state response Errata Revised the two footnotes in Table 6 29 above MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 146 External Coherency Request Latency This section describes the R10000 external coherency request latency depicts the following e an external coherency request which targets the processor e the resulting processor coherency state response e the potential processor coherency data response Two external coherency request latency parameters are also defined e the processor coherency state response latency tyes Chapter 6 Figure 6 26 specifies the time from external coherency request to processor coherency state response e the processor coherency data response latency tpcqy specifies the time from the external coherency request to the processor coherency data response if a master or to the assertion of the processor coherency data response indication on SysState 0 if a slave Cycle 4 2 3 4 5 6 7 8 9 10 41 12 13 14 15 16 SysReq SysGnt SysRel SysCmd 11 0 mm SysCmdPar X X mi mmmUZEE bos scs Mem xX SysAD 63 0 CRG I I fT SysADChk 7 0 ooa aa a S a aa a a E BM SysVal b a sa amc NN a NN E sewmy i igi jgl j i iii j SysCIk j j j j j uer y o o SysState 2 0 swsweba Eg SysResp 4 0 ue eee pz Mese cue ome ie FAT SysRespPar e e e SysRespVal tpcer
102. signals secondary cache see also cache secondary 51 SelDVCO signal 43 213 serial operations 23 190 serial operations and CACHE instructions 190 serializing instruction 23 190 setup times AC electrical 216 signal integrity 217 decoupling capacitance 218 maximum input voltage levels 217 power supply regulation 217 reference voltage 217 signals power interface see also individual signals 38 secondary cache interface see also individual signals 39 System interface see also individual signals 41 test interface see also individual signals 43 size page in memory 328 SK bit 256 slave state 81 and flow control 93 soft warm reset 159 163 Soft Reset exception 337 Soft Reset exception 332 software interrupts 106 SP bit 273 sparse encoding protection 173 special interrupt vector 336 specifications test AC electrical 215 speculative branching 374 speculative execution 14 21 374 square root unit FPU 301 SR bit 250 337 339 SS secondary cache size field 256 Version 2 0 of January 29 1997 Index sseg space 321 SSRAM 59 64 address signals 39 clock signals 39 data signals 39 tag signals 40 stage definition of 370 stalls improving performance 13 standard package configuration 219 state master 81 slave 81 state bus signals 42 Status register 171 in FPU see also FSR 304 store conditional 27 store operations FPU registers 305 stores and uncached buffer 55 nonblocking 373 strong orderi
103. state parity is set to 0 The LRU bit does not change If the StateMod of the block to be invalidated 010 Inconsistent the block in the primary data cache must be written back to the secondary cache The address and way in the secondary cache to be written back to are read out of the primary data cache tag address and secondary way fields and all 32 bytes are written back Only the data field of the secondary cache is modified by this instruction since the processor follows state and data subset rules Since the CE bit is not defined in the R10000 processor this instruction no longer has a CPO ECC register mode MIPS R10000 Microprocessor User s Manual CACHE Instructions 193 10 4 Index WriteBack Invalidate S The Index WriteBack Invalidate S instruction sets a block in the secondary cache to Invalid and writes back any dirty data to the System interface unit This operation extends to any blocks in the primary data or instruction caches which are subsets of the secondary cache block The CACHE instruction physical address PA Cachesize 2 Blocksize defines the address and PA 0 defines the way to be invalidated The invalidation occurs in the following sequence 1 The processor reads the STag PIdx and State bits from the secondary cache tag array If State 00 Invalid no further activity takes place If there is a valid entry then the STag is used to interrogate the primary instruction and data caches
104. store conditional issued Stores including store conditional graduated Store conditional issued Store conditional graduated Failed store conditional Floating point instructions graduated Branches resolved Quadwords written back from primary data cache Quadwords written back from secondary cache Correctable ECC errors on secondary cache data TLB refill exceptions Branches mispredicted Instruction cache misses Secondary cache misses instruction Secondary cache load store and cache ops operations Secondary cache misses data Secondary cache way mispredicted instruction External intervention requests Secondary cache way mispredicted data External intervention request is determined to have hit in secondary cache External invalidate requests Functional unit completion cycles External invalidate request is determined to have hit in secondary cache Stores or prefetches with store hint to CleanExclusive secondary cache blocks Errata Errata MIPS R10000 Microprocessor User s Manual Instructions graduated Stores or prefetches with store hint to Shared secondary cache blocks Made various changes to Table 14 18 as indicated by the underlines Note that the updated material reflects the functionality of silicon revision 3 0 and later The status of earlier silicon revisions are documented as silicon errata available on www mips com e The IE bit enables the a
105. tMove to from selects an integer register instead Moved 32 bit data is sign extended in 64 bit register Figure 15 4 Loading and Storing Floating Point Registers in 32 Register Mode Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Floating Point Unit 307 Doubleword load store and move to from instructions load or store an entire 64 bit floating point register as shown in Figure 15 5 32 bit Single Precision 64 bit Double Precision 63 32 31 0 zero 63 32 31 0 Undefined 32 bit Value 64 bit Result Value In MIPS 1 and II ISA arithmetic operations are valid only for even numbered registers Figure 15 5 Operators on Floating Point Registers In MIPS I and MIPS II ISAs all arithmetic instructions whether single or double precision are limited to using even register numbers Load store and move instructions transfer only a single word Even and odd register numbers are used to access the low and high halves respectively of double precision registers When storing a floating point register SWC1 or MFC1 the processor reads the entire register but writes only the selected half to memory or to an integer register Because the register renaming scheme creates a new physical register for every destination it is not sufficient just to enable writing half of the Floating Point register file when loading LWC1 or MTC1 the unchanged half must also be copied into the destination T
106. that matches an attempted reference to a mapped address space e TLB Invalid occurs when a virtual address reference matches a TLB entry that is marked invalid e TLB Modified occurs when a store operation virtual address reference to memory matches a TLB entry which is marked valid but is not dirty the entry is not writable The following three sections describe these TLB exceptions NOTE TLB Refill vector selection is also described earlier in this chapter in the section titled TLB Refill Vector Selection MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 342 TLB Refill Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 The TLB refill exception occurs when there is no TLB entry to match a reference to a mapped address space This exception is not maskable There are two special exception vectors for this exception one for references to 32 bit address spaces and one for references to 64 bit address spaces The UX SX and KX bits of the Status register determine whether the user supervisor or kernel address spaces referenced are 32 bit or 64 bit spaces the TLB refill vector is selected based upon the address space of the address causing the TLB miss user supervisor or kernel mode address space together with the value of the corresponding extended addressing bit in the Status register UX SX or KX The current operating mode of the processor is not impo
107. the MIPS ANDES architecture or Architecture with Non sequential Dynamic Execution Scheduling The R10000 processor has the following major features terms in bold are defined in the Glossary Errata it implements the 64 bit MIPS IV instruction set architecture ISA it can decode four instructions each pipeline cycle appending them to one of three instruction queues it has five execution pipelines connected to separate internal integer and floating point execution or functional units it uses dynamic instruction scheduling and out of order execution it uses speculative instruction issue also termed speculative branching it uses a precise exception model exceptions can be traced back to the instruction that caused them it uses non blocking caches it has separate on chip 32 Kbyte primary instruction and data caches it has individually optimized secondary cache and System interface ports it has an internal controller for the external secondary cache it has an internal System interface controller with multiprocessor support The R10000 processor is implemented using 0 35 micron CMOS VLSI circuitry on a single 17 mm by 18 mm chip that contains about 6 7 million transistors including about 4 4 million transistors in its primary caches Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 5 R10000 Superscalar Pipeline The R10000 superscalar proces
108. the previous write address and the write way is obtained from the previous write way If an overflow occurs when incrementing the previous write address the address wraps to 0 and the way is toggled The write data is identical to the previous write data For proper results an auto increment write must always be proceeded by a normal or auto increment write The external agent issues an auto increment write command by e asserting the auto increment select on SysAD 44 e asserting the Write Read select on SysAD 43 driving the array select on SysAD 42 40 e asserting SysVal for one SysClk cycle Auto increment writes have a repeat rate of one PCIK cycle Figure 18 4 depicts three cache test mode auto increment writes 25134 4 7518945758 79 10 1t 12 13 14 15 16 17 NSE eoe AI e SN SEN eoe Cd NE ONCE NA EA EA EA 1 EA 1 EA i EA EA EA EA EA EA EA EA EA EA EA X InaWr InaWr Mowry Figure 18 4 Cache Test Mode Auto Increment Write Protocol MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 366 Chapter 18 Normal Read Protocol Cycle SysCIk Master SysReset SysGnt SysAD 63 0 SysVal A cache test mode normal read operation reads a selected RAM array The read address way and array are specified by the read command The external agent issues a normal read command by e driving the address on SysAD 57 46 e driving the way on SysAD 45 e negat
109. three instruction queues integer queue e address queue e floating point queue The three instruction queues can issue see the Glossary for a definition of issue one new instruction per cycle to each of the five execution pipelines the integer queue issues instructions to the two integer ALU pipelines e the address queue issues one instruction to the Load Store Unit pipeline e the floating point queue issues instructions to the floating point adder and multiplier pipelines A sixth pipeline the fetch pipeline reads and decodes instructions from the instruction cache 64 bit Integer ALU Pipeline Version 2 0 of January 29 1997 The 64 bit integer pipeline has the following characteristics e it has a 16 entry integer instruction queue that dynamically issues instructions e it has a 64 bit 64 location integer physical register file with seven read and three write ports 32 logical registers see register renaming in the Glossary it has two 64 bit arithmetic logic units ALUI contains an arithmetic logic unit shifter and integer branch comparator ALU2 contains an arithmetic logic unit integer multiplier and divider MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor Load Store Pipeline The load store pipeline has the following characteristics it has a 16 entry address queue that dynamically issues instructions and uses the integer register file for base and index
110. to the System interface bus through the outgoing buffer MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 92 Uncached Buffer Version 2 0 of January 29 1997 Chapter 6 Each quadword of the outgoing buffer maintains an Uncorrectable Error flag If an uncorrectable error is encountered while a block is being cast out of the secondary cache the associated outgoing buffer quadword Uncorrectable Error flag is asserted When the processor empties an outgoing buffer entry by issuing a processor block write or coherency data response the outgoing buffer Uncorrectable Error flags are reflected by the data quality indication on SysCmd 5 The System interface contains an uncached buffer to provide buffering for uncached and uncached accelerated load and store operations All operations retain program order within the uncached buffer The uncached buffer is organized as a 4 entry FIFO followed by a 2 entry gatherer Each gathered entry has a capacity of 16 or 32 words as specified by the SCBlkSize mode bit The uncached buffer begins gathering when an uncached accelerated double or singleword block aligned store is executed Gathering continues if the subsequent uncached operation executed is an uncached accelerated double or singleword store to a sequential or identical address Once a second uncached accelerated store is gathered the gathering mode is determined to be sequential or identical Gather
111. type Indication 0 Request data 1 Response data 2 Request last 3 Response last During data cycles of an external block data response or processor coherency data response SysCmd 2 1 contain the state of the cache block as shown in Table 6 18 Table 6 18 Encoding of SysCmd 2 1 for Block Data Responses SysCmd 2 1 Cache Block State 0 Reserved 1 Shared 2 CleanExclusive 3 DirtyExclusive During data cycles SysCmd 0 specifies whether ECC checking and correcting is to be performed for the SysAD 63 0 bus as shown in Table 6 19 During processor data cycles the processor asserts SysCmd 0 Consequently in a multiprocessor system using the cluster bus ECC checking and correcting will be enabled for external block data responses resulting from processor coherency data responses Table 6 19 Encoding of SysCmd 0 for External Data Cycles SysCmd 0 ECC check indication 0 ECC checking and correcting disable 1 ECC checking and correcting enable Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 101 SysCmd 11 0 Map Table 6 20 presents a map for the SysCmd 11 0 bus Table 6 20 SysCmd 11 0 Map Cycle SysCmd 11 0 Bit Command Type 7 6 5 4 Coherent block read shared 0 0 Coherent block read exclusive 0 0 Block state Request number Read cause Noncoherent block rea
112. unless the instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set The process executing at the time of a Trap exception is handed a UNIX SIGFPE FPE INTOVF TRAP floating point exception integer overflow signal This error is usually fatal MIPS R10000 Microprocessor User s Manual CPU Exceptions 349 System Call Exception Cause A System Call exception occurs during an attempt to execute the SYSCALL instruction This exception is not maskable Processing The common exception vector is used for this exception and the Sys code in the Cause register is set The EPC register contains the address of the SYSCALL instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction If the SYSCALL instruction is in a branch delay slot the BD bit of the Status register is set otherwise this bit is cleared Servicing When the System Call exception occurs control is transferred to the applicable system routine Additional distinctions can be made by analyzing the Code field of the SYSCALL instruction bits 25 6 and loading the contents of the instruction whose address the EPC register contains To resume execution the EPC register must be altered so that the SYSCALL instruction does not re execute this is accomplished by adding a value of 4 to th
113. valid User mode virtual addresses have bits 63 44 equal to 0 an attempt to reference an address with bits 63 44 not equal to 0 causes an Address Error exception Although the system may be in 32 bit mode address logic still generates 64 bit values In this case the high 32 bits must equal the sign bit 31 or an Address Error exception is taken MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 320 Chapter 16 Supervisor Mode Operations 32 bit KSU 01 and EXL 0 and ERL 0 and SX 0 Ox FFFF 0x E000 Ox DFFF 0x C000 Ox BFFF 0x 8000 Ox 7FFF 2 Gbytes Mapped 0x 0000 Supervisor mode is designed for layered operating systems in which a true kernel runs in processor Kernel mode and the rest of the operating system runs in Supervisor mode The processor operates in Supervisor mode when the Status register contains the Supervisor mode bit values shown in Table 16 1 Figure 16 2 shows Supervisor mode address mapping 64 bit KSU 01 and EXL 0 and ERL 0 and SX 1 Ox FFFF FFFF FFFF FFFF 0x FFFF FFFF E000 0000 Ox FFFF FFFF DFFF FFFF csseg 0x FFFF FFFF C000 0000 Ox FFFF FFFF BFFF FFFF 0x 4000 1000 0000 0000 0x 4000 OFFF FFFF FFFF 16 Tbytes sseg Mapped xsseg 0x 4000 0000 0000 0000 Ox 3FFF FFFF FFFF FFFF 0x 0000 1000 0000 0000 Address MOREE ASE 0x 0000 OFFF FFFF FFFF 0x 0000 0000 8000 0000 16 Tbytes pro suseg 0x 0000 0000 7FFF FFFF 0x 0000 0000 0000 0000 Figure 1
114. 000 Microprocessor User s Manual Table of Contents xvii 7 Clock Signals System Interface Clock and Internal Processor Clock Domains sss 156 Secondary Cache Clock rostas ensien esa a eue de eite a e et n c N ih 157 Ph se Locked L00p iae S ONERE SURE Ia en 158 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xviii Table of Contents 8 Initialization Initialization of Logical Registers nnne nene 160 Power On Reset Sequence s ci e OX ERR Ped teh hehe RR eene ee aser d 160 Cold Reset Sequence siio eun REA IA IECUR 162 Soft Reset Sequendce eee erae e e UR n rr P n e ter ener eene 163 Mode Bits eek E Geo eite REGI Aet 164 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents xix 9 Error Protection and Handling Correctable Errors nesies iein op bi adie edad das uh o oio deis lh 168 Uncorrectable Ertors cetero este e Rad e I ED Re d 169 Propagation of Uncorrectable Errors sse nennen 170 Cache Error Exception x i ne ener re e eren here iim ete e 171 CPO CacheErr Register EW Bit sssssssesssssseeeeeeeneneneteene netten nennen nnne 172 CPO Status Register DE Bit migre Lie t Oe e he pre n Msi Ee HER E HE RR ned 172 CACHE Instr ctioh eti ertet riim tec ed eia eod e e ode ad nce e e eg 172 Error Protection Schemes Used by R10000 sse eee 173 Darty ect Ee Ee Ete eaten ete ae au uere ere ERE EET
115. 000 Microprocessor User s Manual Version 2 0 of January 29 1997 364 Chapter 18 18 6 Cache Test Mode Protocol This section describes the cache test mode protocol in detail including e normal write protocol e auto increment protocol e normal read protocol e auto increment read protocol Normal Write Protocol Cycle SysCIk Master SysReset SysGnt SysAD 63 0 SysVal Version 2 0 of January 29 1997 A cache test mode normal write operation writes a selected RAM array The write address way array and data are specified in the write command The external agent issues a normal write command by e driving the address on SysAD 57 46 e driving the way on SysAD 45 negating the auto increment select on SysAD 44 e asserting the Write Read select on SysAD 43 driving the array select on SysAD 42 40 e driving the write data on SysAD 39 0 e asserting SysVal for one SysClk cycle Normal writes have a repeat rate of 8 PCIK cycles Figure 18 3 depicts two cache test mode normal writes EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA EA mW oF mw Figure 18 3 Cache Test Mode Normal Write Protocol MIPS R10000 Microprocessor User s Manual Cache Test Mode 365 Auto Increment Write Protocol Cycle SysClk Master SysReset SysGnt SysAD 63 0 SysVal A cache test mode auto increment write operation writes a selected RAM array The write address is obtained by incrementing
116. 1 CohPrcReqTar mode bit 102 149 152 164 cold reset 159 sequence 162 Cold Reset exception 332 Compare register 106 244 completing an instruction 371 completion definition of 373 condition bit dependencies 14 Condition field FP 310 conditional move instruction FP 313 Config register 256 conflicts coherency 143 internal 143 TLB avoiding 330 Context register 241 259 context switch 330 control registers FPU 308 controller TAP 204 coordinator cluster 81 COP1 instructions 357 COP instructions 357 Coprocessor 0 see also CPO 235 Coprocessor 1 see also CP1 COP1 251 Coprocessor 2 see also CP2 COP2 251 Coprocessor 3 see also CP3 COP3 251 Coprocessor Unusable exception 352 correctable error 168 Count register 106 244 CPO coprocessor 0 235 MIPS R10000 Microprocessor User s Manual Index branch on CP0 instructions 285 hazards 285 instructions 285 357 load hazards 285 move instructions 286 registers list of 236 csseg space 321 CT bit 256 CTM mode bit 166 361 362 CU coprocessor usability field 246 248 251 CVT L fmt instruction 312 D D dirty bit 239 data cache see also cache primary data 48 data dependencies 20 data path secondary cache 10 data quality indication 92 DBRC field 261 DC characteristics of I O signals 214 DC electrical specifications 210 input and output 214 input level sensing 212 maximum operating conditions 211 mode definitions 212 power supply levels 210 unus
117. 1 0 Secondary cache block former state 0 Invalid 1 Shared 2 CleanExclusive 3 DirtyExclusive When SysStateVal is negated SysState 0 indicates if a processor coherency data response is ready for issue Table 6 25 presents the encoding of the SysState 2 0 bus when SysStateVal is negated Table 6 25 Encoding of SysState 2 0 When SysStateVal Negated SysState 2 1 Reserved SysState 0 Processor coherency data response indication 0 Not ready for issue 1 Ready for issue MIPS R10000 Microprocessor User s Manual System Interface Operations SysResp 4 0 Encoding 6 14 Interrupts Hardware Interrupts 105 An external agent issues an external completion response by driving the request number associated with the corresponding request on SysResp 4 2 driving the completion indication on SysResp 1 0 and asserting SysRespVal for one SysClk cycle Table 6 26 presents the encoding of the SysResp 4 0 bus Table 6 26 Encoding of SysResp 4 0 SysResp 4 2 Request number SysResp 1 0 Completion indication 0 Acknowledge ACK 1 Error ERR 2 Negative acknowledge NACK 3 The processor supports five hardware two software one timer and one nonmaskable interrupt The Interrupt exception is described in Chapter 17 the section titled Interrupt Exception Five hardware interrupts are accessible to an external agent via external interrupt requests An external interrupt request cons
118. 108 361 SCData 107 364 SCData 104 365 SysAD 63 366 SysAD 62 367 SysAD 61 370 SysAD 58 371 SysAD 57 372 SysAD 56 373 SysAD 55 376 SysAD 52 377 SysAD 51 378 SysAD 50 379 SysAD 49 382 SysADChk 6 383 SysADChk 5 384 SysADChk 4 385 SysADChk 3 388 SysADChk 0 389 SysAD 31 390 SysAD 30 391 SysAD 29 394 SysAD 26 395 SysAD 25 396 SysAD 24 397 SysAD 23 400 SysAD 20 401 SysAD 19 402 SysAD 18 403 SysAD 17 406 SCData 102 407 SCData 101 408 SCData 100 418 SCBAddr 13 419 SCBAddr 14 420 SCBAddr 15 ti Will be eliminated after rev 1 2 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 12 Electrical Specifications This chapter contains the following electrical and signal information about the R10000 processor e DC electrical specification e AC electrical specification signal integrity issues MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997209 210 Chapter 12 12 1 DC Electrical Specification This section describes the following DC electrical characteristics of the R10000 processor e DC power supply levels e DCOk and power supply sequencing e maximum operating conditions e input signal level sensing e mode definitions e Vref SC Sys unused inputs DC input output specifications DC Power Supply Levels The processor core is powered by a 3 3V 5 supply The processor output drivers are pow
119. 115 SysAD 42 116 SysAD 43 117 SysAD 44 118 SysAD 45 SysAD 46 SysAD 47 121 SCData 72 122 SCData 73 123 SCData 74 124 SCData 75 SCData 76 SCData 77 127 SCData 78 128 SCData 79 129 SCAAddr 0 130 SCAAddr 1 SCAAddr 2 SCAAddr 3 133 SCAAddr 4 134 SCAAddr 5 135 SCAAddr 6 136 SCAAddr 7 SCAAddr 8 SCADWay 139 SCADCS 140 SCADOE 141 SCADWr 142 SCData 80 SCData 81 SCData 82 145 SCData 83 146 SCData 84 147 SCData 85 148 SCData 86 SCData 87 SCData 88 151 SCData 89 152 SCData 90 153 SCData 91 154 SCData 92 SCData 93 SCData 94 157 SCData 95 158 SCDataChk 6 159 SCDataChk 8 160 Sparel SCTCS SCTOE 163 SCTWr 164 SCTag 25 165 SCTag 24 166 SCTag 23 SCTag 22 SCTag 21 169 SCTag 20 170 SCTag 19 171 SCTag 18 172 SCTag 17 SCTag 16 SCTag 15 175 SCTag 14 176 SCTag 13 177 SCTag 12 178 SCTag 11 SCTag 10 SCTag 9 181 SCTag 8 182 SCTag 7 183 SCTag 6 184 SCTag 5 SCTag 4 SCTag 3 187 SCTag 2 188 SCTag 1 189 SCTag 0 190 SCTagLSBAddr TriState SCTWay 193 SCTagChk 6 194 SCTagChk 5 195 SCTagChk 4 196 SCTagChk 3 SCTagChk 2 SCTagChk 1 199 SCTagChk 0 200 SysCmd 0 201 SysCmd 1 202 SysCmd 2 SysCmd 3 SysCmd 4 205 SysCmd 5 206 SysCmd 6 207 SysCmd 7 208 SysCmd 8 SysCmd 9 SysCmd 10 211 SysCmd 11 212 SysCmdPar 213 SysVal 214 SysReq SysRel SysGnt 217 SysReset 218 SysRespVal 219 SysRes
120. 160 165 SCCIkTap 157 166 SCCorEn 165 177 179 SCSize 51 60 165 SysClkDiv 80 156 160 165 mode definitions DC 212 Move from CPO instruction 285 Move from performance counter instruction 295 Move from performance event specifier instruction 295 move instruction FP 313 Move to CPO instruction 285 Move to performance counter instruction 295 Move to performance event specifier instruction 295 Move To From the Performance Counter instructions 204 MP field 261 MTCO instruction 69 286 296 multiple matches in TLB 330 multiplier pipeline 6 multiply unit FPU 301 multiprocessor system 35 arbitration 111 cluster bus 35 with external agent 35 multiprocessor system using dedicated external agents 84 multiprocessor system using the cluster bus 85 N NACK completion response 130 NACK signal 90 MIPS R10000 Microprocessor User s Manual Index NMI see also nonmaskable interrupt 284 NMI bit 249 250 nonblocking cache 25 nonblocking loads and stores 373 Nonmaskable Interrupt NMI exception 106 332 339 normal read cache test mode 366 normal write cache test mode 364 NT compatibility LLAddr register 257 number request 87 O ODrainSys mode bit 166 212 offset in page address 328 op field encoding of CACHE instructions 191 operating conditions AC 215 operating mode Kernel 316 322 Supervisor 316 320 User 316 318 operations FPU 302 ordering memory 15 ordering strong 15 out of
121. 18 adder pipeline 6 divide 18 302 multiplier 18 pipeline 7 queue 6 11 instructions written each cycle 11 number of allowable entries 11 ports 11 sequencing 11 registers 303 rounding mode 311 square root 18 Floating Point exception 353 Floating Point Status register see also FSR 309 Floating Point Unit see also FPU 301 flow control 93 external data response 93 external request 93 processor coherency data response 93 processor eliminate request 93 processor read request 93 processor upgrade request 93 processor write request 93 signals 41 format TLB entry 329 FPU 301 Active List control of FSR 300 add unit 301 arithmetic instructions 310 MIPS R10000 Microprocessor User s Manual I 7 cause bits FSR 310 changing rounding mode using a CTC1 311 compare 310 condition bits 310 control registers 308 divide unit 301 FGRs general registers 303 FSR Status register in FPU 309 graduation control of FSR 309 instructions processor specific 312 latency 301 logic diagram 302 move to floating point 307 multiply unit 301 operations 302 queue controlling units 303 move unit FPU 302 read ports 302 register file 302 repeat rate 301 rounding modes 311 serial dependency circuit 307 square root unit 301 ER field 248 FrameMask register 240 260 free list 372 freeing the request number with completion response 130 FSR Floating Point Status register cause bits 310 condition bits 310 division by zero 310 enable
122. 29 1997 356 Chapter 17 17 4 MIPSIV Instructions Version 2 0 of January 29 1997 The system must either be in Kernel or Supervisor mode or have set the XX bit of the Status register to a 1 in order to use the MIPS IV instruction set In User mode if XX is a 0 and an attempt is made to execute MIPS IV instructions an exception will be taken The type of exception that will be taken depends upon the type of instruction whose execution was attempted a list is given in Table 17 4 Note that operating with MIPS IV instructions does not require that MIPS III instruction set or 64 bit addressing is enabled MIPS IV instructions that use or modify the floating point registers CP1 state are also affected by the CU1 bit of the CPO Status register If CLI1 is not set a Coprocessor Unusable exception may be signaled The Reserved Instruction RI Coprocessor Unusable CU and Unimplemented Operation UO exceptions for MIPS IV instructions are listed in the Table 17 4 below Table 17 4 MIPS IV Instruction Exceptions Exceptions Instructions CU1 MIPS4 RI CPU undefined RI MOVN Z RI MOVT F CU RI PREF CU COPI all instructions UO undefined RI BC cc gt 0 UO C cc gt 0 UO MOVN ZT F UO RECIP RSORT RI COPI1X all instructions CU all instructions RI undefined MIPS R10000 Microprocessor User s Manual CPU Exceptions 357 17 5 COPO Instructions Execution of an RFE instru
123. 3 3 599CLGA Pinout Signal Location Signal Location Signal Location DCOk AF 2 DCOk Buses 22 JICK Muss 33 JIDI Missus 35 JIDO M oisi 31 JIMS AA 34 PLLDis Egg 24 PLLRC A snas 25 PLLSparel Quunat 21 PLLSpare2 p DNE 21 PLLSpare3 Dx 2 PLLSpare4 Bis 21 SCAAddr lt 0 gt E uus 13 SCAAddr lt 1 gt IA Ese 11 SCAAddr lt 2 gt Du 12 SCAAddr lt 3 gt Cus 11 SCAAddr 4 E nk 12 SCAAddr lt 5 gt Bonus 10 SCAAddr lt 6 gt D iic 11 SCAAddr lt 7 gt Qa 10 SCAAddr lt 8 gt Ass 9 SCAAddr lt 9 gt Busse 30 SCAAddr 10 E ee 29 SCAAddr 11 E Lees 31 SCAAddr 12 Dusk 30 SCAAddr 13 cuu 31 SCAAddr 14 EX 30 SCAAddr 15 B2 uu 32 SCAAddr 16 Ds 31 SCAAddr lt 17 gt Bus 33 SCAAddr 18 orte i 32 SCADCS Bins 9 SCADOE Din 9 SCADWay Eds 10 SCADWr Asse 8 SCBAddr lt 0 gt AL 13 SCBAddr lt 1 gt AP argan 12 SCBAddr lt 2 gt AM 12 SCBAddr lt 3 gt AR 11 SCBAddr lt 4 gt Ala 12 SCBAddr lt 5 gt AN 11 SCBAddr lt 6 gt AM 11 SCBAddr lt 7 gt AP 10 SCBAddr lt 8 gt AL 11 SCBAddr lt 9 gt AL 29 SCBAddr lt 10 gt AB 30 SCBAddr lt 11 gt AM 30 SCBAddr lt 12 gt AR 31 SCBAddr lt 13 gt AL sos 30 SCBAddr lt 14 gt AN pe 31 SCBAddr lt 15 gt AM 31 SCBAddr lt 16 gt AP 32 SCBAddr lt 17 gt AP 2 220002 33 SCBAddr lt 18 gt AN 32 SCBDCS AN 10 SCBDOE AL 10 SCBDWay AR 9 SCBDWr AP uuu 9 SCCIk 0 Bass 13 SCCIk 1 JA ds 26
124. 323 ksseg space 323 KSU field 247 249 332 Version 2 0 of January 29 1997 I 10 kuseg space 323 KX bit 248 316 L latency 29 accessing secondary cache 31 definition of 370 external coherency request 146 FPU 301 least recently used replacement algorithm LRU 9 level sensing input 212 list free 372 LL instruction 27 LLAddr register 257 load hazards CPO 285 load linked 27 load operations FPU registers 305 Load Store Unit pipeline 6 loads nonblocking 373 logic diagram FPU 302 logical register initialization necessity for 160 logical register see also physical register 375 LRU least recently used replacement algorithm 9 M mapped virtual address region 317 mapping table 375 Mask field 242 master state 81 and flow control 93 matches multiple in TLB 330 MemEnd mode bits 165 memory dependencies 14 memory ordering 15 memory protection 328 MECO instruction 286 293 MIPS III ISA disabled and enabled 240 MIPS IV instruction set see also ISA 356 miscellaneous system signals 42 mispredicted branch 31 mode Version 2 0 of January 29 1997 Index addressing 317 addressing encodings 317 Kernel mode 317 Supervisor mode 317 User mode 317 operating 316 mode bits 164 CohPrcReqTar 102 149 152 164 CTM 166 361 362 DevNum 164 Kseg0CA 164 MemEnd 165 ODrainSys 166 212 PrcElmReq 123 153 164 198 PrcReqMax 93 113 115 121 125 164 SCBlkSize 51 60 92 165 SCCIkDiv 61 156
125. 5 86i778 9 10 11 12 13 14 15 16 17 SCCIk ae he an ce one gee ee he ee ae ee one ce an Oe ne an SC A B Addr 18 0 on rr SCTagLSBAddr SC AB DWay D O X 1 scData 127 0 at Dat SCDataChk 9 0 IE SCIABIDOE i i yh SCABIDW A i y 1 scABIDS 3 SCTWay l SCTag 25 0 l SCTagChk 6 0 l SCTOE SCTWI SCTCS Figure 5 8 8 Word Write Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 76 Chapter 5 16 or 32 Word Write Sequence Cycle SCCIk SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE scTwr SCTCS Version 2 0 of January 29 1997 A 16 or 32 word write sequence refills a secondary cache block from the System interface after a secondary cache miss A 16 word write sequence is performed when the secondary cache block size is 16 words and a 32 word write sequence is performed when the secondary cache block size is 32 words Figure 5 9 depicts a secondary cache 16 or 32 word write sequence 9 10 11 12 13 14 15 16 17 Figure 5 9 16 32 Word Write Sequence MIPS R10000 Microprocessor User s Manual Secondary Cache Interface Tag Write Sequence 77 Cycle SCCIk SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS
126. 6 2 Supervisor Mode Address Space 32 bit Supervisor Mode User Space suseg Version 2 0 of January 29 1997 In Supervisor mode when SX 0 in the Status register and the most significant bit of the 32 bit virtual address is set to 0 the suseg virtual address space is selected it covers the full 2 bytes 2 Gbytes of the current user address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space starts at virtual address 0x0000 0000 and runs through Ox7FFF FFFF MIPS R10000 Microprocessor User s Manual Memory Management 321 32 bit Supervisor Mode Supervisor Space sseg In Supervisor mode when SX 0 in the Status register and the three most significant bits of the 32 bit virtual address are 1105 the sseg virtual address space is selected it covers 22 bytes 512 Mbytes of the current supervisor address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address 0xC000 0000 and runs through OxDFFF FFFF 64 bit Supervisor Mode User Space xsuseg In Supervisor mode when SX 1 in the Status register and bits 63 62 of the virtual address are set to 00 selection of the xsuseg virtual address space is dependent upon the UX bit e if UX 1 the entire space from 0x0000 0000 0000 0000 through 0000 OFFF FFFF FFFF 16 Tbytes is selected If
127. 71 execution order 13 execution pipelines 6 execution units iterative 375 execution speculative 20 374 EXL exception level bit 249 254 316 332 external ACK completion response 90 130 external agent 34 35 79 also referred to as cluster coordinator 81 connecting to 81 external allocate request number request protocol 134 external block data response 94 128 protocol 127 external coherency conflicts 144 external coherency request latency 146 external coherency requests action taken 142 external completion response 131 protocol 130 external double single partial word data response protocol external duplicate tags support for 152 external interface 10 memory accesses 32 priority operations 32 external interrupt request 105 protocol 136 external intervention exclusive request 141 external intervention request 133 protocol 133 external intervention shared request 141 external invalidate request 141 protocol 135 external NACK completion response 130 external request 80 87 protocol 132 external response 80 87 protocol 127 MIPS R10000 Microprocessor User s Manual Index F fetch pipeline 6 17 fetching an instruction 371 FGR Floating Point General register 32 bit operations 304 5 bit select 303 64 bit operations 304 load operations 305 operations 304 Status register FR bit 304 store operations 305 Fill field 245 flag uncorrectable error 90 Flag field FP 310 floating point adder
128. 8 Uncached attribute SysAD 57 Secondary cache block way indication SysAD 56 40 Reserved Physical address During the address cycle of processor noncoherent block read double single partial word read block write double single partial word write and eliminate requests the processor always drives a target indication of 0 on SysAD 63 60 This indicates that the request targets the external agent only When the CohPrcReqTar mode bit is negated during the address cycle of processor coherent block read and upgrade requests the processor also drives a target indication of 0 on SysAD 63 60 However when the CohPrcReqTar mode bit is asserted during the address cycle of processor coherent block read and upgrade requests the processor drives a target indication of 0xF on SysAD 63 60 This indicates that the request targets all processors together with the external agent on the cluster bus In multiprocessor systems using the cluster bus the CohPrcReqTar mode bit is asserted for a snoopy based coherency protocol and negated for a duplicate tag or directory based coherency protocol When the processor is in slave state an external agent uses the target indication field to specify which processors are targets of an external request SysAD 59 58 Uncached Attribute Version 2 0 of January 29 1997 During the address cycle of processor double single partial word read and write requests and during the address cycle of process
129. 97 Index Write Indexed TLB Entry instruction 285 Write Random TLB Entry instruction 285 write sequences 73 16 word 76 32 word 76 4 word 74 8 word 75 tag TI X XContext register 259 xkphys decoding virtual address bits VA 61 59 330 space 324 xkseg space 326 xksseg space 324 xkuseg space 324 xsseg space 321 xsuseg space 321 XTLB Refill 333 XTLB refill handler used with XContext register 259 xuseg space 318 319 XX MIPS IV User mode bit 246 248 316 356 MIPS R10000 Microprocessor User s Manual
130. 97 MIPS R10000 Microprocessor User s Manual Table of Contents xxiii 13 Packaging R10000 Single Chip Package 599CLGA cecccesesessesescscesesescetessesesesnenssescecenesesssnananeneseanenenenessees 220 Mechanical Characteristics esses eese eene nnne tes ettet tras ete nettes iia 220 Electrical Characteristics cccccccccccssccssscessecessecessecssecessecsscecasesecessecessecssseesseceaeeceseeenaeesees 221 Thermal Characteristics reete tre ierit etre ires ed deer tede ee eo reu o Peers 222 Assembly Drawings and Pinout List sess 222 599C E GA Pinout sie in se ttt cede bite iicet itete i Goa dada Pe er Dol Ga Pose dits 224 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xxiv Table of Contents 14 Coprocessor 0 Index Register 0 erunt ra a een mte dan ned iren ester RRE tade 237 Random Register 1 iem ie en mee e e nee EE EP d e d 238 EntryLo0 2 and EntryLol 3 Registers sees eene eene 239 Context 4 een tte generen dene euet eet eere e even ete eee eee eer einen ein Ret 241 PageMask Register 5 ret ep rd ieri e clus Laces ipee pene ee ats 242 Wired Registet 6 ee tente e e Re hh er eter Pa e HIP te i fades 243 BadVAddr Register 8 cen ee eret e eee rne de Prid ees 244 Count and Compare Registers 9 and 11 244 Entry Hi Register 10 3 1er epp ftp Te RE God pte hits ee tend 245 Status Register 12 5 do adde D ier RE
131. B mappings The processor supports system designs utilizing hardware or software for coherentI O The external coherency requests are useful for creating systems with hardware I O coherency and the CACHE instruction is sufficient for creating a system with software I O coherency 6 21 Support for External Duplicate Tags Version 2 0 of January 29 1997 Some system designs implement an external duplicate copy of the secondary cache tags to reduce the coherency request latency and also filter out unnecessary external coherency requests made to the R10000 processor For such systems it must be remembered that blocks may reside in either the secondary cache or in the outgoing buffer During the address cycle of processor block read requests the secondary cache block former state is provided The external agent may use this information to maintain the external duplicate tags Typically ina multiprocessor system using the cluster bus the cluster coordinator specifies a free request number for an external coherency request However ina system using a duplicate tag or directory based coherency protocol where the CohPrcReqTar mode bit is negated the cluster coordinator may specify a busy request number for an external coherency request providing each targeted R10000 processor has the request number busy due to an outstanding processor coherency request from another processor For example suppose the processor in master state issues a process
132. C instruction tests a floating point condition bit it causes a coprocessor unusable exception if coprocessor 1 is disabled e Coprocessor 2 is defined but does not exist in the R10000 its instructions COP2 LWC2 LDC2 SWC2 SDC2 always cause an exception but the exception code depends upon whether the coprocessor as indicated by CU2 is enabled e Coprocessor 3 has been removed from the MIPS III ISA and is no longer defined If MIPS IV is disabled the coprocessor 3 instruction COP3 always causes a Reserved Instruction exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 252 Chapter 14 14 11 Cause Register 13 The 32 bit read write Cause register describes the cause of the most recent exception Figure 14 13 shows the fields of this register Table 14 12 describes the Cause register fields A 5 bit exception code ExcCode indicates one of the causes as listed in Table 14 13 All bits in the Cause register with the exception of the IP 1 0 bits are read only IP 1 0 are used for software interrupts Table 14 12 Cause Register Fields Field Description Indicates whether the last exception taken occurred in a branch delay slot 1 delay slot 0 normal BD Coprocessor unit number referenced when a Coprocessor CE Unusable exception is taken This bit is undefined for any other exception Indicates an interrupt is pending This bit remains unchanged for NMI Sof
133. EntryLo 33 31 or PFN 39 38 can only be set to all zeroes or all ones In 32 and 64 bit modes the UC and PEN bits of both EntryLo registers are written into the TLB The PFN bits can be masked by setting bits in the FrameMask register described in this chapter but the LIC bits cannot be masked or initialized in 32 bit User or Supervisor modes In 32 bit Kernel mode MIPS III is enabled and 64 bit operations are always available to program the UC bits There is only one G bit per TLB entry and it is written with EntryLo0 0 and EntryLo1 0 on a TLB write MIPS R10000 Microprocessor User s Manual Coprocessor 0 14 4 Context 4 Errata 241 The Context register is a read write register containing the pointer to an entry in the page table entry PTE array this array is an operating system data structure that stores virtual to physical address translations When there is a TLB miss the CPU loads the TLB with the missing translation from the PTE array Normally the operating system uses the Context register to address the current page map which resides in the kernel mapped segment kseg3 The Context register duplicates some of the information provided in the BadVAddr register but the information is arranged in a form that is more useful for a software TLB exception handler Figure 14 4 shows the format of the Context register Table 14 5 describes the Context register fields Context Register 63 23 22 4 3 0 PTEBase BadV
134. ILL ILL RESOP FAULT illegal instruction reserved operand fault signal This error is usually fatal MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 352 Coprocessor Unusable Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 The Coprocessor Unusable exception occurs when an attempt is made to execute a coprocessor instruction for either a corresponding coprocessor unit CP1 or CP2 that has not been marked usable or CPO instructions when the unit has not been marked usable and the process executes in either User or Supervisor mode This exception is not maskable The common exception vector is used for this exception and the CpU code in the Cause register is set The contents of the Coprocessor Usage Error field of the coprocessor Control register indicate which of the four coprocessors was referenced The EPC register contains the address of the unusable coprocessor instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction The coprocessor unit to which an attempted reference was made is identified by the Coprocessor Usage Error field which results in one of the following situations If the process is entitled access to the coprocessor the coprocessor is marked usable and the corresponding user state is restored to the coprocessor If the process is entitled access to the copro
135. Instruction Type Execution Unit Latency Comment Integer Instructions Add Sub Logical Set ALU 1 2 MF MT HI LO ALU 1 2 Shift LUI ALU 1 Cond Branch Evaluation ALU Cond Move ALU MULT ALU 5 6 Latency relative to Lo Hi MULTU ALU 6 7 Latency relative to Lo Hi DMULT ALU 9 10 Latency relative to Lo Hi DMULTU ALU 10 11 Latency relative to Lo Hi DIV DIVU ALU 34 35 Latency relative to Lo Hi DDIV DDIVU ALU 2 66 67 Latency relative to Lo Hi Load not include loads to CP1 Load Store 2 Assuming cache hit Store Load Store Assuming cache hit Floating Point Instructions MTC1 DMTC1 ALU 1 Add Sub Abs Neg Round Trunc Ceil Floor C cond CVT S W CVT S L Repeat rate is on average CVT others Mul MFC1 DMFC1 Cond Move Move DIV S RECIP S DIV D RECIP D SORT S SORT D RSORT S RSORT D Latency is 2 only if the result is used as the MADE RADDERMEX operand specified by fr of another MADD LWCI LDC1 LWXCI LDXCI1 LoadStore 3 1 Assuming cache hit FADD MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 30 Chapter 1 Please note the following about Table 1 2 e For integer instructions conditional trap evaluation takes a single cycle like conditional branches e Branches and conditional moves are not conditionally issued e The repeat rate above for Load Store does not include Load Link and Store Conditional
136. Instructions Operation 32 64 T vAddr amp offset 5 8 offset 5 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA CacheOp op vAddr pAddr Exceptions Coprocessor unusable exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 290 Chapter 14 14 28 DMFCO Instruction Doubl dM F D M FCO System Control Copracessoi D M FCO 31 26 25 21 20 16 15 11 10 0 COPO DMF rt rd 0 010000 00001 000 0000 0000 6 5 5 5 11 Format DMFOO rt rd Description Operation The contents of coprocessor register rd of the CPO are loaded into general register rt This operation is defined for the R10000 operating in 64 bit mode and in 32 bit kernel mode Execution of this instruction in 32 bit user or supervisor mode causes a reserved instruction exception All 64 bits of the general register destination are written from the coprocessor register source The operation of DMFCO on a 32 bit coprocessor 0 register is undefined T data CPR 0 rd T 1 GPR rt data Exceptions Coprocessor unusable exception Reserved instruction exception R10000 in 32 bit user mode R10000 in 32 bit supervisor mode Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 291 14 29 DMTCO Instruction Doubl d M T D MTCO System Control Coprocessor D MTCO 0 000 0000 0000 11 Format
137. Invalidate S 00111 Index Load Tag S 01011 Index Store Tag S 10011 Hit Invalidate S 10111 Hit WriteBack Invalidate S 11011 Index Load Data S 11111 Index Store Data S MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 288 Chapter 14 CACHE ieontnuied CACHE Fill Create Dirty Hit WriteBack and Hit Set Virtual are not supported in the R10000 processor The R10000 processor adds two new CacheOps Index Load Data 1105 and Index Store Data 1115 These changes are also reflected in the CPO TagHi TagLo and ECC registers The primary instruction and data caches have a block size of 16 words and 32 bytes 8 data words respectively NOTE A 32 bit instruction is predecoded into a 36 bit instruction word before entering the primary instruction cache The instruction fetch addresses remain the same and are not affected by the predecode The secondary cache a unified cache has a block size of either 64 or 128 bytes configurated during reset For a cache of DERESE bytes with JPLOCKSIZE bytes per tag VACACHESIZE 2 BLOCKSIZE specifies the block for the primary cache and PACACHESIZE 2 BLOCKSIZE specifies the block for the secondary cache For the Index CacheOps address bit 0 is used to specify the way 0 or 1 for the CacheOp For this reason bit 0 is not checked for alignment type Address Error exception for the Index CacheOps For CacheOps that access data in caches VABLOCKSI
138. January 29 1997 I 16 rules 109 block write request protocol 117 buffers 89 bus encoding description of buses 95 SysAD 102 SysCmd 95 SysResp 105 SysState 104 cached request buffer 89 clock domain 156 cluster bus 82 cluster request buffer 89 coherency 141 coherency conflicts action taken 143 connecting to an external agent 81 connections to various system configurations 83 directory based coherency protocol 153 error handling on buses 181 on SysAD bus 182 on SysCmd bus 181 on SysResp bus 184 on SysState bus 184 schemes 180 error protection for buses 180 schemes 180 external agent 79 external allocate request number request protocol 134 external block data response protocol 127 external coherency requests action taken 142 external completion response protocol 130 external data response flow control 93 94 external double single partial word data response protocol 129 external duplicate tags support for 152 external interrupt request protocol 136 external intervention exclusive request 141 external intervention request protocol 133 external intervention shared request 141 external invalidate request 141 protocol 135 external request 80 87 flow control 93 protocol 132 external response 80 87 protocol 127 Version 2 0 of January 29 1997 Index flow control 93 frequencies 80 grant parking 108 hardware emulation support for 154 1 0 152 incoming buffer 90 internal coherency conflicts 143 in
139. LU The integer queue controls six dedicated ports to the integer register file two operand read ports and a destination write port for each ALU Floating Point Queue The floating point queue issues instructions to the floating point multiplier and the floating point adder The floating point queue contains 16 instruction entries Up to four instructions may be written during each cycle newly decoded floating point instructions are written into empty entries in random order Instructions remain in this queue only until they have been issued to a floating point execution unit The floating point queue controls six dedicated ports to the floating point register file two operand read ports and a destination port for each execution unit The floating point queue uses the multiplier s issue port to issue instructions to the square root and divide units These instructions also share the multiplier s register ports The floating point queue contains simple sequencing logic for multiple pass instructions such as Multiply Add These instructions require one pass through the multiplier then one pass through the adder MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 12 Address Queue Version 2 0 of January 29 1997 Chapter 1 The address queue issues instructions to the load store unit The address queue contains 16 instruction entries Unlike the other two queues the address queue is organized as a circ
140. MI bit is set to 1 during a Soft Reset the NMI bit is set to 0 Processing When a Soft Reset exception occurs the SR bit of the Status register is set distinguishing this exception from a Cold Reset exception When a Soft Reset is detected the processor initializes minimum processor state This allows the processor to fetch and execute the instructions of the exception handler which in turn dumps the current architectural state to external logic Hardware state that loses architectural state is not initialized unless it is necessary to execute instructions from unmapped uncached space that reads the registers TLB and cache contents The Soft Reset can begin on an arbitrary cycle boundary and can abort multicycle operations in progress so it may alter machine state Hence caches memory or other processor states can be inconsistent data cache blocks may stay at the refill state and any cached loads stores to these blocks will hang the processor Therefore CacheOps should be used to dump the cache contents After the processor state is read out the processor should be reset with a Cold Reset sequence t Soft Reset is also known colloquially as Warm Reset MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 338 Servicing Version 2 0 of January 29 1997 Chapter 17 A Soft Reset exception preserves the contents of all registers except for ErrorEPC register which contains the PC ERL bit of
141. MIPS R10000 Microprocessor User s Manual Version 2 0 Copyright 1996 MIPS Technologies Inc ALL RIGHTS RESERVED U S GOVERNMENT RESTRICTED RIGHTS LEGEND Use duplication or disclosure by the Government is subject to restrictions as set forth in FAR 52 227 19 c 2 or subparagraph c 1 ii of the Rights in Technical Data and Computer Software clause at DFARS 252 227 7013 and or in similar or successor clauses in the FAR or the DOD or NASA FAR Supplement Contractor manufacturer is Silicon Graphics Inc 2011 N Shoreline Blvd Mountain View CA 94039 7311 RISCompiler RISC os R2000 R6000 R4000 R4400 and R10000 are trademarks of MIPS Technologies Inc MIPS and R3000 are registered trademarks of MIPS Technologies Inc UNIX is a registered trademark in the United States and other countries licensed exclusively through X Open Company Ltd MIPS Technologies Inc 2011 North Shoreline Mountain View California 94039 7311 http www mips com Acknowledgments This book represents a consortium of efforts and is principally derived from material provided by Randy Martin Yung Chin Chen and Ken Yeager Thanks also to Randy for his many painstaking reviews of this manual Also providing invaluable service were the following Shabbir Latif for once again running point between Engineering and Publications answering questions and presenting tutorials to clarify the complicated details of the R10000 processor operati
142. MIPS R10000 Microprocessor User s Manual CPU Exceptions 333 17 3 TLB Refill Vector Selection In all present implementations of the MIPS III ISA there are two TLB refill exception vectors e one for references to 32 bit address space TLB Refill one for references to 64 bit address space XTLB Refill Table 17 2 lists the exception vector addresses The TLB refill vector selection is based on the address space of the address user supervisor or kernel that caused the TLB miss and the value of the corresponding extended addressing bit in the Status register UX SX or KX The current operating mode of the processor is not important except that it plays a part in specifying in which address space an address resides The Context and XContext registers are entirely separate page table pointer registers that point to and refill from two separate page tables however these two registers share BadVPN2 fields see Chapter 14 for more information For all TLB exceptions Refill Invalid TLBL or TLBS the BadVPN2 fields of both registers are loaded as they were in the R4400 In contrast to the R10000 the R4400 processor selects the vector based on the current operating mode of the processor user supervisor or kernel and the value of the corresponding extended addressing bit in the Status register UX SX or KX In addition the Context and XContext registers are not implemented as entirely separate registers the PTEbase fields are
143. Move MOV and conditional move MOVC MOVN MOVZ are included in these arithmetic operations although no arithmetic is actually performed MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 306 When FR 1 floating point load and stores operate as follows Chapter 15 e Single precision operands are read from the low half of a register leaving the upper half ignored Single precision results are written into the low half of the register The high half of the result register is architecturally undefined in the R10000 implementation it is set to Zero e Double precision arithmetic operations use the entire 64 bit contents of each operand or result register Because of register renaming every new result is written into a temporary register and conditional move instructions select between a new operand and the previous old value The high half of the destination register of a single precision conditional move instruction is undefined shown in Figure 15 5 even if no move occurs Singleword and doubleword loads and stores with the FPU in 32 register mode FR 1 are shown in Figure 15 4 FR 1 Doubleword Load Store 63 0 Memory or 64 bit register LDC1 ft address DMTC1 ft rs 63 0 Load 64 bit Value DMFC1 rt fs SDC1 ft address 63 0 Memoryt or 64 bit register 32 Register Mode Singleword Load Store SWC1 ft address 63 32 31
144. NACK or ERR completion response the secondary cache block state remains Invalid The processor issues a processor block read request with a single address cycle The address cycle consists of the following e negating SysCmd 11 e driving a free request number on SysCmd 10 8 e driving the block read command on SysCmd 7 5 e driving the read cause indication on SysCmd 4 3 driving the secondary cache block former state on SysCmd 2 1 e asserting SysCmd 0 e driving the target indication on SysAD 63 60 driving the secondary cache block way on SysAD 57 driving the physical address on SysAD 39 0 e asserting SysVal The processor may only issue a processor block read request address cycle when the following are true e the System interface is in master state e SysRdRdy was asserted two SysClk cycles earlier e there is no conflicting entry in the outgoing buffer the maximum number of outstanding processor requests specified by the PrcReqMax mode bits is not exceeded e there is a free request number e the processor is not the target of a conflicting outstanding external coherency request MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 114 Chapter 6 A single processor may have as many as four processor block read requests outstanding on the System interface at any given time Figure 6 10 depicts four processor block read requests Since the System interface is initially
145. Not by load store or move Enable bits enable an exception if the corresponding Cause bit is set Flag bits are set whenever the corresponding Cause bit is a 1 These bits are cumulative Once a bit is set it remains set until the FSR is written by a CTC1 instruction E Unimplemented operation This exception is always enabled IEEE 754 Exception bits The following bits may be individually enabled V Invalid operation Z Division by zero Divide unit only O Overflow U Underflow I Inexact operation Result can not be stored precisely Round Mode RM IEEE specification 0 RN Round to nearest representable value If two values are equally near set the lowest bit to zero 1 RZ Round toward Zero Round to the closest value whose magnitude is not greater than the result 2 RP Round to Plus Infinity Round to the closest value whose magnitude is not less than the result 3 RM Round to Minus Infinity Round to the closest value whose magnitude is not greater Figure 15 7 Floating Point Status Register FSR MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 310 Chapter 15 Bit Descriptions of the FSR Description of the bits in the FSR are as follows Condition Bits 31 25 23 The Condition bits indicate the result of floating point compare instructions The active list keeps track of these bits Cause Bits 17 12 Each functional unit can detect exceptional cases in their function codes
146. OFC ALO OJ CO A aS ocoojo AHl OoO OF OC ALO OF OO HT Oo OF Oo HATO OF oO ALO oF oO Aloo e e Bek EE EAE Be Tj e O Ole Ho OFF ALO OFF ALO of RFP ALo Oe ALo Ol Aloo oO nj e e C ce e ce ce Address Error Uncached Accelerated Address Error Uncached Accelerated Address Error Uncached Accelerated Address Error Uncached Accelerated Address Error Reserved Address Error Cacheable Exclusive Write Address Error Cacheable Exclusive Ox Ox 0X Ox Ox 0X 0X OX 0X 0X 0X Ox 0X 0X 0X Ox Ox Ox Ox OX Ox Ox Ox Ox Ox Ox OX Ox 100 OFF 000 100 OFF 000 100 OFF 000 100 OFF 000 100 OFF 000 100 OFF 000 100 OFF 000 FFFFFFFF 00000000 BEL EE BSE 00000000 FFFFFFFF 00000000 FFFFFFFF 00000000 RGEDBZSEOBNEBSESE 00000000 FFFFFFFF 00000000 FFFFFFFF 00000000 BGESBUECOBRORSBUE 00000000 FFFFFFFF 00000000 BEL EE BBS 00000000 BUR ESR RE BORE 00000000 FFFFFFFF 00000000 EGECBZEVERGESRSE 00000000 Boob BSB Bb PE 00000000 3 N Address Error Cacheable Noncoherent Address Error Uncached Address Error Uncached Address Error Uncached Address Error Uncached Address Error Reserved Address Error Reserved t Accessing a reserved space results in undefined behavior Figure 16 4 xkphys Virtual Address Space MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 5 326 Chapter 16 64 b
147. P HR ERE ttt pir de end 246 Status Register Fields ie eee ine e tene en ted hne gent 248 Diagnostic Status Field isesi epini ieena e eee eo Erai e aee EPE aa TE tnter 249 Coprocessor Accessibility niei aE oean ea E e ERA A OAE eene 251 Cause Register 19 ene ons teer eh tee ege odii A be P n A 252 Exception Program Counter 14 sse nennen 254 Processor Revision Identifier PRId Register 15 ssessssseeeseeeeeennen 255 Config Register 16 oie en anter e AENA rnae Sa AREAS ES A SERERA yadeanensfussiane estate 256 Load Linked Address LLAddr Register 17 sse 257 WatchLo 18 and WatchHi 19 Registers sssssssssseeeeeee eene e enn 258 XContext Kegister 20 ute ORI e RE 259 FrameMask Register 21 iier ort tret e t eerte ente er eid irn nd n 260 Diagnostic Register 22 gunna e a e a E nenennn 261 Performance Counter Registers 25 seen e eene nnns 264 ECC Resister 26 fess ceases aal Sean 273 CacheErr Register 27 ignateetn A hdd trae ana ueneno adipe tende 274 CacheErr Register Format for Primary Instruction Cache Errors 274 CacheErr Register Format for Primary Data Cache Errors sssssssssss 275 CacheErr Register Format for Secondary Cache Errors 276 CacheErr Register Format for System Interface Errors 277 TagLo 28 and TagHi 29 Registers sees e eene e eene nnns 278 CacheOp is Index Loa
148. PCIKk cycle MIPS R10000 Microprocessor User s Manual Coprocessor 0 267 Event 1 for Counter 0 Instructions Issued The counter is incremented on each cycle by the sum of the three following events e Integer operations marked as done on the cycle 0 1 or 2 such operations can be marked on each cycle Since these operations all except for MUL and DIV are marked done on the cycle following their being issued to a functional unit this number is nearly identical to the number issued The only difference is that re issues are not counted e Floating point operations marked done in the active list Possible values are 0 1 or 2 Since these operations take more than one cycle to complete it is possible for an instruction to be issued and then aborted before it is counted due to a branch misprediction or exception rollback e Load store instructions first issued to the address calculation unit on the previous cycle Possible values are 0 or 1 Prefetch instructions are counted as issued Load store instructions are counted as being issued only once even though they may have been issued more than one time Any instruction which does not go to the load store unit integer functional unit or FP functional is counted Some of those not counted are nops bc1 f t fl tl break syscall j jal jr jalr cp0 instructions Event 1 for Counter 1 Instruction Graduation The co
149. PN2 0 44 19 4 Figure 14 4 Context Register Format The 0 field in Table 14 5 is revised Table 14 5 Context Register Fields Field Description This field is written by hardware on a miss It contains BadVPN2 the virtual page number VPN of the most recent virtual address that did not have a valid translation Reserved Must be written as zeroes and returns zeroes x when read This field is a read write field for use by the operating system It is normally written with a value that allows the operating system to use the Context register as a pointer into the current PTE array in memory PTEBase The 19 bit BadVPN2 field contains bits 31 13 of the virtual address that caused the TLB miss bit 12 is excluded because a single TLB entry maps to an even odd page pair For a 4 Kbyte page size this format can directly address the pair table of 8 byte PTEs For other page and PTE sizes shifting and masking this value produces the appropriate address MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 242 Chapter 14 14 5 PageMask Register 5 The PageMask register is a read write register used for reading from or writing to the TLB it holds a comparison mask that sets the variable page size for each TLB entry as shown in Table 14 6 Format of the register is shown in Figure 14 5 TLB read and write operations use this register as either a source or a destination when virtual addresses ar
150. R10000 processor ANDES architecture 4 caches 4 execution pipelines 6 overview 4 pipeline stages 5 superscalar pipeline 5 R4000 superpipeline 3 Random entries 243 Random register 238 RE reverse endian bit 247 Read Indexed TLB Entry instruction 285 read port FPU 302 read sequences 68 16 word 71 32 word 71 4 word 69 8 word 70 tag 72 reference voltage 217 Dc 212 register BadVAddr 241 244 259 340 boundary scan JTAG 206 bypass JTAG 205 CacheErr 171 172 174 175 274 Cause 105 106 244 252 254 Compare 106 244 Config 256 Context 241 259 Count 106 244 MIPS R10000 Microprocessor User s Manual Index CPO description of 235 dependency 14 375 Diagnostic 261 ECC 69 74 273 EntryHi 245 EntryLo0 239 EntryLol 239 EPC 254 Error Exception Program Counter ErrorEPC 284 Exception Program Counter EPC 254 file FPU 302 ports 375 FrameMask 240 260 Index 237 instruction JTAG 205 LLAddr 257 logical see also physical register 17 375 PageMask 242 328 Performance Counter 264 permanent 372 physical see also logical register 17 375 Processor Revision Identifier PRId 255 Random 238 renaming 14 372 Status 171 172 ERL bit 316 EXL bit 316 SX bit 326 TS bit 330 USL field 316 UX bit 326 TagHi 69 74 278 TagLo 69 74 278 temporary 372 unnamed 373 WatchHi 258 WatchLo 258 Wired 238 243 write before reading necessity for 160 XContext 259 renaming register 372 repeat rate 29 accessing secondary
151. RL 0 and UX 0 UX 1 Ox FFFF FFFF Ox FFFF FFFF FFFF FFFF Address Error Ox 8000 0000 Ox 0000 1000 0000 0000 Ox 7FFF FFFF Ox 0000 OFFF FFFF FFFF useg epee xuseg Ox 0000 0000 0x 0000 0000 0000 0000 Figure 16 1 User Mode Virtual Address Space The User segment starts at address 0 and the current active user process resides in either useg in 32 bit mode or xuseg in 64 bit mode The TLB identically maps all references to useg xuseg from all modes and controls cache accessibility Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Memory Management 319 32 bit User Mode useg In User mode when UX 0 in the Status register User mode addressing is compatible with the 32 bit addressing model shown in Figure 16 1 and a 2 Gbyte user address space is available labelled useg All valid User mode virtual addresses have their most significant bit cleared to 0 any attempt to reference an address with the most significant bit set while in User mode causes an Address Error exception The system maps all references to useg through the TLB and bit settings within the TLB entry for the page determine the cacheability of a reference 64 bit User Mode xuseg In User mode when UX 1 in the Status register User mode addressing is extended to the 64 bit model shown in Figure 16 1 In 64 bit User mode the processor provides a single uniform virtual address space of 2 bytes labelled xuseg All
152. Req SysGnt SysRel SysAD 63 0 SysRespVal i i es ooms le gt 100ms le gt 100ms Figure 8 2 Cold Reset Sequence Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Initialization 163 8 4 Soft Reset Sequence A Soft Reset sequence is used to reset the external interface of the processor without altering the mode bits while power and SysClk are stable The Soft Reset sequence is as follows e The external agent negates SysGnt and SysRespVal e After waiting at least one SysCIk cycle the external agent asserts SysReset for at least 16 SysClk cycles e The external agent must retain mastership of the System interface refrain from issuing external requests or nonmaskable interrupts and ignore system state bus until the processor asserts SysReq The assertion of SysReq indicates the processor is ready for operation In a cluster arrangement all processors must assert SysReq indicating they are ready for operation During a Soft Reset sequence all external interface state is initialized The internal and secondary cache clocks are not affected by a Soft Reset sequence The general purpose CPO and CP1 registers are preserved as well as the primary and secondary caches A Soft Reset sequence causes a Soft Reset exception in which the Soft Reset exception handler executes instructions from uncached space and uses CACHE instructions to analyze and dump the contents of the primary and seco
153. S stores a quadword of data and 10 check bits into the secondary cache data array It stores a doubleword of data from CPO TagHi and TagLo and pads the remaining doubleword with zeroes This allows the ECC and parity which are based on the quadword to be valid for the doubleword of data stored The address of the quadword stored is defined by the PA of the CACHE instruction and the way is defined by PA 0 The data stored in the non padded doubleword comes from CP0 TagHi and TagLo The check bits are stored from ECC 9 0 The tag array including the MRU bit is left unchanged MIPS R10000 Microprocessor User s Manual 11 JTAG Interface Operation The JTAG interface is implemented according to the standard IEEE 1149 1 test access port protocol specifications Errata The JTAG interface accesses the JTAG controller and instruction register as well as a boundary scan register The TAG operation does not require DCOk to be asserted or SysClk to be running however if DCOk is asserted the SysCIk must run at the specified minimum frequency or the core logic may be damaged MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997203 204 Chapter 11 11 1 Test Access Port TAP TAP Controller Input Version 2 0 of January 29 1997 The test access port TAP consists of four interface signals These signals are used to control the serial loading and unloading of instructions and test data as well as to exe
154. SCCIk ANANIN AINI NA IANS NINSI NINI IO LEE IR SC A B Adar 18 0 Xam hat Kt 1 ECCE SCTagLSBAddr 1 X x SC A B DWay eae quo SCData 127 0 XDaXOyDaXiX SCDataChk 9 0 3 3 X X x SC A BIDOE E ee JE GANE EE SC A B DWr SC A B DCS i i SCTWay X X X X X SCTag 25 0 5 X Tag X Tak X SCTagChk 6 0 1 X X X SCTOE RAMS AM EE ERE SCTWr SCTCS Figure 5 4 8 Word Read Sequence Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 71 16 or 32 Word Read Sequence Cycle SCCIk r SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE SCTWr SCTCS A 16 word read sequence refills the primary instruction cache from the secondary cache after a primary instruction cache miss A 16 word read sequence is also performed when the secondary cache block size is 16 words and a DirtyExclusive secondary cache block must be written back to the System interface A 32 word read sequence is performed when the secondary cache block size is 32 words and a DirtyExclusive secondary cache block must be written back to the System interface Figure 5 5 depicts a secondary cache 16 or 32 word read sequence This is similar to an 8 word read sequence except that more addresses must be issued in order to read the appropriate
155. SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE SCTWr SCTCS MIPS R10000 Microprocessor User s Manual A tag write sequence updates the secondary cache tag array without affecting the data array This sequence is used for the following e to reflect primary cache state changes in the secondary cache for external coherency requests e for the CACHE Index Store Tag S instruction Figure 5 10 depicts the secondary cache tag write protocol Ree BB OB Oe AO AD 2 13 14 15 16 17 EE al NEN Na VE aN LN KATO i i i i i i i i i i i i a ae Se E NES NES I Fk T l l l l Figure 5 10 Tag Write Sequence Version 2 0 of January 29 1997 78 Chapter 5 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 6 System Interface Operations The R10000 System interface provides a gateway between processor with its associated secondary cache and the remainder of the computer system For convenience any device communicating with the processor through the System interface is referred to as the external agent MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 199779 80 Chapter 6 6 1 Request and Response Cycles The System interface supports the following request and response cycles e Processor requests are generated by the processor when it requires a system resource e E
156. Stage 3 is also the start of each of the five execution pipelines Stages 4 6 In stages 4 through 6 instructions are executed in the various functional units These units and their execution process are described below Floating Point Multiplier 3 stage Pipeline Single or double precision multiply and conditional move operations are executed in this unit with a 2 cycle latency and a 1 cycle repeat rate The multiplication is completed during the first two cycles the third cycle is used to pack and transfer the result Floating Point Divide and Square Root Units Single or double precision division and square root operations can be executed in parallel by separate units These units share their issue and completion logic with the floating point multiplier Floating Point Adder 3 stage Pipeline Single or double precision add subtract compare or convert operations are executed with a 2 cycle latency and a 1 cycle repeat rate Although a final result is not calculated until the third pipeline stage internal bypass paths set a 2 cycle latency for dependent add or multiply instructions Integer ALU1 1 stage Pipeline Integer add subtract shift and logic operations are executed with a 1 cycle latency and a 1 cycle repeat rate This ALU also verifies predictions made for branches that are conditional on integer register values Integer ALU2 1 stage Pipeline Integer add subtract and logic operations are executed with
157. SysAD 63 60 driving the secondary cache block way on SysAD 57 driving the physical address of the eliminated secondary cache block on SysAD 39 0 asserting SysVal The processor may only issue a processor eliminate request address cycle when the following are true MIPS R10000 Microprocessor User s Manual the System interface is in master state SysWrRdy was asserted two SysCIk cycles previously the PrcElmReq mode bit is asserted the processor is not the target of a conflicting outstanding external coherency request Version 2 0 of January 29 1997 124 Cycle SysCIk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 Chapter 6 Figure 6 15 depicts three processor eliminate requests Since the System interface is initially in slave state the processor must first assert SysReq and then wait until the external agent relinquishes mastership of the System interface by asserting SysGnt and SysRel 1 2 3 4 5 6 7 8 9 10 BL 12 13 14 15 16 ELE Bo eo ee 8 ge a8 pig ud aea asa c peat at PEAS EAS EA OCEA X e sop Foro Ps Jace Pose Poi Pat Poni Foe Per cu Ee i j i N i i i CE jCER X A A jen k X Adr X Adr X i i X Adr X Figure 6 15 Processor Elimin
158. System Interface Secondary Cache Ctlr 19109 S e Ta oT 128 bit refil 128 bit refill or writeback g 9 Instruction Cache Data Cache 2647 _ S o 32 Kbytes 32 Kbytes tJ 292 2 way Set Associative 2 way Set Associative Dat FP O E 5 16 word blocks S Ban a E 5 i o Unaligned access 8 word blocks 128410 E 3 tt o EE Four 32 bit instr fetch Addr 64 bitload or store i Secondary Cache 512 Kbytes to 16 Mbytes Synchronous Static RAM 4 Mbyte cache requires Address TLB ten 256Kx18 bit Queue RAM chips P Integer h Queue System Bus 64 bit data 8 bit check 12 bit command gt Adr Calc gt ALU1 Registers A o o o E st gt ALU2 Instruction Decode Register Mapping Adder Registers R10000 Figure 1 5 Block Diagram of the R10000 Processor Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor Functional Units The five execution pipelines allow overlapped instruction execution by issuing instructions to the following five functional units two integer ALUs ALU1 and ALU2 the Load Store unit address calculate the floating point adder the floating point multiplier There are also three iterative units to compute more complex results Integer multiply and divide operations are performed by an Integer Multiply Divide executi
159. TCA signal 43 213 TCB signal 43 213 temporary register 372 test access port TAP 204 test interface signals see also individual signals 43 test mode cache 361 362 test signals miscellaneous 43 Timer interrupt 106 disabling 244 TLB 329 32 bit mode entry format 329 64 bit mode entry format 329 address translation avoiding multiple matches 330 ASID field 330 avoiding conflict 330 Cache Algorithm fields 329 entry formats 329 exceptions 341 Global G bit 330 ITLB 330 misses 241 multiple matches avoiding 330 number of entries 320 page size code 328 used with Context register 241 TLB Translation Lookaside Buffer 7 JTLB 330 TLB Invalid exception 189 341 343 TLB Modified exception 341 344 TLB Probe TLBP instruction 237 245 MIPS R10000 Microprocessor User s Manual TLB Read TLBR instruction 237 TLB Read Indexed TLBR instruction 245 TLB Refill 333 TLB Refill exception 189 341 342 TLB Write Indexed TLBWI instruction 237 245 TLB Write Random instruction 238 245 TLBP instruction 207 TLBR instruction 298 TLBWI instruction 209 TLBWR instruction 300 Translation Look Aside Buffer see also TLB 320 translation virtual address 328 330 Trap exception 348 trap physical address and Watch registers 258 TriState signal 43 206 TS TLB shutdown bit 249 250 TS bit in Status register 330 two level cache structure 45 U UC uncached attribute bit 239 uncached accelerated block
160. Tag I reads the primary instruction cache tag fields into the CPO TagLo and TagHi registers VA 13 6 defines the address and VA 0 defines the way of the tag to be read All parity errors caused by Index Load Tag I are ignored The following mapping defines the operation TagLo 0 Tag parity bit TagLo 2 State parity bit TagLo 3 LRU bit TagLo 6 State bit TagLo 31 8 Tag 35 12 TagHi 3 0 Tag 39 36 All other CPO TagLo and TagHi bits are set to 0 10 6 Index Load Tag D Index Load Tag D reads the primary data cache tag fields into the CPO TagLo and TagHi registers VA 13 5 defines the address and V A 0 defines the way of the tag to be read All parity errors caused by Index Load Tag D are ignored The following mapping defines the operation TagLo 0 Tag parity bit TagLo 1 SCWay TagLo 2 State parity bit TagLo 3 LRU bit TagLo 7 6 State bits TagLo 31 8 Tag 35 12 TagHi 3 0 Tag 39 36 TagHi 31 29 StateMod bits All other CPO TagLo and TagHi bits are set to 0 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual CACHE Instructions 10 7 Index Load Tag S 195 Index Load Tag S reads the secondary cache tag fields into the CPO TagLo and TagHi registers The PA Cachesize 2 Blocksize defines the address and PA 0 defines the way to be read All parity and ECC errors caused by Index Load Tag D are ignored The following mapping defines the operat
161. The FrameMask register is set to 0 e Branch prediction bits are set to 0 e Performance Counter register Event field is set to 0 e All pending cache errors delayed watch exceptions and external interrupts are cleared The Cold Reset exception is serviced by e initializing all processor registers coprocessor registers caches and the memory system e performing diagnostic tests e bootstrapping the operating system t If SysGnt remains deasserted high while SysReset is asserted the processor interprets this as a Soft Reset exception MIPS R10000 Microprocessor User s Manual CPU Exceptions 337 Soft Reset Exception Cause The Soft Reset exception occurs in response to a Soft Reset See Chapter 8 the section titled Soft Reset Sequence A Soft Reset exception is not maskable The processor differentiates between a Cold Reset and a Soft Reset as follows e A Cold Reset occurs when the SysGnt signal is asserted while the SysReset signal is also asserted e A Soft Reset occurs if the SysGnt signal remains negated when a SysReset signal is asserted In R4400 processor there is no way for software to differentiate between a Soft Reset exception and an NMI exception In the R10000 processor a bit labelled NMI has been added to the Status register to distinguish between these two exceptions Both Soft Reset and NMI exceptions set the SR bit and use the same exception vector During an NMI exception the N
162. This section presents the encoding of the following four System interface buses e SysCmd 11 0 e SysAD 63 0 e SysState 2 0 e SysResp 4 0 SysCmd 11 0 Encoding This section describes address and data cycle encodings for the system command bus SysCmd 11 0 SysCmd 11 Encoding When SysVal is asserted SysCmd 11 indicates whether the SysAD 63 0 bus represents an address or a data cycle as shown in Table 6 2 Table 6 2 Encoding of SysCmd 11 SysCmd 11 Data Address Cycle Indication 0 SysAD 63 0 address cycle 1 SysAD 63 0 data cycle SysCmd 10 0 Address Cycle Encoding During the address cycle of processor read and upgrade requests SysCmd 10 8 contain the request number as shown in Table 6 3 The request number provides a mechanism to associate an external response with the corresponding processor request Table 6 3 Encoding of SysCmd 10 8 for Processor Read and Upgrade Requests SysCma 05 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 96 Version 2 0 of January 29 1997 Chapter 6 During the address cycle of processor requests SysCmd 7 5 contain the command as shown in Table 6 4 Table 6 4 Encoding of SysCmd 7 5 for Processor Requests SysCmd 7 5 Command 0 Coherent block read shared 1 Coherent block read exclusive 2 Noncoherent block read 3 Double single partial word read 4 Block write 5 Double single partial word write 6 Upgrade 7
163. UX 0 the address space 0x0000 0000 0000 0000 through 0000 0000 7FFF FFFF 2 Gbytes is selected Addressing the space ranging from 0000 0000 8000 0000 through 0000 OFFF FFFF FFFF will cause an address error The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address 64 bit Supervisor Mode Current Supervisor Space xsseg In Supervisor mode when SX 1 in the Status register and bits 63 62 of the virtual address are set to 01 the xsseg current supervisor virtual address space is selected The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address 0x4000 0000 0000 0000 and runs through 0x4000 OFFF FFFF FFFF 64 bit Supervisor Mode Separate Supervisor Space csseg In Supervisor mode when SX 1 in the Status register and bits 63 62 of the virtual address are set to 11 the csseg separate supervisor virtual address space is selected Addressing of the csseg is compatible with addressing sseg in 32 bit mode The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address OXFFFF FFFF C000 0000 and runs through OxFFFF FFFF DFFF FFFF MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 322 Kernel Mode Operations Chapter 16 The processor operates in Kernel mode when the Status re
164. V VIYILYN ONI S3 5010NHO3L SdIW 001 I G00 F JYV S3JONYYJTOL WdJ1 MO13MIV NOILdO Q3423dS 3SIMM3HIO SSIINN Z S3HONI NI JYV SNOISN3MIQ I NIN SNIJ OL 3ONvV3V31O SZG 9 SALON M3IA 3dIS HOLWW OL JYNLWa4 NOISQMIX3 NO NOISNSWIG SLL 3AZIHOV OL 3408 9 00S NJ 7009 Z00 9 G00 8 veZo Xv Cp B ee SSS N N oO O SSS s 2 L o ET L L EL 3 M s L CIN NL 4 Y rai Aeee 315 S seo H X Sly H ooz SLL HE OSLZ ogi bz 0001 I 010 X00ZZ 232 MIPS R10000 Microprocessor User s Manual Figure 13 4 599LGA Heatsink Version 2 0 of January 29 1997 233 Packaging S Add 31V ld V9 1669 4319108 ONI S3I9010NHO3L SdIN qQ3SOgdX3 NO 13 3JovJsns S SYSANYOO NI 1d30X3 Q3MOTIV ONVHYSAO ON HLM Lld OL LAD S3dIS H108 VSd NOldvM S00 S3dIS HLO QNnOsNV TIY S3903 Vv3s48 NOLdVM LNOHLIM 00L HSINIJ JdOOSNd 1Sf J HLIM 1331S Q31103 102 YO 1331S SS3INIVIS IVIJ31IVW S00 F J8NV S3ONVMH3IOL Q3l4l03dS 3SIMM3HIO SS3INn Z 2 JOVINAS SJHONI NI 33v SNOISN3MIQ I FHS 3AILOSLONd S310N LI HL OL Q31NnONW S roosje 92 2 9 SOS N d jJOQNVIS 9NIHONI10 413S Wad Xv C i M M oO O H o oO KE c a AE e GZL 00l GZS X X8 OSI Z SLZ O10 FOOL Z Figure 13 5 599LGA Bolster Plate Versi
165. When negated the block size is 16 words and when asserted the block size is 32 words MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 61 5 2 Secondary Cache Interface Frequencies The secondary cache interface operates at the frequency of SCCIk which is derived from PClk The SCCIkDiv mode bits select a PCIk to SCCIk divisor of 1 1 5 2 2 5 or 3 using the formula described in Chapter 7 the section titled Secondary Cache Clock Synchronization between the PCIk and SCCIk is performed internally and is invisible to the system The processor supplies six complementary copies of the secondary cache clock on SCCIk 5 0 and SCCIK 5 0 Errata The outputs and inputs at this interface are triggered by an internal SCCIK The relationship between the internal SCCIk and the external SCCIK 5 0 SCCIK 5 0 can be programmed during boot time by setting the SCCIKTap mode bits see the section titled Mode Bits in Chapter 8 for detail on mode bits MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 62 Chapter 5 5 3 Secondary Cache Indexing Indexing the Data Array Version 2 0 of January 29 1997 The secondary cache data array width is one quadword and therefore PA 3 0 which specify a byte within a quadword are unused by the Secondary Cache interface Since the maximum secondary cache size is 16 Mbytes 8 Mbytes per way each way requires a maximum of 23 bits to
166. ZE 1 2 specifies a word within a block for primary caches and PABLOCKSIZE 1 3 specifies a doubleword in the secondary cache A cache hit accesses the specified cache as normal data references and performs the specified operation if the cache block contains valid data at the specified physical address If the cache line is invalid or contains a differing physical address a cache miss no operation is performed Since the R10000 processor uses 2 way set associative caches the Hit operation performs tag comparison in both ways of the cache No index needs to be provided for such CacheOps If both ways register a hit the execution of the CacheOp is undefined Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 289 CACHE earinied CACHE Write back from the primary data cache goes to the secondary cache and write back from the secondary cache goes to the system interface The primary data cache is written back to the secondary cache before the secondary cache is written back to the system interface the address to be written is based by the cache tag rather than the translated PA from the CacheOp instruction A secondary cache write back also interrogates the primary data cache for any dirty inconsistent data When a line is invalidated in the secondary cache all subset lines in the primary caches are also invalidated CacheOps are serialized with respect to cached loads stores and CPO instructi
167. a 0 into it As in the R4400 Reset Soft Reset NMI exceptions also clear the TS bit e The NMI bit is new to the R10000 processor it distinguishes between Soft Reset and NMI exceptions Both exceptions set the SR bit to 1 the NMI exception sets the NMI bit to 1 whereas the Soft Reset exception sets it to 0 e The CE bit is reserved in the R10000 processor and should be a 0 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 250 Version 2 0 of January 29 1997 24 2 Chapter 14 23 17 16 22 21 20 19 18 pete ts oe Jw ow ce oe 1 1 1 1 1 1 1 Figure 14 12 Diagnostic Status Field Table 14 11 Status Register Diagnostic Status Bits Bit Description Controls the location of TLB refill and general exception BEV vectors 0 gt normal 1 bootstrap This bit is set when a TLB write presents an entry that matches any other virtual page entry in the TLB Should this occur any TLB entries that allow multiple matches even in the TS Wired area are invalidated before this new entry can be written into the TLB This prevents multiple matches during address translation 0 normal 1 TLB shutdown has occurred SR 1 Indicates a Soft Reset or NMI exception 1 Indicates a nonmaskable interrupt has occurred Used to NMI distinguish between a Soft Reset and a nonmaskable interrupt in a Soft Reset exception Hit tag match and valid state or miss indication for last CACHE Hit Invalid
168. a 1 V ns edge rate in the 20 to 80 range of the input signal swing MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 216 Chapter 12 Enable Output Delay Setup Hold Time Asynchronous Inputs Version 2 0 of January 29 1997 Table 12 4 lists the delay setup and hold times for the HTSL version of the processor Table 12 4 AC Characteristics for HSTL Configuration HSTL Minimum Maximum Output delay Setup Hold Table 12 5 lists the delay setup and hold times for the CMOS TTL version of the processor Table 12 5 AC Characteristics for CMOS TTL Configuration LVCMOS Minimum Maximum Output delay Setup Hold The SysReset input can be asserted asynchronously to SysClk but must be negated synchronously with SysClk adhering to the AC electrical specifications listed above MIPS R10000 Microprocessor User s Manual Electrical Specifications 217 12 3 Signal Integrity Issues In this section the following signal integrity considerations are described for a R10000 based system e Power Supply Regulation e Decoupling Capacitance e Reference Voltage e Maximum Input Voltage Levels Output I V Curves e Switching and Slew Rate Characteristics Reference Voltage Most input pins on the processor use a current mirror sense amp with Vref SC Sys supplied to the negative input to provide a single rail input receiver The following input pins are exceptions to thi
169. a 1 cycle latency and a 1 cycle repeat rate Integer multiply and divide operations take more than one cycle Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 19 Address Calculation and Translation in the TLB A single memory address can be calculated every cycle for use by either an integer or floating point load or store instruction Address calculation and load operations can be calculated out of program order Errata The calculated address is translated from a 44 bit virtual address into a 40 bit physical address using a translation lookaside buffer The TLB contains 64 entries each of which can translate two pages Each entry can select a page size ranging from 4 Kbytes to 16 Mbytes inclusive in powers of 4 as shown in Figure 1 6 Exponent Page Size 4 om 16 IT 64 cara 256 aa 1 I x 4 von 16 ees Virtual address VA 11 VA 15 VA PIN 1 6 TLB Page ps Load instructions have a 2 cycle latency if the addressed data is already within the data cache Store instructions do not modify the data cache or memory until they graduate MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 20 Chapter 1 1 7 Implications of R10000 Microarchitecture on Software The R10000 processor implements the MIPS architecture by using the following techniques to improve throughput e superscalar instruction issue speculative execution e non blocking ca
170. a single address cycle this address cycle consists of the following e negating SysCmd 11 driving a request number on SysCmd 10 8 e driving the intervention command on SysCmd 7 5 e driving the ECC check indication on SysCmd 0 e driving the target indication on SysAD 63 60 e driving the physical address on SysAD 39 0 e asserting SysVal An external agent may only issue an external intervention request address cycle when the System interface is in slave state typically a free request number is specified An external agent may have as many as eight external intervention requests outstanding on the System interface at any given time Figure 6 20 depicts three external intervention requests Since the System interface is initially in master state the external agent must first negate SysGnt and then wait until the processor relinquishes mastership of the System interface by asserting SysRel Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 20 External Intervention Request Protocol MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 134 Chapter 6 External Allocate Request Number Request Protocol Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysW
171. access the entire xsuseg space and the 64 bit Kernel mode can access the entire xksseg and xkuseg spaces Access privileges in the R10000 are also dependent on the UX and SX bits e Access to the 64 bit user space in 64 bit Supervisor or Kernel mode xsuseg or xkuseg is controlled by the UX bit If UX 0 the 64 bit Supervisor and Kernel modes can only access the 32 bit user space suseg or kuseg e Access to the 64 bit supervisor space in Kernel mode xksseg is controlled by the SX bit If SX 0 the 64 bit Kernel mode can only access the 32 bit supervisor space ksseg An Address Error exception is taken on an illegal access The R10000 processor implements the same access privileges for 32 bit processor modes as in the R4400 The Table 16 2 summarizes the access privileges for all processor modes in the R10000 processor Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Memory Management Table 16 2 Access Privileges for User Supervisor and Kernel Mode Operations 64 bit Mode 64 bit Virtual Address Usert 32 bit Mode Supervisor Kernel Supervisor Kernel amp ERL 0 327 Kernel amp ERL 1 FFFFFFFF E0000000 TO FFFFFFFF FFFFFFFF FFFFFFFF C0000000 TO FFFFFFFF DFFFFFFF TO EFPEFFEEFPR BEPPERFEB FFFFFFFF 80000000 TO FFFFFFFF 9FFFFFFE COOOOFFF 00000000 TO FFFFFFFF 7FFFFFFE FFFFFFFF A0000000 C0000000 00000000 TO COO00FFE FFFFFFFF
172. ad the PC from the ErrorEPC and clear the ERL bit of the Status register SR gt Otherwise SR 0 load the PC from the EPC and clear the EXL bit of the Status register SR An ERET executed between a LL and SC also causes the SC to fail If there is no exception EXL 0 and ERL 0 in the Status register execution of an ERET instruction is meaningless Execution of an ERET when ERL 0 regardless of the state of EXL sets EXL to 0 anda jump is taken to the address presently held in the EPC register even when there is no exception Operation 32 64 T if SRo 2 1 then PC ErrorEPC SR SRs 3 0 SR o else PC EPC SR SR31 2 0 SRo endif LLbit 0 Exceptions Coprocessor unusable exception Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 293 14 31 MFCO Instruction M F M FCO System Control Coprocessor M FCO 0 000 0000 0000 11 Format MFCO rt rd Description The contents of coprocessor register rd of the CPO are loaded into general register rt Operation 32 T data CPRIO rd T 1 GPR rt data 64 T data CPR O rd T 1 GPR rt data34 datas o Exceptions Coprocessor unusable exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 294 Chapter 14 14 32 Move To From the Performance Counter Errata The R10000 processor defines two pe
173. ads except for SCCIk SCCLK and JTDO Select differential VCO for manufacturing test only SelDYCO This signal must be tied to Vcc Input MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 44 Chapter 3 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 4 Cache Organization and Coherency The processor implements a two level cache structure consisting of separate primary instruction and data caches and a joint secondary cache Each cache is two way set associative and uses a write back protocol that is two cache blocks are assigned to each set as shown in Figure 4 1 and a cache store writes data into the cache instead of writing it directly to memory Some time later this data is independently written to memory A write invalidate cache coherency protocol described later in this chapter is supported through a set of cache states and external coherency requests MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 199745 46 Chapter 4 4 1 Primary Instruction Cache The processor has an on chip 32 Kbyte primary instruction cache also referred to simply as the instruction cache which is a subset of the secondary cache Organization of the instruction cache is shown in Figure 4 1 The instruction cache has a fixed block size of 16 words and is two way set associative with a least recently used LRU replacement algorithm The instruction cache is index
174. adwords 7 bit ECC is made on tag words It allows connection to an external secondary cache that can range from 512 Kbytes to 16 Mbytes using external static RAMs The secondary cache can be organized into either 16 or 32 word blocks and is 2 way set associative Bit definitions are given in Chapter 3 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 11 1 4 Instruction Queues The processor keeps decoded instructions in three instruction queues which dynamically issue instructions to the execution units The queues allow the processor to fetch instructions at its maximum rate without stalling because of instruction conflicts or dependencies Each queue uses instruction tags to keep track of the instruction in each execution pipeline stage These tags set a Done bit in the active list as each instruction is completed Integer Queue The integer queue issues instructions to the two integer arithmetic units ALU1 and ALU2 The integer queue contains 16 instruction entries Up to four instructions may be written during each cycle newly decoded integer instructions are written into empty entries in no particular order Instructions remain in this queue only until they have been issued to an ALU Branch and shift instructions can be issued only to ALU1 Integer multiply and divide instructions can be issued only to ALU2 Other integer instructions can be issued to either A
175. ag and data sections of the two cache banks Load and store instructions begin with a tag check cycle which checks to see if the desired address is already in cache If it is not a refill operation is initiated and this instruction waits until it has completed Load instructions also read and align a doubleword value from the data array This access may be either concurrent to or subsequent to the tag check If the data is present and no dependencies exist the instruction is marked done in the queue e Third the address queue can issue store instructions to the data cache A store instruction may not modify the data cache until it graduates Only one store can graduate per cycle but it may be anywhere within the four oldest instructions if all previous instructions are already completed The access and store ports share four register file ports integer read and write floating point read and write These shared ports are also used for Jump and Link and Jump Register instructions and for move instructions between the integer and register files MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 13 1 5 Program Order and Dependencies From a programmer s perspective instructions appear to execute sequentially since they are fetched and graduated in program order the order they are presented to the processor by software When an instruction stores a new value in its destination register that new value is
176. agHi registers when the CacheOp is an Index Load Store Data e primary instruction cache operation e primary data cache operation e secondary cache operation Primary Instruction Cache Operation If the CacheOp is an Index Load Store Data for the primary instruction cache the TagHi register stores the most significant four bits of a 36 bit instruction as shown in Figure 14 31 the rest of the instruction is stored in the TagLo register 31 0 Inst 31 0 TagLo 32 31 43 0 0 Inst 35 32 TagHi 28 4 Figure 14 31 TagHi Lo Register Fields in Primary Instruction Cache When CacheOp is Index Load Store Data Errata 0 Reserved Must be written as zeroes and returns zeroes when read Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 283 Primary Data Cache Operation If the CacheOp is Index Load Store Data for primary data cache the TagHi register is not used The TagLo registers contains a 32 bit data word for the cache operation as shown in Figure 14 32 Data Word 31 0 TagLo 32 Not Used TagHi eo 32 Figure 14 32 TagHi Lo Register Fields in Primary Data Cache When CacheOp is Index Load Store Data Secondary Cache Operation If the CacheOp is Index Load Store Data for the secondary cache a doubleword of data is required for the CacheOp The TagHi register stores the upper 32 bits of the doubleword and the TagLo register stores the lower 32 bits as shown belo
177. al Characteristics Version 2 0 of January 29 1997 The 599CLGA has lands on a straight 1 27mm 050inch grid It is a cavity down multi layer ceramic package with an integral copper tungsten slug and is designed for use with a socket Preliminary information suggests that the 599CLGA can withstand a force of 100 kilograms applied to the CuW slug without damage and a PWB assembly should insure that this force is not exceeded Drawings for a reference LGA PWB assembly are included in this chapter MIPS R10000 Microprocessor User s Manual Packaging 221 Electrical Characteristics The 599CLGA uses multilayer construction incorporating stripline configuration for signals Multiple planes distribute power and ground throughout the package and provide built in distributed bypass coupling capacitance between the primary power supplies Vcc VccOSC VccOSys and Vss Pads are present on the package body for attaching chip capacitors to provide additional bypass capacitance between the primary power supplies and the PLL power supply VccPa and VssPa and to provide an additional PLL loop filter capacitor PLLRC Chip capacitors on the R10000 are assembled by the chip manufacturer Detailed electrical package characteristics will be provided by the MIPS Semiconductor Partners as they become available The data in Table 13 1 is provided as an estimate of the package parasitics These estimates include the effects of bondwires package traces
178. al coherency request address cycle address alignment Table 6 23 External Coherency Request Address Cycle Alignment Address Bits Which External Request Type Address Alignment Are Tpnored Intervention Quadword 3 0 Invalidate MIPS R10000 Microprocessor User s Manual 5 0 SCBlkSize 0 6 0 SCBlkSize 1 Secondary cache block Version 2 0 of January 29 1997 104 Chapter 6 SysAD 63 0 Data Cycle Encoding SysState 2 0 Encoding Version 2 0 of January 29 1997 During System interface data cycles when less than a doubleword is transferred on the SysAD 63 0 bus the valid byte lanes depend on the request address and the MemEnd mode bit For example consider the data cycle for a byte request whose address modulo 8 is 1 When MemEnd is negated little endian the SysAD 15 8 byte lane is valid When MemEnd is asserted big endian the SysAD 55 48 byte lane is valid The processor provides a processor coherency state response by driving the targeted secondary cache block tag quality indication on SysState 2 driving the targeted secondary cache block former state on SysState 1 0 and asserting SysStateVal for one SysClk cycle Table 6 24 presents the encoding of the SysState 2 0 bus when SysStateVal is asserted Table 6 24 Encoding of SysState 2 0 when SysStateVal Asserted SysState 2 Secondary cache block tag quality indication 0 Tag is good 1 Tag is bad SysState
179. al register physical register values are not overwritten until the physical register is placed in the free list previous values remain unchanged in the original physical registers and these previous values can be restored An instruction can be aborted up until the time it graduates and all register and memory values can be restored to a precise state following any exception This state is restored by unnaming the temporary physical registers assigned to subsequent instructions Registers are unnamed by writing the old destination register into the mapping table and returning the new destination register to the free list Unnaming is done in reverse program order in case a logical register was used more than once After renaming the register files contain only the permanent values which were created by instructions prior to the exception Once an instruction has graduated all previous values are lost A 11 Nonblocking Loads and Stores Loads and stores are nonblocking that is cache misses do not stall the processor All other parts of the processor may continue to work on non dependent instructions while as many as four cache misses are being processed t This same technique is used to reverse mispredicted speculative branches MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 A 374 Appendix A A 12 Speculative Branching Version 2 0 of January 29 1997 Normally about one of every six instruct
180. and interrogate operations The processor can execute tag checks data reads for load instructions or data writes for store instructions When the primary cache is refilled any required data can be streamed directly to waiting load instructions e The external interface can handle up to four non blocking memory accesses to secondary cache and main memory Main memory typically has much longer latencies and lower bandwidth than the secondary cache which make it difficult for the processor to mitigate their effect Since main memory accesses are non blocking delays can be reduced by overlapping the latency of several operations However although the first part of the latency may be concealed the processor cannot look far enough ahead to hide the entire latency Programmers may use pre fetch instructions to load data into the caches before it is needed greatly reducing main memory delays for programs which access memory in a predictable sequence MIPS R10000 Microprocessor User s Manual 2 System Configurations The R10000 processor provides the capability for a wide range of computer systems this chapter describes some of the uni and multiprocessor alternatives MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 199733 34 Chapter 2 2 1 Uniprocessor Systems Version 2 0 of January 29 1997 Ina typical uniprocessor system the System interface of the R10000 processor connects in a point to point fashion wi
181. and vias but not the sockets Table 13 1 R10000 599CLGA Electrical Characteristics Parameter Description Minimum Typical Maximum Lsig Effective signal inductance Mii Signal to signal mutual inductance Cig Signal loading capacitance C Signal to signal mutual capacitance 0 5pF Rig Signal resistance 1300mQ Zo Characteristic impedance Tpa Propagation delay 200ps The copper tungsten slug provided for thermal performance is hard connected to Vss to minimize EMI radiation from the package MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 222 Chapter 13 Thermal Characteristics The 599CLGA incorporates a copper tungsten slug to provide an efficient thermal path from the processor to the heatsink The thermal analysis listed in Table 13 2 gives a preliminary indication of heatsink requirements for the 599CLGA Table 13 2 R10000 599CLGA Thermal Characteristics Preliminary Parameter Description Value i PR10000 Maximum power dissipation 30 watts ant f Oea is used as an example to calculate the ambient temperature T needed Errata Revised Table 13 2 System designers must take care especially in desktop applications to ensure sufficient airflow and heat dissipation surface area to meet the required case to ambient thermal resistance Oca The thermal interface between the package and heatsink is very important Typically grease or compliant mat
182. andom register is set to the value of the upper bound upon system reset This register is also set to the upper bound when the Wired register is written Figure 14 2 shows the format of the Random register Table 14 3 describes the Random register fields Random Register 31 6 5 0 Po i 26 6 Figure 14 2 Random Register Table 14 3 Random Register Field Descriptions Field Description Random TLB Random index Reserved Must be written as zeroes and returns zeroes when read 0 MIPS R10000 Microprocessor User s Manual Coprocessor 0 239 14 3 EntryLo0 2 and EntryLol 3 Registers The EntryLo register consists of two registers with identical formats e EntryLo0 is used for even virtual pages e EntryLol is used for odd virtual pages The EntryLo0 and EntryLo1 registers are read write registers They hold the physical page frame number PFN of the TLB entry for even and odd pages respectively when performing TLB read and write operations Figure 14 3 shows the format of these registers EntryLoO and EntryLo1 Registers 63 62 61 34 33 6 5 5 35570 ucl o PFN e lljo 28 3 114 Figure 14 3 Fields of the EntryLoO and EntryLo1 Registers Table 14 4 Description of EntryLo Registers Fields Field Description UC Uncached attribute PFN Page frame number the upper bits of the physical address C Specifies the TLB page coherency attribute D Valid If this bit is set it indicate
183. arget Cache 00000 Index Invalidate I 00100 Index Load Tag I 01000 Index Store Tag I 10000 Hit Invalidate I 10100 Cache Barrier 11000 Index Load Data I 11100 Index Store Data I 00001 Index WriteBack Invalidate D 00101 Index Load Tag D 01001 Index Store Tag D 10001 Hit Invalidate D 10101 Hit WriteBack Invalidate D 11001 Index Load Data D 11101 Index Store Data D 00011 Index WriteBack Invalidate S 00111 Index Load Tag S 01011 Index Store Tag S 10011 Hit Invalidate S 10111 Hit WriteBack Invalidate S 11011 Index Load Data S 11111 Index Store Data S Version 2 0 of January 29 1997 192 Chapter 10 10 2 Index Invalidate I Index Invalidate I sets a block in the primary instruction cache to Invalid VA 13 6 defines the address and VA 0 defines the way to be invalidated The invalidation takes place by writing the primary instruction cache state bit to 0 Invalid This also sets the instruction cache state parity bit to 0 The LRU bit does not change Parity check is suppressed 10 3 Index WriteBack Invalidate D Version 2 0 of January 29 1997 Index WriteBack Invalidate D sets a block in the primary data cache to Invalid VA 13 5 defines the address and VA 0 defines the way to be invalidated The invalidation takes place by writing the following bits e primary data cache state bits are set to 00 Invalid e the SCWay bit is set to 0 e the StateMod bits 001 Normal e the
184. ary Cache Each secondary cache block is in one of the following four states e Invalid e CleanExclusive e DirtyExclusive e Shared t The precise implementation of the LRU algorithm is affected by the speculative execution of instructions MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 52 Chapter 4 A secondary cache block can be changed from one state to another as a result of any of the following events e primary cache read write miss e primary cache write hit to a Shared or CleanExclusive block e secondary cache read miss e secondary cache write hit to a Shared or CleanExclusive block e a CACHE instruction e external intervention shared request e intervention exclusive request e invalidate request These events are illustrated in Figure 4 7 which shows the secondary cache state diagram CACHE Index WriteBack Invalidate S CACHE Index Store Tag S CACHE Hit Invalidate S CACHE Hit WriteBack Invalidate S Read miss obtained CleanExclusive CACHE Index Store Tag S Clean Exclusive Read hit 4 7 Intervention exclusive hit 7 P nvalidate hit Write hit Intervention shared hit P4 Read hit Intervention shared hit Intervention shared hit Write hit and Upgrade ACK Write miss Read miss obtained DirtyExclusive CACHE Index Store Tag S Read miss obtained Shared CACHE Index Store Tag S Legend Internally initi
185. ate Hit Write Back Invalidate for a CH secondary cache 0 miss 1 hit CE Reserved in the R10000 and should be set to 0 Specifies that cache parity or ECC errors cannot cause DE exceptions 0 2 parity ECC remain enabled 1 gt disables parity ECC 0 Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Coprocessor 0 251 Coprocessor Accessibility Three Status register CU bits control coprocessor accessibility CUO CU1 and CU2 enable coprocessors 0 1 and 2 respectively If a coprocessor is unusable any instruction that accesses it generates an exception The following describes the coprocessor implementations and operations on the R10000 e Coprocessor 0 is always enabled in kernel mode regardless of the CUO bit e Coprocessor 1 is the floating point coprocessor If CU1 is 0 disabled all floating point instructions generate a Coprocessor Unusable exception In MIPS IV the COP3 instruction is replaced with a second floating point instruction COP1X In addition new functions are added to COP1 see Chapter 15 FPU Instructions The floating point branch conditional and compare instructions are expanded to use the eight Floating Point Status register condition bits instead of the original single bit If any of these extra bits are referenced cc 0 when not using the MIPS IV ISA an Unimplemented Instruction exception is taken The integer conditional move MOV
186. ate Request Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 125 Processor Request Flow Control Protocol The processor provides the signals SysRdRdy and SysWrRdy to allow an external agent to control the flow of processor requests SysRdRdy controls the flow of processor read and upgrade requests whereas SysWrRdy controls the flow of processor write and eliminate requests The processor can only issue a processor read or upgrade request address cycle to the System interface if SysRdRdy was asserted two SysClk cycles previously Similarly the processor can only issue the address cycle of a processor write or eliminate request to the System interface if SysWrRdy was asserted two SysClk cycles previously To determine the processor request buffering requirements for the external agent note that the processor can issue any combination of processor requests in adjacent SysClk cycles Also since the System interface operates register to register with the external agent a round trip delay of four SysCIk cycles occurs between a processor request address cycle which prompts the external agent for flow control and the flow control actually preventing any additional processor request address cycles from occurring Consequently if the maximum number of outstanding processor requests specified by the PrcReqMax mode bits is four the external agent must be able to accept at least four processor read or upgrade requ
187. ate renaming hardware and multi port register files A 14 Register Files The R10000 processor has two 64 bit wide register files to store integer and floating point values Each file contains 64 registers The integer register file has seven read and three write ports the floating point register file has five read and three write ports The integer and floating point pipelines each use two dedicated operand ports and one dedicated result port in the appropriate register file The Load Store unit uses two dedicated integer operand ports for address calculation It must also load or store either integer or floating point values sharing a result port and a read port in both register files These shared ports are also used to move data between the integer and floating point register files to store branch and link return addresses and to read the target address for branch register instructions A 15 ANDES Architecture The R10000 processor uses the MIPS ANDES architecture or Architecture with Non sequential Dynamic Execution Scheduling MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 A 376 Appendix A Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Index Index Numerics 16 word cache refill read sequence 71 write sequence 76 32 bit address space 317 mode TLB entry format 329 32 word cache refill read sequence 71 write sequence 76 4 word cache refill read
188. ated action Externally initiated action S Secondary cache Figure 4 7 Secondary Cache State Diagram Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Cache Organization and Coherency 53 4 4 Cache Algorithms The behavior of the processor when executing load and store instructions is determined by the cache algorithm specified for the accessed address The processor supports five different cache algorithms e uncached e cacheable noncoherent e cacheable coherent exclusive e cacheable coherent exclusive on write e uncached accelerated Cache algorithms are specified in three separate places depending upon the access e the cache algorithm for the mapped address space is specified on a per page basis by the 3 bit cache algorithm field in the TLB e the cache algorithm for the kseg0 address space is specified by the 3 bit KO field of the CPO Config register e the cache algorithm for the xkphys address space is specified by VA 61 59 Table 4 1 presents the encoding of the 3 bit cache algorithm field used in the TLB EntryLo0 and EntryLo1 registers CPO Config register KO field for the kseg0 address space and VA 61 59 for the xkphys address space Table 4 1 Cache Algorithm Field Encodings Cache Algorithm 0 Reserved Reserved Uncached Cacheable noncoherent Cacheable coherent exclusive Cacheable coherent exclusive on write
189. ated block As shown in Figure 6 13 the processor issues a processor double single partial word write request with a single address cycle immediately followed by a single data cycle The address cycle consists of the following e negating SysCmd 11 e driving the double single partial word write command on SysCmd 7 5 e driving the write cause indication on SysCmd 4 3 e driving the data size indication on SysCmd 2 0 e driving the target indication on SysAD 63 60 e driving the uncached attribute on SysAD 59 58 e driving the physical address on SysAD 39 0 e asserting SysVal The data cycle consists of the following e asserting SysCmd 11 driving the request last data type indication on SysCmd 4 3 e driving the write data on SysAD 63 0 e asserting SysVal The processor may only issue a processor double single partial word write request address cycle when the System interface is in master state and SysWrRdy was asserted two SysClk cycles previously MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 120 Cycle SysCIk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 Chapter 6 Figure 6 13 depicts three processor double single partial write requests Since the System interface is initially in slave state the processo
190. ation SCData lt 41 gt SCData lt 42 gt SCData lt 43 gt SCData lt 44 gt SCData lt 45 gt SCData lt 46 gt SCData lt 47 gt SCData lt 48 gt SCData lt 49 gt SCData lt 50 gt SCData lt 53 gt SCData lt 56 gt SCData lt 59 gt SCData lt 51 gt SCData lt 60 gt SCData lt 52 gt SCData lt 61 gt SCData lt 62 gt SCData lt 63 gt SCData lt 64 gt SCData lt 65 gt SCData lt 66 gt SCData lt 67 gt SCData lt 68 gt SCData lt 71 gt SCData lt 74 gt SCData lt 77 gt SCData lt 69 gt SCData lt 78 gt SCData lt 70 gt SCData lt 73 gt SCData lt 76 gt SCData lt 79 gt SCData lt 80 gt SCData lt 81 gt SCData lt 82 gt SCData lt 83 gt SCData lt 84 gt SCData lt 85 gt SCData lt 86 gt SCData lt 89 gt SCData lt 92 gt SCData lt 95 gt SCData lt 87 gt SCData lt 96 gt SCData lt 88 gt SCData lt 91 gt SCData lt 94 gt SCData lt 97 gt SCData lt 98 gt SCData lt 99 gt SCData lt 100 gt SCData lt 101 gt SCData lt 102 gt SCData lt 103 gt SCData lt 104 gt SCData lt 107 gt SCData lt 110 gt SCData lt 113 gt SCData lt 105 gt SCData lt 114 gt SCData lt 106 gt SCData lt 115 gt SCData lt 116 gt SCData lt 117 gt SCData lt 118 gt SCData lt 119 gt SCData lt 120 gt SCData lt 121 gt SCData lt 122 gt SCData lt 123 gt SCData lt 124 gt SCDataChk lt 3 gt
191. cache 31 definition of 370 FPU 301 replacement algorithm cache 9 request cycle 80 MIPS R10000 Microprocessor User s Manual 1 13 request number 87 freeing with completion response 130 request outstanding 87 Reserved Instruction exception 351 reset cold 159 162 power on 159 160 soft warm 159 163 response bus signals 42 response cycle 80 revision number R10000 processor 255 RM field FP 311 RN field FP 311 rounding modes in FSR 311 RP reduced power bit 247 RP field FP 311 rules arbitration for System interface 109 RZ field FP 311 S SB secondary cache block size bit 256 SC instruction 27 SC A B Addr signals 39 62 63 SC A B DWay signals 39 62 70 75 SC bit 256 SCADCS signal 39 SCADOE signal 30 SCADWr signal 39 SCBDCS signal 30 SCBDOE signal 39 SCBDWrr signal 39 SCBlkSize mode bits 51 60 92 165 SCCIk frequency 118 139 SCCIK signal 39 61 157 SCCIkDiv mode bits 61 156 160 165 SCCIkTap mode bits 157 166 SCCorEn mode bits 165 177 179 SCData signal 39 SCDataChk bus 176 179 SCDataChk signal 39 scheduling dynamic 371 SCSize mode bits 51 60 165 SCTag signals 40 66 Version 2 0 of January 29 1997 I 14 SCTagChk bus 179 SCTagChk signal 40 SCTagLSBAddr signal 39 63 SCTCS signal 40 SCTOE signal 40 SCTWay signal 40 63 65 70 SCTWTr signal 40 SECDED 173 secondary cache interface signals see also individual
192. cessor but the coprocessor does not exist or has failed interpretation of the coprocessor instruction is possible If the BD bit is set in the Cause register the branch instruction must be interpreted then the coprocessor instruction can be emulated and execution resumed with the EPC register advanced past the coprocessor instruction If the process is not entitled access to the coprocessor the process executing at the time is handed a UNIX SIGILL ILL_PRIVIN_FAULT illegal instruction privileged instruction fault signal This error is usually fatal MIPS R10000 Microprocessor User s Manual CPU Exceptions 353 Floating Point Exception Cause The Floating Point exception is used by the floating point coprocessor This exception is not maskable Processing The common exception vector is used for this exception and the FPE code in the Cause register is set The contents of the Floating Point Control Status register indicate the cause of this exception Servicing This exception is cleared by clearing the appropriate bit in the Floating Point Control Status register MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 354 Watch Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 A Watch exception occurs when a load or store instruction references the physical address specified in the WatchLo WatchHi System Control Coprocessor CPO registers The WatchLo re
193. cessor User s Manual Version 2 0 of January 29 1997 A 372 Appendix A A 9 Free List and Busy Registers A busy bit table indicates whether or not a result has been written into each of the physical registers Each register is initially defined to be busy when it is moved from the free list to the active list the register becomes available not busy when its instruction completes and its result is stored in the register file The busy bit table is read for each operand while an instruction is decoded and these bits are written into the queue with the instruction If an operand is busy the instruction must wait in the queue until the operand is not busy The queues determine when an operand is ready by comparing the register number of the result coming out of each execution unit with the register number of each operand of the instructions waiting in the queue With a few exceptions the integer and address queues have integer operand registers and the floating point queue has floating point operand registers A 10 Register Renaming Version 2 0 of January 29 1997 As it executes instructions the processor generates a myriad of temporary register results These temporary values are stored in register files together with permanent values The temporary values become new permanent values when their corresponding instructions graduate Register renaming is used to resolve data dependencies during the dynamic execution of
194. check must be performed and that the cache block may be returned in either a Shared or Exclusive state Store hits to a Shared block result in a processor upgrade request This indicates to external agents containing caches that the block must be invalidated MIPS R10000 Microprocessor User s Manual Cache Organization and Coherency 55 Uncached Accelerated The R10000 processor implements a new cache algorithm Uncached accelerated This allows the kernel to mark the TLB entries for certain regions of the physical address space or certain blocks of data as uncached while signalling to the hardware that data movement optimizations are permissible This permits the hardware implementation to gather a number of uncached writes together either a series of writes to the same address or sequential writes to all addresses in the block into an uncached accelerated buffer and then issue them to the system interface as processor block write requests The uncached accelerated algorithm differs from the uncached algorithm in that block write gathering is not performed There is no difference between an uncached accelerated load and an uncached load Only word or doubleword stores can take advantage of this mode Stores under the Uncached accelerated cache algorithm bypass the primary and secondary caches Stores to identical or sequential addresses are gathered in the uncached buffer described in Chapter 6 the section titled Uncached Buffer Com
195. ches These microarchitectural techniques have special implications for compilation and code scheduling Superscalar Instruction Issue The R10000 processor has parallel functional units allowing up to four instructions to be fetched and up to five instructions to be issued or completed each cycle An ideal code stream would match the fetch bandwidth of the processor with a mix of independent instructions to keep the functional units as busy as possible To create this ideal mix every cycle the hardware would select one instruction from each of the columns below Floating point divide floating point square root integer multiply and integer divide cannot be started on each cycle The processor can look ahead in the code so the mix should be kept close to the ideal described below Column A Column B Column C Column D Column E add sub FPdiv FPstore shift mul FPsqrt load branch div store logical logical Data dependencies are detected in hardware but limit the degree of parallelism that can be achieved Compilers can intermix instructions from independent code streams Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 21 Speculative Execution Speculative execution increases parallelism by fetching issuing and completing instructions even in the presence of unresolved conditional branches and possible exceptions Following are some suggestions f
196. cific CPU instructions MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 1 2 Chapter 1 1 1 MIPS Instruction Set Architecture ISA MIPS has defined an instruction set architecture ISA implemented in the following sets of CPU designs e MIPS I implemented in the R2000 and R3000 e MIPS II implemented in the R6000 e MIPS III implemented in the R4400 e MIPS IV implemented in the R8000 and R10000 The original MIPS I CPU ISA has been extended forward three times as shown in Figure 1 1 each extension is backward compatible The ISA extensions are inclusive each new architecture level or version includes the former levels Figure 1 1 MIPS ISA with Extensions The practical result is that a processor implementing MIPS IV is also able to run MIPS I MIPS II or MIPS III binary programs without change For more ISA information please refer to the MIPS IV Instruction Set Architecture published by MIPS Technologies and written by Charles Price Contact information is provided both in the Preface and inside the front cover of this manual Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 1 2 What is a Superscalar Processor Pipeline and Superpipeline Architecture Instruction 4 Instruction 3 Superscalar Architecture A superscalar processor is one that can fetch execute and complete more than one instruction in parallel
197. cle Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Error Protection and Handling 9 2 Uncorrectable Errors 169 Uncorrectable errors consist of Errata Primary instruction cache array parity errors Primary data cache array parity errors Secondary cache tag array uncorrectable ECC errors Secondary cache data array uncorrectable ECC errors System interface command bus parity errors System interface address data bus uncorrectable ECC errors System interface response bus parity errors When the processor detects an uncorrectable error a Cache Error exception is posted In general the detection of an uncorrectable error does not disrupt any ongoing operations However the instruction fetch and load store units never use data which contains an uncorrectable error To inform the external agent the processor asserts SysUncErr for one SysClk_ cycle whenever any of the following uncorrectable errors are detected Primary instruction cache tag array parity errors Primary data cache tag array parity errors Secondary cache tag array uncorrectable ECC errors System interface command bus parity errors System interface address data bus external address cycle uncorrectable ECC errors System interface response bus parity errors The processor informs the external agent that an uncorrectable tag error has been detected by asserting SysUncErr for one SysClk cycle
198. contains the ASID from which the translation fault occurred The Random register normally contains a valid location in which to put the replacement TLB entry The contents of the EntryLo registers are undefined The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set A TLB entry is typically marked invalid when one of the following is true a virtual address does not exist e the virtual address exists but is not in main memory a page fault atrap is desired on any reference to the page for example to maintain a reference bit After servicing the cause of a TLB Invalid exception the TLB entry is located with TLBP TLB Probe and replaced by an entry with that entry s Valid bit set MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 344 Chapter 17 TLB Modified Exception Cause Processing Servicing Version 2 0 of January 29 1997 The TLB modified exception occurs when a store operation virtual address reference to memory matches a TLB entry that is marked valid but is not dirty and therefore is not writable This exception is not maskable The common exception vector is used for this exception and the Mod code in the Cause register is set When this exception occurs the BadV Addr C
199. ct between user supervisor and kernel address spaces The intermediate address bits 61 44 must either be all zeros or all ones depending on the address region The TLB does not include virtual address bits 61 59 because these are decoded only in the xkphys region which is unmapped For data cache accesses the joint TLB JTLB translates addresses from the address calculate unit For instruction accesses the JTLB translates the PC address if it misses in the instruction TLB ITLB That entry is copied into the ITLB for subsequent accesses The ITLB is transparent to system software Address Space Identification ASID Global Processes G Avoiding TLB Conflict Version 2 0 of January 29 1997 Each independent task or process has a separate address space assigned a unique 8 bit Address Space Identifier ASID This identifier is stored with each TLB entry to distinguish between entries loaded for different processes The ASID allows the processor to move from one process to another called a context switch without having to invalidate TLB entries The processor s current ASID is stored in the low 8 bits of the EntryHi register These bits are also used to load the ASID field of an entry during TLB refill The ASID field of each TLB entry is compared to the EntryHi register if the ASIDs are equal or if the entry is global see below this TLB entry may be used to translate virtual addresses The ASID comparison is performed
200. ction Decode And Rename Unit The instruction decode and rename unit has the following characteristics e it processes 4 instructions in parallel e it replaces logical register numbers with physical register numbers register renaming it maps integer registers into a 33 word by 6 bit mapping table that has 4 write and 12 read ports it maps floating point registers into a 32 word by 6 bit mapping table that has 4 write and 16 read ports e it has a 32 entry active list of all instructions within the pipeline Branch Unit The branch unit has the following characteristics e it allows one branch per cycle conditional branches can be executed speculatively up to 4 deep e it has a 44 bit adder to compute branch addresses e it has a 4 quadword branch resume buffer used for reversing mispredicted speculatively taken branches Errata e the Branch Return Cache contains four instructions following a subroutine call for rapid use when returning from leaf subroutines e it has program trace RAM that stores the program counter for each instruction in the pipeline External Interfaces The external interfaces have the following characteristics a 64 bit System interface allows direct connection for 2 way to 4 way multiprocessor systems 8 bit ECC Error Check and Correction is made on address and data transfers asecondary cache interface with 128 bit data path and tag fields 9 bit ECC Error Check and Correction is made on data qu
201. ction causes a Reserved Instruction exception in the R10000 processor The execution of undefined COPO functions is undefined in the R10000 processor 17 6 COP1 Instructions The R10000 and R4400 processors do not generate the same exceptions for undefined COP1 instructions In the R4400 processor undefined opcodes or formats in the sub field take an Unimplemented Operation exceptions In the R10000 processor undefined opcodes bits 25 24 are 0 or 1 take Reserved Instruction exceptions and undefined formats bits 25 24 are 2 or 3 take Unimplemented Operation exceptions In MIPS II on an R4400 processor the execution of DMTC1 DMFC1 and L format take Unimplemented Operation exceptions In MIPS II on the R10000 processor the execution of DMTC1 and DMFC1 take Reserved Instruction exceptions The attempted execution of the L format takes an Unimplemented Operation exception when the MIPS III mode is not enabled A CTC1 instruction that sets both Cause and Enable bits also forces an immediate floating point exception the EPC register points to the offending CTC1 instruction 17 7 COP2 Instructions If the CU2 bit of the CPO Status register is not set during an attempted execution of such Coprocessor 2 instructions as COP2 LWC2 SWC2 LDC2 and SDC2 the system takes a Coprocessor Unusable exception In the R4400 processor if the CU2 bit is set COP2 instructions are handled as NOPs the operations of Coprocessor 2 load store instruc
202. cute tests The TAP consists of the following signals JTDI Serial data input Input signal JTDO Serial data output Output signal JTMS Mode select Input signal JICK Clock Input signal The timing and the relationship of the TAP signals follows the IEEE 1149 1 standard protocol The R10000 processor implements the 16 state TAP controller specified by the IEEE 1149 1 standard in the following manner e TheJTMS signal operates the state machine synchronized by the JTCK signal The TAP controller is reset by keeping the JTMS signal asserted through five consecutive edges of JTCK This reset condition sets the reset state of the controller The TAP controller is also reset by asserting SysReset This pin must not be asserted while using the boundary scan register MIPS R10000 Microprocessor User s Manual JTAG Interface Operation 205 11 2 Instruction Register The JTAG instruction register is four bits wide permitting a total of 16 instructions to control the selection of the bypass register the boundary scan register and other data registers The encoding of the instruction register is given in Table 11 1 Table 11 1 JTAG Instruction Register Encoding MSB LSB Selected Data Register 0000 Boundary Scan Register 0001 Sample Preload 0010 to Data Register not used 1110 1111 Bypass Register The 0001 value is provided to represent sample preload but also selects the boundary scan register
203. d 0 1 Double single partial word read 0 1 Data size Block write 1 0 Block state Processor Write cause address Double single partial word write 1 0 Data size cycles Upgrade Request number 1 1 Upgrade cause Block state Reserved Reserved 0 0 Reserved Eliminate Block state Special 1 Reserved Reserved Reserved Double single partial word write Processor data cycles External Block write Coherency data response Intervention shared Intervention exclusive Allocate request number Request number Request number Block state E data cycles Double single partial word data response MIPS R10000 Microprocessor User s Manual Request number quality Data type Version 2 0 of January 29 1997 NOP address eee Invalidate Request number ECC NOP X Special Interrupt ECC Block data response Block state External Data ECC 102 SysAD 63 0 Encoding Chapter 6 This section describes the system address data bus encoding SysAD 63 0 Address Cycle Encoding SysAD 63 60 Table 6 21 presents the encoding of the SysAD 63 0 bus for address cycles Table 6 21 Encoding of SysAD 63 0 for Address Cycles SysAD 63 60 Target Indication SysAD 63 Target processor with DevNum 3 SysAD 62 Target processor with DevNum 2 SysAD 61 Target processor with DevNum 1 SysAD 60 Target processor with DevNum 0 SysAD 59 5
204. d FrameMask register 260 EntryLo0 register 239 329 EntryLol register 239 329 EPC register 254 ERET instruction 292 ERL error level bit 171 249 316 Version 2 0 of January 29 1997 ERR completion response 130 ERR signal 90 error correctable 168 handling 167 protocol 185 levels in the Status register 316 protection 167 schemes used in R10000 173 protection schemes used in R10000 ECC 173 parity 173 sparse encoding 173 uncorrectable 169 handling an 171 limiting the propagation of 170 units that detect and report uncorrectable errors 171 error correcting code see also ECC 173 Error Exception Program Counter ErrorEPC register 284 Event field 265 EW bit in CacheErr register 172 ExcCode field 252 253 exception levels in the Status register 316 exception processing CPU exception types Address Error 340 Breakpoint 350 Bus Error 346 Cache Error 171 345 Coprocessor Unusable 352 Floating Point 353 Integer Overflow 347 Interrupt 355 NMI 339 Reserved Instruction 35 1 Soft Reset 337 System Call 349 TLB 341 TLB Invalid 341 343 TLB Modified 341 344 TLB Refill 341 342 Trap 348 Virtual Coherency 345 Watch 354 exception vector location Reset 332 TLB Refill 332 Version 2 0 of January 29 1997 Index exception vector selection 333 precise handling 15 priority of 333 335 TLB refill vector locations 334 Exception Program Counter EPC register 254 Exception Return instruction 285 executing an instruction 3
205. d Register 31 6 5 0 S we 6 26 Figure 14 7 Wired Register Table 14 7 Wired Register Field Descriptions Field Description Wired TLB Wired boundary Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 244 Chapter 14 14 7 BadVAddr Register 8 The Bad Virtual Address register BadV Addr is a read only register that displays the most recent virtual address that caused either a TLB or Address Error exception The BadV Addr register remains unchanged during Soft Reset NMI or Cache Error exceptions Otherwise the architecture leaves this register undefined Figure 14 8 shows the format of the BadV Addr register BadV Addr Register 63 0 Bad Virtual Address 64 Figure 14 8 BadVAddr Register Format 14 8 Count and Compare Registers 9 and 11 Count 9 The Count and Compare registers are 32 bit read write registers whose formats are shown in Figure 14 9 The Count register acts as a real time timer Like the R4400 implementation the R10000 Count register is incremented every other PCIk cycle However unlike the R4400 the R10000 processor has no Timer Interrupt Enable boot mode bit so the only way to disable the timer interrupt is to negate the interrupt mask bit IM 7 in the Status register This means the timer interrupt cannot be disabled without also disabling the Performance Counter interr
206. d Store Tag c cccescscssesesesseseescecesesescsnsnsesesssnesesesesceseneseseanananeness 278 Primary Instruction Cache Operation sss 279 Primary Data Cache Operation sss 279 Secondary Cache Operation sse nennen 281 CacheOp is Index Load Store Data sse een nennen 282 Primary Instruction Cache Operation sss 282 Primary Data Cache Operation s essere 283 Secondary Cache Operation sse enne 283 ErrorEPG Register 30 aee pete pon edam oae verted 284 CPO InStructiofis 5 eie n eu e ee ee den teer Eb E reete tie Pe TAE 285 Ha zardssst e hae USA ONE LECCE S 285 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents XXU Branch on Coprocessor Ossc seses ie enea eE eao a eTo aE SED EE EEEE A EE aR nenne 285 CPO Move Instruct ONS i gesendet Heise te det ise aa aa a aaa aaa aaa aaa a dee Taaak 286 CACHE Istr cHON saanunna isena eiad aaan A e aa eia ee irinae ia oesie 287 DMEGO Instruction nene lee s an de ec e eds 290 DMICO InStt CtiOn z nine detis ete tere tee tutte cte eet i erben 291 BRET An Str ct Oriack aet etre let eet EM ede adeste Aaea Medos 292 MECO Instructio eR en ee eee 293 Move To From the Performance Counter eese eere entente tet en nenas tenen 294 MTCO Instruction 5 c S ne eet eee i b n Cd cede e 296 TEBP Instructions uii acti e en eret ec
207. d the SysGblPerf signal may be used to guarantee that all cache accesses and uncached stores have been globally performed as described in Chapter 6 the section titled SysGblPerf Signal An uncached load followed by a SYNC instruction may be used to guarantee that all cache accesses uncached accesses and uncached accelerated accesses have been globally performed MIPS R10000 Microprocessor User s Manual Cache Organization and Coherency 4 6 Cache Algorithms and Processor Requests 57 The cache algorithm determines the type of processor request generated for secondary cache load misses secondary cache store misses and store hits Table 4 2 presents the relationship between the cache algorithm and processor requests Table 4 2 Cache Algorithms and Processor Requests read Double single word write for incompletely gathered blocks Partial word write for partial word stores Cache Algorithm Load Miss Store Miss Store Hit Uncached Double single partial word Double single partial NA read word write Cacheable noncoherent Noncoherent block read Noncoherent block read Upgrade if Shared Cacheable coherent Coherent block read Coherent block read 2 Upgrade if Shared exclusive exclusive exclusive ated SoBerent Coherent block read shared onerat block toad Upgrade if Shared exclusive on write exclusive Gather identical or sequential double single word stores in the uncached 1 buffe
208. dary cache controller with a dedicated secondary cache port The cache s tag and data arrays each consist of an external bank of industry standard synchronous SRAM SSRAM This SSRAM must have registered inputs and outputs asynchronous output enables and use the late write protocol data is expected one cycle after the address MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 199759 60 Chapter 5 5 1 Tag and Data Arrays Errata Version 2 0 of January 29 1997 The secondary cache consists of a 138 bit wide data array 128 data bits 9 ECC bits 1 parity bit and a 33 bit wide tag array 26 tag bits 7 ECC bits as shown in Figure 5 1 ECC is supported for both the data and tag arrays to improve data integrity 10 Check Bits 128 Data Bits Data 137 136 127 0 Array 7 Check bits 26 Tag Bits aXq lt q _ Tag 32 25 0 Array 1 ECC Figure 5 1 Secondary Cache Data and Tag Array The secondary cache is implemented as a two way set associative combined instruction data cache which is physically addressed and physically tagged as described in Chapter 4 the section titled Cache Organization and Coherency The SCSize mode bits specify the secondary cache size minimum secondary cache size is 512 Kbytes and the maximum secondary cache size is 16 Mbytes in power of 2 512 Kbytes 1 Mbyte 2 Mbytes etc The SCBIkSize mode bit specifies the secondary cache block size
209. dary scan cells Eg Figure 11 1 JTAG Boundary Scan Cells The main application of the boundary scan register is board level interconnect testing The use of the boundary scan register for applying data to and capturing data from the internal microprocessor circuitry is not supported The boundary scan register list for rev 1 2 of the fab is given in Table 11 2 The TriState signal will be eliminated from the BSR in rev 2 0 of the fab and beyond An additional bit is provided in the boundary scan register to control the direction of bidirectional pins As it is loaded through JTDI this bit is the first bit in the boundary scan chain The logic value of this bit is latched during the Update DR state and sets the direction of all bidirectional pins as follows Value Direction 0 Input 1 Output The value is set to 0 during reset setting all bidirectional pins to input prior to any boundary scan operations MIPS R10000 Microprocessor User s Manual JTAG Interface Operation Table 11 2 Boundary Scan Register Pinlist rev 1 2 207 ti Will be eliminated after rev 1 2 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 Signal Signal Signal Signal Signal Signal Il SCDataChk 1 2 SCData 63 SCData 62
210. ddress cycle of external intervention and invalidate requests SysCmd 10 8 contain the request number as shown in Table 6 11 The request number provides a mechanism to associate a potential processor coherency data response with the corresponding external coherency request Table 6 11 Encoding of SysCmd 10 8 for External Intervention and Invalidate Requests SysCmdrosi During the address cycle of external requests SysCmd 7 5 contain the command as shown in Table 6 12 Table 6 12 Encoding of SysCmd 7 5 for External Requests SysCmd 7 5 Command 0 Intervention shared 1 Intervention exclusive 2 Allocate request number 3 Allocate request number 4 NOP 5 NOP 6 Invalidate 7 Special Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 99 During the address cycle of external special requests SysCmd 4 3 contain the external special cause indication as shown in Table 6 13 This information is used to differentiate between the various external special requests Table 6 13 Encoding of SysCmd 4 3 for External Special Requests SysCmd 4 3 Special Cause Indication Reserved NOP 0 1 a SSCSC S 3 Rem Errata During external address cycles SysCmd 0 specifies whether ECC checking and correcting is to be performed for the SysAD 63 0 bus as shown in Table 6 14 During the address cycle of processor block read data typical bloc
211. dered New Value Strongly Processor A Processor B Ordered No No No Yes No Yes No Yes Yes Yes Yes Yes Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 1 6 R10000 Pipelines Stage 1 Stage 2 17 This section describes the stages of the superscalar pipeline Instructions are processed in six partially independent pipelines as shown in Figure 1 4 The Fetch pipeline reads instructions from the instruction cachet decodes them renames their registers and places them in three instruction queues The instruction queues contain integer address calculate and floating point instructions From these queues instructions are dynamically issued to the five pipelined execution units In stage 1 the processor fetches four instructions each cycle independent of their alignment in the instruction cache except that the processor cannot fetch across a 16 word cache block boundary These words are then aligned in the 4 word Instruction register If any instructions were left from the previous decode cycle they are merged with new words from the instruction cache to fill the Instruction register In stage 2 the four instructions in the Instruction register are decoded and renamed Renaming determines any dependencies between instructions and provides precise exception handling When renamed the logical registers referenced in an instruction are
212. ds in the local CacheErr register see Chapter 14 CacheErr Register 27 for more information If an error is detected on the tag address state or mod array the processor informs the external agent that an uncorrectable tag error was detected by asserting SysUncErr for one SysClk cycle MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 176 Chapter 9 9 11 Secondary Cache Error Protection and Handling Error Protection Error Handling Data Array Errata Version 2 0 of January 29 1997 This section describes error protection and error handling schemes for the secondary cache The secondary cache arrays have the following error protection schemes as listed in Table 9 4 Table 9 4 Secondary Cache Array Error Protection Array Width Error Protection Data 128 bit 9 bit ECC even parity Tag 26 bit 7 bit ECC MRU Way prediction table 1 bit None This section describes error handling for the data array and the tag array As shown in Table 9 4 errors are not detected for the way prediction table The 128 bit wide secondary cache data array is protected by a 9 bit wide ECC An even parity bit for the 128 bits of data is used for rapid detection of correctable single bit errors when a correctable parity error is detected the data is sent through the data corrector The parity bit does not have any logical effect on the processor s ability to either detect or correct errors
213. e EPC register EPC register 4 before returning If a SYSCALL instruction is in a branch delay slot a more complicated algorithm beyond the scope of this description may be required MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 350 Breakpoint Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 A Breakpoint exception occurs when an attempt is made to execute the BREAK instruction This exception is not maskable The common exception vector is used for this exception and the BP code in the Cause register is set The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction If the BREAK instruction is in a branch delay slot the BD bit of the Status register is set otherwise the bit is cleared When the Breakpoint exception occurs control is transferred to the applicable system routine Additional distinctions can be made by analyzing the Code field of the BREAK instruction bits 25 6 and loading the contents of the instruction whose address the EPC register contains A value of 4 must be added to the contents of the EPC register EPC register 4 to locate the instruction if it resides in a branch delay slot To resume execution the EPC register must be altered so that the BREAK instruction does not re execute this is accompli
214. e Specifies the size of the secondary cache SCCIkDiv Sets PCIk to SCCIk ratio determines the secondary cache clock frequency see Chapter 7 the section titled System Interface Clock and Internal Processor Clock Domains Reserved SINDWTIBWNFROIN DUB WNF c 512 Kbyte 1 Mbyte 2 Mbyte 4 Mbyte 8 Mbyte 16 Mbyte Reserved Reserved Reserved Result of division by 1 Result of division by 1 5 Result of division by 2 Result of division by 2 5 Result of division by 3 Reserved Reserved MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 166 Chapter 8 Table 8 1 cont Mode Bits 0 SCCIK same phase as internal clock 1 SCCIk 1 12 PCIk period earlier than internal clock 2 SCCIk 2 12 PCIk period earlier than internal clock 3 SCCIK 3 12 PCIKk period earlier than internal clock 4 SCCIk 4 12 PCIk period earlier than internal clock SCCIkTap 5 SCCIK 5 12 PCIK period earlier than internal clock Les i 6 undefined Specifies the alignment of 7 undefined eee UN TUE EE Ta dus 8 SCCIK 6 12 PCIK period earlier than internal clock eicbodode Y 9 SCCIK 7 12 PCIKk period earlier than internal clock TATEM A SCCIk 8 12 PCIk period earlier than internal clock B SCCIK 9 12 PCIK period earlier than internal clock C SCCIK 10 12 PCIK period earlier than internal clock D SCCIKk 11 12 PCIk period earlier than internal clock E undefined F undefined 29 Reserved 0 ODrainSys Specifies whether or no
215. e handler restores the previous value of the KSU field and sets the EXL bit back to 1 Returning from an exception also resets the EXL bit to 0 see the ERET instruction in Appendix A 17 2 Exception Vector Locations The Cold Reset Soft Reset and NMI exceptions are always vectored to the dedicated Cold Reset exception vector at an uncached and unmapped address Addresses for all other exceptions are a combination of a vector offset and a base address The boot time vectors when BEV 1 in the Status register are at uncached and unmapped addresses During normal operation when BEV 0 the regular exceptions have vectors in cached address spaces Cache Error is always at an uncached address so that cache error handling can bypass a suspect cache The exception vector assignments for the R10000 processor shown in Table 17 1 the addresses are the same as for the R4400 Table 17 1 Exception Vector Addresses Exception Vector Address BEV Exception Type eam 32 bit 64 bit Cold Reset Soft Reset NMI TLB Refill EXL 0 0 0 OxFFFFFFFF 80000000 XTLB Refill EXL 0 0 OxBFCO O OxFFFFFFFF BFC00000 OxFFFFFFFF 80000080 BEV 0 Cache Error 0xA0000100 OxFFFFFFFF A0000100 Others 0 TLB Refill EXL 0 0 0 OxFFFFFFFF BFC00200 XTLB Refill EXL 0 OxFFFFFFFF BFC00280 Cache Error 0 0 OxFFFFFFFF BFC00300 OxFFFFFFFF 80000180 BEV 1 Others OxFFFFFFFF BFC00380 Version 2 0 of January 29 1997
216. e the external agent supplies an external response to the processor coherent block read request that is pending response Intervention shared Intervention exclusive Upgrade Invalidate The external agent responds to the external coherency requestor that the block is Shared At some later time the external agent supplies an external response to the processor upgrade request that is pending response The external agent issues the conflicting external coherency request to the processor The processor allows the conflicting external coherency request to proceed and supplies a Shared processor coherency state response After observing the processor coherency state response the external agent provides an external ACK completion response for the conflicting external coherency request At some later time the external agent supplies an external response for the processor upgrade request that is pending response This external response may not be an external ACK completion response unless it is associated with an external block data response i Although it is not required the external agent may choose to issue the conflicting external coherency request to R10000 and the processor will return an invalid processor coherency state response Although it is not required the external agent may choose to issue the conflicting external coherency request to R10000 and the amp q g amp q processor will return a
217. e 0 1uF and 1 nF low inductance capacitors are placed in parallel with the 10 uF capacitor as close to the R10000 package as possible Vec 10 ohm VccPa 10 UF 0 1 UF 1nF Vss 10 ohm VssPa Figure 12 1 Decoupling VccPa and VssPa Vcc 2 ohm VccPd SERM 10 UF 0 1 UF 1 nF ell Vss 2 ohm VssPd Figure 12 2 Decoupling VccPd and VssPd Decoupling between VccPa and VssPa is far more important than decoupling between VccPd and VssPd if both are not possible Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 13 Packaging The R10000 microprocessor is presently supplied in one standard package configuration asingle chip 599 ceramic LGA Land Grid Array MIPS Licensees are encouraged to develop package solutions with MIPS Semiconductor Partners to meet specific requirements MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997219 220 Chapter 13 13 1 R10000 Single Chip Package 599CLGA The standard single chip R10000 package is a 599CLGA ceramic land grid array as shown in Figure 13 1 The 599CLGA package minimizes output switching noise by reducing the inductance of the power and ground paths leading into the package Much of the decrease in power ground inductance is accomplished by shortening the wire bonds running from the die pads to the package inner leads The 599CLGA is designed with its cavity side down and the die is connected directly to a thermal slug Mechanic
218. e dens 337 INMLIBXCODUOfRi eiit taie ite dtd epi tenere di ege neat 339 Address Error Exception nennen tentent 340 TEB Exceptions s o ettet eade epe tm t s 341 TLB Refill Exception ccccccccccccsssssesesssneeseececeesescenesescsesnsnenessseseesescececenenesesnananenees 342 TLB Invalid Exception e eie a EE i E tta seta desee E EE 343 TLB Modified ExceptioM seniai i e nene nennen 344 Cache Error Exceptions aetee detti Un eer RII EIER OR 345 Virtual Coherency Exception etie e e ee biete ended te i e reat 345 Bus Error Exception ssis eaae eai ae aa aa aa a AR ARR AEI is 346 Integer Overflow EXcePHON i insesti edesin eirate seekar eean enne nennen nennen 347 Trap Exception 1i edet Ree ee e tete HR Rena 348 System Call Exception etnies es ert nie e Prin iia hens 349 Breakpoint Exceplioncc un eget ee eate te em Were heen ai teas 350 Reserved Instruction Exception niea ena ae enen eaei E eSEE RA EEEE eene nenne 351 Coprocessor Unusable Exception isere eie nenne 352 Floating Point Exception nennen nennen nennen 353 Watch Exception i508 sea Risin ien aa Ea EEE E PEREDA ien eint a ih HER an eae 354 Interrupt Exception cede eed dte eden err dense edidere beside esos 355 MIPSIV Instr ctions riesis RP OG RO R ede nonet eds 356 COPO Instr cttOfs oet emere ppt teni re eie e ee n I ee EENE 357 COPITInstr ctons iecit enenatis tdem emn gains 357 COP2 Instr ctiOfis 0 inerte e o e
219. e designed to reject all cache coherent mode references to the memory except to that prior defined safe space in kseg0 within which there is no possibility of an I O DMA transfer Then before the DMA read process and before the cache is flushed for the DMA read buffers the cache attributes in the TLB for the I O buffer address space are changed from noncoherent to uncached After the DMA read the access modes are returned to the cached noncoherent mode Just before load store instruction use a conditional move instruction which tests for the reverse condition in the speculated branch and make all aborted branch assignments safe An example is given below MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 24 Chapter 1 bne r1 rO label movn ra rO r1 test to see if r1 0 if r1 0 then branch is mispredicted move safe address r0 into ra Id r4 0 ra Without the previous movn this Ild could create damaging read In the above example without the MOVN the read to the address in register ra could be speculatively executed and later aborted It is possible that this load could be premature and thus damaging The MOVN guarantees that if there is a misprediction r1 is not equal to 0 ra will be loaded with an address to which a read will not be damaging 6 The following is similar to the conditional move example given above in t
220. e die n e pe i E e IH e ER Hep eter 357 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents 18 Cache Test Mode Interface Signals System Interface Clock Divisor Entering Cache Test Mode Exit Sequence SysAD 63 0 Encoding Cache Test Mode Protocol Normal Write Protocol Auto Increment Write Protocol Normal Read Protocol Auto Increment Read Protocol eese A Glossary Superscalar Processor Pipeline ieiunio ee ht n ite uti bebes ee t s Pipeline Latency esee sei e regt en Re t e ade e EE Pipeline Repeat Rate Out of Order Execution Dynamic Scheduling Instruction Fetch Decode Issue Execution Completion and Graduation Active List Free List and Busy Registers Register Renaming Nonblocking Loads and Stores Speculative Branching Logical and Physical Registers Register Files ANDES Architecture MIPS R10000 Microprocessor User s Manual xxix Version 2 0 of January 29 1997 XXX Table of Contents Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 1 Introduction to the R10000 Processor This user s manual describes the R10000 superscalar microprocessor for the system designer paying special attention to the external interface and the transfer protocols This chapter describes the following e MIPS ISA e what makes a generic superscalar microprocessor e specifics of the R10000 superscalar microprocessor e implementation spe
221. e presented for translation into physical address the corresponding bits in the TLB identify which virtual address bits among bits 24 13 are used in the comparison When the Mask field is not one of the values shown in Table 14 6 the operation of the TLB is undefined The 0 field is reserved it must be written as zeroes and returns zeroes when read PageMask Register 81 25 24 18 12 0 0 MASK 0 7 12 13 Figure 14 5 PageMask Register Table 14 6 Mask Field Values for Page Sizes Page Size Bit Mask 4 Kbytes 16 Kbytes 64 Kbytes 256 Kbytes 1Mbyte 4 Mbytes 16 Mbytes Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 243 14 6 Wired Register 6 The Wired register is a read write register that specifies the boundary between the wired and random entries of the TLB as shown in Figure 14 6 Wired entries are fixed nonreplaceable entries which cannot be overwritten by a TLB write operation Random entries can be overwritten TLB 63 Range of Random entries Wired Register Y Range of Wired entries 0 This entry is Random not Wired Figure 14 6 Wired Register Boundary The Wired register is set to 0 upon system reset Writing this register also sets the Random register to the value of its upper bound see Random register above Figure 14 7 shows the format of the Wired register Table 14 7 describes the register fields Wire
222. e request to indicate that the request was unsuccessful Upon receiving an external ERR completion response the processor takes a Bus Error exception on the associated instruction If the processor read or upgrade request was caused by a PREFETCH instruction no exception is taken Also if the request was caused by a speculative instruction no exception is taken The external agent issues an external NACK completion response for a processor read or upgrade request to indicate that the request was not accepted Upon receiving an external NACK completion response the processor re evaluates the associated instruction Due to the speculative nature of the R10000 processor the re evaluation may or may not result in the reissue of a similar processor request An external ERR or NACK completion response issued in response to an external intervention allocate request number or invalidate has no affect on the processor except to free the request number MIPS R10000 Microprocessor User s Manual System Interface Operations 131 Figure 6 19 depicts a processor upgrade request and a corresponding external completion response Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 ay SysCmdPar c SysAD 63 0 VETE SysADChk 7 0 i3 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 19 External Completion Respon
223. e the primary caches If there is not a hit the instruction ends The processor reads each subset block from the primary instruction cache If there is a match then invalidate the block by writing the IState bit to 0 Invalid and the IState parity bit to 0 Read each subset block from the primary data cache If there is a match then write the DState bits 00 Invalid the StateMod bits 001 Normal the SCWay bit 0 and the DState parity bit 0 If the original State of any subset block is StateMod 010 Inconsistent also write it back to the secondary cache using the DTag and the secondary way bit from the primary data tag array Write the State of the secondary cache block 00 Invalid Since the secondary cache is designed so all tag bits must be written at once the STag PIdx and ECC bits are also written The STag is written with whatever the PA and VA 13 12 of the original CACHE instruction were The Tag ECC is generated If the secondary block s original State bits were 11 Dirty then the block is written back to the system interface unit If the block s State was Shared or CleanExclusive the system interface unit is simply notified that the block has been deleted with a Tag Invalidation request The MRU bit is set to point away from the block invalidated Hit WriteBack Invalidate S set the CH bit if it hits in the secondary cache Once the CH bit is set it stays set until cleared by a MTCO Instruction Hit Cach
224. e to Performance Control 1 11001 11 Move to Performance Counter 1 The performance counters and associated control registers are read by using a MFCO instruction as shown in Table 14 20 Table 14 20 Reading Performance Registers Using MFCO Opcode 15 11 Opcode 1 0 Operation 11001 00 Move from Performance Control 0 11001 01 Move from Performance Counter 0 11001 10 Move from Performance Control 1 11001 11 Move from Performance Counter 1 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 272 Version 2 0 of January 29 1997 Chapter 14 The format of the performance control registers are shown in Table 14 21 Table 14 21 Performance Control Register Format 8 5 4 3 0 t Count enable Event select IP 7 interrupt enable U S K EXL The count enable field specifies whether counting is to be enabled during User Supervisor Kernel and or Exception level mode Any combination of count enable bits may be asserted A unused bits in the performance control registers are reserved All counting is disabled when the ERL bit of the CPO Status register is asserted Table 14 22 defines the operation of the count enable bits of the performance control registers Table 14 22 Count Enable Bit Definition Count Enable Bit Count Qualifier CPO Status Register Fields U KSU 2 User mode EXL 0 ERL 0 S KSU 1 Supervisor mode EXL 0 ERL 0
225. eOps can cause cache error exceptions if they check ECC or parity bits Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual CACHE Instructions 201 10 17 Index Load Data I Index Load Data I loads a single instruction from the primary instruction cache into the CPO TagHi TagLo and ECC registers A predecoded instruction in R10000 is 36 bits of data and one bit of parity The address of the target instruction is VA 13 2 of the CACHE instruction The way of the target instruction is VA 0 of the CACHE instruction The instruction itself is loaded into CPO TagHi 3 0 and TagLo 31 0 The parity bit is loaded into CPO ECCI 0 The tag field is not read Parity checking is suppressed during operation of Index Load Data I 10 18 Index Load Data D Index Load Data D loads a singleword of data and the corresponding four bits of byte parity into CPO TagLo and ECC The address of the target singleword is VA 13 2 of the CACHE instruction The way of the target singleword is V A 0 of the CACHE instruction The singleword of data will be loaded into the CPO TagLo register The byte parity will be loaded into CPO ECC 3 0 register The tag field is not read Parity checking is suppressed during operation of Index Load Data D 10 19 Index Load Data S Index Load Data S loads a doubleword of data and all 10 check bits into the CPO TagHi TagLo and ECC registers The address of the target doublewords comes fro
226. ePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 Figure 6 14 Processor Upgrade Request Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 123 Processor Eliminate Request Protocol A processor eliminate request results from the following a cached instruction fetch load store or prefetch that misses in the secondary cache and forces the replacement of a Shared or CleanExclusive secondary cache block a CACHE Index WriteBack Invalidate S Hit Invalidate S or Hit WriteBack Invalidate S instruction that forces the invalidation of a Shared or CleanExclusive secondary cache block a CACHE Hit Invalidate S instruction that forces the invalidation of a DirtyExclusive secondary cache block A processor eliminate request notifies the external agent that a Shared CleanExclusive or DirtyExclusive block has been eliminated from the secondary cache Such requests are useful for systems implementing a directory based coherency protocol and are enabled by asserting the PrcEImReq mode bit The processor issues a processor eliminate request with a single address cycle This address cycle consists of the following negating SysCmd 11 driving the special command on SysCmd 7 5 driving the eliminate special cause indication on SysCmd 4 3 driving the secondary cache block former state on SysCmd 2 1 asserting SysCmd 0 driving the target indication on
227. ead Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 275 CacheErr Register Format for Primary Data Cache Errors Figure 14 25 shows the format of the CacheErr register when a primary data cache error occurs 31 30 29 28 27 26 2524 232221 2019 14 13 3 2 0 01 EW D TA TS TM 0 Pldx 0 2 1 1 2 2 2 2 6 11 3 Figure 14 25 CacheErr Register Format for Primary Data Cache Errors EW set when CacheErr register is already holding the values of a previous error EE tag error on an inconsistent block D data array error way1 way0 TA tag address array error way1 way0 TS tag state array error way1 ll way0 TM tag mod array error way ll way0 PlIdx primary cache virtual double word index VA 13 6 Errata 0 Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 276 Chapter 14 CacheErr Register Format for Secondary Cache Errors Figure 14 26 shows the format of the CacheErr register when a secondary cache error occurs 31 30 29 28 2726 25 24 23 22 6 5 0 10 EW 0 D 0 Sldx 0 2 1 le 2 2 1 17 6 Figure 14 26 CacheErr Register Format for Secondary Cache Errors EW set when CacheErr register is already holding the values of a previous error D uncorrectable data array error way1 way0 TA uncorrectable tag array error way1 way0 SIdx secondary cache physical block
228. ead write field is normally written with a value that allows the PTEBase i operating system to use the Context register as a pointer into the current PTE array in memory MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 260 Chapter 14 14 18 FrameMask Register 21 The FrameMask register is new with the R10000 processor It masks bits of the EntryLo0 and EntryLo1 registers so that these masked bits are not passed to the TLB while doing a TLB write either TLBWI or TLBWR A zero in the FrameMask register allows its corresponding bit in the EntryLo 1 0 registers to pass to the TLB a one in the FrameMask register masks off its corresponding bit in the EntryLo registers and passes a zero to the TLB Bits 15 0 of the FrameMask register control bits 33 18 of the EntryLo registers The remaining bits of this register are ignored on write and read as zeroes The content of this register is set to zero after a processor reset or a power up event Figure 14 20 shows the FrameMask register format FrameMask Register 31 16 15 0 0 Mask bits PA 39 24 T6 16 Figure 14 20 FrameMask Register Format Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 261 14 19 Diagnostic Register 22 CPO register 22 the Diagnostic register is a new 64 bit register for processor specific diagnostic functions Since this register is designed for local use the diagnostic function
229. eculative Execution Branch prediction can be controlled by the CPO Diagnostic register Branch Likely instructions are always predicted as taken which also means the instruction in the delay slot of the Branch Likely instruction will always be speculatively executed Since the branch predictor is neither used nor updated by branch likely instructions these instructions do not affect the prediction of normal conditional branches Resolving Operand Dependencies Version 2 0 of January 29 1997 Operands include registers memory and condition bits Each operand type has its own dependency logic In the R10000 processor dependencies are resolved in the following manner register dependencies are resolved by using register renaming and the associative comparator circuitry in the queues e memory dependencies are resolved in the Load Store Unit condition bit dependencies are resolved in the active list and instruction queues MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 15 Resolving Exception Dependencies In addition to operand dependencies each instruction is implicitly dependent upon any previous instruction that generates an exception Exceptions are caused whenever an instruction cannot be properly completed and are usually due to either an untranslated virtual address or an erroneous operand The processor design implements precise exceptions by e identifyin
230. ed because of misprediction Note that the misprediction is determined on the same cycle that the conditional branch is resolved The misprediction rate is the ratio of branch mispredicted count to conditional branch resolve count Event 9 for Counter 0 Primary Instruction Cache Misses This counter is incremented one cycle after an instruction refill request is sent to the SCTP logic Event 9 for Counter 1 Secondary Cache Load Store and Cache ops Operations This counter is incremented one cycle after a request is entered into the SCTP logic provided the request was initially targeted at the primary data cache Such requests fall into three categories e primary data cache misses e requests to change the state of primary and secondary and primary data cache lines from Clean to Dirty due to stores hitting a clean line in the primary data cache requests initiated by Cache op instructions MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 270 Chapter 14 Event 10 for Counter 0 Secondary Cache Misses Instruction This counter is incremented the cycle after the last quadword of a primary instruction cache line is written from the main memory while the secondary cache refill continues Event 10 for Counter 1 Secondary Cache Misses Data This counter is incremented the cycle after the second quadword of a data cache line is written from the
231. ed inputs 213 Vref voltage reference 212 DC power supply levels 210 DC voltage reference 212 DC data cache size field 256 DCOK signal 38 160 211 212 217 DE bit 172 250 debugging and Watch registers 258 decoding an instruction 371 decoupling capacitance 218 delay times AC electrical 216 dependencies condition bit 14 exception 15 instruction 13 memory 14 pipeline 13 register 14 375 MIPS R10000 Microprocessor User s Manual 1 5 DevNum mode bits 164 Diagnostic register 261 directory based coherency protocol 153 divide unit FPU 301 division by zero FP 310 divisor clock system interface 80 360 DMECO instruction 286 290 DMTC instruction 286 291 DN device number field 256 Done bit 11 done see also completion 373 Doubleword Move From CPO instruction 285 Doubleword Move To CPO instruction 285 DP primary data cache parity field 273 DS diagnostic status field 247 248 249 duplicate tags external 34 dynamic issue 13 371 dynamic scheduling 371 E EC field 256 ECC error correcting code matrix for secondary cache data array 177 matrix for secondary cache tag array 179 matrix for System interface 183 register 273 secondary cache 10 ECC register 69 74 ECC field 273 efficiency program suggestions for increasing 21 electrical specifications AC 215 Dc 210 Enable field FP 310 enable output delay 216 EntryHi register 245 s 329 ASID field in 330 EntryLo registers an
232. ed with a virtual address and tagged with a physical address WayO 16 Kbytes Way 16 Kbytes Word Data 0 Word Word Data 1 Word Tag 0 0 15 Tag 1 0 15 Set I is CH C eal I Virtual Index Figure4 1 Organization of Primary Instruction Cache Each instruction cache block is in one of the following two states e Invalid e Valid t The precise implementation of the LRU algorithm is affected by the speculative execution of instructions Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Cache Organization and Coherency 47 An instruction cache block can be changed from one state to the other as a result of any one of the following events e a primary instruction cache read miss e subset property enforcement any of various CACHE instructions e external intervention exclusive and invalidate requests These events are illustrated in Figure 4 2 which shows the primary instruction cache state diagram Subset enforcement CACHE Index Invalidate I CACHE Index Store Tag l CACHE Hit Invalidate I S CACHE Index WriteBack Invalidate S CR desc Store Tag E Read hit Intervention exclusive hit Invalidate hit Legend Internally initiated action Externally initiated action I Instruction cache S Secondary cache Figure 4 2 Primary Instruction Cache State Dia
233. ee bits of the 32 bit virtual address are 1015 32 bit kseg1 virtual address space is selected it is the 22 byte 512 Mbyte kernel physical space References to kseg1 are not mapped through the TLB the physical address is selected by subtracting 0xA000 0000 from the virtual address Caches are disabled for accesses to these addresses and physical memory or memory mapped I O device registers are accessed directly 32 bit Kernel Mode Supervisor Space ksseg In Kernel mode when KX 0 in the Status register and the most significant three bits of the 32 bit virtual address are 110 the ksseg virtual address space is selected it is the current 2 byte 512 Mbyte supervisor virtual space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address References to ksseg are mapped through the TLB 32 bit Kernel Mode Kernel Space 3 kseg3 In Kernel mode when KX 0 in the Status register and the most significant three bits of the 32 bit virtual address are 1115 the kseg3 virtual address space is selected it is the current 22 byte 512 Mbyte kernel virtual space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address References to kseg3 are mapped through the TLB MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 324 Chapter 16 64 bit Kernel Mode User Space xkuseg In Kernel mode when KX
234. ee nennen 76 Tae Write Sequence natant ea I n ORE RO Re ERU E EOS 77 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 xiv Table of Contents 6 System Interface Operations Request and Response Cycles nennen nnne nennen 80 System Interface Prequencies recae eee te ente e ge e nde Edere doles ele 80 Register to Register OpefaHon s ni n aean a E nnne nennt nenne nennen 80 System Interface Signals ei eee e Eeee rane ASEE EAEE ATE AOT EE EEATT E PEP AES tenete nennen 81 Master and Slave States ccccccccscsessssssesesesseseeseseecesesssesnsnsneseseeceeseseesenssesesesesnensieseecenesesssananey 81 Connecting to an External Agenti inssi nisreen ein arenaer nne nennt 81 Chister Bus E N ia 82 System Interface Connecti ns iee nen deceret epi et ned ere eredi 83 Uriprocessor System i d etaed esed eee t e aet ee eee tete IIR edens 83 Multiprocessor System Using Dedicated External Agents sess 84 Multiprocessor System Using the Cluster Bus sse 85 System Interface Requests and Responses sess eee eres 86 Processor Requests cccccccsecsssscsssesescscscsssesescsescsesesescscecsssesescecscssssescsescessseseseeeaseeseeeeeeeeeaees 86 External Responses eene deed atit d as deinde e e d es 87 External Requests enean mette inti d nu n ite o fe pe ed 87 Processor Responses et ee e edite et Peter phe e e e pe PER dons 87 Outstandin
235. emented by one Although this is a rare event in this case the count would be incorrect t In other words this count is the sum of the conditional branches that are known to be both correctly predicted and mispredicted MIPS R10000 Microprocessor User s Manual Coprocessor 0 269 Event 6 for Counter 1 Quadwords Written Back From Primary Data Cache This counter is incremented once each cycle that a quadword of data is written from primary data cache to secondary cache Event 7 for Counter 0 Quadwords Written Back From Secondary Cache This counter is incremented once each cycle that a quadword of data is written back from the secondary cache to the outgoing buffer located in the on chip system interface unit Note that data from the outgoing buffer could be invalidated by an external request and not sent out of the processor Event 7 for Counter 1 TLB Refill Exception Due To TLB Miss This counter is incremented on the cycle after the TLB miss handler is invoked All TLB misses are counted whether they occur in the native code or within the TLB handler Event 8 for Counter 0 Correctable ECC Errors On Secondary Cache Data This counter is incremented on the cycle after the correction of a single bit error on a quadword read from the secondary cache data array Event 8 for Counter 1 Branch Misprediction This counter is incremented on the cycle after a branch is restor
236. ents xxvii 16 Memory Management Processor Modes eene dee me Panier eH e tb deti dete 316 Processor Operating Modes rette euh et sien e tein eed 316 Addressing Modes ted RERO RES aH a fee 317 Virtual Address Space sese teer ten ien en hd ene e n rele n re nete 317 User Mode Operations x ente t eee ertet ere edet 318 32 bit User Mode useg iusie e eee ge teret rhe Re Seasoned 319 64 bit User Mode x seg ere e ttt ere rne nies 319 Supervisor Mode Operations issiskiria a KEDER Raae AS EERE AREE 320 32 bit Supervisor Mode User Space suseg sss 320 32 bit Supervisor Mode Supervisor Space sseg sss 321 64 bit Supervisor Mode User Space xsuseg sss 321 64 bit Supervisor Mode Current Supervisor Space xsseg ssss 321 64 bit Supervisor Mode Separate Supervisor Space csseg occ 321 Ke rnel Mode Operatioris iocos anode oe iecore e i E fes 322 32 bit Kernel Mode User Space kuseg sssssssssssseee 323 32 bit Kernel Mode Kernel Space 0 kseg0 sss 323 32 bit Kernel Mode Kernel Space 1 kseg 1 sss 323 32 bit Kernel Mode Supervisor Space ksseg sss 323 32 bit Kernel Mode Kernel Space 3 kseg3 sss 323 64 bit Kernel Mode User Space xkuseg ssssssssssssseee 324 64 bit Kernel Mode Current Supervisor Space xksseg
237. eptions The remainder of this chapter describes exceptions in the order of their priority shown in Table 17 3 with certain of the exceptions such as the TLB exceptions and Instruction Data exceptions grouped together for convenience While more than one exception can occur for a single instruction only the exception with the highest priority is reported Some exceptions are not caused by the instruction executed at the time and some exceptions may be deferred See the individual description of each exception in this chapter for more detail Table 17 3 Exception Priority Order Cold Reset highest priority Soft Reset Nonmaskable Interrupt NMI 3 Cache error Instruction cache Cache error Data cache z Cache error Secondary cache Cache error System interface Address error Instruction fetch TLB refill Instruction fetch TLB invalid Instruction fetch Bus error Instruction fetch Integer overflow Trap System Call Breakpoint Reserved Instruction Coprocessor Unusable or Floating Point Exception Address error Data access TLB refill Data access TLB invalid Data access TLB modified Data write Watch Bus error Data access Interrupt lowest priority i These exceptions are interrupt types and may be imprecise Priority may not be followed when considering a specific instruction Generally speaking the exceptions desc
238. equest buffer When the System interface is in slave state and an external agentissues an external coherency or allocate request number request it places an entry into the cluster request buffer Once an entry is placed into the cluster request buffer the associated request number transitions from free to busy An entry remains busy until the processor receives an external completion response Processor requests that are ready to be issued to the System interface bus probe the cluster request buffer to detect conflict conditions Cached Request Buffer The System interface contains a four entry cached request buffer This buffer holds the status of the four possible outstanding processor cached requests including processor block read and upgrade requests The relative order of the requests is maintained in the cached request buffer External coherency requests probe the cached request buffer to detect conflict conditions MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 90 Incoming Buffer Version 2 0 of January 29 1997 Chapter 6 The System interface contains an incoming buffer for external block and double single partial word data responses The four 32 word entries of the incoming buffer correspond to the four possible outstanding processor requests Block data in each entry of the incoming buffer is stored in subblock order beginning with a quadword aligned address The incoming buffer eliminates
239. er 5 4 Word Write Sequence A 4 word write sequence is performed by a CACHE Index Store Data S instruction to store a quadword of data and 10 check bits into the secondary cache data array Figure 5 7 depicts a secondary cache 4 word write sequence A quadword is written to the index specified by PA 23 6 and the way specified by VA 0 of the CACHE instruction A doubleword specified by VA 3 is obtained from the CPO TagHi and TagLo registers and the other half of the doubleword is padded to zeros Normal ECC and parity generation is bypassed and the check field of the data array is written with the contents of the CPO ECC 9 0 register Cycle SCCIk SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE SCTWrI SCTCS Figure 5 7 4 Word Write Sequence Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 75 8 Word Write Sequence An 8 word write sequence writes back a dirty block from the primary data cache to the secondary cache Figure 5 8 depicts a secondary cache 8 word write sequence SC A B DWay are driven with the way bit obtained from the primary data cache tag The secondary cache tag is not written since it was previously updated when the primary data cache block was modified Cycle 1i2 3i4
240. er value SysNMI The following input pins may be unused in certain system configurations and each of them should be tied to Vss preferably through a resistor of 100 ohms or greater value SysRdRdy SysWrRdy SysGblPerf 2 SysCyc The following input pins may be unused in certain system configurations and each of them should be tied preferably to Vss or VccOSys through a resistor of 100 ohms or greater value SysADChk 7 0 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 214 Chapter 12 DC Input Output Specifications All processor output drivers are CMOS push pull and the signals swing between VccQ and Vss In open drain mode the gates of the PMOS pullup devices are disabled Input only pins include a disabled output buffer for implicit ESD protection Tables 12 2 and 12 3 describe the DC characteristics of the I O signals for the HSTL and CMOS TLL configurations NOTE As the JEDEC Standard 8 x evolves the HSTL specifications will also change and the processor will remain compliant with these standards Table 12 2 DC Characteristics for HSTL Configuration Symbol Parameter Units Conditions VOH Output high voltage VccQ 2 0 3V N A V N A VOL Output low voltage VecQ 2 0 3V V N A IH V N A V Input high voltage Vref 100mV Vcc 300mV VIL Input low voltage 300mV Vref 100mV ILeak I O leakage current
241. ered from a separate supply dependent on the output logic family used in the application system e For JEDEC compatible HSTL operation the nominal value for VccOSC and VecQSys are in the 1 5V 100 millivolt range e For CMOS TIL compatible systems VccOSC and VccQSys can be externally tied to the same Vcc as the core power supply NOTE The I O pins of the R10000 processor may not be driven higher than 4 0V by any device in the system until the Vcc and VccOQ inputs are stable Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Electrical Specifications 211 DCOk and Power Supply Sequencing The following guidelines are designed to protect the processor from damage or latch up e With respect to the Vcc 3 3V supply to the core VecO SC Sys either 1 5V or 3 3V must not be driven more than a diode threshold voltage e Vref should not go higher than VecQ SC Sys Generally Vref is derived from VccQ through a resistor divider and therefore cannot rise above VccQ e The power to termination resistors must not arrive before Vcc and VccQ SC Sys arrive at the processor e None of the supplies can float or be driven negative One method of protecting the processor from excessive input voltage is to sequence the power supplies for the entire system ensuring that the power to the processor is stable before any components drive signals to the processor Another method to tristate all external drivers to the
242. erial is inserted between the package and heatsink to increase the contact area between their surfaces Assembly Drawings and Pinout List The following pages contain a pinout list Table 13 3 and drawings of an example R10000 LGA PWB assembly including details of the PWB heatsink and bolster plate Actual hardware specifications are dependent on the user An assembly drawing of the 599LGA is also shown in Figure 13 2 Note that hardware specifications given in this drawing will require modifications to accommodate the actual dimensions of the socket PWB heatsink bolster etc Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 223 Packaging Z Add Avy GldD ANYI OIAVS3O 66S ANNLNO 39vMOVd ONI SSINOIONHOSL SdIN GL SY OL 0l 7 SL 0l MAIA dOL 8070 7 SSA OL Q3193N OO 38v 9f ls ANY AN L OT JYV S3ONVS3IOL SG ETE FOE SS31L3ATTI dS 3SIMJ3JHIO SSJINN IW NI 33V SNOISN3NIG L SALON LoL GvXOG 2 e ze Oc gz dz vz zz oz di 91 vi Zz ol 8 9 v c se ee ie ez 4z sz ez iz erases JR 6 Leeni OMTId uu S
243. ernel used to match vAddr 3 _ 62 Fill Reserved 0 on read ignored on write 0 Reserved Must be written as zeroes and returns zeroes when read In 64 bit addressing mode the VPN2 field contains bits 43 13 of the 44 bit virtual address In 32 bit addressing mode only the lower 32 bits of the EntryHi register are used so the format remains the same as in the R4400 processor s 32 bit addressing mode The FILL field is ignored on write and read as zeroes as it was in the R4400 implementation When either a TLB refill TLB invalid or TLB modified exception occurs the EntryHi register is loaded with the virtual page number VPN2 and the ASID of the virtual address that did not have a matching TLB entry MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 246 Chapter 14 14 10 Status Register 12 Version 2 0 of January 29 1997 The Status register SR is a read write register that contains the operating mode interrupt enabling and the diagnostic states of the processor The following list describes the more important Status register fields Figure 14 11 shows the format of the entire register and Table 14 10 describes the Status register fields Some of the important fields include e The 4 bit Coprocessor Usability CU field controls the usability of 4 possible coprocessors Regardless of the CUO bit setting CPO is always usable in Kernel mode The XX bit enables the MIPS IV ISA i
244. eset sequence is as follows Version 2 0 of January 29 1997 The external agent negates DCOk The external agent asserts SysReset The external agent negates SysGnt The external agent negates SysRespVal Once Vcc VecQ SC Sys Vref SC Sys Vcc Pa Pd and SysClk stabilize the external agent waits at least Ims and then asserts DCOk At this time the System interface resides in slave state and all internal state is initialized The SysClkDiv mode bits default to divide by 1 The SCCIkDiv mode bits default to divide by 3 After waiting at least 100 ms for the internal clocks to stabilize the external agent loads the mode bits into the processor by driving the mode bits on SysAD 63 0 waiting at least two SysClk cycles and then asserting SysGnt for at least one SysClk cycle After waiting at least another 100 ms for the internal clocks to restabilize the external agent synchronizes all clocks internal to the processor This is performed by asserting SysResp Val for one SysClk cycle After waiting at least 100 ms for the internal clocks to again restabilize a third 100 ms restabilization period the external agent negates SysReset The external agent must retain mastership of the System interface refrain from issuing external requests or nonmaskable interrupts and ignore the system state bus until the processor asserts SysReq The assertion of SysReq indicates the processor is ready for operation In a cluster arra
245. ests Also the external agent must be able to accept at least four processor eliminate requests two processor double single partial word write requests or one processor block write request Figure 6 16 depicts three processor double single partial word write requests and four processor block read requests After sensing the first processor double single partial word write request the external agent negates SysWrRdy The external agent must have buffering sufficient for one additional processor write request before the flow control takes effect The external agent negates SysRdRdy upon observing the first processor read request The external agent must have buffering sufficient for three additional processor read requests before the flow control takes effect MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 126 Chapter 6 Cycle SysCIk Master SysReq SysGnt SysRel SysCmd 11 0 ReqLst Reds Reds SEE SEE SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 16 Processor Request Flow Control Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 127 External Response Protocol The processor supports two classes of external responses e external data responses provide a double single partial word of data
246. from a non existing CPO register is undefined Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 14 27 CACHE Instruction CACHE 287 Cache CAC H E 26 25 21 20 16 15 1 ee Dase see mee below ones 6 5 5 16 Format CACHE op offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a CacheOp virtual address VA The VA is translated to a physical address PA through the TLB and the 5 bit opcode decoded inTable 14 27 specifies a cache operation for that address together with the affected cache Operation of this instruction on any combination not listed in the tables below is undefined The operation of this instruction on uncached addresses is also undefined More detailed descriptions of the CacheOps listed below are given separately in Chapter 10 CACHE Instructions Table 14 27 CACHE Instruction Op Field Encoding OpField CACHE Instruction Variation Target Cache 00000 Index Invalidate I 00100 Index Load Tag 1 01000 Index Store Tag 1 10000 Hit Invalidate I 10100 Cache Barrier 11000 Index Load Data I 11100 Index Store Data I 00001 Index WriteBack Invalidate D 00101 Index Load Tag D 01001 Index Store Tag D 10001 Hit Invalidate D 10101 Hit WriteBack Invalidate D 11001 Index Load Data D 11101 Index Store Data D 00011 Index WriteBack
247. g MIPS R10000 Microprocessor User s Manual Coprocessor 0 281 Secondary Cache Operation If the CacheOp is an Index Load Store Tag for secondary cache operations the fields of the TagHi and TagLo registers are defined as follows STag0 contains physical address bits 35 18 stored in the cache tag SState contains the secondary cache state of the line as follows 00 gt Invalid 015 Shared 105 Clean Exclusive 11 Dirty Exclusive VIndex virtual index contains only two bits of significance since the32 Kbyte 2 way set associative primary caches are addressed using only two untranslated address bits VA 13 12 plus the offset within the virtual page ECC contains the ECC for the STag SState and VIndex fields Errata MRU indicates which way was the most recently used in the set STag1 contains the physical address bits 39 36 stored in the cache tag 0 Reserved Must be written as zeroes and returns zeroes when read Figure 14 30 shows the fields of the TagHi and TagLo registers 31 14 13 1211 10 9 8 76 0 18 2 2 1 2 7 31 30 43 0 1 27 4 Figure 14 30 TagHi Lo Register Fields in Secondary Cache When CacheOp is Index Load Store Tag Errata Figure 14 30 size of the STag0 field is revised MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 282 Chapter 14 CacheOp is Index Load Store Data This section describes the following three states of the TagLo and T
248. g Point Pipeline sse 7 Functional Units cscs edge ertt e eene iem ete eet ri P C pepe o iced 9 Primary Instruction Cache I cache sse eee eene 9 Primary Data Cache D cache tte tet toe deren 9 Instruction Decode And Rename Unit 10 Branch rit ase Re dedi edente teat aR 10 External Interfaces aie eee te de nete iie te re E nd 10 Instruction Queu s ERR ene ee eee ein es ee ve ee 11 Integer Queue ese ee ene oi ae ie nant bled ees te adi ete dent 11 Floating Point Queues a sss dea detis Aun esta aes eet 11 Address Queue sec ete tto ar ee ee en tee ba ito egeta 12 Program Order and Dependencies sssssssssssssseeneeerene nene 13 Instruction Dependencies cccccccccsesesesnseseseeseseescscessescsesnsnenssssesceesesescesnesesesneenenees 13 Execution Order and Stalling sse tenente 13 Branch Prediction and Speculative Execution sss 14 Resolving Operand Dependencies ssssssssssseeseeeneneenerteenenee 14 Resolving Exception Dependencies sss 15 Strong Ordering s seire peire iri eee in eren ee rt tete le eri ees 15 An Example of Strong Ordering sse 16 RI0000 Pipelines 3 net tet o tee di e iiem a eese 17 Pr HT O 17 Stdge 2 inset enu ead cione tini A Pen e rins 17 SARE Sis CES 18 ONA TE A E RE GUERRE OR er EHE ERE He A 18 Floating Point M
249. g Requests and Request Numbers sess een 87 Request and Response Relationship sss 88 System Interface Buffers iii eee eea teo dente e ate e vede EE ne edi eive a 89 Cluster Request Buffer eem enna cts eth ies ERO duia editt 89 Cached Reqtest Butlers iet edes tg esi e tee eaa ie s 89 Incoming Buffer ieu see Rite teret e er ii HERI EE RU ennt tire ta 90 Outgoing Buffet ig s enisi eee itte tee tei cete rhe a o ete ees 91 Uncached Buffer caben ues eet de neri ee 92 System Interface Flow Control m Idee noter deri 93 Processor Write and Eliminate Request Flow Control sssssssss 93 Processor Read and Upgrade Request Flow Control sess 93 Processor Coherency Data Response Flow Control sse 93 External Request Flow Control sse retener 93 External Data Response Flow Control sss 93 System Interface Block Data Ordering ssssssssssssseeeeneeeenenenreenete nen 94 External Block Data Responses sss eene nnne 94 Processor Coherency Data Responses sse 94 Processor Block Write Requests cccccccscssssseseseessteescecesesescsnasneseseseeneseseececeneseseansnanenees 94 System Interface Bus Encoding cccesesceccesesseseseseecesessseseensneseseeceesesescensnesesesesensseseecseeseesseananey 95 SysCmd 11 0 Encoding sse eene 95
250. g array then the State bits are written to 00 Invalid the SCWay bit 0 the StateMod bits 001 Normal and the DState parity 0 The LRU bit is left unchanged Parity check is enabled Hit CacheOps can cause cache error exceptions if they check ECC or parity bits MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 198 10 13 Hit Invalidate S Hit Invalidate S invalidates all entries in the secondary primary instruction and primary data caches which match the PA of the CACHE instruction The following sequence takes place 10 14 Cache Barrier Version 2 0 of January 29 1997 1 Chapter 10 The processor reads the Tags from both ways of the secondary cache at the address pointed to by the PA of the CACHE instruction Ifthe tag entry s STag matches the CACHE instruction PA and the State of the entry is not equal to 00 Invalid then a Hit has occurred in that entry If there is no Hit the CACHE instruction completes The processor checks each entry in the primary caches to determine which corresponds to the CACHE instruction PA and the PIdx read from the secondary cache tag array Any entry which matches is invalidated No write back is required by Hit Invalidate S The processor sets the tag array entry of the secondary cache block which was hit to State 00 Invalid Tag PA of CACHE instruction and PIdx VA 13 12 of CACHE instruction ECC is generated The MRU bit is writ
251. g the instruction which caused the exception e preventing the exception causing instruction from graduating e aborting all subsequent instructions Thus all register values remain the same as if instructions were executed singly Effectively all previous instructions are completed but the faulting instruction and all subsequent instructions do not modify any values Strong Ordering A multiprocessor system that exhibits the same behavior as a uniprocessor system in a multiprogramming environment is said to be strongly ordered The R10000 processor behaves as if strong ordering is implemented although it does not actually execute all memory operations in strict program order In the R10000 processor store operations remain pending until the store instruction is ready to graduate Thus stores are executed in program order and memory values are precise following any exception For improved performance however cached load operations my occur in any order subject to memory dependencies on pending store instructions To maintain the appearance of strong ordering the processor detects whenever the reordering of a cached load might alter the operation of the program backs up and then re executes the affected load instructions Specifically whenever a primary data cache block is invalidated due to an external coherency request its index is compared with all outstanding load instructions If there is a match and the load has been completed
252. gister contains the Kernel mode bit values shown in Table 16 1 Kernel mode virtual address space is divided into regions differentiated by the high order bits of the virtual address as shown in Figure 16 3 32 bit KSU 00 or EXL 1 or ERL 1 and KX 0 Ox FFFF FFFF 0 5 Gbytes Ox E000 Mapped 0 5 Gbytes 0x C000 Mapped DECRE 0 5 Gbytes Unmapped Ox A000 Uncached ga AREN 0 5 Gbytes Unmapped 0x 8000 Cached Ox 7FFF 2 Gbytes Mapped Ox 0000 0000 Version 2 0 of January 29 1997 Ox Ox Ox Ox Ox Ox Ox Ox Ox kseg3 5 Ox ksseg ae Ox kseg1 s Ox kseg0 0x 0x 0x kuseg 0x 0x 0x 0x FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF C000 C000 C000 BFFF 8000 TEFE 4000 4000 4000 3FFF 0000 0000 0000 0000 0000 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF OFFF OFFE 0000 FFFF 0000 FFFF 1000 OFFF 0000 FFFF 1000 OFFF 0000 0000 0000 FFFF E000 DFFF C000 BFFF A000 9FFF 8000 TEFE 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 8000 TEFE 0000 64 bit KSU 00 or EXL 1 or ERL 1 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 FFFF 0000 and KX 1 0 5 Gbytes Mapped ckseg3 0 5 Gbytes Mapped 0 5 Gbytes Unmap
253. gister specifies whether a load or store initiated this exception A Watch exception violates the rules of a precise exception in the following way If the load or store reference which triggered the Watch exception has a cacheable address and misses in the data cache the line will then be read from memory into the secondary cache if necessary and refilled from the secondary cache into the data cache In all other cases cache state is not affected by an instruction which takes a Watch exception The CACHE instruction never causes a Watch exception The Watch exception is postponed if either the EXL or ERL bit is set in the Status register If either bit is set the instruction referencing the WatchLo WatchHi address is executed and the exception is delayed until the delay condition is cleared that is until ERL and EXL both are cleared set to 0 The EPC contains the address of the next unexecuted instruction A delayed Watch exception is cleared by system reset or by writing a value to the WatchLo register Watch is maskable by setting the EXL or ERL bits in the Status register The common exception vector is used for this exception and the Watch code in the Cause register is set The Watch exception is a debugging aid typically the exception handler transfers control to a debugger allowing the user to examine the situation To continue program execution the Watch exception must be disabled to execute the faulting instruction
254. gisters in the Floating Point Register File from which the 32 logical registers are taken FP instructions use a 5 bit logical number to select an individual FGR These logical numbers are mapped to physical registers by the rename unit in pipeline stage 2 before the Floating Point Unit executes them Physical registers are selected using 6 bit addresses MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 304 Chapter 15 32 and 64 Bit Operations The FR bit 26 in the Status register determines the number of logical floating point registers available to the program and it alters the operation of single precision load store instructions as shown in Figure 15 2 e FR is reset to 0 for compatibility with earlier MIPS I and MIPS II ISAs and instructions use only the 16 physical even numbered floating point registers 32 logical registers Each logical register is 32 bits wide e FR is set to 1 for normal MIPS III and MIPS IV operations and all 32 of the 64 bit logical registers are available Status Bit FR 0 Sixteen 64 bit Physical Registers Status Bit FR 1 Thirty two 32 bit Logical Registers Thirty two 64 bit Registers MIPS I and MIPS II compatible MIPS IIl and MIPS IV only Physical Register 63 32 31 0 63 0 FGR 1 FGR 0 FGR 0 Register 0 63 0 Register is not implemented FGR 1 Register 1 63 32 31 0 63 0 FGR 3 FGR 2 FGR 2 Register 2 63 0 Regis
255. gnals communicating with the external secondary cache synchronous SRAM These domains are described in this chapter MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997155 156 Chapter 7 7 1 System Interface Clock and Internal Processor Clock Domains Version 2 0 of January 29 1997 In high performance systems PECL level differential clocks are routinely used to minimize system clock skews The R10000 processor receives differential system clock signals at the SysClk and SysClk pins two additional pins SysClkRet and SysClkRet are the return paths for termination of these signals SysClk and SysClk are used to drive an on chip phase locked loop PLL which multiplies the system clock to create an internal processor clock PCIk The R10000 processor always communicates with the system at the SysClk frequency and PCIk always runs at a frequency multiple of SysClk according to the following formula PClk SysClk SysClkDiv 1 2 For example in a 50 MHz system with SysClkDiv 7 and SCCIkDiv 2 PCIk 50 8 2 200 MHz NOTE It is preferred that the R10000 processor uses a differential PECL clock input However ina less aggressive system a CMOS TTL single ended clock can be used to drive the processor provided its complementary clock input SysClk is tied to an appropriate reference voltage 1 4V for TTL Vcc 2 for CMOS In any case the reference voltage applied to SysCIk should not be less tha
256. gram MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 48 Chapter 4 4 2 Primary Data Cache The processor has an on chip 32 Kbyte primary data cache also referred to simply as the data cache which is a subset of the secondary cache The data cache uses a fixed block size of 8 words and is two way set associative that is two cache blocks are assigned to each set as shown in Figure 4 3 with an LRU replacement algorithm WayO 16 Kbytes Way 1 16 Kbytes Word Data 0 Word Word Data 1 Word Tag 0 0 7 Tag 1 0 7 Set Virtual Index Figure 4 3 Organization of Primary Data Cache The data cache uses a write back protocol which means a cache store writes data into the cache instead of writing it directly to memory Sometime later this data is independently written to memory as shown in Figure 4 4 Time JA oe Primary write back Secondary write back Cache Cache Processor Figure 4 4 Write Back Protocol Write back from the primary data cache goes to the secondary cache and write back from the secondary cache goes to main memory through the system interface The primary data cache is written back to the secondary cache before the secondary cache is written back to the system interface The precise implementation of the LRU algorithm is affected by the speculative execution of instructions Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Ma
257. h external load termination resistors providing logic high levels The Vref SC Sys pins must be connected to a stable reference voltage source This reference point is used in the input sense amp current mirror to provide the switch point for the logic levels Inside the processor the Vref SC Sys signals have a large capacitance and a low pass filter at each receiver The DCOk pins must not be asserted until there has been sufficient time for Vref SC Sys to stabilize at each of the receivers inside the processor A typical Vref SC Sys generator is two resistors which provide the Vref SC Sys level associated with the chosen logic family and a 10uF tantalum capacitor connected to the processor s Vref SC Sys pin to provide stability MIPS R10000 Microprocessor User s Manual Electrical Specifications Unused Inputs Errata 213 Several input pins are unused during normal system operation and should be tied to Vcc through resistors e JTDI e JTCK e JTMS Several input pins are unused during normal system operation and should be tied to Vss through 100 ohm resistors e TCA TCB e PLLDis e Sparel Spare3 Several input pins are unused during normal system operation and should be tied to Vss e PLLSpare1 PLLSpare2 PLLSpare3 PLLSpare4 e SelDVCO The following input pins may be unused in certain system configurations and each of them should be tied to VccOSys preferably through a resistor of 100 ohms or great
258. hapter 15 the section titled FPU Instructions describes the FPU specific instructions In the R1000 processor the Prefetch instruction PREF attempts to fetch data into the secondary and primary data caches The action taken by a Prefetch instruction is controlled by the instruction hint field as decoded in Table 1 1 Table 1 1 PREF Instruction Hint Field Hint Value Name of Hint Action Taken 0 Load Prefetch data into cache LRU way 1 Store Prefetch data into cache LRU way 2 3 undefined load_streamed Prefetch data into cache way 0 store_streamed Prefetch data into cache way 0 load_retained Prefetch data into cache way 1 store_retained Prefetch data into cache way 1 8 31 undefined For a store Prefetch an Exclusive copy of the cache block must be obtained in order that it may be written MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 27 LL SC Load Linked and Store Conditional instructions are used together to implement a memory semaphore Each LL SC sequence has three sections 1 The LL loads a word from memory 2 A short sequence of instructions checks or modifies this word This sequence must not contain any of the events listed below or the Store Conditional will fail e exception e execution of ERET e load instruction Store instruction e SYNC instruction e CACHE instruction e PREF instruction e external intervention exclusi
259. hat it protects speculation only for a single branch but in some instances it may be more efficient than either the conditional move or the cache barrier workarounds This workaround uses the fact that branch likely instructions are always predicted as taken by the R10000 Thus any incorrect speculation by the R10000 on a branch likely always occurs on a taken path Sample code is begl rx r1 label nop sw r2 OxO r1 The store to r1 will never be to an address referred to by the content of rx because the store will never be executed speculatively Thus the address referred to by the content of rx is protected from any spurious write backs Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 25 This workaround is most useful when the branch is often taken or when there are few instructions in the protected block that_are not memory operations Note that no instructions in a block following a branch likely will be initiated by speculation on that branch however in the case of a serial instruction workaround only memory operations are prevented from speculative initiation In the case of the conditional move workaround speculative initiation of all instructions continues unimpeded Also similar to the conditional move workaround this workaround only protects fall through blocks from speculation on the immediately preceding branch Other mechanisms
260. he snoopy based snoopy based with external duplicate tags and control directory based with external directory structure and control Secondary Cache R10000 System Interface Secondary Cache Interface Secondary Cache Interface System Interface R10000 External Agent Coherent Interconnect To Other System Resources l External Agent ee Directory Structure Figure 2 2 Multiprocessor System Organization using Dedicated External Agents MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 36 Chapter 2 Multiprocessor Systems Using a Cluster Bus A multiprocessor system may be created with R10000 processors by using a cluster bus configuration Such a system is shown in Figure 2 3 A cluster bus is created by attaching the System interfaces of up to four R10000 processors with an external agent the cluster coordinator The cluster coordinator is responsible for managing the flow of data within the cluster This organization can reduce the number of ASICs and the pin count needed for a small multiprocessor systems The cluster bus protocol supports three coherency schemes e snoopy based e snoopy based with external duplicate tags and control e directory based with external directory structure and control Secondary Secondary Cache Cache Secondary Cache Interface Secondary Cache Interface R10000
261. he associated error handling sequence Table 9 9 Protocol Observation Protocol Observation Error Handling External response data cycle with an unexpected request number during an external block data response for a processor block read or upgrade request originated by the processor External block data response specifying a Reserved cache block state for a processor block read or upgrade request originated by the processor Ignore the external response data cycle Override the cache block state to CleanExclusive External block data response specifying a Shared cache block state for a processor coherent block read exclusive or upgrade request originated by the processor Override the cache block state to CleanExclusive External completion response specifying a Reserved completion indication External ACK completion response for a processor read request originated by the processor that has not received an external data response MIPS R10000 Microprocessor User s Manual Ignore the external completion response Override the external ACK completion response toa NACK Version 2 0 of January 29 1997 186 Chapter 9 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 10 CACHE Instructions This chapter describes the CacheOps CACHE used in the R10000 processor The format of the CACHE instruction is CACHE op offset base In a CACHE instruction the 16 bit offset
262. he desired data is in the predicted way e the desired data is in the non predicted way e the desired data is not in the secondary cache The tags for both ways are read underneath the data access cycles in order to discern as rapidly as possible which of these states are valid This reading is possible because it takes two accesses to read a primary data block 8 words and 4 cycles to read a primary instruction block 16 words thus the bandwidth needed to read the tag array twice exists in all cases Only an extra address pin to the tag array is needed to make this operation parallel and this is implemented by the SCTWay pin The three possible states are handled in the following manner e If after reading the tags for both ways it is discovered that the data exists in the predicted way the processor continues normally e If the data exists in the non predicted way the processor accesses this non predicted way in the secondary cache and updates the way prediction table to point to this way Errata e If the access misses in both ways of the secondary cache the data is fetched from the system interface If the state of the predicted way is found to be invalid the fetched data is placed in it and the MRU is unchanged However if the state of the predicted way is found to be valid then the fetched data is placed into the non predicted way and the way prediction table is updated to point to this way since it is now the most recentl
263. he processor allows the conflicting external coherency request to proceed and provides a Shared processor coherency state response Once the conflicting external coherency request has received an external completion response the processor internally NACKs the processor upgrade request that is pending issue Block write Intervention shared Intervention exclusive Invalidate The processor provides a DirtyExclusive processor coherency state response and changes the processor block write request that is pending issue into a DirtyExclusive processor coherency data response The processor provides a DirtyExclusive processor coherency state response and deletes the processor block write request that is pending issue Eliminate Intervention shared Intervention exclusive Invalidate The processor provides a Shared or CleanExclusive processor coherency state response and deletes the processor eliminate request that is pending issue t If the processor eliminate request that is pending issue has a DirtyExclusive state a CleanExclusive processor coherency state response is provided MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 144 Chapter 6 External Coherency Conflicts Errata Version 2 0 of January 29 1997 A processor request is considered to be pending response when it has been issued to the System interface bus but has not yet received an external data or com
264. he tag array it checks the SCTagChk 6 0 bus for the proper ECC If a correctable error is detected correction is automatically performed in line without affecting latency The processor asserts SysCorErr for one SysClk cycle to inform the external agent that a correctable error has been detected and corrected If an uncorrectable error is detected the secondary cache unit posts a Cache Error exception and initializes the TA and SIdx fields in the local CacheErr register The processor asserts SysUncErr for one SysClk cycle to inform the external agent that an uncorrectable tag error has been detected Whenever the processor writes the secondary cache tag array it drives the proper ECC on the SCTagChk 6 0 bus MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 180 Chapter 9 9 12 System Interface Error Protection and Handling Error Protection Version 2 0 of January 29 1997 This section describes error protection and error handling schemes for the System interface The System interface buses have the following error protection schemes as listed in Table 9 7 Table 9 7 System Interface Bus Error Protection Bus Width Error Protection SysCmd 12 bit Odd parity SysAD 64 bit 8 bit ECC SysState 3 bit Odd parity SysResp 5 bit Odd parity MIPS R10000 Microprocessor User s Manual Error Protection and Handling Error Handling Errata SysCmd 11 0 Bus 181 This section
265. he tags are updated immediately for stores to the primary data cache and all caches use a write back protocol the data in the secondary cache may not always be consistent with data in the primary cache even though the tags always reflect the correct state of a secondary cache block MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 68 Chapter 5 5 6 Read Sequences There are five basic read sequences e a4 word read e an 8 word read e a 16 word read e a32 word read e a tag read Errata The SCCIK referred in the secondary cache read and write timing diagrams is an internal SCCIK The relationship between this internal SCCIK and the external SCCIK 5 0 5CCIK 5 0 can be programmed during boot time by setting the SCCIKTap mode bits see the section titled Mode Bits in Chapter 8 for detail on mode bits Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 69 4 Word Read Sequence A 4 word read sequence is performed by a CACHE Index Load Data S instruction to read a doubleword of data and 10 check bits from the secondary cache data array Figure 5 3 depicts a secondary cache 4 word read sequence A quadword is read from the index specified by PA 23 6 and the way specified by VA 0 of the CACHE instruction The doubleword specified by VA 3 is then stored into the CPO TagHi and TagLo registers and the corresponding check bits are stored i
266. hed or outgoing buffer entry the secondary cache just commenced an index conflicting CACHE Hit WriteBack Invalidate S and the external coherency request misses in the secondary cache MRU way tt This value assumes the external coherency request hits an outgoing buffer entry tH This value assumes the external coherency request does not hit a cached or outgoing buffer entry the secondary cache is not busy the external coherency request hits in the MRU way of the secondary cache no subset primary data cache blocks are inconsistent and the external coherency request is secondary cache block aligned If the external coherency request misses in the MRU way of the secondary cache 1 to 3 additional PCIK cycles are required to query the LRU way of the secondary cache depending on the SCCIkDiv mode bits This value assumes the external coherency request does not hit a cached or outgoing buffer entry the secondary cache just commenced an index conflicting CACHE Hit WriteBack Invalidate S the external coherency request hits in the LRU way of the secondary cache all subset primary data cache blocks are inconsistent and the external coherency request is not secondary cache block aligned MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 148 SysGblPerf Signal Chapter 6 The SysGblPerf signal is provided for systems implementing a relaxed consistency memory model The external agent asserts this signal when a
267. his old value is read using the shared read port it is then merged with the new word and the merged doubleword value is written A write to the register file writes all 64 bits in parallel When instructions are renamed in MIPS I or II the low bit of any FGR field is forced to zero Thus each even odd logical register number pair is treated as an even numbered double precision register Odd numbered logical registers are not used in the mapping tables and dependency logic but they remain mapped to their latest physical registers MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 308 Chapter 15 15 4 Floating Point Control Registers The MIPS IV ISA permits up to 32 control registers to be defined for each coprocessor but the Floating Point Unit uses only two e Control register 0 the FP Implementation and Revision register e Control register 31 the Floating Point Status register FSR Floating Point Implementation and Revision Register The following fields are defined for control register 0 in Coprocessor 1 the FP Implementation and Revision register as shown in Figure 15 6 e The Implementation field holds an 8 bit number 0x09 which identifies the R10000 implementation of the floating point coprocessor e The Revision field is an 8 bit number that defines a particular revision of the floating point coprocessor Since it can be arbitrarily changed it is not defined here Implementation and Revis
268. hree operating modes are listed in order of decreasing system privilege e Kernel mode highest system privilege can access and change any register The innermost core of the operating system runs in kernel mode e Supervisor mode has fewer privileges and is used for less critical sections of the operating system e User mode lowest system privilege prevents users from interfering with one another Selection between the three modes can be made by the operating system when in Kernal mode by writing into Status register s KSU field The processor is forced into Kernel mode when the processor is handling an error the ERL bit is set or an exception the EXL bit is set Table 16 1 shows the selection of operating modes with respect to the KSU EXL and ERL bits Table 16 1 also shows how different instruction sets and addressing modes are enabled by the Status register s XX UX SX and KX bits A dash in this table indicates a don t care For detailed information on the address spaces available in each mode refer to section titled Virtual Address Space in this chapter The R10000 processor was designed for use with the MIPS IV ISA however for compatibility with earlier machines the useable ISAs can be limited to either MIPS III or MIPSI II Table 16 1 Processor Modes XX KX SX UX KSU ERL EXL Pecas ISA ISA Addressing Mode 317 65 43 1 2 d ESCEHpHO HI IV 32 Bit
269. igure 14 23 The primary data cache is byte protected with four bits used for each 32 bit data word DP in Figure 14 23 Each quadword of the secondary cache data uses nine bits of ECC and one bit of parity SP and ECC in Figure 14 23 The primary instruction CacheOps load or store one instruction word at a time therefore one bit is used in the ECC register The primary data CacheOps load or store four bytes at a time therefore four bits are used in the ECC register The seconda ry CacheOps use ECC 9 as the parity bit and ECC 8 0 as the 9 bit ECC For the Index Store Data CacheOps the unused bits are ignored For Index Load Data CacheOps the unused a bits are with zeroes Figure 1 register 4 23 shows the format of the ECC register Table 14 23 describes the fields 31 10 9 8 0 1 9 0 SP ECC IP 22 1 9 Field Figure 14 23 ECC Register Format Table 14 23 ECC Register Fields Description SP A 1 bit field specifying the parity bit read from or written to a secondary cache ECC DP IP An 9 bit field specifying the ECC bits read from or written to a secondary cache An 4 bit field specifying the parity bits read from or written to a primary data cache An 1 bit field specifying the parity bit read from or written to a primary instruction cache MIPS R10000 Microprocessor User s Manual Reserved Must be written as zeroes and returns zeroes when read Version 2 0 of Januar
270. imilarly a processor coherency data response issued by a master processor is observed as an external data response by the slave processors Slave Slave Slave R10000 R10000 R10000 System Interface System Interface System Interface External Request Cluster Bus Cluster Coordinator Figure 6 1 Processor Request Master Slave Status In a multiprocessor system using the cluster bus a mode bit specifies whether processor coherent requests are to target the external agent only or all processors and the external agent This allows systems with efficient snoopy duplicate tag or directory based coherency protocols to be created MIPS R10000 Microprocessor User s Manual System Interface Operations 6 8 System Interface Connections The major System interface connections required for various system configurations are presented in this section Uniprocessor System Figure 6 2 shows the major System interface connections required for a typical uniprocessor system Mem IO HA gt Figure 6 2 MIPS R10000 Microprocessor User s Manual SysReq External syscn Agent SysRel SysRdRdy SysWrRdy SysCmd 11 0 SysCmdPa SysAD 63 0 SysADChk 7 0 SysVal SysState 2 0 SysStatePal SysStateVal SysResp 4 0 SysRespPai SysRespVa x r A SCTWI SCTCS SysRel SCTOE R1000 1 SCTag 25 0 SysRdRdy SCTagChk 6 0 pe x
271. immediately available for use by subsequent instructions Internal to the processor however instructions are executed dynamically and some results may not be available for many cycles yet the hardware must behave as if each instruction is executed sequentially This section describes various conditions and dependencies that can arise from them in pipeline operation including e instruction dependencies e execution order and stalling e branch prediction and speculative execution resolving operand dependencies resolving exception dependencies Instruction Dependencies Each instruction depends on all previous instructions which produced its operands because it cannot begin execution until those operands become valid These dependencies determine the order in which instructions can be executed Execution Order and Stalling The actual execution order depends on the processor s organization in a typical pipelined processor instructions are executed only in program order That is the next sequential instruction may begin execution during the next cycle if all of its operands are valid Otherwise the pipeline stalls until the operands do become valid Since instructions execute in order stalls usually delay all subsequent instructions A clever compiler can improve performance by re arranging instructions to reduce the frequency of these stall cycles e Inan in order superscalar processor several consecutive instructio
272. in slave state the processor must first assert SysReq and then wait until the external agent relinquishes mastership of the System interface by asserting SysGnt and SysRel Cycle SysClk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 i b i l SysADChK 7 0 SysVal d d d i i r i SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 10 Processor Block Read Request Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 115 Processor Double Single Partial Word Read Request Protocol A processor double single partial word read request results from an uncached instruction fetch or load The processor issues a processor double single partial word read request with a single address cycle The address cycle consists of negating SysCmd 11 driving a free request number on SysCmd 10 8 driving the double single partial word read command on SysCmd 7 5 driving the read cause indication on SysCmd 4 3 driving the data size indication on SysCmd 2 0 driving the target indication on SysAD 63 60 driving the uncached attribute on SysAD 59 58 driving the physical address on SysAD 39 0 asserting SysVal The processor may only issue a processor double single partial word read request address cycle when the System interface is in master state SysRdRdy was
273. index PA 22 6 for 16 word block size or PA 22 7 for 32 word block size Errata 0 Reserved Must be written as zeroes and returns zeroes when read Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 277 CacheErr Register Format for System Interface Errors Figure 14 27 shows the format of the CacheErr register when a System interface error occurs 31 30 29 28 27 26 25 24 23 22 6 5 0 11 EW EE D SA SC SR Sldx 0 2 1 1 2 1 1 1 17 6 Figure 14 27 CacheErr Register Format for System Interface Errors EW set when CacheErr register is already holding the values of a previous error EE data error on a CleanExclusive or DirtyExclusive D uncorrectable system block data response error way1 way0 SA uncorrectable system address bus error SC uncorrectable system command bus error SR uncorrectable system response bus error SIdx secondary cache physical block index Errata 0 Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 278 Chapter 14 14 23 TagLo 28 and TagHi 29 Registers The TagHi and TagLo registers are 32 bit read write registers used to hold the following e the primary cache tag and parity e the secondary cache tag and ECC e the data in primary or secondary caches for certain CacheOps TagHi Lo formats in the R10000 processor differ from those in the R4400
274. information to maintain an external directory structure 6 23 Support for Uncached Attribute The processor supports a 2 bit user defined Uncached Attribute which is driven on SysAD 59 58 during the address cycle of the following e processor double single partial word read requests e double single partial word write requests e block write requests resulting from completely gathered uncached accelerated blocks For unmapped accesses the uncached attribute is sourced from VA 58 57 For mapped accesses the uncached attribute is sourced from the TLB Uncached Attribute field The TLB Uncached Attribute field may be initialized in 64 bit mode using bits 63 62 of the CPO EntryLoO and EntryLo1 registers MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 154 Chapter 6 6 24 Support for Hardware Emulation When using the R10000 processor in hardware emulation it is desirable to operate the System interface at a relative low frequency typically 1 MHz or below Since the R10000 processor contains dynamic circuitry an external agent cannot simply provide low frequency SysClk so a SysCyc input to the processor allows an external agent to define a virtual system clock and yet supply a SysClk within the acceptable operating range The assertion of SysCyc in a particular SysClk cycle creates a virtual system clock pulse four SysClk cycles later SysCyc may be asserted aperiodically In a normal system environment
275. ing continues until one of the following conditions occurs acomplete block is gathered anuncached or uncached accelerated load is executed anuncached or uncached accelerated partial word store is executed anuncached store is executed achange in the current gathering mode is observed achange in the uncached attribute is observed When gathering terminates the data is ready for issue to the System interface bus A processor uncached accelerated block write request is used to issue a completely gathered uncached accelerated block One or more disjoint processor uncached accelerated double or singleword write requests are used to issue an incompletely gathered uncached accelerated block When gathering in an identical mode uncached accelerated double or singleword stores may be freely mixed The uncached buffer packs the associated data into the gatherer When gathering in sequential mode uncached accelerated singleword stores must occur in pairs to prevent an address error exception For instance SW SW SD SW SW is legal SD SW SD is not External coherency requests have no effect on the uncached buffer CACHE instructions have no effect on the uncached buffer SYNC instructions are prevented from graduating if an uncached store resides in the uncached buffer MIPS R10000 Microprocessor User s Manual System Interface Operations 93 6 11 System Interface Flow Control The System interface supports a maximum reques
276. ing point arithmetic operations in single precision or 32 bit fixed format S or W In the R10000 processor the implementation clears the upper 32 bits including MOV S whereas R4400 and R4200 processors preserve the upper 32 bits during the move For the floating point conditional move instructions MOVT S MOVE S MOVZ S and MOVN S the R10000 processor always clears the upper 32 bits of the destination register even though the condition is false In 32 floating point register mode FR 1 the upper 32 bits of the destination register for the MTC1 and LWC1 instructions are architecturally undefined The R10000 processor implementation clears the upper 32 bits CFC1 CTC1 There are only two valid Floating Point Control registers 0 and 31 Access to other registers is undefined The Cause field is set to 100000 E bit is 1 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 314 Chapter 15 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 16 Memory Management This section describes the R10000 processor memory management including e processor modes and exceptions e virtual address space e virtual address translation MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997315 316 16 1 Processor Modes Chapter 16 The R10000 has three operating modes and two addressing modes All are described in this section Processor Operating Modes The t
277. ing the auto increment select on SysAD 44 e negating the Write Read select on SysAD 43 e driving the array select on SysAD 42 40 e asserting SysVal for one SysClk cycle After a read latency of 15 PCIK cycles the processor provides the read response by entering Master state e driving the read data on SysAD 39 0 e asserting SysVal for one SysClk cycle In the following SysClk cycle the processor reverts to Slave state Normal reads have a repeat rate of 17 PCIK cycles Figure 18 5 depicts two cache test mode normal reads Figure 18 5 Cache Test Mode Normal Read Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Cache Test Mode 367 Auto Increment Read Protocol Cycle SysClk Master SysReset SysGnt SysAD 63 0 SysVal A cache test mode auto increment read operation reads a selected RAM array The read address is obtained by incrementing the previous access address and the read way is obtained from the previous access way If an overflow occurs when incrementing the previous access address the address wraps to 0 and the way is toggled The external agent issues an auto increment read command by e asserting the auto increment select on SysAD 44 negating the Write Read select on SysAD 43 e driving the array select on SysAD 42 40 e asserting SysVal for one SysClk cycle After a read latency of 15 PCIK cycles the processor provides the read response by entering Master state
278. instructions To ensure each instruction is given correct operand values the logical register numbers names used in the instruction are mapped to physical registers Each time anew value is put ina logical register it is assigned to anew physical register Thus each physical register has only a single value Dependencies are determined using these physical register numbers An example of register renaming is shown below The following Doubleword Shift Left Logical instruction opcode rs rt dest sa function spec r2 r3 2 DSLL DSLL r340122 has one register operand 72 plus a 5 bit shift count of value two stored in the sa field the value in 72 is shifted left by two and this value is stored in r3 The physical execution of the instruction above with register renaming is given below Physical execution Rename operation p3e p2 shift left 2 r3 p3 When the DSLL instruction is executed the logical destination register r3 is assigned a new physical register p3 from the free list MIPS R10000 Microprocessor User s Manual A 373 Register renaming also allows exceptions to be handled in a precise manner Out of order execution means that an instruction can change its result register even before all prior instructions have been completed However if any of the prior instructions cause an exception the original register value must be restored Since each new register value is loaded into a new physic
279. ion TagLo 6 0 Tag ECC bits TagLo 8 7 Virtual index bits TagLo 11 10 State bits TagLo 31 14 Tag 35 18 TagHi 3 0 Tag 39 36 TagHi 31 MRU Bit All other CPO TagLo and TagHi register bits are set to 0 10 8 Index Store Tag I Index Store Tag I stores the CPO TagLo and TagHi registers into the primary instruction cache tag array VA 13 6 defines the address and VA 0 defines the way of the tag to be written The following mapping defines the operation Tag parity bit TagLo 0 State parity bit TagLo 2 LRU bit TagLo 3 State bit TagLo 6 Tag 35 12 TagLo 31 8 Tag 39 36 TagHi 3 0 All the Tag fields including parity are directly written Parity check is suppressed for all Index Store Tags MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 196 Chapter 10 10 9 Index Store Tag D Index Store Tag D stores the CPO TagLo and TagHi registers into the primary data cache tag array VA 13 5 defines the address and V A 0 defines the way of the tag to be written The following mapping defines the operation Tag parity bit TagLo 0 SCWay TagLo 1 State parity bit TagLo 2 LRU bit TagLo 3 State bits TagLo 7 6 Tag 35 12 TagLo 31 8 Tag 39 36 TagHi 3 0 StateMod bits TagHi 31 29 All Tag fields including parity are directly written Parity check is suppressed for all Index Store Tags 10 10 Index Store Tag S Index Store Tag S st
280. ion Register 31 1615 87 0 16 8 8 Figure 15 6 FP Implementation and Revision Register Format Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Floating Point Unit 309 Floating Point Status Register FSR Figure 15 7 shows the Floating Point Status register FSR control register 31 in Coprocessor 1 It is implemented in the graduation unit rather than the Floating Point Unit because it is closely tied to the active list Bits 22 18 are unimplemented and must be set to zero All other bits may be read or written using Control Move instructions from or to Coprocessor 1 subfunctions CFC1 or CTC1 These move instructions are fully interlocked they are delayed in the decode stage until all previous instructions have been graduated and no subsequent instruction is decoded until they have been completed FP Status Register 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 109 8 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 FS0O 0 EI VIZIOJ UJ I VI Z OI UJ I V Z O U I RM Ae ES ESTEE 5 Tu rd d 9 A mE di decide f 2 i ae V s AN xe IN ids Condition Bits 7 0 Cause Enables Flags Condition bits are True False values set by floating point compare instructions Flush FS bit 0 A denormalized result causes an Unimplemented Operation exception 1 A denormalized result is replaced with zero No exception is flagged Cause bits indicate the status of each floating point arithmetic instruction
281. ions is a branch Since four instructions are fetched each cycle the R10000 processor encounters on average a branch instruction every other cycle as shown in Figure A 2 Cycle 1 Cycle 0 On average one of out every six instructions is a Branch Figure A 2 Speculative Branching When a branch instruction was encountered in previous processors the instruction fetch and instruction issue halted until it was determined whether or not to take the branch For instance a branch delay slot was designed into the MIPS architecture to handle the intrinsic delay of a branch and to keep the pipeline filled Since the processor fetches up to four instructions each clock cycle there is not enough time to resolve branches without stalling the fetch decode circuitry The processor therefore predicts the outcome of every branch and speculatively executes the branch based on this branch prediction The branch prediction circuit consists of a 512 entry RAM using a 2 bit prediction scheme two bits are assigned to a branch instruction and indicate whether or not the branch was taken the last time it occurred The four possible prediction states are strongly taken weakly taken weakly not taken strongly not taken If the branch was taken the last two times there is a good probability it will be taken this time too or the inverse The R10000 processor can speculate up to four branches deep Shadow copies of the mapping tables are
282. is executed sequentially after all previous floating point instructions have been completed so these FSR bits do not change while any floating point instruction is active These bits are broadcast from the graduation unit to all the floating point functional units When a Cause bit is set and its corresponding Enable bit is also set an exception is taken on the instruction The result of the instruction is not stored and the Flag bits are not changed If no exception is taken the corresponding Flag bits are set The Cause and Flag bits may be read or written If a CTC1 instruction sets both a Cause bit and its Enable bit an exception is taken immediately The FSR is written but the exception is reported on the move instruction Loading the FSR The FSR may be loaded from an integer register by a CTC1 instruction which selects control register 31 This instruction is executed serially that is it is delayed during decode until the entire pipeline has emptied and it is completed before the next instruction is decoded This instruction writes all FSR bits If any Cause bit and its corresponding Enable bit are both set an exception is taken after FSR has been modified The CTC1 instruction is aborted it does not graduate even though it has changed the processor state MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 312 15 5 FPU Instructions CVT L fmt Errata Version 2 0 of January 29 1997 Chapter
283. is buffer are maintained by the operating system if an instruction needs a translation which is not already in the buffer an exception is taken so the operating system can compute and load the needed translation If all the necessary translations are present the program is executed without any delays The TLB contains 64 entries each of which maps a pair of virtual pages Formats of TLB entries are shown in Figure 16 5 The Cache Algorithm fields of the TLB EntryLo0 EntryLo1 and Config registers indicate how data is cached Cache algorithms are described in Chapter 4 Cache Algorithms Figure 16 5 shows the TLB entry formats for both 32 and 64 bit modes Each field of an entry has a corresponding field in the EntryHi EntryLo0 EntryLo1 or PageMask registers as shown in Chapter 14 Coprocessor 0 for example the PFN and uncached attribute UC fields of the TLB entry are also held in the EntryLo registers 255 217 216 205 204 192 Ome es 39 12 13 191 190 189 172 171 141 140139136 135 128 pete we de e 2 18 31 T 4 8 127 125 98 97 70 69 67666564 UC 0 PFN C vio 2 30 S 3 111 63 61 34 33 6 5 3 0 2 1 DE 11 5 Lo 3 28 1 Figure16 5 Format of a TLB Entry MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 330 Address Translation Chapter 16 Because a 64 bit address is unnecessarily large only the low 44 address bits are translated The high two virtual address bits bits 63 62 sele
284. is left unchanged If the state of the block to be invalidated was found to be StateMod 010 Inconsistent the block in the primary data cache must be written back to the secondary cache The address and way in the secondary cache to be written back to are read out of the primary data cache Tag Address and secondary way fields and all 32 bytes are written back Only the data field of the secondary cache is modified by this instruction since the processor obeys State and data subset rules Since the CE bit is not defined in the R10000 processor this instruction no longer has an ECC register mode Hit CacheOps can cause cache error exceptions if they check ECC or parity bits MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 200 Chapter 10 10 16 Hit WriteBack Invalidate S Hit Writeback Invalidate S checks for a block which matches the CACHE instruction PA in the secondary cache invalidates it and writes back any dirty data to the System interface unit This operation extends to any blocks in the primary data or instruction caches which are subsets of the secondary cache block The operation takes place in the following sequence 1 7 The processor reads the STag PIdx and State bits from both ways of the secondary tag array If the PA of the CACHE instruction matches the STag and the State does not equal 00 Invalid a hit has occurred If there is a hit the STag is used to interrogat
285. is protected by odd parity The processor drives odd parity on the SysStatePar signal The 5 bit wide system response bus SysResp 4 0 is protected by odd parity Whenever an external agent asserts SysRespVal to indicate it is driving valid information on the SysResp 4 0 bus the processor checks the SysRespPar signal for odd parity If a parity error is detected the processor ignores the SysResp 4 0 bus for one SysCIk cycle The System interface unit posts a Cache Error exception and sets the SR bit in the local CacheErr register Additionally the processor informs the external agent by asserting SysUncErr for one SysCIk cycle Caution If the processor ignores the SysResp 4 0 bus it may become unsynchronized with other processors or the external agent on the cluster bus Also the processor will hang if a parity error is detected on the SysResp 4 0 bus during an external completion response cycle for a processor double single partial word read request originated by the processor The external agent may initiate a Soft Reset sequence to obtain the contents of the CacheErr register and the CacheErr register will indicate a System interface uncorrectable system response bus error MIPS R10000 Microprocessor User s Manual Error Protection and Handling Protocol Observation 185 The processor continuously observes the protocol on the System interface Table 9 9 presents the supported protocol observations and t
286. is sign extended and added to the contents of the general register base to form a Virtual Address VA The VA is translated to a Physical Address PA using the TLB The 5 bit sub opcode specifies a cache instruction variation for that address CacheOp and CACHE instruction are used interchangeably in this text MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997187 188 Chapter 10 10 1 Notes on CACHE Instruction Operations Virtual Address Physical Address CP0 Not Usable Version 2 0 of January 29 1997 This section describes the operations of the CACHE instructions in the R10000 processor NOTE The operation of any operation cache combination not listed below is undefined and the operation of this instruction on uncached addresses is also undefined The CACHE instruction uses the following portions of the VA to specify a primary cache block and way e VA 13 5 defines a 32 byte block in the primary data cache array e VA 13 6 defines a 64 byte block in the primary instruction cache array e In both cases VA 0 defines the way needed by Index operations Since VA 0 is used to indicate the way it does not cause alignment errors When accessing data in the primary caches VA Blocksize 1 is also used to read or write a specific word The CACHE instruction uses the following portions of the PA to specify a secondary cache block and way e PA Size of secondary cache 2 Blocksize of
287. ists of a single address cycle on the System interface During the address cycle SysAD 63 60 specify the target indication which allows an external agent to define the target processors of the external interrupt request If a processor determines it is an external interrupt request target SysAD 20 16 are the write enables for the five individual Interrupt register bits and SysAD 4 0 are the values to be written into these bits as shown in Figure 6 5 This allows any subset of the Interrupt register bits to be set or cleared witha single external interrupt request The Interrupt register is an architecturally transparent level sensitive register that is directly readable as bits 14 10 of the Cause register Since it is level sensitive an interrupt bit must remain asserted until the interrupt is taken at which time the interrupt handler must cause a second external interrupt request to clear the bit The processor clears the Interrupt register during any of the reset sequences MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 106 Software Interrupts Timer Interrupt Nonmaskable Interrupt Version 2 0 of January 29 1997 Chapter 6 8 IP 0 Software Interrupts 9 IP 1 SysAD 4 0 Interrupt Value 10 IP 2 11 IP 3 Hardware Interrupts 12 IP 4 13 IP 5 14 SysAD 20 16 Write i2 Timer IP 7 Interrupt Interrupt register Cause 15 08 Figure 6 5 Hardware Interrupts
288. it Kernel Mode Kernel Space xkseg In Kernel mode when KX 1 in the Status register and bits 63 62 of the 64 bit virtual address are 115 the address space selected is one of the following e kernel virtual space xkseg the current kernel virtual space the virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address one ofthe four 32 bit kernel mode compatibility spaces described below 64 bit Kernel Mode Compatibility Spaces ckseg1 0 cksseg ckseg3 In Kernel mode when KX 1 in the Status register bits 63 62 of the 64 bit virtual address are 115 and bits 61 31 of the virtual address equal 1 the lower two bytes of address as shown in Figure 16 3 select one of the following 512 Mbyte compatibility spaces e ckseg0 This 64 bit virtual address space is an unmapped region compatible with the 32 bit address model kseg0 The KO field of the Config register controls cacheability and coherency e ckseg1 This 64 bit virtual address space is an unmapped and uncached region compatible with the 32 bit address model kseg1 e cksseg This 64 bit virtual address space is the current supervisor virtual space compatible with the 32 bit address model ksseg e ckseg3 This 64 bit virtual address space is kernel virtual space compatible with the 32 bit address model kseg3 Address Space Access Privilege Differences Between the R4400 and R1000 In the R4400 the 64 bit Supervisor mode can
289. ius 35 SysAD 16 AT es 26 SysAD 17 AR veces 27 SysAD lt 18 gt AN 26 SysAD 19 APL cid 26 SysAD lt 20 gt AL 25 SysAD lt 21 gt ANN es 25 SysAD lt 22 gt AM 25 SysAD 23 AR 25 SysAD lt 24 gt AL ues 24 SysAD lt 25 gt AP seit 24 SysAD lt 26 gt AM 24 SysAD lt 27 gt AR cren 24 SysAD lt 28 gt Ale asses 23 SysAD lt 29 gt AN sran 23 SysAD lt 30 gt AM 22 SysAD lt 31 gt A Picon 23 SysAD lt 32 gt Chus 20 SysAD lt 33 gt a 20 SysAD lt 34 gt Dx 19 SysAD lt 35 gt Asse 19 SysAD lt 36 gt a 19 SysAD lt 37 gt Assess 18 SysAD lt 38 gt Dus 18 SysAD 39 Ecos 18 SysAD 40 Bios 18 SysAD lt 41 gt OS 17 SysAD 42 Anse 17 SysAD 43 Dt 17 SysAD lt 44 gt Bue 16 SysAD 45 Qa 16 SysAD 46 Nuus 15 SysAD lt 47 gt Eze 16 SysAD 48 AN ee 20 SysAD 49 AR 19 SysAD 50 AT au 19 SysAD 51 AN 19 SysAD lt 52 gt AM 19 SysAD 53 WP 18 SysAD lt 54 gt AM 18 SysAD 55 AR 18 SysAD 56 ATL us 18 SysAD lt 57 gt AR 17 SysAD 58 AM 17 SysAD 59 AN snu 17 SysAD lt 60 gt AINT sss 17 SysAD 61 AP cora 16 SysAD 62 AN au 16 SysAD 63 ARa 15 SysADChk lt 0 gt AN 22 SysADChk lt 1 gt AR 22 SysADChk lt 2 gt IND rss 21 SysADChk lt 3 gt AD ut 21 SysADChk lt 4 gt AM 21 SysADChk lt 5 gt AR 21 SysADChk lt 6 gt AL 20 SysADChk lt 7 gt AP Sit 20 SysClk Asst 22 SysClk Ares 23 SysClkRet Ca
290. ivide instructions are computed iteratively and have longer latencies A 4 Pipeline Repeat Rate The repeat rate of the pipeline is the number of cycles that occur between the issuance of one instruction and the issuance of the next instruction to the same execution unit In the R10000 processor the main five pipelines all have repeat rates of one cycle but the iterative units have longer repeat delays A 5 Out of Order Execution Version 2 0 of January 29 1997 The program order of instructions is the sequence in which they are fetched and decoded In the R10000 processor instructions may be issued executed and completed out of program order They are always graduated in program order MIPS R10000 Microprocessor User s Manual A 371 A 6 Dynamic Scheduling The R10000 processor can issue instructions to functional units out of program order this capability is known as dynamic scheduling or dynamic issuing The R10000 processor can dynamically issue an instruction as soon as all its operands are available and the required execution unit is not busy Thus an instruction is not delayed by a stalled previous instruction unless it needs the results of that previous instruction A 7 Instruction Fetch Decode Issue Execution Completion and Graduation A 8 Active List In general instructions are fetched decoded and graduated in their original program order but may be issued executed and completed out of program order as
291. k write upgrade and eliminate requests the processor asserts SysCmd 0 Consequently in a multiprocessor system using the cluster bus ECC checking and correcting is enabled for external coherency requests resulting from processor coherent block read and upgrade requests Table 6 14 Encoding of SysCmd 0 for External Address Cycles SysCmd 0 ECC check indication ECC checking and correcting disable ECC checking and correcting enable During the data cycles of an external data response or a processor coherency data response SysCmd 10 8 contain the request number associated with the original request as shown in Table 6 15 SysCmd 10 0 Data Cycle Encoding Table 6 15 Encoding of SysCmd 10 8 for Data Responses SysCmdri08 During data cycles SysCmd 5 indicates the data quality as shown in Table 6 16 Table 6 16 Encoding of SysCmd 5 for Data Cycles SysCmd 5 Data quality indication 0 Data is good 1 Data is bad MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 100 Chapter 6 During data cycles SysCmd 4 3 indicate the data type as shown in Table 6 17 Processor block write and double single partial word write requests use request data and request last data type indications External data and processor coherency data responses use response data and response last data type indications Table 6 17 Encoding of SysCmd 4 3 for Data Cycles SysCmd 4 3 Data
292. l HN ul SysWrRdy SCTWay SCTagLSBAddr SysCmd 11 0 M om SysCmdPar SysAD 63 0 Peal SysADChk 7 0 E UR SysState 2 0 M SysStatePar m SysStateVal EN i SC A B Addr 18 0 SC A B DWay SCData 127 0 SCDataChk 9 0 SysResp 4 0 SC A B DWr SysRespPar SC A B DCS SysRespVal SC A B DOE System Interface Connections for Uniprocessor System Version 2 0 of January 29 1997 84 Multiprocessor System Using Dedicated External Agents Chapter 6 Figure 6 3 shows the major System interface connections required for a typical multiprocessor system using dedicated external agents SysReq External S sGn Agent SysRel SysRdRdy SysWrRdy SysCmd 11 0 SysCmdPa SysAD 63 0 SysADChk 7 0 SysVal SysState 2 0 SysStatePa SysStateVa SysResp 4 0 SysRespPa SysRespVal r Coherent Interconnect SysRdRdy SysWrRdy SysCmd 11 0 SysCmdPa SysAD 63 0 SysADChk 7 0 SysVal SysState 2 0 SysStatePa SysStateVa SysResp 4 0 SysRespPa SysRespVal r Figure 6 3 Version 2 0 of January 29 1997 SysReq SCTWI Wr SysGnt SCTCS i lt 1 SysRel SCTOE OE m n R10000 SCTag 25 0 MIT gt Data DB ____ SysRdRdy SCTagChk 6 0 z I SysWrRdy sa 7 TWay SCTagLSBAddr Addr
293. le 6 8 This information differentiates between the various processor special requests Table 6 8 Encoding of SysCmd 4 3 for Processor Special Requests SysCmd 4 3 Special Cause Indication Reserved Eliminate 0 1 2 Reveved SSCS 3 Rem During the address cycle of processor block read typical block write upgrade and eliminate requests SysCmd 2 1 contain the secondary cache block former state as shown in Table 6 9 This information may be useful for system designs implementing a duplicate tag or a directory based coherency protocol Table 6 9 Encoding of SysCmd 2 1 for Processor Block Read Write Upgrade Eliminate Requests SysCmd 2 1 Secondary Cache Block Former State Invalid 1 Shared 2 CleanExclusive 3 DirtyExclusive MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 98 Chapter 6 During the address cycle of processor double single partial word read and write requests SysCmd 2 0 contain the data size indication as shown in Table 6 10 Table 6 10 Encoding of SysCmd 2 0 for Processor Double Single Partial Word Read Write Requests SysCmd 2 0 Data Size Indication 0 One byte valid Byte 1 Two bytes valid Halfword 2 Three bytes valid Tribyte 3 Four bytes valid Word 4 Five bytes valid Quintibyte 5 Six bytes valid Sextibyte 6 Seven bytes valid Septibyte 7 Eight bytes valid Doubleword During the a
294. ll processor requests are globally performed thereby allowing the processor to graduate SYNC instructions The external agent negates this signal when some processor requests are not yet globally performed thereby preventing the processor from graduating SYNC instructions To prevent a SYNC instruction from graduating the external agent must negate the SysGbIPerf signal no later than the same SysClk cycle in which it issued the external completion response for a processor read or upgrade request which is not yet globally performed Also the external agent must negate the SysGblPerf signal no later than two SysClk cycles after the address cycle of a processor double single partial word write request which has not yet been globally performed The SysGbIPerf signal may be permanently asserted in systems implementing a sequential consistency memory model 6 19 Cluster Bus Operation Version 2 0 of January 29 1997 A R10000 multiprocessor cluster may be created by directly attaching the System interfaces of 2 to 4 R10000 processors and providing an external cluster coordinator to handle arbitration and coherency management The cluster coordinator arbitrates the multiprocessors using the SysReq SysGnt and SysRel signals A processor request issued by an R10000 processor in master state is observed as an external request by any R10000 processors in the slave state on the cluster bus This is described Table 6 31 Table 6 31 Relati
295. ltiprocessor system using dedicated external agents SysReq SysGnt External SysRel R10000 Figure 6 6 Arbitration Signals for Uniprocessor System Figure 6 7 is an example of the operation of the System interface arbitration in a uniprocessor system The Master row in the following figures indicates which device is driving the System interface bidirectional signals Py and EA in Figure 6 7 When this row contains a dash as shown in Cycle 12 of Figure 6 7 mastership of the System interface is changing and no device is driving the System interface bidirectional signals for this one dead SysClk cycle The external agent generally asserts the SysGnt signal which allows the processor to issue requests at any time When the external agent needs to return an external data response it negates SysGnt for a minimum of one cycle waits for the processor to assert SysRel and then begins driving the System interface bus after one dead SysClk cycle 1 2 3774 57 7867 77 58779 10 t 2 1 D yam Somme STR aura Bra rath Figure 6 7 Arbitration Protocol for Uniprocessor System MIPS R10000 Microprocessor User s Manual System Interface Operations 111 Multiprocessor System Using Cluster Bus Figure 6 8 shows how the System interface arbitration signals are used in a four processor system using the cluster bus SysReq A SysReq0 R100000 SysGnt Bll SysGnto
296. m tpcdr Figure 6 26 External Coherency Request Latency Parameters Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 147 The external coherency request latency is presented in Table 6 30 Table 6 30 External Coherency Request Latency Latency PCIk cycles Processor Coherency State Processor Coherency Data Response tpcsr Response tpcdr SCCIkDiv Mint Typtt Max Min TypH Max 1 5 10 39 8 28 70 1 5 5 13 48 8 33 88 2 5 14 59 8 38 105 2 5 5 16 71 8 43 128 3 5 17 79 8 43 141 i This latency assumes no other previously issued external coherency requests are outstanding 1 to 3 additional PCIK cycles may be required for synchronization with SysClk depending on the SysClkDiv mode bits This value assumes a 32 word secondary cache block size t This value assumes the external coherency request hits a cached or outgoing buffer entry t This value assumes the external coherency request does not hit a cached or outgoing buffer entry the secondary cache is not busy and the external coherency request hits in the MRU way of the secondary cache If the external coherency request misses in the most recently used MRU way of the secondary cache 1 to 3 additional PCIK cycles are required to query the LRU way of the secondary cache depending on the SCCIKkDiv mode bits This value assumes the external coherency request does not hit a cac
297. m the PA of the CACHE instruction The way comes from PA 0 of the CACHE instruction The high word will be loaded into CPO TagHi and the low word of data will be loaded into CPO TagLo The check bits will be loaded into CPO ECC 9 0 The MRU field is unmodified ECC correction and checking is suppressed during Index Load Data S MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 202 Chapter 10 10 20 Index Store Data I Index Store Data I stores a single instruction into the primary instruction cache The address where this instruction will be written comes from VA 13 2 of the CACHE instruction The way where the data will be written comes from VA 0 of the CACHE instruction The instruction itself comes from CPO TagHi 3 0 and TagLo 31 0 The parity bit is also stored This comes from CPO ECC 0 The data to be stored bypasses the predecode and is written directly into the instruction cache The tag field is unmodified 10 21 Index Store Data D Index Store Data D stores a word of data and its byte parity into the data cache from the CPO TagLo and ECC registers The address where this word will be written is defined by VA 13 2 of the CACHE instruction The way is defined by VAIO The data word comes from CPO TagLo The parity bits come from CPO ECC 3 0 The data cache tag array including the LRU bit is left unchanged 10 22 Index Store Data S Version 2 0 of January 29 1997 Index Store Data
298. main memory while the secondary cache refill continues Event 11 for Counter 0 Secondary Cache Way Misprediction Instruction This counter is incremented when the secondary cache controller begins to retry an access to the secondary cache after it hit in the non predicted way provided the secondary cache access was initiated by the primary instruction cache Event 11 for Counter 1 Secondary Cache Way Misprediction Data This counter is incremented when the secondary cache controller begins to retry an access to the secondary cache because it hit in the non predicted way provided the secondary cache access was initiated by the primary data cache Event 12 for Counter 0 External Intervention Requests This counter is incremented on the cycle after an external intervention request enters the SCTP logic Event 12 for Counter 1 External Intervention Requests Hits In Secondary Cache This counter is incremented on the cycle after an external intervention request is determined to have hit in the secondary cache Event 13 for Counter 0 External Invalidate Requests This counter is incremented on the cycle after an external invalidate request enters the SCTP logic Event 13 for Counter 1 External Invalidate Requests Hits In Secondary Cache This counter is incremented on the cycle after an external invalidate request is determined to have hit in the secondary cache Ve
299. may not retrieve what was originally written Operation 32 64T TLB Indexs og lt PageMask EntryHi and not PageMask EntryLo1 EntryLoO Exceptions Coprocessor unusable exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 300 Chapter 14 14 37 TLBWR Instruction TLBWR Write Random TLB Entry TLBWR 31 26 25 24 6 5 0 COPO CO 0 TLBWR 010000 1 0000000000000000000 000110 6 1 19 6 Format TLBWR Description The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLol registers The TLB entry pointed at by the contents of the TLB Random register is loaded with the contents of the EntryHi and EntryLo registers In the R4400 this instruction had to be executed in unmapped spaces and in the R10000 processor it can be executed in unmapped spaces without any hazard There is no hazard to executing a TLB write in mapped space unless the write affects those instructions that have been fetched and buffered by the processor If necessary a flush to the instruction fetch pipeline such as execution of a jump register instruction after a TLB write can avoid this hazard In the R4400 processor a TLB write instruction is used to write the whole page frame number from the EntryLo registers to the TLB entry Depending on the page size specified in the corresponding PageMask register the lower bits of PFN may not be used for address t
300. ments On Line The information in this manual and other MIPS related product information is also available over the Word Wide Web at http www mips com Requests can also be e mailed to webteam mips mti sgi com Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents vii Contents Acknowledgments About This Manual Gic Yi cee M v Stylistic CONVENTIONS onion aet aiebat p pe tat ede ae dt edes v laci ten v Getting MIPS Documents On Line sese eene nennen nnn nnne nennnen vi MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 viii Table of Contents 1 Introduction to the R10000 Processor MIPS Instruction Set Architecture ISA esee eene enne enne 2 What is a Superscalar Processor ssssssssssssssseeeeeeneenetnete eene nennen 3 Pipeline and Superpipeline Architecture sss 3 Superscalar Architecture e r reissen EE a Ee AOT E ESATE E ESTEE SEES ESAS eE Et 3 What is an R10000 Microprocessor se sssssessssssesstssterstertttsssertitterstsstesteentestentessstentestesttnn te 4 R10000 Superscalar Pipeline sissie sse eee nennt 5 Instruction Queues 5 deret ht edo t ade eed 6 Execution Pipelines i5 naue eniti eater sete diea o er eR 6 64 bit Integer ALU Pipeline reenso essere nnne 6 Load Store Pipeline ninne e e Ebene pe tie dieran 7 64 bit Floatin
301. n 1 2V MIPS R10000 Microprocessor User s Manual Clock Signals 157 7 2 Secondary Cache Clock The processor uses registered synchronous SRAMs for its secondary cache to allow pipelined accesses Errata The processor provides 6 pairs of differential clock outputs SCCIK 5 0 and SCCIK 5 0 to be used by the secondary cache synchronous SRAMs These outputs swing between VccOSC and Vss The SCCIKTap mode bits Mode bits are described in Chapter 8 the section titled Mode Bits specify the alignment of SCCIK 5 0 and SCCIK 5 0 relative to the internal secondary cache clock Note that the output buffer delay is not included The secondary cache interface clock is generated by dividing down the internal processor clock PCIk SCCIK is related to SysClk according to the following formula SCClk SysClk SysClkDivt1 SCC1kDiv 1 For example in a 50 MHz system with SysCIkDiv 7 and SCCIKDiv 2 SCCIk 50 8 3 133 MHz MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 158 Chapter 7 7 3 Phase Locked Loop The processor uses the internal PLL for clock generation and multiplication as shown in Figure 7 1 Values of the termination resistors for the SysClkRet SysClkRet signals are system dependent The system designer must select a value based upon the characteristic impedance of the board therefore it is beyond the scope of this manual to specify values for these termination resistors
302. n TT 195 Index Writeback Invalidate CACHE instruction 192 indexing the secondary cache 62 Version 2 0 of January 29 1997 Index inexact result FP 310 initialization 159 input voltage levels maximum 217 instruction CACHE see also CACHE instructions 172 187 285 287 CacheOp see also CACHE instructions 287 completion 20 371 COPO see also CPO 357 COP1 357 COP2 357 decoding 371 dependencies 13 DMECO 285 290 DMEFC1 310 DMTCO 285 291 ERET 285 292 execution 371 fetching 371 FPU processor specific 312 CFC1 313 CTC1 313 CVT L fmt 312 for valid FP control registers 313 moves and conditional moves 313 graduation 371 issue 20 371 superscalar 20 latencies 29 load linked 27 MFCO 285 293 MFC1 307 310 MFPC 295 MFPS 295 MTCO 285 296 MTPC 295 MTPS 295 prefetch 25 processor specific 26 queue 11 17 repeat rates 29 serializing 23 190 store conditional 27 Swc1 307 SYNC 28 56 148 TLBP 285 297 TLBR 285 298 TLBWI 285 299 MIPS R10000 Microprocessor User s Manual Index TLBWR 285 300 unsupported CACHE 190 instruction cache block size see also cache primary instruction 46 instruction register JTAG 205 integer queue 11 branch instructions 11 divide instructions 11 multiply instructions 11 ports 11 shift instructions 11 integer ALU pipeline 6 Integer Overflow exception 347 integer queue 6 interface external 10 internal coherency conflicts 143
303. n User mode e By default the R10000 processor implements the same user instruction set as the R4400 processor To enable execution of the MIPS IV instructions in User mode the MIPS IV User Mode bit XX of the CPO Status register must be set The MIPS IV instruction extension uses COP1X as the opcode this designation was COP3 in the R4400 processor For this reason the CU3 bit is omitted in the R10000 processor and is used as the XX bit In Kernel and Supervisor modes the state of the XX bit is ignored and MIPS IV instructions are always available Mode bit settings are shown in Table 14 9 dashes in the table represent don t cares Table 14 9 ISA and Status Register Settings for User Supervisor and Kernel Mode Operations Mode UX SX KX XX MIPSII MIPSIII MIPSIV 0 5 0 F User 1 1 0 Supervisor i Kernel 5 NOTE Operation with the MIPS IV ISA does not assume or require that the MIPS I instruction set or 64 bit addressing be enabled KX SX and UX may all be set to zero MIPS R10000 Microprocessor User s Manual Coprocessor 0 247 The Reduced Power RP bit is reserved and should be zero The R10000 processor does not define a reduced power mode The Reverse Endian RE bit bit 25 reverses the endianness of the machine The processor can be configured as either little endian or big endian at system reset reverse endian selection is available in Kernel and Supervisor modes
304. n any of the prior three SysCIk cycles e the SysGblPerf signal is asserted A SYNC instruction is not prevented from graduating if the uncached buffer contains any uncached accelerated stores As it executes programs the R10000 superscalar processor performs many operations in parallel Instructions can also be executed out of order Together these two facts greatly improve performance but they also make it difficult to predict the time required to execute any section of a program since it often depends on the instruction mix and the critical dependencies between instructions The processor has five largely independent execution units each of which are individualized for a specific class of instructions Any one of these units may limit processor performance even as the other units sit idle If this occurs instructions which use the idle units can be added to the program without adding any appreciable delay MIPS R10000 Microprocessor User s Manual Introduction to the R10000 Processor 29 User Instruction Latency and Repeat Rate Table 1 2 shows the latencies and repeat rates for all user instructions executed in ALU1 ALU2 Load Store Floating Point Add and Floating Point Multiply functional units definitions of latency and repeat rate are given in the Glossary Kernel instructions are not included nor are control instructions not issued to these execution units Table 1 2 Latencies and Repeat Rates for User Instructions
305. ndary caches To resume normal operation a Cold Reset sequence must be initiated Figure 8 3 presents the Soft Reset sequence de wee LH voses Ha Vcc Pa Pd INT p wo asas DCOk Master uu m qoe dex GEN d MO dee qoa P OE AL ALS EA P SysReset ey s 3C MEE TE PE DEEI SysAD 63 0 t y ae ae a SysRespVal t 2 16 SysClk cycles Figure 8 3 Soft Reset Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 164 8 5 Mode Bits Chapter 8 The R10000 processor uses mode bits to configure the operation of the microprocessor These mode bits are loaded into the processor from the SysAD 63 0 bus during a power on or cold reset sequence while SysGnt is asserted The SysADChk 7 0 bus does not have to contain correct ECC during mode bit initialization During the reset sequence the mode bits obtained from SysAD 24 0 are written into bits 24 0 of the CPO Config register The mode bits are described in Table 8 1 Table 8 1 Mode Bits SysAD Bit Name and Function Value Mode Setting Reserved Reserved Kseg0CA Uncached 2 0 Specifies the kseg0 cache ee Die Honeodetent algorithm Cacheable coherent exclusive i Cacheable coherent exclusive on write Reserved Uncached accelerated DevNum 4 3 Specifies the processor device number CONT ie Red Far External agent only 5 Specifies the target of processor Broadcast coherent requests issued on the System i
306. nes 23 SysClkRet Bories 23 SysCmd lt 0 gt N aaisa 2 SysCmd lt 1 gt D E 3 SysCmd lt 2 gt AA ues 1 SysCmd lt 3 gt Y eris 5 SysCmd lt 4 gt AA iissa 2 SysCmd lt 5 gt AA Less 4 SysCmd lt 6 gt AB 1 SysCmd lt 7 gt AA Lus 5 SysCmd lt 8 gt AB 3 SysCmd lt 9 gt AC 2 SysCmd lt 10 gt AB 4 Packaging 227 Table 13 3 cont Signal Location Signal Location Signal Location SysCmd lt 11 gt SysCmdPar SysCorErr SysCyc SysGblPerf SysGnt SysNMI SysRdRdy SysRel SysReq SysResp lt 1 gt SysResp lt 4 gt SysState lt 0 gt SysReset SysResp lt 2 gt SysRespPar SysState lt 1 gt SysResp lt 0 gt SysResp lt 3 gt SysRespVal SysState lt 2 gt SysStatePar SysStateVal SysUncErr SysVal SysWrRdy TCA TCB TriState VccPa VecQSC VccQSC VccQSC MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 228 Chapter 13 Table 13 3 cont Signal Location Signal Location Signal Location VecQSC AN 6 VecQSC AR 10 VecQSC AR 32 VecQSC AR 4 VecQSC Q aum 12 VecQSC QC uu 3 VccQSC o 30 VccQSC Qc 33 VccQSC onn 6 VecQSC Du 1 VecQSC Dis 35 VecQSC Bose 14 VecQSC pees 28 VecQSC Biene 8 VecQSC Buses 3 VecQSC j UD 33 VecQSC Hisasi 3 VecQSC ER ees 31 VecQSC Iiis 33 VecQSC H
307. ng 15 example of 16 superpipeline architecture 3 superpipeline R4000 3 superscalar pipeline 3 processor definition of 3 370 superscalar processor 13 Supervisor mode 316 address mapping 320 csseg space 321 operations 320 sseg space 321 suseg space 320 xsseg space 321 xsuseg space 321 suseg space 320 switch context 330 SX bit 248 316 326 SYNC instruction 28 56 148 prevented from graduating 92 SysAD bus signals 41 95 100 102 181 182 360 362 363 364 365 366 367 MIPS R10000 Microprocessor User s Manual Index SysAD 20 16 interrupt register 105 SysAD 39 0 during address cycle 103 SysAD 56 40 during address cycle 103 SysAD 57 secondary cache block way indication 103 SysAD 59 58 uncached attribute 102 SysAD 63 0 address cycle encoding 102 data cycle encoding 104 SysAD 63 60 address cycle 102 interrupt 105 SysADChk bus 182 SysADChk signal 42 164 SysClk cycle 93 127 148 SysClk signal 28 41 80 104 106 108 109 113 121 125 154 155 215 216 366 367 SysClkDiv mode bits 156 160 165 SysClkRet signal 41 156 158 SysCmd bus 41 95 170 181 182 SysCmd 0 90 Ecc 100 processor data cycles 100 SysCmd 10 8 95 data response 99 external intervention and invalidate requests 98 SysCmd 11 0 map 101 protocol 107 SysCmd 11 95 SysCmd 2 0 processor write requests 98 SysCmd 2 1 block data response 100 processor requests 97 SysCmd 4 3 data c
308. ng refilled Refill bit PTag1 contains physical address bits 39 36 stored in the cache tag MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 280 Chapter 14 PTag0 contains physical address bits 35 12 stored in the cache tag PState together with the Refill bit of the State Modifier in the TagHi register PState determines the state of the cache block in the primary data cache as shown in Table 14 24 Table 14 24 PState Field Definition in TagHi Lo Registers For Primary Data Cache Operation Errata Version 2 0 of January 29 1997 When CacheOp is Index Load Store Tag PState Refill 0 Refill 1 00 Invalid Refill clean block is being refilled 01 Upgrade Share converting phate shared to dirty 10 Clean Upgrade Clean Exclusive converting clean to dirty Dirty Refill dirty block is being 11 Exclusive refilled for a store LRU indicates which way is the least recently used of the set SP state even parity bit for the PState field and the Way bit Way indicates which secondary cache set contains the primary cache line for this tag TP tag even parity bit 0 Reserved Must be written as zeroes and returns zeroes when read Figure 14 29 shows the fields of the TagHi and TagLo registers 31 87 6 4 3 2 1 0 2 2 T 4 1 24 81 29 28 3 0 State 3 25 4 Figure 14 29 TagHi Lo Register Fields in Primary Data Cache When CacheOp is Index Load Store Ta
309. ng the mode bits with the CTM mode bit negated on SysAD 63 0 e waits at least two SysClk cycles e asserts SysGnt for at least one SysClk cycle After at least one SysClk cycle the external agent may negate SysReset to end the reset sequence Figure 18 2 shows the cache test mode exit sequence 452 53 4 5 6 7 8 9 10 11 12 18 14 15 16 17 EA EA EA EA EA BA EA EA EA EA EA EA EA EA EA EA EA X Mods X Negate CTM mode bit J Figure 18 2 Cache Test Mode Exit Sequence Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Cache Test Mode 18 5 SysAD 63 0 Encoding 363 Encoding of the SysAD 63 0 bus during cache test mode is shown in Table 18 1 Unused fields are read as undefined and must be written as zeroes Table 18 1 Cache Test Mode SysAD 63 0 Encoding Data Data Instruction Instruction aera SysAD Bit Cache Data Cache Tag Cache Data Cache Tag id Arra Arra Arra Arra Predication y y y y Array 0 Tag parity Tag parity MRU 1 SCWay Unused 2 State State parity parity 3 LRU LRU 4 Data Unused Data Unused 5 6 State State Unused 7 Unused 31 8 Tag Tag 35 32 Data parity 36 Data parity StateMod cl 38 37 Unused Unused Unused 39 Unused 0 1 2 3 4 42 40 Array select 43 Write Read select 44 Auto increment select 45 Way 57 46 Address 63 58 Unused MIPS R10
310. ngement all processors must assert SysReq indicating they are ready for operation MIPS R10000 Microprocessor User s Manual Initialization 161 Errata If the virtual SysClk is used during the reset sequence the mode bits SysGnt SysRespVal and SysReset should all be referenced to the virtual SysClk that is created with SysCyc This approach will cause the R10000 to come out of reset synchronously with the virtual SysClk which will allow repeatable and lock step operation see Chapter 6 the section titled Support for Hardware Emulation for description of virtual SysClk operation During a Power on Reset sequence all internal state is initialized A Power on Reset sequence causes the processor to start with the Reset exception Figure 8 1 shows the Power on Reset sequence ie arn Cian a ae MED LUN NE PR TS ES dpi GE ENEETIREENEEEI SUE EE T ED EE n duci Ce ae ee EE Mi NI NN NTENIENTEL OI me 4 y DCOk li i i i i Master l EA Bact SysReset l SysReq i SysGnt l E Ee EERIUD SysAD 63 0 pe LP JpuX dMoeeg I1 o to 10 pes gt SysRespVal gt 21ms e 2100s e 2100s lens Figure 8 1 Power On Reset Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 162 Chapter 8 8 3 Cold Reset Sequence The Cold Reset sequence is used to reset the entire processor and possibly alter the mode bit
311. ns for Multiprocessor Using the Cluster Bus 85 Figure 6 4 presents the major System interface connections required for a typical multiprocessor system using the cluster bus SysRel SysRdRdy SysWrRdy SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysResp 4 0 SysRespPar SysRespVal SysReq0 SysGnt0 Cluster Coordinator SysState0 2 0 SysStatePar0 SysStateVal0 SysReq1 SysGnt1 SysState1 2 0 SysStatePar1 SysStateVal1 MIPS R10000 Microprocessor User s Manual Cluster Bus SysReq SCTWI Wr SysGnt SCTCS CS SysRel R10000 SCTOE OE co ep s SCTag 25 0 Data m SysRdRdy SCTagChk 6 0 Be SysWrRdy n TWay SCTagL SBAddr Addr Sn 0 sCmdPar SysAD 63 0 SC A B Addr 18 0 SysADChk 7 0 SysVal pu SysState 2 0 SC A B DWay Addr SysStatePar n SysStateVal SCData 127 0 Data OO SCDataChk 9 0 pe w SysResp 4 0 SC A B DWr Wr o SysRespPar SC A B DCS CS SysRespVal SC A B DOE OE L SysReq scTWr Wr SysGnt SCTCS CS SysRel SCTOE OE T P R1 0000 SCTag 25 0 Data DB SysRdRdy SCTagChk 6 0 Se SysWrRdy dd TWay SCTagL SBAddr Addr Sema 0 sCmdPar SysAD 63 0 SC A B Addr 18 0 SysADChk 7 0 SysVal ir 7 SysState 2 0 SC A B DWay Addr SysStatePar n SysStateVal SCData 127 0 Data V O SCDataChk 9 0 pe w SysResp 4 0 SC A B DWr Wr o SysRe
312. ns may begin execution simultaneously if all their operands are valid but the processor stalls at any instruction whose operands are still busy e In an out of order superscalar processor such as the R10000 instructions are decoded and stored in queues Each instruction is eligible to begin execution as soon as its operands become valid independent of the original instruction sequence In effect the hardware rearranges instructions to keep its execution units busy This process is called dynamic issuing MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 14 Chapter 1 Branch Prediction and Speculative Execution Errata Although one or more instructions may begin execution during each cycle each instruction takes several or many cycles to complete Thus when a branch instruction is decoded its branch condition may not yet be known However the R10000 processor can predict whether the branch is taken and then continue decoding and executing subsequent instructions along the predicted path When a branch prediction is wrong the processor must back up to the original branch and take the other path This technique is called speculative execution Whenever the processor discovers a mispredicted branch it aborts all speculatively executed instructions and restores the processor s state to the state it held before the branch However the cache state is not restored see the section titled Side Effects of Sp
313. nterface by the processor PrcElmReq Specifies whether to enable Disable 6 processor eliminate requests onto Enable the System interface by the processor PrcReqMax Specifies d aum umber 0 1 outstanding processor request 1 2 outstanding processor requests 87 of outstanding processor requests 2 tetas processor eC deste allowed on the System interface 3 4 ins P q by the pr outstanding processor requests y the processo Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Initialization SysAD Bit Table 8 1 cont Name and Function 165 Mode Bits Mode Setting SysClkDiv Sets PClk to SysClk ratio determines the System interface clock frequency see Chapter 7 the section titled System Interface Clock and Internal Processor Clock Domains 0 1 2 3 4 5 6 7 8 9 A B C D E F Reserved Result of division by 1 Result of division by 1 5 Result of division by 2 Result of division by 2 5 Result of division by 3 Result of division by 3 5 Result of division by 4 Reserved Reserved Reserved Reserved Reserved Reserved Reserved Reserved SCBIkSize Specifies the secondary cache block size SCCorEn Specifies the method of correcting secondary cache data array ECC errors MemEnd Specifies the memory system endianness 16 word 32 word Retry access through corrector Always access through corrector Little endian Big endian 24 22 SCSiz
314. nto the CPO ECC 9 0 register The data may be examined by copying the CPO TagHi TagLo and ECC registers to the general registers with the MTCO instruction Cycle SCCIk r SC A B Addr 18 0 SCTagLSBAddr SC A B DWay SCData 127 0 SCDataChk 9 0 SC A B DOE SC A B DWr SC A B DCS SCTWay SCTag 25 0 SCTagChk 6 0 SCTOE scTwr SCTCS Figure 5 3 4 Word Read Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 70 Chapter 5 8 Word Read Sequence An 8 word read sequence refills the primary data cache from the secondary cache after a primary data cache miss Figure 5 4 depicts a secondary cache 8 word read sequence In it SC A B DWay and SCTWay are driven with value X on the first address cycle which is obtained from the way prediction table On the next address cycle SCTWay is complemented in order to read the tag from the non predicted way of the addressed set SC A B DWay is not changed since it is assumed that the way prediction table is correct and the read is likely to hit in the predicted way The tag for the non predicted way is returned to the processor in the same cycle as the second quadword of data Reads that miss in the predicted way but hit in the non predicted way are noted by the internal control logic and reissued to the secondary cache as soon as possible Cycle L4 713r 79 24 eer eee ee 1 08 5 9 40 11 32 49 14 5 46 47
315. ntryHiz then Index 0 6 i5 o endif endfor Exceptions Coprocessor unusable exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 298 Chapter 14 14 35 TLBR Instruction TLBR Read Indexed TLB Entry TLBR 31 26 25 24 65 0 COPO CO 0 TLBR 010000 1 000 0000 0000 0000 0000 000001 6 1 19 6 Format TLBR Description The G bit which controls ASID matching read from the TLB is written into both of the EntryLo0 and EntryLol registers The EntryHi and EntryLo registers are loaded with the contents of the TLB entry pointed at by the contents of the TLB Index register In the R4400 this instruction had to be executed in unmapped spaces and in the R10000 processor it can be executed in unmapped spaces without any hazard In addition TLBR can be executed in mapped spaces Operation 32 T PageMask TLB Indexs ol427 96 EntryHi TLB Indexs olos g4 and not TLB Indexs 0 127 96 EntryLo1 TLB Indexs oles 32 EntryLoO TLB Indexs o a1 0 64 T PageMask TLB Indexs oless 192 EntryHi TLB Indexs o 191 128 and not TLB Indexs oloss 192 EntryLo1 TLB Indexs o 127 e5 TLB Indexs o 414o EntryLoO TLB Indexs olea 1 TLB Indexs 01140 Exceptions Coprocessor unusable exception Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 299 14 36 TLBWI Instruction TLBWI Write Indexed TLB Entry TLBWI
316. nual Cache Organization and Coherency 49 The data cache is indexed with a virtual address and tagged with a physical address Each primary cache block is in one of the following four states Invalid CleanExclusive DirtyExclusive Shared A primary data cache block is said to be Inconsistent when the data in the primary cache has been modified from the corresponding data in the secondary cache The primary data cache is maintained as a subset of the secondary cache where the state of a block in the primary data cache always matches the state of the corresponding block in the secondary cache A data cache block can be changed from one state to another as a result of any one of the following events primary data cache read write miss primary data cache write hit subset enforcement a CACHE instruction external intervention shared request intervention exclusive request invalidate request These events are illustrated in Figure 4 5 which shows the primary data cache state diagram MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 50 Chapter 4 Subset enforcement CACHE Index WriteBack Invalidate D S CACHE Index Store Tag D CACHE Hit Invalidate D S CACHE Hit WriteBack Invalidate D S Read miss obtained CleanExclusive CACHE Index Store Tag D Clean Read hit Exclusive L4 Pi Y 7 Intervention exclusive hit invalidate hit 7 1 N Write hit
317. of the stylistic conventions used in this book bits fields and registers of interest from a software perspective are italicized such as the BE bit in the Config register Signal names of more importance from a hardware point of view are rendered in bold such as Reset The asterisk appended to the signal name as in Reset indicates the signal is low active A range of bits uses a colon as a separator for instance 15 0 represents the 16 bit range that runs from bit 0 inclusive through bit 15 In some places an ellipsis 15 0 or partial ellipsis 15 0 may used in place of a colon for visibility Unfamiliar terms presented for the first time are printed in bold letters and are followed as closely as possible by a definition or description This document is updated from changes made to the Version 1 0 document dated June 26 1995 Any corrections made to this manual will be found in the R10000 User Manual Errata for Revision 2 0 The errata in this manual are indicated by the following paragraph heading Specific changes to the text are underlined in the text as shown below while descriptions of changes that have been made are italicized as shown below PLLDis and SelDVCO signal descriptions are revised in Table 3 4 System designers must take care especially in desktop applications to ensure sufficient airflow MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 v vi Getting MIPS Docu
318. on 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 234 Chapter 13 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 14 Coprocessor 0 This chapter describes the Coprocessor 0 operation concentrating on the CPO register definitions and the R10000 processor implementation of CPO instructions The Coprocessor 0 CPO registers control the processor state and report its status These registers can be read using MFCO instructions and written using MTCO instructions CPO registers are listed in Table 14 1 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997235 236 Chapter 14 Table 14 1 Coprocessor 0 Registers Register No Register Name Description 0 Index Programmable register to select TLB entry for reading or writing 1 Random Pseudo random counter for TLB replacement 2 EntryLo0 Low half of TLB entry for even VPN Physical page number 3 EntryLo1 Low half of TLB entry for odd VPN Physical page number 4 Context Pointer to kernel virtual PTE table in 32 bit addressing mode 5 Page Mask Mask that sets the TLB page size Number of wired TLB entries lowest TLB entries not used for random 6 Wired replacement 7 Undefined Undefined 8 BadVAddr Bad virtual address 9 Count Timer count 10 EntryHi High half of TLB entry Virtual page number and ASID 11 Compare Timer compare
319. on unit these instructions are issued to ALU2 ALU2 remains busy for the duration of the divide Floating point divides are performed by the Divide execution unit these instructions are issued to the floating point multiplier Floating point square root are performed by the Square root execution unit these instructions are issued to the floating point multiplier Primary Instruction Cache I cache The primary instruction cache has the following characteristics Primary Data Cache D cache it contains 32 Kbytes organized into 16 word blocks is 2 way set associative using a least recently used LRU replacement algorithm it reads four consecutive instructions per cycle beginning on any word boundary within a cache block but cannot fetch across a block boundary its instructions are predecoded its fields are rearranged and a 4 bit unit select code is appended it checks parity on each word it permits non blocking instruction fetch The primary data cache has the following characteristics MIPS R10000 Microprocessor User s Manual it has two interleaved arrays two 16 Kbyte ways it contains 32 Kbytes organized into 8 word blocks is 2 way set associative using an LRU replacement algorithm it handles 64 bit load store operations it handles 128 bit refill or write back operations it permits non blocking loads and stores it checks parity on each byte Version 2 0 of January 29 1997 10 Chapter 1 Instru
320. only when a new value is loaded into the EntryHi register the one bit result of the match is stored in a static Enable latch This bit is set whenever a new entry is loaded A translation may be defined as global so that it can be shared by all processes This G bit is set in the TLB entry and enables the entry independent of its ASID value Setting the TS bit in the Status register indicates an entry being presented to the TLB matches more than one virtual page entry in the TLB Any TLB entries that allow multiple matches even in the Wired area are invalidated before the new entry can be written into the TLB This prevents multiple matches during address translation MIPS R10000 Microprocessor User s Manual 17 CPU Exceptions This chapter describes the processor exceptions a general view of the cause and return of an exception exception vector locations and the types of exceptions that are supported including the cause processing and servicing of each exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997331 332 Chapter 17 17 1 Causing and Returning from an Exception When the processor takes an exception the EXL bit in the Status register is set to 1 which means the system is in Kernel mode After saving the appropriate state the exception handler typically changes the KSU bits in the Status register to Kernel mode and resets the EXL bit back to 0 When restoring the state and restarting th
321. ons Charlie Price for use of his rejuvenated MIPS 4 Instruction Set Architecture Steve Proffitt for both his technical assistance and helping handle the multitude of niggling details involved in getting this manual printed The following also provided technical help in innumerable ways Arun Mehta Tim Layman Greg Shippen Yeffi Van Atta John Brennan Len Widra Roy Johnson Hector Sucar Hong Men Su Mazin Khurshid Steve Whitney Doug Yanagawa chip illustrations and socket pinouts Mike Gupta Steven Peltier Rob Conrad Hai Nguyen Bill Voegtli and Sharad Mehrotra at the University of Illinois Remediating a prior deficiency thanks to Tom McReynolds In Production and Creative thanks to Melissa Miller for her design of the cover appreciable in hardcopy only right now Yen Nguyen for handling the printing both Kay Maitz and Beth Fraker for resolving various design issues and Michael Ritchie for tracking progress Joe Heinrich December 1995 Mt View California R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 iii R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 iv About This Manual Glossary Stylistic Conventions Errata Errata This manual describes the MIPS R10000 RISC microprocessor also referred to as the processor in this book Certain specialized terms used in this book are defined in the Glossary at the end of this manual A brief note on some
322. ons Therefore in general there are no hazards for CacheOps However if the CacheOps modify the current instruction fetching stream they may not work properly since the instruction fetch pipeline usually prefetches and buffers instructions and CacheOps are not serialized with respect to the instruction fetch pipeline Programmers should be aware of such potential hazards one solution is to put a COPO instruction after the CacheOp to prevent the speculative execution and force the CacheOp to complete and then use a Jump Register instruction to flush the instruction fetch pipeline Succeeding instructions will then be re fetched from caches If CPO is not usable a Coprocessor Unusable exception is taken CacheOps may induce Address Error or TLBL exceptions Refill or Invalid during address translation but never take a TLBS or Mod exception The virtual address is used to index the cache for an Index CacheOp but need not match the cache physical tag unmapped addresses may be used to avoid TLB exceptions The R10000 processor does not support the CE bit and programmers must supply correct parity bits or ECC for some CacheOps The R10000 processor supports the CH bit for secondary CacheOps Hit Invalidate and Hit WriteBack Invalidate As in the R4400 a hit sets the CH bit of the Status register and a miss resets it This bit is readable and writable by software For a detailed description of the individual CacheOps see Chapter 10 CACHE
323. onship Between Processor and External Requests for the Cluster Bus Processor Request External Request Coherent block read shared Intervention shared Coherent block read exclusive Intervention exclusive Noncoherent block read Allocate request number Double single partial word read Allocate request number Block write NOP Double single partial word write NOP Upgrade Invalidate Eliminate NOP MIPS R10000 Microprocessor User s Manual System Interface Operations 149 In the same manner a processor coherency data response issued by a processor in the master state is observed as an external block data response by any processors in the slave state External coherency requests that target a processor are handled in FIFO order and result in processor coherency state responses If an external coherency request that targets a processor hits a DirtyExclusive secondary cache block the processor also provides a processor coherency data response Figure 6 27 presents an example of a processor read request with four R10000 processors residing on the cluster bus The CohPrcReqTar mode bit is asserted for a snoopy based coherency protocol R100009 issues a processor coherent read exclusive request This is observed as an external intervention exclusive request by R10000 R100005 and R100003 R10000 and R100003 respond with Invalid processor coherency state responses R10000 responds with a DirtyExclusive processor coherency state response
324. ontext XContext and EntryHi registers contain the virtual address that failed address translation The EntryHi register also contains the ASID from which the translation fault occurred The contents of the EntryLo register are undefined The EPC register contains the address of the instruction that caused the exception unless that instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set The kernel uses the failed virtual address or virtual page number to identify the corresponding access control information The page identified may or may not permit write accesses if writes are not permitted a write protection violation occurs If write accesses are permitted the page frame is marked dirty writable by the kernel in its own data structures The TLBP instruction places the index of the TLB entry that must be altered into the Index register The EntryLo register is loaded with a word containing the physical page frame and access control bits with the D bit set and the EntryHi and EntryLo registers are written into the TLB MIPS R10000 Microprocessor User s Manual CPU Exceptions Cache Error Exception 345 The Cache Error exception is described in Chapter 9 Cache Error Exception Virtual Coherency Exception Errata The Virtual Coherency exception is not implemented in the R10000 processor since the virtual coherenc
325. or Uncached accelerated block write requests the processor drives the uncached attribute onto SysAD 59 58 See the section titled Support for Uncached Attribute in this chapter for more information MIPS R10000 Microprocessor User s Manual System Interface Operations SysAD 57 103 During the address cycle of processor block read typical block write upgrade and eliminate requests SysAD 57 contains the secondary cache block way indication This information may be useful for system designs implementing a duplicate tag or a directory based coherency protocol SysAD 56 40 When processor is in master state it drives SysAD 56 40 to zero during address cycles SysADI39 0 During the address cycle of processor and external requests SysAD 39 0 contain the physical address Table 6 22 presents the processor request address cycle address alignment Table 6 22 Processor Request Address Cycle Alignment Processor Request Type Address Bits Which Address Alignment Arte Driven to 0 Block read Quadword Doubleword read write Doubleword Singleword read write Singleword Halfword read write Byte tribyte quintibyte sextibyte septibyte read write Halfword Byte Block write 5 0 SCBlkSize 0 Secondary cache block 6 0 SCBlkSize 1 Upgrade Quadword 3 0 EM 5 0 SCBIkSize 0 Eliminate Secondary cache block 6 0 SCBIkSize 1 Table 6 23 presents the extern
326. or coherent block read or upgrade request The processors in slave state observe the processor request as an external coherency request that targets the external agent only causing the associated request number to become busy The cluster coordinator checks the duplicate tag or directory structure to determine if the block resides in the cache of one of the processors that was in slave state If necessary the cluster coordinator issues an external coherency request targeted at one or more of the processors that were in slave state By using the same request number as the original processor request this external coherency request does not consume a free request number and allows a potential processor coherency data response to be supplied as an external block data response to the original processor request MIPS R10000 Microprocessor User s Manual System Interface Operations 153 6 22 Support for a Directory Based Coherency Protocol Some system designs implement a directory based coherency protocol For such systems the processor provides the processor eliminate request cycle If the PrcElmReq mode bit is asserted the processor issues a processor eliminate request whenever it intends to eliminate a Shared CleanExclusive or DirtyExclusive block from the secondary cache During the address cycle of the processor eliminate request the physical address and the secondary cache block former state are provided The external agent may then use this
327. or increasing program efficiency e Compilers should reduce the number of branches as much as possible e Jump Register instructions should be avoided Aggressive use of the new integer and floating point conditional move instructions is recommended e Branch prediction rates may be improved by organizing code so that each branch goes the same direction most of the time since a branch that is taken 50 of the time has higher average cost than one taken 90 of the time The MIPS IV conditional move instructions may be effective in improving performance by replacing unpredictable branches Errata Side Effects of Speculative Execution To improve performance R10000 instructions can be speculatively fetched and executed Side effects are harmless in cached coherent operations however there are potential side effects with non coherent cached operations These side effects are described in the sections that follow Speculatively fetched instructions and speculatively executed loads or stores to a cached address initiate a Processor Block Read Request to the external interface if it misses in the cache The speculative operation may modify the cache state and or data and this modification may not be reversed even if the speculation turns out to be incorrect and the instruction is aborted Speculative Processor Block Read Request to an I O Address Accesses to I O addresses often cause side effects T
328. or provide a block of data using the SysAD 63 0 bus e external completion responses provide an acknowledge error or negative acknowledge indication using the SysResp 4 0 bus An external agent may only issue an external data response to the processor when the System interface is in slave state If the System interface is not already in slave state the external agent must first negate SysGnt and then wait for the processor to assert SysRel If the System interface is already in slave state the external agent may issue an external data response immediately External data responses may be accepted by the processor in adjacent SysCIk cycles and in arbitrary order relative to corresponding processor requests An external agent may issue an external completion response when the System interface is in either master or slave state External completion responses may be accepted by the processor in adjacent SysClk cycles and in arbitrary order relative to the corresponding processor requests External Block Data Response Protocol An external agent may issue an external block data response in response to a processor block read or upgrade request An external agent issues an external block data response with 8 or 16 data cycles Each data cycle consists of the following e asserting SysCmd 11 e driving the request number associated with the corresponding processor request on SysCmd 10 8 e driving the data quality indication on SysCmd 5
329. or the secondary Output cache tag SSRAM SSRAM Data Signals SCADWay Secondary cache data way SCBDWay Duplicated signal that indicates the way of the secondary cache data SSRAM Output that is to be accessed Secondary cache data bus 128 bit bus to read write cache data from to secondary cache data SSRAM Secondary cache data check bus SCDataChk 9 0 A 10 bit bus used to read write ECC and even parity from to the secondary Bidirectional SCData 127 0 Bidirectional cache data SSRAM SCADOE Secondary cache data output enable SCBDOE Duplicated signal that enables the outputs of the secondary cache data SSRAM SCADWr Secondary cache data write enable SCBDWr Duplicated signal that enables writing the secondary cache data SSRAM SCADCS Secondary cache data chip select Output SCBDCS Duplicated signal that enables the secondary cache data SSRAM ii i All cache static RAM SRAM are synchronous SRAM SSRAM MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 40 Chapter 3 Table 3 2 cont Secondary Cache Interface Signals Signal Name Description Type SSRAM Tag Signals Secondary cache tag way ey Signal indicating the way of the secondary cache tag SSRAM to be accessed Caput Secondary cache tag bus UNES PSTagiuon A 26 bit bus to read write cache tags from to the secondary cache tag SSRAM Bicarechonal Secondary cache tag check bus reu ad SC lag ChKIG O A 7 bit bu
330. ores fields from the CPO TagLo and TagHi registers into the secondary cache tag and MRU array fields The PA Cachesize 2 Blocksize defines the address and PA 0 defines the way to be read The following mapping defines the operation Tag ECC bits TagLo 6 0 Virtual index bits TagLo 8 7 State bits TagLo 11 10 Tag 35 18 TagLo 31 14 Tag 39 36 TagHi 3 0 MRU bit TagHi 31 All Tag fields including ECC are directly written Parity check is suppressed for all Index Store Tags Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual CACHE Instructions 197 10 11 Hit Invalidate I Hit Invalidate I invalidates an entry in the instruction cache which matches the PA of the CACHE instruction Both way tags at VA 13 6 are read from the instruction cache If the IState is 1 Valid and the PA of the CACHE instruction matches the Tag from the instruction cache tag array the IState bit of the entry is written to 0 Invalid and the IState parity bit is written to 0 The LRU bit does not change Parity error is checked Hit CacheOps can cause cache error exceptions if they check ECC or parity bits 10 12 Hit Invalidate D Hit Invalidate D invalidates an entry in the data cache which matches the PA of the CACHE instruction Both ways tags at VA 13 5 are read from the data cache If the DState is not equal to 00 Invalid and the PA of the CACHE instruction matches the DTag from the data cache ta
331. orrectly predicted and there are no conflicting requests Repeat rate PCIk cycles needed to transfer 2 quadwords data cache or 4 quadwords instruction cache Rate is valid for bursts of 2 to 3 cache misses if more than three cache misses in a row there can be a 1 cycle bubble t Clock synchronization causes variability MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 32 Version 2 0 of January 29 1997 Chapter 1 The processor mitigates access delays to the secondary cache in the following ways e The processor can execute up to 16 load and store instructions speculatively and out of order using non blocking primary and secondary caches That is it looks ahead in its instruction stream to find load and store instructions which can be executed early if the addressed data blocks are not in the primary cache the processor initiates cache refills as soon as possible e Ifa speculatively executed load initiates a cache refill the refill is completed even if the load instruction is aborted It is likely the data will be referenced again e The data cache is interleaved between two banks each of which contains independent tag and data arrays These four sections can be allocated separately to achieve high utilization Five separate circuits compete for cache bandwidth address calculate tag check load unit store unit external interface e The external interface gives priority to its refill
332. pPar 220 SysResp 4 SysResp 3 SysResp 2 223 SysResp 1 224 SysResp 0 225 SysGblPerf 226 SysRdRdy SysWrRdy SysStateVal 229 SysStatePar 230 SysState 2 231 SysState 1 232 SysState 0 SysCorErr SysUncErr 235 SysNMI 236 SCDataChk 7 237 SCDataChk 5 238 SCData 127 SCData 126 SCData 125 241 SCData 124 242 SCData 123 243 SCData 122 244 SCData 121 SCData 120 SCData 119 247 SCData 118 248 SCData 117 249 SCData 116 250 SCData 115 SCData 114 SCData 113 208 Chapter 11 Table 11 2 cont Boundary Scan Register Pinlist rev 1 2 Signal Signal Signal 256 SCBDCS 257 SCBDWay 258 SCBAddi 8 262 SCBAddr 4 263 SCBAddr 3 264 SCBAddr 2 268 SCData 110 269 SCData 109 270 SCData 108 n SCTaaK NS SCTagla 276 SCTasD 280 SCTagLSBAddr 281 TriState 282 SCTWay 286 SCTagChk 3 287 SCTagChk 2 288 SCTagChk 1 292 SysCmdp 293 SysCmal3 __ 294 SwsCmdi 298 SysCmd 8 299 SysCmd 9 300 SysCmd 10 316 SysRdRdy 317 SysWrRdy 318 SysStateVal 322 SysState 0 323 SysCorErr 324 SysUncErr 328 SCData 127 329 SCData 126 330 SCData 125 334 SCData 121 335 SCData 120 336 SCData 119 NIN 337 SCData 118 338 SCData 117 339 SCData 116 340 SCData 115 341 SCData 114 342 SCData 113 n 346 SCBDC 347 SCBDWay 348 SCBAddr 8 349 SCBAddr 7 352 SCBAddr 4 353 SCBAddi 3 354 SCBAddr 2 355 SCBAddri 1 358 SCData 110 359 SCData 109 360 SCData
333. ped Uncached 0 5 Gbytes Unmapped Cached Address Error cksseg ckseg1 ckseg0 Mapped xkseg Unmapped xkphys Address Error Address Error if SX 0 xksseg Address Error if UX 0 or ERL 1 _ 16 Tbytes _ Mapped See Note below xkuseg Figure 16 3 Kernel Mode Address Space NOTE If ERL 1 the selected 2 Gbyte space becomes uncached and unmapped MIPS R10000 Microprocessor User s Manual Memory Management 323 32 bit Kernel Mode User Space kuseg In Kernel mode when KX 0 in the Status register and the most significant bit of the virtual address A31 is cleared the 32 bit kuseg virtual address space is selected it covers the full 2 bytes 2 Gbytes of the current user address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address 32 bit Kernel Mode Kernel Space 0 kseg0 In Kernel mode when KX 0 in the Status register and the most significant three bits of the virtual address are 1005 32 bit kseg0 virtual address space is selected it is the 22 byte 512 Mbyte kernel physical space References to kseg0 are not mapped through the TLB the physical address is selected by subtracting 0x8000 0000 from the virtual address The KO field of the Config register determines cacheability and coherency 32 bit Kernel Mode Kernel Space 1 kseg1 In Kernel mode when KX 0 in the Status register and the most significant thr
334. pletely gathered uncached accelerated blocks are issued to the System interface as processor block write requests Incompletely gathered uncached accelerated blocks are issued to the System interface using processor double single word write requests this is also described in Chapter 6 the section titled Uncached Buffer MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 56 Chapter 4 4 5 Relationship Between Cached and Uncached Operations Version 2 0 of January 29 1997 Uncached and uncached accelerated load and store instructions are executed in order and non speculatively Such accesses are buffered in the uncached buffer by the processor until they can be issued to the System interface All uncached and uncached accelerated accesses retain program order within the uncached buffer The processor continues issuing cached accesses while uncached accesses are queued in the uncached buffer NOTE Cached accesses do not probe the uncached buffer for conflicts Buffered uncached stores prevent a SYNC instruction from graduating However buffered uncached accelerated stores do not prevent a SYNC instruction from graduating The processor continues issuing cached accesses speculatively and out of order beyond a SYNC instruction that is waiting to graduate An uncached load may be used to guarantee that the uncached buffer is flushed of all uncached and uncached accelerated accesses A SYNC instruction an
335. pletion response External coherency conflicts occur when the processor has a processor request that is pending response and a conflicting external coherency request is received The processor relies on the external agent to detect and resolve external coherency conflicts If the external agent chooses to issue an external coherency request to the processor which causes an external coherency conflict the external coherency request must be completed before an external response is given to the conflicting processor request External coherency conflicts may be avoided if the point of coherence is the processor System interface bus and only one request is allowed to be outstanding for any given secondary cache block However in some system designs external coherency conflicts are unavoidable Processor block write and eliminate requests are never pending response and therefore cannot cause external coherency conflicts Table 6 29 describes the manner in which the external agent resolves external coherency conflicts MIPS R10000 Microprocessor User s Manual System Interface Operations Table 6 29 External Coherency Conflict Resolution Processor Requests that Conflicting External are Pending Response Coherency Request Resolution Intervention shared Intervention exclusive Coherent block read Invalidate The external agent responds to the external coherency requestor that the block is Invalid At some later tim
336. point register file Values are loaded to or stored from the register file by the load store and move units A logic diagram of floating point operations is shown in Figure 15 1 in which data and instructions are read from the secondary cache into the primary caches and then into the processor There they are decoded and appended to the floating point queue passed into the FP register file where each is dynamically issued to the appropriate functional unit After execution in the functional unit results are stored through the register file in the primary data cache Secondary Cache 512 Kbyte to 16 Mbyte i FP Adder System Bus Register l 32 Kbyte File FP 2 way associative 64 by 64 Multiply Refil Copyback Primary Register FP Divide 32 Kbyte Instruction Cache Instruction Rename amp SQRT 2 way associative Decode Map Branch Cache Rename E Branch Active and Unit Free Lists Primary Data Cache FP Queue 16 entry Branch Address Figure 15 1 Logical Diagram of FP Operations The floating point queue can issue one instruction to the adder unit and one instruction to the multiplier unit The adder and multiplier each have two dedicated read ports and a dedicated write port in the floating point register file Because of their low repeat rates the divide and square root units do not have their own issue port Instead they decode instructions issued to the multiplier unit
337. q SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 22 External Invalidate Request Protocol MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 136 Chapter 6 External Interrupt Request Protocol Cycle SysCIk Master SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 An external agent issues an external interrupt request to interrupt the normal instruction flow of the processor An external agent issues an external interrupt request with a single address cycle This address cycle consists of the following e negating SysCmd 11 e driving the special command on SysCmd 7 5 e driving the interrupt special cause indication on SysCmd 4 3 e driving the ECC check indication on SysCmd 0 e driving the target indication on SysAD 63 60 e driving the Interrupt register write enables on SysAD 20 16 e driving the Interrupt register values on SysAD 4 0 e asserting SysVal An external agent may only issue an external interrupt request address cycle when the System interface is in slave state Figure 6 23 depicts three external interrupt requests Since the System interface is initially in mas
338. quest and Response Protocol sse 112 Processor Request Protocol nennen 112 Processor Block Read Request Protocol sss 113 Processor Double Single Partial Word Read Request Protocol 115 Processor Block Write Request Protocol sss 117 Processor Double Single Partial Word Write Request Protocol 119 Processor Upgrade Request Protocol sss 121 Processor Eliminate Request Protocol sss 123 Processor Request Flow Control Protocol sess 125 External Response Prot col sirenes enkei eei p iiei re KESE ESTE a DEKE eE Es 127 External Block Data Response Protocol sess 127 External Double Single Partial Word Data Response Protocol 129 External Completion Response Protocol s esses 130 External Request Protocol 5 ee eao reete seats edped eee 132 External Intervention Request Protocol sss 133 External Allocate Request Number Request Protocol sss 134 External Invalidate Request Protocol sss 135 External Interrupt Request Protocol sss 136 Processor Response Protocol sse nenne 137 Processor Coherency State Response Protocol sss 138 Processor Coherency Data Response Protocol
339. r SysStateVal System state bus parity Odd parity for the system state bus System state bus valid The processor asserts this signal for one SysCIk cycle when issuing a processor coherency state response on the system state bus System Response Bus Signals Output Output SysResp 4 0 SysRespPar SysRespVal System response bus A 5 bit bus used by the external agent for issuing external completion responses System response bus parity Odd parity for the system response bus System response bus valid The external agent asserts this signal for one SysClk cycle when issuing an external completion response on the system response bus Input Input Input SysReset SysNMI System Miscellaneous Signals System reset The external agent asserts this signal to reset the processor System non maskable interrupt The external agent asserts this signal to indicate a non maskable interrupt Input Input SysCorErr SysUncErr SysGblPerf System correctable error The processor asserts this signal for one SysClk cycle when a correctable error is detected and corrected System uncorrectable error The processor asserts this signal for one SysClk cycle when an uncorrectable tag error is detected System globally performed The external agent asserts this signal to indicate that all processor requests have been globally performed with respect to all external agents Output Output Input
340. r since cache test mode repeat rates and latencies are expressed in terms of PClk cycles the external agent must take care when operating at any system interface clock divisor other than Divide by 1 MIPS R10000 Microprocessor User s Manual Cache Test Mode 361 18 3 Entering Cache Test Mode In order for the processor to enter cache test mode the external agent must begin a Power on or Cold Reset sequence Rather than negating SysReset at the end of the reset sequence the external agent loads the mode bits into the processor by driving the mode bits with the CTM signal asserted on SysAD 63 0 waits at least two SysClk cycles and then asserts SysGnt for at least one SysClk cycle After waiting at least another 100 ms the external agent may issue the first cache test mode command Figure 18 1 shows the cache test mode entry sequence Cycle SysCIk Master SysReset E L LLL hp SysGne eee rmm SysAD 63 0 i j i JC Modes Kd JCond X Sysvar o a FA NM a 2100s SysRespVal le cons Assert CTM mode bit First cache test mode command Figure 18 1 Cache Test Mode Entry Sequence MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 362 Chapter 18 18 4 Exit Sequence Cycle SysCIk Master SysReset SysGnt SysAD 63 0 SysVal SysRespVal To leave cache test mode the external agent does the following loads the mode bits into the processor by drivi
341. r Block write for Uncached accelerated Pone SIBI partial word completely gathered blocks NA t Should not occur under normal circumstances Most systems return the Exclusive state for a cacheable noncoherent line therefore the Shared state is not normal MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 58 Chapter 4 4 7 Cache Block Ownership Version 2 0 of January 29 1997 The processor requires cache blocks to have a single owner at all times The owner is responsible for providing the current contents of the cache block to any requestor The processor uses the following ownership rules e The processor assumes ownership of a cache block if the state of the cache block becomes DirtyExclusive For a processor block read request the processor assumes ownership of the block after receiving the last doubleword of a DirtyExclusive external block data response and an external ACK completion response For a processor upgrade request the processor assumes ownership of the block after receiving an external ACK completion response e The processor gives up ownership of a cache block if the state of the cache block changes to Invalid CleanExclusive or Shared e CleanExclusive and Shared cache blocks are always considered to be owned by memory MIPS R10000 Microprocessor User s Manual 5 Secondary Cache Interface The processor supports a mandatory secondary cache by providing an internal secon
342. r must first assert SysReq and then wait until the external agent relinquishes mastership of the System interface by asserting SysGnt and SysRel 1 2 3 4 5 6 7 8 9 10 12 13 14 15 16 ELE ads 8 A a8 oi gt uda n asa epe as EA i EA i EA EA Por Po i Po i Po Po Po i Po i Po Po i Po i Po i Po i Po am EET II A ME OM ME NE Jose Race A A OSPA SPWA Ras X Adr X Dat X i i X Adr X Dat X Adr X Dat X i Figure 6 13 Processor Double Single Partial Word Write Request Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 121 Processor Upgrade Request Protocol A processor upgrade request results from a store or prefetch exclusive that hits a Shared block in the secondary cache As shown in Figure 6 14 the processor issues a processor upgrade request with a single address cycle This address cycle consists of the following e negating SysCmd 11 e driving a free request number on SysCmd 10 8 e driving the upgrade command on SysCmd 7 5 driving the upgrade cause indication on SysCmd 4 3 driving the secondary cache block former state on SysCmd 2 1 e asserting SysCmd 0 e driving the target indication on SysAD 63 60 driving the secondary cache block way on SysAD 57 e driving the physical address on SysAD 39 0 e asserting SysVal The processor may only issue a processor upgrade
343. r0 SysStateVal0 SysStatet 20 o gt SysStatePar1 Lo X SysStateVal1 il SysState2 2 0 Xo SysStatePar2 SysStateVal2 COM SysState3 2 0 OO X SysStatePar3 Ed SysStateVal3 Figure 6 27 R10000 Multiprocessor Cluster Processor Read Request Example Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations Cycle SysClk Master SysReq0 SysGnt0 SysGnt1 SysReq2 SysGnt2 SysReq3 SysGnt3 SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysResp 4 0 SysRespPar SysRespVal SysState0 2 0 SysStatePar0 SysStateVal0 SysState1 2 0 SysStatePar1 SysStateVal1 E SysState2 2 0 SysStatePar2 SysStateVal2 SysState3 2 0 SysStatePar3 SysStateVal3 SysReq1 X Uad Agr x RTE SE Se E a o Q rr gt Ol x Figure 6 28 R10000 Multiprocessor Cluster Processor Upgrade Request Example MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 151 152 6 20 Support for I O Chapter 6 The processor assumes a memory mapped I O model Consequentially no special System interface encodings are provided or required to designate I O accesses It is left to the programmer to ensure that I O addresses have the appropriate TL
344. rRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Version 2 0 of January 29 1997 An external agent issues an external allocate request number request to reserve a request number for private use Once allocated the processor is prevented from using the request number until an external completion response for the request number is received An external agent issues an external allocate request number request with a single address cycle this address cycle consists of the following e negating SysCmd 11 e driving a free request number on SysCmd 10 8 e driving the allocate request number command on SysCmd 7 5 e asserting SysVal An external agent may only issue an external allocate request number request address cycle when the System interface is in slave state and there is a free request number The external agent may have as many as eight external allocate request number requests outstanding on the System interface at any given time Figure 6 21 depicts three external allocate request number requests Since the System interface is initially in master state the external agent must first negate SysGnt and then wait until the processor relinquishes mastership of the System interface by asserting SysRel 112 13 4 5 86 7 8 9 10 11 12 13 14 15 16 17 UNT NT NT A ANN LN LN TN TNT NIN TIN TINT NT ON Py Po Po Po
345. rallel By implication a superscalar processor has more than one pipeline see below In the processor pipeline the execution of each instruction is divided into a sequence of simpler suboperations Each suboperation is performed by a separate hardware section called a stage and each stage passes its result to a succeeding stage Normally each instruction only remains in each stage for a single cycle and each stage begins executing a new instruction as previous instructions are being completed in later stages Thus a new instruction can often begin during every cycle Pipelines greatly improve the rate at which instructions can be executed as long as there are no dependencies The efficient use of a pipeline requires that several instructions be executed in parallel however the result of any instruction is not available for several cycles after that instruction has entered the pipeline Thus new instructions must not depend on the results of instructions which are still in the pipeline The latency of an execution pipeline is the number of cycles between the time an instruction is issued and the time a dependent instruction which uses its result as an operand can be issued In the R10000 processor most integer instructions have a single cycle latency load instructions have a 2 cycle latency for cache hits and floating point addition and multiplication have a 2 cycle latency Integer multiply floating point square root and all d
346. ranslation In the R10000 processor the lower bits not used for address translation are forced to zeroes during a TLB write This does not affect TLB address translation however a TLB read may not retrieve what was originally written Operation 32 64T TLB Randoms og PageMask EntryHi and not PageMask EntryLo1 EntryLoO Exceptions Coprocessor unusable exception Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 15 Floating Point Unit This section describes the operation of the FPU including the register definitions The Floating Point unit consists of the following functional units e add unit e multiply unit divide unit e square root unit The add unit performs floating point add and subtract compare and conversion operations Except for Convert Integer To Single Precision float all operations have a 2 cycle latency and a 1 cycle repeat rate The multiply unit performs single precision or double precision multiplication with a 2 cycle latency and a 1 cycle repeat rate The divide and square root units do single or double precision operations They have long latencies and low repeat rates 20 to 40 cycles MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997301 302 Chapter 15 15 1 Floating Point Unit Operations The floating point add multiply divide and square root units read their operands and store their results in the floating
347. rbitration protocol System interface 108 arbitration rules System interface 109 arbitration signals 41 arbitration cluster bus 82 Architecture with Non sequential Dynamic Execution Scheduling see also ANDES 375 arithmetic instructions FPU 310 arithmetic logic unit see also ALU 18 array 62 array page table entry PTE 241 ASID Address Space Identifier context switch 330 relationship to Global G bit in TLB entry 330 ASID Address Space Indentifier stored in EntryHi register 330 ASID field 245 asynchronous inputs AC electrical specification 216 auto increment read cache test mode 367 auto increment write cache test mode 365 B Bad Virtual Address register BadVAddr 244 BadV Addr register 241 259 340 BadVPN2 field 241 259 BD branch delay bit 252 254 BE memory endianness bit 256 BEV boot exception vector bit 250 BEV bit 171 332 block instruction cache 9 primary data cache 9 secondary cache 10 size primary data cache 48 primary instruction cache 46 secondary cache 51 block data transfers 94 external block data responses 94 processor block write requests 94 processor coherency data responses 94 Version 2 0 of January 29 1997 Index boundary scan register JTAG 206 BPIdx field 262 BPMode field 261 BPOp field 262 BPState field 262 branch determining next address 17 instruction limits on execution 17 prediction 14 31 374 prediction rates improving 21 speculative 374 unit 10 17
348. read write registers which contain a physical address of a doubleword location in main memory If enabled any attempt to read or write this location causes a Watch exception This feature is used for debugging Bits 7 0 of the WatchHi register contain bits 39 32 of the trap physical address shown in Figure 14 18 The WatchLo register contains physical address bits 31 3 The remaining bits of the register are ignored on write and read as zero Table 14 16 describes the WatchLo and WatchHi register fields WatchLo Register 81 1 0 PAddrO 0 R W 29 1 1 1 WatchHi Register 31 8 7 0 24 8 Figure 14 18 WatchLo and WatchHi Register Formats Table 14 16 WatchHi and WatchLo Register Fields Field PAddr1 Bits 39 32 of the physical address PAddr0 Bits 31 3 of the physical address R Trap on load references if set to 1 W Trap on store references if set to 1 0 Ignored on write and read as zero Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 259 14 17 XContext Register 20 The read write XContext register contains a pointer to an entry in the page table entry PTE array an operating system data structure that stores virtual to physical address translations When there is a TLB miss the operating system software loads the TLB with the missing translation from the PTE array The XContext register no longer shares the information provided in the BadV Addr register as it did
349. registers it has a 16 entry address stack for use by non blocking loads and stores it has a 44 bit virtual address calculation unit it has a 64 entry fully associative Translation Lookaside Buffer TLB which converts virtual addresses to physical addresses using a 40 bit physical address Each entry maps two pages with sizes ranging from 4 Kbytes to 16 Mbytes in powers of 4 64 bit Floating Point Pipeline The 64 bit floating point pipeline has the following characteristics it has a 16 entry instruction queue with dynamic issue it has a 64 bit 64 location floating point physical register file with five read and three write ports 32 logical registers it has a 64 bit parallel multiply unit 3 cycle pipeline with 2 cycle latency which also performs move instructions it has a 64 bit add unit 3 cycle pipeline with 2 cycle latency which handles addition subtraction and miscellaneous floating point operations it has separate 64 bit divide and square root units which can operate concurrently these units share their issue and completion logic with the floating point multiplier A block diagram of the processor and its interfaces is shown in Figure 1 5 followed by a description of its major logical blocks MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 8 Chapter 1 ke Up to 4 R10000 Microprocessors may be directly connected Secondary Cache E 2 SC Address 2 g
350. resources which otherwise would have been idle however in some cases speculative instructions can delay previous instructions Cache Performance The execution of load and store instructions can greatly affect performance These instructions are executed quickly if the required memory block is contained in the primary data cache otherwise there are significant delays for accessing the secondary cache or main memory Out of order execution and non blocking caches reduce the performance loss due to these delays however The latency and repeat rates for accessing the secondary cache are summarized in Table 1 3 These rates depend on the ratio of the secondary cache s clock to the processor s internal pipeline clock The best performance is achieved when the clock rates are equal slower external clocks add to latency and repeat times The primary data cache contains 8 word blocks which are refilled using 2 cycle transfers from the quadword wide secondary cache Latency runs to the time in which the processor can use the addressed data The primary instruction cache contains 16 word blocks which are refilled using 4 cycle transfers Table 1 3 Latency and Repeat Rates for Secondary Cache Reads SCCIkDiv Latency CI Mode PCIk Cycles PCIk Cycles 1 6 2 data cache 4 instruction cache cd 3 data cache s PU 6 instruction cache ant 4 data cache A in 8 instruction cache ti Assumes the cache way was c
351. resume after the Cache Error exception handler completes When ERL 1 the user address region becomes a 2 Gbyte uncached space mapped directly to the physical addresses This allows the Cache Error handler to save registers directly to memory without having to use a register to construct the address The processor does not support nested Cache Error exception handling While the CPO Status register ERL bit is asserted any subsequent Cache Error exceptions are ignored However the detection of additional uncorrectable errors is not inhibited and additional Cache Error exceptions may be posted t The hardware does not handle the case of multiple Cache Error exceptions in any special manner caches are refilled as normal and data forwarded to the appropriate functional units MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 172 Chapter 9 9 5 CPO CacheErr Register EW Bit When a unit detects an uncorrectable error it records information about the error in its local CacheErr register and posts a Cache Error exception If a subsequent uncorrectable error occurs while waiting for the Cache Error exception to be taken and transfer of the local CacheErr register to the CPO CacheErr register to complete the EW bit is set in its local CacheErr register Once the Cache Error exception is taken the EW bit in the CPO CacheErr register is set and the Cache Error exception handler now determines that a second error has
352. rformance counters and their associated event specifier registers which are mapped into the CPO register 25 The following instructions are used to perform an MTCO to or an MFCO from a performance counter or an event specifier register The event specifier registers are referred as control registers in the description of CPO register 25 M f M F PC Parformance Counter M FPC 31 2625 2120 1615 1110 65 1 0 COPO 00000 rt 11001 0 reg 1 M t MTPC Berformanca Counter MTPC 0 1 1 31 2625 2120 1615 1110 65 1 6 5 5 5 5 5 M f M FPS Petfornf nce Event Specifier M FPS 31 2625 2120 1615 1110 65 1 10 COPO 00000 rt 11001 0 reg 0 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 295 M t MT PS Performance Event Specifier MTPS Format MFPC rt reg Move from performance counter MTPC rt reg Move to performance counter MFPS rt reg Move from performance event specifier MTPS rt reg Move to performance event specifier reg can be either a performance counter or an event specifier only register 0 and 1 are valid in the R10000 implementation Errata The 0 field in each instruction is changed from a 1 to a 0 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 296 Chapter 14 14 33 MTCO Instruction Move To MTCO System Control Coprocessor MTCO
353. ribed in the following sections are handled processed by hardware these exceptions are then serviced by software MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 336 Cold Reset Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 The Cold Reset exception is taken for a power on or cold reset it occurs when the SysGnt signal is asserted while the SysReset signal is also asserted This exception is not maskable The CPU provides a special interrupt vector for this exception e location OxBFCO 0000 in 32 bit mode e location OxFFFF FFFF BFCO 0000 in 64 bit mode The Cold Reset vector resides in unmapped and uncached CPU address space so the hardware need not initialize the TLB or the cache to process this exception It also means the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state The contents of all registers in the CPU are undefined when this exception occurs except for the following register fields e In the Status register SR and TS are cleared to 0 and ERL and BEV are set to 1 All other bits are undefined e Config register is initialized with the boot mode bits read from the serial input e The Random register is initialized to the value of its upper bound e The Wired register is initialized to 0 e The EW bit in the CacheErr register is cleared e The ErrorEPC register gets the PC e
354. rnal ACK NACK or ERR completion response Processor upgrade request Processor eliminate request None Processor coherency state response followed by processor coherency data External intervention request response if DirtyExclusive with a coincidental or subsequent external ACK NACK or ERR completion response External allocate request number External ACK NACK or ERR completion response request Processor coherency state response followed by external ACK NACK or External invalidate request ERR completion response External interrupt request None i External completion response is required to free the request number Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 89 6 10 System Interface Buffers The processor contains the following five buffers to enhance the performance of the System interface and to simplify the system design cluster request buffer e cached request buffer e incoming buffer e outgoing buffer e uncached buffer These buffers are described in the following sections Cluster Request Buffer The System interface contains an 8 entry cluster request buffer This buffer maintains the status of the eight possible outstanding requests on the System interface When the System interface is in master state and it issues the address cycle of processor read or upgrade request the processor places an entry into the cluster r
355. roprocessor User s Manual Version 2 0 of January 29 1997 xxii Table of Contents 12 Electrical Specifications DC Electrical Specifica HoN nee pta deno e e HR ode de hacienda 210 DC Power Supply Levels ete er e tie ct tem Re iR ie deas 210 DCOKk and Power Supply Sequencing sss 211 Maximum Operating Conditions essere nennen 211 Input Signal Level Sensing sse nennen nennen 212 Mode Definitions 3e s siehe iei aee tte eet Hh ete rh p dhs egets 212 cie c H A 212 Unused IPU tS en e pedites marea nuit edid ie eA 213 DC Input Output Specifications srren aeee eee 214 AC Electrical Specification ssssssssssssseseseeeeeneneenere RSEN pR e Eai rana nhe nenne nennen 215 Maximum Operating Conditions sess 215 Test Specification enses tereti etie e eiae imi pae C terri ile Res 215 Secondary Cache and System Interface Timing sss 215 Enable Output Delay Setup Hold Time sss 216 Asynchronous Inputs 22 tue nnm ie lentes ion de a i ee on Fe ge ters 216 Signal Integrity ISSUES ih zii se dette UH eere erede e Gel bibe t tegit Pede 217 Reference Voltage iut eie tere iet eiie ee e ine ees 217 Power Supply Regulations reei ieia EREA ES EEEE nenne tenentes 217 Maximum Input Voltage Levels sss e a 217 Decoupling Capacitance mrin ecu dee dide 218 Version 2 0 of January 29 19
356. rsion 2 0 of January 29 1997 94 Chapter 6 6 12 System Interface Block Data Ordering During block data transfers on the System interface SysAD 63 0 bus even doublewords Dat0 Dat2 always correspond to SCData 127 64 and odd doublewords Dat1 Dat3 always correspond to SCData 63 0 External Block Data Responses During the address cycle of processor block read and upgrade requests the processor specifies a quadword aligned address The processor expects the external block data response to be supplied in a subblock order sequence beginning at the specified quadword aligned address Processor Coherency Data Responses The address of external intervention requests are internally aligned by the processor to a quadword address If the processor determines that it must issue a processor coherency data response it supplies the data in a subblock order sequence beginning at the quadword aligned address specified by the corresponding external coherency request Processor Block Write Requests During the address cycle of processor block write requests the processor specifies a cache block aligned address During the subsequent data cycles for typical processor block write requests the processor supplies the data in sequence beginning with the secondary cache block aligned address Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 95 6 13 System Interface Bus Encoding
357. rsion 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 271 Event 14 for Counter 0 Functional Unit Completion Cycles This counter is incremented once on the cycle after at least one of the functional units ALU1 ALU2 FPU1 or FPU2 marks an instruction as done Event 14 for Counter 1 Stores or Prefetches with Store Hint to Clean Exclusive Secondary Cache Blocks This counter is incremented on the cycle after a request to change the Clean Exclusive state of the targeted secondary cache line to Dirty Exclusive is sent to the SCTP logic Event 15 for Counter 0 Instruction Graduation This counter is incremented by the number of instructions that were graduated on the previous cycle When an integer multiply or divide instruction graduates it is counted as two graduated instructions Event 15 for Counter 1 Stores or Prefetches with Store Hint to Shared Secondary Cache Blocks This counter is incremented on the cycle after a request to change the Shared state of the targeted secondary cache line to Dirty Exclusive is sent to the SCTP logic The performance counters and associated control registers are written by using an MTCO instruction as shown in Table 14 19 Table 14 19 Writing Performance Registers Using MTCO Opcode 15 11 Opcode 1 0 Operation 11001 00 Move to Performance Control 0 11001 01 Move to Performance Counter 0 11001 10 Mov
358. rtant except that it plays a part in specifying in which space an address resides An address is in user space if it is in useg suseg kuseg xuseg xsuseg or xkuseg see the description of virtual address spaces in Chapter 16 An address is in supervisor space if it is in sseg ksseg xsseg or xksseg and an address is in kernel space if it is in either kseg3 or xkseg Kseg0 kseg1 and kernel physical spaces xkphys are kernel spaces but are not mapped All references use these vectors when the EXL bit is set to 0 in the Status register This exception sets the TLBL or TLBS code in the ExcCode field of the Cause register This code indicates whether the instruction as shown by the EPC register and the BD bit in the Cause register caused the miss by an instruction reference load operation or store operation When this exception occurs the Bad VAddr Context XContext and EntryHi registers hold the virtual address that failed address translation The EntryHi register also contains the ASID from which the translation fault occurred The Random register normally contains a valid location in which to place the replacement TLB entry The contents of the EntryLo register are undefined The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set
359. ry Cache Interface SCTag 3 2 PIdx 67 Bits SCTag 3 2 of the secondary cache tag contain the primary cache index PIdx The PIdx field contains VA 13 12 which are the two lowest virtual address bits above the minimum 4 Kbyte page size This field is written into the secondary cache tag during a secondary cache refill For each processor initiated secondary cache access the virtual address bits are compared with the PIdx field of the secondary cache tag If a mismatch occurs a virtual coherency condition exists and the value of the PIdx field is used by internal control logic to purge primary cache locations so that all primary cache blocks holding valid data have indices known to the secondary cache This mechanism unlike that of the R4400 processor is implemented in hardware It helps preserve the integrity of cached accesses to a physical address using different virtual addresses an occurrence called virtual aliasing For each external coherency request the PIdx field of the secondary cache tag provides a mechanism to locate subset lines in the primary caches SCTag 1 0 Cache Block State The lower two bits of the secondary cache tag SCTag 1 0 contain the cache block state which can be Invalid Shared CleanExclusive or DirtyExclusive as shown in Table 5 4 Table 5 4 Secondary Cache Tag State Field Encoding SCTag 1 0 State 0 Invalid 1 Shared 2 CleanExclusive 3 DirtyExclusive Since the secondary cac
360. s completely gathered 55 blocks incompletely gathered 55 stores 55 attribute support for 153 buffer 89 92 cache algorithm 53 54 uncached accelerated 240 uncached accelerated cache algorithm 53 55 uncached attribute 240 uncorrectable error 169 detection suppressed 172 flag 90 92 underflow FP 310 unimplemented operation FP 310 uniprocessor system 34 83 arbitration rules 110 unnaming register 373 useg space 318 319 User mode 316 address mapping 318 operations 318 Version 2 0 of January 29 1997 1 18 useg Space 3 19 xuseg space 319 UX bit 248 316 326 V V valid bit 239 Vcc signal 38 210 221 VccPa signal 38 VccPd signal 38 VccQ signal 210 211 214 VecQSC signal 38 210 221 VecQSys signal 38 210 221 vector locations TLB refill 334 vector special interrupt 336 virtual address 187 188 space 317 translation 328 virtual aliasing 67 Virtual Coherency exception 345 virtual memory addresses 328 voltage input maximum 217 reference 217 VPN2 field 245 Vref signal 217 VrefByp signal 38 VrefSC signal 38 212 VrefSys signal 38 212 Vss signal 38 221 VssPa signal 38 VssPd signal 38 Ww W bit 258 Watch exception 189 354 WatchHi register 258 WatchLo register 258 way prediction table secondary cache 64 Wired entries 243 Wired register 238 243 write back protocol 45 and cache operations 189 primary data cache 48 Version 2 0 of January 29 19
361. s 10 changing states 52 clock domain 157 data array 60 data array width 62 description of 51 diagram state 52 Ecc 10 error handling 176 index and tag 51 MIPS R10000 Microprocessor User s Manual 1 3 indexing 62 indexing the data array 62 indexing the tag array 63 interface frequencies 61 sizes 10 specifying block size 60 specifying cache size 60 state diagram 52 states 51 tag 66 tag and data array ECC 60 tag array 60 way prediction 64 way prediction table 63 write back protocol 51 strong ordering example of 16 structure two level 45 Cache Barrier CACHE instruction 198 Cache Error exception 171 345 precision 171 prioritization 171 Cache Error handler 171 CACHE instruction support for I O 152 CACHE instructions 172 187 188 287 and a hit in the cache 189 and Address Error exception 189 and CE bit 190 and CH bit 190 and CPO 188 and invalidation 190 and TLB Invalid exception 189 and TLB Refill exception 189 and Watch exception 189 and write back 180 Cache Barrier 198 effect on the uncached buffer 92 Hit Writeback Invalidate 190 Index Hit Invalidate 197 Index Invalidate 192 Index Load Data 201 Index Load Tag 194 195 197 198 199 201 202 Index Store Data 202 Index Store Tag 195 Index Writeback Invalidate 192 op field encoding 191 serial operations 190 Version 2 0 of January 29 1997 unsupported instructions 190 using the physical address 188 using the virtual address 188 cache miss stalls 25
362. s are subject to change without notice Currently this register helps test the ITLB branch caches and the branch prediction scheme In addition it provides choices for branch prediction algorithms to help diagnostic program writing Errata The twelve fields of the Diagnostic register shown in Figure 14 21 are described below All fields are read only all writes are ignored ITLBM this field is a 4 bit read only counter This field is incremented by one for each ITLB miss and any overflow is ignored Its value is undefined during reset and its value is meaningless when used in an unmapped space BSIdx this field defines the entry in the branch stack to be used for the latest conditional branch decoded Its value is meaningless if the latest branch was an unconditional branch DBRC this field disables the use of the branch return cache BRC BRCY this field indicates whether or not the branch return cache BRC is valid BRC has only one entry four instructions BRCW this field indicates whether or not the latest branch JAL JALR RX BGEZAL BGEZALL BLTZAL or BLTZALL caused a write into BRC It is not affected by any other type of branch BRCH this field indicates whether or not the latest branch JR 131 or JALR rx r31 has a BRC hit It not affected by any other type of branch MP this field indicates whether or not the latest conditional branch verified was mispredicted BPMode this is a read write field
363. s rule e SysClk and SysCIk e DCOk All other inputs require a stable Vref SC Sys supply for proper operation The Vref SC Sys source can be a simple voltage divider the actual impedance of this source is not critical since the Vref SC Sys signals are sampled through a low pass filter on the processor Power Supply Regulation The system must provide connections to all of the Vec VccO SC Sys and Vss pins on the processor package The power supply voltages must be held to 5 tolerance at the processor pin connection Maximum Input Voltage Levels Maximum excursion of the input signal due to ringing may reach Vcc 0 5V or Vss 0 5V for periods of less than 10 of the total driven waveform period The R10000 processor includes overshoot clamps by silicon diode protection which limit the overshoot to approximately 500 mV beyond each supply rail MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 218 Chapter 12 Decoupling Capacitance Errata In order to regulate the transient current requirements of a R10000 based system it is suggested that explicit decoupling capacitors be used The R10000 package allows for the following capacitors e eight Vcc Vss e five VecOSC Vss e four VecOSys Vss The package also provides six connections for the PLL power supplies and loop capacitors VccPa VccPd is connected to VssPa VssPd through three decoupling capacitors as shown in Figures 12 1 and 12 2 Th
364. s that the TLB entry is valid otherwise a TLBL or TLBS invalid exception occurs V Valid If this bit is set it indicates that the TLB entry is valid otherwise a TLBL or TLBS invalid exception occurs G Global If this bit is set in both LoO and Lol then the processor ignores the ASID during TLB lookup 0 Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 240 Version 2 0 of January 29 1997 Chapter 14 The PFN fields of the EntryLo0 and EntryLo1 registers span bits 33 6 of the 40 bit physical address Two additional bits for the mapped space s uncached attribute can be loaded into bits 63 62 of the EntryLo register which are then written into the TLB with a TLB Write During the address cycle of processor double single partial word read and write requests and during the address cycle of processor uncached accelerated block write requests the processor drives the uncached attribute on SysAD 59 58 The same EntryLo registers are used for the 64 bit and 32 bit addressing modes In both modes the registers are 64 bits wide however when the MIPS IILISA is not enabled 32 bit User and Supervisor modes only the lower 32 bits of the EntryLo registers are accessible MIPS III is disabled when the processor is in 32 bit Supervisor or User mode Loading of the integer registers is limited to bits 31 0 sign extended through bits 63 32
365. s used to read write ECC from to the secondary cache tag SSRAM vee Secondary cache tag output enable COE A signal that enables the outputs of the secondary cache tag SSRAM apil SCTWr Secondary cache tag write enable A signal that enables writing the secondary cache tag SSRAM SCTCS Secondary cache tag chip select A signal which enables the secondary cache tag SSRAM Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Interface Signal Descriptions 41 3 3 System Interface Signals Table 3 3 presents the R10000 processor System interface signals Table 3 3 System Interface Signals Signal Name Description Type SysClk SysClk System Clock Signals System clock Complementary system clock input Input SysClkRet SysClkRet System clock return Complementary system clock return output used for termination of the system clock Output SysReq System Arbitration Signals System request The processor asserts this signal when it wants to perform a processor request and it is not already master of the System interface Output SysGnt System grant The external agent asserts this signal to grant mastership of the System interface to the processor Input SysRel System release The master of the System interface asserts this signal for one SysClk cycle to indicate that it will relinquish mastership of the System interface in the following SysClk cycle S
366. s while power and SysClk are stable The Cold Reset sequence is as follows e The external agent negates SysGnt and SysRespVal e After waiting at least one SysClk cycle the external agent asserts SysReset e After waiting at least 100 ms the external agent loads the mode bits into R10000 This is performed by driving the mode bits on SysAD 63 0 waiting at least two SysClk cycles and then asserting SysGnt for at least one SysClk cycle e After waiting at least another 100 ms for the internal clocks to restabilize the external agent synchronizes all processor internal clocks by asserting SysRespVal for one SysClk cycle e After waiting at least 100 ms for the internal clocks to again restabilize a third 100 ms restabilization period the external agent negates SysReset e The external agent must retain mastership of the System interface refrain from issuing external requests or nonmaskable interrupts and ignore the system state bus until the processor asserts SysReq The assertion of SysReq indicates the processor is ready for operation In a cluster arrangement all processors must assert SysReq indicating they are ready for operation During a Cold Reset sequence all processor internal state is initialized A Cold Reset sequence causes the processor to start with a Reset exception Figure 8 2 shows the cold reset sequence Vcc VccQ SC Sys Vref SC Sys Vcc Pa Pd SysCIk DCOk i i i Master LX CX 0X SysReset Sys
367. se Protocol MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 132 External Request Protocol Errata Version 2 0 of January 29 1997 Chapter 6 An external agent issues an external request when it requires a resource within the processor The external agent refers to any device attached to the processor system interface It may be memory interface or cluster coordinator ASIC or another processor residing on the cluster bus An external agent may only issue an external request to the processor when the System interface is in slave state If the System interface is not already in slave state the external agent must first negate SysGnt and then wait for the processor to assert SysRel If the System interface is already in slave state the external agent may issue an external request immediately The total number of outstanding external requests including interventions allocate request numbers and invalidates cannot exceed eight External requests may be accepted by the processor in adjacent SysClk cycles External intervention and invalidate requests are considered external coherency requests MIPS R10000 Microprocessor User s Manual System Interface Operations 133 External Intervention Request Protocol An external agent issues an external intervention request to obtain a Shared or Exclusive copy of a secondary cache block An external agent issues an external intervention request with
368. secondary cache is used to access the secondary cache e PA O0 is used to specify the way needed by Index operations Since PA 0 is used to indicate the way during CACHE Index operations alignment errors are suppressed When accessing data in the secondary cache PA Blocksize 1 3 is also used to read or write a specific doubleword If the CPO is not usable if not in Kernel mode CUO must be set in the Status register for CPO to be usable a Coprocessor Unusable exception is taken MIPS R10000 Microprocessor User s Manual CACHE Instructions 189 TLB Refill and TLB Invalid Exceptions on CacheOps TLB Refill and TLB Invalid exceptions can occur on any operation For Index operations where the address virtual address for the primary caches physical address for the secondary cache is used to index the cache but need not match the cache tag unmapped addresses may be used to avoid TLB exceptions The operation never causes TLB Modified exceptions Hit Operation Accesses A Hit operation accesses the specified cache as a normal data reference and performs the specified operation if the cache block contains valid data at the specified physical address a hit The operation is undefined if a CacheOp hit occurs in both ways of the cache Watch Exception There is no Watch exception for CacheOps Address Error Exception During an Index CacheOp bit 0 is not checked for an Address Error exception since this bit is used as
369. sequence 69 write sequence 74 599CLGA see CLGA 64 bit address space 317 mode TLB entry format 329 8 word cache refill read sequence 70 write sequence 75 A AC electrical specifications 215 asynchronous inputs 216 delay time 216 hold time 216 maximum operating conditions 215 setup time 216 test specification 215 timing secondary cache 215 System interface 215 access privileges address space 326 ACK completion response 130 ACK signal 90 active list definition of 371 add unit FPU 301 MIPS R10000 Microprocessor User s Manual IH address encodings mode 317 Kernel mode 322 mapping Kernel mode 322 Supervisor mode 320 User mode 318 mode 317 page 328 physical 187 queue 6 12 instruction graduation 12 issue ports 12 number of entries 12 number of instructions written per cycle 12 organized as FIFO 12 sequencing 12 space access privileges 326 kernel 317 supervisor 317 user 317 virtual 317 Supervisor mode 320 translation 330 User mode 318 virtual 187 Address Error exception 340 Address Space Identifier see also ASID 330 address data bus signals 41 AGEL indication 340 AdES indication 340 algorithms cache five types of 53 57 aliasing virtual 67 allocate request number requests external 134 ALU arithmetic logic unit No 1 18 No 2 18 Version 2 0 of January 29 1997 ALU1 9 11 ALU2 9 11 ANDES Architecture with aoe Dynamic Execution Scheduling 4 375 a
370. ses Sparse encoding Primary data cache state mod array Secondary cache tag ECC SECDED Secondary cache data System interface address data bus Parity Parity is used to protect the primary caches and various System interface buses The processor uses both odd and even parity schemes e in an odd parity scheme the total number of ones on the protected data and the corresponding parity bit should be odd e in an even parity scheme the total number of ones on the protected data and the corresponding parity bit should be even Sparse Encoding A sparse encoding is used to protect the primary data cache state mod array In such a scheme valid encodings are chosen so that altering a single bit creates an invalid encoding ECC An error correcting code ECC is used to protect the secondary cache tag the secondary cache data and the System interface address data bus A distinct single bit error correction and double bit error detection SECDED code is used for each of these three applications MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 174 Chapter 9 9 9 Primary Instruction Cache Error Protection and Handling Error Protection Error Handling Version 2 0 of January 29 1997 This section describes error protection and error handling schemes for the primary instruction cache The primary instruction cache arrays have the following error protection schemes as listed in Table 9 2
371. shared A miss to a particular address goes through either TLB Refill or XTLB Refill depending on the source of the reference There can be only be a single page table unless the refill handlers execute address deciphering and page table selection in software NOTE Refills for the 0 5 Gbyte supervisor mapped region sseg ksseg are controlled by the value of KX rather than SX This simplifies control of the processor when supervisor mode is not being used MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 334 Chapter 17 Table 17 2 lists the TLB refill vector locations based on the address that caused the TLB miss and its corresponding mode bit Space Address Range Table 17 2 TLB Refill Vectors Regions Exception Vector Kernel xFFFF FFFF E000 to XFFFF FFFF FFFF kseg3 Refill KX 0 or XRefill KX 1 Supervisor Kernel Supervisor OxFFFF FFFF C000 to OxFFFE FEES DER 0000 0000 to OFFE FFFF 0000 0000 Lo OFFF FFFF sseg ksseg xsseg xksseg Refill KX 0 or XRefill KX 1 XRefill KX 1 XRefill SX 1 User User Version 2 0 of January 29 1997 0000 8000 to OFFF FFFF 0000 0000 to 0000 7FFF MIPS R10000 Microprocessor User s Manual xsuseg xuseg xkuseg useg xuseg suseg xsuseg kuseg xkuseg XRefill UX 1 Refill UX 0 or XRefill UX 1 CPU Exceptions 335 Priority of Exc
372. shed by adding a value of 4 to the EPC register EPC register 4 before returning If a BREAK instruction is in a branch delay slot interpretation of the branch instruction is required to resume execution MIPS R10000 Microprocessor User s Manual CPU Exceptions 351 Reserved Instruction Exception Cause The Reserved Instruction exception occurs when one of the following conditions occurs e an attempt is made to execute an instruction with an undefined major opcode bits 31 26 anattempt is made to execute a SPECIAL instruction with an undefined minor opcode bits 5 0 anattempt is made to execute a REGIMM instruction with an undefined minor opcode bits 20 16 anattempt is made to execute 64 bit operations in 32 bit mode when in User or Supervisor modes anattempt is made to execute a COP1X when the MIPS IV ISA is not enabled 64 bit operations are always valid in Kernel mode regardless of the value of the KX bit in the Status register This exception is not maskable Processing The common exception vector is used for this exception and the RI code in the Cause register is set The EPC register contains the address of the reserved instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction Servicing No instructions in the MIPS ISA are currently interpreted The process executing at the time of this exception is handed a UNIX SIG
373. shown in Figure A 1 Instruction fetching is the process of reading instructions from the instruction cache Instruction decode includes register renaming and initial dependency checks For branch instructions the branch path is predicted and the target address is computed e An instruction is issued when it is handed over to a functional unit for execution e An instruction is complete when its result has been computed and stored in a temporary physical register e An instruction graduates when this temporary result is committed as the new state of the processor An instruction can graduate only after it and all previous instructions have been successfully completed In order In order A BIRD g Out of order Instruction Decode Graduate LI Time Figure A 1 Dynamic Scheduling The R10000 processor s active list is a program order list of decoded instructions For each instruction the active list indicates the physical register which contained the previous value of the destination register if any If this instruction graduates that previous value is discarded and the physical register is returned to the free list The active list records status such as those instructions that have completed or those instructions that have detected exceptions Instructions are appended to the bottom of the list as they are decoded and instructions are removed from the top as they graduate MIPS R10000 Micropro
374. sly Note that the empty cycle is considered the issue cycle for a processor coherency data response If the System interface is not already in master state the processor must first assert SysReq and then wait for the external agent to relinquish mastership of the System interface bus by asserting SysGnt and SysRel If the System interface is already in master state the processor may issue a processor coherency data response immediately MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 140 Chapter 6 Errata When SysStateVal is negated SysState 0 provides the processor coherency data response indication The processor asserts the processor coherency data response indication when there are one or more processor coherency data responses pending issue in the outgoing buffer Once asserted the indication is negated when the first doubleword of the last pending issue processor coherency data response is issued to the system interface bus The processor coherency data response indication is not affected by SysWrRdy However as previously noted the processor may only issue a processor coherency data response when SysWrRdy was asserted two SysClk cycles previously Processor coherency data response data is supplied in subblock order beginning wi
375. sor fetches and decodes four instructions in parallel each cycle or pipeline stage Each pipeline includes stages for fetching stage 1 in Figure 1 4 decoding stage 2 issuing instructions stage 3 reading register operands stage 3 executing instructions stages 4 through 6 and storing results stage 7 7 Pipeline Stages Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6 Stage 7 Fetch Decode Issue Execute Execute Execute Store FP Add Pipeline FP Queue FAdd 1 FAdd 2 FAdd 3 Floating Point Queue FMpy 1 FMpy 2 FMpy 3 Result and Registers Pus Integer Register Operands Adar Calc FP Multiply Pipeline FP Queue Integer ALU Pipeline E xecution Integer Queue Pipelines Integer ALU Pipeline Integer Queue Load Store Pipeline Address Queue Data Cache Result 2 way Interleaved Cache Read operands from Floating Point Translation Lookaside Buffer Instruction Fetch Pipeline p or Integer Register Files Primary Decode Instruction u Branch Unit B Branch Address one branch can be handled each cycle am M MM Jie 4 Instruction Cycle Fetch and Decode Functional Units Execute Instruction Figure 1 4 Superscalar Pipeline Architecture in the R10000 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 Instruction Queues Execution Pipelines Chapter 1 As shown in Figure 1 4 each instruction decoded in stage 2 is appended to one of
376. spPar SC A B DCS CS SysRespVal SC A B DOE OE Version 2 0 of January 29 1997 86 Chapter 6 6 9 System Interface Requests and Responses Processor Requests Version 2 0 of January 29 1997 The System interface supports the following processor request e external response e external request processor response The following sections describe these request and response types and their operations Processor requests are generated by the processor when it requires a system resource The following processor requests are supported e coherent block read shared request e coherent block read exclusive request e noncoherent block read request e double single partial word read request e block write request e double single partial word write request e upgrade request eliminate request Processor write and eliminate requests do not require or expect a response by the external agent However if an external agent detects an error in a processor write or eliminate request it may use an interrupt to signal the processor It is not possible to generate precise exceptions for processor write and eliminate requests for which an external agent detects an error Processor read and upgrade requests require some type of response by the external agent MIPS R10000 Microprocessor User s Manual System Interface Opera
377. ss VA 43 0 region bits VA 63 59 If a region is mapped virtual addresses are translated in the TLB Bits VA 58 44 are not translated in the TLB and are sign extensions of bit VA 43 In both 32 bit and 64 bit address mode the memory address space is divided into many regions as shown in Figure 16 3 Each region has specific characteristics and uses The user can access only the useg region in 32 bit mode or xuseg in 64 bit mode as shown in Figure 16 1 The supervisor can access user regions as well as sseg in 32 bit mode or xsseg and csseg in 64 bit mode shown in Figure 16 2 The kernel can access all regions except those restricted because bits VA 58 44 are not implemented in the TLB as shown in Figure 16 3 The R10000 processor follows the R4400 implementation for data references only ensuring compatibility with the NT kernel If any of the upper 33 bits are nonzero for an instruction fetch an Address Error is generated Refer to Table 16 2 for delineation of the address spaces MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 318 Chapter 16 User Mode Operations In User mode a single uniform virtual address space labelled User segment is available its size is 2 Gbytes 23 bytes in 32 bit mode useg e 16 Tbytes oF bytes in 64 bit mode xuseg Figure 16 1 shows User mode virtual address space 32 bit 64 bit KSU 105 and KSU 105 and EXL 0 and EXL 0 and ERL 0 and E
378. ssertion of IP 7 when the associated counter overflows e The U S K and EXL bits indicate the processor modes in which the event is counted U is user mode S is supervisor mode K is kernel mode when EXL and ERL both are set to 0 the system is in kernel mode and handling an exception when EXL is set to 1 as shown in Table 14 22 Version 2 0 of January 29 1997 266 Errata Chapter 14 e 0 Reserved Must be written as zeroes and returns zeroes when read These modes can be set individually for example one could set all four bits to count a certain event in all processor modes except during a cache error exception In describing the rules that are applied for the counting of each events listed in Table 14 18 following terminology is used Done is defined as the point at which the instruction is successfully executed by the functional unit but is not yet graduated Graduated is defined as the point in time when the instruction is successfully executed done and it is the oldest instruction Secondary Cache Transaction Processing SCTP logic is on chip logic in which up to four internally generated and one externally generated secondary cache transactions are queued to be processed The following rules apply for the counting of each event listed in Table 14 16 Event 0 for Counter 0 and Counter 1 Cycles Version 2 0 of January 29 1997 The counter is incremented on each
379. ssor waits for the assertion of SysGnt When the processor observes the assertion of SysGnt it negates SysReq two SysClk cycles later Once the processor asserts SysReq it does not negate SysReq until the assertion of SysGnt even if the need for the System interface bus is contravened by an external coherency request When the processor observes the assertion of SysRel it enters master state two SysClk cycles later and begins to drive the System interface bus SysRel may be asserted coincidentally with or later than SysGnt Once in master state the processor does not relinquish mastership of the System interface until it observes the negation of SysGnt The processor indicates it is relinquishing mastership of the System interface bus by asserting SysRel for one SysClk cycle two or more SysCIk cycles after the negation of SysGnt The processor may issue any type of processor request or coherency data response in the two SysClk cycles following the negation of SysGnt This may delay the assertion of SysRel Version 2 0 of January 29 1997 110 Chapter 6 Uniprocessor System Cycle SysCIk Master SysReq SysGnt SysRel SysCmd 11 0 SysVal Version 2 0 of January 29 1997 Po Pg Pg Po Py Po Po Po Po Po Po Po man i ung ah Figure 6 6 shows how the System interface arbitration signals are used in a uniprocessor system Note that this same configuration would be used in a mu
380. sssssseisssseeeeeeenenee tenente nennen tenete tenen 58 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents xiii 5 Secondary Cache Interface Tag and Data ATraystacsociesan eae o ari E e ER EORR HR e ieri opes 60 Secondary Cache Interface Frequencies sernir notisie iii risista 61 Secondary Cache Indexing tenente nnne 62 Indexing the Data Array aiit e i een e EE aE E AEE KATE AE E TEAOR NENE 62 Indexing the Tag Array onie e ee i a E RARE 63 Secondary Cache Way Prediction Table c ccccccccesssesestenetesesesneneescecenesesssnanseseseseeessneseees 64 Secondary Cache Tag ass esee edet iet ied ende tse ev de sae ese dt 66 SCTae Q25 4 Physical Tag acess ehe neben eed aee pid tees 66 STAG Q2 Pla Xini e e eE hate rb a Dualo sst nidi tn Ra NOR UM Ue V 67 SCTag 1 0 Cache Block State sse ee eee eee nnn 67 Read Sequences eene en eee p Te hd n e te eed 68 4 Word Read Sequence esep ie eene k ene e araa ae EPE aaa E a e Ga 69 5 Wotd Read Sequence caeco ttp AEE SEE a Aa AER 70 16 or 32 Word Read Seguente ohte ike oeie a eE EAS E eaa 71 Tag Read Sequence o denote ne etie da aaa ding tah ae fedi edi ds 72 Write Seq ences xo Rendite eet aulet iE Adis sta ee Sata ete tust Gelbe it pepe npe 73 4 Word Write Segu nce is ssi oieee ea oeda tenen nnne nennen tenerent 74 8 Word Write Sequence ees tegere ire ee eripe tie hee ihid ee irte desine deban 75 16 or 32 Word Write SegUence sosi a a e
381. st last data type indication MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 118 Chapter 6 The processor may negate SysVal between data cycles of a processor block write request only if the SCCIK frequency is less than half of the SysClk frequency The processor may only issue a processor block write request address cycle when the following are true e the System interface is in master state e SysWrRdy was asserted two SysClk cycles previously e the processor is not the target of a conflicting outstanding external coherency request Figure 6 12 depicts two adjacent processor block write requests Since the System interface is initially in slave state the processor must first assert SysReq and then wait until the external agent relinquishes mastership of the System interface by asserting SysGnt and SysRel Cycle 112131455 SysCIk j Master j EA EA EA j EA SysReq SysGnt SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysRdRdy SysWrRdy SysState 2 0 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 12 Processor Block Write Request Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 119 Processor Double Single Partial Word Write Request Protocol A processor double single partial word write request results from an uncached store or incompletely gathered uncached acceler
382. ster is interpreted as a revision number and the high order byte bits 15 8 is interpreted as an implementation number The implementation number of the R10000 processor is 0x09 The content of the high order halfword bits 31 16 of the register are reserved The revision number is stored as a value in the form y x where y is a major revision number in bits 7 4 and x is a minor revision number in bits 3 0 The revision number can distinguish some chip revisions however there is no guarantee that changes to the chip will necessarily be reflected in the PRid register or that changes to the revision number necessarily reflect real chip changes For this reason software should not rely on the revision number in the PRId register to characterize the chip MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 256 14 14 Config Register 16 The R10000 processor s Config register has a different format from that of the R4400 since the R10000 processor has different mode bits and configurations however some fields are still compatible KO DC IC and BE The value of bits 24 0 are taken directly from the Mode bit settings during a reset sequence refer to Table 8 1 for these bit definitions Table 14 15 shows the R10000 Config register fields along with values which are hardwired into the register at boot time Figure 14 16 shows the Config register format Version 2 0 of January 29 1997 Chapter 14
383. t Reset and Cache Error exceptions 1 interrupt pending 0 no interrupt IP ExcCode Exception code field see Table 14 13 0 Reserved Must be written as zeroes and returns zeroes when read Cause Register 31 30 29 28 27 16 15 9 758 25 0 BD 0 CE 0 iB s po V6 E rg 1 1 2 12 8 1 5 2 Figure 14 13 Cause Register Format Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Coprocessor 0 253 Table 14 13 Cause Register ExcCode Field Exception s md Mnemonic Description Code Value 0 Int Interrupt 1 Mod TLB modification exception 2 TLBL TLB exception load or instruction fetch 3 TLBS TLB exception store 4 AdEL Address error exception load or instruction fetch 5 AdES Address error exception store 6 IBE Bus error exception instruction fetch 7 DBE Bus error exception data reference load or store 8 Sys Syscall exception w 11 CpU Coprocessor Unusable exception 12 Ov Arithmetic Overflow exception Tr i m recepi SSCS 14 Reserved 15 FPE Floating Point exception 1622 Reserved 23 WATCH Reference to WatchHi WatchLo address 24 30 Reserved 31 Reserved MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 254 Chapter 14 14 12 Exception Program Counter 14 Version 2 0 of January 29 1997 The Exception Program Counter EPC is a read write register that contains the address a
384. t also drives the proper ECC on the SysADChk 7 0 bus Processor in Slave State Whenever the processor is in slave state error checking is enabled with the assertion of SysCmd 0 and an external agent asserts Sys Val to indicate it is driving valid information on the SysAD 63 0 bus the processor checks the SysADChk 7 0 bus for the proper ECC Correctable Error Detected If a correctable error is detected during an external address cycle or during an external data cycle for a processor read or upgrade request originated by the R10000 processor correction is automatically performed in line without affecting latency The processor asserts SysCorErr for one SysClk cycle to inform the external agent that a correctable error has been detected and corrected Uncorrectable Error Detected Errata Version 2 0 of January 29 1997 If an uncorrectable error is detected during an external address cycle the processor ignores the SysCmd 11 0 and SysAD 63 0 buses for one SysClk cycle and the System interface unit posts a Cache Error exception and sets the SA bit in the local CacheErr register Additionally the processor informs the external agent by asserting SysUncErr for one SysCIk cycle Caution By ignoring the SysCmd 11 0 and SysAD 63 0 buses this processor may become unsynchronized with other processors or the external agent on the cluster bus If an uncorrectable error is detected or the data quality indication on
385. t is modified by a Hit Invalidate S or Hit WriteBack Invalidate S CACHE instruction CH is set if there is a hit in the secondary cache and cleared if there is a miss The CH bit can also be modified by a MTCO instruction Serial Operation of CACHE Instructions All CACHE instruction variations are performed serially From the aspect of the primary cache this means CACHE instructions can impede the instruction stream For this reason load store speculation is not allowed beyond a CACHE instruction until the CACHE instruction has graduated All load store accesses including writebacks to the external agent must be complete before the CACHE instruction can graduate and any load store following a CACHE instruction cannot be issued speculatively until the CACHE instruction graduates Uncached operations and instruction fetches are not affected Instructions Not Supported Version 2 0 of January 29 1997 The processor does not support the following CACHE instructions e Create DirtyExclusive e Hit WriteBack Fill I e Hit Set Virtual variations MIPS R10000 Microprocessor User s Manual CACHE Instructions Op Field Encoding 191 Table 10 1 presents the Op field encoding for the CACHE instruction Encodings not listed in this table are undefined Table 10 1 CACHE Instruction Op Field Encoding MIPS R10000 Microprocessor User s Manual Op Field CACHE Instruction Variation T
386. t rate of one request per SysClk cycle and a maximum data rate of one doubleword per SysClk cycle Various flow control mechanisms are provided to limit these rates as described below Processor Write and Eliminate Request Flow Control The processor can only issue a processor write or eliminate request if e the System interface is in master state e SysWrRdy was asserted two SysClk cycles previously Processor Read and Upgrade Request Flow Control The processor can only issue a processor read or upgrade request if e the System interface is in master state e SysRdRdy was asserted two SysClk cycles previously the maximum number of outstanding processor requests specified by the PrcReqMax mode bits is not exceeded e there is a free request number Processor Coherency Data Response Flow Control The processor can only issue a processor coherency data response if e the System interface is in master state e SysWrRdy was asserted two SysClk cycles previously External Request Flow Control When the System interface is in Slave state it is capable of accepting external requests An external agent may issue external requests in adjacent SysClk cycles External Data Response Flow Control Since the processor has an incoming buffer an external agent may supply external data response data in adjacent SysCIk cycles without regard to cache bandwidth or internal resource availability MIPS R10000 Microprocessor User s Manual Ve
387. t to Push pull 30 configure select System interface Open drain bidirectional and output signals P as open drain CTM 1 31 Specifies whether or not to enable Disable Enable cache test mode 63 32 Reserved t Does not include the output buffer delay SysReq SysRel SysCmd 11 0 SysCmdPar SysAD 63 0 SysADChk 7 0 SysVal SysState 2 0 SysStatePar SysStateVal SysCorErr SysUncErr Errata The description of bits 28 25 of Table 8 1 has been revised Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 9 Error Protection and Handling This chapter presents the error protection and handling features provided by the R10000 processor Two types of errors can occur in an R10000 system e correctable e uncorrectable The following two sections describe them MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997167 168 Chapter 9 9 1 Correctable Errors Correctable errors consist of e secondary cache tag array correctable ECC errors e secondary cache data array correctable ECC errors e System interface address data bus correctable ECC errors When the processor detects a correctable error the error is automatically corrected and normal operation continues Secondary cache array scrubbing is not performed The processor informs the external agent that a correctable error was detected and then corrected by asserting the SysCorErr signal for one SysClk cy
388. t which processing resumes after an exception has been serviced For synchronous exceptions the EPC register contains either e the virtual address of the instruction that was the direct cause of the exception or e the virtual address of the immediately preceding branch or jump instruction when the instruction is in a branch delay slot and the Branch Delay bit in the Cause register is set The processor does not write to the EPC register when the EXL bit in the Status register is set to a 1 Figure 14 14 shows the format of the EPC register EPC Register 63 0 EPC 64 Figure 14 14 EPC Register Format t The ErrorEPC register provides a similar capability described later in this chapter MIPS R10000 Microprocessor User s Manual Coprocessor 0 255 14 13 Processor Revision Identifier PRId Register 15 The 32 bit read only Processor Revision Identifier PRId register contains information identifying the implementation and revision level of the CPU and CPO Figure 14 15 shows the format of the PRId register Table 14 14 describes the PRId register fields PRid Register 31 1615 8 7 0 0 Imp 0x09 Rev 16 8 8 Figure 14 15 Processor Revision Identifier Register Format Table 14 14 PRId Register Fields Field Description Imp Implementation number Rev Revision number Reserved Must be written as zeroes and returns zeroes when read The low order byte bits 7 0 of the PRId regi
389. ted by TLB Read TLBR or TLB Write Index TLBWI instructions Figure 14 1 shows the format of the Index register Table 14 2 describes the Index register fields Index Register 31 30 6 5 0 BL ree 1 25 6 Figure 14 1 Index Register Table 14 2 Index Register Field Descriptions Field Description P Probe failure Set to 1 when the previous TLBProbe TLBP instruction was unsuccessful Index Index to the TLB entry affected by the TLBRead and TLBWrite instructions 0 Reserved Must be written as zeroes and returns zeroes when read MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 238 Chapter 14 14 2 Random Register 1 Version 2 0 of January 29 1997 The Random register is a read only register of which six bits index an entry in the TLB This register decrements when any instruction graduates at that particular cycle and its values range between an upper and a lower bound as follows e The lower bound is set by the number of TLB entries reserved for exclusive use by the operating system the contents of the Wired register e The upper bound is set by the total number of TLB entries minus 1 64 1 maximum The Random register specifies the entry in the TLB that is affected by the TLB Write Random instruction The register does not need to be read for this purpose however the register is readable to verify proper operation of the processor To simplify testing the R
390. ten to point to the way opposite to that being invalidated If the processor Eliminate Request mode bit PrcElmReq is set a processor eliminate request is sent to notify the external agent that a block in the secondary cache has been invalidated Hit Invalidate S sets the CH bit if it hits in the secondary cache Once the CH bit is set it stays set until cleared by a MTCO instruction or the next CacheOp that can change the CH bit Hit CacheOps can cause cache error exceptions if they check ECC or parity bits Cache Barrier does not change any cache fields It is used when serialization of a CACHE instruction is needed without unwanted side effects For more information see the section titled the section titled Serial Operation of CACHE Instructions in this chapter MIPS R10000 Microprocessor User s Manual CACHE Instructions 199 10 15 Hit Writeback Invalidate D Hit Writeback Invalidate D invalidates an entry in the primary data cache which matches the PA of the CACHE instruction In addition it writes back to the secondary cache any DirtyExclusive or Inconsistent data found in the primary data cache Both way DTags at VA 13 5 are read from the data cache If the DState is not equal to 00 Invalid and PA of the CACHE instruction matches the DTag then the DState bits of the entry are set to 00 Invalid the SCWay is set to 0 the DState parity is set to 0 and the StateMod bits are set to 001 Normal The LRU bit
391. ter is not implemented FGR 3 Register 3 e e e e e e 63 32 31 0 63 0 FGR 31 FGR 30 FGR 30 Register 430 63 0 Register is not implemented FGR 31 Register 31 Figure 15 2 Floating Point Registers Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Floating Point Unit 305 Load and Store Operations When FR 0 floating point load and stores operate as follows e A doubleword load or store is handled the same as if the FR bit was set to 1 as long as the register selected is even 0 2 4 etc e If the register selected is odd the load store is invalid These operations are shown in Figure 15 3 Singleword loads stores to even and odd registers are also shown FR 0 16 Register Mode Doubleword Load Store Same as FR 1 if register is even else invalid Singleword Load Store when Register is Even Singleword Load Store when Register is Odd LWC1 ft address MTC1 ft rs LWC1 ft address MTC1 ft rs 0 63 32 31 0 Unchanged Load 32 bit SWC1 ft address MFC1 rt fs 63 Load 32 bit Unchanged SWC1 ft address MFC1 rt fs 638 LLL 32 31 63 5 03 0 tMove to from selects an integer register instead tMove to from selects an integer register instead Moved 32 bit data is sign extended in 64 bit register Moved 32 bit data is sign extended in 64 bit register Figure 15 3 Loading and Storing Floating Point Registers in 16 Register Mode NOTE
392. ter state the external agent must first negate SysGnt and then wait until the processor relinquishes mastership of the System interface by asserting SysRel PPP Sa Be hb Bo 19 10 11 12 13 14 15 46 17 a GRO aS an a Ge ee Go em sO a es oe te ce a AA Po Po Po Po Po EA EA EA EA EA EA EA EA EA EA 5 EA NEN i i NY i i gt K_Int X It X Int X Sica CATY a oo oe Figure 6 23 External Interrupt Request Protocol MIPS R10000 Microprocessor User s Manual System Interface Operations 137 Processor Response Protocol Processor responses are supplied by the processor in response to external coherency requests that target the processor The R10000 processor issues a processor coherency state response for each external coherency request that targets the processor The processor issues a processor coherency data response for each external intervention request that targets the processor and hits a DirtyExclusive secondary cache block Processor coherency state responses are issued by the processor in the same order that the corresponding external coherency requests are received Processor coherency state and data responses may occur in adjacent SysClk cycles MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 Chapter 6 Processor Coherency State Response
393. terrupts 105 master state 81 multiprocessor connections with cluster bus 85 with dedicated external agents 84 outgoing buffer 91 outstanding processor requests 87 outstanding requests on the System interface 87 port 4 processor block read request protocol 113 processor coherency data response protocol 139 processor coherency state response protocol 138 processor double single partial word read request protocol 115 processor double single partial word write request protocol 119 processor eliminate request protocol 123 processor request 80 86 flow control protocol 125 protocol 112 processor response 80 87 protocols 137 processor upgrade request protocol 121 register to register operation 80 request 86 cycle 80 number field 87 protocol 112 response 86 cycle 80 protocol 112 signals 41 81 slave state 81 split transaction 87 support for I O 152 uncached attribute 153 uncached buffer 92 uniprocessor connections 83 SysUncErr signal 42 169 170 174 175 179 SysVal signal 42 113 115 117 119 121 123 127 129 133 134 135 136 139 181 360 364 365 366 367 MIPS R10000 Microprocessor User s Manual Index SysWrRdy signal 41 118 119 123 125 139 and flow control 93 T table busy bit 372 mapping 375 tag bus secondary cache SCTag 66 tag read sequence 72 tag write sequence 77 TagHi register 69 74 278 TagLo register 69 74 278 tags external duplicate 152 TAP controller 204 205
394. th an external agent Such a system is shown in Figure 2 1 The external agent is typically an ASIC that provides a gateway to the memory and I O subsystems in fact this ASIC may incorporate the memory controller itself If hardware I O coherency is desired the external agent may use the multiprocessor primitives provided by the processor to maintain cache coherency for interventions and invalidations External duplicate tags can be used by the external agent to filter external coherency requests Secondary Cache Secondary Cache Interface R10000 System Interface l Duplicate External Tags Agent To Other System Resources Figure 2 1 Uniprocessor System Organization MIPS R10000 Microprocessor User s Manual System Configurations 2 2 Multiprocessor Systems Two types of multiprocessor systems can be implemented with R10000 processor a dedicated external agent interfaces with each R10000 processor up to four R10000 processors and an external agent reside on a cluster bus Multiprocessor Systems Using Dedicated External Agents A multiprocessor system may be created with R10000 processors by providing a dedicated external agent for each processor such a system is shown in Figure 2 2 The external agent provides a path between the processor System interface and some type of coherent interconnect In such a system the processor provides support for three coherency schemes Secondary Cac
395. th the quadword aligned address specified by the corresponding external coherency request Processor coherency data responses are not necessarily issued in the same order as the external coherency requests however each processor coherency data response always follows the corresponding processor coherency state response Note that more than one processor coherency state response may be pipelined ahead of the corresponding processor coherency data responses Figure 6 25 depicts one external coherency request and the resulting processor coherency state and data responses Cycle 63728 9 10 117 12 13 14 15 16 17 SysCik Bo ee ated et pe s eat anaes anne Master EA EAL EA Pai Pai fhi Po i Poi Py Poi Py SysReq X 7 SysGnt ij SysRel j j SysCmd 11 0 j S XEniplyyRspDa SysCmdPar gt SysAD 63 0 gt SysADChk 7 0 i gt SysVal j N SysRdRdy SysWrRdy SysState 2 0 X 1 SysStatePar SysStateVal SysResp 4 0 SysRespPar SysRespVal Figure 6 25 Processor Coherency Data Response Protocol Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 141 6 18 System Interface Coherency The System interface supports external intervention shared intervention exclusive and invalidate coherency requests These requests are used by an external agent or other R10000 processors on the cluster bus to maintain cache coherency
396. the Status register which is set to 1 SR bit of the Status register which is set to 1 on Soft Reset or an NMI 0 for a Cold Reset BEV bit of the Status register which is set to 1 TS bit of the Status register which is set to 0 PC is set to the reset vector OXFFFF FFFF BFCO 0000 clears any pending Cache Error exceptions A Soft Reset exception is intended to quickly reinitialize a previously operating processor after a fatal error It is not normally possible to continue program execution after returning from this exception since a SysReset signal can be accepted anytime MIPS R10000 Microprocessor User s Manual CPU Exceptions 339 NMI Exception Cause The NMI exception is caused by assertion of the SysNMI signal An NMI exception is not maskable In R4400 processor there is no way for software to differentiate between a Soft Reset exception and an NMI exception In the R10000 processor a bit labelled NMI has been added to the Status register to distinguish between these two exceptions Both Soft Reset and NMI exceptions set the SR bit and use the same exception vector During an NMI exception the NMI bit is set to 1 during a Soft Reset the NMI bit is set to 0 Processing When an NMI exception occurs the SR bit of the Status register is set distinguishing this exception from a Cold Reset exception An exception caused by an NMI is taken at the instruction boundary It does not abort any state machines preserving
397. the Way indicator bit and may be non zero Bit 1 of an Index CacheOp can still generate an Address Error exception if it is not set to zero For all remaining CacheOps the low order two bits of the instruction must be set to zero or else they will generate an Address Error exception A CacheOp is never checked for alignment Address Error exceptions only for privilege type Address Error exceptions Write Back Write back from the primary data cache goes to the secondary cache Write back from a secondary cache always goes to the System interface unit A secondary write back always writes the most recent data the primary data cache must be interrogated and any dirty inconsistent data written back to the secondary cache before the secondary block is written back to the system interface unit The address to be written is specified by the cache tag and not the translated PA MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 190 Invalidation CE Bit CH Bit Chapter 10 When a block is invalidated in the secondary cache all subset blocks in the primary cache are also invalidated The StateMod bits on invalidated block in the primary data cache are set to 001 Normal during any invalidation The R10000 processor does not support the CE bit The functionality of the CE bit has been replaced by the Index Load Data and Index Store Data instructions The CH bit is supported in the R10000 processor I
398. the need for the processor to flow control the external agent that is providing the external data responses Regardless of the cache bandwidth or internal resource availability the external agent may supply external data response data for all outstanding read and upgrade requests at the maximum System interface data rate The external agent may issue any number of external data responses for a particular request number before issuing a corresponding external completion response An external data response remains in the incoming buffer until a corresponding external completion response is received A former buffered external data response for a particular request number is over written by a subsequent external data response for the same request number An external ACK completion response frees buffered data to be forwarded to the caches and other internal resources while an external NACK or ERR completion response purges any corresponding buffered data For minimum latency the external agent should issue an external ACK completion response coincident with the first doubleword of an external data response External coherency requests that target blocks residing in the incoming buffer are stalled until the incoming buffer data is forwarded to the secondary cache and the instruction that caused the secondary miss is satisfied Each doubleword of the incoming buffer has an Uncorrectable Error flag When an external data response provides a doubleword
399. the state of the processor for diagnosis The Cause register remains unchanged and the system jumps to the NMI exception handler see Table 17 1 An NMI exception preserves the contents of all registers except for e ErrorEPC register which contains the PC e ERL bit of the Status register which is set to 1 e SR bit of the Status register which is set to 1 on Soft Reset or an NMI 0 for a Cold Reset e BEV bit of the Status register which is set to 1 e TS bit of the Status register which is set to 0 e PC is set to the reset vector OXFFFF FFFF BFCO 0000 clears any pending Cache Error exceptions Servicing The NMI can be used for purposes other than resetting the processor while preserving cache and memory contents For example the system might use an NMI to cause an immediate controlled shutdown when it detects an impending power failure It is not normally possible to continue program execution after returning from this exception since an NMI can occur during another error exception MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 340 Address Error Exception Cause Processing Servicing Version 2 0 of January 29 1997 Chapter 17 The Address Error exception occurs when an attempt is made to execute one of the following reference to an illegal address space reference the supervisor address space from User mode e reference the kernel address space from User or Supervisor mode
400. tions 87 External Responses External responses are supplied by an external agent or another processor in response to a processor request The following external responses are supported e block data response e double single partial word data response completion response External Requests External requests are generated by an external agent when it requires a resource within the processor The following external requests are supported e intervention shared request e intervention exclusive request e allocate request number request e invalidate request interrupt request External intervention and invalidate requests require some type of response by the processor Processor Responses Processor responses are supplied by the processor in response to an external request The following processor responses are supported e coherency state response e coherency data response Outstanding Requests and Request Numbers The processor allows requests and corresponding responses to be split transactions which enables additional processor and external requests to be issued while waiting for a prior response The System interface supports a request number field to link requests with their corresponding responses so responses can be returned out of order The processor allows a maximum of eight outstanding requests on the System interface through a 3 bit request number These outstanding requests may be composed of any mix of processor and e
401. tions are undefined In the R10000 processor an execution of a Coprocessor 2 instruction takes a Reserved Instruction exception when CUZ bit is set MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 358 Chapter 17 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual 18 Cache Test Mode The R10000 processor provides a cache test mode that may be used during manufacturing test and system debug to access the following internal RAM arrays e data cache data array e data cache tag array e instruction cache data array e instruction cache tag array e secondary cache way predication table MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997359 360 18 1 Interface Signals Chapter 18 Cache test mode is accessed by using a subset of the system interface signals By not requiring the use of any secondary cache interface signals the internal RAM arrays may be accessed for single chip LGA as well as R10000 secondary cache module configurations The following system interface signals are used during cache test mode e SysAD 57 0 e SysVal Any input signals not listed above are ignored by the processor when it is operating in cache test mode and any output signals not listed above are undefined during cache test mode 18 2 System Interface Clock Divisor Version 2 0 of January 29 1997 Cache test mode is supported for all system interface clock speeds Howeve
402. tiplexed address data bus a 3 bit state output bus a 5 bit response input bus At any time the System interface is either in master or slave state In master state the processor drives the bidirectional System interface signals and is permitted to issue processor requests to the external agent In slave state the processor tristates the bidirectional System interface signals and accepts external requests from the external agent 6 6 Connecting to an External Agent In a uni or multiprocessor system using dedicated external agents the System interface connects to a single external agent In a multiprocessor system using the cluster bus see below the system can connect up to four R10000 processors to an external agent This external agent is referred to as the cluster coordinator MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 82 6 7 Cluster Bus Master R10000 System Interface Processor Request Version 2 0 of January 29 1997 Chapter 6 Ina multiprocessor system using the cluster bus the cluster coordinator performs the cluster bus arbitration and data flow management The arbitration scheme assures that either one of the processors or the cluster coordinator is master at any given time while the remaining devices are slave A processor request issued by the master processor is observed as an external request by all slave R10000 processors as shown in Figure 6 1 S
403. tructions Notes on CACHE Instruction Operations ccccccccseesesssneseseececeseseeceneesesescsneseseecsseseseeeananes 188 Virtaal Address Ld ARE RUE ARR E ie e eee Ee RO A exit eie corde bue eel 188 Physical Addtess enum mede DES 188 CPOINot Us ble 2 ere et e ERR er See nisin sated ten 188 TLB Refill and TLB Invalid Exceptions on CacheOps sss 189 Hit Operation Accesses ise eene diese fete lien eee tfe to tete reete dio ote nd 189 Watch Exception aie osee deo defer atero eal qe ead ed os 189 Address Error Excepto Ainin a eies pipe eea nennen tentent 189 Write Back sais cenit a weenie ae laine Garey a bote e Gatien 189 Inva EL 6 Fi al 0 a eee ance ne Le n ee dee eem t 190 CE Biz eie ie e en hen t ERE 190 CH Bite ai ee t a a ee e Aena 190 Serial Operation of CACHE Instructions ssssssseeseeeeeneneneneneen 190 Instructions Not Supported teet e tide deerit i fe e i te tects 190 Op Field Encoding pee ed netta eeu fita i eei petis 191 Index Inwaliclate I 4 eee teteeeen eene forte teen etae eere ente erae Ra aaria de orae ues 192 Index WriteBack Invalidate D esee eese entente teen E EEE enne 192 Index WriteBack Invalidate S eese esent tne nennen enne 193 Index Load Tag D eene hr eee beetle te md nee Een redes 194 Index Load Tag D icon dete ende etes 194 Index oad Tag B titre rte tte ee etri eie e rond n ner 195 Index Store Lag
404. tual address are 105 the xkphys virtual address space is selected it is a set of eight kernel physical spaces Each kernel physical space contains either one or four 240_byte physical pages References to this space are not mapped the physical address selected is taken directly from bits 39 0 of the virtual address Bits 61 59 of the virtual address specify the cache algorithm described in Chapter 4 the section titled Cache Algorithms If the cache algorithm is either uncached or uncached accelerated values of 2 or 7 the space contains four physical pages access to addresses whose bits 56 40 are not equal to 0 cause an Address Error exception Address bits 58 57 carry the uncached attribute described in Chapter 6 the section titled Support for Uncached Attribute and are not checked for address errors If the cache algorithm is neither uncached nor uncached accelerated the space contains a single physical page as on the R4400 processor In this case access to addresses whose bits 58 40 are not equal to a zero cause an Address Error exception as shown in Figure 16 4 MIPS R10000 Microprocessor User s Manual Memory Management Ox Ox Ox OX Ox OX Ox OX Ox OX Ox OX Ox Ox OX OX Ox Ox Ox OX OX OX OX OX Ox Ox Ox Ox BEOOO1 BEO BEO BDF BCO BCO BCO BBF BAO BAO BAO BOF B 8 B 8 B 8 B7 BO BO BO AF A8 A8 A8 A7 AO A00 A00 e Hilo ojo jo ojo no o wle Alo OF CO ALO GS Ra OJ CO ALO
405. uired The processor supplies the secondary cache tag way on SCTWay Table 5 2 presents the secondary cache tag array index for each secondary cache size it shows each index is composed of a physical address loaded onto SC A B Addr concatenated with SCTWay and SCTagLSBAddr Table 5 2 Secondary Cache Tag Array Index SCSize Secondary Mode 2 Cache Size Bits 0 512 Kbyte SCTWay SC A B Addr 13 3 SCTagLSBAddr 1 Mbyte SCTWay SC A B Addr 14 3 SCTagLSBAddr 2 Mbyte SCTWay SC A B Addr 15 3 SCTagLSBAddr 4 Mbyte SCTWay SC A B Addr 16 3 SCTagLSBAddr 8 Mbyte SCTWay SC A B Addr 17 3 SCTagLSBAddr 16 Mbyte SCTWay SC A B Addr 18 3 SCTagLSBAddr For a system design that only supports a secondary cache block size of 32 words the secondary cache tag array need not use SCTagLSBAddr as an index bit Secondary Cache Tag Array Index OT Bl wl N e MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 64 Chapter 5 5 4 Secondary Cache Way Prediction Table The primary and secondary caches are two way set associative However the implementation of the secondary cache is different than the primary caches The primary caches read simultaneously from two separate tag arrays corresponding to each way in the cache and then select the data based on the result of two parallel tag compares The secondary cache does not use this implementation beca
406. ular First In First Out FIFO buffer A newly decoded load store instruction is written into the next available sequential empty entry up to four instructions may be written during each cycle The FIFO order maintains the program s original instruction sequence so that memory address dependencies may be easily computed Instructions remain in this queue until they have graduated they cannot be deleted immediately after being issued since the load store unit may not be able to complete the operation immediately The address queue contains more complex control logic than the other queues An issued instruction may fail to complete because of a memory dependency a cache miss or a resource conflict in these cases the queue must continue to reissue the instruction until it is completed The address queue has three issue ports e First it issues each instruction once to the address calculation unit This unit uses a 2 stage pipeline to compute the instruction s memory address and to translate it in the TLB Addresses are stored in the address stack and in the queue s dependency logic This port controls two dedicated read ports to the integer register file If the cache is available it is accessed at the same time as the TLB A tag check can be performed even if the data array is busy e Second the address queue can re issue accesses to the data cache The queue allocates usage of the four sections of the cache which consist of the t
407. ultiplier 3 stage Pipeline sss 18 Floating Point Divide and Square Root Units sss 18 Floating Point Adder 3 stage Pipeline sss 18 Integer ALUN 1 stage Pipeline 18 Integer ALU2 1 stage Pipeline sse 18 Address Calculation and Translation in the TLB ssssssssssss 19 Implications of R10000 Microarchitecture on Software ssssssssesseeeennn 20 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Table of Contents ix Superscalar Instruction ISSUE s e siseosas eoo taes eies e E ESEE EEE EA AE SiS 20 Speculative Execution sedet gite iste tete ote e EO IR EAE Ea AS TENERE 21 Side Effects of Speculative Execution sse 21 Nonblocking C aches aetate detiene tended dieci is 25 R10000 Specific CPU Instructions sess netten nennen nnne 26 PREP fo nane te RH UMEN PIRE IS HE RUE 26 BEES cnnetgumtote ee eR Rp NR a pe ves 27 end Gi RA 28 Performance ipea sgh DAR REA uu 28 User Instruction Latency and Repeat Rate sss 29 Other Performance TISSUES iiceoe ER E enne nennen 31 Cache Performance echt eoe tiende tad t Ede E E Poire 31 MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 x Table of Contents System Configurations UniprocessorSystemsussa cenae piene oe Gode dpud Multiprocessor Systems Multiprocessor Systems Using
408. undesired speculative loads portions of the cache referred by the address of the DMA read buffers could be flushed after every DMA transfer from the I O devices Make references to appropriate I O spaces uncached by changing the cache coherency attribute in the TLB Generally arbitrary accesses can be controlled by mapping selected addresses through the TLB However references to an unmapped cached xkphys region could have hazardous affects on I O A solution for this is given below First of all note that the xkphys region is hard wired into cached and uncached regions however the cache attributes for the kseg0 region are programmed through the Config register Therefore clear the KX bit to a zero and set to ones the SX and UX bits in the Status register This disables access to the xkphys region and restricts access to only the User and Supervisor portions of the 64 bit address space In general the system needs either a coherent or a noncoherent protocol but not both Therefore these cache attributes can be used by the external hardware to filter accesses to certain parts of the kseg0 region For instance the cache attributes for the kseg0 address space might be defined in the Config register to be cache coherent while the cache attributes in the TLB for the rest of virtual space are defined to be cached noncoherent or uncached The external hardware could b
409. unis 1 Vss Qus 15 Vss Qd 24 Vss Quan 27 Vss QURE 35 Vss assu 9 Vss Dua 13 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Packaging 229 Table 13 3 cont Signal Location Signal Location Signal Location MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 Chapter 13 230 A34 Al8N3SSV V9166S ONI S3I9010NHO3L SdIN O19 M31S108 NNISIV3H 8Md 31VGONOOOV OL NOILVOISIGON 380033 34908 JO SNOISNAWIG 1vnlov TIM SNOLLWOISIOSdS SYVMGYVH S00 Juv SAONVASIOL Q3l4I03dS 3SIMM3HIO SSFINN Z jlVld 8318108 SSIS H108 VSd NOLdVy S00 S3HONI NI 38V SNOISN3AIG SALON ddd Mt uer y g8Md Lu m l13M9OS V9166GS v9166S QVd 3OVJ83lNI ANISLVSH Se e rr ANISLVSH N0001 90 02v09 N d ONINdS NOISS3HdWOO 8 X QO w b YIWdS ao Z M3HSVM 1V1j ab X 9 SOHS Xv m Figure 13 2 599LGA Assembly Drawing MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 231 Packaging
410. unter is incremented by the number of instructions that were graduated on the previous cycle When an integer multiply or divide instruction graduates it is counted as two instructions Event 2 for Counter 0 Load Prefetch Sync CacheOp Issue Each of these instructions are counted as they are issued A load instruction is only counted once even though it may have been issued more than one time Event 2 for Counter 1 Load Prefetch Sync CacheOp Graduation Each of these instructions are counted as they are graduated Up to four loads can graduate in one cycle t This could be a result of DCache Tag being busy or four Instruction or Data cache misses already present and waiting to be processed in the Secondary Cache Transaction Processing SCTD logic MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 268 Chapter 14 Event 3 for Counter 0 Stores Including Store Conditional Issued The counter is incremented on the cycle after a store instruction is issued to the address calculation unit Note that a store can only be counted as having been issued once even though it may actually be issued more than once due to DCache Tag being busy or there already being four load store cache misses waiting in the SCTP logic Event 3 For Counter 1 Store Including Store Conditional Graduation Each graduating store including SC increments the counter At most
411. upt since they share IM 7 The Compare register can be programmed to generate an interrupt at a particular time and is continually compared to the Count register Whenever their values equal the interrupt bit IP 7 in the Cause register is set This interrupt bit is reset whenever the Compare register is written 31 0 32 bit Counter incremented every processor cycle Compare 11 32 bit Compare Value Version 2 0 of January 29 1997 32 bit Equal to Comparator J Set IP7 in Cause Register Figure 14 9 Count and Compare Registers MIPS R10000 Microprocessor User s Manual Coprocessor 0 245 14 9 EntryHi Register 10 The EntryHi register holds the high order bits of a TLB entry for TLB read and write operations The EntryHi register is accessed by the TLB Probe TLB Write Random TLB Write Indexed and TLB Read Indexed instructions Figure 14 10 shows the format of this register and Table 14 8 describes the register s fields EntryHi Register 63 62 61 44 43 13 12 8 7 0 R FL VPN2 0 ASD 2 18 31 5 8 Figure 14 10 EntryHi Register Table 14 8 EntryHi Register Fields Field Description Virtual page number divided by two maps to two pages upper bits of the virtual address Address space ID field An 8 bit field that lets multiple processes ASID share the TLB each process has a distinct mapping of otherwise identical virtual page numbers Region 00 user 01 supervisor 11 k
412. use it would either require too many pins to read in two full copies of the data and tags or add latency to externally multiplex two banks of memory Instead a way prediction table is used to determine which way to read from first The way prediction table is internal to the processor and has 8K one bit entries each entry corresponding to a pair of secondary cache blocks The bit entry indicates which way of the addressed set has been most recently used MRU When the secondary cache is accessed this prediction bit is used as an address bit thus the two ways in the secondary cache are shared in the same SSRAM bank The secondary cache way prediction table is indexed with a subset of 11 to 13 bits of the physical address based on both the secondary cache block size and the secondary cache size as shown in Table 5 3 0 indicates a zero bit concatenated to the address to pad the index out to a full 13 bits Table 5 3 Secondary Cache Way Prediction Table Index SCSize Secondary Cache SCBI1kSize Secondary Cache Secondary Cache Mode Bits Size Mode Bit Block Size Way Prediction Table Index 16 word 0 PA 17 6 1 32 word 0 Il 0 Il PA 17 7 1 0 16 word PA 18 6 1 32 word 0 Il PA 18 7 0 16 word PA 18 6 2to5 2M to 16 Mbyte 1 32 word PA 19 7 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Secondary Cache Interface 65 Three states are possible in the way prediction table e t
413. use register is set signifying whether the instruction as indicated by the EPC register and BD bit in the Cause register caused the exception by an instruction reference load operation or store operation The EPC register contains the address of the instruction that caused the exception unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set Servicing The physical address at which the fault occurred can be computed from information available in the CPO registers e If the IBE code in the Cause register is set indicating an instruction fetch reference the instruction that caused the exception is located at the virtual address contained in the EPC register or 4 the contents of the EPC register if the BD bit of the Cause register is set e If the DBE code is set indicating a load or store reference the instruction that caused the exception is located at the virtual address contained in the EPC register or 4 the contents of the EPC register if the BD bit of the Cause register is set The virtual address of the load and store reference can then be obtained by interpreting the instruction The physical address can be obtained by using the TLBP instruction and reading the EntryLo registers to compute the physical page number The process executing at the time of this exception is handed a UNIX SIGBUS bus error signal
414. using its operand registers and bypasslogic They appropriate a second cycle later for storing their result When an instruction is issued up to two operands are read from dedicated read ports in the floating point register file After the operation has been completed the result can be written back into the register file using a dedicated write port For the add and multiply units this write occurs four cycles after its operands were read Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Floating Point Unit 303 15 2 Floating Point Unit Control The control of floating point execution is shared by the following units The floating point queue determines operand dependencies and dynamically issues instructions to the execution units It also controls the destination registers and register bypass e The execution units control the arithmetic operations and generate status e The graduate unit saves the status until the instructions graduate and then it updates the Floating Point Status register 15 3 Floating Point General Registers FGRs The Floating Point Unit is the hardware implementation of Coprocessor 1 in the MIPS IV Instruction Set Architecture The MIPS IV ISA defines 32 logical floating point general registers FGRs as shown in Figure 15 2 Each FGR is 64 bits wide and can hold either 32 bit single precision or 64 bit double precision values The hardware actually contains 64 physical 64 bit re
415. ut receivers Voltage reference bypass VrefByp This pin must be tied to Vss preferably or VrefSys through at least a 100 ohm resistor Vss Vss Vss for the core circuits and output drivers Vcc PLL analog vee Vcc for the PLL analog circuits Vss PLL analog MEI Vss for the PLL analog circuits Vcc PLL digital NOEP Vcc for the PLL digital circuits Vss PLL digital vere Vss for the PLL digital circuits Input DC voltages are OK DCOk The external agent asserts these two signals when Vcc Input VccQ SC Sys Vref SC Sys Vcc Pa Pd and SysClk are stable Errata VrefByp description changed in Table 3 1 Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual Interface Signal Descriptions 39 3 2 Secondary Cache Interface Signals Errata Table 3 2 description of SCBAddr 18 0 is revised Table 3 2 presents the R10000 processor secondary cache interface signals Table 3 2 Secondary Cache Interface Signals Signal Name SSRAM Clock Signals SCCIK 5 0 Secondary cache clock Output SCCIK 5 0 Duplicated complementary secondary cache clock outputs i SSRAM Address Signals Secondary cache address bus SCAAddr 18 0 E SCBAddr 18 0 SCBAddr is complementary SCAAddr 19 bit bus which specifies the set Output address of the secondary cache data and tag SSRAM that is to be accessed Secondary cache tag LSB address SCTagLSBAddr Signal that specifies the least significant bit of the address f
416. ve or invalidate to the secondary cache block containing the linked address 3 TheSC stores a new value into the memory word unless the new value has been modified If the word has not been modified the store succeeds and a 1 is stored in the destination register Otherwise the Store Conditional fails memory is not modified and a 0 is loaded into the destination register Since the instruction format has only a single field to select a data register rt this destination register is the same as the register which was stored Load Linked and Store Conditional instructions LL LLD SC and SCD do not implicitly perform SYNC operations in the R10000 processor MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 28 SYNC 1 9 Performance Version 2 0 of January 29 1997 Chapter 1 The SYNC instruction is implemented in a lightweight manner after decoding a SYNC instruction the processor continues to fetch and decode further instructions It is allowed to issue load and store instructions speculatively and out of order following a SYNC The R10000 processor only allows a SYNC instruction to graduate when the following conditions are met e all previous instructions have been successfully completed e the uncached buffer does not contain any uncached stores e the address cycle of a processor double single partial word write request resulting from an uncached store was not issued to the System interface i
417. w in Figure 14 33 Doubleword 31 0 TagLo o 32 Doubleword 63 32 TagHi 32 Figure 14 33 TagHi Lo Register Fields in Secondary Cache When CacheOp is Index Load Store Data MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 284 Chapter 14 14 24 ErrorEPC Register 30 Version 2 0 of January 29 1997 The ErrorEPC register is similar to the EPC register except that ErrorEPC is used on ECC and parity error exceptions It is also used to store the program counter PC on Reset Soft Reset and nonmaskable interrupt NMI exceptions The read write ErrorEPC register contains the virtual address at which instruction processing can resume after servicing an error Figure 14 34 shows the format of the ErrorEPC register ErrorEPC Register 63 0 ErrorEPC 64 Figure 14 34 ErrorEPC Register Format MIPS R10000 Microprocessor User s Manual Coprocessor 0 285 14 25 CPO Instructions Hazards Branch on Coprocessor 0 Table 14 25 lists the CPO instructions defined for the R10000 processor Since they are implementation dependent they are included here and not in the MIPS ISA manual Table 14 25 CPO Instructions OpCode Description CACHE Cache Operation DMFCO Doubleword Move From CPO DMTCO Doubleword Move To CPO ERET Exception Return MFCO Move from CPO MTCO Move to CPO TLBP Probe TLB for Matching Entry TLBR Read Indexed TLB Entry TLBWI Write Indexed
418. which limits the amount of memory each program may access The scheme prevents programs from interfering with the memory used by other programs or the operating system Translated virtual addresses retrieve data in blocks which are called pages In the R10000 processor the size of each page may be selected from a range that runs from 4 Kbytes to 16 Mbytes inclusive in powers of 4 that is 4 Kbytes 16 Kbytes 64 Kbytes etc The virtual address bits which select a page and thus are translated are called the page address The lower bits which select a byte within the selected page are called the offset and are not translated The number of offset bits varies from 12 to 24 bits depending on the page size Virtual Page Size Encodings Version 2 0 of January 29 1997 Page size is defined in each TLB entry s PageMask field This field is loaded or read using the PageMask register as described in Chapter 14 PageMask Register 5 Each entry translates a pair of physical pages The low bit of the virtual address page is not compared because it is used to select between these two physical pages MIPS R10000 Microprocessor User s Manual Memory Management Using the TLB Cache Algorithm Field Format of a TLB Entry Translations are maintained by the operating system using page tables in memory A subset of these translations are loaded into a hardware buffer called the translation lookaside buffer or TLB The contents of th
419. xternal requests An individual processor as opposed to the System interface above supports a maximum of four outstanding processor requests at any given time MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 88 Chapter 6 Request and Response Relationship The relationship between processor and external requests and their acceptable responses is presented in Table 6 1 The data in this table is given with respect to a single processor in either a uni or multiprocessor system independent of cluster non cluster configuration Table 6 1 Request and Response Relationship Request Acceptable Response Sequences External NACK or ERR completion response 0 or more external block data responses followed by a final external block data response with a coincidental or subsequent external ACK NACK or ERR completion response Processor block read request External NACK or ERR completion response 0 or more external double single partial word data responses followed by a final external double single partial word data response with a coincidental or subsequent external ACK NACK or ERR completion response Processor double single partial word write request Processor double single partial word read request None External ACK NACK or ERR completion response 0 or more external block data responses followed by a final external block data response with a coincidental or subsequent exte
420. xternal responses are supplied by an external agent in response to a processor request e External requests are generated by an external agent when it requires a resource within the processor e Processor responses are supplied by the processor in response to an external request 6 2 System Interface Frequencies The System interface operates at SysClk frequency supplied by the external agent The internal processor clock PClk is derived from this same SysClk The SysClkDiv mode bits select a PClk to SysClk divisor of 1 1 5 2 2 5 3 3 5 or 4 using the formula described in Chapter 7 the section titled System Interface Clock and Internal Processor Clock Domains 6 3 Register to Register Operation The System interface is designed to operate in the following register to register fashion with the external agent e all System interface outputs are sourced directly from registers clocked on the rising edge of SysClk all System interface inputs directly feed registers that are clocked on the rising edge of SysCIk This allows the System interface to run at the highest possible clock frequency Version 2 0 of January 29 1997 MIPS R10000 Microprocessor User s Manual System Interface Operations 81 6 4 System Interface Signals The R10000 System interface is composed of 6 5 Master and Slave States 3 arbitration signals 2 flow control input signals a bidirectional 12 bit command bus a bidirectional 64 bit mul
421. y 29 1997 274 Chapter 14 14 22 CacheErr Register 27 The CacheErr register is a 32 bit read only register that handles ECC errors in the secondary cache or system interface and parity errors in the primary caches R10000 processor correction policy is as follows e Parity errors cannot be corrected e Single bit ECC errors can be corrected by hardware without taking a Cache Error exception e Double bit ECC errors can be detected but not corrected by hardware e All uncorrectable errors take Cache Error exceptions unless the DE bit of the Status register is set e As in the R4400 cache errors are imprecise The CacheErr register provides cache index and status bits which indicate the source and nature of the error it is loaded when a Cache Error exception is taken CacheErr Register Format for Primary Instruction Cache Errors Figure 14 24 shows the format of the CacheErr register when a primary instruction cache error occurs 31 30 29 28 27 26252423 22 21 14 13 65 0 00 EW O D TA TS 0 Pldx 0 2 1 1 2 2 2 8 8 6 Figure 14 24 CacheErr Register Format for Primary Instruction Cache Errors EW set when CacheErr register is already holding the values of a previous error D data array error way1 way0 TA tag address array error way1 way0 TS tag state array error way1 ll way0 Pldx primary cache virtual block index VA 13 6 Errata 0 Reserved Must be written as zeroes and returns zeroes when r
422. y condition is handled in hardware When the hardware detects the Virtual Coherency exception it invalidates the lines in all other segments of the primary cache that could cause aliasing This takes six cycles more than that needed to refill the primary cache line the refill would have occurred even if there was no Virtual Coherency exception detected In the R4400 processor a Virtual Coherency exception occurs when a primary cache miss hits in the secondary cache but VA 14 12 are not the same as the PIdx field of the secondary cache tag and the cache algorithm specifies that the page is cached When such a situation is detected in the R10000 processor the primary cache lines at the old virtual index are invalidated and the Pldx field of the secondary cache is written with the new virtual index MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 346 Chapter 17 Bus Error Exception Cause A Bus Error exception occurs when a processor block read upgrade or double single partial word read request receives an external ERR completion response or a processor double single partial word read request receives an external ACK completion response where the associated external double single partial word data response contains an uncorrectable error This exception is not maskable Processing The common interrupt vector is used for a Bus Error exception The IBE or DBE code in the ExcCode field of the Ca
423. y used The way prediction table can cover up to a 2 Mbyte secondary cache when the secondary cache block size is 32 words If the secondary cache exceeds this size the accuracy of the way prediction table diminishes slightly However the extremely large performance gain made by making the secondary cache larger far outstrips any performance loss in the way prediction table MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 66 Chapter 5 5 5 Secondary Cache Tag SCTag 25 4 Physical Tag Version 2 0 of January 29 1997 The secondary cache tag transferred on the SCTag 25 0 bus is divided into three fields as shown in Figure 5 2 below 25 43210 Physical Tag Pldx State 22 2 2 Figure 5 2 Secondary Cache Tag Fields The minimum secondary cache size is 512 Kbytes 256 Kbytes per way so a minimum of 18 bits are required to index a data byte within a selected way Since the processor supports 40 physical bits a maximum of 22 bits are required for the physical tag 40 physical address bits 18 minimum required 22 Consequently the processor supplies the 22 physical address bits PA 39 18 on SCTag 25 4 for the physical tag When the secondary cache is larger than the minimum size the secondary cache tag array must still maintain the full physical tag supplied by the processor even though some bits are redundant MIPS R10000 Microprocessor User s Manual Seconda
424. ycles 100 external special requests 99 processor read requests 96 processor upgrade requests 97 SysCmd 5 MIPS R10000 Microprocessor User s Manual 1 15 data cycles 99 SysCmd 5 bit 90 SysCmdq 7 5 external requests 98 processor requests 96 SysCmdPar signal 41 181 SysCorErr signal 42 168 177 179 182 SysCyc signal 42 154 SysGblPerf signal 28 42 56 148 SysGnt signal 41 108 109 110 112 114 116 118 120 122 124 127 132 133 134 135 136 139 148 160 162 163 336 337 361 362 SysNMI signal 42 106 339 SysRdRdy signal 41 109 113 115 121 125 and flow control 93 SysRel signal 41 108 110 112 114 116 118 120 22 124 127 132 133 134 135 136 139 148 SysReq signal 41 108 109 112 114 116 118 120 122 124 139 148 162 SysReset signal 42 160 162 163 204 216 336 337 338 361 362 SysResp bus 42 95 105 184 SysResp 4 0 external completion response 130 SysResp 4 2 driving completion indication 105 SysRespPar signal 42 184 SysRespVal signal 42 130 160 162 163 184 SysState bus 42 95 104 170 184 SysState 0 processor coherency data response 146 SysState 2 0 encoding 104 SysStatePar signal 42 184 SysStateVal signal 42 104 System Call exception 349 system configuration multiprocessor 35 uniprocessor 34 System interface 10 79 arbitration in a cluster bus system 82 111 in a uniprocessor system 110 protocol 108 Version 2 0 of
425. ypically such I O addresses are mapped to an uncached region and uncached reads and writes are made as double single partial word reads and writes non block reads and writes in R10000 Uncached reads and writes are guaranteed to be non speculative However if R10000 has a garbage value in a register a speculative block read request to an unpredictable physical address can occur if it speculatively fetches data due to a Load or Jump Register instruction specifying this register Therefore speculative block accesses to load sensitive I O areas can present an unwanted side effect MIPS R10000 Microprocessor User s Manual Version 2 0 of January 29 1997 22 Chapter 1 Unexpected Write Back Due to Speculative Store Instruction When a Store instruction is speculated and the target address of the speculative Store instruction is missing in the cache the cache line is refilled and the state is marked to be Dirty However the refilled data may not be actually changed in the cache if this store instruction is later aborted This could present a side effect in cases such as the one described below e The processor is storing data sequentially to memory area A using a code loop that includes Store and Cond branch instructions ADMA write operation is performed to memory area B e DMA area B is contiguous to the sequential storage area A e The DMA operation is noncoherent e The
426. ys usable when in Kernel mode regardless of the setting of the CUg bit 1 gt usable 0 unusable RP FR In the R4400 processor this bit enables reduced power operation by reducing the internal clock frequency In the R10000 processor this bit should be set to zero Enables additional floating point registers 0 16 registers 1 32 registers RE Reverse Endian bit valid in User mode DS IM Diagnostic Status field see Figure 14 12 Interrupt Mask controls the enabling of each of the external internal and software interrupts An interrupt is taken if interrupts are enabled and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register 0 disabled 1 enabled KX Enables 64 bit addressing in Kernel mode The extended addressing TLB refill exception is used for TLB misses on kernel addresses 0 32 bit 1 64 bit SX UX Version 2 0 of January 29 1997 Enables 64 bit addressing and operations in Supervisor mode The extended addressing TLB refill exception is used for TLB misses on supervisor addresses 0 32 bit 1 gt 64 bit Enables 64 bit addressing and operations in User mode The extended addressing TLB refill exception is used for TLB misses on user addresses 0 32 bit 1 64 bit MIPS R10000 Microprocessor User s Manual Coprocessor 0 249 Table 14 10
427. ystem Flow Control Signals Bidirectional SysRdRdy System read ready The external agent asserts this signal to indicate that it can accept processor read and upgrade requests Input SysWrRdy System write ready The external agent asserts this signal to indicate that it can accept processor write and eliminate requests System Address Data Bus Signals Input SysCmd 11 0 System command A 12 bit bus for transferring commands between processor and the external agent Bidirectional SysCmdPar SysAD 63 0 MIPS R10000 Microprocessor User s Manual System command bus parity Odd parity for the system command bus System address data bus A 64 bit bus for transferring addresses and data between R10000 and the external agent Version 2 0 of January 29 1997 Bidirectional Bidirectional 42 Signal Name Table 3 3 cont System Interface Signals Description Chapter 3 Type System State Bus Signals SysADChk 7 0 SysVal SysState 2 0 System address data check bus An 8 bit ECC bus for the system address data bus System valid The master of the System interface asserts this signal when it is driving valid information on the system command and system address data buses System state bus A 3 bit bus used for issuing processor coherency state responses and also additional status indications Bidirectional Bidirectional Output SysStatePa

Download Pdf Manuals

image

Related Search

Related Contents

MANUEL D`INSTALLATION SET/OSK2  Frymaster 8196339 Fryer User Manual  XL-xD Satelliten Aktivanlage Bedienungsanleitung / Users manual  Emerson T205B Instruction Manual  Brodit Passive Holder,Tilt Swivel  Télécharger - Culture Mobile  Documento PDF - AMS Tesi di Laurea  WLg-ABOARD /N[P] - ACKSYS Communications & Systems  Herunterladen der Bedienungsanleitung  Operating the OSD  

Copyright © All rights reserved.
Failed to retrieve file