Home

cache

1. 28 26 Opcode 31 29 0 1 2 3 5 6 7 0 SPECIAL REGIMM J JAL BEQ BNE BLEZ BGTZ 1 ADDI ADDIU SLTI SLTIU ANDI ORI XORI LUI 2 COPO COP1 COP2 BEQL BNEL BLEZL BGTZL 3 DADDI DADDIU LDL LDR 4 LB LH LWL LW LBU LHU LWR LWU 5 SB SH SWL SW SDL SDR SWR CACHES 6 LL LWC1 LWC2 x LLD LDC1 LDC2 LD 7 SC SWC1 SWC2 ji SCD SDC1 SDC2 SD 2 0 SPECIAL function 5 3 1 2 3 4 5 6 7 0 SRL SRA SLLV S SRLV SRAV 1 SYSCALL BREAK SYNC 2 MFLO MTLO DSLLV i DSRLV DSRAV 3 DIV DIVU DMULT DMULTU DDIV DDIVU 4 SUB SUBU AND OR XOR NOR 5 SLT SLTU DADD DADDU DSUB DSUBU 6 TGE TLT TLTU TEQ TNE is 7 DSLL DSRL DSRA DSLL32 DSRL32 DSRA32 REGIMM rt 20 19 1816 1 2 3 4 5 6 7 0 BLTZ BGEZ BLTZL BGEZL i i i 1 TGEI TGEIU TLTI TLTIU TEQI i TNEI 2 BLTZAL BGEZAL BLTZALL BGEZALL E i i i 23 24 COPz rs 25 24 0 1 2 3 4 5 6 7 0 MF DMF CF Y MT DMT CT Y 1 BC Y Y Y a6 Y Y Y 2 co 3 18 16 COPz rt 20 19 0 1 2 3 4 5 6 7 0 BCF BCT BCFL BCTL Y Y Y Y 1 Y Y Y Y Y y Y Y 2 Y Y Y Y Y Y Y Y 3 Y Y K Y Y y Y Y 2 0 CPO Function 5 3 0 1 2 3 4 5 6 7 0 o TLBR TLBWI TLBWR 1 Pra 2 3 ERET 4 WAIT 5 6 o 7 o Key to Table Operation codes marked with an asterisk cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture g O
2. CPU Instruction Set Details BEQ Branch On Equal Appendix A BEQ 31 26 25 21 20 16 15 0 BEQ rs rt offset 000100 6 5 5 16 Format BEQ rs rt offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs and the contents of general register rt are compared If the two registers are equal then the program branches to the target address with a delay of one instruction target lt offset s offset 0 condition GPR rs GPR rt if condition then PC amp PC target endif Operation T T 1 Exceptions None CPU Instruction Set Details Appendix A B EQL Branch On Equal Likely B EQL 31 26 25 21 20 16 15 0 BEQL rs rt offset 010100 6 5 5 16 Format BEQL rs rt offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs and the contents of general register rt are compared If the two registers are equal the target address is branched to with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay
3. Serial Value Mode Setting Serial Value Mode Setting Bit Bit 0 Reserved must be zero 9 10 Non block Write Selects the manner in which non block writes are handled bit 10 is most significant 1 4 XmitDatPat System interface data rate for 0 R4x00 compatible block writes only bit 4 most significant 0 DDDD 1 Reserved 1 2 Pipelined Writes 2 3 Write re issue 3 11 TmrintEn Disables the timer interrupt on Int 5 4 0 Enabled Timer Interrupt 5 1 Disabled Timer Interrupt 6 12 Reserved must be zero 7 13 14 Drv_Out Output driver slew rate control Bit 14 is most significant Affects only out puts that are not clocks 8 DxxxDxxxDxxxDxxx 10 100 strength fastest 9 15 Reserved 11 83 strength 5 7 SysCkRatio PClock to SClock divisor fre 00 67 strength quency relationship between SClock RClock and TClock and PClock bit 7 most significant o Divide by 2 Ol 50 strength slowest 1 Divide by 3 15 Tclock O 2 Divide by 4 0 Enabled 1 Disabled 3 Divide by 5 16 Tclock 1L 4 Divide by 6 0 Enabled 1 Disabled 5 Divide by 7 17 Rclock 0 6 Divide by 8 0 Enabled 1 Disabled 7 Reserved 18 Rclock 1 8 EndBIt Specifies byte ordering 0 Enabled 1 Disabled 0 Little endian 19 255 Reserved must be zero ordering 1 Big endian ordering Table 9 2 Boot Mode Settings Clock Interface Chapter 10 dt Integrated Device Te
4. A 106 CPU Instruction Set Details Appendix A SD L Store Doubleword Left S D L 31 26 25 21 20 16 15 0 SDL base rt offset 101100 6 5 5 16 Format SDL rt offset base Description This instruction can be used with the SDR instruction to store the contents of a register into eight consecutive bytes of memory when the bytes cross a doubleword boundary SDL stores the left portion of the register into the appropriate part of the high order doubleword of memory SDR stores the right portion of the register into the appropriate part of the low order doubleword The SDL instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which may specify an arbitrary byte It alters only the word in memory which contains that byte From one to four bytes will be stored depending on the starting byte specified Conceptually it starts at the most significant byte of the register and copies it to the specified byte in memory then it copies bytes from register to memory until it reaches the low order byte of the word in memory No address exceptions due to alignment are possible address 8 8 9 10 11 12 13 14 15 addressO 0 1 2 3 4 51 6 address 8 8 9 address0 O B memory big endian register before a B C D E F G H 24 SDL
5. 31 26 25 2120 65 0 SPECIAL rs 0 MTLO 000000 000000000000000 010011 6 5 15 6 Format MTLO rs Description The contents of general register rs are loaded into special register LO If a MTLO operation is executed following a MULT MULTU DIV or DIVU instruction but before any MFLO MFHI MTLO or MTHI instructions the contents of special register HI are undefined Operation T 2 LO lt undefined T 1 LO lt undefined jE LO GPR rs Exceptions None CPU Instruction Set Details Appendix A MULT Multiply MULT 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 MULT 000000 000000 0000 011000 6 5 5 10 6 Format MULT rs rt Description The contents of general registers rs and rt are multiplied treating both operands as 32 bit 2 s complement values No integer overflow exception occurs under any circumstances The operands must be valid 32 bit sign extended values When the operation completes the low order word of the double result is loaded into special register LO and the high order word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of these instructions are undefined Correct operation requires separating reads of HI or LO from writes by a minimum of two other instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt unde
6. LDR Register A B C D E F G H Memory J K L M N O P BigEndianCPU 0 BigEndianCPU 1 vAddr3 o destination type offset destination type offset LEM BEM LEM BEM 0 I J KLMNOP 7 0 0 JABCDEFGI 0 7 0 1 Al JKLMNO 6 1 O ABCDEFI J 1 6 0 2 ABI JKLMN 5 2 0 ABCDEI J K 2 5 0 3 ABCIJKLM 4 3 0 ABCDIJKL 3 4 0 4 ABCDI JKL 3 4 0 ABCIJKLM 4 3 Q 5 ABCDEI J K 2 5 0 ABIJKLMN 5 2 0 6 ABCDEFI J 1 6 0 JAI JKLMNOJ 6 1 0 7 ABCDEFGI 0 7 Oj JKLMNOP 7 o 0 LEMLittle endian memory BigEndianMem 0 BEMBigEndianMem 1 TypeAccessType see Table 2 1 on page 3 sent to memory OffsetpAddrg 9 sent to memory Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details LH Load Halfword Appendix A LH 31 26 25 21 20 16 15 0 LH base rt offset 100001 6 5 5 16 Format LH rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the halfword at the memory location specified by the effective address are sign extended and loaded into general register rt If the least significant bit of the effective address is non zero an address error exception occurs Operation T vAddr lt offset 8 offset s 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrpgize
7. FPU Instruction Set Details Appendix B FPU Instruction Opcode Bit Encoding Opcode 28 26 31 29 0 1 2 3 4 5 6 7 0 1 2 COP1 3 4 5 6 LWC1 LDC1 7 SWC1 SDC1 sub 23 21 25 24 0 1 2 3 4 5 6 7 0 MF DMFn CF y MT DMT CT y 1 BC Y Y Y Y 7 Y 2 S D 5 5 W Ly 5 5 3 5 5 5 5 5 5 5 18 16 br 20 19 9 1 2 3 4 5 6 7 0 BCF BCT BCFL BCTL y Y Y Y Y Y Y Y Y Y Y 2 y Y Y y Y Y 3 y Y Y Y Y Y Y Y 2 0 function 5 3 0 1 2 3 4 5 6 7 0 ADD SUB MUL DIV SQRT ABS MOV NEG 1 ROUND LN TRUNC LN CEIL L FLOOR LN ROUND W TRUNC W CEIL W FLOOR W 2 3 4 CVT S CVT D 5 CVT W CVT Ln 5 5 5 5 5 5 5 6 C F C UN C EQ C UEQ C OLT C ULT C OLE C ULE 7 C SF C NGLE C SEQ C NGL C LT C NGE C LE C NGT Key to Table y Operation codes marked with a gamma cause a reserved instruction exception They are reserved for future versions of the architecture 5 Operation codes marked with a delta cause unimplemented operation exceptions in all current implementations and are reserved for future versions of the architecture n Valid when 64 bit operand opcodes are enabled Figure B 3 Bit Encoding for FPU Instructions Cache Operations Timing Appendix C Y B Integrated Device Technology Inc Introduction This appendix lists cycle operation counts and caveats for R4600 R4700 cache operations timing
8. one cycle lt o 1l 1R_ Instruction cache access 2R Instruction decode 2l Instruction virtual to physical address translation in ITLB 1A 2A Integer add logical shift Data cache access and load align 1A Data virtual address calculation Data virtual to physical address translation in DTLB 2A Store align Virtual to physical address translation in JTLB 1A Branch decision Register file read 2W Register file write Bypass calculation Figure 3 1 Instruction Pipeline Stages The CPU Pipeline Chapter 3 CPU Pipeline Stages This section describes each of the phases of the five pipeline stages Bra stage has 2 phases lI Instruction Fetch Phase one 21 Instruction Fetch Phase two 1R Register Fetch Phase one 2R Register Fetch Phase two 1A Execution Phase one 2A Execution Phase two 1D Data Fetch Phase one 2D Data Fetch Phase two 1W Write Back Phase one 2W Write Back Phase two 1I Instruction Fetch Phase one During the lI phase the instruction address translation begins in the ITLB 2I Instruction Fetch Phase two During the 2I phase the instruction cache fetch begins and the instruction address translation in the ITLB continues 1R Register Fetch Phase one During the 1R phase the following occurs e The instruction cache fetch finishes e The instruction cache tag is checked against the page frame number obtained from the ITLB 2R Regist
9. Appendix A LUI Load Upper Immediate LUI 31 26 25 21 20 16 15 0 LUI 0 rt immediate 001111 00000 6 5 5 16 Format LUI rt immediate Description The 16 bit immediate is shifted left 16 bits and concatenated to 16 bits of zeros The result is placed into general register rt The loaded word is sign extended Operation T GPRf rt immediate 5 immediate 016 Exceptions None CPU Instruction Set Details Appendix A LW Load Word LW 31 26 25 21 20 16 15 0 LW base rt offset 100011 6 5 5 16 Format LW rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the word at the memory location specified by the effective address are loaded into general register rt The loaded word is sign extended If either of the two least significant bits of the effective address is non zero an address error exception occurs Operation T vAddr lt offset s offset s 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrpgize 1 3 Addr o xor ReverseEndian 02 mem lt LoadMemory uncached WORD pAddr vAddr DATA byte vAddr 9 xor BigEndianCPU 0 GPR rt mems1 8 byte gt MEM3 1 8 byte 8 byte Exceptions TLB refill exception TLB invalid exception Bus error exception Address
10. G Detect Cache Miss 2 Get entire cache line into cache 3 Continue pipeline NOP Inserted NOP instructions Figure 3 8 Instruction cache miss Instruction cache misses are detected in R as shown in Figure 3 8 and the pipeline slips in its A stage There can never be a writeback required for an instruction cache miss since dirty data can never exist in the I cache Writes are not allowed to the I cache Note that early restart is not employed for instruction cache misses the requested cache line will be loaded into the cache in its entirety and after that the pipeline will restart R4600 R4700 Write Buffer The R4600 R4700 contains a write buffer to improve the performance of writes to the external memory Writes to external memory whether cache miss writebacks or stores to uncached or write through addresses use this on chip write buffer The write buffer holds up to four 64 bit address and data pairs For a cache miss write back the entire buffer is used for the write back data and allows the processor to proceed in parallel with the memory update For uncached and write through stores the write buffer uncouples the CPU from the write to memory allowing increased performance over the R4000 family of processors If the write buffer is full additional stores will stall until there is room for them in the write buffer Memory Management Chapter 4 Integrated Device Tech
11. Staging Register Figure 10 7 Gate Array System Without Phase Lock Using the R4600 R4700 Processor In a system without phase lock the transmission time for a signal from the processor to an external agent composed of gate arrays can be calculated from the following equation Transmission Time 75 percent of TClock period tpo for R4600 R4700 Min External Clock Buffer Delay External Sample Register Setup Time Max Clock Jitter for R4600 R4700 Internal Clocks Max Clock Jitter for RClock 10 7 Clock Interface Chapter 10 The transmission time for a signal from an external agent composed of gate arrays to the processor in a system without phase lock can be calculated from the following equation Transmission Time TClock period tps for R4600 R4700 Max External Clock Buffer Delay Max External Output Register Clock to Q Delay Max Clock Jitter for TClock Max Clock Jitter for R4600 R4700 Internal Clocks Connecting to a CMOS Logic System The processor uses matched delay clock buffers to generate aligned clocks to external CMOS logic A matched delay clock buffer is inserted in the SyncOut Synclin alignment path of the processor skewing SyncOut MasterOut RClock and TClock to lead MasterClock by the buffer delay amount while leaving PClock aligned with MasterClock The remaining matched delay clock buffers are available to generate a
12. 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd TRUNC L 010001 00000 001001 6 5 5 5 5 6 Format TRUNC L fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round toward zero 1 This instruction is valid only for conversion from single or double precision floating point formats When the source operand is an Infinity NaN or the correctly rounded integer result is outside of 263 to 263 1 the Invalid operation exception is raised If the Invalid operation is not enabled then no exception is taken and 283 1 is returned Operation T StoreFPR fd L ConvertFmt ValueFPR fs fmt fmt L Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B TRUNC W fmt eano Point TRUNC W fmt Truncate to Single Fixed Point Format 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd TRUNC W 010001 00000 001101 5 5 5 5 6 Format TRUNC W fmt fd fs Description The
13. Figure 11 10 Counter Using LL and SC 11 12 System Interface Chapter 12 dt Integrated Device Technology Inc Introduction The System interface allows the processor to access external resources needed to satisfy cache misses and uncached operations while permit ting an external agent access to some of the processor internal resources This chapter describes the system interface from the point of view of both the processor and the external agent Terminology The following terms are used in this chapter An external agent is any logic device connected to the processor over the system interface that allows the processor to issue requests A system event is an event that occurs within the processor and requires access to external system resources Sequence refers to the precise series of requests that a processor gener ates to service a system event Protocol refers to the cycle by cycle signal transitions that occur on the system interface pins to assert a processor or external request Syntax refers to the precise definition of bit patterns on encoded buses such as the command bus System Interface Description The R4600 R4700 processor supports a 64 bit address data interface that can construct a simple uniprocessor with main memory The System interface consists of e 64 bit address and data bus SysAD e 8 bit SysAD check bus SysADC even parity only e
14. Release Latency Release latency is generally defined as the number of cycles the processor can wait to release the system interface to slave state for an external request When no processor requests are in progress internal activity can cause the processor to wait some number of cycles before releasing the system interface Release latency is therefore more specifi cally defined as the number of cycles that occur between the assertion of ExtRqst and the assertion of Release There are three categories of release latency e Category 1 when the external request signal is asserted two cycles before the last cycle of a processor request e Category 2 when the external request signal is not asserted during a processor request or is asserted during the last cycle of a processor request e Category 3 when the processor makes an uncompelled change to slave state Table 12 5 summarizes the minimum and maximum release latencies for requests that fall into categories 1 2 and 3 Note that the maximum and minimum cycle count values are subject to change Category Minimum PCycles Maximum PCycles 1 4 6 2 4 24 3 0 0 Table 12 5 Release Latency for External Requests The differences in the minimum and maximum times are due to internal conditions not readily observable externally System Interface Commands and Data Identifiers System interface commands specify the nature and attributes of any system interface
15. Table 12 4 Transmit Data Rates and Patterns In Table 12 4 data patterns are specified using the letters D and x D indicates a data cycle and x indicates an unused cycle During the unused cycles the data bus will maintain the last data value D 12 30 System Interface Chapter 12 Independent Transmissions on the SysAD Bus In most applications the SysAD bus is a point to point connection running from the processor to a bidirectional registered transceiver residing in an external agent For these applications the SysAD bus has only two possible drivers the processor or the external agent Certain applications may require connection of additional drivers and receivers to the SysAD bus to allow transmissions over the SysAD bus that the processor is not involved in These are called independent trans missions To effect an independent transmission the external agent must coordinate control of the SysAD bus by using arbitration handshake signals and external null requests An independent transmission on the SysAD bus follows this procedure 1 The external agent requests mastership of the SysAD bus to issue an external request 2 The processor releases the system interface to slave state 3 The external agent then allows the independent transmission to take place on the SysAD bus making sure that ValidIn is not asserted while the transmission is occurring 4 When the transmission is complete the external agent
16. 4 bytes valid Word 5 bytes valid Quintibyte 6 bytes valid Sextibyte 7 bytes valid Septibyte 8 bytes valid Doubleword NN OQ ol PB N oO Table 12 12 Doubleword Word or Partial word Write Request Data Size Encoding of SysCmd 2 0 12 35 System Interface Chapter 12 Null Requests Figure 12 31 shows the format of a SysCmd null request Null Request Specific see table Figure 12 31 Null Request SysCmd Bus Bit Definition System interface release external null requests use the null request command Table 12 13 lists the encoding of SysCmd 4 3 for external null requests SysCmd 2 0 are reserved for both instances of null requests SysCmd 4 3 Null Attributes 0 System Interface release 1 3 Reserved Table 12 13 External Null Request Encoding of SysCmd 4 3 System Interface Data Identifier Syntax This section defines the encoding of the SysCmd bus for system inter face data identifiers Figure 12 32 shows a common encoding scheme used for all system interface data identifiers Reserved Figure 12 32 Data Identifier SysCmd Bus Bit Definition SysCmd 8 must be set to 1 for all system interface data identifiers system interface data identifiers use the format for noncoherent data Noncoherent Data Noncoherent data is defined as follows e data that is associated with processor block write requests and pro cessor doublewor
17. Address space ID field An 8 bit field that lets multiple processes share the TLB each process has a distinct mapping of otherwise identical virtual page numbers Region 00 user 01 supervisor 11 kernel used to match vAddrg3__ go Reserved Returns zero when read ignored on writes Reserved Must be written as zeroes and returns zeroes when read Figure 4 9 Fields of the PageMask and EntryHi Registers Memory Management Chapter 4 EntryLo0O and EntryLo1 Registers 30 29 Page frame number the upper bits of the physical address Specifies the TLB page coherency attribute see Table 4 6 Dirty If this bit is set the page is marked as dirty and therefore writable This bit is actually a write protect bit that software can use to prevent alteration of data ee Valid If this bit is set it indicates that the TLB entry is valid otherwise a TLBL or TLBS miss occurs Global If this bit is set in both LoO and Lo1 then the processor ignores the ASID during TLB lookup Reserved Must be written as zeroes and returns zeroes when read Figure 4 10 Fields of the EntryLoO0 and EntryLol Registers The TLB page coherency attribute C bits specify whether references to the page should be cached if cached the algorithm selects between several coherency attributes Table 4 6 shows the coherency attributes selected by the C bits C 5 3 Value Page Coherency Attribute 0 Cacheable n
18. FLOOR L fmt Fioortotong FLOOR L fmt Fixed Point Format 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd FLOOR L 010001 00000 001011 6 5 5 5 5 6 Format FLOOR L fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to 3 This instruction is valid only for conversion from single or double precision floating point formats When the source operand is an Infinity NaN or the correctly rounded integer result is outside of 28 to 289 1 the Invalid operation exception is raised If the Invalid operation is not enabled then no exception is taken and 2 1 is returned Operation le StoreFPR fd L ConvertFmt ValueFPR fs fmt fmt L Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B FLOOR W fmt 22 9Fe FL OOR W fmt Fixed Point Format 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd FLOOR W 0100
19. GPR rt 34 s 0 OS GPR rd temp31 37 temp Exceptions None A 112 CPU Instruction Set Details Appendix A SLLV Shift Left Logical Variable SLLV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SLLV 000000 00000 000100 6 5 5 5 5 6 Format SLLV rd rt rs Description The contents of general register rt are shifted left the number of bits specified by the low order five bits contained in general register rs inserting zeros into the low order bits The result is placed in register rd The operand must be a valid sign extended 32 bit value Operation aie s lt 0 GP rs 4 o temp GPRI rt 31 s 0 I1 0 GPR rd temp31 3 temp Exceptions None A 113 CPU Instruction Set Details SLT Set On Less Than Appendix A SLT 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SLT 000000 00000 101010 6 5 5 5 5 6 Format SLT rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs Considering both quantities as signed integers if the contents of general register rs are less than the contents of general register rt the result is set to one otherwise the result is set to zero The result is placed into general register rd No integer overflow exception occurs under any circumstances The comparison is valid even if the subtraction used during the co
20. SDCz Store Doubleword SDCz From Coprocessor 31 26 25 21 20 16 15 0 SDCz base rt offset 1111xx 6 5 5 16 Format SDCz rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address Coprocessor unit z sources a doubleword which the processor writes to the addressed memory location The data to be stored is defined by individual coprocessor specifications If any of the three least significant bits of the effective address are non zero an address error exception takes place This instruction is not valid for use with CPO This instruction is undefined when the least significant bit of the rt field is non zero Operation T vAddr lt offset 5 offset 5 o GPR base pAddr uncached AddressTranslation vAddr DATA data COPzSD rt StoreMemory uncached DOUBLEWORD data pAddr vAddr DATA Note See the table in this section under Opcode Bit Encoding Also see CPU Instruction Opcode Bit Encoding at the end of Appendix A Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception Coprocessor unusable exception Opcode Bit Encoding SDCZ pi 31 30 29 28 27 2 0 pei EEEN 1 0 1 Bit 31 30 29 28 27 26 0 1 1 1 1 1 0 spca _ al 5 a ae a SD opcode Coprocessor Unit Number
21. Multiple Matches The R4600 R4700 does not provide any detection or shutdown mechanism for multiple matches in the TLB There is no damage possible from this condition The result is undefined for this condition Software is expected never to allow this to occur Address Spaces This section describes the virtual and physical address spaces and the manner in which virtual addresses are converted or translated into physical addresses in the TLB Virtual Address Space The processor virtual address can be either 32 or 64 bits wide depending on mode of operation user supervisor or kernel and the setting of the corresponding extended address bit in the Status register UX SX and KX e For the extended address bit 0 addresses are 32 bits wide e For the extended address bit 1 addresses are 64 bits wide Both 32 bit and 64 bit address wrap in the same way For example in 64 bit mode Oxfffftfffffffffff will wrap to OxOOOO000000000000 While the R4400 slipped on shift of gt 32 bit or other shift variables the R4600 R4700 does not There are virtual to physical address translations that occur outside of the TLB For example addresses in kseg0 and kseg spaces are unmapped translations In these spaces the physical address is 0x0000 0000 0 II VA 28 0 4 1 Memory Management Chapter 4 Figure 4 1 shows the translation of a virtual address into a physical address Virtual address 1 Virtual address VA repre
22. SH Store Halfword Appendix A SH 31 26 25 21 20 16 15 0 SH base rt offset 101001 6 5 5 16 Format Description SH rt offset base The 16 bit offsetis sign extended and added to the contents of general register base to form an unsigned effective address The least significant halfword of register rt is stored at the effective address If the least significant bit of the effective address is non zero an address error exception occurs Operation T vAddr lt offset s 4 offsets 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrpgize 1 3 PAddrs o xor ReverseEndian 0 byte lt vAddro 9 xor BigEndianCPU2 0 data GPRIrtlgs_sbyte o 08 PY StoreMemory uncached HALFWORD data pAddr vAddr DATA Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 111 CPU Instruction Set Details Appendix A SLL Shift Left Logical SLL 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 SLL 000000 00000 j rd sa 000000 6 5 5 5 5 6 Format SLL rd rt sa Description The contents of general register rt are shifted left by sa bits inserting zeros into the low order bits The result is placed in register rd The operand must be a valid sign extended 32 bit value Operation T seo0j sa temp lt
23. TLB invalid exception TLB modification exception Bus error exception Address error exception A 127 SWL Register A B C D E F G H Memory J K L M N O P BigEndianCPU 0 BigEndianCPU 1 offset offset vAddrs 9 destination type LEM BEM destination type LEM BEM 0 I J KLMNOE 0 0 7 EF GHMNOP 3 4 0 1 I J KLMNEF 1 0 6 I EFGMNO P 2 4 1 2 I J KLMEFQG 2 0 5 jI J EFMNOP 1 4 2 3 I J KLEFGH 3 0 4 J KEMNO P 0 4 3 4 I J KEMNOP 0 4 3 IJ KLEFGH 3 0 4 5 I J EFMNOP 1 4 2 I J KL MEF G 2 0 5 6 I EF GMNOP 2 4 1 IJ KL MNE F 1 0 6 7 EF GHMNOP 3 4 O II J KLMNOE 0 0 7 LEM Little endian memory BigEndianMem 0 BEM BigEndianMem 1 Type AccessType see Table 2 1 on page 2 3 sent to memory Offset pAddro o sent to memory Exceptions CPU Instruction Set Details Appendix A SWR Store Word Right SWR 31 26 25 21 20 16 15 SWR base rt offset 101110 6 5 5 16 Format SWR rt offset base Description This instruction can be used with the SWL instruction to store the contents of a register into four consecutive bytes of memory when the bytes cross a boundary between two words SWR stores the right portion of the register into the appropriate part of the low order word SWL stores the left portion of the register into the appropriate part of the low order word of memory The SWR instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual add
24. TagHi 29 Used with memory management system Note Register number LLAddr 17 Used with exception processing See Chapter 5 for details Figure 4 7 CPO Registers and the TLB Format of a TLB Entry Figure 4 8 shows the TLB entry formats for both 32 and 64 bit virtual addressing Each field of an entry has a corresponding field in the EntryHi EntryLo0O EntryLol or PageMask registers as shown in Figure 4 9 and Figure 4 10 for example the Mask field of the TLB entry is also held in the PageMask register Memory Management Chapter 4 64 bit Virtual Addressing 217 216 205 204 96 12 39 13 191 190 189 168 167 141 140 139 136 135 128 256 bit TLB G ied ASID entry in 64 bit 8 4 irtual add i virtual addressing 70 69 67 66 65 64 Figure 4 8 Format of a TLB Entry The format of the EntryHi EntryLoO EntryLol and PageMask registers are nearly the same as the TLB entry The one exception is the Global field G bit which is used in the TLB but is reserved in the EntryHi register Figure 4 9 and Figure 4 10 describe the TLB entry fields that are shown in Figure 4 8 PageMask Register 31 25 24 9 g 13 12 0 ae MASK i 7 12 13 Mask Page comparison mask Reserved Must be written as zeroes and returns zeroes when read EntryHi Register 63 62 61 40 39 64 bit VA R FILL VPN2 2 22 27 VPN2 Virtual page number divided by two maps to two pages ASID
25. bits 19 12 of virtual address Uncached and write through stores uncached stores are buffered in 1 entry uncached store buffer write through not possible uncached and write though stores buffered in 4 entry write buffer SysADC parity only same SysADC for non data cycles parity zero SysCmdP parity zero Parity error during writeback use Cache Error exception output bad parity Error bit in data identifier of read responses Bus Error if error bit set for any dou bleword only check error bit of first doubleword all other error bits ignored Parity error on read data Block writes Bus Error if parity error in any dou bleword 1 2 null cycles between address and data bad parity written to cache take Cache Error exception if bad parity occurs on dou blewords that the processor is waiting for O cycles between address and data Release after Read Request variable latency O latency SysAD value for x cycles of write back data pattern SysAD bus use after last D cycle of writeback data bus undefined data bus undefined data bus maintains last D cycle value trailing x cycles e g DDxxDDxx not DDxxDD follow rule in entry immediately preceding Output slew rate dynamic feedback control simple CMOS output buffers with 2 bit static strength control IOOut output IOIn input output slew rate control feedback loop output output slew rate contro
26. which could cause processor B to read stale data The SYNC in processor B prevents DATA from being read before FLAG which could likewise result in reading stale data For processors which only execute loads and stores in order with respect to shared memory this instruction is a NOP LL and SC instructions implicitly perform a SYNC This instruction is allowed in User mode Operation T SyncOperation Exceptions None A 130 CPU Instruction Set Details SYSCALL System Call Appendix A SYSCALL 31 26 25 6 5 0 SPECIAL Code SYSCALL 000000 001100 6 20 6 Format SYSCALL Description A system call exception occurs immediately and unconditionally transferring control to the exception handler The code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T SystemCallException Exceptions System Call exception A 131 CPU Instruction Set Details Appendix A TEQ Trap If Equal TEQ 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt code TEQ 000000 110100 6 5 5 10 6 Format TEQ rs rt Description The contents of general register rt are compared to general register rs If the contents of general register rs are equal to the contents of general register rt a trap exception occurs The
27. 0 When operating in the kernel mode four distinct virtual address spaces totalling 1024GB 4GB when Status KX 0 are simultaneously available and are differentiated by the high order bits of the virtual address The R4600 R4700 processors also support a supervisor mode in which the virtual address space is 256 5GB 2 5GB when Stauts SX 0 divided into three regions based on the high order bits of the virtual address When the R4600 R4700 uses 64 bit virtual addresses the address space layouts are an upward compatible extension of the 32 bit virtual address space layout A detailed description of the addressing is given in Chapter 4 J oint TLB For fast virtual to physical address decoding the R4600 R4700 uses a large fully associative TLB which maps 96 Virtual pages to their corresponding physical addresses The TLB is organized as 48 pairs of even odd entries and maps a virtual address and address space identifier into the large 64GB physical address space Two mechanisms are provided to assist in controlling the amount of mapped space and the replacement characteristics of various memory regions First the page size can be configured on a per entry basis to map a page size of 4KB to 16MB in multiples of 4 A CPO register is loaded with the page size of a mapping and that size is entered into the TLB when a new entry is written Thus operating systems can provide special purpose maps for example a typical frame buffer
28. 0 1 256 Kbytes 0 0 0 o O 0 1 1 1 1 Mbyte ool of of 1 1 1 1 1 4 Mbytes Gimmes 1 1 4 4 1 1 1 16Mbytes 1 1 1 1 1 1 1 1 1 Table 4 9 Mask Field Values for Page Sizes Memory Management Chapter 4 Wired Register 6 The Wired register is a read write register that specifies the boundary between the wired and random entries of the TLB as shown in Figure 4 13 Wired entries are nonreplaceable entries which cannot be overwritten by a TLB write random operation Random entries can be overwritten Range of Random entries lt Wired Register Range of Wired entries Figure 4 13 Wired Register Boundary The Wired register is set to O upon system reset Writing this register also sets the Random register to the value of its upper bound see Random register above Figure 4 14 shows the format of the Wired register Table 4 10 which follows the figure describes the register fields Wired Register 31 65 0 0 Wired 26 6 Figure 4 14 Wired Register Field Description Wired TLB Wired boundary the number of wired TLB entries Reserved Must be written as zeroes and returns zeroes when read Table 4 10 Wired Register Field Descriptions EntryHi Register CPO Register 10 The EntryHi register holds the high order bits of a TLB entry for TLB read and write operations The EntryHi register is accessed by the TLB Probe TLB Write Ran
29. Bit 31 30 29 28 27 26 0 Lwc2 1 1 0 0 1 0 A J Y Opcode Coprocessor Unit Number CPU Instruction Set Details Appendix A LWL Load Word Lett LWL 31 26 25 21 20 16 15 0 LWL base rt offset 100010 6 5 5 16 Format LWL rt offset base Description This instruction can be used in combination with the LWR instruction to load a register with four consecutive bytes from memory when the bytes cross a word boundary LWL loads the left portion of the register with the appropriate part of the high order word LWR loads the right portion of the register with the appropriate part of the low order word The LWL instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte It reads bytes only from the word in memory which contains the specified starting byte From one to four bytes will be loaded depending on the starting byte specified The loaded word is sign extended Conceptually it starts at the specified byte in memory and loads that byte into the high order left most byte of the register then it loads bytes from memory into the register until it reaches the low order byte of the word in memory The least significant right most byte s of the register will not be changed memor laig endan register address 4 4 5 6 7 addressoO 0 1 2 3 before A B C D
30. Cause The Interrupt exception occurs when one of the eight interrupt conditions is asserted The significance of these interrupts is dependent upon the specific system implementation Each of the eight interrupts can be masked by clearing the corresponding bit in the Int Mask field of the Status register and all of the eight interrupts can be masked at once by clearing the IE bit of the Status register Processing The common exception vector is used for this exception and the Int code in the Cause register is set The IP field of the Cause register indicates current interrupt requests It is possible that more than one of the bits can be simultaneously set or even no bits may be set if the interrupt is asserted and then deasserted before this register is read Interrupt exception processing is shown in Figure 5 15 on page 13 Servicing If the interrupt is caused by one of the two software generated exceptions SW1 or SWO the interrupt condition is cleared by setting the corresponding Cause register bit to 0 If the interrupt is hardware generated the interrupt condition is cleared by correcting the condition causing the interrupt pin to be asserted NOTE due to the write buffer a store to an external device will not necessarily occur until after other instructions in the pipeline finish Thus the user must ensure that the store will occur before the return from exception instruction ERET is executed otherwise the interrupt ma
31. ES EE EB 02 SR lt SR31 3 1 IISR4 0 if SRoo 1 then What is the BEV bit setting PC OxFFFF FFFF BFCO 0200 0x100 access boot PROM area else PC lt OxFFFF FFFF A000 0000 0x100 access main memory area endif Figure 5 13 Cache Error Exception Processing Soft Reset and NMI Exception Process Figure 5 14 shows the Soft Reset and NMI exception process T ErrorEPC lt PC SR SR341 23 1 0 1 SR49 3 I 1 SR4 0 PC lt OxFFFF FFFF BFCO 0000 Figure 5 14 Soft Reset and NMI Exception Processing General Exception Process Figure 5 15 shows the process used for exceptions other than Reset Soft Reset NMI and Cache Error T Cause BD 0 CE 01 Cause45 g 0 ExcCode 0 if SR 0 then system in User or Supervisor mode with no current exception EPC PC endif SR e SR34 2 1 SRO if SRoo 1 then What is the BEV bit setting PC lt OxFFFF FFFF BFCO 0200 vector access to uncached space else PC lt OxFFFF FFFF 8000 0000 vector access to cached space endif Figure 5 15 General Exception Processing Except Reset Soft Reset NMI and Cache Error Exception Vector Locations The Reset Soft Reset and NMI exceptions are always vectored to location OxFFFF FFFF BFCO 0000 virtual address corresponding to ksegO Addresses for all other exceptions are a combination of a vector offset and a base a
32. FCR 31 o3 Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Division by zero exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details Appendix B CVT D fmt Floating Point Convert to Double Floating Point Format CVT D fmt 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd CVT D 010001 00000 100001 6 5 5 5 5 6 Format CVT D fmt fd fs Description The contents of the floating point register specified by fs is interpreted in the specified source format fmt and arithmetically converted to the double binary floating point format The result is placed in the floating point register specified by fd This instruction is valid only for conversions from single floating point format 32 bit or 64 bit fixed point format If the single floating point or single fixed point format is specified the operation is exact The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd D ConvertFmt ValueFPR fs fmt fmt D Exceptions Coprocessor un
33. For more information about jump instructions refer to the individual instruction as described in Appendix A Overview of Branch Instructions All branch instruction target addresses are computed by adding the address of the instruction in the delay slot to the 16 bit offset shifts left 2 bits and is sign extended to 32 bits All branches occur with a delay of one instruction If a conditional branch likely is not taken the instruction in the delay slot is nullified For regular conditional branches the delay slot is always executed For more information about branch instructions refer to the individual instruction as described in Appendix A Special Instructions Special instructions allow the software to initiate traps they are always R type For more information about special instructions refer to the individual instruction as described in Appendix A Exception Instructions Exception instructions are extensions to the MIPS ISA For more information about exception instructions refer to the individual instruction as described in Appendix A Coprocessor Instructions Coprocessor instructions perform operations in their respective coprocessors Coprocessor loads and stores are I type and coprocessor computational instructions have coprocessor dependent formats Individual coprocessor instructions are described in Appendices A for CPO and B for the FPU CP1 CPO instructions perform operations specifically on the System Con
34. RWNT Read Write bits required for Windows NT 0 Reserved Must be written as zeroes returns zeroes when read Table 4 13 Cache Tag Register Fields Memory Management Chapter 4 Virtual to Physical Address Translation Process During virtual to physical address translation the CPU compares the 8 bit ASID if the Global bit G is not set of the virtual address to the ASID of the TLB entry to see if there is a match The following comparison is also made e For the 64 bit virtual addresses the highest 15 to 27 bits depending upon the page size of the virtual address are compared to the con tents of the TLB virtual page number If a TLB entry matches the physical address and access control bits C D and V are retrieved from the matching TLB entry While the V bit of the entry must be set for a valid translation to take place it is not involved in the determination of a matching TLB entry Figure 4 19 illustrates the TLB address translation process Virtual Address Input Note For valid address space VPN see the section in this chapter and that describes Operating Modes ASID G Error Exception Exception Yes y Non a5 TLB TLB XTLB acheable Refill Refill No Exception Exception Access Main Access Memory Cache Physical Address Output Figure 4 19 TLB Address Translation Memory Management Chapter 4 TLB Misses If there is no TLB entry that matches th
35. TLBWrite instructions 0 Reserved Must be written as zeroes and returns zeroes when read Table 4 7 Index Register Field Descriptions Random Register 1 The Random register is a read only register of which six bits index an entry in the TLB This register decrements as each instruction executes and its values range between an upper and a lower bound as follows e A lower bound is set by the number of TLB entries reserved for exclu sive use by the operating system the contents of the Wired register e An upper bound is set by the total number of TLB entries Thus the upper bound is 47 The TLB entries are number from 0 to 47 The R4600 R4700 implements this register differently from the R4000 The R4000 counts both valid and invalid instructions while the R4600 R4700 counts only valid instructions The Random register specifies the entry in the TLB that is affected by the TLB Write Random instruction The register does not need to be read for this purpose however the register is readable to verify proper operation of the processor To simplify testing the Random register is set to the value of the upper bound upon system reset This register is also set to the upper bound when the Wired register is written Figure 4 12 shows the format of the Random register Table 4 8 on page 17 describes the Random register fields Random Register 31 65 0 0 Random 26 6 Figure 4 12 Random Register Memory Ma
36. The contents of general register rs are combined with the contents of general register rt in a bit wise logical NOR operation The result is placed into general register rd Operation T GPR rd lt GPR rs nor GPR rt Exceptions None CPU Instruction Set Details Appendix A OR Or OR 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 OR 000000 00000 100101 6 5 5 5 5 6 Format OR rd rs rt Description The contents of general register rs are combined with the contents of general register rt in a bit wise logical OR operation The result is placed into general register rd Operation T GPRird lt GPR rs or GPR rt Exceptions None A 100 CPU Instruction Set Details ORI Appendix A Or Immediate ORI 31 26 25 21 20 16 15 0 ORI rs rt immediate 001101 6 5 5 16 Format ORI rt rs immediate Description The 16 bit immediate is zero extended and combined with the contents of general register rs in a bit wise logical OR operation The result is placed into general register rt Operation T GPR rt GPRIrs e3 16 immediate or GPR rs 45 0 Exceptions None A 101 CPU Instruction Set Details Appendix A SB Store Byte SB 31 26 25 21 20 16 15 0 SB base rt offset 101000 6 5 5 16 Format SB rt offset base Description The 16
37. and on Inexact exceptions the trap handler gains access to the correctly rounded result by examining source registers and simulating the operation in software On Overflow or Underflow exceptions encountered on floating point conversions and on Invalid Operation and Divide by Zero exceptions the trap handler gains access to the operand values by examining the source registers of the instruction The IEEE Standard 754 recommends that if enabled the overflow and underflow traps take precedence over a separate inexact trap This prioritization is accomplished in software hardware sets the bits for both the Inexact exception and the Overflow or Underflow exception Chapter 8 Processor Signal Descriptions dt Integrated Device Technology Inc Introduction This chapter describes the signals used by and in conjunction with the R4600 R4700 processor The signals include the System interface the Clock Control interface the Interrupt interface the Joint Test Action Group JTAG interface and the Initialization interface Signals are listed in bold and low active signals have a trailing asterisk for instance the low active Read Ready signal is RdRdy The signal description also tells if the signal is an input the processor receives it or output the processor sends it out Figure 8 1 illustrates the functional groupings of the processor signals SysAD 63 0 TClock 1 0 RClock 1 0 MasterClock MasterOut SyncOu
38. bit is set to a 1 if software emulation is required otherwise it remains 0 The other bits are set to O or 1 to indicate the occurrence or non occurrence respectively of an IEEE 754 exception When a floating point exception is taken no results are stored and the only state affected is the Cause bits Exceptions caused by an immediately previous floating point operation can be determined by reading the Cause field Enable Bits A floating point operation that sets an enabled Cause bit forces an immediate exception as does setting both Cause and Enable bits with CTC1 The floating point exception or interrupt is enabled when the corresponding enable be is set There is no enable for Unimplemented Operation E Setting Unimplemented Operation always generates a floating point exception 6 6 Floating Point Unit Chapter 6 Before returning from a floating point exception or doing a CTC1 software must first clear the enabled Cause bits to prevent a repeat of the interrupt Thus User mode programs can never observe enabled Cause bits set if this information is required in a User mode handler it must be passed somewhere other than the Status register For a floating point operation that sets only unenabled Cause bits no exception occurs and the default result defined by IEEE 754 is stored In this case the exceptions that were caused by the immediately previous floating point operation can be determined by reading the Cause fi
39. byte into the low order right most byte of the register then it loads bytes from memory into the register until it reaches the high order byte of the doubleword in memory The most significant left most byte s of the register will not be changed address 8 8 9 10 11 12 113 14 15 address 0 memory big endian register 0 1l2 3l45 g 7 before A B C D E F G H 24 LDR 24 4 0 register after A B C 0 1 2 3 4 24 The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDR or LDL instruction which also specifies register rt No address exceptions due to alignment are possible CPU Instruction Set Details Appendix A Operation T vAddr lt offset s offsets 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 PAddrs 9 xor ReverseEndian if BigEndianMem 1 then pAddr pAddr34 3 0 endif byte vAddrs o xor BigEndianCPU mem lt LoadMemory uncached byte pAddr vAddr DATA GPRIrt GPRI rt e3 64 8 byte Meme3 8 byte Given a doubleword in a register and a doubleword in memory the operation of LDR is as follows
40. discrete CMOS logic devices can be calculated from the following equation Transmission Time TClock period tps for R4600 R4700 Max External Output Register Clock to Q Delay Max External Clock Buffer Delay Mismatch Max Clock Jitter for R4600 R4700 Internal Clocks Max Clock Jitter for TClock In this clocking methodology the hold time of data driven from the processor to an external sampling register is a critical parameter To guarantee hold time the minimum output delay of the processor tpy must be greater than the sum of the following Min hold time for the external sampling register max clock jitter for R4600 R4700 internal clocks max clock jitter for TClock max delay mismatch of the external clock buffers 10 9 Cache Organization Chapter 11 Operation and Coherency dt Integrated Device Technology Inc Introduction This chapter describes in detail the cache memory its place in the R4600 R4700 memory organization and individual operations of the primary cache This chapter uses the following terminology The primary cache may also be referred to as the P cache The primary data cache may also be referred to as the D cache The primary instruction cache may also be referred to as the I cache These terms are used interchangeably throughout this book Memory Organization Figure 11 1 shows the R4600 R4700 system memory hierarchy In the logical memory hierarchy
41. even and odd register numbers are valid Operation T StoreFPR fd fmt ValueFPR fs fmt ValueFPR ft fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details Appendix B BCIF Paepae BC1F 31 26 25 21 20 1615 0 COP1 BC BCF offset 010001 01000 00000 5 5 16 Format BCIF offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the result of the last floating point compare is false the program branches to the target address with a delay of one instruction Operation T 1 condition not COC 1 T target lt offset offset 07 T 1 if condition then PC amp PC target endif Exceptions Coprocessor unusable exception FPU Instruction Set Details BC1FL Appendix B Branch On FPU False Likely BC1 FL Coprocessor 1 31 26 25 21 20 1615 0 COP1 BC BCFL offset 010001 01000 00010 6 5 5 16 Format BCI1FL offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign exte
42. gN Inexact Operation Underflow Overflow Division by Zero Invalid Operation Unimplemented Operation Floating Point Exceptions Chapter 7 Each of the five IEEE Standard 754 exceptions V Z O U I is associated with a trap under user control and is enabled by setting one of the five Enable bits When an exception occurs and its corresponding Enable bit is not set both the corresponding Cause and Flag bits are set When an exception occurs and its corresponding Enable bit is set the corresponding Cause bit is set and the subsequent exception processing allows a trap to be taken Exception Trap Processing When a floating point exception trap is taken the Cause register indicates the floating point coprocessor is the cause of the exception trap The Floating Point Exception FPE code is used and the Cause bits of the floating point Control Status register indicate the reason for the floating point exception These bits are in effect an extension of the system coprocessor Cause register Flags A Flag bit is provided for each IEEE exception This Flag bit is set toa 1 on the assertion of its corresponding exception with no corresponding exception trap signaled The Flag bit is reset by writing a new value into the Status register flags can be saved and restored by software either individually or as a group When no exception trap is signaled the floating point coprocessor takes a default action providing a subs
43. ms and is expected to remain stable The assertion of VCCOk initiates the reading of the boot time mode control serial stream This is described in the subsection Initialization Sequence on page 9 4 ColdReset The ColdReset signal must be asserted low for either a power on reset or a cold reset The clocks SClock TClock and RClock begin to cycle and are synchronized with the de asserted edge high of ColdReset ColdReset must be de asserted synchronously with MasterClock Reset The Reset signal must be asserted for any reset sequence It can be asserted synchronously or asynchronously for a cold reset or synchronously to initiate a warm reset Reset must be de asserted synchronously with MasterClock Modeln Serial boot mode data in ModeClock Serial boot mode data out at the MasterClock frequency divided by 256 MasterClock 256 l Asserted means the signal is true or in its valid state For example the low active Reset signal is said to be asserted when it is in a low true state the high active VCCOKk signal is true when it is asserted high 9 1 Initialization Interface Chapter 9 Table 9 1 lists the processor signals and their possible states Description Name I O Asserted State 3 State Reset State System address data bus SysAD 63 0 I O High Yes a System address data check bus SysADC 7 0 I O High Yes a System command data identifier bus SysCmd 8 0 I O H
44. offset s 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr lt pAddrpgize 1 3 Addr 9 xor ReverseEndian 07 byte vAddro 9 xor BigEndianCPU 0 data COPzSW byte rt StoreMemory uncached WORD data pAddr vAddr DATA Note See the table in this section under Opcode Bit Encoding Also see CPU Instruction Opcode Bit Encoding at the end of Appendix A Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception Coprocessor unusable exception Reserved instruction exception coprocessor 3 Opcode Bit Encoding swei 1 1 1 0 0 1 Bit 31 30 29 28 27 26 0 swce 1 1 ane 0 DA yY SW opcode Coprocessor Unit Number A 125 CPU Instruction Set Details Appendix A SWL Store Word Left SWL 0 31 26 25 21 20 16 15 SWL base rt offset 101010 6 5 5 16 Format SWL rt offset base Description This instruction can be used with the SWR instruction to store the contents of a register into four consecutive bytes of memory when the bytes cross a word boundary SWL stores the left portion of the register into the appropriate part of the high order word of memory SWR stores the right portion of the register into the appropriate part of the low order word The SWL instruction adds its sign extended 16 bit offset to the contents of general reg
45. offset4s offset 0 condition GPRirs g3 0 GPR 31 PC 8 T 1 if condition then PC PC target else NullifyCurrentInstruction endif Exceptions None CPU Instruction Set BGEZL Details Branch On Greater Than Or Equal To Zero Likely Appendix A BGEZL BGEZL rs offset Description 31 26 25 21 20 16 15 0 REGIMM rs BGEZL offset 000001 00011 6 5 5 16 Format A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of general register rs have the sign bit cleared then the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T target e offset 5 offset 07 condition GPR rs g3 0 T 1 if condition then PC PC target else NullifyCurrentinstruction endif Exceptions None CPU Instruction Set Details BGTZ Branch On Greater Than Zero Appendix A BGTZ BGTZ rs offset Description 31 26 25 21 20 16 15 0 BGTZ rs 0 offset 000111 00000 6 5 5 16 Format A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents o
46. 0 LO j Program Counter Figure 1 2 R4600 R4700 CPU Registers Two of the CPU general purpose registers have assigned functions e rO is hardwired to a value of zero and can be used as the target reg ister for any instruction whose result is to be discarded rO can also be used as a source when a zero value is needed e r31 is used as an implicit return destination address register by the JAL and BAL series of instructions The CPU has three special purpose registers e PC Program Counter register e HI Multiply and Divide register higher result e LO Multiply and Divide register lower result The two Multiply and Divide registers HI LO store e the product of integer multiply operations or e the quotient in LO and remainder in HI of integer divide operations The R4600 R4700 processor has no Program Status Word PSW register as such this is covered by the Status and Cause registers incorporated within the System Control Coprocessor CPO CPO registers are described later in this chapter Overview Chapter 1 CPU Instruction Set Overview Each CPU instruction is 32 bits long As shown in Figure 1 3 there are three instruction formats e immediate I type jump J type e register R type 31 2625 2120 1615 l Type Immediate op rs rt immediate 31 26 25 J Type Jump op target 31 26 25 2120 1615 11 10 65 0 Figure 1 3 CPU Instruction Forma
47. 14 External Data Read Response Time The R4600 R4700 accesses the external bus due to data cache miss or an uncached reference The length of time for an external read is based on the overhead at the beginning and end of the read along with the time to drive the address and get the response data 12 17 System Interface Chapter 12 Data Read Latency Steps for System Clock The read latency for a system clock in the divide by two mode is as follows 1 The startup overhead is one to two pipeline cycles PCycle for the CPU to generate the parity for the address to be output The second PCycle is needed if the miss is detected or a PCycle not aligned with the rising edge of SClock 2 The CPU drives the address on the SysAD bus for two PCycles 3 The CPU tri states the SysAD bus for two PCycles 4 The CPU waits for the main memory to return the data This is expressed as n x 2 PCycles where nis the number of SClock cycles for the first data to be returned in a block read or the latency for the single read For zero wait state memory system n should be zero 5 The first double word is driven in the SysAD from the main memory for two PCycles 6 The end of the overhead is two PCycles one to transfer the data from the pads and generate the parity and one to write to the register or cache if it is cacheable data Notes on the Data Read Latency Steps a If n 0 and the line being replaced is dirty the CPU takes one to two a
48. 14 The Exception Program Counter EPC is a read write register that contains the address at which processing resumes after an exception has been serviced For synchronous exceptions the EPC register contains either e the virtual address of the instruction that was the direct cause of the exception or e the virtual address of the immediately preceding branch or jump in struction when the instruction is in a branch delay slot and the Branch Delay bit in the Cause register is set The processor does not write to the EPC register when the EXL bit in the Status register is set to a 1 Figure 5 7 shows the format of the EPC register EPC Register EPC 64 Figure 5 7 EPC Register Format CPU Exception Processing Chapter 5 XContext Register 20 The read write XContext register contains a pointer to an entry in the page table entry PTE array an operating system data structure that stores virtual to physical address translations When there is a TLB miss the operating system software loads the TLB with the missing translation from the PTE array The XContext register duplicates some of the information provided in the BadVAdadr register and puts it in a form useful for a software TLB exception handler The XContext register is for use with the XTLB refill handler which loads TLB entries for references to a 64 bit address space and is included solely for operating system use The operating system sets the PTE
49. 16 bit immediate is sign extended and compared to the contents of general register rs Considering both quantities as signed integers if the contents of general register rs are less than the sign extended immediate a trap exception occurs Operation T if 0 GPR rs lt 0 immediate 5 4 immediate 5 9 then TrapException endif Exceptions Trap exception A 144 CPU Instruction Set Details TLTU Trap If Less Than Unsigned Appendix A TLTU 31 26 25 21 20 16 15 0 SPECIAL rs rt code TLTU 000000 110011 6 5 5 10 6 Format TLTU rs rt Description The contents of general register rt are compared to general register rs Considering both quantities as unsigned integers ifthe contents of general register rs are less than the contents of general register rt a trap exception occurs The code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T if 0 GPR rs lt 0 GPR rt then TrapException endif Exceptions Trap exception A 145 CPU Instruction Set Details TNE Trap If Not Equal Appendix A TNE 31 26 25 21 20 16 15 0 SPECIAL rs rt code TNE 000000 110110 6 5 5 10 6 Format TNE rs rt Description The contents of general register rt are compared t
50. 1615 AY pyte _ 9 1 AEE T Halfword Byte l Bt 7 6543 21 Ol PE Te Bits in a Byte Figure 1 7 Big Endian Data in a Doubleword The CPU uses byte addressing for halfword word and doubleword accesses with the following alignment constraints e Halfword accesses must be aligned on an even byte boundary 0 2 4 e Word accesses must be aligned on a byte boundary divisible by four 0 4 8 e Doubleword accesses must be aligned on a byte boundary divisible by eight 0 8 16 The following special instructions load and store words that are not aligned on 4 byte word or 8 word doubleword boundaries LWL LWR SWL SWR LDL LDR SDL SDR These instructions are used in pairs to provide addressing of misaligned words Addressing misaligned data incurs one additional instruction cycle over that required for addressing aligned data This extra cycle is because of an extra instruction for the pair e g LWL and LWR form a pair Also note that the CPU moves the unaligned data at the same rate as a hardware mechanism Overview Chapter 1 Figures 1 8 and 1 9 show the access of a misaligned word that has byte address 3 Higher Address Bit l 24 23 1615 5 Lower Address Figure 1 8 Big Endian Misaligned Word Addressing Higher Address Bit 31 24 23 1615 Lower Add
51. 2 NOPDEFGH 2 0 5 KLMNOPGH 5 0 2 3 MNOPEFGP 3 0 4 LMNOPFGH 4 0 3 4 L MNOPFGH 4 0 3 MNOPEFGH 3 0 4 5 KLMNOPGH 5 0 2 NOPDEFGH 2 0 5 6 JKLMNOPH 6 0 1 OPCDEFGH 1 0 6 7 I J KLMNOP 7 0 0 PBCDEFGH 0 0 7 LEMLittle endian memory BigEndianMem 0 BEMBigEndianMem 1 TypeAccessType see Table 2 1 on page 3 sent to memory OffsetpAddry 9 sent to memory Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Detail S Appendix A LDR Load Doubleword Right LD R 31 26 25 21 20 16 15 0 LDR base rt offset 011011 6 5 5 16 Format LDR rt offset base Description This instruction can be used in combination with the LDL instruction to load a register with eight consecutive bytes from memory when the bytes cross a doubleword boundary LDR loads the right portion of the register with the appropriate part of the low order doubleword LDL loads the left portion of the register with the appropriate part of the high order doubleword The LDR instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte It reads bytes only from the doubleword in memory which contains the specified starting byte From one to eight bytes will be loaded depending on the starting byte specified Conceptually it starts at the specified byte in memory and loads that
52. 20 16 15 11 10 6 5 0 fmt 0 fs fd CEIL W 010001 00000 001110 5 5 5 5 6 Format CEIL W fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to 2 This instruction is valid only for conversion from a single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity or NaN or the correctly rounded integer result is outside of 2 to 231 1 the Invalid operation exception is raised If the Invalid operation is not enabled then no exception is taken and 231 1 is returned Operation T StoreFPR fd W ConvertFmt ValueFPR fs fmt fmt W Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruct
53. 24 LWL 24 1 0 after 1 2 3 D 24 The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWL or LWR instruction which also specifies register rt No address exceptions due to alignment are possible CPU Instruction Set Details Appendix A Operation T vAddr lt offset 5 offset s 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 OAddre o xor ReverseEndian if BigEndianMem 0 then pAddr pAddrpgize 1 s 0 endif byte vAddr o xor BigEndianCPU word lt vAddro xor BigEndianCPU mem lt LoadMemory uncached 0 byte pAddr vAddr DATA temp lt MemM31432 word 8 byte 32 word GPRIrt o3 8 byte o GPRi rt temp3 temp Given a doubleword in a register and a doubleword in memory the operation of LWL is as follows LWL Register A B C D E F G H Memory J KI L MINJ O p BigEndianCPU 0 BigEndianCPU 1 vAddro o destination type ofset destination type offset LEM BEM LEM BEM 0 SSSSPFGH 0 0 7 SSSSIJKLI 3 4 0 1 SSSSOPGH 1 0 6 SSSSJKLH 2 4 1 2 SSSSNOPH 2 0 5 SSSSKLGH 1 4 2 3 SSSSMNOP 3 0 4 SSSSLEFGH 0 4 3 4 SSSSLFGH 0 4 3 SSSSMNOP 83 0 4 5 SSSSKLGH 1 4
54. 24 1 0 15 H 10 11 12 13 14 C D E F G after Operation T vAddr lt offset s offset 15 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 PAddre o xor ReverseEndian If BigEndianMem 0 then pAddr lt pAddr34 3 0 endif byte vAddrs 9 xor BigEndianCPU data 0 Y GPRIrtles 56 s byte Storememory uncached byte data pAddr vAddr DATA A 107 CPU Instruction Set Details Appendix A Given a doubleword in a register and a doubleword in memory the operation of SDL is as follows SDL Register A B C D E F G H Memory J K L M N O P BigEndianCPU 0 BigEndianCPU 1 offset offset vAddr3 o destination type LEM BEM destination type LEM BEM 0 I JKLMNOAJ 0 0 7 ABCDEFGH 7 0 0 1 I J KLMNAB 1 0 6 I ABCDEFG 6 O 1 2 I J KLMABC 2 0 5 I1JABCODEF 5 0 2 3 I JKLABCD 3 0 4 I1J3JKABCODE 4 0 3 4 I J KABCDE 4 0 3 1JKLABCD 3 0 4 5 I JABCDEF 5 0 2 I1JKLMABC 2 0 5 6 I ABCDEFG 6 0o 1 I J KLMNA B 1 0 6 7 ABCDEFGH 7 0 O I1JKLMNOA 0 0 7 LEM Little endian memory BigEndianMem 0 BEM BigEndianMem 1 Type AccessType see Table 2 1 on page 2 3 sent to memory Offset pAddrg o sent to memory Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Ad
55. 31 26 25 21 20 16 15 0 BLEZ rs 0 offset 000110 00000 6 5 5 16 Format BLEZ rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs are compared to zero If the contents of general register rs have the sign bit set or are equal to zero then the program branches to the target address with a delay of one instruction Operation T target lt offset offset 07 condition GPR rs g3 1 and GPR rs 064 T 1 if condition then PC lt PC target endif Exceptions None CPU Instruction Set Details BLEZL Branch on Less Than Or Equal To Zero Likely Appendix A BLEZL 31 26 25 21 20 16 15 0 BLEZL rs 0 offset 010110 00000 6 5 5 16 Format BLEZL rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs is compared to zero If the contents of general register rs have the sign bit set or are equal to zero then the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation target lt offset
56. 5 0 COP1 fmt 0 fs fd ROUND W 010001 00000 001100 6 5 5 5 5 6 Format ROUND W fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to the nearest even RM 0 This instruction is valid only for conversion from a single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity or NaN or the correctly rounded integer result is outside of 231 to 234 1 an Invalid operation exception is raised If Invalid operation is not enabled then no exception is taken and 231 1 is returned Operation ake StoreFPR fd W ConvertFmt ValueFPR fs fmt fmt W Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU In
57. 5 9 then TrapException endif Exceptions Trap exception A 136 CPU Instruction Set Details Appendix A TG E U Trap If Greater Than Or Equal Unsigned TG E U 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt code TGEU 000000 110001 6 5 5 10 6 Format TGEU rs rt Description The contents of general register rt are compared to the contents of general register rs Considering both quantities as unsigned integers if the contents of general register rs are greater than or equal to the contents of general register rt a trap exception occurs The code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T if 0 GPR rs 0 GPR rt then TrapException endif Exceptions Trap exception A 137 CPU Instruction Set Details Appendix A TLBP Probe TLB For Matching Entry TLBP 31 26 25 24 6 5 0 COPO CO 0 TLBP 010000 1 000 0000 0000 0000 0000 001000 6 1 19 6 Format TLBP Description The Index register is loaded with the address of the TLB entry whose contents match the contents of the EntryHi register If no TLB entry matches the high order bit of the Index register is set The architecture does not specify the operation of memory references associated with the instruction immediately after a TLBP instruction nor
58. 6 5 5 5 5 6 Format SRA rd rt sa Description The contents of general register rt are shifted right by sa bits sign extending the high order bits The result is placed in register rd The operand must be a valid sign extended 32 bit value Operation T s lt 0O lsa temp lt GPR rt 34 GPR rt 31 5 GPR rd temp3 temp Exceptions None A 118 CPU Instruction Set Details Appendix A SRAV Arithmetic Vartable SRAV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SRAV 000000 00000 000111 6 5 5 5 5 6 Format SRAV rd rt rs Description The contents of general register rt are shifted right by the number of bits specified by the low order five bits of general register rs sign extending the high order bits The result is placed in register rd The operand must be a valid sign extended 32 bit value Operation T S lt GPRIrs q 0 temp lt GPRIrt 3 GPR rt 3 _ GPR rd temp3 temp Exceptions None A 119 CPU Instruction Set Details SRL Shift Right Logical Appendix A SRL 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa SRL 000000 00000 000010 6 5 5 5 5 6 Format SRL rd rt sa Description The contents of general register rt are shifted right by sa bits inserting zeros into the high order bits The result is placed in register rd The operand
59. 7 Bae Byte 0 o lo fo O JO 1 O 1 JO O 1 1 1 1O O 1 O 1 1 1 10 1 1 1 Table 2 1 Byte Access within a Doubleword CPU Instruction Set Summary Chapter 2 Computational Instructions Computational instructions can be either 1 in register R type format in which both operands are registers or 2 in immediate I type format in which one operand is a 16 bit immediate Computational instructions perform the following operations on register values e arithmetic e logical e shift e multiply e divide These operations fit in the following four categories of computational instructions e ALU Immediate instructions e three Operand Register Type instructions e shift instructions e multiply and divide instructions 64 bit Virtual Address Operations with 32 bit operands Operands to 32 bit operand opcodes must be in sign extended form 32 bit operand opcodes include all non doubleword operations such as ADD ADDU SUB SUBU ADDI SLL SRL SRA SLLV etc The result of operations that use incorrect sign extended 32 bit values is unpredictable Cycle Timing for Multiply and Divide Instructions MFHI and MFLO instructions described in Appendix A are interlocked so that any attempt to read them before prior multiply or divide instructions complete delays the execution of these instructions until the prior instructions finish Table 2 2 gives the number of processor cycles PCycles required t
60. A 63 62 119 1 ckseg3 OxFFFF FFFF E000 0000 512 A 61 31 1 through Mbytes OxFFFF FFFF FFFF FFFF 22 bytes Table 4 4 64 bit Kernel Mode Segments 64 bit Kernel Mode User Space xkuseg In Kernel mode when Status KX 1 and bits 63 62 of the 64 bit virtual address are 00 the xkuseg virtual address space is selected it covers the current user address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address As a special feature for the ECC handler if the ERL bit of the Status register is set the user address region becomes a 231 byte unmapped uncached space This allows the ECC exception code to operate uncached using rO as a base register 64 bit Kernel Mode Current Supervisor Space xksseg In Kernel mode when Status KX 1 and bits 63 62 of the 64 bit virtual address are Olo the xksseg virtual address space is selected it is the current supervisor virtual space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address Memory Management Chapter 4 64 bit Kernel Mode Physical Spaces xIcphys In Kernel mode when Status KX 1 and bits 63 62 of the 64 bit virtual address are 109 the xkphys virtual address space is selected it is a set of eight 2 byte kernel physical spaces Accesses with address bits 58 36 not equal to O cause an address error References to this space are n
61. BC1TL 1 R4600 8 8 R4700 4 5 DIV fmt 32 61 BC1FL 1 SQRT fmt 31 60 LWC1 LDC1 2 ABS fmt 1 1 SWC1 SDC1 1 MOV fmt 1 1 TRUNC Wfmt 4 4 NEG fmt 1 1 MTC1 DMTC1 2 ROUND Wfmt 4 4 MFC1 DMFC1 2 CEIL Wfmt 4 4 CTC1 3 FLOOR Wfmt 4 4 CFC1 2 CVT S fmt 4 CMP CVT D fmt 2 a FIX 4 CVT Wfmt 4 4 FLOAT C fmt cond 3 Note These operations are illegal Table 6 14 Floating Point Operation Latencies Floating Point Unit Chapter 6 Instruction Scheduling Constraints The FPU resource scheduler only issues instructions to the FPU op units adder and multiplier when no hardware use conflicts will occur In addition some overlap possibilities are disallowed to keep the scheduler simple and or increase performance FPU Multiplier Constraints The FPU multiplier is partially pipelined in the R4600 allowing a new multiply to begin every 6 cycles It is more fully pipelined in the R4700 allowing a new multiply to begin every 4 cycles FPU Adder Constraints The FPU scheduler may issue an add operation ADD fmt or SUB fmt 2 cycles after a multiply MUL fmt Resource Scheduling Rules The FPU Resource Scheduler issues instructions while adhering to the rules described below These scheduling rules optimize op unit executions if the rules are not followed the hardware interlocks to guarantee correct operation DIV S D can start only when all of the following conditions are met in the 1A phase e The adder is idle d
62. BCzTL offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of coprocessor z s condition signal as sampled during the previous instruction is true the target address is branched to with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Because the internal condition signal is sampled during the previous instruction there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal Operation T 1 condition COC z T target lt offset gt offset 0 T 1 if condition then PC PC target else NullifyCurrentinstruction endif NOTE See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Exceptions Coprocessor unusable exception Opcode Bit Encoding BCZTL Bit 31 30 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 BcoTL 9 1 0 0 0 00 1 0 0 jojojo oj1 1 Bit 3130 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 Bawl a 1 o o o 1 fo 1 olofofofolo 1 Bit 3130 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 ec 9 1 0 0 1 0 o 1 0 0 jojofo j oj1 1 a v vyv yY Opcode BC sub opcode Branch condition Coprocessor Unit Number
63. Big In User mode this endianness may be reversed by setting SRos Thus BigEndianCPU may be computed as BigEndianMem XOR ReverseEndian LLbit Bit of state to specify synchronization instructions Set by LL cleared by ERET and Invalidate and read by SC T i Indicates the time steps between operations Each of the statements within a time step are defined to be executed in sequential order as modified by conditional and loop constructs Operations which are marked 7 are executed at instruction cy cle relative to the start of execution of the instruction Thus an instruction which starts at time j executes operations marked T at time i j The interpretation of the order of execution between two instructions or two operations which execute at the same time should be pessimistic the order is not defined Table A 1 CPU Instruction Operation Notations CPU Instruction Set Details Appendix A Instruction Notation Examples The following examples illustrate the application of some of the instruction notation conventions Example 1 GPR rt lt _ immediate 016 Sixteen zero bits are concatenated with an immediate value typically 16 bits and the 32 bit string with the lower 16 buts set to zero is assigned to General Purpose Register rt Example 2 immediate ek immediatej5 6 Bit 15 the sign bit of an immediate value is extended for 16 bit positions and the result is conc
64. Chapter 12 SCycle SClock SysAD Bus SysCmd Bus ValidOut Validin ExtRast Release Figure 12 21 shows a timing diagram of an external read request which consists of the following steps 1 An external agent asserts ExtRqst to arbitrate for the system interface 2 The processor releases the system interface to slave state by asserting Release for one cycle and then deasserting Release 3 After Release is deasserted the SysAD and SysCmd buses are set to a tri state for one cycle 4 The external agent drives a read request command on the SysCmd bus and a read request address on the SySAD bus and asserts ValidIn for one cycle 5 After the address and command are sent the external agent releases the SysCmd and SysAD buses by setting them to tri state and allowing the processor to drive them The processor having accessed the data that is the target of the read returns this data to the external agent The processor accomplishes this by driving a data identifier on the SysCmd bus the response data on the SysAD bus and asserting ValidOut for one cycle The data identifier indicates that this is last data cycle response data 6 The system interface is in master state The processor continues driving the SysCmd and SysAD buses after the read response is returned Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively External read requests are only allowed to
65. ColdReset Functional Overview The R4600 R4700 processor has the following three types of resets Refer to Figure 9 1 on page 9 4 Figure 9 2 on page 9 5 and Figure 9 3 on page 9 6 for timing diagrams of these resets e Power on reset Starts when the power supply is turned on and completely reinitializes the internal state machine of the processor without saving any state information e Cold reset Restarts all clocks but the power supply remains stable A cold reset completely reinitializes the internal state machine of the processor without saving any state information Warm reset Restarts processor but does not affect clocks A warm reset preserves the processor internal state These resets use the VCCOk ColdReset and Reset input signals which are summarized in the next subsection Descriptions of each type of reset operation is described The Initialization interface is a serial interface that operates at the frequency of the MasterClock divided by 256 i e MasterClock 256 This low frequency operation allows the initialization information to be stored in a low cost EPROM or PLD Reset and Initialization Signal Descriptions This section describes the three reset signals VCCOk ColdReset and Reset and the two initialization signals Modeln and ModeClock VCCOk When asserted VCCOk indicates to the processor that the 5 0 3 3 volt power supply Vcc has been above 4 75 3 0 volts for more than 100 milliseconds
66. ECC register DE Specifies that cache parity errors cannot cause exceptions O gt parity remains enabled 1 gt disables parity 0 Reserved Must be written as zeroes and returns zeroes when read IM Interrupt Mask controls the enabling of each of the external internal and software inter rupts An interrupt is taken if interrupts are enabled and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register IM 7 2 correspond to interrupts Int 5 0 and IM 1 0 to the software interrupts 0 gt disabled l gt enabled KX KX controls whether the TLB Refill Vector or the XTLB Refill Vector address is used for TLB misses on kernel addresses O gt TLB Refill Vector 1 XTLB Refill Vector SX Enables 64 bit virtual addressing and operations in Supervisor mode The extended address ing TLB refill exception is used for TLB misses on supervisor addresses 0 gt 32 bit 1 gt 64 bit UX Enables 64 bit virtual addressing and operations in User mode The extended addressing TLB refill exception is used for TLB misses on user addresses 0 gt 32 bit 1 gt 64 bit KSU Mode bits 10 User Ol Supervisor 00 Kernel ERL Error Level O0 normal 1 gt error EXL Exception Level O gt normal 1 exception Note When going from 0 to 1 IE should be disabled 0 first This would be done when pre paring to return from the exception handler such as be
67. Exception Handling and Servicing Flowcharts 5 32 Floating Point Unit Chapter 6 Overview 6 1 The R4600 R4700 Floating Point Coprocessor 6 1 FPU Features 6 2 FPU Programming Model 6 2 Floating Point General Registers FGRs 6 2 Floating Point Registers 6 3 Floating Point Control Registers 6 3 Implementation and Revision Register FCRO 6 4 Control Status Register FCR31 6 4 Accessing the Control Status Register 6 6 IEEE Standard 754 6 6 Control Status Register FS Bit 6 6 Control Status Register Condition Bit 6 6 Control Status Register Cause Flag and Enable Fields 6 6 Cause Bits 6 6 Enable Bits 6 6 Flag Bits 6 7 Control Status Register Rounding Mode Control Bits 6 7 Floating Point Formats 6 7 Binary Fixed Point Format 6 9 Floating Point Instruction Set Overview 6 10 Floating Point Load Store and Move Instructions 6 11 Transfers Between FPU and Memory 6 11 Transfers Between FPU and CPU 6 11 Load Delay and Hardware Interlocks 6 12 Data Alignment 6 12 Endianness 6 12 Floating Point Conversion Instructions 6 12 Floating Point Computational Instructions 6 12 Branch on FPU Condition Instructions 6 12 Floating Point Compare Operations 6 12 FPU Instruction Pipeline Overview 6 13 Instruction Execution 6 13 Instruction Execution Cycle Time 6 14 Instruction Scheduling Constraints 6 15 FPU Multiplier Constraints 6 15 FPU Adder Constraints 6 15 Resource Scheduling Rules 6 15 vi Table of Contents Table of Contents Floating Point Ex
68. Floating Point Subtract SUB fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt ft fs fd SUB 010001 000001 5 5 5 5 6 Format SUB fmt fd fs ft Description The contents of the floating point registers specified by fs and ft are interpreted in the specified format and arithmetically subtracted The result is rounded as if calculated to infinite precision and then rounded to the specified format according to the current rounding mode The result is placed in the floating point register specified by fd This instruction is valid only for single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt ValueFPR fs fmt ValueFPR ft fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details Appendix B Swot seer swe 31 26 25 21 20 16 15 0 SWC1 base ft offset 111001 6 5 5 16 Format SWC 1 ft offset base Description
69. Load Doubleword Il LDL Load Doubleword Left Ill LDR Load Doubleword Right Ill LL Load Linked II LLD Load Linked Doubleword Ill LWU Load Word Unsigned Ill SC Store Conditional II SCD Store Conditional Doubleword Ill SD Store Doubleword Il SDL Store Doubleword Left Ill SDR Store Doubleword Right Ill SYNC Sync II Table 1 9 MIPS 2 MIPS 3 Additional Load and Store Instructions OpCode Description MIPS ISA Level DADDI Doubleword Add Immediate Ill DADDIU Doubleword Add Immediate Ill Unsigned Table 1 10 MIPS 2 MIPS 3 Additional Arithmetic Instructions ALU Immediate OpCode Description MIPS ISA Level DMULT Doubleword Multiply Ill DMULTU Doubleword Multiply Unsigned MI DDIV Doubleword Divide Ill DDIVU Doubleword Divide Unsigned Ill Table 1 11 MIPS 2 MIPS 3 Additional Multiply and Divide Instructions Overview Chapter 1 MIPS ISA Level OpCode_ Description BEQL Branch on Equal Likely II BNEL Branch on Not Equal Likely II BLEZL Branch on Less Than or Equal to Zero II Likely BGTZL Branch on Greater Than Zero Likely II BLTZL Branch on Less Than Zero Likely II BGEZL Branch on Greater Than or Equal to Zero II Likely BLTZALL Branch on Less Than Zero And Link II Likely BGEZALL Branch on Greater Than or Equal to Zero II And Link Likely BCzTL Branch on Coprocessor z True Likely II BCzFL Branch on Coprocessor z False Likel
70. NEG Negate 8 ROUND L Convert to single fixed point rounded to nearest even 9 TRUNC L Convert to single fixed point rounded toward zero 10 CEIL L Convert to single fixed point rounded to 11 FLOOR L Convert to single fixed point rounded to gt 12 ROUND W Convert to single fixed point rounded to nearest even 13 TRUNC W Convert to single fixed point rounded toward zero 14 CEIL W Convert to single fixed point rounded to oo 15 FLOOR W Convert to single fixed point rounded to 16 31 Reserved 32 CVT S Convert to single floating point 33 CVT D Convert to double floating point 34 Reserved 35 Reserved 36 CVT W Convert to 32 bit binary fixed point 37 CVT L Convert to 64 bit binary fixed point 38 47 Reserved 48 63 Cc Floating point compare Table B 5 Floating Point Instructions and Operations In the following pages the notation FGR refers to the 32 General Purpose registers FGRO through FGR31 of the FPU and FPR refers to the floating point registers of the FPU e When the FR bit in the Status register SR 26 equals zero only the even floating point registers are valid and the 32 General Purpose reg isters of the FPU are 32 bits wide e When the FR bit in the Status register SR 26 equals one both odd and even floating point registers may be used and the 32 General Pur pose registers of the FPU are 64 bits wide The following routines are used in the description of the floating point operations to ret
71. Operation T if rd 0 then data lt CPR z d4 1 0131 0 else data lt CPR Z rd4_ 4 0l63 32 endif T 1 GPR rt datag data Exceptions Coprocessor unusable exception Reserved instruction exception coprocessor 3 Opcode Bit Encoding MFCz Bt 31 30 29 28 27 26 25 24 23 2221 0 Meco 9 1 0 o o o of o oj of o Bit 31 30 29 28 27 26 25 24 23 22 21 0 mro 9 1 0 0 0 1 0 0 o0 0 0 Bit 31 30 29 28 27 26 25 24 23 22 21 0 Mece 0 1 o o 1 o o0 o0 o0 o0 o Ve Goce Opcode Coprocessor Suboperation Coprocessor Unit Number CPU Instruction Set Details Appendix A MEH Move From HI MEHI 31 26 25 16 15 1110 6 5 0 SPECIAL 0 rd 0 MFHI 000000 00 0000 0000 00000 010000 6 10 5 5 6 Format MFHI rd Description The contents of special register HI are loaded into general register rd To ensure proper operation in the event of interruptions the two instructions which follow a MFHI instruction may not be any of the instructions which modify the HI register MULT MULTU DIV DIVU MTHI DMULT DMULTU DDIV DDIVU Operation T GPR rd lt HI Exceptions None CPU Instruction Set Details MFLO Move From Lo Appendix A MFLO The contents of special register LO are loaded into general register rd
72. Pipeline Load Delay The CPU Pipeline Chapter 3 Interlock and Exception Handling Smooth pipeline flow is interrupted when cache misses or exceptions occur or when data dependencies are detected Interruptions handled using hardware such as cache misses are referred to as interlocks while those that are handled using software are called exceptions There are two types of interlocks e stalls which are resolved by halting the pipeline e slips which require the back end of the pipeline to advance while the front end of the pipeline is held static At each cycle exception and interlock conditions are checked for all active instructions Because each exception or interlock condition corresponds to a particular pipeline stage a condition can be traced back to the particular instruction in the exception interlock stage as shown in Figure 3 5 For instance a Reserved Instruction RI exception is raised in the execution A stage Pipeline Stage State l R A D Ww Stall ITM ICM DCM CPE l R A D Ww Slip LDI MDSt FCBsy l R A D Ww Exceptions ITLB IBE RI DBE IPErr CUn NMI BP Reset SC DPErr DTLB OVF TLBMod Trap Intr Figure 3 5 Correspondence of Pipeline Stage to Interlock Condition For a description of the pipeline interlocks and exceptions listed in Figure 3 5 refer to Table 3 1 and Table 3 2 which follow The CPU Pipeline Chapter 3
73. Point Computational Instructions Computational instructions perform arithmetic operations on floating point values in registers Table 6 11 on page 11 lists the computational instructions and Appendix B provides a detailed description of each instruction There are two categories of computational instructions e 3 Operand Register Type instructions which perform floating point addition subtraction multiplication division and square root e 2 Operand Register Type instructions which perform floating point absolute value move and negate Branch on FPU Condition Instructions Table 6 12 on page 11 lists the Branch on FPU coprocessor unit 1 condition instructions that can test the result of the FPU compare C cond instructions Appendix B gives a detailed description of each instruction Floating Point Compare Operations The floating point compare C fmt cond instructions interpret the contents of two FPU registers fs ft in the specified format fmt and arithmetically compare them A result is determined based on the comparison and conditions cond specified in the instruction Table 6 12 on page 11 lists the compare instructions Appendix B gives a detailed description of each instruction Table 6 13 on page 13 lists the mnemonics for the compare instruction conditions Floating Point Unit Chapter 6 Mnemonic Definition Mnemonic Definition F False T True UN U
74. Point Unit Chapter 6 FPU Features This section briefly describes the operating model the load store instruction set and the coprocessor interface in the FPU A more detailed description is given in the sections that follow e Full 64 bit Operation When the FR bit in the CPU Status register equals O the FPU is configured for sixteen 64 bit registers for double precision values or thirty two 32 bit registers for single precision val ues When the FR bit in the CPU Status register equals 1 the FPU is configured for thirty two 64 bit registers Each register can hold sin gle or double precision values The FPU also includes a 32 bit Con trol Status register that provides access to all IEEE Standard exception handling capabilities e Load and Store Instruction Set Like the CPU the FPU uses a load and store oriented instruction set with single cycle load and store operations Overlap of multiply and add is supported e Tightly Coupled Coprocessor Interface The FPU resides on chip to form a tightly coupled unit with a seamless integration of floating point and fixed point instruction sets FPU Programming Model This section describes the set of FPU registers and their data organization The FPU registers include Floating Point General Purpose registers FGRs and two control registers Control Status and Implementation Revision Floating Point General Registers FGRs The FPU has a set of Floating Point General Purpose reg
75. Store Word Right I Note 1For Tables 1 1 through 1 17 this column refers to the level in which the instruction first appeared Table 1 1 CPU Instruction Set Load and Store Instructions OpCode Description MIPS ISA Level ADDI Add Immediate I ADDIU Add Immediate Unsigned I SLTI Set on Less Than Immediate I SLTIU Set on Less Than Immediate I Unsigned ANDI AND Immediate I ORI OR Immediate I XORI Exclusive OR Immediate I LUI Load Upper Immediate I Table 1 2 CPU Instruction Set Arithmetic Instructions ALU Immediate Overview Chapter 1 OpCode Description MIPS ISA Level ADD Add I ADDU Add Unsigned S T e SUB Subtract I SUBU Subtract Unsigned I SLT Set on Less Than I SLTU Set on Less Than Unsigned I AND AND I OR IRo fp XOR Exclusive OR I NOR NOR I Table 1 3 CPU Instruction Set Arithmetic 3 Operand R Type OpCode Description MIPS ISA Level MULT Multiply I MULTU Multiply Unsigned o D DIV Divide I DIVU Divide Unsigned I MFHI Move From HI I MTHI Move To HI I MFLO Move From LO I MTLO MoveToLlO o Do I Table 1 4 CPU Instruction Set Multiply and Divide Instructions MIPS ISA Level OpCode Description J Jump I JAL Jump And Link I Table 1 5 CPU Instruction Set J ump and Branch Instruction Overview Chapter 1 Op
76. T of the instruction immediately following this load instruction The FR bit in the Status register specifies whether all 32 registers of the R4600 are addressable When FR equals zero this instruction is not defined when the least significant bit of fs is non zero When FRis set fs may specify either odd or even registers Operation T if SRog 1 then data CPR 1 fs else data CPR 1 S44 0 endif T 1 GPRirt data Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B Doubleword Move To DMTC1 Floating Point Coprocessor DMTC1 31 26 25 21 20 16 15 1110 0 COP1 DMT rt fs 0 010001 00101 000 0000 0000 6 5 5 5 11 Format DMTC1 rt fs Description The contents of general register rt are loaded into coprocessor register fs of the CP1 The contents of floating point register fs are undefined for time T of the instruction immediately following this load instruction The FR bit in the Status register specifies whether all 32 registers of the R4600 are addressable When FR equals zero this instruction is not defined when the least significant bit of fs is non zero When FR equals one fs may specify either odd or even registers Operation T data GPR ri T 1 if SRog 1 then CPR 1 fs lt data else CPR 1 fs4 4 0 data endif Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B
77. TLB Low power operation 3 3V or 5V power supply options For R4600 25mW MuHz internal power dissipation 2 5W 100MHz 3 3V For R4700 24mW MuHz internal power dissipation 2 4W 100MHz 3 3V Standby mode reduces internal power to 400mW Fully software compatible with R4000 Processor Family Standard operating system support includes Microsoft Windows NT UNISOFT Unix System V 4 JMI C executive VX Works Available in 179 pin PGA or 208 pin MQUAD Input and output clock frequency Input clock at one half pipeline frequency Output clock is a programmable divisor of the pipeline frequency Selectable bus frequency Ratios of 1 2 1 8 of pipeline rate 64GB physical address space Processor family for a wide variety of applications Desktop workstations and PCs Deskside or departmental servers Routers High performance embedded applications Notebooks Large number of development tools including Cross compilers Logic models Logic analyzer support Overview Chapter 1 Device Overview The R4600 R4700 family brings a high level of integration designed for high performance and high bandwidth computing The key elements of the R4600 R4700 are briefly described below An overview of these blocks is found here with more detailed information on each block presented in subsequent chapters Figure 1 1 shows a block level representation of the functional units within the R4600 R
78. TLB modification exception Bus error exception Address error exception A 129 SWR Register A B C D E F G H Memory J K L M N O P BigEndianCPU 0 BigEndianCPU 1 offset offset vAddr 9 destination type LEM BEM destination tyPe LEM BEM 0 I J KLEF GH 3 0 4 HJ KLMNOPY O 7 0 1 I J KLF GHP 2 1 4 GHKLMNOP 1 6 0 2 I J KLGHOP 1 2 4 F GHLMNOP 2 5 0 3 I J KLHNOP 0 3 4 EF GHMNOP 3 4 0 4 EF GHMNOP 3 4 0 I JKLHNOPJO 3 4 5 F GHLMNOP 2 5 Oj I J KLGHOP 1 2 4 6 GHKLMNOP 1 6 Oj I J KLFGHPY 2 1 4 7 HJKLMNOP 0 7 Oj JKLEFGHY 3 0 4 LEM Little endian memory BigEndianMem 0 BEM BigEndianMem 1 Type AccessType see Table 2 1 on page 2 3 sent to memory Offset pAddro 9 sent to memory Exceptions CPU Instruction Set Details Appendix A SYNC Synchronize SYNC 31 26 25 6 5 0 SPECIAL 0 SYNC 000000 0000 0000 0000 0000 0000 001111 6 20 6 Format SYNC Description The SYNC instruction ensures that any loads and stores fetched prior to the present instruction are completed before any loads or stores after this instruction are allowed to start Use of the SYNC instruction to serialize certain memory references may be required in a multiprocessor environment for proper synchronization For example Processor A Processor B SW R1 DATA 1 LW R2 FLAG LI R2 1 BEQ R2 RO 1B SYNC NOP SW R2 FLAG SYNC LW R1 DATA The SYNC in processor A prevents DATA being written after FLAG
79. Table 3 1 and Table 3 2 describe the pipeline interlocks and exceptions listed in Figure 3 5 Exception Description ITLB Instruction Translation or Address Exception Intr External Interrupt IBE Instruction Bus Error RI Reserved Instruction BP Breakpoint SC System Call CUn Coprocessor Unusable IPErr Instruction Parity Error OVF Integer Overflow FPE FP Interrupt ExTrap EX Stage Traps DTLB Data Translation or Address Exception TLBMod TLB Modified DBE Data Bus Error DPErr Data Parity Error NMI Non maskable Interrupt or Soft Reset Reset Reset Table 3 1 Pipeline Exceptions Interlock Description ITM Instruction TLB Miss ICM Instruction Cache Miss CPE Coprocessor Possible Exception DCM Data Cache Miss LDI Load Interlock MDSt Multiply Divide Start FCBsy FP Coprocessor Busy Table 3 2 Pipeline Interlocks Exception Conditions When an exception condition occurs the relevant instruction and all those that follow it in the pipeline are cancelled Accordingly any stall conditions and any later exception conditions that may have referenced this instruction are inhibited there is no benefit in servicing stalls for a cancelled instruction The CPU Pipeline Chapter 3 When an exceptional condition is detected for an instruction the R4600 R4700 will kill it and all following instructions When this instruction reaches the W stage the exce
80. The 16 bit offset is sign extended and added to the contents of general register base to form an unsigned effective address The contents of register ft from the floating point coprocessor are stored at the memory location specified by the effective address The FR bit of the Status register specifies whether all 64 bit floating point registers are addressable If FR 0 SWC1 stores either the high or low half of the 16 even floating point registers If FR 1 SWC1 stores the low 32 bits of both even and odd floating point registers If either of the two least significant bits of the effective address are non zero an address error exception occurs Operation T vAddr lt offsetys offsety5 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 pAddro o xor ReverseEndian 0 byte lt vAddro o xor BigEndianCPU 07 if SRog 1 then data CPR 1 ftle3 s byte o O PY else if ftg 0 then data lt CPR 1 ft4 Il Ole3 s byte o 0 PY else data 0 Py e CPR 1 ft 0 63 32 8 byte endif StoreMemory uncached WORD data pAddr vAddr DATA Exceptions Coprocessor unusable TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception FPU Instruction Set Details Appendix B TRUNC L fmt jer TRUNC L fmt Truncate to Long Fixed Point Format
81. The signals are provided for compatibility with existing R4xOOPC designs Table 8 4 lists the JTAG interface signals Name Definition Direction Description JTDI JTAG data in Input Connected directly to JTDO No JTAG imple mented Should be pulled High JTCK TAG clock input Input Unused input Should be pulled High JTDO JTAG data out Output Connected directly to JTDI If no external scan used this is a no connect JTMS JTAG command Input Unused input Should be pulled High Table 8 4 J TAG Interface Signals Processor Signal Descriptions Chapter 8 Initialization Interface Signals The Initialization interface signals make up the interface by which an external agent initializes the processor operating parameters Table 8 5 lists the Initialization interface signals Name Definition Direction Description ColdReset Cold reset Input This signal must be asserted for a power on reset or a cold reset The clocks SClock TClock and RClock begin to cycle and are syn chronized with the deasserted edge of ColdReset ColdReset must be deasserted synchronously with MasterClock ModeClock Boot mode clock Output Serial boot mode data clock output runs at the Master Clock frequency divided by 256 MasterClock 256 Modeln Boot mode data in Input Serial boot mode data input Reset Reset Input This signal must be asserted for any reset sequence It can be asserted synchrono
82. To ensure proper operation in the event of interruptions the two instructions which follow a MFLO instruction may not be any of the instructions which modify the LO register MULT MULTU DIV DIVU MTLO DMULT DMULTU DDIV DDIVU Operation 31 26 25 16 15 1110 6 SPECIAL 0 rd 0 MFLO 000000 00 0000 0000 00000 010010 6 10 5 5 6 Format MFLO rd Description GPR rd LO Exceptions None CPU Instruction Set Details Appendix A Move To MTCO System Control Coprocessor MTCO 31 26 25 21 20 16 15 11 10 0 COPO MT rt rd 0 010000 00100 000 0000 0000 6 5 5 5 11 Format MTCO rt rd Description The contents of general register rt are loaded into coprocessor register rd of CPO Because the state of the virtual address translation system may be altered by this instruction the operation of load instructions store instructions and TLB operations immediately prior to and after this instruction are undefined Operation T data GPR ri T 1 CPR O rd data Exceptions Coprocessor unusable exception CPU Instruction Set Details Move To Coprocessor Appendix A MTCz MTCz 26 25 21 20 16 15 11 10 COPz MT rt rd 0 0100xx 00100 000 0000 0000 6 5 5 5 11 Format MTCz rt rd Description The contents of general register rt are loaded into coprocessor register rd of coprocessor z Execution of the instruction referencing
83. Virtual Coherency Data Access or Virtual Coherency Instruction Fetch The processor does not write to the BadVAddr register when the EXL bit in the Status register is set to a 1 Figure 5 2 shows the format of the BadVAddr register BadV Addr Register 63 0 Bad Virtual Address 64 Figure 5 2 BadVAddr Register Format Note The BadVAddr register does not save any information for bus errors since bus errors are not addressing errors Count Register 9 The Count register acts as a timer incrementing at a constant rate half the maximum instruction issue rate whether or not an instruction is executed retired or any forward progress is made through the pipeline This register can be read or written It can be written for diagnostic purposes or system initialization for example to synchronize processors Figure 5 3 shows the format of the Count register Count Register 31 0 Count Compare Register 11 The Compare register acts as a timer see also the Count register it maintains a stable value that does not change on its own When the value of the Count register equals the value of the Compare register interrupt bit IP 7 in the Cause register is set This causes an interrupt as soon as the interrupt is enabled Writing a value to the Compare register as a side effect clears the timer interrupt For diagnostic purposes the Compare register is a read write register In normal use however the Comp
84. arbitrating to issue an external request This transition to slave state allows the external agent to quickly return read response data 12 5 System Interface Chapter 12 Processor and External Requests There are two broad categories of requests processor requests and external requests These two categories are described in this section When a system event occurs the processor issues either a single request or a series of requests called processor requests through the system interface to access an external resource and service the event For this to work the processor system interface must be connected to an external agent that is compatible with the system interface protocol and can coordinate access to system resources An external agent requesting access to a processor status register generates an external request This access request passes through the system interface System events and request cycles are shown in Figure 12 5 R4600 R4700 External Agent Processor Requests e Read e Write External Requests e Read e Write e Null System Events e Load Miss Store Miss Store Hit on write through Uncached Load Store CACHE operations Figure 12 5 Requests and System Events Rules for Processor Requests The following rules apply to processor requests e After issuing a processor read request the processor cannot issue a subsequent read request until it has received a read response e After the
85. are reset by this exception Like Reset it can be used on the processor in any state the caches TLB and normal exception vectors need not be properly initialized Soft Reset preserves the state of the caches and memory system while resetting the bus state and cache state machine When this exception occurs the contents of all registers are preserved except for e ErrorEPC register which contains the restart PC e ERL bit of the Status register which is set to 1 e SR bit of the Status register which is set to 1 e BEV bit of the Status register which is set to 1 Because the Soft Reset can abort cache and bus operations cache and memory state is undefined when this exception occurs Soft reset exception processing is shown in Figure 5 14 on page 13 Servicing The Soft Reset exception is serviced by saving the current processor state for diagnostic purposes and reinitializing for the Reset exception CPU Exception Processing Chapter 5 Nonmaskable Interrupt NMI Exception This section explains the Nonmaskable Interrupt exception Cause The Nonmaskable Interrupt NMI exception occurs in response to the falling edge of the NMI pin or an external write to the Int 6 bit of the Interrupt register Unlike all other interrupts this interrupt is not maskable it occurs regardless of the settings of the EXL ERL and the IE bits in the Status register Processing The Reset exception vector is used for this excepti
86. as an inexact result Trap Enabled Results When an underflow trap is enabled underflow is signaled when tininess is detected regardless of loss of accuracy If underflow traps are enabled the result register is not modified and the source registers are preserved Floating Point Exceptions Chapter 7 Trap Disabled Results When an underflow trap is not enabled and FCSR FS is clear then take an unimplemented exception When an underflow trap is not enabled and FCSR FS is set raise Inexact and return either O or 2 as appropriate for the current rounding mode Unimplemented Instruction Exception E Any attempt to execute an instruction with an operation code or format code that has been reserved for future definition sets the Unimplemented bit in the Cause field in the FPU Control Status register and traps The operand and destination registers remain undisturbed and the instruction is emulated in software Any of the IEEE Standard 754 exceptions can arise from the emulated operation and these exceptions in turn are simulated The Unimplemented Instruction exception can also be signaled when unusual operands or result conditions are detected that the implemented hardware cannot handle properly These include e Denormalized operand e Quiet NaN operand e Underflow e Reserved opcodes e Unimplemented formats e Conversion of a floating point number to a fixed point format when an overflow occurs or the source operan
87. attempt is made to execute one of the following e load or store a doubleword that is not aligned on a doubleword boundary except for use of special instruction e load fetch or store a word that is not aligned on a word boundary except for use of special instruction e load or store a halfword that is not aligned on a halfword boundary e reference the kernel address space from User or Supervisor mode e reference the supervisor address space from User mode This exception is not maskable Processing The common exception vector is used for this exception The AdEL or AdES code in the Cause register is set indicating whether the instruction shown by the EPC register and BD bit in the Cause register caused the exception with an instruction reference load operation or store operation When this exception occurs the BadVAddr register retains the virtual address that was not properly aligned or referenced protected address space The contents of the VPN field of the Context and EntryHi registers are undefined as are the contents of the EntryLo register The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot If it is in a branch delay slot the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set as indication Address Error exception processing is shown in Figure 5 15 on page 13 Servicing Typically
88. base field in the register as needed Normally the operating system uses the XContext register to address the current page map which resides in the kernel mapped segment kseg3 Figure 5 8 shows the format of the XContext register Table 5 6 which follows the figure describes the XContext register fields XContext Register 33 32 3130 PTEBase R BadVPN2 31 2 27 Figure 5 8 XContext Register Format The 27 bit BadVPN2 field has bits 39 13 of the virtual address that caused the TLB miss bit 12 is excluded because a single TLB entry maps to an even odd page pair For a 4 Kbyte page size this format may be used directly to address the pair table of 8 byte PTEs For other page and PTE sizes shifting and masking this value produces the appropriate address Field Description BadVPN2 The Bad Virtual Page Number 2 field is written by hardware on a miss It contains the VPN of the most recent invalidly translated vir tual address R The Region field contains bits 63 62 of the virtual address 00 user 01 supervisor 11 kernel PTEBase The Page Table Entry Base read write field is normally written with a value that allows the operating system to use the Context register as a pointer into the current PTE array in memory Table 5 6 XContext Register Fields Error Checking and Correcting ECC Register 26 The 8 bit Error Checking and Correcting ECC register reads or writes prima
89. be changed address 8 address 0 memory big endian register 8 9 10 11 12 13 14 15 lol 1 2 3 4 5 6 71 before A B C D E F GH 24 LDL 24 3 0 after 1314 5 6 IZ F IG H 24 The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDL or LDR instruction which also specifies register rt No address exceptions due to alignment are possible CPU Instruction Set Details Appendix A Operation T vAddr lt offset s offsety5 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrpgjze_1 3 0Addr2_ o xor ReverseEndian if BigEndianMem 0 then pAddr pAddrpgize 1 3 0 endif byte vAddro o xor BigEndianCPU mem LoadMemory uncached byte pAddr vAddr DATA GPR rt mem7 pyte 0 GPR rt l55 8 byte 0 Given a doubleword in a register and a doubleword in memory the operation of LDL is as follows LDL Register A B C D E F G H Memory J K L M N O P BigEndianCPU 0 BigEndianCPU 1 vAddro 9 destination type offset destination type ofset LEM BEM LEM BEM 0 PBCDEFGH 0 0 7 J KLMNOP 7 0 0 1 OPCDEFGH 1 0 6 JKLMNOP H 6 O 1
90. bit bus for command and data identifier mand data identi Output transmission between the processor and an fier external agent SysCmdP System com Input A single even parity bit for the SysCmd bus mand data identi Output fier bus parity ValidIn Valid input Input The external agent asserts ValidiIn when it is driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus ValidOut Valid output Output The processor asserts ValidOut when it is driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus WrRdy Write read Input An external agent asserts WrRdy when it y y P g y can accept a processor write request Table 8 1 System Interface Signals Processor Signal Descriptions Clock Control Interface Signals The Clock Control interface signals make up the interface for clocking and maintenance Table 8 2 lists the Clock Control interface signals Chapter 8 Name Definition Direction Description IOOut I O output Output Reserved for future output Always High IOIn I O input Input Reserved for future input Should be driven High MasterClock Master clock Input Master clock input that estab lishes the processor operating frequency It is 1 2 the pipeline frequency MasterOut Master clock out Output Master clock output aligned with MasterClock R
91. bit offsetis sign extended and added to the contents of general register base to form a virtual address The least significant byte of register rt is stored at the effective address Operation T vAddr offset s offsety5 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 DAddre 9 xor ReverseEndian byte lt vAddrs 9 xor BigEndianCPU data lt GPRI r e3 s byte o 0 StoreMemory uncached BYTE data pAddr vAddr DATA Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 102 CPU Instruction Set Details Appendix A SC Store Conditional SC 31 26 25 21 20 16 15 0 SC base rt offset 111000 5 5 16 Format SC rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The contents of general register rt are conditionally stored at the memory location specified by the effective address This instruction implicitly performs a SYNC operation loads and stores to shared memory fetched prior to the SC must access memory before the SC loads and stores to shared memory fetched subsequent to the SC must access memory after the SC If any other processor or device has modified the physical address since the time of the previous Load Linked instruction or
92. block read request followed by processor block write request NCW Processor noncoherent write request Table 12 2 Store Miss to Primary Cache No Secondary Cache Mode Store Miss If the coherency attribute is write back or write through with write allo cate the processor issues a read request for the cache line that contains the data element to be loaded then awaits the external agent to provide read data in response to the read request Then if the current cache line must be written back the processor issues a write request for the current cache line For a write through no write allocate store miss the processor issues a write request only 12 12 System Interface Chapter 12 In no secondary cache mode if the new cache line replaces a current cache line whose Write back W bit is set the current cache line moves to an internal write buffer before the new cache line is loaded in the primary cache Store Hit This section describes store hits in no secondary cache mode for both write back and write through lines No Secondary Cache Mode Store Hit In no secondary cache mode the action on the system interface will be determined by whether the line is write back or write through All lines that use a write back policy are set to the dirty exclusive cache state and there is no bus transactions generated For lines with a write through policy the store will also generate a processor write request for the st
93. caches lie between the CPU and main memory They are designed to make the speedup of memory accesses transparent to the user Each functional block in Figure 11 1 has the capacity to hold more data than the block above it For instance physical main memory has a larger capacity than the primary cache At the same time each functional block takes longer to access than any block above it For instance it takes longer to access data in main memory than in the CPU on chip registers R4600 R4700 Registers Registers Registers l cache D cache Primary Cache y Optional Faster Access Increasing Data Secondary Time Capacity Cache A Main Memory Disk CD ROM Tape etc Peripherals Figure 11 1 Logical Hierarchy of Memory The R4600 R4700 processor has two on chip primary caches one holds instructions the instruction cache the other holds data the data cache 11 1 Cache Organization Operation and Coherency Chapter 11 Overview of Cache Operations As described earlier caches provide fast temporary data storage and they make the speedup of memory accesses transparent to the user In general the processor accesses cache resident instructions or data through the following procedure 1 The processor through the on chip cache controller attempts to access the next instruction or data in the primary cache 2 The cache controll
94. cleared to 0 and ERL and BEV are set to 1 All other bits are undefined The Random register is initialized to the value of its upper bound e The Wired register is initialized to 0 e Some of the Config Register bits are initialized from the boot time mode stream Reset exception processing is shown in Figure 5 12 on page 12 Servicing The Reset exception is serviced by e initializing all processor registers coprocessor registers caches and the memory system e performing diagnostic tests e bootstrapping the operating system L In the following sections and throughout this manual a signal followed by an asterisk such as Reset is low active 5 15 CPU Exception Processing Chapter 5 Soft Reset Exception This section explains the Soft Reset exception Cause The Soft Reset exception occurs in response to the Reset input signal and execution begins at the Reset vector when Reset is deasserted This exception is not maskable Processing The Reset exception vector is used for this exception located within unmapped and uncached address space so that the cache and TLB need not be initialized to process this exception When a Soft Reset occurs the SR bit of the Status register is set to distinguish this exception from a Reset exception The primary purpose of the Soft Reset exception is to reinitialize the processor after a fatal error during normal operations Unlike an NMI all cache and bus state machines
95. code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T if GPR rs GPR rt then TrapException endif Exceptions Trap exception A 132 CPU Instruction Set Details TEQI Trap If Equal Immediate Appendix A TEQI TEQI rs immediate Description The 16 bit immediate is sign extended and compared to the contents of general register rs If the contents of general register rs are equal to the sign extended immediate a trap exception occurs 31 26 25 21 20 16 15 0 REGIMM rs TEQI immediate 000001 01100 6 5 5 16 Format Operation T if GPR rs immediate 5 immediate 5 o then TrapException endif Exceptions Trap exception A 133 CPU Instruction Set Details TGE Trap If Greater Than Or Equal Appendix A TGE 31 26 25 21 20 16 15 0 SPECIAL rs rt code TGE 000000 110000 6 5 5 10 6 Format TGE rs rt Description The contents of general register rt are compared to the contents of general register rs Considering both quantities as signed integers if the contents of general register rs are greater than or equal to the contents of general register rt a trap exception occurs The code field is available for use as software parameters but is retrieved by the exception handler
96. coprocessor 3 causes a reserved instruction exception not a coprocessor unusable exception Operat ion T T els data GPRIrt 3 0 if rdo 0 CPR Z rd4_4 0 CPR z rd4 1 Ole3 32 data endif e CPR z rd4 4 0 lt data CPR z rd4 1 Ols1 0 Exceptions Coprocessor unusable exception Reserved instruction exception coprocessor 3 Opcode Bit Encoding MTCz Bit 31 30 29 28 27 26 25 24 23 22 21 0 copo 9 1 0 0 0 0 0 0 1 0 0 Bit 31 30 29 28 27 26 25 24 23 22 21 0 cari 0 1 0 0 0 1 0 0 1 0 0 Bit 31 30 29 28 27 26 25 24 23 22 21 0 Gia 0J 1 0 0 1 0 0 0 1 0 0 NN A Dn D Opcode Coprocessor Unit Number Coprocessor Suboperation CPU Instruction Set Details Appendix A MTHI Move To HI MTHI 31 26 25 21 20 65 0 SPECIAL rs 0 MTHI 000000 000 000000000000 010001 6 5 15 6 Format MTHI rs Description The contents of general register rs are loaded into special register HI If a MTHI operation is executed following a MULT MULTU DIV or DIVU instruction but before any MFLO MFHI MTLO or MTHI instructions the contents of special register LO are undefined Operation T 2 HI lt undefined T 1 HI lt undefined T HI GPR rs Exceptions None CPU Instruction Set Details Appendix A MTLO Move To LO MTLO
97. cycles 2 cycle penalty for taken branches 2 doublewords 2 cycles no penalty for taken branches 1 doubleword Integer multiply integer multiply hardware 1 cycle to issue done in floating point multiplier 4 cycles to issue Integer divide done in integer datapath adder slips until done done in floating point adder 4 cycles to issue Integer multiply HIGH and LOW available at the same time LOW available one cycle before HIGH Integer divide HIGH and LOW available at the same time HIGH available one cycle before LOW HIGH and LOW hazards MFHI MFLO latency yes HIGH and LOW written early in pipeline 1 cycle no HIGH and LOW written after W 2 cycles SLLV SRLV SRAV DSLL DSRL DSRA DSLL32 DSRL32 DSRA32 DSLLV DSRLV DSRAV 2 cycles to issue 2 cycles to issue 1 cycle to issue 1 cycle to issue Table 1 23 Pipeline Comparison Between R4400 PC and R4600 R4700 Overview Chapter 1 Item R4400 PC R4600 R4700 WatchLo WatchHi implemented unimplemented no watch registers Config as described in R4000 User s Guide subset Status as described in R4000 User s Guide no TS or RP Low power standby mode but RP not functional no WAIT instruction disables internal clock freezing pipeline and other state resume on interrupt MFCO MTCO hazard EntryLoO EntryLol only hazardous for certain cpO register combin
98. data that missed is fetched first The rest of the data cache line will be placed into the cache in parallel If the current cache line must be written back the processor issues a write request to save the dirty cache line in memory Store Miss When a processor store misses in the primary cache the processor may request from the external agent the cache line that contains the target location of the store for pages that are either write back or write through with write allocate only The processor examines the coherency attribute in the TLB entry for the page TLB page coherency attributes are listed in Chapter 4 that contains the requested cache line to see if the line is write allocate or no write allocate The processor then executes one of the following requests e If the coherency attribute is noncoherent write back or noncoherent write through with write allocate a noncoherent block read request is issued e If the coherency attribute is noncoherent write through with no write allocate the processor issues a non block write request Table 12 1 shows the actions taken on a store miss to the primary cache Page Attribute State of Data Cache Line Being Replaced Clean Invalid Dirty W 1 Noncoherent write back or NCR NCR W Noncoherent write through with write allocate Noncoherent write through with NCW NA no write allocate NCR Processor noncoherent block read request NCR W Processor noncoherent
99. errors cannot be corrected There are two types of parity e Odd Parity adds 1 to any even number of 1s in the data making the total number of 1s odd including the parity bit e Even Parity adds 1 to any odd number of 1s in the data making the total number of 1s even including the parity bit Odd and even parity are shown in the example below Data 3 0 Odd Parity Bit Even Parity Bit 0010 0 1 The example above shows a single bit in Data 3 0 with a value of 1 this bit is Data 1 e In even parity the parity bit is set to 1 This makes 2 an even num ber the total number of bits with a value of 1 e Odd parity makes the parity bit a O to keep the total number of 1 val ue bits an odd number in the case shown above the single bit Da ta 1 The example below shows odd and even parity bits for various data values Data 3 0 Odd Parity Bit Even Parity Bit 0110 1 O 0000 1 O l1 1 1 1 1 0 1101 0 1 Parity allows single bit error detection but it does not indicate which bit is in error for example suppose an odd parity value of 00011 arrives The last bit is the parity bit and since odd parity demands an odd number 1 3 5 of 1s this data is in error it has an even number of 1s However it is impossible to tell which bit is in error 14 1 R4600 R4700 Error Checking Chapter 14 Error Checking Operation The processor verifies data correctness by using parity as it passes data from to the system inte
100. for all system interface commands Request Type Request Specific Figure 12 28 System Interface Command Syntax Bit Definition SysCmd 8 must be set to O for all system interface commands SysCmd 7 5 specify the system interface request type which may be read write or null Table 12 6 lists the encoding of SysCmd 7 5 Table 12 6 shows the types of requests encoded by the SysCmd 7 5 bits SysCmd 7 5 Command 0 Read Request 1 Reserved 2 Write Request 3 Null Request 4 7 Reserved Table 12 6 Encoding of SysCmd 7 5 for System Interface Commands SysCmd 4 0 are specific to each type of request and are defined in each of the following sections Read Requests Figure 12 29 shows the format of a SysCmd read request Read realise Specific see tables Figure 12 29 Read Request SysCmd Bus Bit Definition 12 33 System Interface Chapter 12 Table 12 7 Table 12 8 and Table 12 9 list the encoding of SysCmd 4 0 for read requests SysCmd 4 3 Read Attributes 0 1 Reserved 2 Noncoherent block read 3 Doubleword partial doubleword word or partial word Table 12 7 Encoding of SysCmd 4 3 for Read Requests SysCmd 2 Link Address Retained Indication 0 Link address not retained 1 Link address retained SysCmd 1 0 Read Block Size 0 Reserved 1 8 words 2 3 Reserved Table 12 8 Encoding of SysCmd 2 0 fo
101. four bytes within the data word need to be returned If the cache is enabled for this access the entire word is returned and loaded into the cache StoreMemory Uses the cache write buffer and main memory to store the word or part of word specified as data in the word con taining the specified physical address The low order two bits of the address and the Access Type field indicates which of each of the four bytes within the data word should be stored Table B 3 Load and Store Common Functions Figure B 1 shows the I Type instruction format used by load and store operations I Type Immediate 31 26 25 21 20 is a 6 bit operation code is the 5 bit base register specifier is a 5 bit source for stores or destination for loads FPA register specifier is the 16 bit signed immediate offset igure B oad and ore Instruction Forma FPU Instruction Set Details Appendix B All coprocessor loads and stores reference aligned word data items Thus for word loads and stores the access type field is always WORD and the low order two bits of the address must always be zero For doubleword loads and stores the access type field is always DOUBLEWORD and the low order three bits of the address must always be zero Regardless of byte numbering order endianness the address specifies that byte which has the smallest byte address in the addressed field For a big endian machine this is the leftmost byte for
102. generated by an XOR of the subblock address either 109 119 or O19 with the binary count of the doubleword 00 through 119 Thus the third doubleword retrieved from a block of data with a starting address of 10 is found by taking the XOR of address 105 with the binary count of DW2 109 The result is 002 or DWO shown in Table 12 16 Cycle StartingBlock Binary Count Double Word Address Retrieved 1 10 00 10 2 10 0l 11 3 10 10 00 4 10 11 0l Table 12 16 Sequence of Doublewords Transferred Using Subblock Ordering Address 103 Cycle StartingBlock Binary Count Double Word Address Retrieved 1 11 00 11 2 11 Ol 10 3 11 10 Ol 4 11 11 00 Table 12 17 Sequence of Doublewords Transferred Using Subblock Ordering Address 115 Cycle StartingBlock Binary Count Double Word Address Retrieved 1 01 00 0l 2 0l Ol 00 3 01 10 11 4 01 11 10 Table 12 18 Sequence of Doublewords Transferred Using Subblock Ordering Address 015 For block write requests the processor always delivers the address of the doubleword at the beginning of the block the processor delivers data beginning with the doubleword at the beginning of the block and progresses sequentially through the doublewords that form the block 12 40 System Interface Chapter 12 During data cycles the valid byte lines depend upon the position of the data with respec
103. in the BadVAddr register but the information is arranged in a form that is more useful for a software TLB exception handler Figure 5 1 shows the format of the Context register Table 5 2 which follows the figure describes the Context register fields Context Register 23 22 PTEBase BadVPN2 19 Field Description BadVPN2 This field is written by hardware on a miss It contains the virtual page number VPN of the most recent virtual address that did not have a valid translation PTEBase This field is a read write field for use by the operating system It is normally written with a value that allows the operating system to use the Context register as a pointer into the current PTE array in memory Table 5 2 Context Register Fields The 19 bit BadVPN2 field contains bits 31 13 of the virtual address that caused the TLB miss bit 12 is excluded because a single TLB entry maps to an even odd page pair For a 4 Kbyte page size this format can directly address the pair table of 8 byte PTEs For other page and PTE sizes shifting and masking this value produces the appropriate address CPU Exception Processing Chapter 5 Bad Virtual Address Register BadVAddr 8 The Bad Virtual Address register BadVAddn is a read only register that displays the most recent virtual address that caused one of the following exceptions Address Error e g unaligned access TLB Invalid TLB Modified TLB Refill
104. integer overflow exception occurs Operation T GPR rt GPR rs immediate 5 immediate s5 o Exceptions Integer overflow exception CPU Instruction Set Details Appendix A DADDIU immediate Unsigned DADDIU 31 26 25 21 20 16 15 0 DADDIU rs rt immediate 011001 6 5 5 16 Format DADDIU rt rs immediate Description The 16 bit immediate is sign extended and added to the contents of general register rs to form the result The result is placed into general register rt No integer overflow exception occurs under any circumstances The only difference between this instruction and the DADDI instruction is that DADDIU never causes an overflow exception Operation T GPR rt GPR rs immediate 5 immediate5 o Exceptions None CPU Instruction Set Details DADDU _Doubleword Add Unsigned Appendix A DADDU 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DADDU 000000 00000 101101 6 5 5 5 5 6 Format DADDU rd rs rt Description The contents of general register rs and the contents of general register rt are added to form the result The result is placed into general register rd No overflow exception occurs under any circumstances The only difference between this instruction and the DADD instruction is that DADDU never causes an overflow exception Operation T GPR rd
105. is in progress e While waiting for the assertion of RdRdy to issue a processor read request the processor can accept an external request if the request is delivered to the processor one or more cycles before RdRdy is assert ed e While waiting for the assertion of WrRdy to issue a processor write request the processor can accept an external request provided the re quest is delivered to the processor one or more cycles before WrRdy is asserted e If waiting for the response to a read request after the processor has made an uncompelled change to a slave state the external agent can issue an external request before providing the read response data External Read Request In contrast to a processor read request data is returned directly in response to an external read request no other requests can be issued until the processor returns the requested data An external read request is complete after the processor returns the requested word of data The data identifier see System Interface Commands and Data Identi fiers on page 12 32 associated with the response data can signal that the returned data is erroneous causing the processor to take a bus error Note The R4600 R4700 does not contain any resources that are readable by an external read request in response to an external read request the processor returns undefined data and a data identifier with its Erroneous Data bit SysCmd 5 set External Write Request When an ex
106. link bit is not set then the store fails to execute The success or failure of the SC is indicated in the target register of the store The link is broken upon completion of an ERET return from exception instruction The most important features of LL and SC are e They provide a mechanism for generating all of the common synchro nization primitives including test and set counters sequencers etc with no additional overhead e When they operate bus traffic is generated only if the state of the cache line changes lock words stay in the cache until some other pro cessor takes ownership of that cache line 11 10 Cache Organization Operation and Coherency Chapter 11 Examples Using LL and SC Figure 11 9 shows how to implement test and set using LL and SC instructions ORI r3 r2 1 Unlocked BEQ 1r3 r2 Loop 0 NOP es Y Try locking SC 13 r1 semaphore BEQ r3 0 Loop NOP Yes Execute critical section Access shared data Unlock semaphore Figure 11 9 Test and Set using LL and 11 11 Cache Organization Operation and Coherency Chapter 11 Figure 11 10 shows synchronization using a counter iE Yes Try decrementing counter Yes Execute critical section Try incrementing counter Yes Continue processing Loop1 LL r2 r1 BLEZ r2 Loop1 NOP SUB 13 r2 1 SC r3 r1 BEQ r3 0 Loop1 NOP Loop2 LL r2 r1 ADDr3 r2 1 SC 13 r1 BEQ r3 0 Loop2 NOP
107. link register r31 If the contents of general register rs have the sign bit cleared then the program branches to the target address with a delay of one instruction General register rs may not be general register 31 because such an instruction is not restartable An attempt to execute this instruction is not trapped however Operation T target lt offset 5 offset 0 condition GPRirs g3 0 GPR 31 PC 8 T 1 if condition then Exceptions None PC PC target endif CPU Instruction Set Details Appendix A BGEZALL Stua zo BGEZALL And Link Likely 31 26 25 21 20 16 15 0 REGIMM rs BGEZALL offset 000001 10011 6 5 5 16 Format BGEZALL rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended Unconditionally the address of the instruction after the delay slot is placed in the link register r31 If the contents of general register rs have the sign bit cleared then the program branches to the target address with a delay of one instruction General register rs may not be general register 31 because such an instruction is not restartable An attempt to execute this instruction is not trapped however If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T target lt
108. must issue a system interface release external null request to return the system interface to master state System Interface Endianness The endianness of the system interface is programmed at boot time through the boot time mode control interface see chapter 9 Initialization Interface and remains fixed until the next time the processor boot time mode bits are read Software cannot change the endianness of the system interface and the external system software can set the reverse endian bit to reverse the interpretation of endianness inside the processor but the endianness of the system interface remains unchanged System Interface Cycle Time The processor specifies minimum and maximum cycle counts for various processor transactions and for the processor response time to external requests Processor requests themselves are constrained by the system interface request protocol and request cycle counts can be deter mined by examining the protocol The following system interface interac tions can vary within minimum and maximum cycle counts e waiting period for the processor to release the system interface to slave state in response to an external request release latency e response time for an external request that requires a response exter nal response latency The remainder of this section describes and tabulates the minimum and maximum cycle counts for these system interface interactions 12 31 System Interface Chapter 12
109. occurs on an instruction address translation the least recently used ITLB entry is filled from the JTLB The operation of the ITLB is invisible to the user Data TLB The R4600 R4700 also incorporates a 4 entry data TLB Each entry maps a 4KB page The data TLB improves performance by allowing data address translation to occur in parallel with data address translation When a miss occurs on an data address translation the DTLB is filled from the JTLB The DTLB refill is pseudo LRU the least recently used entry of the least recently used half is filled The operation of the DTLB is invisible to the user Cache Memory In order to keep the R4600 R4700 s high performance pipeline full and operating efficiently the R4600 R4700 incorporates on chip instruction and data caches that can be accessed in a single processor cycle Each cache has its own 64 bit data path and can be accessed in parallel The cache subsystem provides the integer and floating point units with an aggregate bandwidth of 1 6GB per second at a system clock frequency of 50MHz Furthermore the large Two way set associative caches increase emulation performance of DOS and Windows 3 1 applications when running under Windows NT Instruction Cache The R4600 R4700 incorporates a two way set associative on chip instruction cache This virtually indexed physically tagged cache is 16KB in size and is protected with word parity Because the cache is virtually indexed the virtu
110. only by loading the contents of the memory word containing the instruction Operation T if GPR rs gt GPR rt then TrapException endif Exceptions Trap exception A 134 CPU Instruction Set Details Appendix A TGEI Trap If Greater Than Or Equal Immediate TG El 31 26 25 21 20 16 15 0 REGIMM rs TGEI immediate 000001 01000 6 5 5 16 Format TGEI rs immediate Description The 16 bit immediate is sign extended and compared to the contents of general register rs Considering both quantities as signed integers if the contents of general register rs are greater than or equal to the sign extended immediate a trap exception occurs Operation T if GPR rs gt immediate 5 immediate 5 o then TrapException endif Exceptions Trap exception A 135 CPU Instruction Set Details TGEIU Trap If Greater Than Or Equal Immediate Unsigned Appendix A TGEIU TGEIU rs immediate Description 31 26 25 21 20 16 15 0 REGIMM rs TGEIU immediate 000001 01001 6 5 5 16 Format The 16 bit immediate is sign extended and compared to the contents of general register rs Considering both quantities as unsigned integers if the contents of general register rs are greater than or equal to the sign extended immediate a trap exception occurs Operation T if 0 GPR rs 0 immediate 5 8 immediate
111. read a word of data from the processor The processor response to external read requests for any data element other than a word is undefined lt Master je Slave ig lt Master gt 10 11 12 2 8 9 LT ae The processor does not contain any resources that are readable by an external read request in response to an external read request the processor returns undefined data and a data identifier with its Erroneous Data bit SysCmd 5 set Figure 12 21 External Read Request System Interface in Master State External Null Request Protocol The R4600 R4700 only supports one external null request A system interface release external null request returns the system interface to master state from slave state without otherwise affecting the processor 12 25 System Interface Chapter 12 SCycle SClock SysAD Bus SysCmd Bus ValidOut ValidIn ExtRast Release External null requests require no action from the processor other than to return the system interface to master state Figure 12 22 show timing diagram of the external null request cycle which consist of the following steps 1 The external agent asserts ExtRast to arbitrate for the system interface 2 The processor releases the system interface to slave state by asserting Release 3 The external agent drives a system int
112. register is not modified and the source registers are preserved Trap Disabled Results The result when no trap occurs is determined by the rounding mode and the sign of the intermediate result Underflow Exception U Two related events contribute to the Underflow exception e creation of a tiny nonzero result between 2 which can cause some later exception because it is so tiny e extraordinary loss of accuracy during the approximation of such tiny numbers by denormalized numbers IEEE Standard 754 allows a variety of ways to detect these events but requires they be detected the same way for all operations Tinniness can be detected by one of the following methods e after rounding when a nonzero result computed as though the expo nent range were unbounded would lie strictly between 2 e before rounding when a nonzero result computed as though the ex ponent range and the precision were unbounded would lie strictly be tween 2Eminy_ The MIPS architecture requires that tininess be detected after rounding Loss of accuracy can be detected by one of the following methods e denormalization loss when the delivered result differs from what would have been computed if the exponent range were unbounded e inexact result when the delivered result differs from what would have been computed if the exponent range and precision were both un bounded The MIPS architecture requires that loss of accuracy be detected
113. request this specification is made during the address cycle for the request System interface data identifiers specify the attributes of data transmitted during a system interface data cycle The following sections describe the syntax that is the bitwise encoding of system interface commands and data identifiers Reserved bits and reserved fields in the command or data identifier should be set to 1 for system interface commands and data identifiers associated with external requests For system interface commands and data identifiers associated with processor requests reserved bits and reserved fields in the command and data identifier are undefined Command and Data Identifier Syntax System interface commands and data identifiers are encoded in 9 bits and are transmitted on the SysCmd bus from the processor to an external agent or from an external agent to the processor during address and data cycles Bit 8 the most significant bit of the SysCmd bus deter mines whether the current content of the SysCmd bus is a command or a data identifier and therefore whether the current cycle is an address cycle or a data cycle For system interface commands SysCmd 8 must be set to 0 For system interface data identifiers SysCmd 8 must be set to 1 12 32 System Interface Chapter 12 System Interface Command Syntax This section describes the SysCmd bus encoding for system interface commands Figure 12 28 shows a common encoding used
114. returning If a BREAK instruction is in a branch delay slot interpretation of the branch instruction is required to resume execution CPU Exception Processing Chapter 5 Reserved Instruction Exception This section explains the Reserved Instruction exception Cause The Reserved Instruction exception occurs when one of the following conditions occurs e an attempt is made to execute an instruction with an undefined major opcode bits 31 26 e an attempt is made to execute a SPECIAL instruction with an unde fined minor opcode bits 5 0 e an attempt is made to execute a REGIMM instruction with an unde fined minor opcode bits 20 16 e an attempt is made to execute 64 bit operations in 32 bit virtual ad dressing when in User or Supervisor modes 64 bit operations are always valid in Kernel mode regardless of the value of the KX bit in the Status register This exception is not maskable Reserved Instruction exception processing is shown in Figure 5 15 on page 13 Processing The common exception vector is used for this exception and the RI code in the Cause register is set The EPC register contains the address of the reserved instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction Servicing No instructions in the MIPS ISA are currently interpreted The process executing at the time of this exception is handed an illegal instruction reserved ope
115. rt lt data Exceptions Coprocessor unusable exception Reserved instruction exception for supervisor mode with Status SX 0 or user mode with Status UX 0 CPU Instruction Set Details Appendix A Doubl dM T DMTCO System Control Coprocessor DMTCO 31 26 25 21 20 16 15 1110 0 COPO DMT rt rd 0 010000 00101 000 0000 0000 5 5 5 11 Format DMTCO rt rd Description The contents of general register rt are loaded into coprocessor register rd of the CPO This operation is defined in kernel mode regardless of the setting of the Status KX bit Execution of this instruction with in supervisor mode with Status SX 0 or in user mode with UX 0 causes a reserved instruction exception All 64 bits of the coprocessor O register are written from the general register source The operation of DMTCO on a 32 bit coprocessor O register is undefined Because the state of the virtual address translation system may be altered by this instruction the operation of load instructions store instructions and TLB operations immediately prior to and after this instruction are undefined Operation T data GPR ri T 1 CPR O rd data Exceptions Reserved instruction exception for supervisor mode with Status SX 0 or user mode with Status UX 0 CPU Instruction Set D Details Appendix A MULT Doubleword Multiply DMULT 31 26 25 21 20 16 15 6 5 0 SPECIAL rs
116. set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt AbsoluteValue ValueFPR fs fmt Exceptions Coprocessor unusable exception Coprocessor exception trap Coprocessor Exceptions Unimplemented operation exception Invalid operation exception FPU Instruction Set Details Appendix B ADD fmt Floating Point Add ADD fmt 31 26 25 21 20 16 15 11 10 6 5 0 COP1 010001 000000 fmt ft fs fd ADD 6 5 5 5 5 6 Format ADD fmt fd fs ft Description The contents of the FPU registers specified by fs and ft are interpreted in the specified format and arithmetically added The result is rounded as if calculated to infinite precision and then rounded to the specified format fm according to the current rounding mode The result is placed in the floating point register FPR specified by fd This instruction is valid only for single and double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both
117. slot is nullified Operation T target lt offset4s offset 07 condition GPR rs GPR rt T 1 if condition then PC amp PC target else NullifyCurrentInstruction endif Exceptions None CPU Instruction Set Details Appendix A Branch On Greater Than BG EZ Or Equal To Zero BG EZ 31 26 25 21 20 16 15 0 REGIMM rs BGEZ offset 000001 00001 6 5 5 16 Format BGEZ rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of general register rs have the sign bit cleared then the program branches to the target address with a delay of one instruction Operation T target lt offset 5 offset 0 condition GPR rs g3 0 T 1 if condition then PC PC target endif Exceptions None CPU Instruction Set Details Appendix A BGEZAL Branch On Greater Than Or Equal To Zero And Link BGEZAL 31 26 25 21 20 16 15 0 REGIMM rs BGEZAL offset 000001 10001 6 5 5 16 Format BGEZAL rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended Unconditionally the address of the instruction after the delay slot is placed in the
118. source is computed using the following formula destination stage of A source stage of B 1 SOURCE DESTINATION Operation Name Stage Name Stage MTCO gpr rt 2 A cpr rd 4 W a MFCO cpr rd 2 A gpr rt 4 W a TLBR Index TLB 2 A PageMask 4 W EntryHi EntryLoO EntryLol TLBWI Index or Random PageMask 2 A TLB 3 D B TLBWR EntryHi EntryLoO EntryLol TLBP PageMask EntryHi 2 A Index 4 W ERET EPC or ErrorEPC 2 A Status EXL Status ERL 4 W y Status ERL LLbit 4 W CACHE Index Load TagLo TagHi ECC 3 D Tag CACHE Index Store TagLo TagHi ECC 3 D Tag Instruction fetch EntryHi ASID Status KSU Sta ol tus RE Config KOC TLB Status ERL Status EXL Oy Instruction fetch EPC Status Cause 4 W exception BadVAddr Context EntryHi 1 1 Coprocessor usable Status CU Status KSU Sta 1 R test tus EXL Status ERL Interrupt Cause IP Status IM Status IE 2 A Status EXL Status ERL Load Store EntryHi ASID Status KSU Sta 2 A tus RE Status ERL Status EXL Config KOC TLB Load Store exception EPC Status Cause Bad 4 W VAddr Context EntryHi Notes a There must be at least one instruction between a MTCO and a MFCO B TLBW_ instructions will cause a one cycle slip y Instructions fetches following an ERET will see a change in EXL or ERL in Stage 2 of the ERET in anticipation of the completion of the ERET If the ERET does not complete
119. stage If the cache line to be replaced is dirty the W bit is set the data is moved to the internal write buffer in the next cycle The first doubleword of data is returned to the cache in 3 and the pipeline will then restart The remainder of the cache line is returned in the subsequent cycles The data to be written back will be returned to memory some time after the entire new cache line is returned Slip Conditions During the 2R and 1A pipe stages internal logic will determine whether it is possible to start the current instruction in this cycle If all of the source operands are available either from the register file or via the internal bypass logic and all the hardware resources necessary to complete the instruction will be available at the necessary time s then the instruction issues otherwise the instruction will slip Slipped instructions are retried on subsequent cycles until they issue The backend of the pipeline stages D and W will advance normally during slips in an attempt to resolve the conflict NOPS will be inserted into the bubble in the pipeline Instructions killed by branch likely instructions ERET or exceptions will not cause slips The CPU Pipeline Chapter 3 Figure 3 8 shows an instruction cache miss CYCLE Issue Issue Slip Slip Slip Slip Issue Issue Issue W D Z Q Previous Instructions
120. the output of the AND OR logic to produce the R4600 R4700 inter rupt signal Status register SR 0 G Status register SR 15 8 R4600 R4700 Interrupt 1 i IP2 8 AND Pay function IP6 AND OR IP7 function Cause register 15 8 IMO IM7 IPO Figure 13 4 Masking of the R4600 R4700 Interrupts 13 3 R4600 R4700 Chapter 14 Error Checking Integrated Device Technology Inc Introduction This chapter describes the Error Checking mechanism used in the R4600 R4700 processor Error Checking in the Processor Error checking codes allow the processor to detect and sometimes correct errors made when moving data from one place to another Two major types of data errors can occur in data transmission e hard errors which are permanent arise from broken interconnects internal shorts or open leads e soft errors which are transient are caused by system noise power surges and alpha particles Hard errors must be corrected by physical repair of the damaged equipment and restoration of data from backup Soft errors can be corrected by using error checking and correcting codes Types of Error Checking The R4600 R4700 uses parity error detection only Parity Error Detection Parity is the simplest error detection scheme By appending a bit to the end of an item of data called a parity bit single bit errors can be detected however these
121. the physical address Valid bit FIFO Replacement Bit Complemented on refill Even parity for the PTag and V fields Even parity 1 parity bit per word of data Cache data Figure 11 3 R4600 R4700 Primary I Cache Line Format Organization of the Primary Data Cache D Cache Each line of primary D cache data has an associated 30 bit tag that contains a 24 bit physical address 2 bit cache line state a write back bit a parity bit for the physical address and cache state fields a parity bit for the write back bit and the FIFO replacement bit The R4600 R4700 processor primary D cache has the following characteristics e write back or write through on a per page basis e two way set associative indexed with a virtual address checked with a physical tag organized with 8 word 32 byte cache line 11 3 Cache Organization Operation and Coherency Chapter 11 Figure 11 4shows the format of a primary D cache line 29 28 27 26 25 24 23 FIWJIWI P CS 1 1 1 1 2 FIFO Replacement Bit Even parity for the write back bit Write back bit set if cache line has been written Even parity for the PTag and CS fields Primary cache state O Invalid 1 Shared 2 Clean Exclusive 3 Dirty Exclusive Physical tag bits 35 12 of the physical address Even parity for the data 1 bit per byte Cache data Figure 11 4 R4600 R4700 8 Word Primary Data Cache Line Format In the R4600 R4700 th
122. the process executing at the time is handed a segmentation violation signal This error is usually fatal to the process incurring the exception To resume execution the EPC register must be altered so that the unaligned reference instruction does not re execute this is accomplished by adding a value of 4 to the EPC register EPC register 4 before returning If an unaligned reference instruction is in a branch delay slot interpretation of the branch instruction is required to resume execution CPU Exception Processing Chapter 5 TLB Exceptions This section explains the TLB Exceptions For specifics on the exceptions listed here refer to the following three subsections Three types of TLB exceptions can occur e TLB Refill occurs when there is no TLB entry that matches an at tempted reference to a mapped address space e TLB Invalid occurs when a virtual address reference matches a TLB entry that is marked invalid e TLB Modified occurs when a store operation virtual address reference to memory matches a TLB entry which is marked valid but is not dirty the entry is not writable The following three subsections describe the TLB exceptions TLB Refill Exception This subsection explains the TLB refill exception Cause The TLB refill exception occurs when there is no TLB entry to match a reference to a mapped address space This exception is not maskable Processing There are two special exception vectors for this e
123. these instructions are killed before they commit changes in state other than noted by d The pipestage corresponding to the stage field is given in parentheses Table E 1 Coprocessor 0 Hazards Coprocessor 0 Hazards Appendix E Certain combinations of instructions are not permitted because the results of executing such combinations are unpredictable in the face of the events such as pipeline delays cache misses interrupts and excep tions Most hazards result from instructions modifying and reading state in different pipeline stages Such hazards are defined between pairs of instructions not on a single instruction in isolation Other hazards are associated with restartability of instructions in the presence of exceptions
124. transferred can actually be doubleword partial doubleword word or partial word 12 19 System Interface SCycle SClock Chapter 12 Master gt I 1 fee Th ae eae e et e 8 28 5 G0 a4 12 bee a eg ae SysAD Bus V addr Datao Y SysCmd Bus ValidOut Validin Release Figure 12 15 Processor Noncoherent Word Write Request Protocol The R4600 R4700 interface requires that WrRdy be asserted two system cycles prior to the issue of a write for one clock cycle An external agent that deasserts WrRdy immediately upon receiving the write that fills its buffer will stop a subsequent write for four system cycles in R4000 non block write compatible mode This leaves two null system cycles after a write address data pair to give the external agent time to stop the next write This is illustrated in Figure 12 6 on page 12 7 An Address data pair every four system cycles is not sufficiently high performance for all applications For this reason the R4600 R4700 provides two new protocol options that modify the R4000 back to back write protocol to allow an address data pair every two system cycles The first protocol called write re issue allows WrRdy to be deasserted during the address cycle and forces a write to be re issued The second called pipelined writes leaves the sample point of WrRdy unchanged and requires that the external agent accept one more writ
125. with what happens in the DCache The processor does not check addresses received from the system interface and does not generate correct check bits for addresses transmitted to the system interface The processor does not contain a data corrector instead the processor takes a cache error exception when it detects an error based on data check bits Software is responsible for error handling System Interface Command Bus In the R4600 R4700 processor the system interface command bus has no parity SysCmdP always drives zero out for CPU valid cycles and is not checked when the system interface is in slave state 14 2 R4600 R4700 Error Checking Chapter 14 Summary of Error Checking Operations Error Checking operations are summarized in Table 14 1 and Table 14 2 Primary Cache Primary Cache Uncached Uncached Load from System Write to System Cache Bus Load Store Interface Interface Instruction Processor Data From System Not From System Inter Checked Trap Check on Interface Checked face unchanged on Error cache write back Trap on Error System Interface Not Not Not Generated Not Generated Not Generated Address Com Generated Generated mand and Check Bits Transmit System Interface Not Checked NA Not Checked NA NA Address Com mand and Check Bits Receive System Interface Checked From Pro Checked Trap on From Primary From Primary Data Trap on Error ce
126. zero divisor When the operation completes the quotient word of the double result is loaded into special register LO and the remainder word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of those instructions are undefined Correct operation requires separating reads of HI or LO from writes by two or more instructions Operation T 2 LO e undefined HI lt undefined T 1 LO lt undefined HI lt undefined T q lt 0 GPR rs 31_0 div 0 GPR rt 31_ 0 lt 0 GPR rs 31 9 mod 0 GPR rt 31 0 LO lt 31 Il 931 0 HI lt f34 131 0 Exceptions None CPU Instruction Set Details Appendix A Doubl dM F DMFCO System Control Coprocessok DMFCO 31 26 25 21 20 16 15 1110 0 COPO DMF rt rd 0 010000 00001 000 0000 0000 6 5 5 5 11 Format DMFCO rt rd Description The contents of coprocessor register rd of the CPO are loaded into general register rt This operation is defined in kernel mode regardless of the setting of the Status KX bit Execution of this instruction with in supervisor mode with Status SX 0 or in user mode with UX 0 causes a reserved instruction exception All 64 bits of the general register destination are written from the coprocessor register source The operation of DMFCO on a 32 bit coprocessor O register is undefined Operation T data lt CPR 0 rd T 1 GPR
127. 0 16 15 11 10 6 5 0 Format CVT S fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single binary floating point format The result is placed in the floating point register specified by fd Rounding occurs according to the currently specified rounding mode This instruction is valid only for conversions from double floating point format or from 32 bit or 64 bit fixed point format The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation l StoreFPR fd S ConvertFmt ValueFPR fs fmt fmt S Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details CVT W fmt Floating Point Convert to Fixed Point Format Appendix B CVT W fmt 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd CVT W 010001 00000 100100 6 5 5 5 5 6 Format CVT W fmt fd fs Description The contents of the flo
128. 0000 001001 6 5 5 5 5 6 Format JALR rs JALR rd rs Description The program unconditionally jumps to the address contained in general register rs with a delay of one instruction The address of the instruction after the delay slot is placed in general register rd The default value of rd if omitted in the assembly language instruction is 31 Register specifiers rs and rd may not be equal because such an instruction does not have the same effect when re executed However an attempt to execute this instruction is not trapped and the result of executing such an instruction is undefined Since instructions must be word aligned a J ump and Link Register instruction must specify a target register rs whose two low order bits are zero If these low order bits are not zero an address exception will occur when the jump target instruction is subsequently fetched Operation T temp lt GPR rs GPR rd PC 8 T 1 PC lt temp Exceptions None CPU Instruction Set Details JR Jump Register Appendix A JR 31 26 25 21 20 65 0 SPECIAL rs 0 JR 000000 0000000 0000 0000 001000 6 5 15 6 Format JR rs Description The program unconditionally jumps to the address contained in general register rs with a delay of one instruction Since instructions must be word aligned a J ump Register instruction must specify a target register rs whose two low order bits are zero If these l
129. 01 00000 001111 5 5 5 5 6 Format FLOOR W fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to RM 3 This instruction is valid only for conversion from a single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity or NaN or the correctly rounded integer result is outside of 231 to 234 1 an Invalid operation exception is raised If Invalid operation is not enabled then no exception is taken and 234 1 is returned Operation T StoreFPR fd W ConvertFmt ValueFPR fs fmt fmt W Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B Load Doubleword
130. 041 0 Bit 31 30 29 28 27 26 25 24 23 22 21 0 CFc2 9 1 0 0 1 0 0 0 0 1 0 Se A Ke vV Opcode Coprocessor Suboperation Coprocessor Unit Number CPU Instruction Set Details COPz Coprocessor Operation Appendix A COPz 31 26 25 24 0 COPz CO cofun 0100xx 1 6 1 25 Format COPz cofun Description A coprocessor operation is performed The operation may specify and reference internal coprocessor registers and may change the state of the coprocessor condition line but does not modify state within the processor or the cache memory system Details of coprocessor operations are contained in Appendix B Operati on iz CoprocessorOperation z cofun Exceptions Coprocessor unusable exception Coprocessor interrupt or Floating Point Exception Opcode Bit Encoding COPZ Bit 3130 29 28 27 26 25 3 copo 1 o 0 o o 1 Bit 31 30 29 28 27 26 25 cop1 0 1 0 Tola Bit 3130 29 28 27 26 25 cop2 9 1 0 0 1 041 Se Opcode CO sub opcode see end of Appendix A Geprocessor Unit Number CPU Instruction Set Details Appendix A CTCz Move Control to Coprocessor CTCz 31 26 25 21 20 16 15 11 10 0 COPz CT rt rd 0 0100xx 00110 000 0000 0000 6 5 5 5 11 Format CTCz rt rd Description The contents of general register rt are loaded into control reg
131. 1 e Doubleword Move From Coprocessor 1 DMFC1 Like the floating point load and store operations these operations perform no format conversions and never cause floating point exceptions Floating Point Unit Chapter 6 Load Delay and Hardware Interlocks The instruction immediately following a load can use the contents of the loaded register In such cases the hardware interlocks requiring additional real cycles for this reason scheduling load delay slots is desirable although it is not required for functional code Data Alignment All coprocessor loads and stores reference the following aligned data items For word loads and stores the access type is always WORD and the low order 2 bits of the address must always be 0 For doubleword loads and stores the access type is always DOUBLE WORD and the low order 3 bits of the address must always be 0 Endianness Regardless of byte numbering order endianness of the data the address specifies the byte that has the smallest byte address in the addressed field For a big endian system it is the leftmost byte for a little endian system it is the rightmost byte Floating Point Conversion Instructions Conversion instructions perform conversions between the various data formats such as single or double precision fixed or floating point formats Table 6 10 on page 10 lists conversion instructions Appendix B gives a detailed description of each instruction Floating
132. 1 3 PAddrs o xor ReverseEndian 0 mem lt LoadMemory uncached HALFWORD pAddr vAddr DATA byte lt vAddro o xor BigEndianCPU2 0 16 GPR rt lt mem454 8 byte I MeM15 8 byte 8 byte Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details Appendix A LHU Load Halfword Unsigned LHU 31 26 25 21 20 16 15 0 LHU base rt offset 100101 6 5 5 16 Format LHU rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the halfword at the memory location specified by the effective address are zero extended and loaded into general register rt If the least significant bit of the effective address is non zero an address error exception occurs Operation T vAddr lt offset s offsety5 o GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrpgize 1 3 DAddre 9 xor ReverseEndian 0 mem lt LoadMemory uncached HALFWORD pAddr vAddr DATA byte lt vAddrs 9 xor BigEndianCPU2 0 GPR rt 0 8 mem45 8 byte 8 byte Exceptions TLB refill exception TLB invalid exception Bus Error exception Address error exception CPU Instruction Set Details Appendix A LL Load Linked LL 0 31 26 25
133. 1 000 0000 0000 0000 0000 000010 6 1 19 6 Format TLBWI Description The G bit of the TLB is written with the logical AND of the G bits in the EntryLoO and EntryLo1 registers The TLB entry pointed at by the contents of the TLB Index register is loaded with the contents of the EntryHi and EntryLo registers The operation is invalid and the results are unspecified if the contents of the TLB Index register are greater than the number of TLB entries in the processor Operation T TLB Indexs 9 lt PageMask EntryHi and not PageMask EntryLo1 EntryLo0 Exceptions Coprocessor unusable exception A 140 CPU Instruction Set Details Appendix A TLBWR Write Random TLB Entry TLBWR 31 26 25 24 6 5 0 COPO CO 0 TLBWR 010000 1 000 0000 0000 0000 0000 000110 6 1 19 6 Format TLBWR Description The G bit of the TLB is written with the logical AND of the G bits in the EntryLoO and EntryLo1 registers The TLB entry pointed at by the contents of the TLB Random register is loaded with the contents of the EntryHi and EntryLo registers Operation T TLB Randoms 9 lt PageMask EntryHi and not PageMask EntryLo1 EntryLoO Exceptions Coprocessor unusable exception A 141 CPU Instruction Set Details TLT Trap If Less Than Appendix A TLT 31 26 25 21 20 16 15 0 SPECIAL rs rt code TLT 000000 110010 6 5
134. 1 2 Cache Support in the R4600 R4700 Cache Line Size A cache line is the smallest unit of information that can be fetched from memory to be filled into the cache A primary cache line is 8 words in length and is represented by a single tag Upon a cache miss in the primary cache the missing cache line is loaded from memory into the primary cache Cache Organization and Accessibility This section describes the organization of the primary cache including the manner in which it is mapped the addressing used to index the cache and composition of the cache lines The primary instruction and data caches are indexed with a virtual address VA 11 2 Cache Organization Operation and Coherency Chapter 11 Organization of the Primary Instruction Cache I Cache Each line of primary I cache data although it is actually an instruction it is referred to as data to distinguish it from its tag has an associated 28 bit tag that contains a 24 bit physical address a single valid bit a reserved bit a single parity bit and the FIFO replacement bit Word parity is used on I cache data The R4600 R4700 processor primary I cache has the following characteristics e two way set associative e indexed with a virtual address e checked with a physical tag e organized with 8 word 32 byte cache line Figure 11 3 shows the format of a primary I cache line 27 26 25 24 23 F P O V PTag 1 1 24 Physical tag bits 35 12 of
135. 11 12 13 addressO E F G H 4 5 1415 after 6 7 ReverseEndian Operation T vAddr lt offset s offset 45 0 GPR base pAddr uncached AddressTranslation vAddr DATA z pAddr pAddrps ze 1 3 Addr2 0 xor If BigEndianMem 0 then pAddr lt pAddrpgize 31 3 Il 0 endif byte vAddr _ xor BigEndianCPU data GPR rtles e byte 08 e Given a doubleword in a register and a doubleword in memory the operation of SDR is as follows A 109 CPU Instruction Set Details Appendix A TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 110 SDR Register A B C D E F G Memory J K L M N O BigEndianCPU 0 BigEndianCPU 1 a offset offset vAddr2 0 destination type LEM BEM destination type LEM BEM 0 ABCDEFGH 7 0 0 HJKLMNO P O 7 0 1 BCDEF GHP 6 1 0 GHKLMNOP 1 6 0 2 CDEFGHOP 5 2 0 FGHLMNOP 2 5 0 3 DEFGHNOP 4 3 0 EF GHMNO P 3 4 0 4 EF GHMNOP 3 4 0 DEFGHNOP 4 3 0 5 F GHLMNOP 2 5 0 CDEFGHOP 5 2 0 6 GHKLMNOP 1 6 0 BCDEFGH P 6 1 0 7 HJ KLMNOP 0 7 0 ABCDEFGH 7 0 0 LEM Little endian memory BigEndianMem 0 BEM BigEndianMem 1 Type AccessType see Table 2 1 on page 2 3 sent to memory Offset pAddro o sent to memory Exceptions CPU Instruction Set Details
136. 2 ISS SSNOPHJ 2 0 5 6 S SSSSJKLH 2 4 1 SSSSOPGH 1 0 6 7 SSSSIJKL 3 4 0 ISSSSPFGQHJ 0 0 7 Key to table LEMLittle endian memory BigEndianMem 0 BEMBigEndianMem 1 TypeAccessType see Table 2 1 on page 3 sent to memory OffsetpAddry 9 sent to memory Ssign extend of destinations Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details Appendix A LWR Load Word Right LWR 31 26 25 21 20 16 15 0 LWR 100110 base rt offset 6 5 5 16 Format LWR rt offset base Description This instruction can be used in combination with the LWL instruction to load a register with four consecutive bytes from memory when the bytes cross a word boundary LWR loads the right portion of the register with the appropriate part of the low order word LWL loads the left portion of the register with the appropriate part of the high order word The LWR instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte It reads bytes only from the word in memory which contains the specified starting byte From one to four bytes will be loaded depending on the starting byte specified The loaded word is sign extended Conceptually it starts at the specified byte in memory and loads that byte into the low order r
137. 21 20 16 15 LL base rt offset 110000 6 5 5 16 Format LL rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The contents of the word at the memory location specified by the effective address are loaded into general register rt The loaded word is sign extended This instruction implicitly performs a SYNC operation all loads and stores to shared memory fetched prior to the LL must access memory before the LL and loads and stores to shared memory fetched subsequent to the LL must access memory after the LL The processor begins checking the accessed word for modification by other processors and devices Load Linked and Store Conditional can be used to atomically update memory locations as shown LL T1 TO ADD T2 TI 1 SC T2 TO BEQ T72 0 LI1 NOP This atomically increments the word addressed by TO Changing the ADD to an OR changes this to an atomic bit set This instruction is available in User mode and it is not necessary for CPO to be enabled The operation of LL is undefined if the addressed location is uncached and for synchronization between multiple processors the operation of LL is undefined if the addressed location is noncoherent A cache miss that occurs between LL and SC may cause SC to fail so no load or store operation should occur between LL and SC otherwise the SC may never be successful Exceptions also
138. 4700 Data Tag A Data Set A Instruction Set A DTLB Physical Store Buffer ih Data Tag B aS uN co SysAD H Instruction Select 4 Write Buffer Address Buffer 4 Instruction Register Read Buffer Instruction Tag A 4 ITLB Physical Data Set B Instruction Set B Instruction Tag B IBus i Control DBus Vv Floating point 5 Load Aligner Register File F Joint TLB a Integer Register File Unpacker Packer 5 2 rN 9 Integer Address Adder A A Data TLB Virtual Floating point E Coprocessor 0 DvA 2 Add Sub Cvt Div Sqrt o Shifter Store Aligner Integer Divide 3 p E Logic Unit 2 PC Incrementer Floating point Integer System Memory Multiply Control Branch Adder IVA Instruction TLB Virtual Phase Lock Loop Clocks Program Counter Figure 1 1 R4600 R4700 Block Diagram Pipeline Overview The R4600 R4700 uses a 5 stage pipeline similar to the IDT79R3000 The simplicity of this pipeline allows the R4600 R4700 to be lower cost and lower power than super scalar or super pipelined processors Unlike the R3000 the R4600 R4700 does virtual to physical translation in parallel with cache access This allows the R4600 R4700 to operate at over twice the frequency of the R3000 and to support a lar
139. 5 0 SCD base rt offset 111100 6 5 5 16 Format SCD rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The contents of general register rt are conditionally stored at the memory location specified by the effective address This instruction implicitly performs a SYNC operation loads and stores to shared memory fetched prior to the SCD must access memory before the SCD loads and stores to shared memory fetched subsequent to the SCD must access memory after the SCD If any other processor or device has modified the physical address since the time of the previous Load Linked Doubleword instruction or if an ERET instruction occurs between the Load Linked Doubleword instruction and this store instruction the store fails and is inhibited from taking place The success or failure of the store operation as defined above is indicated by the contents of general register rt after execution of the instruction A successful store sets the contents of general register rt to 1 an unsuccessful store sets it to 0 The operation of Store Conditional Doubleword is undefined when the address is different from the address used in the last Load Linked Doubleword This instruction is available in User mode it is not necessary for CPO to be enabled If either of the three least significant bits of the effective address is non zero an address error exception takes
140. 5 10 6 Format TLT rs rt Description The contents of general register rt are compared to general register rs Considering both quantities as signed integers if the contents of general register rs are less than the contents of general register rt a trap exception occurs The code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T if GPR rs lt GPR rt then TrapException endif Exceptions Trap exception A 142 CPU Instruction Set Details TLTI Trap If Less Than Immediate Appendix A TLTI TLTI rs immediate Description 31 26 25 21 20 16 15 0 REGIMM rs TLTI immediate 000001 01010 6 5 5 16 Format The 16 bit immediate is sign extended and compared to the contents of general register rs Considering both quantities as signed integers if the contents of general register rs are less than the sign extended immediate a trap exception occurs Operation T if GPR rs lt immediate5 immediate 5 o then TrapException endif Exceptions Trap exception A 143 CPU Instruction Set Details TLTI U Trap If Less Than Immediate Unsigned TLTI U Appendix A TLTIU rs immediate Description 31 26 25 21 20 16 15 0 REGIMM rs TLTIU immediate 000001 01011 6 5 5 16 Format The
141. 6 bit immediate value branch displacement or address displacement target 26 bit jump target address rd 5 bit destination register specifier sa 5 bit shift amount funct 6 bit function field Figure 2 1 CPU Instruction Formats In the MIPS architecture coprocessor instructions are implementation dependent refer to Appendix A for details of individual Coprocessor O instructions CPU Instruction Set Summary Chapter 2 Load and Store Instructions Load and store are immediate I type instructions that move data between memory and the general registers The only addressing mode that load and store instructions directly support is base register plus 16 bit signed immediate offset Scheduling a Load Delay Slot A load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction The instruction slot immediately following this delayed load instruction is referred to as the load delay slot In the R4600 R4700 processor the instruction immediately following a load instruction can request the contents of the loaded register however in such cases hardware interlocks insert additional real cycles Consequently scheduling load delay slots can be desirable both for performance and R Series e g R3051 processor compatibility However the scheduling of load delay slots is not absolutely required Defining Access Types Access type indicates the size of
142. 7 4 and xis a minor revision number in bits 3 0 The revision number can distinguish some chip revisions however there is no guarantee that changes to the chip will necessarily be reflected in the PRId register or that changes to the revision number necessarily reflect real chip changes For this reason these values are not listed and software should not rely on the revision number in the PRId register to characterize the chip Certain attributes such as cache size are independent of implementation number Config Register 16 The Config register specifies various configuration options selected on R4600 R4700 processors Table 4 12 lists these options Some configuration options as defined by Config bits 31 3 are set by the hardware during reset and are included in the Config register as read only status bits for the software to access The KO field is the only read write field as indicated by Config register bits 2 0 and controlled by software on reset these fields are undefined Figure 4 16 shows the format of the Config register Table 4 12 which follows the figure describes the Config register fields Config Register 30 28 27 24 2322 2120 19 18 17 16 15 14 13 1211 98 654 3 2 0 10 0 0 1 0JB 1 1 0 Ic DC IB DB 0 KO Memory Management Chapter 4 Field Description System clock ratio 0 gt processor clock frequency divided by 2 1 processor clock frequency divided b
143. 9 bit command bus SysCmd e six handshake signals RdRdy WrRdy ExtRaqst Release Validin ValidOut The processor uses the system interface to access external resources in order to service processor requests such as cache misses cache line write backs write through stores and uncached operations 12 1 System Interface Chapter 12 Interface Buses Figure 12 1 shows the primary communication paths for the system interface a 64 bit address and data bus SysAD 63 0 and a 9 bit command bus SysCmd 8 0 These SysAD and the SysCmd buses are bidirectional that is they are driven by the processor to issue a processor request and by the external agent to issue an external request see Processor and External Request Protocols on page 12 14 for more infor mation A request through the system interface consists of e an address e a System interface command that specifies the precise nature of the request e a series of data elements if the request is for a write or read response External Agent SysAD 63 0 R4600 R4700 SysCmd 8 0 Figure 12 1 System Interface Buses Address and Data Cycles Cycles in which the SysAD bus contains a valid address are called address cycles Cycles in which the SysAD bus contains valid data are called data cycles Validity is determined by the state of the ValidIn and ValidOut signals described in Interface Buses on page 12 2 The SysCmd bus
144. About Cache Operations Cache Operations Tables Details on the Fill_I Equation Standby Mode Operation Entering Standby Mode Coprocessor 0 Hazards Chapter 13 13 1 13 1 13 1 13 1 Chapter 14 14 1 14 1 14 1 14 1 14 2 14 2 14 2 14 3 Appendix A A 1 A 1 A 2 A 2 A 4 A 4 A 5 A 6 A 6 A 151 Appendix B B 1 B 1 B 3 B 4 B 4 B 4 B 5 B 6 B 45 Appendix C C 1 C 1 C 1 C 3 Appendix D D 1 Appendix E dt Integrated Device Technology Inc List of Figures Number Figure 1 1 Figure 1 2 Figure 1 3 Figure 1 4 Figure 1 5 Figure 1 6 Figure 1 7 Figure 1 8 Figure 1 9 Figure 1 10 Figure 1 11 Figure 2 1 Figure 3 1 Figure 3 2 Figure 3 3 Figure 3 4 Figure 3 5 Figure 3 6 Figure 3 7 Figure 3 8 Figure 4 1 Figure 4 2 Figure 4 3 Figure 4 4 Figure 4 5 Figure 4 6 Figure 4 7 Figure 4 8 Figure 4 9 Figure 4 10 Figure 4 11 Figure 4 12 Figure 4 13 Figure 4 14 Figure 4 15 Figure 4 16 Figure 4 17 Figure 4 18 Figure 4 19 Figure Title R4600 R4700 Block Diagram R4600 R4700 CPU Registers CPU Instruction Formats Big Endian Byte Ordering Little Endian Byte Ordering Little Endian Data in a Doubleword Big Endian Data in a Doubleword Big Endian Misaligned Word Addressing Little Endian Misaligned Word Addressing R4600 R4700 CPO Registers Typical System Block Diagram CPU Instruction Formats Instruction Pipeline Stages CPU Pipeline A
145. Bit 5 of the Interrupt register in the R4600 R4700 is ORed with the Int S5 pin and then multiplexed with the internal TimerInterrupt signal This result is directly readable as bit 15 of the Cause register e Bits 4 0 of the Interrupt register are bit wise ORed with the current value of the interrupt pins Int 4 0 and the result is directly readable as bits 14 10 of the Cause register Timer Interrupt register SClock gt 5 4 3 2l 0 Cai OR gate D multiplexer 3 Int 5 Int 3 Int 1 Int 4 Int 2 Int 0 Figure 13 2 R4600 R4700 Interrupt Signals Figure 13 3 shows the internal derivation of the NMI signal for the R4600 R4700 processor The NMI pin is latched into an internal register by the rising edge of SClock Bit 6 of the Interrupt register is then ORed with the inverted value of NMI to form the nonmaskable interrupt Only the one falling edge of the latched signal will cause the NMI Interrupt register 6 Internal register Internal Figure 13 3 R4600 R4700 Nonmaskable Interrupt Signal 13 2 R4600 R4700 Processor Interrupts Chapter 13 Figure 13 4 shows the masking of the R4600 R4700 interrupt signal e Cause register bits 15 8 IP7 IPO are AND ORed with Status register interrupt mask bits 15 8 IM7 IMO to mask individual interrupts e Status register bit O is a global Interrupt Enable IE It is ANDed with
146. Cause Register Format EPC Register Format XContext Register Format ECC Register Format CacheErr Register Format ErrorEPC Register Format Reset Exception Processing Cache Error Exception Processing Soft Reset and NMI Exception Processing General Exception Processing Except Reset Soft Reset NMI and Cache Error General Exception Handler HW General Exception Servicing Guide lines SW TLB XTLB Miss Exception Handler HW TLB XTLB Exception Servicing Guide lines SW Cache Error Exception Handling HW and Servicing Guidelines SW Reset Soft Reset amp NMI Exception Handling HW and Servicing Guidelines SW FPU Functional Block Diagram FPU Registers Implementation Revision Register FP Control Status Register Bit Assignments Control Status Register Cause Flag and Enable Fields Single Precision Floating Point Format Double Precision Floating Point Format Binary Fixed Point Format FPU Instruction Pipeline Control Status Register Exception Flag Trap Enable Bits R4600 R4700 Processor Signals Power on Reset Cold Reset Warm Reset Page a A A A A A A ol 1 WDNYNNOOCOHNAWOWD agg gq g ol ol el oe List of Figures List of Figures Number Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure
147. Caveats About Cache Operations 1 All cycle counts are in processor cycles 2 All cache ops have lower priority than cache misses write backs and external requests If the write back buffer contains unwritten data when a cache op is executed the write back buffer will be retired before the cache op is begun If an instruction cache miss occurs at the same time as a cache op is executed the instruction cache miss will be handled first Cache ops are mutually exclusive with respect to data cache misses External requests will be completed before beginning a cache op 3 For all data cache ops the cache op machine waits for the store buffer and response buffer to empty before beginning the cache op This can add 3 cycles to any data cache op if there is data in the response buffer or store buffer The response buffer contains data from the last data cache miss that has not yet been written to the data cache The store buffer contains delayed store data waiting to be written to the data cache 4 Cache ops of the form xx xx_Writeback_xxxx may perform a write back which will fill the write back buffer Write backs can affect subsequent cache ops since they will stall until the write back buffer is written back to memory Cache ops which fill the write back buffer are noted as writeback in the following tables 5 All cycle counts are best case assuming no interference from the mechanisms described above Cache Operations Tables Table C 1 an
148. Clock 1 0 Receive clocks Output Two identical receive clocks that establish the System interface frequency SyncOut Synchronization Output SyncOut must be connected to clock out Syncin through an interconnect that models the interconnect between MasterOut TClock RClock and the external agent SyncIn Synchronization Input Synchronization clock input clock in TClock 1 0 Transmit clocks Output Two identical transmit clocks that establish the System inter face frequency Fault Fault Output Reserved for future output Always High VecP Quiet Vcc for PLL Input Quiet Vcc for the internal phase locked loop VssP Quiet Vss for PLL Input Quiet Vss for the internal phase locked loop Table 8 2 Clock Control Interface Signals Processor Signal Descriptions Chapter 8 Interrupt Interface Signals The Interrupt interface signals make up the interface used by external agents to interrupt the R4600 R4700 processor Six hardware interrupts Int 5 0 and one NMI are available on the R4600 R4700 Table 8 3 lists the Interrupt interface signals Name Definition Direction Description Int 5 0 Interrupt Input Six general processor interrupts bit wise ORed with bits 5 0 of the interrupt register NMI Nonmaskable Input Nonmaskable interrupt ORed with bit 6 of the interrupt interrupt register Table 8 3 Interrupt Interface Signals J TAG Interface Signals The R4600 R4700 does not implement JTAG
149. Code Description MIPS ISA Level JR Jump Register JALR Jump And Link Register BEQ Branch on Equal BNE Branch on Not Equal BLEZ Branch on Less Than or Equal to Zero BGTZ Branch on Greater Than Zero BLTZ Branch on Less Than Zero BGEZ Branch on Greater Than or Equal to Zero BLTZAL Branch on Less Than Zero And Link BGEZAL Branch on Greater Than or Equal to Zero I Table 1 5 CPU Instruction Set J ump and Branch Instruction And Link OpCode Description MIPS ISA Level SLL Shift Left Logical I SRL Shift Right Logical I SRA Shift Right Arithmetic I SLLV Shift Left Logical Variable I SRLV Shift Right Logical Variable I SRAV Shift Right Arithmetic Variable I Table 1 6 CPU Instruction Set Shift Instructions OpCode Description MIPS ISA Level LWCz Load Word to Coprocessor z I SWCz Store Word from Coprocessor z I MTCz Move To Coprocessor z I MFCz Move From Coprocessor z I CTCz Move Control to Coprocessor z I CFCz Move Control From Coprocessor z I COPz Coprocessor Operation z I BCzT Branch on Coprocessor z True I BCzF Branch on Coprocessor z False I Table 1 7 Instruction Set Coprocessor Instructions OpCode Description MIPS ISA Level SYSCALL System Call I BREAK Break I Table 1 8 CPU Instruction Set Special Instructions 1 9 Overview Chapter 1 OpCode Description MIPS ISA Level LD
150. D E F GJ H Memory J K L MIN OJP BigEndianCPU 0 BigEndianCPU 1 vAddro 9 destination type offset destination type offset LEM BEM LEM BEM 0 SSSSMNOP 0 0 4 SSSSEFGI 0 7 0 1 SSSSEMNO 1 1 4 SSSSEFI J 1 6 0 2 SSSSEFMN 2 2 4 SSSSEIJ K 2 5 0 3 SSSSEFGM 3 3 4 SSSSIJSJKL 3 4 0 4 SSSSIJKLI 0 4 0 SSSSEFGM 0 3 4 5 SSSSEIJKJ 1 5 0 SSSSEFMN 1 2 4 6 SSSSEFI J 2 6 0 SSSSEMNOJ 2 1 4 7 SSSSEFGI 3 7 0 SSSSMNOP 3 0 4 Key to table LEMLittle endian memory BigEndianMem 0 BEMBigEndianMem 1 TypeAccessType see Table 2 1 on page 3 sent to memory OffsetpAddrg 9 sent to memory Ssign extend of destination Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details LWU Load Word Unsigned Appendix A LWU Description LWU rt offset base 31 26 25 21 20 16 15 0 LWU base rt offset 1011141 6 5 5 16 Format The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the word at the memory location specified by the effective address are loaded into general register rt The loaded word is zero extended If either of the two least significant bits of the effective address is non zero an address error exception occurs Operation T vAddr lt offset s offset s 9 GPR base pAddr uncached lt AddressTra
151. ENLO and Write into the TLB There could be a TLB miss again during the mapping of the data or instruction address The processor will jump to the general exception vector since the EXL is 1 Option to complete the first level refill in the general exception handler or ERET to the original instruction and take the exception again ERET is not allowed in the branch delay slot of another Jump Instruction Processor does not execute the instruction which is in the ERET s branch delay slot PC amp EPC EXL lt 0 LLbit 0 CPU Exception Processing Cache Error Exception Handling HW Servicing Guidelines SW Note Can be masked disabled by DE SR16 bit 1 Set CacheErr Reg ErrEPC lt PC 4 ErrEPC lt PC gt ERL lt 1 0 normal lt gt 1 bootstrap Base is sign extended for 64 bits PC lt OxFFFF FFFF A000 0000 PC OxFFFF FFFF BFCO 0200 100 100 unmapped uncached unmapped uncached Commenis Unmapped Uncached vector so TLB related amp Cache Error Exception not possible ERL 1 so Interrupt exceptions disabled Service Code OS System to avoid all other exceptions Only Reset Soft Reset NMI exceptions possible ERET is not allowed in the branch delay slot of another Jump Instruction Processor does not execute the instruction which is in the ERET s branch delay slot PC amp ErrorEPC ERL 0 L
152. Figure Figure Figure Figure Figure Figure Figure Figure Figure 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 8 11 1 11 2 11 3 11 4 11 5 11 6 11 7 11 8 11 9 l1 12 10 1 12 2 12 3 12 4 12 5 12 6 12 7 12 8 12 9 12 12 12 12 12 12 12 12 12 10 11 12 13 14 15 16 17 18 Figure Title Signal Transitions Clock to Q Delay Processor Clocks PClock to SClock Division by 2 PLL Passive Components R4600 R4700 PLL Network R4600 R4700Processor Phase Locked System Gate Array System Without Phase Lock Using the R4600 R4700 Processor Gate Array and CMOS System Without Phase Lock Using the R4600 R4700 Processor Logical Hierarchy of Memory Cache Support in the R4600 R4700 R4600 R4700 Primary I Cache Line Format R4600 R4700 8 Word Primary Data Cache Line Format Primary Cache Data and Tag Organization Primary Data Cache State Diagram Synchronization with Test and Set Synchronization Using a Counter Test and Set using LL and SC Counter Using LL and SC System Interface Buses State of RdRdy Signal for Read Requests State of WrRdy Signal for Write Requests System Interface Register to Register Operation Requests and System Events Back to Back Write Cycle Timing R4000 compatible mode Processor Requests Processor Request External Requests External Request Read Response Processor Read Request Protocol Uncached Read Externa
153. GPR rs GPA rt Exceptions None CPU Instruction Set Details Appendix A DDIV Doubleword Divide DDIV 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 DDIV 000000 00 0000 0000 011110 6 5 5 10 6 Format DDIV rs rt Description The contents of general register rs are divided by the contents of general register rt treating both operands as 2 s complement values No overflow exception occurs under any circumstances and the result of this operation is undefined when the divisor is zero This instruction is typically followed by additional instructions to check for a zero divisor and for overflow When the operation completes the quotient word of the double result is loaded into special register LO and the remainder word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of those instructions are undefined Correct operation requires separating reads of HI or LO from writes by two or more instructions Operation T 2 LO lt undefined HI lt lt undefined T 1 LO lt undefined HI lt lt undefined T LO lt GPR rs div GPR rt HI lt GPRi rs mod GPRf rt Exceptions None CPU Instruction Set Details Appendix A DDIVU Doubleword Divide Unsigned DDIVU 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 DDIVU 000000 000000 0000 011111 6 5 5 10 6 Forma
154. ID field to form a unique virtual address 32 bit Kernel Mode Kernel Space 0 kseg0 In Kernel mode when Status KX 0 and the most significant three bits of the virtual address are 1005 32 bit ksegO virtual address space is selected it is the current 229 byte 512 Mbyte kernel physical space References to ksegO are not mapped through the TLB the physical address selected is defined by subtracting 0x8000 0000 from the virtual address physical address 0x0000 0000 O VA 28 0 The KO field of the Config register described in this chapter controls cacheability and coherency 32 bit Kernel Mode Kernel Space 1 kseg1 In Kernel mode when Status KX 0 and the most significant three bits of the 32 bit virtual address are 1019 32 bit kseg1 virtual address space is selected it is the current 279 byte 512Mbyte kernel physical space References to kseg1 are not mapped through the TLB the physical address selected is defined by subtracting OxAO00 0000 from the virtual address physical address 0x0000 0000 O VA 28 0 Caches are disabled for accesses to these addresses and physical memory or memory mapped I O device registers are accessed directly 32 bit Kernel Mode Supervisor Space ksseg In Kernel mode when Status KX 0 and the most significant three bits of the 32 bit virtual address are 1109 the ksseg virtual address space is selected it is the current 279 byte 512Mbyte supervisor virtual space The virtu
155. IDT79R4600 and IDT79R4700 RISC Processor Hardware User s Manual Revision 2 0 April 1995 dt Integrated Device Technology Inc Integrated Device Technology Inc reserves the right to make changes to its products or specifications at any time without notice in order to improve design or performance and to supply the best possible product IDT does not assume any respon sibility for use of any circuitry described other than the circuitry embodied in an IDT product ITD makes no representations that circuitry described herein is free from patent infringement or other rights of third parties which may result from its use No license is granted by implication or otherwise under any patent patent rights or other rights of Integrated Device Technology Inc LIFE SUPPORT POLICY Integrated Device Technology s products are not authorized for use as critical components in life sup port devices or systems unless a specific written agreement pertaining to such intended use is executed between the manufacturer and an officer of IDT 1 Life support devices or systems are devices or systems that a are intended for surgical implant into the body or b support or sustain life and whose failure to perform when properly used in accor dance with instructions for use provided in the labeling can be reasonably expected to result in a sig nificant injury to the user 2 A critical component is any component of a life support devic
156. Invalidate_I writes the physical address of the cache op into the tag when it clears the valid bit which is different from the R4000 Index Write Back Invalidate Examine the cache state and W bit of the primary data cache block at the index specified by the virtual address If the state is not Invalid and the W bit is set then write back the block to memory The address to write is taken from the primary cache tag Set cache state of primary cache block to Invalid I D Index Load Tag Read the tag for the cache block at the specified index and place it into the TagLo CPO registers ignoring par ity errors Also load the data parity bits into the ECC register I D Index Store Tag Write the tag for the cache block at the specified index from the TagLo and TagHi CPO registers Create Dirty Exclusive This operation is used to avoid loading data needlessly from memory when writing new contents into an entire cache block If the cache block does not contain the specified address and the block is dirty write it back to the memory In all cases set the cache block tag to the specified physical address set the cache state to Dirty Exclusive I D Hit Invalidate If the cache block contains the specified address mark the cache block invalid Hit WriteBack Invalidate If the cache block contains the specified address write back the data if it is dirty and mark the cache block invalid F
157. JTLB will finish The data cache tag is checked against the PFN from the DTLB or JTLB for any data cache access 1W Write Back Phase one This phase is used internally by the processor to resolve all exceptions in preparation for the register file write 2W Write Back Phase two For register to register and load instructions the result is written back to the register file during the 2W stage Branch instructions perform no operation during this stage Figure 3 2 shows the activities occurring during each ALU pipeline stage for load store and branch instructions Stage an Decode ALU Load Store Instruction cache address decode Instruction cache array access Instruction address translation match Instrustion address translation read Instruction tag check Register operand fetch Instruction decode Operation stage 1 Operation stage 2 Write back to register file Data virtual address calculation Data cache address decode Data cache array access Data cache load align Address translation in JTLB stage 1 Address translation in JTLB stage 2 Data address translation match Data address translation read Data tag check Store align Data cache write Branch address calculation Figure 3 2 CPU Pipeline Activities The CPU Pipeline Chapter 3 Branch Delay The CPU pipeline has a branch delay of one cycle and a load delay of one cy
158. LB Instructions 4 23 CPU Exception Processing Chapter 5 How Exception Processing Works Exception Processing Registers Context Register 4 Bad Virtual Address Register BadVAddr 8 Count Register 9 Compare Register 11 Status Register 12 Status Register Format Status Register Modes and Access States Status Register Reset Cause Register 13 Exception Program Counter EPC Register 14 XContext Register 20 Error Checking and Correcting ECC Register 26 Cache Error CacheErr Register 27 Error Exception Program Counter Error EPC Register 30 Processor Exceptions Exception Types Reset Exception Process Cache Error Exception Process Soft Reset and NMI Exception Process General Exception Process Exception Vector Locations Priority of Exceptions Reset Exception Soft Reset Exception Nonmaskable Interrupt NMI Exception Address Error Exception TLB Exceptions TLB Refill Exception TLB Invalid Exception TLB Modified Exception Cache Error Exception Bus Error Exception Integer Overflow Exception Trap Exception 1 1 1 1 1 1 1 1 1 1 LJ J g A A g A A g g A A J J OPFWNHRFOOOANDUAWBWWWNHNHNHNKFOKOKOANDADAPBPWBWWNHH O O O O O A A A A A A A A A A coc cacc co vo NNUNUMNNB H m H H H H H H H p p H H p Table of Contents Table of Contents System Call Exception 5 26 Breakpoint Exception 5 27 Reserved Instruction Exception 5 28 Coprocessor Unusable Exception 5 29 Floating Point Exception 5 30 Interrupt Exception 5 31
159. Lbit 0 Chapter 5 CPU Exception Processing Chapter 5 Soft Reset or NMI Exception Reset Exception Status Random TLBENTRIES 1 Wired 0 Config Update 31 6 Undef 5 0 Status BEV lt 1 SR lt 1 ERL amp 1 BEV lt 1 SR lt 0O ERL lt 1 ErrorEPC e PC l PC amp OxFFFF FFFF BFCO 0000 dD E lt c 2 2 2 x i z o D 7 Q o g O Pe z Q o Note There is no indication from the processor to differentiate between NMI amp Soft Reset VET there must be a system level indication NMI Service Code Status bit 20 SR Reset Soft Reset amp NMI Servicing Guidelines SW Soft Reset Service Code Reset Service Code ERET Optional Figure 5 21 Reset Soft Reset amp NMI Exception Handling HW and Servicing Guidelines SW Floating Point Unit Chapter 6 Integrated Device Technology Inc This chapter describes the R4600 and R4700 floating point unit FPU features including the programming model instruction set and formats and the pipeline The FPU with associated system software fully conforms to the requirements of ANSI IEEE Standard 754 1985 IEEE Standard for Binary Floating Point Arithmetic In addition the MIPS architecture fully supports the recommendations of the standard and precise
160. Miss 12 11 No Secondary Cache Mode Load Miss 12 12 Store Miss 12 12 No Secondary Cache Mode Store Miss 12 12 Store Hit 12 13 No Secondary Cache Mode Store Hit 12 13 Uncached Loads or Stores 12 13 CACHE Operations 12 13 Load Linked Store Conditional Operation 12 14 Processor and External Request Protocols 12 14 Processor Request Protocols 12 14 Processor Read Request Protocol Steps 12 15 External Instruction Read Response Time 12 16 Instruction Read Latency Steps for System Clock 12 17 Notes on the Instruction Read Latency Steps 12 17 Example of Instruction Block Read With Zero Wait State 12 17 External Data Read Response Time 12 17 Data Read Latency Steps for System Clock 12 18 Notes on the Data Read Latency Steps 12 18 Example of Data Single Read With Zero Wait State 12 18 External Cycles for Read Latency 12 18 Processor Write Request Protocol 12 19 Processor Request and Flow Control 12 22 External Request Protocols 12 23 External Arbitration Protocol 12 24 External Read Request Protocol 12 24 External Null Request Protocol 12 25 External Write Request Protocol 12 26 Read Response Protocol 12 27 Data Rate Control 12 29 Read Data Pattern 12 29 Write Data Transfer Patterns 12 30 Independent Transmissions on the SysAD Bus 12 31 System Interface Endianness 12 31 System Interface Cycle Time 12 31 Release Latency 12 32 System Interface Commands and Data Identifiers 12 32 Command and Data Identifier Syntax 12 32 System Inte
161. RI rt 63 s 0 os Exceptions None CPU Instruction Set Details Doubleword Shift Left Appendix A DSLL32 DSLL32 Logical 32 31 26 25 21 20 16 15 11 10 6 5 SPECIAL 0 rt rd sa DSLL32 000000 00000 111100 6 5 5 5 5 6 Format DSLL32 rd rt sa Description Operation The contents of general register rt are shifted left by 32 sa bits inserting zeros into the low order bits The result is placed in register rd T s lt 1 sa GPR rd GPRIrt 63 s 0 os Exceptions None CPU Instruction Set Details Appendix A Doubl d DS RA Shift Right Arithmetic DS RA 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa DSRA 000000 00000 111011 6 5 5 5 5 6 Format DSRA rd rt sa Description The contents of general register rt are shifted right by sa bits sign extending the high order bits The result is placed in register rd Operation T s lt 0 sa GPR rd GPR rt g3 GPR rt 63 5 Exceptions None CPU Instruction Set Details Appendix A Doubl d Shift Right DS RAV Arithmetic Variable DS RAV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DSRAV 000000 00000 010111 6 5 5 5 5 6 Format DSRAV rd rt rs Description The contents of general register rt are shifted right by the number of bits specified by the low order six bits of general register r
162. Reverse Endian RE bit bit 25 reverses the endianness of the machine The processor can be configured as either little endian or big endian at system reset This selection is always used in Kernel and Supervisor modes and also in User mode when the RE bit is 0 Setting the RE bit to 1 inverts the User mode endianness Status Register Format Figure 5 5 shows the format of the Status register Table 5 3 which follows the figure describes the Status register fields 31 28 27 26 25 24 23 22 21 20 19 CU Cu3 Cu0 Figure 5 5 Status Register CPU Exception Processing Chapter 5 Field Description CU Controls the usability of each of the four coprocessor unit numbers CPO is always usable when in Kernel mode regardless of the setting of the CUp bit 1 gt usable O gt unusable FR Enables additional floating point registers O gt 16 registers 1 gt 32 registers RE Reverse Endian bit valid in User mode BEV Controls the location of TLB refill and general exception vectors 0 gt normal l gt bootstrap SR 1 gt Indicates a soft reset or NMI has occurred CH Hit tag match and valid state or miss indication for last CACHE Hit Invalidate Hit Write Back Invalidate Hit Write Back or Hit Set Virtual for a primary cache O miss l gt hit CE Contents of the ECC register set or modify the check bits of the caches when CE 1 see description of the
163. The contents of the FPU register specified by fs are interpreted in the specified format and are copied into the FPU register specified by fd The move operation is non arithmetic no IEEE 754 exceptions occur as a result of the instruction This instruction is valid only for single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt ValueFPR fs fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception FPU Instruction Set Details Appendix B M To FPU MTC1 Coprocessol 1 MTC1 31 26 25 21 20 16 15 11 10 0 COP1 MT rt fs 0 010001 00100 0000000 0000 5 5 5 11 Format MTC1 rt fs Description The contents of register rt are loaded into the FPU general register at location fs The contents of floating point register fs is undefined for time T of the instruction immediately following this load instruction The FR bit of the Status register specifies whether all 32 registers of the R4600 are addressable If FR equals zero MTC1 loads either the high or low half of the 16
164. a little endian machine this is the rightmost byte Computational Instructions Computational instructions include all of the arithmetic floating point operations performed by the FPU Figure B 2 shows the R Type instruction format used for computational operations R Type Register 31 26 25 21 20 16 15 COP1 fmt ft fs function 5 5 COP1 is a 6 bit operation code fmt is a 5 bit format specifier fs is a 5 bit sourcel register ft is a 5 bit source2 register fd is a 5 bit destination register function is a 6 bit function field The function field indicates the floating point operation to be performed Each floating point instruction can be applied to a number of operand formats The operand format for an instruction is specified by the 5 bit format field decoding for this field is shown in Table B 4 Code Mnemonic Size Format 16 S single Binary floating point 17 D double Binary floating point 18 Reserved 19 Reserved 20 Ww single 32 bit binary fixed point 21 L longword 64 bit binary fixed point 22 31 Reserved Table B 4 Format Field Decoding FPU Instruction Set Details Appendix B Table B 5 lists all floating point instructions Code Mnemonic Operation 5 0 0 ADD Add 1 SUB Subtract 2 MUL Multiply 3 DIV Divide 4 SQRT Square root 5 ABS Absolute value 6 MOV Move 7
165. a virtual index for the primary caches Only the lower two bits bits 1 and 0 are vAddr the high bit bit 2 is zero 0 Reserved Must be written as zeroes and returns zeroes when read Table 5 8 CacheErr Register Fields Error Exception Program Counter Error EPC Register 30 The ErrorEPC register is similar to the EPC register except that ErrorEPC is used on parity error exceptions It is also used to store the program counter PC on Reset Soft Reset and nonmaskable interrupt NMI exceptions The read write ErrorEPC register contains the virtual address at which instruction processing can resume after servicing an error This address can be e the virtual address of the instruction that caused the exception e the virtual address of the immediately preceding branch or jump in struction when this address is in a branch delay slot There is no branch delay slot indication for the ErrorEPC register CPU Exception Processing Chapter 5 Figure 5 11 shows the format of the ErrorEPC register ErrorEPC Register 63 0 ErrorEPC 64 Figure 5 11 ErrorEPC Register Format Processor Exceptions This section describes the processor exceptions it describes the cause of each exception its processing by the hardware and servicing by a handler software The types of exception with exception processing operations are described in the next section Exception Types This section gives sample excepti
166. aa aa SClock el cal Weg A Stag AES gh ap AA SysAD Bus Y addr Datao Datat Data2 Datas SysCmd Bus Read NData NData NData NEOD ValidOut Validin ExtRast Release RdRdy Figure 12 26 Block Read Transaction With One Wait State Data Rate Control The system interface supports a maximum data rate of one doubleword per cycle The data rate the processor can support is directly related to the rate at which the external agent can accept data Read Data Pattern The rate at which data is delivered to the processor can be determined by the external agent for example the external agent can drive data and assert Validin every n cycles instead of every cycle An external agent can deliver data at any rate it chooses but must not deliver data to the processor any faster than the processor is capable of receiving it The processor only accepts cycles as valid when ValidIn is asserted and the SysCmd bus contains a data identifier If the external agent sends more data items then requested e g a fifth doubleword of read response data with ValidIn asserted or the last data i e the fourth doubleword of a block read is not tagged as the last data item it is an error and the resulting actions of the processor for these cases will be undefined 12 29 System Interf
167. able 6 7 Table 6 8 Table 6 9 Table 6 10 Table 6 11 Table 6 12 Table 6 13 Table 6 14 Table 7 1 Table 7 2 Table 8 1 Table 8 2 Table 8 3 Table 8 4 Table 8 5 Table 8 6 Table Title 32 bit and 64 bit User Mode Segments 32 bit and 64 bit Supervisor Mode Segments 32 bit Kernel Mode Segments 64 bit Kernel Mode Segments Cacheability and Coherency Attributes TLB Page Coherency C Bit Values Index Register Field Descriptions Random Register Field Descriptions Mask Field Values for Page Sizes Wired Register Field Descriptions PRId Register Fields Config Register Fields Cache Tag Register Fields TLB Instructions CPO Exception Processing Registers Context Register Fields Status Register Fields Cause Register Fields Cause Register ExcCode Field XContext Register Fields ECC Register Fields CacheErr Register Fields Exception Vector Base Addresses Exception Vector Offsets Exception Priority Order List of Exception Flowcharts Floating Point Control Register Assignments FCRO Fields Control Status Register Fields Rounding Mode Bit Decoding Equations for Calculating Values in Single and Double Precision Floating Point Format Floating Point Format Parameter Values Minimum and Maximum Floating Point Values Binary Fixed Point Format Fields FPU Instruction Summary Load Move and Store Instructions FPU Instruction Summary Conversion Instruc tions FPU Instruction Summary Computational Instructi
168. ace SCycle SClock SysAD Bus SysCmd Bus NData NData Data NEOD gt ValidOut Validin ExtRqst Release Chapter 12 Figure 12 27 shows a read response with reduced data rate and with the system interface in slave state li lz2lala4a s5l6e l7 8 o 10 11 12 AA A AAS A A E ke ee ee a ae wo y Data0 i Datat y y Data2 y Data3 Figure 12 27 Read Response Reduced Data Rate System Interface in Slave State Write Data Transfer Patterns The write data pattern specifies the pattern the R4600 R4700 uses when writing a block to the external agent This pattern is specified through the mode bits A data pattern is a sequence of letters indicating the data and unused cycles that repeat to provide the appropriate data rate For example the data pattern DDxx specifies a repeatable data rate of two doublewords every four cycles with the last two cycles unused Table 12 4 lists the maximum processor data rate and the data pattern for each data rate Maximum Data Transmit Rate Block writes Data Pattern 1 Double 1 SClock Cycle DDDD 2 Doubles 3 SClock Cycles DDxDDx 1 Double 2 SClock Cycles DDxxDDxx 1 Double 2 SClock Cycles DxDxDxDx 2 Doubles 5 SClock Cycles DDxxxDDxxx 1 Double 3 SClock Cycles DDxxxxDDxxxx 1 Double 3 SClock Cycles DxxDxxDxxDxx 1 Double 4 SClock Cycles DDxxxxxxDDxxXxXXX 1 Double 4 SClock Cycles DxxxDxxxDxxxDxxx
169. ains a valid location in which to put the replacement TLB entry The contents of the EntryLo registers are undefined The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set TLB Invalid exception processing is shown in Figure 5 15 on page 13 Servicing A TLB entry is typically marked invalid when one of the following is true e a virtual address does not exist e the virtual address exists but is not in main memory a page fault e a trap is desired on any reference to the page for example to main tain a reference bit or during debug After servicing the cause of a TLB Invalid exception the TLB entry is located with TLBP TLB Probe and replaced by an entry with that entry s Valid bit set CPU Exception Processing Chapter 5 TLB Modified Exception This subsection explains the TLB modified exception Cause The TLB modified exception occurs when a store operation virtual address reference to memory matches a TLB entry that is marked valid but is not dirty and therefore is not writable This exception is not maskable Processing The common exception vector is used for this exception and the Mod code in the Cause register is set When this exception occurs the BadVAddr Context XContext and EntryHi regist
170. al then the program branches to the target address with a delay of one instruction Operation T target offset s offset 07 condition GPR rs GPR rt T 1 if condition then PC PC target endif Exceptions None CPU Instruction Set Details Appendix A BNEL Branch On Not Equal Likely BNEL 31 26 25 21 20 16 15 0 BNEL rs rt offset 010101 5 5 16 Format BNEL rs rt offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs and the contents of general register rt are compared If the two registers are not equal then the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T target lt offset offset 07 condition GPR rs GPR rt T 1 if condition then PC lt PC target else NullifyCurrentinstruction endif Exceptions None CPU Instruction Set Details Appendix A BREAK Breakpoint BREAK 31 26 25 65 0 SPECIAL code BREAK 000000 001101 6 20 6 Format BREAK Description A breakpoint trap occurs immediately and unconditionally transferring control to the exception handler The code field is availab
171. al address is extended with the contents of the 8 bit ASID field to form a unique virtual address Memory Management Chapter 4 32 bit Kernel Mode Kernel Space 3 kseg3 In Kernel mode when Status KX 0 and the most significant three bits of the 32 bit virtual address are 111 the kseg3 virtual address space is selected it is the current 2 9 byte 512Mbyte kernel virtual space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address Address Bit Status Register Is Segment Address Range Segment Values One Of These Values Name Size KSU EXL ERL KX A 63 62 005 1 xkuseg 0x0000 0000 0000 0000 1 Thyte through 27 bytes 0x0000 OOFF FFFF FFFF A 63 62 019 1 xksseg 0x4000 0000 0000 0000 1 Thyte through 27 bytes 0x4000 OOFF FFFF FFFF A 63 62 105 KSU 002 f1 xkphys 0x8000 0000 0000 0000 8 23 byte through spaces Qr OxBFFF FFFF FFFF FFFF A 63 62 119 EXL 1 1 xkseg OxC000 0000 0000 0000 244 bytes through or OxC000 OOFF 7FFF FFFF A 63 62 115 ERL 1 1 cksegO OxFFFF FFFF 8000 0000 512 A 61 31 1 through Mbytes OxFFFF FFFF 9FFF FFFF 22 bytes A 63 62 115 1 cksegl OxFFFF FFFF A000 0000 512 A 61 31 1 through Mbytes OxFFFF FFFF BFFF FFFF 22 bytes A 63 62 115 1 cksseg OxFFFF FFFF C000 0000 512 A 61 31 1 through Mbytes OxFFFF FFFF DFFF FFFF 22 bytes
172. al read request Therefore in response to an external read request the processor returns undefined data and a data identifier with its Erroneous Data bit SysCmd 5 set The Interrupt register is the only processor internal resource available for write access by an external request The Interrupt register is accessed by an external write request with an address of 000 on bits 6 4 of the SysAD bus The interrupt register is described in detail in Chapter 13 R4600 R4700 Processor Interrupts 12 42 R4600 R4700 Processor Chapter 13 dt Interrupts Integrated Device Technology Inc Introduction The R4600 R4700 processor supports the following interrupts six hardware interrupts one internal timer interrupt two software interrupts and one nonmaskable interrupt The processor takes an exception on any interrupt This chapter describes the six hardware and single nonmaskable interrupts A description of the software and the timer interrupts can be found in Chapter 5 CPU exception processing is also described in Chapter 5 Floating point exception processing is described in Chapter 6 Hardware Interrupts The six CPU hardware interrupts can be caused by external write requests to the R4600 R4700 or can be caused through dedicated interrupt pins These pins are latched into an internal register by the rising edge of SClock Nonmaskable Interrupt NMI The nonmaskable inte
173. al to physical address translation occurs in parallel with the cache access thus further increasing performance by allowing these two operations to occur simultaneously The tag holds a 24 bit physical address and valid bit and is parity protected The instruction cache is 64 bits wide and can be refilled or accessed in a single processor cycle Instruction fetches require only 32 bits per cycle for a peak instruction bandwidth of 700 MB sec 175MHz Sequential accesses take advantage of the 64 bit fetch to reduce power dissipation and cache miss refill writes 64 bits per cycle to minimize the cache miss penalty The line size is eight instructions 32 bytes to maximize performance Data Cache For fast single cycle data access the R4600 R4700 includes a 16KB on chip data cache that is two way set associative with a fixed 32 byte eight words line size Both the D cache and the I cache can be accessed each pipeline cycle thus the data bandwidth is 1400 MB sec 175 MHz in addition to the 700 MB sec instruction bandwidth The data cache is protected with byte parity and its tag is protected with a single parity bit It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access Overview Chapter 1 The normal write policy is writeback which means that a store to a cache line does not immediately cause memory to be updated This increases system performance by reducing bus traffic
174. alse the target address is branched to with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Because the internal condition signal is sampled during the previous instruction there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal NOTE See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Operation T 1 condition not COC z T target lt offset offset 02 T 1 if condition then PC amp PC target else NullifyCurrentInstruction endif Exceptions Coprocessor unusable exception Opcode Bit Encoding BCZFL Bit 3130 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0 BCoFL O 1 9 9 9 0 0 1 0 0 0 0 0 0 1 0 Bit 31 30 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 BciFL 9 1 9 9 9 1 0 1 0 0 0 0 0 0 1 0 Bit 391 30 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 Bc2FL 9 1 9 0 1 0 0 1 0 0 10 0 0 0 1 0 Xx A KX A z7 O DA de Y v a P BC sub opcode Branch condition CPU Instruction Set Details Appendix A BCzT Branch OnCoprocessorzTrue BCzT 31 26 25 21 20 1615 0 COPz BC BCT offset 0100xx 01000 00001 6 5 5 16 Format BCzT offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset s
175. alueFPR ft fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details Appendix B NEG fmt Floating Point Negate NEG fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd NEG 010001 00000 000111 5 5 5 5 6 Format NEG fmt fd fs Description The contents of the FPU register specified by fs are interpreted in the specified format and the arithmetic negation is taken polarity of the sign bit is changed The result is placed in the FPU register specified by fd The negate operation is arithmetic an NaN operand signals invalid operation This instruction is valid only for single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt Negate ValueFPR fs fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception FPU Instruction Set Details Appen
176. an R4600 R4700 processor data item to be loaded or stored set by the load or store instruction opcode Access types are defined in Appendix A Regardless of access type or byte ordering endianness the address given specifies the low order byte in the addressed field For a big endian configuration the low order byte is the most significant byte for a little endian configuration the low order byte is the least significant byte The access type together with the three low order bits of the address define the bytes accessed within the addressed doubleword which is shown in Table 2 1 on page 2 3 CPU Instruction Set Summary Chapter 2 Only the combinations shown in Table 2 1 are permissible other combinations cause address error exceptions See Appendix A for individual descriptions of CPU load and store instructions Access Type Low Order Bytes Accessed Mnemonic Address Value Bits Big endian Little endian 63 31 0 63 31 0 2 1 O Byte Byte Doubleword 7 O O JO 3 3 Septibyte 6 o lo lo 3 3 O JO 1 3 3 Sextibyte 5 O IO JO 3 3 O 1 JO 3 3 Quintibyte 4 O IO JO 3 3 O 1 1 3 3 Word 3 O JO JO 3 3 1 1O JO Triplebyte 2 0 O JIO 1 1O JO 41 5 514 4 5 6 E 6 5 4 1 O 1 5 7 7 5 EREEEE Halfword 1 oO O JO O 1 O 1 1 10 7
177. anch Instructions Table 1 13 MIPS 2 MIPS 3 Additional Arithmetic Instructions 3 operand R type 1 11 Table 1 14 MIPS 2 MIPS 3 Additional Shift Instructions 1 11 Table 1 15 MIPS 2 MIPS 3 Additional Exception Instructions 1 12 Table 1 16 MIPS 2 MIPS 3 Additional Coprocessor Instructions 1 12 Table 1 17 CPO Instructions 1 12 Table 1 18 System Control Coprocessor CPO Register Definitions 1 17 Table 1 19 Floating Point Latency Cycles 1 18 Table 1 20 System Interface Comparison Between R4400 PC and R4600 R4700 1 23 Table 1 21 Cache Comparison Between R4400 PC and R4600 R4700 1 24 Table 1 22 TLB Comparison Between R4400 PC and R4600 R4700 1 25 Table 1 23 Pipeline Comparison Between R4400 PC and R4600 R4700 1 25 Table 1 24 Coprocessor 0 Comparison Between R4400 PC and R4600 R4700 1 26 Table 1 25 Coprocessor 1 Comparison Between R4400 PC and R4600 R4700 1 26 Table 2 1 Byte Access within a Doubleword 2 3 Table 2 2 Multiply Divide Instruction Cycle Timing 2 4 Table 3 1 Pipeline Exceptions 3 6 Table 3 2 Pipeline Interlocks 3 6 xi List of Tables List of Tables Number Table 4 1 Table 4 2 Table 4 3 Table 4 4 Table 4 5 Table 4 6 Table 4 7 Table 4 8 Table 4 9 Table 4 10 Table 4 11 Table 4 12 Table 4 13 Table 4 14 Table 5 1 Table 5 2 Table 5 3 Table 5 4 Table 5 5 Table 5 6 Table 5 7 Table 5 8 Table 5 9 Table 5 10 Table 5 11 Table 5 12 Table 6 1 Table 6 2 Table 6 3 Table 6 4 Table 6 5 Table 6 6 T
178. and eliminating the bottleneck of waiting for each store operation to finish before issuing a subsequent memory operation Software can however select write through on a per page basis when it is appropriate such as for frame buffers Associated with the Data Cache is the store buffer When the R4600 R4700 executes a Store instruction this single entry buffer gets written with the store data while the tag comparison is performed If the tag matches then the data is written into the Data Cache in the next cycle that the Data Cache is not accessed the next non load cycle The store buffer allows the R4600 R4700 to execute a store every processor cycle and to perform back to back stores without penalty Write buffer Writes to external memory whether cache miss writebacks or stores to uncached or write through addresses use the on chip write buffer The write buffer holds up to four 64 bit address and data pairs or 1 cache line to be written back The entire buffer is used for a data cache writeback and allows the processor to proceed in parallel with memory update For uncached and write through stores the write buffer significantly increases performance over the R4000 family of processors R4600 R4700 Clocks The R4600 R4700 has a number of clocks for the user First there is the pipeline clock PClock This clock is used for the pipeline and pipeline related functions internal to the R4600 R4700 It is two times the MasterClock frequenc
179. apped Uncached 0 5 GB j nmappe Cached Address error Unmapped Address error Address error Note In 32 bit virtual addressing bit 31 is sign extended through bits 63 32 Failure results in an Address Error exception Figure 4 6 Kernel Mode Address Space Memory Management Chapter 4 Table 4 3 lists the characteristics of the 32 bit kernel mode segments and Table 4 4 lists the characteristics of the 64 bit kernel mode segments Address Bit Status Register Is Segment Address Range Segment Values One Of These Values Name Size KSU EXL ERL KX A 31 0 O kuseg 0x0000 0000 2 Gbytes through 23 bytes Ox7FFF FFFF KSU 005 A 31 29 1005 O ksegO 0x8000 0000 512 or through Mbytes Ox9FFF FFFF 229 bytes A 31 29 1019 EXD 1 O kseg1 0xA000 0000 512 through Mbytes OF OxBFFF FFFF 2 bytes A 31 29 1105 ERL 1 O ksseg 0xC000 0000 512 through Mbytes OxDFFF FFFF 27 bytes A 31 29 111 O kseg3 0xE000 0000 512 through Mbytes OxFFFF FFFF 229 bytes Table 4 3 32 bit Kernel Mode Segments 32 bit Kernel Mode User Space kuseg In Kernel mode when Status KX 0 and the most significant bit of the virtual address A31 is cleared the 32 bit kuseg virtual address space is selected it covers the full 231 bytes 2 Gbytes of the current user address space The virtual address is extended with the contents of the 8 bit AS
180. apter 2 Introduction 2 1 CPU Instruction Formats 2 1 Load and Store Instructions 2 2 Scheduling a Load Delay Slot 2 2 Defining Access Types 2 2 Computational Instructions 2 4 64 bit Virtual Address Operations with 32 bit operands 2 4 Cycle Timing for Multiply and Divide Instructions 2 4 Jump and Branch Instructions 2 5 Overview of Jump Instructions 2 5 Overview of Branch Instructions 2 5 Special Instructions 2 5 Exception Instructions 2 5 Coprocessor Instructions 2 5 Table of Contents Table of Contents The CPU Pipeline Chapter 3 Introduction CPU Pipeline Operation CPU Pipeline Stages lI Instruction Fetch Phase one 21I Instruction Fetch Phase two 1R Register Fetch Phase one 2R Register Fetch Phase two 1A Execution Phase one 2A Execution Phase two 1D Data Fetch Phase one 2D Data Fetch Phase two 1W Write Back Phase one 2W Write Back Phase two Branch Delay Load Delay Interlock and Exception Handling Exception Conditions Stall Conditions Slip Conditions R4600 R4700 Write Buffer WBWWWWWWWWWHWWWWWAWWWWW Memory Management Chapte Translation Lookaside Buffer TLB Hits and Misses Multiple Matches Address Spaces Virtual Address Space Physical Address Space Virtual to Physical Address Translation 32 bit Virtual Address Translation 64 bit Virtual Address Translation Operating Modes User Mode Operations 32 bit User Mode useg 64 bit User Mode xuseg Supervisor Mode Operations 32
181. are register is write only Figure 5 4 shows the format of the Compare register Compare Register 31 0 Compare 32 Figure 5 4 Compare Register Format CPU Exception Processing Chapter 5 Status Register 12 The Status register SR is a read write register that contains the operating mode interrupt enabling and the diagnostic states of the processor The following list describes the more important Status register fields Figure 5 5 show the format of the entire register including descriptions of the fields Some of the important fields include The 8 bit Interrupt Mask IM field controls the enabling of eight inter rupt conditions Interrupts must be enabled before they can cause the exception and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register For more information refer to the Interrupt Pending IP field of the Cause register IM 1 0 are the masks for the two soft ware interrupts while IM 7 2 correspond to Int 5 0 The 4 bit Coprocessor Usability CU field controls the usability of 4 possible coprocessors Regardless of the CUO bit setting CPO is al ways usable in Kernel mode For all other cases an instruction for or access to an unusable coprocessor causes an exception The 9 bit Diagnostic Status DS field Status 24 16 is used for self testing and checks the cache and virtual memory system The
182. at the memory location specified by the effective address are zero extended and loaded into general register rt Operation T vAddr lt offset s 4 offsetys o GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrpgijze 1 3 PDAddre 9 xor ReverseEndian mem LoadMemory uncached BYTE pAddr vAddr DATA byte vAddrs 9 xor BigEndianCPU 56 GPR rt 0 mem7 byte 8 byte Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details Appendix A LD Load Doubleword LD 31 26 25 21 20 16 15 0 LD base rt offset 110111 6 5 5 16 Format LD rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the 64 bit doubleword at the memory location specified by the effective address are loaded into general register rt If any of the three least significant bits of the effective address are non zero an address error exception occurs Operation T vAddr lt offset s offsets 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA mem lt LoadMemory uncached DOUBLEWORD pAddr vAddr DATA GPR rt mem Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set De
183. ata appears on the SysAD bus inclusive Standby Mode Operation Appendix D dt Integrated Device Technology Inc The R4600 R4700 provides a means to reduce the amount of power consumed by the internal core when the CPU would otherwise not be performing any useful operations This is known as Standby Mode and is discussed in this appendix Entering Standby Mode To enter Standby Mode first execute the WAIT instruction When the WAIT instruction finishes the W pipe stage if the SysAD bus is currently idle the internal clocks will shut down thus freezing the pipeline The PLL internal timer some of the input pin clocks IntB 0 NMI ExtRqst Reset and ColdReset and the output clocks TClock 1 0 RClock 1 0 SyncOut ModeClock and MasterOut will continue to run If the conditions are not correct when the WAIT instruction finishes the W pipe stage i e the SysAD bus is not idle the WAIT is treated as a NOP Once the CPU is in Standby Mode any interrupt including ExtRqst or Reset will cause the CPU to exit Standby Mode Coprocessor 0 Hazards Appendix E Integrated Device Technology Inc This appendix identifies the R4600 and R4700 Coprocessor 0 hazards In Table E 1 the number of instructions required between instruction A which places a value in a CPO register and instruction B which uses the same register as a
184. ate of RdRdy Signal for Read Requests As shown in Figure 12 3 WrRdy must be asserted for one clock cycle two cycles prior to the first address cycle of the processor write request to define the address cycle as the issue cycle cycle 5 in Figure 12 3 WrRdy does not need to be asserted during the issue cycle SCycle SClock SysAD Bus WrRdy Note WrRdy must be sampled LOW at the end of cycle 3 which is marked with an asterisk Figure 12 3 State of WrRdy Signal for Write Requests The processor repeats the address cycle for the request until the condi tions for a valid issue cycle are met After the issue cycle if the processor request requires data to be sent the data transmission begins There is only one issue cycle for any processor request The processor accepts external requests even while attempting to issue a processor request by releasing the system interface to slave state in response to an assertion of ExtRqst by the external agent 12 3 System Interface Chapter 12 Note that the rules governing the issue cycle of a processor request are strictly applied to determine the action the processor takes The processor either e completes the issuance of the processor request in its entirety before the external request is accepted or e releases the system interface to slave state without completing the is suance of the processor request In the latter case the processor
185. atenated with bits 15 through 0 of the immediate value to form a 32 bit sign extended value Load and Store Instructions In the R4600 R4700 as in the case of processors the instruction immediately following a load may use the loaded contents of the register In such cases the hardware interlocks requiring additional real cycles so scheduling load delay slots is still desirable although not required for functional code Two special instructions are provided in the R4600 R4700 implementation of the MIPS ISA Load Linked and Store Conditional These instructions are used in carefully coded sequences to provide one of several synchronization primitives including test and set bit level locks semaphores and sequencers event counts In the load and store descriptions the functions listed in Table A 2 are used to summarize the handling of virtual addresses and physical memory Function Meaning AddressTranslation Uses the TLB to find the physical address given the virtual address The function fails and an exception is taken if the required translation is not present in the TLB LoadMemory Uses the cache and main memory to find the contents of the word containing the specified physical address The low order two bits of the address and the Access Type field indicates which of each of the four bytes within the data word need to be returned If the cache is enabled for this access the entire word is returned and
186. ating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd This instruction is valid only for conversion from a single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity or NaN or the correctly rounded integer result is outside of 231 to 234 1 an Invalid operation exception is raised If Invalid operation is not enabled then no exception is taken and 231 1 is returned Operation T StoreFPR fd W ConvertFmt ValueFPR fs fmt fmt W Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B DIV fmt Floating Point Divide DIV fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt ft fs fd DIV 010001 000011 5 5 5 5 6 Format DIV fmt fd fs ft Description The contents of the floating point registers s
187. ations as described in R4000 User s Guide always hazardous detected and 1 cycle slip inserted two new cache algorithms added to C field for non coherent write through TagLo TagHi ECC CacheErr R4400SC bits implemented but mean ingless Only bits meaningful on R4400 PC implemented VCEI and VCED not possible TagLo as described in R4000 User s Guide bits 5 3 read writeable but otherwise unused bit 2 used for F bit Exceptions as described in R4000 User s Guide VCEI VCED and WATCH exceptions not implemented Index CACHE ops I Fill CACHE op use vAddr 3 4 to select line use vAddr 3 to select set vAddrj5 5 to select line of set Index Store Tag CACHE op PRId Status CE ignored Imp 0x04 TagLo P stored if Status CE set R4600 Imp 0x20 R4700 Imp 0x21 Table 1 24 Coprocessor 0 Comparison Between R4400 PC and R4600 R4700 Item R4400 PC R4600 R4700 Possible exception stall only for operands that can cause exceptions some simplifications in detection hard ware Floating point divide separate divide unit done in floating point adder Floating point square root done in floating point adder same Converts to from 64 bit integer uses unimplemented for integer oper ands results with more than 53 bits of precision handles full 64 bit operands and results Floating point registers Status FR enables all 32 fl
188. ations that are invalid for the given source operands Examples of these operations include IEEE Standard 754 specified functions implemented in software such as Remainder x REM y where y is O or x is infinite conversion of a floating point number to a decimal format whose value causes an overflow is infinity or is NaN and transcendental functions such as In 5 or cos 1 3 Refer to Appendix B for examples or for routines to handle these cases Trap Enabled Results The original operand values are undisturbed Trap Disabled Results The FPU sets the Invalid Operation Exception flag and a quiet NaN is delivered to the destination register Division by Zero Exception Z The Division by Zero exception is signaled on an implemented divide operation if the divisor is zero and the dividend is a finite nonzero number Software can simulate this exception for other operations that produce a signed infinity such as In 0 sec z 2 csc O or O Trap Enabled Results The result register is not modified and the source registers are preserved Trap Disabled Results The result when no trap occurs is a correctly signed infinity Overflow Exception O The Overflow exception is signaled when the magnitude of the rounded floating point result with an unbounded exponent range is larger than the largest finite number of the destination format This exception also sets the Inexact exception and Flag bits Trap Enabled Results The result
189. bit Supervisor Mode User Space suseg 32 bit Supervisor Mode Supervisor Space sseg 64 bit Supervisor Mode User Space xsuseg 64 bit Supervisor Mode Current Supervisor Space xsseg 64 bit Supervisor Mode Separate Supervisor Space csseg 4 8 Kernel Mode Operations 4 8 32 bit Kernel Mode User Space kuseg 4 10 32 bit Kernel Mode Kernel Space 0 kseg0 4 10 32 bit Kernel Mode Kernel Space 1 kseg1 4 10 32 bit Kernel Mode Supervisor Space ksseg 4 10 32 bit Kernel Mode Kernel Space 3 kseg3 4 11 64 bit Kernel Mode User Space xkuseg 4 11 64 bit Kernel Mode Current Supervisor Space xksseg 4 11 64 bit Kernel Mode Physical Spaces xkphys 4 12 64 bit Kernel Mode Kernel Space xkseg 4 12 64 bit Kernel Mode Compatibility Spaces ckseg1 0 cksseg ckseg3 4 12 NNNNOOTUBPBWWNNKH REE Ee OHNAMTAAWHBHWNNNNNNNNHE ALAA ADR ADA AD A A DB DB A BDO Table of Contents Table of Contents System Control Coprocessor 4 12 Format of a TLB Entry 4 13 CPO Registers 4 15 Index Register 0 4 16 Random Register 1 4 16 4 17 4 17 4 18 4 18 EntryLoO 2 and EntryLol 3 Registers PageMask Register 5 Wired Register 6 EntryHi Register CPO Register 10 Processor Revision Identifier PRId Register 15 4 19 Config Register 16 4 19 Load Linked Address LLAddr Register 17 4 20 Cache Tag Registers TagLo 28 and TagHi 29 4 21 Virtual to Physical Address Translation Process 4 22 TLB Misses 4 23 T
190. bits 30 and 31 differ 2 s complement overflow The destination register rd is not modified when an integer overflow exception occurs Operation T temp GPR rs GPR rt GPR rd temp31 temp34 0 Exceptions Integer overflow exception A 122 CPU Instruction Set Details Appendix A SU B U Subtract Unsigned SU B U 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SUBU 000000 00000 100011 6 5 5 5 5 6 Format SUBU rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs to form a result The result is placed into general register rd The operands must be valid sign extended 32 bit values The only difference between this instruction and the SUB instruction is that SUBU never traps on overflow No integer overflow exception occurs under any circumstances Operation T temp GPR rs GPR rt GPR rd temp3 temp34 0 Exceptions None A 123 CPU Instruction Set Details Appendix A SW Store Word SW 31 26 25 21 20 16 15 0 SW base rt offset 101011 6 5 5 16 Format Description SW rt offset base The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of general register rt are stored at the memory location specified by the effective address Operation If either of the tw
191. ble and Flag fields of the Control Status register The Flag bits implement IEEE 754 exception status flags and the Cause and Enable bits implement exception handling Control Status Register FS Bit When the FS bit is set denormalized results are flushed to O instead of causing an unimplemented operation exception Control Status Register Condition Bit When a floating point Compare operation takes place the result is stored at bit 23 the Condition bit to save or restore the state of the condition line The C bit is set to 1 if the condition is true the bit is cleared to O if the condition is false Bit 23 is affected only by compare and Move Control To FPU instructions Control Status Register Cause Flag and Enable Fields Figure 6 5 on page 6 5 illustrates the Cause Flag and Enable fields of the Control Status register Cause Bits Bits 17 12 in the Control Status register contain Cause bits as shown in Figure 6 5 on page 6 5 which reflect the results of the most recently executed instruction The Cause bits are a logical extension of the CPO Cause register they identify the exceptions raised by the last floating point operation and raise an interrupt or exception if the corresponding enable bit is set If more than one exception occurs on a single instruction each appropriate bit is set The Cause bits are written by each floating point operation but not by load store or move operations The Unimplemented Operation E
192. ble 11 1 Cache States Primary Cache States Each primary data cache line is normally in one of the following states e invalid e dirty exclusive Each primary instruction cache line is in one of the following states e invalid e valid Cache Line Ownership The processor is the owner of a cache line when it is in the dirty exclusive state and is responsible for the contents of that line There can only be one owner for each cache line The ownership of a cache line is set and maintained through the rules described below e A processor assumes ownership of the cache line if the state of the primary cache line is dirty exclusive A processor that owns a cache line is responsible for writing the cache line back to memory if the line is replaced during the execution of a Write back or Write back Invalidate cache instruction if the line is in a write back page The Cache instruction is explained in Appendix A e Memory always owns clean cache lines e The processor gives up ownership of a cache line when the state of the cache line changes to invalid Therefore based on these rules and that any valid data cache line is in the Dirty Exclusive state under normal operating conditions the processor is considered to be the owner of the cache line Cache Write Policy The R4600 R4700 processor manages its primary data cache by using either a write back or a write through policy on a per page basis Ina write back cache the data is not written back t
193. buffered version of TClock aligned with MasterClock Alignment error of this buffered TClock is the sum of the maximum delay mismatch of the matched delay clock buffers and the maximum clock jitter of TClock As the global system clock for the discrete logic that forms the external agent the buffered version of TClock clocks registers that sample processor outputs as well as clocking the registers that drive the processor inputs The transmission time for a signal from the processor to an external agent composed of discrete CMOS logic devices can be calculated from the following equation Transmission Time TClock period tpo for R4600 R4700 External Sample Register Setup Time Max External Clock Buffer Delay Mismatch Max Clock Jitter for R4600 R4700 Internal Clocks Max Clock Jitter for TClock 10 8 Clock Interface Chapter 10 Figure 10 8 is a block diagram of a system without phase lock employing the R4600 R4700 processor and an external agent composed of both a gate array and discrete CMOS logic devices MasterClock R4600 R4700 MasterClock SysCmd SysAD Control ys Gate SyncOut Array Syncin RClock TClock Sample Registers Figure 10 8 Gate Array and CMOS System Without Phase Lock Us ing the R4600 R4700 Processor The transmission time for a signal from an external agent composed of
194. can be memory mapped using only one TLB entry The second mechanism controls the replacement algorithm when a TLB miss occurs The R4600 R4700 provides a random replacement algorithm to select a TLB entry to be written with a new mapping however the processor provides a mechanism whereby a system specific number of mappings can be locked into the TLB and thus avoid being randomly replaced This facilitates the design of real time systems by allowing deterministic access to critical software The joint TLB also contains information to control the cache coherency protocol for each page Specifically each page has attribute bits to determine whether the coherency algorithm is uncached non coherent write back non coherent write through write allocate non coherent write through no write allocate sharable exclusive or update Non coherent write back is typically used for both code and data on the R4600 R4700 the write through modes support more efficient frame buffer accesses than the R4000 family The coherent modes are supported for R4000 compatibility and generate different transaction types on the system interface cache coherency is not supported however Overview Chapter 1 Instruction TLB The R4600 R4700 also incorporates a 2 entry instruction TLB Each entry maps a 4KB page The instruction TLB improves performance by allowing instruction address translation to occur in parallel with data address translation When a miss
195. cause SC to fail so persistent exceptions must be avoided If either of the two least significant bits of the effective address are non zero an address error exception takes place CPU Instruction Set Details Appendix A Operation T vAddr lt offset s offsets 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 pBAddrs 9 xor ReverseEndian 0 mem lt LoadMemory uncached WORD pAddr vAddr DATA byte vAddro 9 xor BigEndianCPU 0 GPR rt mem 8 byte I MEM31 8 byte 8 byte LLbit 1 SyncOperation Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details Appendix A LLD Load Linked Doubleword LLD 31 26 25 21 20 16 15 0 LLD base rt offset 110100 6 5 5 16 Format LLD rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The contents of the doubleword at the memory location specified by the effective address are loaded into general register rt This instruction implicitly performs a SYNC operation all loads and stores to shared memory fetched prior to the LLD must access memory before the LLD and loads and stores to shared memory fetched subsequent to the LLD must access memory after the LLD The processor begins checking the ac
196. ccuracy Exponent overflow oO O I O I Normalized exponent gt Emax Division by zero Z Z Z Zero is exponent E j 1 mantissa 0 Overflow on convert Vv E E Source out of integer range Signaling NaN source v fv v Signaling NaN source produces quiet NaN result Invalid operation V V V 0 0 etc Exponent underflow U E E Normalized exponent lt Enin Denormalized source None E E Exponent E 1 and mantissa lt gt 0 Note The IEEE Standard 754 specifies an inexact exception on overflow only if the overflow trap is disabled Table 7 2 FPU Exception Causing Conditions FPU Exceptions The following sections describe the conditions that cause the FPU to generate each of its exceptions and details the FPU response to each exception causing condition Inexact Exception I The FPU generates the Inexact exception if the rounded result of an operation is not exact or if it overflows The FPU usually examines the operands of floating point operations before execution actually begins to determine based on the exponent values of the operands if the operation can possibly cause an exception If there is a possibility of an instruction causing an exception trap the FPU uses a coprocessor stall to execute the instruction It is impossible however for the FPU to predetermine if an instruction will produce an inexact result If Inexact exception traps are enabled the FPU uses the coprocessor stall mechanism to execute al
197. ceed it must obtain the cache line that contains the data element to be loaded from the external agent If the new cache line replaces a current cache line with a W bit set the current cache line must be written back The processor examines the coherency attribute cache coherency attributes are described in Chapter 11 in the TLB entry for the page that contains the requested cache line and executes the following request e The coherency attribute is noncoherent the processor issues a non coherent read request Table 12 1 shows the actions taken on a load miss to primary cache Page Attribute State of Data Cache Line Being Replaced Clean Invalid Dirty W 1 Noncoherent NCR NCR W NCR Processor noncoherent block read request NCR W Processor noncoherent block read request followed by processor block write request Table 12 1 Load Miss to Primary Cache 12 11 System Interface Chapter 12 No Secondary Cache Mode Load Miss In no secondary cache mode if the cache line must be written back on a load miss the read request is issued and completed before the write request is handled The processor takes the following steps 1 The processor issues a noncoherent read request for the cache line that contains the data element to be loaded 2 The processor then waits for an external agent to provide the read response 3 The processor will restart the pipeline after the first doubleword the
198. ceptions Chapter 7 Exception Types 7 1 Exception Trap Processing 7 2 Flags 7 2 FPU Exceptions 7 3 Inexact Exception I 7 3 Invalid Operation Exception V 7 3 Division by Zero Exception Z 7 4 Overflow Exception O 7 4 Underflow Exception U 7 4 Unimplemented Instruction Exception E 7 5 Saving and Restoring State 7 5 Trap Handlers for IEEE Standard 754 Exceptions 7 6 Processor Signal Descriptions Chapter 8 Introduction 8 1 System Interface Signals 8 2 Clock Control Interface Signals 8 3 Interrupt Interface Signals 8 4 JTAG Interface Signals 8 4 Initialization Interface Signals 8 5 Initialization Interface Chapter 9 Introduction 9 1 Functional Overview 9 1 Reset and Initialization Signal Descriptions 9 1 Power on Reset 9 3 Cold Reset 9 3 Warm Reset 9 3 Initialization Sequence 9 4 Boot Mode Settings 9 6 Clock Interface Chapter 10 Introduction 10 1 Signal Terminology 10 1 Basic System Clocks 10 1 MasterClock 10 1 MasterOut 10 2 SyncIn SyncOut 10 2 PClock 10 2 SClock 10 2 TClock 10 2 RClock 10 2 System Timing Parameters 10 3 Alignment to SClock 10 3 Alignment to MasterClock 10 3 Phase Locked Loop PLL 10 3 PLL Components and Operation 10 4 Passive Components 10 4 Connecting Clocks to a Phase Locked System 10 5 Connecting Clocks to a System without Phase Locking 10 6 Connecting to a Gate Array Device 10 6 Connecting to a CMOS Logic System 10 8 vii Table of Contents Table of Contents Cache Organization Ope
199. cessed doubleword for modification by other processors and devices Load Linked Doubleword and Store Conditional Doubleword can be used to atomically update memory locations L1 LLD T1 TO ADD T2 T1 1 SCD T2 TO BEQ T2 O L1 NOP This atomically increments the word addressed by TO Changing the ADD to an OR changes this to an atomic bit set The operation of LLD is undefined if the addressed location is uncached and for synchronization between multiple processors the operation of LLD is undefined if the addressed location is noncoherent A cache miss that occurs between LLD and SCD may cause SCD to fail so no load or store operation should occur between LLD and SCD otherwise the SCD may never be successful Exceptions also cause SCD to fail so persistent exceptions must be avoided This instruction is available in User mode and it is not necessary for CPO to be enabled If any of the three least significant bits of the effective address are non zero an address error exception takes place CPU Instruction Set Details Appendix A Operation T vAddr lt offset 5 offsetys 9 GPR base pAddr uncached AddressTranslation vAddr DATA mem LoadMemory uncached DOUBLEWORD pAddr vAddr DATA GPR rt mem LLbit 1 SyncOperation Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details
200. chnology Inc Introduction This chapter describes the clock signals clocks used in the R4600 R4700 processor and the processor status reporting mechanism The subject matter includes basic system clocks system timing parameters connecting clocks to a phase locked system connecting clocks to a system without phase locking and processor status outputs Signal Terminology The following terminology is used in this chapter and book when describing signals e Rising edge indicates a low to high transition e Falling edge indicates a high to low transition e Clock to Q delay is the amount of time it takes for a signal to move from the input of a device clock to the output of the device Q Figure 10 1 and Figure 10 2 illustrate these terms single clock cycle q high to low transition low to high transition Figure 10 1 Signal Transitions data out data in i ee e clock input Clock to Q delay gt Figure 10 2 Clock to Q Delay Basic System Clocks The various clock signals used in the R4600 R4700 processor are described below starting with MasterClock upon which the processor bases all internal and external clocking Note All output clocks will have approximately a 50 duty cycle the jitter and any difference in rise and or fall times MasterClock The processor bases all internal and external clocking on the single MasterClock input signal The processor generates the clock ou
201. ck to sample data at the system interface and to clock data into the processor system interface output registers The first rising edge of SClock after ColdReset is deasserted is aligned with the first rising edge of MasterClock TClock TClock transmit clock clocks the output registers of an external agent and can be a global system clock for any other logic in the external agent TClock is identical to SClock The edges of TClock align precisely with the edges of SClock and TClock can also be aligned with MasterClock when Synclin is properly connected to SyncOut RClock The external agent uses RClock receive clock to clock its input registers The processor generates RClock at the same frequency as SClock although RClock leads TClock and SClock by 25 percent of SClock cycle time 10 2 Clock Interface Chapter 10 Figure 10 3 shows the clocks for a PClock to SClock division by 2 Cycle 1 2 3 4 MasterClock a eN S os PS _ IMckHigh tMckLow tuckP MasterOut oa ek TO ieee ee ees ho PClock Nee de am A ee ae ok m SClock i eee et a Oe a TClock a a ae eg i wooo o o RClock Nest Ge Medea TALIS poly SysAD Driven Y Y D Y y D X X D X i D i tbo SysAD Received omy TE ae Figure 10 3 Processor Clocks PClock to SClock Division by 2 System Timing Parameters As shown in Figure 10 3 data provided to the processor must be stable a minimum of tps nanoseconds ns bef
202. cle The one cycle branch delay is a result of the branch decision logic operating during the 1A pipeline phase of the branch instruction This allows the branch target address calculated in the previous phase to be used for the instruction access in the following 11 phase The pipeline will begin the fetch of the branch path as well as the fall through path in the cycle following the delay slot After the branch decision is made the processor will continue with the fetch of either the branch path for a taken branch or the fall through path for the non taken branch Figure 3 3 illustrates the branch delay One Cycle One Cycle One Cycle One Cycle One Cycle lI 21 1R 2R 1A 2A 1D 2D 1w 2w x u a R x 1 24 1p 2p iw aw XK u a ir x 14 2a 1p 2p u a ir 2r 14 2a ip 2p Branch and fall through address calculated Address selection made Figure 3 3 CPU Pipeline Branch Delay Load Delay The completion of a load at the end of the 2D pipeline phase produces an operand that is available for the 1A pipeline phase of the instruction following the load delay slot Figure 3 4 shows the load delay of one pipeline cycle One Cycle One Cycle One Cycle One Cycle One Cycle lI 21I IR 2R 1A 2A 1D 2D 1W 2W 2A 1D 2D y 1a 2A Load Delay Figure 3 4 CPU
203. concatenation Replication of bit value x into a y bit string Note x is always a single bit Selection of bits y through z of bit string x Little endian bit notation is always used If y is less than z this expression is an empty zero length bit string 2 s complement or floating point addition 2 s complement or floating point subtraction 2 s complement or floating point multiplication 2 s complement integer division 2 s complement modulo 2 s complement less than comparison Bit wise logical AND Bit wise logical OR Bit wise logical XOR Bit wise logical NOR GPR General Register x The content of GPR 0 is always zero Attempts to alter the content of GPR 0 have no effect CPR z x Coprocessor unit z general register x CCR z x Coprocessor unit z control register x coc Z Coprocessor unit z condition signal BigEndianMem Big endian mode as configured at reset 0 Little 1 gt Big Specifies the endi anness of the memory interface see LoadMemory and StoreMemory and the en dianness of Kernel and Supervisor mode execution ReverseEndian BigEndianCPU Signal to reverse the endianness of load and store instructions in User mode effected by setting the RE bit of the Status register Thus ReverseEndian may be computed as SRos and User mode The endianness for load and store instructions 0 Little 1
204. conditions are met e No processor read request is pending e The signal WrRdy has been asserted for one clock cycle two cycles before the issue cycle 12 8 System Interface Chapter 12 The R4600 R4700 has added two new modes to enhance the throughput of non block writes These modes allow for 2 cycle throughput on back to back non block writes The actual protocol is discussed in the write protocol section of this chapter The external agent must be capable of accepting a processor write request in these modes under the same conditions as for the R4x00 compatibility mode except as explained in the protocol section External Requests External requests include read write and null requests as shown in Figure 12 9 This section also includes a description of read response a special case of an external request R4600 R4700 External Agent External Requests e Read Write e Null Figure 12 9 External Requests Read request asks for a word of data from the processor s internal resource Write request provides a word of data to be written to the processor s internal resource Null request requires no action by the processor it provides a mecha nism for the external agent to return control of the system interface to the master state without affecting the processor The processor controls the flow of external requests through the arbi tration signals ExtRqst and Release as shown in Figure 12 10 The ext
205. contain a coherent or noncoherent last data cycle indication 6 After the data cycle is issued the write request is complete and the external agent sets the SysCmd and SysAD buses to a tri state allowing the system interface to return to master state Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively 12 26 System Interface Chapter 12 SCycle SClock SysAD Bus SysCmd Bus Write NEOD ValidOut Validin ExtRast Release External write requests are only allowed to write a word of data to the processor Processor behavior in response to an external write request for any data element other than a word is undefined Master gt lt Slave gt Master gt li l lz2el la l4ls5s s 7z7z s8 age aos ae 142 Ay ee NE a a eg NY eG Addr Data0 a a Figure 12 23 External Write Request with System Interface initially Master State Read Response Protocol An external agent must return data to the processor in response to a processor read request by using a read response protocol A read response protocol consists of the following steps 1 The external agent waits for the processor to perform an uncompelled change to slave state 2 The external agent returns the data through a single data cycle or a series of data cycles 3 After the last data cycle is
206. contents of the FPU register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the FPU register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round toward zero RM 1 This instruction is valid only for conversion from a single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity or NaN or the correctly rounded integer result is outside of 231 to 231 1 an Invalid operation exception is raised If Invalid operation is not enabled then no exception is taken and 231 1 is returned Operation T StoreFPR fd W ConvertFmt ValueFPR fs fmt fmt W Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception
207. cribes the CacheErr register fields CacheErr Register 31 30 29 28 27 26 25 24 23 22 21 eR ec eo er es ee e8 0 0 0 CPU Exception Processing Chapter 5 Field Description Type of reference ER 0 gt instruction 1 gt data Cache level of the error EC 0 gt primary 1 gt reserved Indicates if a data field error occurred ED 0 gt no error 1 gt error Indicates if a tag field error occurred ET 0 no error 1 gt error Indicates the error occurred accessing processor managed resources in response to an external request 0 internal reference ES 1 gt external reference Since the R4600 R4700 doesn t have any external events that would look in a cache which is the only processor managed resource this bit would not be set under normal operating conditions EE Set if the error occurred on the SysAD bus Taking a cache error exception sets clears this bit EB Set if a data error occurred in addition to the instruction error indicated by the remainder of the bits If so this requires flushing the data cache after fixing the instruction error Physical address 21 3 of the reference that encountered the error SIdx The address may not be the same as the address of the double word in error but it is sufficient to locate that double word in the secondary cache Virtual address 13 12 of the double word in error PIdx To be used with SIdx to construct
208. ction formats are shown in Figure A 1 I Type Immediate 31 26 25 2120 1615 op rs rt immediate J Type Jump 31 26 25 op target R Type Register 31 26 25 2120 16 15 1110 65 0 o Ls _ rt shamt tunct_ op 6 bit operation code rs 5 bit source register specifier rt 5 bit target source destination or branch condition immediate 16 bit immediate branch displacement or address displacement target 26 bit jump target address rd 5 bit destination register specifier shamt 5 bit shift amount funct 6 bit function field Instruction Notation Conventions In this appendix all variable subfields in an instruction format such as rs rt immediate etc are shown in lowercase names For the sake of clarity we sometimes use an alias for a variable subfield in the formats of specific instructions For example we use rs base in the format for load and store instructions Such an alias is always lower case since it refers to a variable subfield Figures with the actual bit encoding for all the mnemonics are located at the end of this Appendix and the bit encoding also accompanies each instruction In the instruction descriptions that follow the Operation section describes the operation performed by each instruction using a high level language notation CPU Instruction Set Details Appendix A Special symbols used in the notation are described in Table A 1 Meaning Assignment Bit string
209. ctivities CPU Pipeline Branch Delay CPU Pipeline Load Delay Correspondence of Pipeline Stage to Interlock Condition Exception Detection Data Cache Miss Instruction cache miss Overview of a Virtual to Physical Address Translation 32 bit Virtual Address Translation 64 bit Virtual Address Translation User Mode Virtual Address Space Supervisor Mode Virtual Address Space Kernel Mode Address Space CPO Registers and the TLB Format of a TLB Entry Fields of the PageMask and EntryHi Registers Fields of the EntryLoO and EntryLo Registers Index Register Random Register Wired Register Boundary Wired Register Processor Revision Identifier Register Format Config Register Format LLAddr Register Format TagLo and TagHi Register P cache Formats TLB Address Translation XV 4 19 4 19 4 21 4 21 4 22 List of Figures Number Figure 5 1 Figure 5 2 Figure 5 3 Figure 5 4 Figure 5 5 Figure 5 6 Figure 5 7 Figure 5 8 Figure 5 9 Figure 5 10 Figure 5 11 Figure 5 12 Figure 5 13 Figure 5 14 Figure 5 15 Figure 5 16 Figure 5 17 Figure 5 18 Figure 5 19 Figure 5 20 Figure 5 21 Figure 6 1 Figure 6 2 Figure 6 3 Figure 6 4 Figure 6 5 Figure 6 6 Figure 6 7 Figure 6 8 Figure 6 9 Figure 7 1 Figure 8 1 Figure 9 1 Figure 9 2 Figure 9 3 List of Figures Figure Title Context Register Format BadVAddr Register Format Count Register Format Compare Register Format Status Register
210. d Floating Point Unit Chapter 6 Table 6 1 lists the assignments of the FCRs FCR Number Use FCRO Coprocessor implementation and revision register FCR1 to FCR30 Reserved FCR31 Rounding mode cause trap enables and flags Table 6 1 Floating Point Control Register Assignments Implementation and Revision Register FCRO The read only Implementation and Revision register FCRO specifies the implementation and revision number of the FPU This information can determine the coprocessor revision and performance level and can also be used by diagnostic software Figure 6 3 shows the layout of the register Table 6 2 which follows the figure describes the Implementation and Revision register FCRO fields Implementation Revision Register FCRO 1615 87 Figure 6 3 Implementation Revision Register Field Description Imp Implementation number R4600 0x20 R4700 0x21 Rev Revision number in the form of y x 0 Reserved Table 6 2 FCRO Fields The revision number is a value of the form y x where e yis a major revision number held in bits 7 4 e xis a minor revision number held in bits 3 0 The revision number distinguishes some chip revisions however there is no guarantee that changes to the chip are necessarily reflected by the revision number or that changes to the revision number necessarily reflect real chip changes For this reason revision number values a
211. d partial doubleword word or partial word write re quests e data that is returned in response to a processor noncoherent block read request or a processor doubleword partial doubleword word or partial word read request e data that is associated with external write requests e data that is returned in response to an external read request 12 36 System Interface Chapter 12 Data Identifier Bit Definitions SysCmd 7 marks the last data element and SysCmd 6 indicates whether or not the data is response data for both processor and external coherent and noncoherent data identifiers Response data is data returned in response to a read request SysCmd 5 indicates whether or not the data element is error free Erro neous data contains an uncorrectable error and is returned to the processor forcing a bus error The processor delivers data with the good data bit deasserted if a primary parity error is detected for a transmitted data item SysCmd 4 indicates to the processor whether to check the data and check bits for this data element SysCmd 3 is reserved for external data identifiers SysCmd 4 3 are reserved for noncoherent processor data identifiers SysCmd 2 0 are reserved for noncoherent data identifiers Table 12 14 lists the encoding of SysCmd 7 3 for processor data identi fiers SysCmd 7 Last Data Element Indication 0 Last data element 1 Not the last data elemen
212. d Point Format 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd CEIL L 010001 00000 001010 6 5 5 5 5 6 Format CEIL L fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to 2 This instruction is valid only for conversion from single or double precision floating point formats When the FR bit in the Status register equals one both even and odd register numbers are valid When the source operand is an Infinity NaN or the correctly rounded integer result is outside of 283 to 263 1 the Invalid operation exception is raised If the Invalid operation is not enabled then no exception is taken and 2 is returned Operation T StoreFPR fd L ConvertFmt ValueFPR fs fmt fmt L Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B CEIL W fmt ceing Singe CEIL W fmt Fixed Point Format 31 26 25 21
213. d Table C 2 show data cache and instruction cache opera tions information A detailed explanation of the Fill_I equation follows Table C 2 Cache Operations Timing Appendix C Code Name Number of Cycles o Index_Writeback_Invalidate_D 10 cycles if the cache line is clean 12 cycles if the cache line is dirty Writeback 1 Index_Load_Tag_D 7 cycles Index_Store_Tag_D 8 cycles 3 Create_Dirty_Exclusive_D 10 cycles for a cache hit Hit_Invalidate_D 13 cycles for a cache miss if the cache line is clean 15 cycles for a cache miss if the cache line is dirty Writeback 7 cycles for a cache miss 9 cycles for a cache hit Hit_Writeback_Invalidate_D Hit_Writeback_D 7 cycles for a cache miss 12 cycles for a cache hit if the cache line is clean 14 cycles for a cache hit if the cache line is dirty Writeback 7 cycles for a cache miss 10 cycles for a cache hit if the cache line is clean 14 cycles for a cache hit if the cache line is dirty Writeback Note 1Code number corresponds to the code column of the CACHE instruction in Appendix A Table C 1 Primary Data Cache Operations Cache Operations Timing Appendix C Code Name Number of Cycles 0 Index_Invalidate_I 7 cycles 1 Index_Load_Tag_I 7 cycles 2 Index_Store_Tag_I 8 cycles 3 n a n a 4 Hit_Invalidate_I 7 cycles for a cache miss 9 cycles
214. d exception TLB modification exception Bus error exception Address error exception FPU Instruction Set Details Appendix B SQRT fmt Saeko SQRT fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd SQRT 010001 00000 000100 5 5 5 5 6 Format SQRT fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified format and the positive arithmetic square root is taken The result is rounded as if calculated to infinite precision and then rounded to the specified format according to the current rounding mode If the value of fs corresponds to O the result will be O The result is placed in the floating point register specified by fd This instruction is valid only for single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt SquareRoot ValueFPR fs fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Inexact exception FPU Instruction Set Details Appendix B SUB fmt
215. d into general register rt No integer overflow exception occurs under any circumstances The rs operand must be valid sign extended 32 bit values The only difference between this instruction and the ADDI instruction is that ADDIU never causes an overflow exception Operation T temp GPR rs immediate 5 8 immediate s 0 GPRIrt tempgy temps1 0 Exceptions None CPU Instruction Set Details Appendix A ADDU Add Unsigned ADDU 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 ADDU 000000 00000 100001 6 5 5 5 5 6 Format ADDU rd rs rt Description The contents of general register rs and the contents of general register rt are added to form the result The result is placed into general register rd No overflow exception occurs under any circumstances The source operands must be valid sign extended 32 bit values The only difference between this instruction and the ADD instruction is that ADDU never causes an overflow exception Operation T temp GPR rs GPR rt GPR rd temp31 temp34 0 Exceptions None CPU Instruction Set Details Appendix A AND And AND 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 AND 000000 00000 100100 6 5 5 5 5 6 Format AND rd rs rt Description The contents of general register rs are combined with the contents of general register rt i
216. d is not modified when an integer overflow exception occurs Operation T GPR rd GPRirs GPR rt Exceptions Integer overflow exception CPU Instruction Set Details Appendix A DSU B U Doubleword Subtract Unsigned DSU B U 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DSUBU 000000 00000 101111 6 5 5 5 5 6 Format DSUBU rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs to form a result The result is placed into general register rd The only difference between this instruction and the DSUB instruction is that DSUBU never traps on overflow No integer overflow exception occurs under any circumstances Operation i GPR rd lt GPR rs GPR rt Exceptions None CPU Instruction Set Details Appendix A E R ET Exception Return E R ET 31 26 2524 65 0 1 19 Format ERET Description ERET is the R4600 instruction for returning from an interrupt exception or error trap Unlike a branch or jump instruction ERET does not execute the next instruction ERET must not itself be placed in a branch delay slot If the processor is servicing an error trap SR 1 then load the PC from the ErrorEPC and clear the ERL bit of the Status register SR9 Otherwise SR 0 load the PC from the EPC and clear the EXL bit of the Status register SRj An ERET executed betwe
217. d value is Infinity or a NaN e Operations which are invalid for their format for instance CVT S S Denormalized and NaN operands are only trapped if the instruction is a convert or computational operation Moves and compares do not trap if their operands are either denormalized or NaNs The use of this exception for such conditions is optional most of these conditions are newly developed and are not expected to be widely used in early implementations Loopholes are provided in the architecture so that these conditions can be implemented with assistance provided by software maintaining full compatibility with the IEEE Standard 754 Trap Enabled Results The original operand values are undisturbed Trap Disabled Results This trap cannot be disabled Saving and Restoring State Sixteen or thirty two doubleword coprocessor load or store operations save or restore the coprocessor floating point register state in memory The remainder of control and status information can be saved or restored through Move To From Coprocessor Control Register instructions and saving and restoring the processor registers Normally the Control Status register is saved first and restored last When the coprocessor Control Status register FCR31 is read and the coprocessor is executing one or more floating point instructions the instruction s in progress are either completed or reported as exceptions The architecture requires that no more than one of these pen
218. data in other words the external agent can initiate an unrelated external request before it returns the response data for a processor read A processor read request is completed after the last word of response data has been received from the external agent Note that the data identifier see System Interface Commands and Data Identifiers on page 12 32 associated with the response data can signal that the returned data is erroneous causing the processor to take a bus error Processor read requests that have been issued but for which data has not yet been returned are said to be pending A read remains pending until the requested read data is returned In no secondary cache mode the external agent must be capable of accepting a processor read request any time the following two conditions are met e There is no processor read request pending e The signal RdRdy has been asserted for one clock cycle two cycles before the issue cycle Processor Write Request When a processor issues a write request the specified resource is accessed and the data is written to it Processor write requests are described in this section external write requests are described in External Requests on page 12 9 A processor write request is complete after the last word of data has been transmitted to the external agent In no secondary cache mode the external agent must be capable of accepting a processor write request any time the following two
219. dditional PCycles of overhead to move the dirty data into the write buffer b The additional latency for returning the remaining three data elements should be added in a similar fashion c If cache line needs to be written back the read request is posted first then the write is completed Example of Data Single Read With Zero Wait State The following example shows a data block read with a zero wait state StepDescriptionPCycles CPU overhead for cache miss detection 1 2 Address driven on SysAD bus 2 SysAD bus tri stated 2 Memory latency to return the data 0 2 First double word driven on SysAD bus 2 CPU overhead to write the data cache do the fixup and then restart 2 Total PCycles 9 10 OOLR WON External Cycles for Read Latency The external cycles to get the response data will look similar to Figure 12 13 For a larger divide by it will take longer to get the response data PClk Exec gee aage sclock f f SySAD Bus Ca Figure 12 13 Uncached Read External Cycles 12 18 System Interface Chapter 12 SCycle SClock SysAD Bus SysCmd Bus ValidOut ValidIn ExtRast Release RdRdy The same operation is shown in greater detail in Figure 12 14 These figures assume the following 1 Data is returned immediately after the Release is asserted and after the bus turn around cycle when the CPU tri states the bus to allow the external agent to drive it 2 The data meets
220. ddress The base address is determined by the BEV bit of the Status register as shown in Table 5 9 CPU Exception Processing Chapter 5 Table 5 10 shows the vector offset that is added to the base address to create the exception address BEV R4600 R4700 Processor Vector Base Cache Error Base 0 OxFFFF FFFF 8000 0000 OxFFFF FFFF A000 0000 1 OxFFFF FFFF BFCO 0200 OxFFFF FFFF BFCO 0200 Table 5 9 Exception Vector Base Addresses As shown in Table 5 9 when BEV 0 the vector base for the Cache Error exception changes from ksegO OxFFFF FFFF 8000 0000 to kseg1 OxXFFFF FFFF A000 0000 When BEV 1 the vector base for the Cache Error exception is OxFFFF FFFF BFCO 0200 This is an uncached and unmapped space allowing the exception to bypass the cache and TLB Exception mana Processor ector Offset TLB refill EXL 0 0x000 XTLB refill EXL 0 X 64 bit TLB 0x080 Cache Error 0x100 Others 0x180 Table 5 10 Exception Vector Offsets Priority of Exceptions The remainder of this chapter describes exceptions in the order of their priority as shown in Table 5 11 While more than one exception can occur for a single instruction only the exception with the highest priority is reported Exception Priority 1 Reset highest priority 9 Integer overflow Trap System Call Break point Reserved Instruction Coprocessor Unusable or Floating Point Except
221. de specifies the data format s specifies single precision binary floating point d specifies double precision binary floating point and w specifies binary fixed point OpCode Description LWC1 Load Word to FPU SWC1 Store Word from FPU LDC1 Load Doubleword to FPU SDC1 Store Doubleword From FPU MTC1 Move Word To FPU MFC1 Move Word From FPU CTC1 Move Control Word To FPU CFC1 Move Control Word From FPU DMTC1 Doubleword Move To FPU DMFC1 Doubleword Move From FPU Table 6 9 FPU Instruction Summary Load Move and Store Instructions OpCode Description CVT S fmt Floating point Convert to Single FP CVT D fmt Floating point Convert to Double FP CVT W fmt Floating point Convert to Single Fixed Point ROUND w fmt Floating point Round TRUNC w fmt Floating point Truncate CEIL w fmt Floating point Ceiling FLOOR w fmt Floating point Floor Table 6 10 FPU Instruction Summary Conversion Instructions Floating Point Unit Chapter 6 OpCode Description ADD fmt Floating point Add SUB fmt Floating point Subtract MUL fmt Floating point Multiply DIV fmt Floating point Divide ABS fmt Floating point Absolute Value MOV fmt Floating point Move NEG fmt Floating point Negate SQRT fmt Floating point Square Root Table 6 11 FPU Instruction Summary Computational Instructions OpCode Description C cond fm
222. delay of exactly one instruction That is the instruction immediately following a jump or branch that is occupying the delay slot is always executed while the target instruction is being fetched from storage A delay slot may not itself be occupied by a jump or branch instruction however this error is not detected and the results of such an operation are undefined If an exception or interrupt prevents the completion of a legal instruction during a delay slot the hardware sets the EPC register to point at the jump or branch instruction that precedes it When the code is restarted both the jump or branch instructions and the instruction in the delay slot are reexecuted Because jump and branch instructions may be restarted after exceptions or interrupts they must be restartable Therefore when a jump or branch instruction stores a return link value register 31 the register in which the link is stored may not be used as a source register Since instructions must be word aligned a J ump Register or J ump and Link Register instruction must use a register whose two low order bits are zero If these low order bits are not zero an address exception will occur when the jump target instruction is subsequently fetched CPU Instruction Set Details Appendix A Coprocessor Instructions Coprocessors are alternate execution units which have register files separate from the CPU The R4600 R4700 architecture MIPS III provides three coproce
223. diagnostic purposes only and serves no function during normal operation Figure 4 17 shows the format of the LLAddr register PAddr represents bits of the physical address PA 35 4 Memory Management Chapter 4 LLAddr Register PAddr 35 4 32 Figure 4 17 LLAddr Register Format Cache Tag Registers TagLo 28 and TagHi 29 The TagLo and TagHi registers are 32 bit read write registers that hold the primary cache tag and parity during cache initialization cache diagnostics or cache error processing The Tag registers are written by the CACHE and MTCO instructions The P field of these registers is ignored on Index Store Tag operations Parity is computed by the store operation The Windows NT Operating System uses the TagLo cpO register to save restore gp registers in the TLB refill exception handler Thus all 32 bits must be present even though they have no use for the primary purpose of TagLo Figure 4 18 shows the format of these registers for primary cache operations Table 4 13 lists the field definitions of the TagLo and TagHi registers 31 8 7 6 5 3 2 1 0 TagLo PTagLo PState RWNT F 0 P 24 2 3 1 Figure 4 18 TagLo and TagHi Register P cache Formats Field Description PTagLo Specifies the physical address bits 35 12 PState Specifies the primary cache state P Specifies the primary tag even parity bit F The FIFO bit used to implement FIFO refill of the cache
224. ding instructions can cause an exception Information indicating the type of exception is placed in the Control Status register When state is restored state information in the status word indicates that exceptions are pending Writing a zero value to the Cause field of Control Status register clears all pending exceptions permitting normal processing to restart after the floating point register state is restored The Cause field of the Control Status register holds the results of only one instruction the FPU examines source operands before an operation is initiated to determine if this instruction can possibly cause an exception If an exception is possible the FPU executes the instruction in stall mode to ensure that no more than one instruction that might cause an exception is executed at a time Floating Point Exceptions Chapter 7 Trap Handlers for IEEE Standard 754 Exceptions The IEEE Standard 754 strongly recommends that users be allowed to specify a trap handler for any of the five standard exceptions that can compute the trap handler can either compute or specify a substitute result to be placed in the destination register of the operation By retrieving an instruction using the processor Exception Program Counter EPC register the trap handler determines e exceptions occurring during the operation e the operation being performed e the destination format On Overflow or Underflow exceptions except for conversions
225. dix B ROUND L fmt jocraisiong RROUND L fmt Fixed Point Format 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd ROUND L 010001 00000 001000 6 5 5 5 5 6 Format ROUND L fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the long fixed point format The result is placed in the floating point register specified by fd Regardless of the setting of the current rounding mode the conversion is rounded as if the current rounding mode is round to nearest even 0 This instruction is valid only for conversion from single or double precision floating point formats When the source operand is an Infinity NaN or the correctly rounded integer result is outside of 2 to 289 1 the Invalid operation exception is raised If the Invalid operation is not enabled then no exception is taken and 2 1 is returned Operation T StoreFPR fd L ConvertFmt ValueFPR fs fmt fmt L Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B ROUND W fmt Floating Point ROUND W fmt Round to Single Fixed Point Format 31 26 25 21 20 16 15 11 10 6
226. dom TLB Write Indexed and TLB Read Indexed instructions Figure 4 9 shows the format of this register When either a TLB refill TLB invalid or TLB modified exception occurs the EntryHi register is loaded with the virtual page number VPN2 and the ASID of the virtual address that did not have a matching TLB entry See Chapter 5 for more information about these exceptions Memory Management Chapter 4 Processor Revision Identifier PRId Register 15 The 32 bit read only Processor Revision Identifier PRId register contains information identifying the implementation and revision level of the CPU and CPO Figure 4 15 shows the format of the PRid register Table 4 11 describes the PRid register fields PRid Register 16 15 Figure 4 15 Processor Revision Identifier Register Format Field Description Implementation number R4600 Imp 0x20 imp R4700 Imp 0x21 Rev Revision number 0 Reserved Must be written as zeroes and returns zeroes when read Table 4 11 PRid Register Fields The low order byte bits 7 0 of the PRId register is interpreted as a revision number and the high order byte bits 15 8 is interpreted as an implementation number The implementation number of the R4600 R4700 processor is 0x20 The content of the high order halfword bits 31 16 of the register are reserved The revision number is stored as a value in the form y x where y is a major revision number in bits
227. dress error exception A 108 CPU Instruction Set Details Appendix A SDR Store Doubleword Right SDR 31 26 25 21 20 16 15 0 SDR base rt offset 101101 6 5 5 16 Format SDR rt offset base Description This instruction can be used with the SDL instruction to store the contents of a register into eight consecutive bytes of memory when the bytes cross a boundary between two doublewords SDR stores the right portion of the register into the appropriate part of the low order doubleword SDL stores the left portion of the register into the appropriate part of the low order doubleword of memory The SDR instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which may specify an arbitrary byte It alters only the word in memory which contains that byte From one to eight bytes will be stored depending on the starting byte specified Conceptually it starts at the least significant rightmost byte of the register and copies it to the specified byte in memory then it copies bytes from register to memory until it reaches the high order byte of the word in memory No address exceptions due to alignment are possible memory big endian register address 8 8 10 11 12 13 14 15 address 0 a 213 415 4M5 before ALBICDIEFELGIH 24 memor big endian PDR PANAPO address 8 8 9 10
228. e 2 E Doubler 2F 902 Dummy Input ane PLLoff Figure 10 5 R4600 R4700 PLL Network Connecting Clocks to a Phase Locked System When the processor is used in a phase locked system the external agent must phase lock its operation to a common MasterClock In such a system the delivery of data and data sampling have common characteristics even if the components have different delay values For example transmission time the amount of time a signal takes to move from one component to another along a trace on the board between any two components A and B of a phase locked system can be calculated from the following equation Transmission Time SClock period tpo for A tps for B Clock Jitter for A Max Clock Jitter for B Max 10 5 Clock Interface Chapter 10 Figure 10 6 shows a block level diagram of a phase locked system using the R4600 R4700 processor MasterClock R4600 R4700 External Agent MasterClock MasterClock SysCmd SysCmd SysAD SysAD SyncOut Syncin RClock TClock Figure 10 6 R4600 R4700Processor Phase Locked System Connecting Clocks to a System without Phase Locking When the R4600 R4700 processor is used in a system in which the external agent cannot lock its phase to a common MasterClock the output clocks RClock and TClock can clock the remainder of the system Two clocking methodologies are desc
229. e FPU cannot handle either the operands or the results of a floating point operation in its normal way The FPU responds by generating an exception to initiate a software trap or by setting a status flag Exception Types The FP Control Status register described in Chapter 6 contains an Enable bit for each exception type exception Enable bits determine whether an exception will cause the FPU to initiate a trap or set a status flag e Ifa trap is taken the FPU remains in the state found at the beginning of the operation and a software exception handling routine executes e If no trap is taken an appropriate value is written into the FPU des tination register and execution continues The FPU supports the five IEEE Standard 754 exceptions e Inexact I e Underflow U e Overflow O e Division by Zero Z Invalid Operation V Cause bits Enables and Flag bits status flags are used The FPU adds a sixth exception type Unimplemented Operation E This exception indicates the use of a software implementation The Unimplemented Operation exception has no Enable or Flag bit whenever this exception occurs an unimplemented exception trap is taken Figure 7 1 illustrates the Control Status register bits that support exceptions Bit 17 16 15 E a A wo Cause Bits lt E oO lt lt N O n JO o JO Selo le te Oo JN D Enable Bits HE o Flag Bits N ja
230. e W write back bit not the cache state indicates whether or not the primary cache contains modified data that must be written back to memory Note There is no hardware support for cache coherency Thus the only cache states used are Dirty Exclusive and Invalid 11 4 Cache Organization Operation and Coherency Chapter 11 Accessing the Primary Caches Figure 11 5 shows the virtual address VA index into the primary caches Each instruction and data cache size is 16 Kbytes Tag line Data line WwW W State Tag P Figure 11 5 Primary Cache Data and Tag Organization Cache States The terms below are used to describe the state of a cache line e Exclusive a cache line that is present in exactly one cache in the sys tem is exclusive This is always the case for the R4600 R4700 All cache lines are in an exclusive state e Dirty a cache line that contains data that has changed since it was loaded from memory is dirty e Clean a cache line that contains data that has not changed since it was loaded from memory is clean Shared a cache line that is present in more than one cache in the system The R4600 R4700 does not provide for hardware cache co herency This state should never happen in normal operations The R4600 R4700 only supports the four cache states as shown in Table 11 1 on page 6 The only states that will occur in the R4600 R4700 under normal operations are t
231. e determine the cacheability of a reference 64 bit User Mode xuseg In User mode when Status UX 1 User mode virtual addressing is extended to the 64 bit model shown in Figure 4 4 and a 1 Tbyte user address space is available labelled xuseg All valid User mode virtual addresses have bits 63 40 equal to 0 an attempt to reference an address with bits 63 40 not equal to O causes an Address Error exception The extended addressing TLB refill exception vector is used for TLB misses Supervisor Mode Operations Supervisor mode is designed for layered operating systems in which a true kernel runs in R4600 R4700 Kernel mode and the rest of the operating system runs in Supervisor mode The processor operates in Supervisor mode when the Status register contains the following bit values e KSU Olo EXL 0 ERL 0 In conjunction with these bits the SX bit in the Status register selects between 32 or 64 bit Supervisor mode virtual addressing e when SX 0 32 bit supervisor space virtual addressing is selected e when SX 1 64 bit supervisor space virtual addressing is selected Figure 4 5 shows Supervisor mode address mapping Table 4 2 which follows the figure lists the characteristics of the supervisor mode segments descriptions of the address spaces follow Address Address error error Address error Address error Note In 32 bit virtual addressing bit 31 is sign extended through bits 63 32 Failure result
232. e load and store oper ations e M Type or Move instructions e R Type or Register instructions which include the two and three register floating point operations The instruction description subsections that follow show how these three basic instruction formats are used by e Load and store instructions e Move instructions e Floating Point computational instructions e Floating Point branch instructions Floating point instructions are mapped onto the MIPS coprocessor instructions defining coprocessor unit number one CP1 as the floating point unit Each operation is valid only for certain formats Implementations may support some of these formats and operations through emulation but they only need to support combinations that are valid marked Vin Table B 1 Combinations marked Rin Table B 1 are not currently specified by this architecture and cause an unimplemented instruction trap They will be available for future extensions to the architecture FPU Instruction Set Details Appendix B Operation Source Format Single Double Word Longword ADD SUB MUL DIV SQRT ABS Al Al A A A aw Al Al A A Dj A MOV NEG TRUNC L ROUND L CEIL L FLOOR L TRUNC W ROUND W CEIL W FLOOR W CVT S lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt lt
233. e or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system or to affect its safety or effectiveness The IDT logo is a registered trademark and BiCameral BurstRAM BUSMUX CacheRAM DECnet Double Density FASTX Four Port FLEXI CACHE Flexi PAK Flow thruEDC IDT c IDTenvY IDT sae IDT sim IDT ux MacStation MICROSLICE NICStAR Orion PalatteDAC REAL8 R3041 R3051 R3052 R3081 R3721 R4600 RISCompiler RISController RISCard RISCore RISC Subsystem RISC Windows SARAM SmartLogic SolutionPak SyncFIFO SyncBiFIFO SPC and Target System are trademarks of Integrated Device Technology Inc MIPS is a registered trademark of MIPS Computer Systems Inc All others are trademarks of their respective companies Table of Contents Integrated Device Technology Inc Overview Chapter 1 Introduction 1 1 Features 1 3 Device Overview 1 4 Pipeline Overview 1 4 CPU Register Overview 1 5 CPU Instruction Set Overview 1 6 Data Formats and Addressing 1 13 Coprocessors CPO CP2 1 15 System Control Coprocessor CPO 1 15 Floating Point Co Processor 1 18 Floating Point Units 1 18 Virtual to Physical Address Mapping 1 19 Joint TLB 1 19 Instruction TLB 1 20 Data TLB 1 20 Cache Memory 1 20 Instruction Cache 1 20 Data Cache 1 20 Write buffer 1 21 R4600 R4700 Clocks 1 21 System Interface 1 22 Comparison of R4600 R4700 and R4400 1 23 CPU Instruction Set Summary Ch
234. e than the R4000 protocol 12 20 System Interface Chapter 12 SCycle SClock SysAD Bus SysCmd Bus WrRady SCycle SClock SysAD Bus SysCmd Bus WrRdy The write re issue protocol is shown in Figure 12 16 Writes issue when WrRdy is asserted both two cycles prior to the address cycle and during the address cycle No No _ No_ No lie Issue Issue Issue Ssue A era a eG A ee ec J Adaro X Datao Addrt Datat X Addit Y Datat X Write NEOD Write NEOD Write X NEOD i Issue Figure 12 16 Write re issue The pipelined write protocol is shown in Figure 12 17 This protocol maintains the R4000 write issue rule issue if WrRdy asserted two cycles prior to the address cycle for one clock cycle but simply eliminates the two null cycles between writes The external agent is then required to accept one more write after it deasserts WrRdy Issue issue Ige Ige Ide Issue Ne EL Nak ee Ne Tea se A Adaro Data Addrt Datat Adar2 Data2 Y NEODX en write X NEODY Write NEOD Write ey Figure 12 17 Pipelined Writes All three write protocols apply for both single write and block writes This means that in pipeline write for example a single write can be followed immediately by a block write that the external agent must accept Processor block write requests are issued with the system
235. e virtual address a TLB miss exception occurs If the access control bits D and V indicate that the access is not valid a TLB modification or TLB invalid exception occurs If the C bits equal 010 the physical address that is retrieved accesses main memory bypassing the cache TLB Instructions Table 4 14 lists the instructions that the CPU provides for working with the TLB See Appendix A for a detailed description of these instructions Op Code Description of Instruction TLBP Translation Lookaside Buffer Probe TLBR Translation Lookaside Buffer Read TLBWI Translation Lookaside Buffer Write Index TLBWR Translation Lookaside Buffer Write Random Table 4 14 TLB Instructions CPU Exception Chapter 5 Processing Integrated Device Technology Inc This chapter describes the CPU exception processing including an explanation of exception processing followed by the format and use of each CPU exception register The chapter concludes with a description of each exception s cause together with the manner in which the CPU processes and services these exceptions For information about Floating Point Unit exceptions see Chapter 7 How Exception Processing Works The processor receives exceptions from a number of sources including translation lookaside buffer TLB misses arithmetic overflows I O interrupts and system calls When the CPU detects one of these
236. e when the Status register contains the following bit values e KSU bits 105 e EXL 0 e ERL 0 In conjunction with these bits the UX bit in the Status register selects between 32 or 64 bit User virtual addressing as follows e when UX 0 32 bit useg space is selected e when UX 1 64 bit xuseg space is selected Table 4 1 lists the characteristics of the two user mode segments useg and xuseg Address Bit Status Register Segment Address Range Segment Size Values Name Bit Values KSU EXL ERL UX 32 bit 10 O o o useg 0x0000 0000 2 Gbyte A 31 0 through 23 bytes Ox7FFF FFFF 64 bit 10 O o 1 xuseg 0x0000 0000 0000 0000 1 Tbyte A 63 40 O through 27 bytes Ox0000 OOFF FFFF FFFF Table 4 1 32 bit and 64 bit User Mode Segments 32 bit User Mode useg In User mode when Status UX 0 User mode virtual addressing is compatible with the 32 bit addressing model shown in Figure 4 4 and a 2 Gbyte user address space is available labelled useg Memory Management Chapter 4 All valid User mode virtual addresses have their most significant bit cleared to 0 any attempt to reference an address with the most significant bit set while in User mode causes an Address Error exception In 32 bit User mode virtual addressing the TLB refill exception vector is used for TLB misses The system maps all references to useg through the TLB and bit settings within the TLB entry for the pag
237. ed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the result of the last floating point compare is true the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T 1 condition COC 1 T target lt offset s offset 0 T 1 if condition then PC amp PC target else NullifyCurrentinstruction endif Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B Floating Point C cond fmt Compare C cond fmt 31 26 25 21 20 16 15 11 10 6 5 43 0 COP1 fmt ft fs 0 FC cond 010001 00000 6 5 5 5 5 2 4 Format C cond fmt fs ft Description The contents of the floating point registers specified by fs and ft are interpreted in the specified format and arithmetically compared A result is determined based on the comparison and the conditions specified in the instruction If one of the values is a Not a Number NaN and the high order bit of the condition field is set an invalid operation exception is taken After a one instruction delay the condition is available for testing with branch on floating point coprocessor condition instructions Comparisons are exact and can neither overflow nor underflow Four mutually exclusive relations are
238. ediate II Table 1 15 MIPS 2 MIPS 3 Additional Exception Instructions OpCode Description MIPS ISA Level DMFCz Doubleword Move From Coprocessor z II DMTCz Doubleword Move To Coprocessor z II LDCz Load Double Coprocessor z II SDCz Store Double Coprocessor z II Table 1 16 MIPS 2 MIPS 3 Additional Coprocessor Instructions OpCode Description MIPS ISA Level DMFCO Doubleword Move From CPO Ill DMTCO Doubleword Move To CPO Ill MTCO Move to CPO I MFCO Move from CPO I TLBR Read Indexed TLB Entry I TLBWI Write Indexed TLB Entry I TLBWR Write Random TLB Entry I TLBP Probe TLB for Matching Entry I CACHE Cache Operation R4xxx only ERET Exception Return R4xxx only WAIT Enter Standby mode R4600 only Table 1 17 CPO Instructions Overview Chapter 1 Data Formats and Addressing The R4600 R4700 processor uses four data formats a 64 bit doubleword a 32 bit word a 16 bit halfword and an 8 bit byte Byte ordering within each of the larger data formats halfword word doubleword can be configured in either big endian or little endian order Endianness refers to the location of byte O within the multi byte data structure Figures 1 4 and 1 5 show the ordering of bytes within words and the ordering of words within multiple word structures for the big endian and little endian conventions When the R4000 processor is configured as a big endian system b
239. efill exception TLB invalid exception Bus error exception Address error exception FPU Instruction Set Details Appendix B Load Word to FPU LWC1 eopiacassor 1 LWC1 31 26 25 21 20 16 15 0 LWC1 base ft offset 5 5 16 Format LWC 1 ft offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form an unsigned effective address The contents of the word at the memory location specified by the effective address is loaded into register ft of the floating point coprocessor The FR bit of the Status register specifies whether all 64 bit Floating Point registers are addressable If FR equals zero LWC1 loads either the high or low half of the 16 even Floating Point registers If FR equals one LWC1 loads the low 32 bits of both even and odd Floating Point registers If either of the two least significant bits of the effective address is non zero an address error exception occurs Operation vAddr lt offset 5 48 offset 5 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps zg 1 3 pAddr3 o xor ReverseEndian 07 mem LoadMemory uncached WORD pAddr vAddr DATA byte vAddry o xor BigEndianCPU 07 if SRoe6 then CPR 1 ft undefined Me M3 1 8 byte 8 byte else if ft 0 then CPRI1 fty 1 1 0 CPR 1 ft 1 l Ol64 32 1 M M31 8 byte 8 byte e
240. either of the two preceding instructions is MFHI or MFLO the results of these instructions are undefined Correct operation requires separating reads of HI or LO from writes by a minimum of two instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt undefined HI undefined T t lt 0 GPR rs 0 GPRirt LO lt t63 0 HI amp t127 64 Exceptions None CPU Instruction Set Details Appendix A DSLL _Doubleword Shift Left Logical DSLL 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa DSLL 000000 00000 111000 6 5 5 5 5 6 Format DSLL rd rt sa Description The contents of general register rt are shifted left by sa bits inserting zeros into the low order bits The result is placed in register rd Operation T s lt 0O lsa GPR rd lt GPRIrt 63 s 0 os Exceptions None CPU Instruction Set Details DSLLV Doubleword Shift Left Logical Variable Appendix A DSLLV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DSLLV 000000 00000 010100 6 5 5 5 5 6 Format DSLLV rd rt rs Description The contents of general register rt are shifted left by the number of bits specified by the low order six bits contained in general register rs inserting zeros into the low order bits The result is placed in register rd Operation T s lt GPRIrs 5 0 GPR rd GP
241. eld Flag Bits When an exception case is detected and the exception Enable is not set the corresponding flag bit is set If an exception is taken none of the flag bits are modified Note however that system software may set the flag bits before invoking a user exception handler The Flag bits are cumulative and indicate that an exception was raised by an operation that was executed since they were explicitly reset Flag bits are set to 1 if an IEEE 754 exception is raised otherwise they remain unchanged The Flag bits are never cleared as a side effect of floating point operations however they can be set or cleared by writing a new value into the Status register using a Move To Coprocessor Control instruction Control Status Register Rounding Mode Control Bits Bits 1 and O in the Control Status register constitute the Rounding Mode RM field As shown in Table 6 4 these bits specify the rounding mode that the FPU uses for all floating point operations Rounding Mnemonic Description Mode RM 1 0 O RN Round result to nearest representable value round to value with least significant bit O when the two nearest representable values are equally near 1 RZ Round toward O round to value closest to and not greater in magnitude than the infinitely precise result 2 RP Round toward round to value closest to and not less than the infinitely precise result 3 RM Round toward round to value closest t
242. en a LL and SC also causes the SC to fail Operation T if SRo 1 then PC e ErrorEPC SR lt SR34 3 O SR4 0 else PC EPC SR lt SR3 _2 0 SRo endif LLbit 0 Exceptions Coprocessor unusable exception CPU Instruction Set Details Appendix A J Jump J 31 26 25 0 J target 000010 6 26 Format J target Description The 26 bit target address is shifted left two bits and combined with the high order bits of the address of the delay slot The program unconditionally jumps to this calculated address with a delay of one instruction Operation T temp lt target T 1 PC PCe3 2g temp 07 Exceptions None CPU Instruction Set Details Appendix A JAL Jump And Link JAL 31 26 25 0 JAL target 000011 6 26 Format JAL target Description The 26 bit target address is shifted left two bits and combined with the high order bits of the address of the delay slot The program unconditionally jumps to this calculated address with a delay of one instruction The address of the instruction after the delay slot is placed in the link register r31 Operation T temp lt target GPR 31 PC 8 T 1 PC amp PC 6 98 temp 07 Exceptions None CPU Instruction Set Details Appendix A JALR Jump And Link Register JALR 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs 0 rd 0 JALR 000000 00000 0
243. equence for a power on reset 1 Power on reset applies a stable Vcc of at least 4 5 3 0 volts from the 5 0 3 3 volt power supply to the processor During this time VCCOk is deasserted ColdReset and Reset are asserted and the MasterClock input oscillates 2 After at least 100 ms of stable Vcc and MasterClock the VCCOk signal is asserted to the processor The assertion of VCCOk begins the initialization of the processor After the mode bits have been read in the processor allows its internal phase locked loops to lock stabilizing the processor internal clock PClock the SyncOut Syncin clock path described in Chapter 10 and the master clock output MasterOut 3 ColdReset is asserted for at least 64K or 216 MasterClock cycles after the assertion of VCCOk Once the processor reads the boot time mode control serial data stream ColdReset can be deasserted ColdReset must be deasserted synchronously with MasterClock 4 The deasserted edge of ColdReset synchronizes the edges of SClock TClock and RClock to all processors if in a multiprocessor system 5 After ColdReset is deasserted synchronously and SClock TClock and RClock have stabilized Reset is deasserted to allow the processor to begin running Reset must be held asserted for at least 64 MasterClock cycles after the deassertion of ColdReset Reset must be deasserted synchronously with MasterClock Note ColdReset must be asserted when VCCOk asserts The behav
244. er Fetch Phase two During the 2R phase the following occurs e The instruction decoder decodes the instruction e Any required operands are fetched from the register file e Make a decision to either issue or slip for an interlock condition e For a branch the branch address is calculated 1A Execution Phase one During the 1A phase one of the following occurs e Any result from the A or D stages are bypassed The arithmetic logic unit ALU starts the integer arithmetic logical or shift operation e The ALU calculates the data virtual address for load and store in structions The ALU determines whether the branch condition is true 2A Execution Phase two During the 2A phase one of the following occurs e The integer arithmetic logical or shift operation will complete e A data cache access will start e Store data is shifted to the specified byte position s The data virtual to physical address translation in the DTLB will start 1D Data Fetch Phase one During the 1D phase one of the following occurs The data cache access will continue e The data address translation in the DTLB completes e The virtual to physical address translation in the JTLB will start The CPU Pipeline Chapter 3 2D Data Fetch Phase two During the 2D phase one of the following occurs e The data cache access will finish and the data is shifted down and ex tended e The virtual to physical address translation in the
245. er checks to see if this instruction or data is present in the primary cache e If the instruction data is present the processor retrieves it This is called a primary cache hit e If the instruction data is not present in the primary cache it is re trieved as a cache line from memory and is written into the primary cache 3 The processor retrieves the instruction data from the primary cache and operation continues For a data cache miss the processor can restart the pipeline after the first doubleword the one at the miss address is retrieved and continues the cache line refill in parallel It is possible for the same data to be in two places simultaneously main memory and the primary cache This data is kept consistent through the use of either a write back or a write through methodology For a write back cache the modified data is not written back to memory until the cache line is replaced In a write through cache the data is written to memory as the cached data is modified with a possible delay due to the write buffer R4600 R4700 Cache Description This section describes the organization of on chip primary caches As Figure 11 1 on page 1 shows the R4600 R4700 contains separate primary instruction and data caches Figure 11 2 provides block diagrams of the R4600 R4700 memory model R4600 R4700 Cache Controller Main Memory l cache Primary Caches D cache Figure 1
246. erface release external null request command on the SysCmd bus and asserts Validin for one cycle to return the system interface back to master state 4 The SysAD bus is unused does not contain valid data during the address cycle associated with an external null request 5 After the address cycle is issued the null request is complete For a system interface release external null request the external agent releases the SysCmd and SysAD buses and expects the system interface to return to master state gt lt Master gt la al ates ai 12 ae ae ee oe Asay B Si Figure 12 22 System Interface Release External Null Request External Write Request Protocol External write requests use a protocol identical to the processor single word write protocol except the Validin signal is asserted instead of ValidOut Figure 12 23 on page 12 27 shows a timing diagram of an external write request which consists of the following steps 1 The external agent asserts ExtRast to arbitrate for the system interface 2 The processor releases the system interface to slave state by asserting Release 3 The external agent drives a write command on the SysCmd bus a write address on the SysAD bus and asserts ValidIn 4 The external agent drives a data identifier on the SysCmd bus data on the SysAD bus and asserts Validin 5 The data identifier associated with the data cycle must
247. ernal agent must acquire mastership of the system interface before it is allowed to issue an external request the external agent arbitrates for mastership of the system interface by asserting ExtRqst and then waiting for the processor to assert Release for one cycle R4600 R4700 External Agent 1 External system requests bus mastership by asserting ExtRaqst 2 Processor grants mastership by asserting Release 3 External system issues an External Request 4 Processor regains bus mastership Figure 12 10 External Request 12 9 System Interface Chapter 12 Mastership of the system interface always returns to the processor after an external request is issued The processor does not accept a subse quent external request until it has completed the current request If there are no processor requests pending the processor decides based on its internal state whether to accept the external request or to issue a new processor request The processor can issue a new processor request even if the external agent is requesting access to the system interface The external agent asserts ExtRqst indicating that it wishes to begin an external request The external agent then waits for the processor to signal that it is ready to accept this request by asserting Release The processor signals that it is ready to accept an external request based on the criteria listed below e The processor completes any processor request that
248. error exception CPU Instruction Set Details Appendix A LWCz Load Word To Coprocessor LWCz 31 26 25 21 20 16 15 0 LWCz base rt offset 1100xx 5 5 16 Format LWCz rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The processor reads a word from the addressed memory location and makes the data available to coprocessor unit z The manner in which each coprocessor uses the data is defined by the individual coprocessor specifications If either of the two least significant bits of the effective address is non zero an address error exception occurs This instruction is not valid for use with CPO NOTE See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Operation T vAddr lt offset s offsety5 0 GPR base pAddr uncached lt AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 PAddrs o xor ReverseEndian 0 mem lt LoadMemory uncached DOUBLEWORD pAddr vAddr DATA byte vAddro 9 xor BigEndianCPU 0 COPzLW byte rt mem Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception Coprocessor unusable exception Opcode Bit Encoding LWCz Bit 31 30 29 28 27 26 0 Lwei 1 1 0 0 0 1
249. ers contain the virtual address that failed address translation The EntryHi register also contains the ASID from which the translation fault occurred The contents of the EntryLo registers are undefined The EPC register contains the address of the instruction that caused the exception unless that instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set TLB Modified exception processing is shown in Figure 5 15 on page 13 Servicing The kernel uses the failed virtual address or virtual page number to identify the corresponding access control information The page identified may or may not permit write accesses if writes are not permitted a write protection violation occurs If write accesses are permitted the page frame is marked dirty writable by the kernel in its own data structures The TLBP instruction places the index of the TLB entry that must be altered into the Index register The EntryLo register is loaded with a word containing the physical page frame and access control bits with the D bit set and the EntryHi and EntryLo registers are written into the TLB CPU Exception Processing Chapter 5 Cache Error Exception This section explains the Cache Error exception Cause The Cache Error exception occurs when a primary cache parity error is detected This exception is maskable by the DE bit of the Status reg
250. es An overflow exception occurs if the carries out of bits 30 and 31 differ 2 s complement overflow The destination register rd is not modified when an integer overflow exception occurs Operation T temp GPR rs GPR rt GPR rd temp31 temp34 0 Exceptions Integer overflow exception CPU Instruction Set Details Appendix A ADDI Add Immediate ADDI ADDI rt rs immediate Description 31 26 25 21 20 16 15 0 ADDI rs rt immediate 001000 6 5 5 16 Format The 16 bit immediate is sign extended and added to the contents of general register rs to form the result The result is placed into general register rt The rs operand must be valid sign extended 32 bit values An overflow exception occurs if carries out of bits 30 and 31 differ 2 s complement overflow The destination register rtis not modified when an integer overflow exception occurs Operation T temp GPR rs immediate 5 48 immediate s5 0 GPRIrt lt tempg4 temp34 0 Exceptions Integer overflow exception CPU Instruction Set Details Appendix A ADDIU Add Immediate Unsigned ADDIU 31 26 25 21 20 16 15 0 ADDIU rs rt immediate 001001 6 5 5 16 Format ADDIU rt rs immediate Description The 16 bit immediate is sign extended and added to the contents of general register rs to form the result The result is place
251. escription of Control Status reg ister Cause Flag and Enable bits RM Rounding mode bits See Table 6 4 on page 7 and the description of Control Status register Rounding Mode Control bits Table 6 3 Control Status Register Fields Bit 17 16 E Cause Bits Enable Bits Flag Bits Inexact Operation Underflow Overflow Division by Zero Invalid Operation Unimplemented Operation Figure 6 5 Control Status Register Cause Flag and Enable Fields Floating Point Unit Chapter 6 Accessing the Control Status Register When the Control Status register is read by a Move Control From Coprocessor 1 CFC1 instruction all unfinished instructions in the pipeline are completed before the contents of the register are moved to the main processor If a floating point exception occurs as the pipeline empties the FP exception is taken and the CFC1 instruction is re executed after the exception is serviced The bits in the Control Status register can be set or cleared by writing to the register using a Move Control To Coprocessor 1 CTC1 instruction CTC1 is not issued until all previous floating point operations are complete IEEE Standard 754 IEEE Standard 754 specifies that floating point operations detect certain exceptional cases raise flags and can invoke an exception handler when an exception occurs These features are implemented in the MIPS architecture with the Cause Ena
252. ess of the preceding branch instruction and the BD bit of the Cause register is set Trap exception processing is shown in Figure 5 15 on page 13 Servicing The process executing at the time of a Trap exception is handed a floating point exception integer overflow signal This error is usually fatal L See Appendix A for instruction description 5 25 CPU Exception Processing Chapter 5 System Call Exception This section explains the System Call exception Cause A System Call exception occurs during an attempt to execute the SYSCALL instruction This exception is not maskable Processing The common exception vector is used for this exception and the Sys code in the Cause register is set The EPC register contains the address of the SYSCALL instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction If the SYSCALL instruction is in a branch delay slot the BD bit of the Status register is set otherwise this bit is cleared System Call exception processing is shown in Figure 5 15 on page 13 Servicing When this exception occurs control is transferred to the applicable system routine To resume execution the EPC register must be altered so that the SYSCALL instruction does not re execute this is accomplished by adding a value of 4 to the EPC register EPC register 4 before returning If a SYSCALL instruction is in a branch delay sl
253. etch memory locations containing the physical page frame and access control bits for a pair of TLB entries The two entries are placed into the EntryLoO EntryLol register the EntryHi and EntryLo registers are written into the TLB It is possible that the virtual address used to obtain the physical address and access control information is on a page that is not resident in the TLB This condition is processed by allowing a TLB refill exception in the TLB refill handler This second exception goes to the common exception vector because the EXL bit of the Status register is set CPU Exception Processing Chapter 5 TLB Invalid Exception This subsection explains the TLB invalid exception Cause The TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid TLB valid bit cleared This exception is not maskable Processing The common exception vector is used for this exception The TLBL or TLBS code in the ExcCode field of the Cause register is set This indicates whether the instruction as shown by the EPC register and BD bit in the Cause register caused the miss by an instruction reference load operation or store operation When this exception occurs the BadVAddr Context XContext and EntryHi registers contain the virtual address that failed address translation The EntryHi register also contains the ASID from which the translation fault occurred The Random register normally cont
254. even Floating Point registers If FR equals one MTC1 loads the low 32 bits of both even and odd Floating Point registers Operation T data GPRirt 31 0 T 1 if SRog 1 then CPR 1 fs undefined data else if fs9 0 then CPR 1 fs4 4 0 CPRI 1 fs4_ 4 Oleg g2 data else CPR 1 fs4 1 0 data CPR 1 fs4 1 0 31 0 endif Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B MUL fmt Floating Point Multiply MUL fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt ft fs fd MUL 010001 000010 5 5 5 5 6 Format MUL fmt fd fs ft Description The contents of the floating point registers specified by fs and ft are interpreted in the specified format and arithmetically multiplied The result is rounded as if calculated to infinite precision and then rounded to the specified format according to the current rounding mode The result is placed in the floating point register specified by fd This instruction is valid only for single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt ValueFPR fs fmt V
255. exceptions Overview The FPU operates as a coprocessor for the CPU it is assigned coprocessor label CP1 and extends the CPU instruction set to perform arithmetic operations on floating point values The R4600 R4700 Floating Point Coprocessor The R4600 R4700 incorporates an entire floating point coprocessor on chip including a floating point register file and execution units The floating point coprocessor forms a seamless interface with the integer unit decoding and executing instructions in parallel with the integer unit In comparison to the R4600 the floating point coprocessor of the R4700 has improved floating multiply operations The R4600 R4700 uses the floating point unit to perform integer multiply and divide and results are placed in the HI and LO registers The values can then be transferred to the general purpose register file using the MFHI MFLO instructions The R4700 performs an integer multiply faster than the R4600 by 2 clock cycles but it takes the same number of clock cycles for integer division The R4700 improves the multiply compared to the R4600 by performing a single precision multiply in 4 clock cycles and a double precision multiply in 5 clock cycles Figure 6 1 illustrates the functional organization of the FPU Data Cache Control FP Bypass Pipeline Chain FP Add Subj Cvt Div Saqrt FP Int Mul Int Div Figure 6 1 FPU Functional Block Diagram Floating
256. exceptions the normal sequence of instruction execution is suspended and the processor enters Kernel mode see Chapter 4 for a description of system operating modes The processor then disables interrupts and forces execution of a software exception processor called a handler located at a fixed address The handler may save the context of the processor including the contents of the program counter the current operating mode User or Supervisor and the status of the interrupts enabled or disabled This context would be saved so it can be restored when the exception has been serviced When an exception occurs the CPU loads the Exception Program Counter EPC register with a location where execution can restart after the exception has been serviced The restart location in the EPC register is the address of the instruction that caused the exception or if the instruction was executing in a branch delay slot the address of the branch instruction immediately preceding the delay slot The registers described later in the chapter assist in this exception processing by retaining address cause and status information For a description of the exception handling process see the description of the individual exception contained in this chapter or the flowcharts at the end of this chapter Exception Processing Registers This section describes the CPO registers that are used in exception processing Table 5 1 on page 5 2 lists these registers a
257. f Exceptions Coprocessor unusable exception A 148 CPU Instruction Set Details Appendix A XOR Exclusive Or XOR 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 XOR 000000 00000 100110 6 5 5 5 5 6 Format XOR rd rs rt Description The contents of general register rs are combined with the contents of general register rt in a bit wise logical exclusive OR operation The result is placed into general register rd Operation T GPR rd lt GPR rs xor GPR rt Exceptions None A 149 CPU Instruction Set Details Appendix A XORI Exclusive OR Immediate XORI 31 26 25 21 20 16 15 0 XORI rs rt immediate 001110 6 5 5 16 Format XORI rt rs immediate Description The 16 bit immediate is zero extended and combined with the contents of general register rs in a bit wise logical exclusive OR operation The result is placed into general register rt Operation T GPR rt GPR rs xor 048 immediate Exceptions None A 150 CPU Instruction Set Details Appendix A CPU Instruction Opcode Bit Encoding The remainder of this Appendix presents the opcode bit encoding for the CPU instruction set ISA and extensions as implemented by the R4600 R4700 Table A 4 lists the R4600 R4700 Opcode Bit Encoding A 151 CPU Instruction Set Details Appendix A
258. f general register rs are compared to zero If the contents of general register rs have the sign bit cleared and are not equal to zero then the program branches to the target address with a delay of one instruction Operation T 1 target lt offset s offset 0 condition GPR rs g3 0 and GPR rs 064 if condition then PC PC target end if Exceptions None CPU Instruction Set Details Appendix A Branch On Greater BGTZL Than Zero Likely BGTZL 31 26 25 21 20 16 15 0 BGTZL rs 0 offset 010111 00000 6 5 5 16 Format Description A branch target address is computed from the sum of the address of BGTZL rs offset the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs are compared to zero If the contents of general register rs have the sign bit cleared and are not equal to zero then the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation Exceptions None T 1 if condition then target lt offset lt offset 0 condition GPR rs g3 0 and GPR rs 064 PC PC target else NullifyCurrentinstruction endif CPU Instruction Set Details Appendix A BLEZ greeatesae BLEZ
259. fined HI lt undefined T t eS GPAlrs 31_0 GPR rt 3 6 LO e tg4 t310 HI tgg te3 32 Exceptions None CPU Instruction Set Details Appendix A MULTU Multiply Unsigned MULTU 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 MULTU 000000 000000 0000 011001 6 5 5 10 6 Format MULTU rs rt Description The contents of general register rs and the contents of general register rt are multiplied treating both operands as unsigned values No overflow exception occurs under any circumstances The operands must be valid 32 bit sign extended values When the operation completes the low order word of the double result is loaded into special register LO and the high order word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of these instructions are undefined Correct operation requires separating reads of HI or LO from writes by a minimum of two instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt undefined HI lt undefined T t lt 0 I GPR rs 31_ 0 0 GPRI rt 31 0 LO lt 31 t31 0 HI lt tg3 te3 32 Exceptions None CPU Instruction Set Details Appendix A NOR Nor NOR 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 NOR 000000 00000 100111 6 5 5 5 5 6 Format NOR rd rs rt Description
260. following steps 1 The external agent asserts ExtRqst when it wishes to submit an external request 2 The processor waits until it is ready to handle an external request whereupon it asserts Release for one cycle 3 The processor sets the SysAD and SysCmd buses to tri state 4 The external agent must begin driving the SysAD bus and the SysCmd bus two cycles after the assertion of Release 5 The external agent deasserts ExtRqst two cycles after the assertion of Release unless the external agent wishes to perform an additional external request 6 The external agent sets the SysAD and the SysCmd buses to tri state at the completion of an external request The processor can start issuing a processor request one cycle after the external agent sets the bus to tri state Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively Dare ce aa Slave e Master gt Basla ae oe ses ae sp ilo Ih te tat il ae Watatatatadetneaeaeate re y D fa Figure 12 20 Arbitration Protocol for External Requests External Read Request Protocol External reads are requests for a word of data from a processor internal resource such as a register External read requests cannot be split that is no other request can occur between the external read request and its read response 12 24 System Interface
261. for a cache hit 5 Fill_I Cycle number must be calculated based on the sys tem response to a memory access because Fill_I causes an instruction cache refill from memory This equation yields the number of processor cycles for a Fill_I cache op Number_of_cycles_for_a_Fill_I_CacheOp 10 0 SYSDIV 1 2 x SYSDIV ML x SYSDIV D x SYSDIV 3 6 Hit_Writeback_I 7 cycles for a cache miss 20 cycles for a cache hit Writeback Note 1Code number corresponds to the code column of the CACHE instruction in Appendix A 2For definitions and discussion of the Fill_I equation variables refer to the subsection Details of the Fill_I Equation which follows this table 3The term 0 SYSDIV 1 has a value between 0 and SYSDIV 1 depending on the alignment of the execution of the cache op with the system clock Table C 2 Primary Instruction Cache Operations Details on the Fill_ Equation These are the definitions for the Hit_Writeback_I equation in Table C 2 SYSDIV ML 2 8 Number of processor cycles per system cycle ranges from Number of system cycles of memory latency defined as the number of cycles the SysAD bus is driven by the external agent before the first double word of data appears D Number of system cycles required to return the block of data defined as the number of cycles beginning when the first double word of data appears on the SysAD bus and ending when the last double word of d
262. fore executing the ERET instruction IE Interrupt Enable O disable interrupts 1 enables interrupts Table 5 3 Status Register Fields CPU Exception Processing Chapter 5 Status Register Modes and Access States Fields of the Status register set the modes and access states described in the sections that follow Interrupt Enable Interrupts are enabled when all of the following conditions are true e IE 1 e EXL 0 e ERL 0 If these conditions are met the settings of the IM bits identify the interrupt Note Setting the IE bit may be delayed by up to 3 cycles If performing nested interrupts re enable the IE bit first Operating Modes The following CPU Status register bit settings are required for User Kernel and Supervisor modes see Chapter 4 for more information about operating modes The processor is in User mode when KSU 105 EXL 0 and ERL 0 e The processor is in Supervisor mode when KSU 019 EXL 0 and ERL 0 The processor is in Kernel mode when KSU 005 or EXL 1 or ERL 32 and 64 bit Virtual Addressing The following CPU Status register bit settings select 32 or 64 bit virtual addressing for User and Supervisor operating modes Enabling 64 bit virtual addressing permits the execution of 64 bit opcodes and translation of 64 bit virtual addresses 64 bit virtual addressing for User and Supervisor modes can be set independently but is always used for Kernel mode The KX field co
263. ger TLB for address translation Compared to the 8 stage R4000 pipeline the R4600 R4700 is more efficient requires fewer stalls This is because the branch and load latency for the R4600 R4700 is shorter than for the R4000 both are 2 cycles for the R4600 R4700 but are 3 and 4 cycles respectively for the R4000 Overview Chapter 1 The internal pipeline of the R4600 R4700 processor operates at twice the frequency of the master clock as discussed in Chapter 3 The processor achieves high throughput by pipelining cache accesses shortening register access times implementing virtual indexed primary caches and allowing the latency of certain functional units to span more than one pipeline clock cycles Refer to Chapter 3 for a detailed discussion of the CPU pipeline operation including descriptions of the delay instructions interruptions to the pipeline flow caused by interlocks and exceptions and the R4600 R4700 implementation of a store buffer Refer to Chapter 6 for a detailed discussion of the FPU pipeline CPU Register Overview The R4600 R4700 has thirty two general purpose registers These registers are used for scalar integer operations and address calculation The register file consists of two read ports and one write port and is fully bypassed to minimize operation latency in the pipeline Figure 1 2 shows the R4600 R4700 CPU registers General Purpose Registers 63 0 Multiply and Divide Registers 63 0 HI j 63
264. hapter 6 Figure 6 9 on page 6 13 assumes that one instruction is completed every PCycle Most FPU instructions however require more than one cycle in the EX stage This means the FPU must stall the pipeline if an instruction execution cannot proceed because of register or resource conflicts Floating point operations proceed in parallel with non floating point operations Floating point operations are not allowed to overlap each other with two exceptions e An add operation may start 2 cycles after the start of a multiply and thus will be completely overlapped by the multiply e A multiply operation may overlap for up to 2 cycles as follows R4600 Anew multiply may start 6 cycles after another multiply R4700 A new multiply may start 4 cycles after another multiply for both single and double precision Non floating point operations as well as other integer operations may be executed in parallel with the floating point operations All of this is handled automatically by internal hardware in the R4600 R4700 Instruction Execution Cycle Time Unlike the CPU which executes almost all instructions in a single cycle more time may be required to execute FPU instructions Table 6 14 gives the minimum latency of each floating point operation Operation Pipeline Cycles Operation Pipeline Cycles Single Double Single Double ADD fmt 4 4 BC1T 1 SUB fmt 4 4 BC1F 1 MUL fmt
265. he Dirty Exclusive and Invalid states Note Even though valid data is in the Dirty Exclusive state it may still be consistent with memory One must look at the dirty bit W to determine if the cache line is to be written back to memory when it is replaced 11 5 Cache Organization Operation and Coherency Chapter 11 Each primary cache line in the R4600 R4700 system is in one of the states described in Table 11 1 Cache Line State Description Invalid A cache line that does not contain valid information must be marked invalid and cannot be used A cache line in any other state than invalid is assumed to contain valid informa tion Shared A cache line that is present in more than one cache in the system is shared This state will not occur for normal operations Clean Exclusive Dirty Exclusive A clean exclusive cache line contains valid information and this cache line is not present in any other cache The cache line is consistent with memory and is not owned by the pro cessor see Cache Line Ownership on page 6 in this chapter This state will not occur for normal operations A dirty exclusive cache line contains valid information and is not present in any other cache The cache line may or may not be consistent with memory and is owned by the processor see Cache Line Ownership on page 6 in this chapter Use the W bit to deter mine if the line must be written back on replacement Ta
266. he divisor is zero The operands must be valid sign extended 32 bit values This instruction is typically followed by additional instructions to check for a zero divisor and for overflow When the operation completes the quotient word of the double result is loaded into special register LO and the remainder word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of those instructions are undefined Correct operation requires separating reads of HI or LO from writes by two or more instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt undefined HI lt undefined Ae q eS GPRirs 31 9 div GPRIrt 31 0 r oa GPRJrs 31 0 mod GPR rt 31_ 0 LO e 31 Il da1 0 HI 134 Il 131 0 Exceptions None CPU Instruction Set Details Appendix A DIVU Divide Unsigned DIVU 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 DIVU 000000 000000 0000 011011 6 5 5 10 6 Format DIVU rs rt Description The contents of general register rs are divided by the contents of general register rt treating both operands as unsigned values No integer overflow exception occurs under any circumstances and the result of this operation is undefined when the divisor is zero The operands must be valid sign extended 32 bit values This instruction is typically followed by additional instructions to check for a
267. he doubleword for reading parity When the CE bit of the Status register is set Hit WriteBack Hit WriteBack Invalidate Index WriteBack Invalidate and Fill also use vAddr 3 to select the doubleword that has its parity modified This operation is performed unconditionally The Hit operation accesses the specified cache as normal data references and performs the specified operation if the cache block contains valid data with the specified physical address a hit If both sets are invalid or contain different addresses a miss no operation is performed Write back from a primary cache goes to memory The address to be written is specified by the cache tag and not the translated physical address TLB Refill and TLB Invalid exceptions can occur on any operation For Index operations where the physical address is used to index the cache but need not match the cache tag unmapped addresses may be used to avoid TLB exceptions This operation never causes TLB Modified or Virtual Coherency exceptions Bits 17 16 of the instruction specify the cache as follows Code Name Cache o I primary instruction 1 D primary data 2 3 NA Undefined CPU Instruction Set Details Appendix A Code Bits 20 18 this value is listed under the Code column of the instruction specify the operation as follows Caches Name Operation I Index Invalidate Set the cache state of the cache block to Invalid Index_
268. he target address with a delay of one instruction General register rs may not be general register 31 because such an instruction is not restartable An attempt to execute this instruction with register 31 specified as rs is not trapped however Operation T target lt offset 5 offset 07 condition GPR rs g3 1 GPR 31 lt PC 8 T 1 if condition then PC amp PC target endif Exceptions None CPU Instruction Set Details Appendix A BLTZALL than zero and Link ikey BLTZALL 31 26 25 21 20 16 15 0 REGIMM rs BLTZALL offset 000001 10010 6 5 5 16 Format BLTZALL rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended Unconditionally the address of the instruction after the delay slot is placed in the link register r31 If the contents of general register rs have the sign bit set then the program branches to the target address with a delay of one instruction General register rs may not be general register 31 because such an instruction is not restartable An attempt to execute this instruction with register 31 specified as rs is not trapped however If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T target lt offset 6 offset 07 condition GPR r
269. hifted left two bits and sign extended If the coprocessor z s condition signal CpCond is true then the program branches to the target address with a delay of one instruction Because the internal condition signal is sampled during the previous instruction there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal Operation T 1 condition COC z T target lt offset offset 07 T 1 if condition then PC PC target endif NOTE See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Exceptions Coprocessor unusable exception Opcode Bit Encoding BCzT Bit 3130 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 secor 2 11 0 0 o o o 1 Jo ofojo olof Bit 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0 seit ol 1 0 0 o 1 0 1 olo ofojo olof Bit 3130 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 scot 9 1 0 0 1 0 0 1 0 0 jojojolojo 1 a 7 vy Y vy vy Opcode BC sub opcode Branch condition Coprocessor Unit Number CPU Instruction Set Details Appendix A BCZTL Branch On Coprocessor z BCZTL True Likely 31 26 25 21 20 1615 0 COPz BC BCTL offset 0100xx 01000 00011 6 5 5 16 Format
270. ical translation in TLB unchanged to physical TLB memory ae Ie Virtual Address with 64K 2 16 Mbyte pages 64 6362 61 4039 24 23 j Offset passed 0 or 1 VPN Offset 24 16 24 16 bits 64K pages Figure 4 3 64 bit Virtual Address Translation Operating Modes The processor has three operating modes that function in both 32 and 64 bit operations e User mode e Supervisor mode e Kernel mode These modes are described in the next three sections User Mode Operations In User mode a single uniform virtual address space labelled User segment is available its size is e 2 Gbytes 231 bytes for Status UX 0 useg e 1 Tbyte 240 bytes for Status UX 1 xuseg Memory Management Chapter 4 Figure 4 4 shows the User mode virtual address space Ox FEFEF FFEF FFFF FFFF Address Address Error Error 0x 8000 0000 Ox 0000 0100 0000 0000 Ox 0000 0000 0x 0000 0000 0000 0000 Note For 32 bit virtual addresses bit 31 is sign extended through bits 63 32 Failure i e bit 31 1 results in an Address Error exception Figure 4 4 User Mode Virtual Address Space The User segment starts at address O and the current active user process resides in either useg 32 bit virtual addressing or xuseg in 64 bit virtual addressing The TLB identically maps all references to useg xuseg from all modes and controls cache accessibility The processor operates in User mod
271. ide variety of processor based applications Because of their low power consumption coupled with high performance they are well suited for a wide variety of embedded applications including laser printers X terminals internetworking equipment imaging equipment and high end video games The R4600 and R4700 are also well suited to high performance desktop applications such as Windows NT desktop and notebook systems and 3 D workstations Compatible with the IDT79R4400PC family for both hardware and software the R4600 and R4700 will serve in many of the same applications but in addition support low power operation for applications such as notebook computers Floating Point The R4700 has improved FPA multiply operations All other features of the R4700 are the same as those in the R4600 In this manual these two products are referred to collectively as the R4600 R4700 except when information pertains only to one of them In that situation they are referred to individually Secondary Cache The R4600 R4700 does not provide integrated secondary cache and multiprocessor support as found in the R4000SC and R4000MC but it is simple to build an external secondary cache For most embedded applications however the large on chip two way set associative caches make this unnecessary Performance The R4600 R4700 brings R4000SC performance levels to the R4000PC package while at the same time providing lower cost and lower power It does thi
272. identifies the contents of the SysAD bus during any cycle in which it is valid The most significant bit of the SysCmd bus is always used to indicate whether the current cycle is an address cycle or a data cycle e During address cycles SysCmd 8 0 the remainder of the SysCmd bus SysCmd 7 0 contains a System interface command the encod ing of system interface commands is detailed in System Interface Commands and Data Identifiers on page 12 32 e During data cycles SysCmd 8 1 the remainder of the SysCmd bus SysCmd 7 0 contains a data identifier the encoding of data identifiers is detailed later in this chapter 12 2 System Interface Chapter 12 Issue Cycles There are two types of processor issue cycles e processor read request issue cycles e processor write request issue cycles The processor samples the signal RdRdy to determine the issue cycle for a processor read request the processor samples the signal WrRdy to determine the issue cycle of a processor write request As shown in Figure 12 2 RdRdy must be asserted for one clock cycle two cycles prior to the address cycle of the processor read request to define the address cycle as the issue cycle cycle 5 in Figure 12 2 RdRdy does not need to be asserted during the issue cycle SCycle SClock SysAD Bus RdRdy Note RdRdy must be sampled LOW at the end of cycle 3 which is marked with an asterisk Figure 12 2 St
273. if an ERET instruction occurs between the Load Linked instruction and this store instruction the store fails and is inhibited from taking place The success or failure of the store operation as defined above is indicated by the contents of general register rt after execution of the instruction A successful store sets the contents of general register rt to 1 an unsuccessful store sets it to 0 The operation of Store Conditional is undefined when the address is different from the address used in the last Load Linked This instruction is available in User mode it is not necessary for CPO to be enabled If either of the two least significant bits of the effective address is non zero an address error exception takes place If this instruction should both fail and take an exception the exception takes precedence Operation T vAddr lt offset 5 offsety5 0 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 pAddra_o xor ReverseEndian 0 data lt GPR rt e3 s byte 0 08 Y if LLbit then StoreMemory uncached WORD data pAddr vAddr DATA endif GPR rt 063 LLbit SyncOperation Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 103 CPU Instruction Set Details Appendix A SCD Store Conditional Doubleword SCD 31 26 25 21 20 16 1
274. igh Yes a System command data identifier bus parity SysCmdP 1 O High Yes a Valid input ValidIn I Low No NA Valid output ValidOut O Low b External request ExtRqst NA Release interface Release b Read ready RdRdy NA Write ready WrRdy NA Interrupts Int 5 0 NA Nonmaskable interrupt NMI NA Boot mode data in ModelIn NA Boot mode clock ModeClock d JTAG data in JTDI NA JTAG data out JTDO b JTAG command JTMS NA JTAG clock input JTCK NA Transmit clocks TClock 1 0 c Receive clocks RClock 1 0 c Master clock MasterClock NA Master clock out MasterOut c Synchronization clock out SyncOut c Synchronization clock in SyncIn NA I O output IOOut b I O input IOIn NA Vcc is OK VCCOk NA Cold reset ColdReset NA Reset Reset NA Fault Fault b Key to Reset State Column a All I O pins SysAD 63 0 SysADC 7 0 etc remain 3 stated until the Reset signal deasserts b All output only pins ValidOut Release etc except the clocks are 3 stated until the ColdReset signal deasserts c All clocks except ModeClock are 3 stated until VCCOk asserts d ModeClock is always driven NA Not applicable to input pins Table 9 1 R4600 R4700 Processor Signal Summary Initialization Interface Chapter 9 Power on Reset Figure 9 1 Figure 9 2 and Figure 9 3 illustrate the power on warm and cold resets This is the s
275. ight most byte of the register then it loads bytes from memory into the register until it reaches the high order byte of the word in memory The most significant left most byte s of the register will not be changed address 4 address 0 memory big endian 4 5 6 7 0 1 2 3 register before A B C D 24 LWR 24 4 0 after A B C 4 The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWR or LWL instruction which also specifies register rt No address exceptions due to alignment are possible CPU Instruction Set Details Appendix A Operation T vAddr lt offsets5 4 offsets 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 0Addr2 9 xor ReverseEndian if BigEndianMem 1 then pAddr pAddrpsize si s 0 endif byte vAddr 9 xor BigEndianCPU word vAddro xor BigEndianCPU mem LoadMemory uncached 0 byte pAddr vAddr DATA temp lt a eee oad byte o MEM31 32 word 32 word 8 byte GPR rt temp3 temp Given a word in a register and a word in memory the operation of LWR is as follows LWR Register A B Cc
276. ill Fill the primary instruction cache block from memory If the CE bit of the Status register is set the contents of the ECC register is used instead of the computed parity bits for addressed doubleword when written to the instruction cache Uses bit 13 to pick the set Hit WriteBack Hit WriteBack If the cache block contains the specified address and the W bit is set write back the data to memory and clear the W bit If the cache block contains the specified address write back the data unconditionally Operation T vAddr lt offset s offsetys o GPR base pAddr uncached lt AddressTranslation vAddr DATA CacheOp op vAddr pAddr Exceptions Coprocessor unusable exception CPU Instruction Set Details Move Control From Appendix A CFCz Coprocessor CFCz 31 26 25 21 20 16 15 11 10 0 COPz CF rt rd 0 0100xx 00010 00000 6 5 5 5 11 Format CFCz rt rd Description The contents of coprocessor control register rd of coprocessor unit z are loaded into general register rt This instruction is not valid for CPO Operation T T 1 GPR rt lt data data CCR z rd 34 CCR z rd Exceptions Coprocessor unusable exception Opcode Bit Encoding CFCz Bit 31 30 29 28 27 26 25 24 23 22 21 0 CFci 9 1 0 0 0 1 0101 1
277. int exceptions can occur due to these operations Data may also be directly moved between the floating point coprocessor and the processor by move to coprocessor and move from coprocessor instructions Like the floating point load and store operations move to from operations perform no format conversions and never cause floating point exceptions An additional pair of coprocessor registers are available called Floating Point Control registers for which the only data movement operations supported are moves to and from processor General Purpose registers FPU Instruction Set Details Appendix B Floating Point Operations The floating point unit operation set includes floating point add floating point subtract floating point multiply floating point divide floating point square root convert between fixed point and floating point formats convert between floating point formats floating point compare These operations satisfy the requirements of IEEE Standard 754 requirements for accuracy Specifically these operations obtain a result which is identical to an infinite precision result rounded to the specified format using the current rounding mode Instructions must specify the format of their operands Except for conversion functions mixed format operations are not provided Instruction Notation Conventions In this appendix all variable subfields in an instruction format such as fs ft immediate and so on are shown in lo
278. interface in master state as described below a processor noncoherent block request for eight words of data is illustrated in Figure 12 18 on page 12 22 1 The processor issues a write command on the SysCmd bus and a write address on the SysAD bus 2 The processor asserts ValidOut 3 The processor drives a data identifier on the SysCmd bus and data on the SysAD bus 4 The processor asserts ValidOut for a number of cycles sufficient to transmit the block of data 5 The data identifier associated with the last data cycle must contain a last data cycle indication 12 21 System Interface Chapter 12 Figure 12 18 illustrate a processor noncoherent block request for eight words of data with a data pattern of DDDD lt Master gt I 1 SCycle Be a Ita ee of ee B 9 10 1 12 soeke gear a Ee a SP ede te eae a SysAD Bus Adar Y Datao Datat Data2 Y Datas Y SysCmd Bus Validout Validin RdRdy WrRdy Release Figure 12 18 Processor Noncoherent Block Write Request Protocol Processor Request and Flow Control The external agent uses RdRdy to control the flow of processor read requests Figure 12 19 on page 12 23 illustrates this flow control as described in the steps below 1 The processor samples the signal RdRdy to determine if the external agent is capable of accepting a read request 2 The signal WrRdy controls the flow of a processor
279. ion Soft Reset 10 Address error Data access Nonmaskable Interrupt NMI 11 TLB refill Data access Address error Instruction fetch 12 TLB invalid Data access 2 3 4 5 6 7 8 TLB refill Instruction fetch 13 TLB modified Data write TLB invalid Instruction fetch 14 Cache error Data access Cache error Instruction fetch 15 Bus error Data access Bus error Instruction fetch 16 Interrupt lowest priority Table 5 11 Exception Priority Order Generally speaking the exceptions described in the following sections are handled processed by hardware these exceptions are then serviced by software CPU Exception Processing Chapter 5 Reset Exception This section explains the Reset exception Cause The Reset exception occurs when the ColdReset signal is asserted and then deasserted This exception is not maskable Processing The CPU provides a special exception vector for this exception of OxFFFF FFFF BFCO 0000 The Reset vector resides in unmapped and uncached CPU address space so the hardware need not initialize the TLB or the cache to process this exception It also means the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state The contents of all registers in the CPU are undefined when this exception occurs except for the following register fields e In the Status register SR is
280. ion Set Details Appendix B CFC1 Move Control Word From FPU CFC1 Coprocessor 1 31 26 25 21 20 16 15 11 10 0 COP1 CF rt fs 0 010001 00010 000 0000 0000 6 5 5 5 11 Format CFC 1 rt fs Description The contents of the FPU control register fs are loaded into general register rt This operation is only defined when fs equals O or 31 The contents of general register rt are undefined for time T of the instruction immediately following this load instruction Operation T temp FCRI fs T 1 GPR rt lt temp3 temp Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B Move Control Word To FPU CTC1 Coprocessor 1 CTC1 31 26 25 21 20 16 15 11 10 0 COP1 CT rt fs 0 010001 00110 000 0000 0000 5 5 5 11 Format CTC1 rt fs Description The contents of general register rt are loaded into FPU control register fs This operation is only defined when fs equals 31 Writing to Control Register 31 the floating point Control Status register causes an interrupt or exception if any cause bit and its corresponding enable bit are both set The register will be written before the exception occurs The contents of floating point control register fs are undefined for time T of the instruction immediately following this load instruction Operation T temp GPR rt 31 6 T 1 FCR fs lt temp COC 1
281. ior of the processor is undefined if VCCOk asserts while ColdReset is deasserted Cold Reset A cold reset can begin anytime after the processor has read the initialization data stream causing the processor to start with the Reset exception A cold reset requires the same sequence as a power on reset except that the power is presumed to be stable before the assertion of the reset inputs and the deassertion of VCCOk To begin the reset sequence VCCOk must be deasserted for a minimum of 100 ms before reassertion Warm Reset To execute a warm reset the Reset input is asserted synchronously with MasterClock It is then held asserted for at least 64 MasterClock cycles before being deasserted synchronously with MasterClock The processor internal clocks PClock and SClock and the System interface clocks TClock and RClock are not affected by a warm reset The boot time mode control serial data stream is not read by the processor on a warm reset A warm reset forces the processor to start with a Soft Reset exception The master clock output MasterOut generates any reset related signals for the processor that must be synchronous with MasterClock After a power on reset cold reset or warm reset all processor internal state machines are reset and the processor begins execution at the reset vector All processor internal states are preserved during a warm reset although the precise state of the caches depends on whether or not a cache mi
282. is a designated memory location that can be incremented or decremented In the test and set method only one processor at a time is permitted to enter the critical section Using a counter up to N processors are allowed to concurrently execute the critical section All processors after the Nth processor must wait until one of the N processors exits the critical section and a space becomes available The counter works by not allowing more than one processor to modify it at any given time Conceptually the counter can be viewed as a variable that counts the number of limited resources for example the number of processes or software licenses etc 11 9 Cache Organization Operation and Coherency Chapter 11 Figure 11 8 shows this process Execute critical section Yes Try decrementing counter Try incrementing counter Yes Continue processing Figure 11 8 Synchronization Using a Counter Load Linked and Store Conditional The R4600 R4700 instructions Load Linked LL and Store Conditional SC provide support for processor synchronization These two instructions work very much like their simpler counterparts load and store The LL instruction in addition to doing a simple load has the side effect of setting a bit called the link bit This link bit forms a breakable link between the LL instruction and the subsequent SC instruction The SC performs a simple store if the link bit is set when the store executes If the
283. is set to zero The result is placed into general register rt No integer overflow exception occurs under any circumstances The comparison is valid even if the subtraction used during the comparison overflows Operation T if O ll GPRIrs lt 0 II immediate s immediate 5 9 then GPRIrd 0 3 1 else GPRird 064 endif Exceptions None A 116 CPU Instruction Set Details Appendix A SLTU Set On Less Than Unsigned SLTU 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SLTU 000000 00000 101011 6 5 5 5 5 6 Format SLTU rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs Considering both quantities as unsigned integers if the contents of general register rs are less than the contents of general register rt the result is set to one otherwise the result is set to zero The result is placed into general register rd No integer overflow exception occurs under any circumstances The comparison is valid even if the subtraction used during the comparison overflows Operation T if O GPRIrs lt 0 GPRIrt then GPR rd 0 1 1 else GPR rd 084 endif Exceptions None A 117 CPU Instruction Set Details Appendix A SRA Shift Right Arithmetic SRA 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa SRA 000000 00000 000011
284. is the operation specified if more than one TLB entry matches Operation T Indexe 1 0 31 for i in 0 TLBEntries 1 if TLB i 167 141 and not 0 5 TLB iJo16 205 EntryHisg 13 and not 01 TLB iJ gt 16 295 and TLB i 140 or TLB i 135 128 EntryHiz o then Index 07 iso endif endfor Exceptions Coprocessor unusable exception A 138 CPU Instruction Set Details Appendix A TLBR Read Indexed TLB Entry TLBR 31 26 25 24 6 5 0 CO 0 TLBR 010000 1 000 0000 0000 0000 0000 000001 1 19 6 Format TLBR Description The G bit which controls ASID matching read from the TLB is written into both of the EntryLoO and EntryLo1 registers The EntryHi and EntryLo registers are loaded with the contents of the TLB entry pointed at by the contents of the TLB Index register The operation is invalid and the results are unspecified if the contents of the TLB Index register are greater than the number of TLB entries in the processor Operation T PageMask TLB Indexs gloss 190 EntryHi TLB Indexs o 191 142g and not TLB Indexs gloss 192 EntryLo1 lt TLB Indexs 0 127 65 TLB Indexs o 140 EntryLoO pe TLB Indexs ol63 1 TLB Indexs 01140 Exceptions Coprocessor unusable exception A 139 CPU Instruction Set Details Appendix A TLBWI Write Indexed TLB Entry TLBWI 31 26 25 24 6 5 0 COPO CO 0 TLBWI 010000
285. issued the read response is complete and the external agent sets the SysCmd and SysAD buses to a tri state 4 The system interface returns to master state Note The processor always performs an uncompelled change to slave state in the same cycle that it issues a read request 5 The data identifier for data cycles must indicate the fact that this data is response data 6 The data identifier associated with the last data cycle must contain a last data cycle indication For read responses to non coherent block read requests which is the only read request for normal operations of the R4600 R4700 the response data will not need to identify an initial cache state The cache state will automatically be assigned as dirty exclusive by the R4600 R4700 The data identifier associated with a data cycle can indicate that the data transmitted during that cycle is erroneous however an external agent must return a data block of the correct size regardless of the fact that the data may be in error The R4600 R4700 only checks the error bit for the first doubleword of a block the other error bits for the block of data are ignored If an initial erroneous data cycle is detected the processor takes a bus error at the completion of the data transfer 12 27 System Interface Chapter 12 Read response data must only be delivered to the processor when a processor read request is pending The behavior of the processor is unde fined when a read resp
286. issues the processor request provided the processor request is still necessary after the external request is complete The rules governing an issue cycle again apply to the processor request Handshake Signals The processor manages the flow of requests through the following six control signals RdRdy WrRdy are used by the external agent to indicate when it can accept a new read RdRdy or write WrRdy transaction e ExtRaqst Release are used to transfer control of the SysAD and SysCmd buses ExtRaqst is used by an external agent to indicate a need to control the interface Release is asserted by the processor when it transfers the mastership of the system interface to the exter nal agent e The R4600 R4700 processor uses ValidOut and the external agent uses Validin to indicate valid command data on the SysCmd SysAD buses System Interface Protocols Figure 12 4 shows the system interface operates from register to register That is processor outputs come directly from output registers and begin to change with the rising edge of SClock Processor inputs are fed directly to input registers that latch these input signals with the rising edge of SClock This allows the system interface to run at the highest possible clock frequency R4600 R4700 Output data Input data Figure 12 4 System Interface Register to Register Operation l Clock is an internal clock used by the processor to sample data at the s
287. ister Processing The processor sets the ERL bit in the Status register saves the exception restart address in ErrorEPC register and then transfers to a special vector in uncached space If the BEV bit O the vector is OxFFFF FFFF A000 0100 If the BEV bit 1 the vector is OxFFFF FFFF BFCO 0300 No other registers are changed Cache Error exception processing is shown in Figure 5 13 on page 13 Servicing All errors should be logged To correct cache parity errors the system uses the CACHE instruction to invalidate the cache block overwrites the old data through a cache miss and resumes execution with an ERET Other errors are not correctable and are likely to be fatal to the current process CPU Exception Processing Chapter 5 Bus Error Exception This section explains the Bus Error exception Cause A Bus Error exception is raised by board level circuitry for events such as bus time out backplane bus parity errors and invalid physical memory addresses or access types This exception is not maskable A Bus Error exception occurs only when a cache miss refill uncached reference or unbuffered write occurs synchronously a Bus Error exception resulting from a buffered write transaction must be reported using the general interrupt mechanism Processing The common interrupt vector is used for a Bus Error exception The IBE or DBE code in the ExcCode field of the Cause register is set signifying whether the instructio
288. ister base to form a virtual address which may specify an arbitrary byte It alters only the word in memory which contains that byte From one to four bytes will be stored depending on the starting byte specified Conceptually it starts at the most significant byte of the register and copies it to the specified byte in memory then it copies bytes from register to memory until it reaches the low order byte of the word in memory No address exceptions due to alignment are possible memory big endian register address 4 4 5 6 7 A B D address 0 0 1 2 3 ne z e SWL 24 1 0 address 4 4 5 6 7 after address 0 0 A B C Operation T vAddr lt offset s offset 45 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr lt pAddrps ze 1 3 Addr3 9 xor ReverseEndian If BigEndianMem 0 then pAddr lt pAddr 07 endif byte lt vAddr 9 xor BigEndianCPU2 if vAddr xor BigEndianCPU 0 then data 0 0 48 y GPRIrt s1 24 s byte else data 0 8 Py e GPR rt s1 24 8 byte 0 endif StoreMemory uncached byte data pAddr vAddr DATA A 126 CPU Instruction Set Details Appendix A Given a doubleword in a register and a doubleword in memory the operation of SWL is as follows TLB refill exception
289. ister rd of coprocessor unit z This instruction is not valid for CPO Operation T data GPR rt T 1 CCR z rd lt data Exceptions Coprocessor unusable NOTE See CPU Instruction Opcode Bit Encoding at the end of Appendix A CPU Instruction Set Details Appendix A DADD Doubleword Add DADD 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL fa rt rd 0 DADD 000000 00000 101100 6 5 5 5 5 6 Format DADD rd rs rt Description The contents of general register rs and the contents of general register rtare added to form the result The result is placed into general register rd An overflow exception occurs if the carries out of bits 62 and 63 differ 2 s complement overflow The destination register rd is not modified when an integer overflow exception occurs Operation T GPR rd lt GPR rs GPR rt Exceptions Integer overflow exception CPU Instruction Set Details Appendix A DADD _Doubleword Add Immediate DADDI 31 26 25 21 20 16 15 0 DADDI rs rt immediate 011000 6 5 5 16 Format DADDI rt rs immediate Description The 16 bit immediate is sign extended and added to the contents of general register rs to form the result The result is placed into general register rt An overflow exception occurs if carries out of bits 62 and 63 differ 2 s complement overflow The destination register rtis not modified when an
290. isters FGRs that can be accessed in the following ways e As 32 general purpose registers 32 FGRs each of which is 32 bits wide when the FR bit in the CPU Status register equals 0 or as 32 gen eral purpose registers 32 FGRs each of which is 64 bits wide when FR equals 1 The CPU accesses these registers through move load and store instructions e As 16 floating point registers see the next section for a description of FPRs each of which is 64 bits wide when the FR bit in the CPU Sta tus register equals 0 The FPRs hold values in either single or double precision floating point format Each FPR corresponds to adjacently numbered FGRs as shown in Figure 6 2 on page 6 3 e As 32 floating point registers see the next section for a description of FPRs each of which is 64 bits wide when the FR bit in the CPU Sta tus register equals 1 The FPRs hold values in either single or double precision floating point format Each FPR corresponds to an FGR as shown in Figure 6 2 Floating Point Unit Chapter 6 Floating Point Floating Point Floating Point Floating Point Registers FPR General Purpose Registers Registers FPR General Purpose Registers FR 0 34 FGR 0 FR 1 63 FGR 0 FPRO least FGRO FPRO FGRO most FGR1 FPR1 FGR1 least FGR2 FPR2 FGR2 FGR3 FPR3 FGR3 Floating Point Control Registers FCR Control Status Register Implementa
291. ithin that page Noncoherent Lines with a noncoherent attribute type can reside in a cache a load miss causes the processor to issue a noncoherent block read request to a location within the cached page For a store miss to a write allocate page the processor issues a noncoherent block read request to a location within the cached page and then does the write through If the page has the no write allocate attribute a store miss will generate a write to the memory as in the uncached case Cache Operation Modes The R4600 R4700 processor only supports the no secondary cache mode only uncached and noncoherent coherency attributes are applicable of R4x00 operation R4600 R4700 Processor Synchronization Support In a multiprocessor system it is essential that two or more processors working on a common task execute without corrupting each other s subtasks Synchronization an operation that guarantees an orderly access to shared memory must be implemented for a properly functioning multiprocessor system Two of the more widely used methods are discussed in this section test and set and counter Even though the R4600 R4700 does not support symmetric multi processing SMP these are useful for multi master and heterogenous multi processing Test and Set Test and set uses a variable called the semaphore which protects data from being simultaneously modified by more than one processor In other words a processor can lock out other proces
292. ivision is performed in the adder The multiplier is idle MUL S D can start only when all of the following conditions are met in the 1A phase The multiplier is one of the following idle Started execution at least 6 cycles earlier on the current multiply e The adder is idle SQRT S D can start when the following conditions are met in the 1A phase e The adder is idle The multiplier must be idle CVT fmt instructions can only start when all of the following conditions are met in the 1A phase The adder is idle The multiplier is idle ADD S D or SUB S D can start only when all of the following conditions are met in the 1A phase e The adder is idle The multiplier is either idle started execution of the current multiply at least 2 cycles earlier NEG S D or ABS S D can start only when all of the following conditions are met in the 1A phase The adder is idle The multiplier is idle C COND S D can start only when all of the following conditions are met in the 1A phase e The adder is idle The multiplier is idle Floating Point Exceptions Chapter 7 Integrated Device Technology Inc This chapter describes FPU floating point exceptions including FPU exception types exception trap processing exception flags saving and restoring state when handling an exception and trap handlers for IEEE Standard 754 exceptions A floating point exception occurs whenever th
293. l Cycles Processor Read Cycle Processor Noncoherent Word Write Request Protocol Write re issue Pipelined Writes Processor Noncoherent Block Write Request Protocol Delayed for the Assertion of WrRdy Page 10 1 10 1 10 3 10 4 10 5 10 6 10 7 10 9 11 1 11 2 11 4 11 5 11 7 11 9 11 10 11 11 11 12 12 2 12 3 12 3 12 4 12 6 12 7 12 7 12 8 12 9 12 9 12 11 12 16 12 18 12 19 12 20 12 20 12 21 12 22 12 23 xvii List of Figures List of Figures Number Figure 12 20 Figure 12 21 Figure 12 22 Figure 12 23 Figure 12 24 Figure 12 25 Figure 12 26 Figure 12 27 Figure 12 28 Figure 12 29 Figure 12 30 Figure 12 31 Figure 12 32 Figure 12 33 Figure 12 34 Figure 13 1 Figure 13 2 Figure 13 3 Figure 13 4 Figure A 1 Figure B 1 Figure B 2 Figure B 3 Figure Title Two Processor Write Requests Second Write Arbitration Protocol for External Requests 1 External Read Request System Interface in Master State System Interface Release External Null Request External Write Request with System Interface initially Master State Processor Word Read Request followed by a Word Read Response Block Read Response With Zero Wait State Block Read Transaction With One Wait State Read Response Reduced Data Rate System Interface in Slave State System Interface Command Syntax Bit Definition Read Request SysCmd Bus Bit Definition Write Request SysCmd Bus Bit Definition Null Reques
294. l floating point operations that require more than two cycles Since this mode of execution can impact performance Inexact exception traps should be enabled only when necessary Trap Enabled Results If Inexact exception traps are enabled the result register is not modified and the source registers are preserved Trap Disabled Results The rounded or overflowed result is delivered to the destination register if no other software trap occurs Invalid Operation Exception V The Invalid Operation exception is signaled if one or both of the operands are invalid for an implemented operation When the exception occurs without a trap the MIPS ISA defines the result as a quiet Not a Number NaN The invalid operations are e Addition or subtraction magnitude subtraction of infinities such as or oo e Multiplication O times with any signs e Division 0 0 or with any signs e Comparison of predicates involving lt or gt without when the oper ands are unordered e Any arithmetic operation on a signaling NaN A move MOV operation is not considered to be an arithmetic operation but absolute value ABS and negate NEG are considered to be arithmetic operations and cause this exception if one or both operands is a signaling NaN e Square root Vx where x is less than zero Floating Point Exceptions Chapter 7 Software can simulate the Invalid Operation exception for other oper
295. l input driven HIGH do not connect reserved for future output should be driven high reserved for future input GrpRunB output do not connect same reserved for future output GrpStallB input should be connected to VCC same reserved for future input FaultB output pin indicates compare mismatch driven HIGH do not connect reserved for future output Table 1 20 System Interface Comparison Between R4400 PC and R4600 R4700 Overview Chapter 1 Item R4400 PC R4600 R4700 Cache Sizes 16KB Instruction cache 16KB Data cache 16KB Instruction cache 16KB Data cache Cache Line Sizes software selectable between 16B and fixed at 32B 32B Cache Index vAddr 13 0 vAddr 2 0 Cache Tag pAddrss 19 same Cache Organization direct mapped 2 way set associative Data cache write policy write allocate and write back write allocate or not based on TLB entry write through or not based on TLB entry Data cache miss stall output address copy dirty data to writeback buffer refill cache output writeback data same with FIFO to select the set to refill Data order for block sub block ordering same reads Data order for block sequential same writes Instruction cache miss restart after all data received and writ same restart ten to cache Data cache miss restart restart after all data received and writ ten to cache restar
296. le for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation T BreakpointException Exceptions Breakpoint exception CPU Instruction Set Details Appendix A CACHE Cache CACHE 31 26 25 21 20 16 15 0 CACHE base op offset 101111 6 5 5 16 Format CACHE op offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The virtual address is translated to a physical address using the TLB and the 5 bit sub opcode specifies a cache operation for that address If CPO is not usable User or Supervisor mode the CPO enable bit in the Status register is clear and a coprocessor unusable exception is taken The operation of this instruction on any operation cache combination not listed below is undefined The operation of this instruction on uncached addresses is also undefined The R4600 R4700 uses only the tag comparisons not the valid bits to choose which data it supplies to the instruction unit This makes it important that the tags of the A and B sets are never the same The Index operation uses part of the virtual address to specify a cache block with vAddr 3 selecting the set to access For a primary cache of 16KB with 32 bytes per tag vAddr 5 specifies the block Index Load Tag also uses vAddry_3 to select t
297. loaded into the cache StoreMemory Uses the cache write buffer and main memory to store the word or part of word specified as data in the word con taining the specified physical address The low order two bits of the address and the Access Type field indicates which of each of the four bytes within the data word should be stored Table A 2 Load and Store Common Functions CPU Instruction Set Details Appendix A As shown in Table A 2 the Access Type field indicates the size of the data item to be loaded or stored Regardless of access type or byte numbering order endianness the address specifies the byte which has the smallest byte address in the addressed field For a big endian machine this is the leftmost byte and contains the sign for a 2 s complement number for a little endian machine this is the rightmost byte Access Type Mnemonic Value Meaning DOUBLEWORD 7 8 bytes 64 bits SEPTIBYTE 6 7 bytes 56 bits SEXTIBYTE 5 6 bytes 48 bits QUINTIBYTE 4 5 bytes 40 bits WORD 3 4 bytes 32 bits TRIPLEBYTE 2 3 bytes 24 bits HALFWORD 1 2 bytes 16 bits BYTE 0 1 byte 8 bits Table A 3 Access Type Specifications for Loads Stores The bytes within the addressed doubleword which are used can be determined directly from the access type and the three low order bits of the address J ump and Branch Instructions All jump and branch instructions have an architectural
298. long with their number each register has a unique identification number that is referred to as its register number For instance the ECC register is register number 26 The remaining CPO registers are used in memory management as described in Chapter 4 Software examines the CPO registers during exception processing to determine the cause of the exception and the state of the CPU at the time the exception occurred The registers in Table 5 1 are used in exception processing and are described in the sections that follow CPU Exception Processing Chapter 5 Register Name Reg No Context 4 BadVAddr Bad Virtual Address 8 Count 9 Compare register 11 Status 12 Cause 13 EPC Exception Program Counter 14 XContext 20 ECC 26 CacheErr Cache Error and Status 27 ErrorEPC Error Exception Program Counter 30 Table 5 1 CP0 Exception Processing Registers Context Register 4 The Context register is a read write register containing the pointer to an entry in the page table entry PTE array this array is an operating system data structure that stores virtual to physical address translations When there is a TLB miss the CPU loads the TLB with the missing translation from the PTE array Normally the operating system uses the Context register to address the current page map which resides in the kernel mapped segment kseg3 The Context register duplicates some of the information provided
299. lse CPRI1 fta 1 1 0 MemM31 8 byte 8 byte CPR 1 fta 1 1 O 31 0 endif Exceptions Coprocessor unusable TLB refill exception TLB invalid exception Bus error exception Address error exception FPU Instruction Set Details Appendix B M F FPU M FC1 Coprocessor 1 M FC1 31 26 25 21 20 16 15 11 10 0 COP1 MF rt fs 0 010001 00000 000 0000 0000 5 5 5 11 Format MFC1 rt fs Description The contents of register fs from the floating point coprocessor are loaded into processor register rt The contents of register rt are undefined for time T of the instruction immediately following this load instruction The FR bit of the Status register specifies whether all 32 registers of the R4600 are addressable If FR equals zero MFC1 loads either the high or low half of the 16 even Floating Point registers If FR equals one MFC1 stores the low 32 bits of both even and odd Floating Point registers Operation T if SRog 1 then data CPR 1 fs else if fS 0 then data CPR 1 fs4 4 O 31 0 else data CPR 1 fs4 4 0l63 32 endif T 1 GPR rt data3 data Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B MOV fmt Floating Point Move MOV fmt 31 26 25 21 20 16 15 11 10 6 5 0 COP1 fmt 0 fs fd MOV 010001 00000 000110 6 5 5 5 5 6 Format MOV fmt fd fs Description
300. lt lt lt lt lt lt lt lt lt lt lt CVT D CVT W V CVT L lt lt lt V C V V R Table B 1 Valid FPU Instruction Formats The coprocessor branch on condition true false instructions can be used to logically negate any predicate Thus the 32 possible conditions require only 16 distinct comparisons as shown in Table B 2 below FPU Instruction Set Details Appendix B Condition Relations Invalid Operation Mnemonic Code Greater Less Equal Unordered Exception If Than Than Unordered True False F T 0 F F F F No UN OR 1 F F F T No EQ NEQ 2 F F T F No UEQ OGL 3 F F T T No OLT UGE 4 F T F F No ULT OGE 5 F T F T No OLE UGT 6 F T T F No ULE OGT 7 F T T T No SF ST 8 F F F F Yes NGLE GLE 9 F F F T Yes SEQ SNE 10 F F T F Yes NGL GL 11 F F T T Yes LT NLT 12 F T F F Yes NGE GE 13 F T F T Yes LE NLE 14 F T T F Yes NGT GT 15 F T T T Yes Table B 2 Logical Negation of Predicates by Condition True False Floating Point Loads Stores and Moves All movement of data between the floating point coprocessor and memory is accomplished by coprocessor load and store operations which reference the floating point coprocessor General Purpose registers These operations are unformatted no format conversions are performed and therefore no floating po
301. m to define the order in which the data elements are retrieved The smallest data element of a block transfer for the R4600 R4700 is a doubleword and Figure 12 34 shows the retrieval of a block of data that consists of 4 doublewords the cache line size is 8 words in which DW2 is taken first octalword AN quadword Order of retrieval 0 1 DWO taken third DW3 taken second DW1 taken fourth DW2 taken first Figure 12 34 Retrieving Data in a Subblock Order Using the subblock ordering shown in Figure 12 34 the doubleword at the target address is retrieved first DW2 followed by the remaining doubleword DW3 in this quadword Next the quadword that fills out the octalword are retrieved in the same order as the prior quadword in this case DWO is followed by DW1 12 39 System Interface Chapter 12 It may be easier way to understand subblock ordering by taking a look at the method used for generating the address of each doubleword as it is retrieved The subblock ordering logic generates this address by executing a bit wise exclusive OR XOR of the starting block address with the output of a binary counter that increments with each doubleword starting at doubleword zero 009 Using this scheme Table 12 16 Table 12 17 and Table 12 18 list the subblock ordering of doublewords for an 8 word block based on three different starting block addresses 109 ll and Olg The subblock ordering is
302. mparison overflows Operation T if GPR rs lt GPR rt then GPR rd 0 1 else GPR rd 064 endif Exceptions None A 114 CPU Instruction Set Details Appendix A SLTI Set On Less Than Immediate SLT 31 26 25 21 20 16 15 0 SLTI rs rt immediate 001010 6 5 5 16 Format SLTI rt rs immediate Description The 16 bit immediate is sign extended and subtracted from the contents of general register rs Considering both quantities as signed integers if rs is less than the sign extended immediate the result is set to one otherwise the result is set to zero The result is placed into general register rt No integer overflow exception occurs under any circumstances The comparison is valid even if the subtraction used during the comparison overflows Operation T if GPR rs lt immediate s immediate 5 o then GPRI rd 08 1 else GPR rd 064 endif Exceptions None A 115 CPU Instruction Set Details Appendix A Set On Less Than SLTI U Immediate Unsigned SLTI U 31 26 25 21 20 16 15 0 SLTIU rs rt immediate 001011 6 5 5 16 Format SLTIU rt rs immediate Description The 16 bit immediate is sign extended and subtracted from the contents of general register rs Considering both quantities as unsigned integers if rs is less than the sign extended immediate the result is set to one otherwise the result
303. must be a valid sign extended 32 bit value Operation T s lt 0O lsa temp lt 0 GPRIrt 34 s GPR rd temps4 temp Exceptions None A 120 CPU Instruction Set Details Appendix A SRLV Shift Right Logical Variable SRLV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SRLV 000000 00000 000110 6 5 5 5 5 6 Format SRLV rd rt rs Description The contents of general register rt are shifted right by the number of bits specified by the low order five bits of general register rs inserting zeros into the high order bits The result is placed in register rd The operand must be a valid sign extended 32 bit value Operation T s lt GPRIrs q 0 temp lt 0 GPRIrt 31_ 5 GPR rd temp3 temp Exceptions None A 121 CPU Instruction Set Details Appendix A SUB Subtract S U B 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 SUB 000000 00000 100010 6 5 5 5 5 6 Format SUB rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs to form a result The result is placed into general register rd The operands must be valid sign extended 32 bit values The only difference between this instruction and the SUBU instruction is that SUBU never traps on overflow An integer overflow exception takes place if the carries out of
304. n compatible with the 32 bit address model ksegO The KO field of the Config register described in this chapter controls cacheability and coherency e cksegl This 64 bit virtual address space is an unmapped and un cached region compatible with the 32 bit address model kseg1 e cksseg This 64 bit virtual address space is the current supervisor virtual space compatible with the 32 bit address model ksseg e ckseg3 This 64 bit virtual address space is kernel virtual space compatible with the 32 bit address model kseg3 System Control Coprocessor The System Control Coprocessor CPO is implemented as an integral part of the CPU and supports memory management address translation exception handling and other privileged operations CPO contains the registers shown in Figure 4 7 plus a 48 entry TLB The sections that follow describe how the processor uses each of the memory management related registers Each CPO register has a unique number that identifies it this number is referred to as the register number For instance the Page Mask register is register number 5 Memory Management Chapter 4 EntryHi 10 FTF EntryLo0 2 EntryLo1 3 Context 4 BadVAddr 8 Safe entries See Random Register Random Page Mask Wired 6 PRid 15 Config Compare 11 Cause 13 XContext 20 CacheErr 27 ErrorEPC contents of TLB Wired R 127 0 16 30
305. n as indicated by the EPC register and BD bit in the Cause register caused the exception by an instruction reference load operation or store operation The EPC register contains the address of the instruction that caused the exception unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set Bus Error processing is shown in Figure 5 15 on page 13 Servicing The physical address at which the fault occurred can be computed from information available in the CPO registers e If the IBE code in the Cause register is set indicating an instruction fetch reference the virtual address is contained in the EPC register e If the DBE code is set indicating a load or store reference the in struction that caused the exception is located at the virtual address contained in the EPC register or 4 the contents of the EPC register if the BD bit of the Cause register is set The virtual address of the load and store reference can then be obtained by interpreting the instruction The physical address can be obtained by using the TLBP instruction and reading the EntryLo register to compute the physical page number The process executing at the time of this exception is handed a bus error signal which is usually fatal CPU Exception Processing Chapter 5 Integer Overflow Exception This section explains the Integer Overflow e
306. n a bit wise logical AND operation The result is placed into general register rd Operation T GPR rd GPR rs and GPR rt Exceptions None CPU Instruction Set Details Appendix A ANDI And Immediate ANDI 31 26 25 21 20 16 15 0 ANDI rs rt immediate 001100 6 5 5 16 Format ANDI rt rs immediate Description The 16 bit immediate is zero extended and combined with the contents of general register rs in a bit wise logical AND operation The result is placed into general register rt Operation T GPR rt 048 immediate and GPR rs 15 0 Exceptions None CPU Instruction Set Details Appendix A BCz2F Branch On Coprocessor z False BCzZF 31 26 25 21 20 1615 0 COPz BC BCF offset 0100 x x 01000 00000 6 5 5 16 Format BCzF offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If coprocessor z s condition signal CpCond as sampled during the previous instruction is false then the program branches to the target address with a delay of one instruction Because the internal condition signal is sampled during the previous instruction there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal Operation T 1 condition n
307. n external request it releases the system interface from master to slave state by asserting Release for one cycle 3 The system interface returns to master state as soon as the issue of the external request is complete This process is described in External Arbitration Protocol on page 12 24 Uncompelled Change to Slave State An uncompelled change to slave state is the transition of the system interface from master state to slave state initiated by the processor when a processor read request is pending Release is asserted automatically after a read request An uncompelled change to slave state occurs during the issue cycle of a read request After an uncompelled change to slave state the processor returns to master state at the end of the next external request This can be a read response or some other type of external request An external agent must note that the processor has performed an uncompelled change to slave state and begin driving the SysAD bus along with the SysCmd bus As long as the system interface is in slave state the external agent can begin a single external request without arbitrating for the system interface that is without asserting ExtRqst After the external request the system interface returns to master state Whenever a processor read request is pending after the issue of a read request the processor automatically switches the system interface to slave state even though the external agent is not
308. n the packages for the R4600 and R4700 There are no required external passive components Passive Components The Phase Locked Loop circuit requires several passive components for proper operation which are connected to PLLCapO PLLCap1 VccP and VssP as illustrated in Figure 10 4 R4600 R4700 Note C1 C2 C3 Rs and Ls are Board Caps Figure 10 4 PLL Passive Components It is essential to isolate the analog power and ground for the PLL circuit VccP VssP from the regular power and ground Vcc Vss Initial evaluations have yielded good results with the following values R 5 ohms C1 1 nF C2 82 nF c3 10 HF Cp 470 pF Since the optimum values for the filter components depend upon the application and the system noise environment these values should be considered as starting points for further experimentation within your specific application 10 4 Clock Interface Chapter 10 Figure 10 5 shows the internal PLL and clock distribution network of the R4600 R4700 coldReetE na gt dd SyncOut SyncIn MasterOut yn X ne 1 2F 2 gt HAAT F oe Divide Doubler 2F 902 Align X TClock 2 RClock MasterIn Nd gt Mode bits gt x lt F 256 gt dd ModeClock PClock 2F 71F PLL 2 Clock Clock f 2F Distribution Tre
309. nagement Chapter 4 Field Description Random TLB random index 0 Reserved Must be written as zeroes and returns zeroes when read Table 4 8 Random Register Field Descriptions EntryLo0 2 and EntryLol 3 Registers The EntryLo register consists of two registers that have identical formats e EntryLo0 is used for even virtual pages e EntryLol is used for odd virtual pages The EntryLoO and EntryLol registers are read write registers They hold the physical page frame number PFN of the TLB entry for even and odd pages respectively when performing TLB read and write operations Figure 4 10 on page 15 shows the format of these registers PageMask Register 5 The PageMask register is a read write register used for reading from or writing to the TLB it holds a comparison mask that sets the variable page size for each TLB entry as shown in Table 4 9 TLB read and write operations use this register as either a source or a destination when virtual addresses are presented for translation into physical address the corresponding bits in the TLB identify which virtual address bits among bits 24 13 are used in the comparison When the Mask field is not one of the values shown in Table 4 9 the operation of the TLB is undefined Bit Page Size 2 2 2 2 2 1 1 1 1 1 1 1 4 3 2 1 0 9 8 7 6 5 4 3 4 Kbytes 0 0 ololol olo o 0 16Kbytes ol ol 0 0 OJ Of oO oO EE 1 64Kbytes 0 0 0 0 0 0 0
310. nded If the result of the last floating point compare is false the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T 1 condition not COC 1 T target lt offset offset 07 T 1 if condition then PC amp PC target else NullifyCurrentinstruction endif Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B Branch On FPU True BC1 T Coprocessor 1 BC1 T 31 26 25 21 20 1615 0 COP1 BC BCT offset 010001 01000 00001 6 5 5 16 Format BCIT offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the result of the last floating point compare is true the program branches to the target address with a delay of one instruction Operation T 1 condition COC 1 T target lt offset s offset 07 T 1 if condition then PC lt PC target endif Exceptions Coprocessor unusable exception FPU Instruction Set Details Appendix B B h On FPU True Likel BC1 TL COPTIC SSOR di BC1 TL 31 26 25 21 20 16 15 0 COP1 BC BCTL offset 010001 01000 00011 6 5 5 16 Format BCITL offset Description A branch target address is comput
311. ned Figure 9 1 Power on Reset Initialization Interface Chapter 9 MasterClock MClk gt TDS E lt ____ gt gt VCCOK _ 256 MCIk cycles af Dq eases _ c i TMDS gt a gt TMDH Bit Bit Bit 0 1 255 ModelIn gt 64K MCIk cycles ColdReset _ gt 64 MCIk cydles TDS gt gt lt TDS Reset Masterout Undefined ESRI L026 pes Poe syneout _ Undefined eC aay a8 ome oe a TClock _ Undefined ava RClock ka Undefined JN Figure 9 2 Cold Reset Initialization Interface Chapter 9 Vec wage I VCCOK 256 MCIk cycleg S E a ModeClock_ ae ModelIn ColdReset _ Reset gt 64 MCIk cycles E geou N TClock FN RClock _ ae Figure 9 3 Warm Reset Boot Mode Settings Unlike the R4000 the speed of the R4600 R4700 output drivers is statically controlled at boot time Table 9 2 lists the processor boot mode settings The following rules apply to the boot mode settings listed in the table e Bit O of the stream is presented to the processor when VCCOk is first asserted e Selecting a reserved value results in undefined processor behav ior e Bits 19 to 255 are reserved bits e Zeros must be scanned in for all reserved bits Initialization Interface Chapter 9
312. nology Inc The R4600 R4700 processor provides a full featured memory management unit MMU which uses an on chip Translation Lookaside Buffer TLB to translate virtual addresses into physical addresses This chapter describes the processor virtual and physical address spaces the virtual to physical address translation the operation of the TLB in making these translations and those System Control Coprocessor CPO registers that provide the software interface to the TLB Translation Lookaside Buffer TLB Mapped virtual addresses are translated into physical addresses using an on chip TLB The TLB is a fully associative memory that holds 48 entries which provide mapping to 48 odd even page pairs 96 pages When address mapping is indicated each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the EntryHi register The address mapped to a page ranges in size from 4Kbytes to 16Mbytes in multiples of 4 that is 4K 16K 64K 256K 1M 4M 16M Hits and Misses If there is a virtual address match or hit in the TLB the physical page number is extracted from the TLB and concatenated with the offset to form the physical address see Figure 4 1 If no match occurs TLB miss an exception is taken and software refills the TLB from the page table resident in memory Software can write over a selected TLB entry or use a hardware mechanism to write into a random entry
313. nordered OR Ordered EQ Equal NEQ Not Equal UEQ Unordered or Equal OLG Ordered or Less Than or Greater Than OLT Ordered Less Than UGE Unordered or Greater Than or Equal ULT Unordered or Less Than OGE Ordered Greater Than OLE Ordered Less Than or Equal UGT Unordered or Greater Than ULE Unordered or Less Than or Equal OGT Ordered Greater Than SF Signaling False ST Signaling True NGLE Not Greater Than or Less Than or Equal GLE Greater Than or Less Than or Equal SEQ Signaling Equal SNE Signaling Not Equal NGL Not Greater Than or Less Than GL Greater Than or Less Than LT Less Than NLT Not Less Than NGE Not Greater Than or Equal GE Greater Than or Equal LE Less Than or Equal NLE Not Less Than or Equal NGT Not Greater Than GT Greater Than Table 6 13 Mnemonics and Definitions of Compare Instruction Conditions FPU Instruction Pipeline Overview The FPU provides an instruction pipeline that parallels the CPU instruction pipeline It shares the same five stage pipeline architecture with the CPU see Chapter 3 Instruction Execution Figure 6 9 illustrates the 5 stage FPU pipeline This is the same as that of the integer pipeline but allows for the longer execution times of the floating point instructions 1W 2W 1D 2D 1A 2A 1R 2R lI 2I One Cycle Figure 6 9 FPU Instruction Pipeline Floating Point Unit C
314. nput JTCK I High No NA Transmit clocks TClock 1 0 O High Yes c Receive clocks RClock 1 0 O High Yes c Master clock MasterClock I High No NA Master clock out MasterOut O High Yes c Synchronization clock out SyncOut O High Yes c Synchronization clock in SyncIn I High No NA I O output IOOut O High Yes b I O input IOIn I High No NA Vecis OK veco 1 High No NA Cold reset ColdReset I Low No NA Reset Reset I Low No NA Fault Fault O Low Yes b Key to Reset State Column a All I O pins SysAD 63 0 SysADC 7 0 etc remain 3 stated until the Reset signal deasserts b All output only pins ValidOut Release etc except the clocks are 3 stated until the ColdReset signal deasserts c All clocks except ModeClock are 3 stated until VCCOk asserts d ModeClock is always driven NA Not applicable to input pins Table 8 6 R4600 R4700 Processor Signal Summary Initialization Interface Chapter 9 dt Integrated Device Technology Inc Introduction This chapter describes the R4600 R4700 Initialization interface This includes the reset signal description and types initialization sequence with signals and timing dependencies and boot modes which are set at initialization time Signal names are listed in bold letters for instance the signal VCCOk indicates the Vcc voltage is stable Low active signals are indicated by an asterisk at the end of the name as in
315. nslation vAddr DATA pAddr pAddrps ze 1 3 DAddre 9 xor ReverseEndian 0 mem lt LoadMemory uncached WORD pAddr vAddr DATA byte vAddrs 9 xor BigEndianCPU 0 GPRI rt 0 mem31 8 byte s byte Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details MFCO Move From System Control Coprocessor The contents of coprocessor register rd of the CPO are loaded into general register rt May be used on both 32 bit and 64 bit CPO registers Operation 31 26 25 21 20 16 15 1110 COPO MF rt rd 0 010000 00000 0000000 0000 6 5 5 5 11 Format MFCO rt rd Description T data CPR O rd T 1 GPR rt data34 datas 9 Exceptions Coprocessor unusable exception MFCO CPU Instruction Set Details Appendix A M FCz Move From Coprocessor M FCz 31 26 25 21 20 16 15 11 10 0 COPz MF rt rd 0 0100xx 00000 000 0000 0000 6 5 5 5 11 Note See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Format MFCz rt rd Description The contents of coprocessor register rd of coprocessor z are loaded into general register rt Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception not a coprocessor unusable exception
316. nterrupt inputs The interface has a simple timing specification and is capable of transferring data between the processor and memory at a peak rate of 400MB sec at 50MHz Figure 1 11 shows a typical system using the R4600 R4700 In this example there is DRAM a boot EPROM and an optional secondary cache Address Control Memory I Q Controller Figure 1 11 Typical System Block Diagram Overview Chapter 1 Comparison of R4600 R4700 and R4400 This section compares features of the R4600 R4700 to the earlier R4400 PC Table 1 20 to Table 1 26 highlight some of the differences between the R4600 R4700 and the R4400 PC This list is not exhaustive Item R4400 PC R4600 R4700 1 O R4400 TTL compatible R4600 R4700 TTL compatible 5V 0 5 RV4400 LV CMOS RV4600 RV4700 LVCMOS 3 3V 0 3V Package 179 pin ceramic PGA same and 208 pin MQUAD JTAG yes no serial out connected directly to serial in Block transfer sizes 16B or 32B 32B Sclock divisor 2 3 4 6 8 2 3 4 5 6 7 8 Non block writes max throughput of 4 sclock cycles two new system interface protocol options that support 2 sclock cycle throughput remains 4 in compatibility mode Serial configuration as described in R4000 User s Guide different as described in Table 9 2 on page 9 7 Address bits 63 56 on reads and writes Zero
317. ntrols whether the TLB Refill Vector or the XTLB Refill Vector address is used for TLB misses on Kernel addresses 64 bit op codes are always valid in Kernel mode e 64 bit addressing and operations are enabled for Supervisor mode when SX 1 e 64 bit addressing and operations are enabled for User mode when UX Kernel Address Space Accesses Access to the kernel address space is allowed when the processor is in Kernel mode Supervisor Address Space Accesses Access to the supervisor address space is allowed when the processor is in Kernel or Supervisor mode as described above in the paragraph titled Operating Modes User Address Space Accesses Access to the user address space is allowed in any of the three operating modes Status Register Reset The contents of the Status register are undefined at reset except for the following bits ERL and BEV 1 The SR bit distinguishes between Reset and Soft Reset Nonmaskable Interrupt NMI CPU Exception Processing Chapter 5 Cause Register 13 The 32 bit read write Cause register describes the cause of the most recent exception Figure 5 6 shows the fields of this register Table 5 4 which follows the figure describes the Cause register fields A 5 bit exception code ExcCode indicates the cause of the most recent exception as listed in Table 5 5 on page 5 8 All bits in the Cause register with the exception of the IP 1 0 bits are read only IP 1 0 are used for
318. o resolve an interlock or stall between various multiply or divide instructions and a subsequent MFHI or MFLO instruction Instruction R4600 R4700 MULT 10 8 MULTU 10 8 DIV 42 42 DIVU 42 42 DMULT 12 10 DMULTU 12 10 DDIV 74 74 DDIVU 74 74 Table 2 2 Multiply Divide Instruction Cycle Timing For more information about computational instructions refer to the individual instruction as described in Appendix A CPU Instruction Set Summary Chapter 2 J ump and Branch Instructions Jump and branch instructions change the control flow of a program All jump and branch instructions occur with a delay of one instruction that is the instruction immediately following the jump or branch this is known as the instruction in the delay slot always executes while the target instruction is being fetched from storage Overview of J ump Instructions Subroutine calls in high level languages are usually implemented with Jump or Jump and Link instructions both of which are J type instructions In J type format the 26 bit target address shifts left 2 bits and combines with the high order 4 bits of the current program counter to form an absolute address Returns dispatches and large cross page jumps are usually implemented with the Jump Register or Jump and Link Register instructions Both are R type instructions that take the 32 bit or 64 bit byte address contained in one of the general purpose registers
319. o and not greater than the infinitely precise result Table 6 4 Rounding Mode Bit Decoding Floating Point Formats The FPU performs both 32 bit single precision and 64 bit double precision IEEE standard floating point operations The 32 bit single precision format has a 24 bit signed magnitude fraction field f s and an 8 bit exponent e as shown in Figure 6 6 31 30 e f s Sign Exponent Fraction 1 8 23 Figure 6 6 Single Precision Floating Point Format Floating Point Unit Chapter 6 The 64 bit double precision format has a 53 bit signed magnitude fraction field f s and an 11 bit exponent as shown in Figure 6 7 63 62 52 51 f s e Sign Exponent Fraction 1 11 52 Figure 6 7 Double Precision Floating Point Format As shown in the above figures numbers in floating point format are composed of three fields e sign field s e biased exponent e E bias e fraction f b b9 Dp_1 The range of the unbiased exponent E includes every integer between the two values Emin and Emax inclusive together with two other reserved values Enin 1 to encode 0 and denormalized numbers e Emax 1 to encode t and NaNs Not a Number For single and double precision formats each representable nonzero numerical value has just one encoding For single and double precision formats the value of a number v is determined by the equations shown in Table 6 5 No Equati
320. o general register rs If the contents of general register rs are not equal to the contents of general register rt a trap exception occurs The code field is available for use as software parameters but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction Operation if GPR rs GPR rt then TrapException endif Exceptions Trap exception A 146 CPU Instruction Set Details TNEI Trap If Not Equal Immediate Appendix A TNEI TNEI rs immediate Description The 16 bit immediate is sign extended and compared to the contents of general register rs If the contents of general register rs are not equal to the sign extended immediate a trap exception occurs 31 26 25 21 20 16 15 0 REGIMM rs TNEI immediate 000001 01110 6 5 5 16 Format Operation T if GPR rs immediate 5 immediate45_ then TrapException endif Exceptions Trap exception A 147 CPU Instruction Set Details WAIT Wait Appendix A WAIT 31 26 25 24 65 COPO CO 0 010000 1 0000000 0000 0000 0000 WAIT 100000 6 1 19 Format WAIT Description The WAIT instruction is used to halt the internal pipeline and thus reduce the power consumption of the CPU See Appendix G for more details Operation T if SysAD bus is idle then StopPipeline endi
321. o least significant bits of the effective address are non zero an address error exception occurs T vAddr offset s offsety5 o GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrps ze 1 3 pBAddro 9 xor ReverseEndian 07 byte vAddrs 9 xor BigEndianCPU 0 data GPR rtles g byte 08 StoreMemory uncached WORD data pAddr vAddr DATA Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 124 CPU Instruction Set Details Appendix A SWCz Store Word From Coprocessor SWCz 31 26 25 21 20 16 15 0 SWCz base rt offset 1110xx 6 5 5 16 SWCz Bit 31 30 29 28 27 26 Format SWCz rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address Coprocessor unit z sources a word which the processor writes to the addressed memory location The data to be stored is defined by individual coprocessor specifications This instruction is not valid for use with CPO If either of the two least significant bits of the effective address is non zero an address error exception occurs Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception not a coprocessor unusable exception Operation T vAddr offset s
322. o memory until the cache line is replaced A write through policy means the store data is written to the cache and to memory The write of the data to memory may not occur at the same time as the write to cache due to the write buffer For a write back entry if the cache line is valid and has been modified the W bit is set the processor writes this cache line back to memory when the line is replaced either in the course of satisfying a cache miss or during the execution of a Write back or Write back Invalidate CACHE instruction 11 6 Cache Organization Operation and Coherency Chapter 11 For a write through entry whenever a store hits in the cache line the data is also written to memory via the write buffer The store will not set or clear the W bit for a write through cache line This is to allow a different virtual address that maps to the same physical address and with a write back policy to still set the W bit For a miss to a write through line the action taken will be determined by the write allocation policy For a write allocate entry the cache line is first retrieved from memory and the store will then continue A no write allocate entry will just post the write to the system interface via the write buffer in the same manner as an uncached write When the processor writes a cache line back to memory it does not ordinarily retain a copy of the cache line and the state of the cache line is changed to invalid However
323. o the following classes e Load and Store instructions move data between memory and general registers They are all I type instructions since the only addressing mode supported is base register 16 bit immediate offset e Computational instructions perform arithmetic logical and shift op erations on values in registers They occur in both R type both operands are registers and I type one operand is a 16 bit immediate formats e Jump and Branch instructions change the control flow of a program Jumps are always made to absolute 26 bit word addresses J type format or register addresses R type for returns and dispatches Branches have 16 bit offsets relative to the program counter I type J ump and Link instructions save their return address in register 31 e Coprocessor instructions perform operations in the coprocessors Coprocessor loads and stores are I type Coprocessor computational instructions have coprocessor dependent formats see the FPU in structions in Appendix B Coprocessor zero CPO instructions ma nipulate the memory management and exception handling facilities of the processor e Special instructions perform a variety of tasks including movement of data between special and general registers trap and breakpoint They are always R type CPU Instruction Set Details Appendix A Instruction Formats Every CPU instruction consists of a single word 32 bits aligned on a word boundary and the major instru
324. oating point registers same FCRO Imp 0x05 R4600 Imp 0x20 R4700 Imp 0x21 Table 1 25 Coprocessor 1 Comparison Between R4400 PC and R4600 R4700 CPU Instruction Set Chapter 2 Summary at Integrated Device Technology Inc Introduction This chapter is an overview of the central processing unit CPU instruction set refer to Appendix A for detailed descriptions of individual CPU instructions An overview of the floating point unit FPU instruction set is in Chapter 6 refer to Appendix B for detailed descriptions of individual FPU instructions CPU Instruction Formats Each CPU instruction consists of a single 32 bit word aligned on a word boundary There are three instruction formats immediate I type jump J type and register R type as shown in Figure 2 1 The use of a small number of instruction formats simplifies instruction decoding thus higher frequency operations and allowing the compiler to synthesize more complicated and less frequently used operations and addressing modes from these three formats as needed I Type Immediate 31 2625 2120 1615 0 op rs rt immediate J Type Jump 31 26 25 0 op target j R Type Register 31 26 25 21 20 _16 15 1110 65 0 op rs rt rd sa funct Figure Legend op 6 bit operation code rs 5 bit source register specifier rt 5 bit target source destination register or branch condition immediate 1
325. on 1 if E Emax 1 and f 0 then vis NaN regardless of s 2 if E Emax 1 and f 0 then v 1 o 3 if Emin lt E lt Emax then v 1 82 1 4 ifE E 1 and f 0 then v 1 S2E Q 5 if E Emin 1 and f 0 then v 1 80 Table 6 5 Equations for Calculating Values in Single and Double Precision Floating Point Format For all floating point formats if v is NaN the most significant bit of f determines whether the value is a signaling or quiet NaN v is a signaling NaN if the most significant bit of fis set otherwise vis a quiet NaN Floating Point Unit Chapter 6 defines the values for the format parameters Minimum and maximum floating point values are given in Table 6 7 Format remer Single Double f 24 53 Emax 127 1023 Emin 126 1022 Exponent bias 127 1023 Exponent width in bits 8 11 Integer bit hidden hidden Fraction width in bits 24 53 Format width in bits 32 64 Table 6 6 Floating Point Format Parameter Values Type Value Float Minimum 1 40129846e 45 Float Minimum Norm 1 17549435e 38 Float Maximum 3 40282347e 38 Double Minimum 4 9406564584 124654e 324 Double Minimum Norm 2 22507385850720 14e 308 Double Maximum 1 797693 1348623 157e 308 Table 6 7 Minimum and Maximum Floating Point Values Binary Fixed Point Format Binary fixed point values are held in 2
326. on This vector is located within unmapped and uncached address space so that the cache and TLB need not be initialized to process an NMI interrupt When an NMI exception occurs the SR bit of the Status register is set to differentiate this exception from a Reset exception Because an NMI can occur in the midst of another exception it is not normally possible to continue program execution after servicing an NMI Unlike Reset and Soft Reset but like other exceptions NMI is taken only at instruction boundaries The state of the caches and memory system are preserved by this exception To terminate a pending read that has hung the best approach is to return a bus error However if you wish to use a CPU exception to indicate a hung read Soft Reset is preferable to NMI When this exception occurs the contents of all registers are preserved except for e ErrorEPC register which contains the restart PC e ERL bit of the Status register which is set to 1 e SR bit of the Status register which is set to 1 BEV bit of the Status register which is set to 1 NMI exception processing is shown in Figure 5 14 on page 13 Servicing The NMI exception is serviced by saving the current processor state for diagnostic purposes and reinitializing the system for the Reset exception CPU Exception Processing Chapter 5 Address Error Exception This section explains the Address Error exception Cause The Address Error exception occurs when an
327. on Servicing Guidelines CPU Exception Processing Chapter 5 nstr in Br Dly Slot Enhi VPN2 ASID Enhi VPN2 ASID Context VPN2 Context VPN2 Set Cause Reg Set Cause Reg EXCCode CE and EXCCode CE and Cause bit 31 BD lt 1 Cause bit 31 BD 0 Check if exception within another exception 1 0 Set BadVA Set BadVA EPC lt PC EPC e PC XTLB Instruction y y y Vec Off 0x080 Vec Off 0x000 Vec Off 0x180 4 Points to Refill Exception Points to General Exception Processor forced to Kernel Mode amp interrupt disabled EXL e 1 0 normal BEV 1 bootstrap SR bit 22 Base is sign extended for 64 bits PC amp OxFFFF FFFF 8000 0000 PC lt OxFFFF FFFF BFCO 0200 Vec Off Vec Off unmapped cached unmapped uncached To TLB XTLB Exception Servicing Guidelines Figure 5 18 TLB XTLB Miss Exception Handler HW CPU Exception Processing MFCO CONTEXT Service Code Chapter 5 Comments Unmapped vector so TLBMod TLBInv TLB Refill or VCEP exceptions not possible EXL 1 so Interrupt exceptions disabled OS System to avoid all other exceptions Only CacheErr Reset Soft Reset NMI exceptions possible Load the mapping of the virtual address in Context Reg Move it to
328. on handler operations for the following exception types e reset e soft reset e nonmaskable interrupt NMI e cache error e remaining processor exceptions When the EXL bit in the Status register is 0 either User or Supervisor operating mode is specified by the KSU bits in the Status register When the EXL bit or the ERL bit is a 1 the processor is in Kernel mode When the processor takes an exception the EXL bit is set to 1 which means the system is in Kernel mode After saving the appropriate state the exception handler typically resets the EXL bit back to 0 When restoring the state and restarting the handler sets the EXL bit back to 1 Returning from an exception also resets the EXL bit to O see the ERET instruction in Appendix A In the following sections sample hardware processes for various exceptions are shown together with the servicing required by the handler software Reset Exception Process Figure 5 12 shows the Reset exception process T undefined Random TLBENTRIES 1 Wired lt 0 Config lt 0 EC EP 00000000 BE 110 010 010 1 1 0 undefined ErrorEPC e PC SR amp SR341 23 1 0 O SR49 3 1 SRy 0 PC lt OxFFFF FFFF BFCO 0000 Figure 5 12 Reset Exception Processing CPU Exception Processing Chapter 5 Cache Error Exception Process Figure 5 13 shows the Cache Error exception process T ErrorEPC amp PC CacheErr lt ER EC ED ET
329. oncoherent write through no write allocate 1 Cacheable noncoherent write through write allocate 2 Uncached 3 Cacheable noncoherent write back 4 7 Reserved Table 4 6 TLB Page Coherency C Bit Values CPO Registers The following sections describe the CPO registers shown in Figure 4 7 on page 13 that are assigned specifically as a software interface with memory management each register is followed by its register number in parentheses e Index register CPO register number 0 Random register 1 EntryLoO 2 and EntryLo1 3 registers PageMask register 5 Wired register 6 EntryHi register 10 PRId register 15 Config register 16 LLAdadr register 17 TagLo 28 and TagHi 29 registers Memory Management Chapter 4 Index Register 0 The Index register is a 32 bit read write register containing six bits to index an entry in the TLB The high order bit of the register shows the success or failure of a TLB Probe TLBP instruction The Index register also specifies the TLB entry affected by TLB Read TLBR or TLB Write Index TLBWI instructions Figure 4 11 shows the format of the Index register Table 4 7 which follows the figure describes the Index register fields Index Register Figure 4 11 Index Register Field Description P Probe failure Set to 1 when the previous TLBProbe TLBP instruction was unsuccessful ider Index to the TLB entry affected by the TLBRead and
330. ons FPU Instruction Summary Compare and Branch Instructions Mnemonics and Definitions of Compare Instruction Conditions Floating Point Operation Latencies Default FPU Exception Actions FPU Exception Causing Conditions System Interface Signals Clock Control Interface Signals Interrupt Interface Signals JTAG Interface Signals Initialization Interface Signals R4600 R4700 Processor Signal Summary Page 4 5 4 7 4 10 4 11 4 12 4 15 4 16 4 17 4 17 4 18 4 19 4 20 4 21 is Y oo ann c RREOOONONDYD pi ee xii List of Tables List of Tables Number Table Title Page Table 9 1 R4600 R4700 Processor Signal Summary 9 2 Table 9 2 Boot Mode Settings 9 7 Table 11 1 Cache States 11 6 Table 11 2 Coherency Attributes and Processor Behavior 11 8 Table 12 1 Load Miss to Primary Cache 12 11 Table 12 2 Store Miss to Primary Cache 12 12 Table 12 3 System Interface Requests 12 14 Table 12 4 Transmit Data Rates and Patterns 12 30 Table 12 5 Release Latency for External Requests 12 32 Table 12 6 Encoding of SysCmd 7 5 for System Interface Commands 12 33 Table 12 7 Encoding of SysCmd 4 3 for Read Requests 12 34 Table 12 8 Encoding of SysCmd 2 0 for Block Read Request 12 34 Table 12 9 Doubleword Word or Partial word Read Re quest Data Size Encoding of SysCmd 2 0 12 34 Table 12 10 Write Request Encoding of SysCmd 4 3 12 35 Table 12 11 Block Write Request Encoding of SysCmd 2 0 12 35 Table 12 12 Doublewo
331. onse is presented to it and there is no processor read pending Figure 12 24 illustrates a processor word read request followed by a word read response Figure 12 25 illustrates a read response for a processor block read with the system interface already in slave state Figure 12 26 illustrates a block read transaction with one wait state Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively lt Slave gt Master 8 9 10 14 12 eee aeess Dated Master gt SCycle Lihat ce see eae Se sock th ep ay eg ap SysAD Bus Addr 6 a B SysCmd Bus Read ValidOut 6 Validin ExtRast Release RdRdy Figure 12 24 Processor Word Read Request followed by a Word Read Response Master gt lt Slave gt lt Master gt SCycle tanka an sae Hl Se lose pt ake I Spee Il ge aa I ia SClock se Fs ae ee SysAD Bus Y addr Datao Datat Data2 Y Datas ETE SysCmd Bus Read NData NData NData NE0D ValidOut 6 Validin ExtRast Release RdRdy Figure 12 25 Block Read Response With Zero Wait State 12 28 System Interface Chapter 12 Master Slave Master lt gt lt lt gt SCycle tiare lal eee a
332. or bits are ignored 12 15 System Interface Chapter 12 Figure 12 12 illustrates a processor read request coupled with an uncompelled change to slave state Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively lt Master gt lt gt scyle 2 3s 4 se 9 10 ar 12 SClock Watatatatatatataatats SysAD Bus Addr SysCmd Bus Read ValidOut 3 Validin RdRdy WrRady Release Figure Note Numbers in boxes correspond to numbered steps in preceding text Figure 12 12 Processor Read Request Protocol The assertion of Release indicates either an uncompelled change to slave state or a response to the assertion of ExtRqst whereupon the processor accepts either a read response or any other external request If any external request other than a read response is issued the processor performs another uncompelled change to slave state after processing the external request The actual read response where the external agent returns the requested data is shown later in this chapter External Instruction Read Response Time The R4600 R4700 accesses the external bus due to instruction cache miss or an uncached reference The length of time for an external read is based on the overhead at the beginning and end of the read along with the time to drive the address and get the response da
333. ore data Uncached Loads or Stores When the processor performs an uncached load it issues a nonco herent word read request the actual access can be for a doubleword word partial word or byte but the request is called a word read request to differentiate it from the block read request When the processor performs an uncached store it issues a doubleword partial doubleword word or partial word write request The CPU expects valid parity and data in the full SysAD bus all 64 bits even if it is looking for less than a double word Even if you do not want to return the full double word you still must tell it not to check the parity if you are not using all 64 bits In other words either return 64 bits with parity or tell it not to check parity All writes by the processor will be buffered from the system interface by the 4 deep write buffer The write requests are sent to the system inter face when there are no other requests in progress If the write buffer contains any entries when a block request is needed the write buffer is first flushed before any read request will occur cache miss or uncached load Both a data cache miss and an uncached data load will flush the write buffer CACHE Operations The processor provides a variety of CACHE operations to maintain the state and contents of the primary cache During the execution of the CACHE operation instructions the processor can issue write requests 12 13 System In
334. ore the rising edge of SClock and be held valid for a minimum of tpy ns after the rising edge of SClock Alignment to SClock Processor data becomes stable a minimum of tp ns and a maximum of tpo ns after the rising edge of SClock This drive time is the sum of the maximum delay through the processor output drivers together with the maximum clock to Q delay of the processor output registers Alignment to MasterClock Certain processor inputs specifically VCCOk ColdReset and Reset are sampled based on MasterClock while others are output based on MasterClock The same setup hold and drive off parameters tps tpp tpm and tpo shown in Figure 10 3 apply to these inputs and outputs but they are measured relative to MasterClock instead of SClock Phase Locked Loop PLL The processor aligns SyncOut PClock SClock TClock and RClock with internal phase locked loop PLL circuits that generate aligned clocks based on SyncOut Syncin By their nature PLL circuits are only capable of generating aligned clocks for MasterClock frequencies within a limited range 10 3 Clock Interface Chapter 10 Clocks generated using PLL circuits contain some inherent inaccuracy or jitter a clock aligned with MasterClock by the PLL can lead or trail MasterClock by as much as the related maximum jitter specified in the data sheet PLL Components and Operation The passive components required for the Phase Locked Loop circuit are contained i
335. ot a more complicated algorithm beyond the scope of this description may be required CPU Exception Processing Chapter 5 Breakpoint Exception This section explains the Breakpoint exception Cause A Breakpoint exception occurs when an attempt is made to execute the BREAK instruction This exception is not maskable Processing The common exception vector is used for this exception and the BP code in the Cause register is set The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction If the BREAK instruction is in a branch delay slot the BD bit of the Status register is set otherwise the bit is cleared Breakpoint exception processing is shown in Figure 5 15 on page 13 Servicing When the Breakpoint exception occurs control is transferred to the applicable system routine Additional distinctions can be made by analyzing the unused bits of the BREAK instruction bits 25 6 and loading the contents of the instruction whose address the EPC register contains A value of 4 must be added to the contents of the EPC register EPC register 4 to locate the instruction if it resides in a branch delay slot To resume execution the EPC register must be altered so that the BREAK instruction does not re execute this is accomplished by adding a value of 4 to the EPC register EPC register 4 before
336. ot COC z T target lt offset offset 07 T 1 if condition then PC lt PC target endif Note See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Exceptions Coprocessor unusable exception Opcode Bit Encoding BCzF Bit 31 30 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 Boor 9 1 9 9 0 0 0 1 0 0 0 0 0 0 0 0 Bit 31 30 29 28 27 26 25 24 23 22 21 201918 17 16 0 BciF 9 1 9 9 0 1 0 1 0 0 0 0 0 0 0 0 Bit 31 30 29 28 27 26 25 24 23 22 21 20 1918 17 16 0 Bc2F 9 1 9 9 1 0 0 1 0 0 0 0 0 0 0140 KX NX A wy Opdode Y v kg peo BC sub opcode Branch condition Coprocessor Unit Number CPU Instruction Set Details Appendix A Branch On Coprocessor z Coprocessor Unit Number 31 26 25 21 20 16 15 0 COPz BC BCFL offset 0100xx 01000 00010 6 5 5 16 Format BCZFL offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of coprocessor z s condition signal as sampled during the previous instruction is f
337. ot mapped the physical address selected is taken from bits 35 0 of the virtual address Bits 61 59 of the virtual address specify the cacheability and coherency attributes as shown in Table 4 5 Value Cacheability and Coherency Attributes Starting Address 61 59 O Cacheable noncoherent write through no 0x8000 0000 0000 0000 write allocate 1 Cacheable noncoherent write through write Ox8800 0000 0000 0000 allocate 2 Uncached 0x9000 0000 0000 0000 3 Cacheable noncoherent 0x9800 0000 0000 0000 4 7 Reserved OxA000 0000 0000 0000 Table 4 5 Cacheability and Coherency Attributes 64 bit Kernel Mode Kernel Space xkseg In Kernel mode when Status KX 1 and bits 63 62 of the 64 bit virtual address are 11 the address space selected is one of the following kernel virtual space xkseg the current supervisor virtual space the virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address e one of the four 32 bit kernel compatibility spaces as described in the next section 64 bit Kernel Mode Compatibility Spaces ckseg1 0 cksseg ckseg3 In Kernel mode when Status KX 1 bits 63 62 of the 64 bit virtual address are 11 and bits 61 31 of the virtual address equal 1 the lower two bytes of address as shown in Figure 4 6 select one of the following 512 Mbyte compatibility spaces e cksegO This 64 bit virtual address space is an unmapped regio
338. ow order bits are not zero an address exception will occur when the jump target instruction is subsequently fetched Operation T temp GPRirs T 1 PC lt temp Exceptions None CPU Instruction Set Details Appendix A L B Load Byte L B 31 26 25 21 20 16 15 0 LB base rt offset 100000 6 5 5 16 Format Description LB rt offset base The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the byte at the memory location specified by the effective address are sign extended and loaded into general register rt Operation T vAddr lt offset s offsetys5 0 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrpgize 1 3 DAddre 9 xor ReverseEndian mem lt LoadMemory uncached BYTE pAddr vAddr DATA byte vAddrs o xor BigEndianCPU GPR rt mem7 8 pyte 56 mem7 g byte s byte Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception CPU Instruction Set Details Appendix A LBU Load Byte Unsigned LBU 31 26 25 21 20 16 15 0 LBU base rt offset 100100 6 5 5 16 Format LBU rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of the byte
339. pecified by fs and ft are interpreted in the specified format and arithmetically divided The result is rounded as if calculated to infinite precision and then rounded to the specified format according to the current rounding mode The result is placed in the floating point register specified by fd This instruction is valid for only single or double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Operation T StoreFPR fd fmt ValueFPR fs fmt ValueFPR ft fmt Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception Division by zero exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details DMFC1 Doubleword Move From Floating Point Coprocessor Appendix B DMFC1 31 26 25 21 20 16 15 1110 0 COP1 DMF rt fs 0 010001 00001 000 0000 0000 6 5 5 5 11 Format DMFC1 rt fs Description The contents of register fs from the floating point coprocessor is stored into processor register rt The contents of general register rt are undefined for time
340. peration codes marked with a gamma cause a reserved instruction exception They are reserved for future versions of the architecture d Operation codes marked with a delta are valid only for R4600 processors with CPO enabled and cause a reserved instruction exception on other processors f Operation codes marked with a phi are invalid but do not cause reserved instruction exceptions in R4600 implementations Table A 4 A 152 FPU Instruction Set Appendix B Details Integrated Device Technology Inc Introduction This appendix provides a detailed description of each floating point unit FPU instruction refer to Appendix A for a detailed description of the CPU instructions The instructions are listed alphabetically and any exceptions that may occur due to the execution of each instruction are listed after the description of each instruction Descriptions of the immediate causes and the manner of handling exceptions are omitted from the instruction descriptions in this appendix refer to Chapter 7 for detailed descriptions of floating point exceptions and handling Figure B 3 on page B 45 lists the entire bit encoding for the constant fields of the floating point instruction set the bit encoding for each instruction is included with that individual instruction Instruction Formats There are three basic instruction format types e J Type or Immediate instructions which includ
341. place If this instruction should both fail and take an exception the exception takes precedence Operation vAddr offset s offsetys 0 GPR base pAddr uncached AddressTranslation vAddr DATA data lt GPRirl if LLbit then StoreMemory uncached DOUBLEWORD data pAddr vAddr DATA endif GPR rt 0 LLbit SyncOperation Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 104 CPU Instruction Set Details Appendix A SD Store Doubleword SD 31 26 25 21 20 16 15 0 base rt offset SD 111111 6 5 5 16 Format SD rt offset base Description The 16 bit offsetis sign extended and added to the contents of general register base to form a virtual address The contents of general register rt are stored at the memory location specified by the effective address If either of the three least significant bits of the effective address are non zero an address error exception occurs Operation T vAddr lt offset offset s 9 GPR base pAddr uncached lt AddressTranslation vAddr DATA data GPR rt StoreMemory uncached DOUBLEWORD data pAddr vAddr DATA Exceptions TLB refill exception TLB invalid exception TLB modification exception Bus error exception Address error exception A 105 CPU Instruction Set Details Appendix A
342. point exceptions while allowing both overlapped and pipelined operations Precise exceptions are extremely important in mission critical environments such as ADA and highly desirable for debugging in any environment The floating point unit s operation set includes floating point add subtract multiply divide square root conversion between fixed point and floating point format conversion among floating point formats and floating point compare These operations comply with the IEEE Standard 754 Table 1 19 shows the latencies of some of the floating point instructions in internal processor cycles Due to pipelining repeat rates may be higher Also note that many operations are autonomous and can go in parallel Operation Single Precision Double Precision ADD 4 4 SUB 4 4 MUL R4600 8 R4600 8 R4700 4 R4700 5 DIV 32 61 SQRT 31 60 CMP 3 3 FIX 4 4 FLOAT 6 6 ABS 1 1 MOV 1 1 NEG 1 1 LWC1 LDC1 2 2 SWC1 SDC1 1 1 Table 1 19 Floating Point Latency Cycles Overview Chapter 1 Virtual to Physical Address Mapping The R4600 R4700 provides three modes of operation e user mode e supervisor mode e kernel mode This mechanism is available to system software to provide a secure environment for user processes Bits in a status register determine the mode of operation In the user mode the R4600 R4700 provides a single uniform virtual address space of 256GB 2GB when Status UX
343. possible as results less than equal greater than and unordered The last case arises when one or both of the operands are NaN every NaN compares unordered with everything including itself Comparisons ignore the sign of zero so O 0 This instruction is valid only for single and double precision floating point formats The operation is not defined if bit O of any register specification is set and the FR bit in the Status register equals zero since the register numbers specify an even odd pair of adjacent coprocessor general registers When the FR bit in the Status register equals one both even and odd register numbers are valid Note See FPU Instruction Opcode Bit Encoding at the end of Appendix B FPU Instruction Set Details Appendix B Operation T if NaN ValueFPR fs fmt or NaN ValueFPR ft fmt then less lt false equal lt false unordered lt true if cond then signal InvalidOperationException endif else less ValueFPR fs fmt lt ValueFPR ft fmt equal ValueFPR fs fmt ValueFPR ft fmt unordered lt false endif condition conds and less or cond and equal or condo and unordered FCR 31 o3 lt condition COC 1 condition Exceptions Coprocessor unusable Floating Point exception Coprocessor Exceptions Unimplemented operation exception Invalid operation exception FPU Instruction Set Details Appendix B CEIL L fmt cetingt Long CEIL L fmt Fixe
344. processor has issued a write request in R4x00 compatible write mode set at boot time the processor cannot issue a subsequent re quest until at least four cycles after the issue cycle of the write request This means back to back write requests with a single data cycle are sepa rated by two unused system cycles as shown in Figure 12 6 e After the processor has issued a write request in either of the two new write modes write reissue and pipelined writes the processor can issue a subsequent write immediately provided the WrRdy requirement is meet This is discussed in more detail later in this chapter 12 6 System Interface Chapter 12 scyce 1 2 3 SClock re ae ae ae Cycles SysAD Bus Addr Data Y Addr Data Write 1 Write 2 WrRdy i Figure 12 6 Back to Back Write Cycle Timing R4000 compatible mode Processor Requests A processor request is a request or a series of requests through the system interface to access some external resource As shown in Figure 12 7 processor requests include only reads and writes R4600 R4700 External Agent Processor Requests Read e Write Figure 12 7 Processor Requests Read request asks for a block doubleword partial doubleword word or partial word of data either from main memory or from another system resource Write request provides a block doubleword partial doubleword word or partial word of da
345. ption flag causes it to write various CPO registers with the exception state change the current PC to the appropriate exception vector address and clear the exception bits of earlier pipeline stages This implementation allows all preceding instructions to complete execution and prevents all subsequent instructions from completing Thus the value in the EPC is sufficient to restart execution It also ensures that exceptions are taken in the order of execution an instruction taking an exception may itself be killed by an instruction further down the pipeline that takes an exception in a later cycle Figure 3 6 shows the exception detection procedure e g a reserved instruction exception Exception Vector Exception Vector Address Figure 3 6 Exception Detection Stall Conditions Stalls are used to stop the pipeline for conditions detected after the R pipe stage When a stall occurs the processor will resolve the condition and then the pipeline will continue The CPU Pipeline Chapter 3 Figure 3 7 shows a data cache miss stall R R Detect Cache Miss Start moving dirty cache line data to write buffer Get first doubleword into cache and restart pipeline Load remainder of cache line into cache Figure 3 7 Data Cache Miss The data cache miss is detected in the D pipe
346. r 17 LLAddr Load Linked Address 18 19 Reserved 20 XContext Pointer to kernel virtual PTE table for 64 bit address spaces 21 25 Reserved 26 ECC Secondary cache error checking and correcting ECC and Primary parity 27 CacheErr Cache Error and Status register 28 TagLo Cache Tag register 29 TagHi Cache Tag register 30 ErrorEPC Error Exception Program Counter 31 Reserved Table 1 18 System Control Coprocessor CPO Register Definitions Overview Chapter 1 Floating Point Co Processor The R4600 R4700 incorporates an entire floating point co processor on chip including a floating point register file and execution units The floating point co processor forms a seamless interface with the integer unit decoding and executing instructions in parallel with the integer unit The R4700 enhances the FPA implemented in the original R4600 resulting in an improved peak MFLOP rate Floating Point Units The R4600 R4700 floating point execution units supports single and double precision arithmetic as specified in the IEEE Standard 754 The execution unit is broken into a separate multiply unit and a combined add convert divide square root unit Overlap of multiplies and add subtract is supported The multiplier is partially pipelined allowing a new multiply to begin every 6 cycles for the R4600 and every 4 cycles for the R4700 As in the R3010 and R4000 the R4600 R4700 maintains fully precise floating
347. r Block Read Request SysCmd 2 0 Read Data Size 1 byte valid Byte 2 bytes valid Halfword 3 bytes valid Tribyte 4 bytes valid Word 5 bytes valid Quintibyte 6 bytes valid Sextibyte 7 bytes valid Septibyte 8 bytes valid Doubleword 0 1 2 3 4 5 6 7 Table 12 9 Doubleword Word or Partial word Read Request Data Size Encoding of SysCmd 2 0 Write Requests Figure 12 30 shows the format of a SysCmd write request Write Request Specific see tables Figure 12 30 Write Request SysCmd Bus Bit Definition 12 34 System Interface Chapter 12 Table 12 10 lists the write attributes encoded in bits SysCmd 4 3 Table 12 11 lists the block write replacement attributes encoded in bits SysCmd 2 0 Table 12 12 lists the write request bit encoding in SysCmd 2 0 SysCmd 4 3 Write Attributes Reserved Reserved Block write wi N oO Doubleword partial doubleword word or partial word Table 12 10 Write Request Encoding of SysCmd 4 3 SysCmd 2 Cache Line Replacement Attributes 0 Cache line replaced Cache line retained SysCmd 1 0 Write Block Size Reserved 8 words Nl o 3 Reserved Table 12 11 Block Write Request Encoding of SysCmd 2 0 SysCmd 2 0 Write Data Size 1 byte valid Byte 2 bytes valid Halfword 3 bytes valid Tribyte
348. r instructions perform operations in the coprocessors Coprocessor load and store instructions are I type e Coprocessor 0 system coprocessor instructions perform operations on CPO registers to control the memory management and exception handling facilities of the processor and the standby mode for power management These are listed in Table 1 17 e Special instructions perform system calls and breakpoint operations These instructions are always R type e Exception instructions cause a branch to the general exception han dling vector based upon the result of a comparison These instruc tions occur in both R type both the operands and the result are registers and I type one operand is a 16 bit immediate value for mats Overview Chapter 1 Chapter 2 provides more detail about these instructions and Appendix A gives a complete description of each Table 1 1 through Table 1 16 list CPU instructions common to MIPS R Series processors along with the level in which they first appeared The last column in each table refers to the MIPS ISA level in which the instruction first appeared Table 1 17 lists CPO instructions OpCode Description MIPS ISA Level LB Load Byte I LBU Load Byte Unsigned I LH Load Halfword I LHU Load Halfword Unsigned I LW Load Word I LWL Load Word Left I LWR Load Word Right I SB Store Byte I SH Store Halfword I SW Store Word I SWL Store Word Left I SWR
349. rand fault signal This error is usually fatal CPU Exception Processing Chapter 5 Coprocessor Unusable Exception This sections explains the Coprocessor Unusable exception Cause The Coprocessor Unusable exception occurs when an attempt is made to execute a coprocessor instruction for either e a corresponding coprocessor unit that has not been marked usable or e CPO instructions when the unit has not been marked usable and the process executes in User mode This exception is not maskable Processing The common exception vector is used for this exception and the CPU code in the Cause register is set The contents of the Coprocessor Usage Error field of the coprocessor Control register indicate which of the four coprocessors was referenced The EPC register contains the address of the unusable coprocessor instruction unless it is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction Coprocessor Unusable exception processing is shown in Figure 5 15 on page 13 Servicing The coprocessor unit to which an attempted reference was made is identified by the Coprocessor Usage Error field which results in one of the following situations e If the process is entitled access to the coprocessor the coprocessor is marked usable and the corresponding user state is restored to the co processor e If the process is entitled access to the coprocessor but the coproces sor doe
350. ration and Coherency Chapter 11 Introduction 11 1 Memory Organization 11 1 Overview of Cache Operations 11 2 R4600 R4700 Cache Description 11 2 Cache Line Size 11 2 Cache Organization and Accessibility 11 2 Organization of the Primary Instruction Cache I Cache 11 3 Organization of the Primary Data Cache D Cache 11 3 Accessing the Primary Caches 11 5 Cache States 11 5 Primary Cache States 11 6 Cache Line Ownership 11 6 Cache Write Policy 11 6 Cache State Transition Diagrams 11 7 Cache Coherency Overview 11 7 Cache Coherency Attributes 11 7 Uncached 11 8 Noncoherent 11 8 Cache Operation Modes 11 8 R4600 R4700 Processor Synchronization Support 11 8 Test and Set 11 8 Counter 11 9 Load Linked and Store Conditional 11 10 Examples Using LL and SC 11 11 System Interface Chapter 12 Introduction 12 1 Terminology 12 1 System Interface Description 12 1 Interface Buses 12 2 Address and Data Cycles 12 2 Issue Cycles 12 3 Handshake Signals 12 4 System Interface Protocols 12 4 Master and Slave States 12 5 Moving from Master to Slave State 12 5 External Arbitration 12 5 Uncompelled Change to Slave State 12 5 Processor and External Requests 12 6 Rules for Processor Requests 12 6 Processor Requests 12 7 Processor Read Request 12 8 Processor Write Request 12 8 External Requests 12 9 External Read Request 12 10 External Write Request 12 10 Read Response 12 10 viii Table of Contents Table of Contents Handling Requests 12 11 Load
351. rced to Kernel Mode amp interrupt disabled 0 normal 1 bootstrap Base is sign extended for 64 bits PC lt OxFFFF FFFF 8000 0000 PC lt OxFFFF FFFF BFCO 0200 180 180 unmapped cached unmapped uncached ag To General Exception Servicing Guidelines Exceptions other than Reset Soft Reset NMI CacheErr or first level TLB miss Note Interrupts can be masked by IE or IMs Figure 5 16 General Exception Handler HW CPU Exception Processing Chapter 5 Comments MFCO Unmapped vector so TLBMod TLBinv on TLB Refill exceptions not possible EP Status EXL 1 so Interrupt exceptions disabled Cause OS System to avoid all other exceptions Only CacheErr Reset Soft Reset NMI exceptions possible MTCO Set Status Bits KSU lt 00 FAE S 0 optional only to enable Interrupts while keeping Kernel Mode Y After EXL 0 all exceptions allowed Check CAUSE REG amp Jump to except interrupt if masked by IE or IM appropriate Service Code and CacheErr if masked by DE Y Service Code MTCO EPC STATUS ERET is not allowed in the branch delay slot of another Jump Instruction Processor does not execute the instruction which is in the ERET s branch delay slot PC EPC EXL lt 0 LLbit 0 Figure 5 17 General Excepti
352. rd Word or Partial word Write Request Data Size Encoding of SysCmd 2 0 12 35 Table 12 13 External Null Request Encoding of SysCmd 4 3 12 36 Table 12 14 Processor Data Identifier Encoding of SysCmd 7 3 12 37 Table 12 15 External Data Identifier Encoding of SysCmd 7 3 12 38 Table 12 16 Sequence of Doublewords Transferred Using Subblock Ordering Address 102 12 40 Table 12 17 Sequence of Doublewords Transferred Using Subblock Ordering Address 112 12 40 Table 12 18 Sequence of Doublewords Transferred Using Subblock Ordering Address 012 12 40 Table 12 19 Partial Word Transfer Byte Lane Usage 12 41 Table 14 1 Error Checking and Correcting Summary for Internal Transactions 14 3 Table 14 2 Error Checking and Correcting Summary for External Transactions 14 3 Table A 1 CPU Instruction Operation Notations A 3 Table A 2 Load and Store Common Functions A 4 Table A 3 Access Type Specifications for Loads Stores A 5 Table B 1 Valid FPU Instruction Formats B 2 Table B 2 Logical Negation of Predicates by Condition True False B 3 Table B 3 Load and Store Common Functions B 5 Table B 4 Format Field Decoding B 6 Table B 5 Floating Point Instructions and Operations B 7 Table C 1 Primary Data Cache Operations C 2 Table C 2 Primary Instruction Cache Operations C 3 Table E 3 Coprocessor 0 Hazards E 1 xiii Overview Chapter 1 dt Integrated Device Technology Inc Introduction The IDT79R4600 R4600 and IDT79R4700 R4700 support a w
353. re not listed and software should not rely on the revision number to characterize the chip Control Status Register FCR31 The Control Status register FCR31 contains control and status information that can be accessed by instructions in either Kernel or User mode FCR31 also controls the arithmetic rounding mode and enables User mode traps as well as identifying any exceptions that may have occurred in the most recently executed instruction along with any exceptions that may have occurred without being trapped Floating Point Unit Chapter 6 Figure 6 4 on page 6 5 shows the format of the Control Status register and Table 6 3 which follows the figure describes the Control Status register fields Figure 6 5 on page 6 5 shows the Control Status register Cause Flag and Enable fields Control Status Register FCR31 31 25 24 23 22 12 11 caus enables age EVZOUI VZOUI VZOUI Figure 6 4 FP Control Status Register Bit Assignments Field Description FS When set denormalized results are flushed to O instead of causing an unimplemented operation exception Cc Condition bit See description of Control Status register Condition bit Cause Cause bits See Figure 6 5 and the description of Control Status register Cause Flag and Enable bits Enables Enable bits See Figure 6 5 and the description of Control Status register Cause Flag and Enable bits Flags Flag bits See Figure 6 5 and the d
354. register rt are shifted right by the number of bits specified by the low order six bits of general register rs inserting zeros into the high order bits The result is placed in register rd Operation T s lt GPRIrs 5 0 GPR rd 0 GPR rtlg3 5 Exceptions None CPU Instruction Set Details DSRL32 eword Shift Right Logical 32 Appendix A DSRL32 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa DSRL32 000000 00000 111110 6 5 5 5 5 6 Format DSRL32 rd rt sa Description The contents of general register rt are shifted right by 32 sa bits inserting zeros into the high order bits The result is placed in register rd Operation T s lt 1 sa GPR rd 0 GPR rt g3 Exceptions None CPU Instruction Set Details Appendix A DSUB Doubleword Subtract DSUB 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DSUB 000000 00000 101110 6 5 5 5 5 6 Format DSUB rd rs rt Description The contents of general register rt are subtracted from the contents of general register rs to form a result The result is placed into general register rd The only difference between this instruction and the DSUBU instruction is that DSUBU never traps on overflow An integer overflow exception takes place if the carries out of bits 62 and 63 differ 2 s complement overflow The destination register r
355. ress Figure 1 9 Little Endian Misaligned Word Addressing Coprocessors CP0 CP2 The MIPS ISA MIPS II for the R4600 R4700 and R4000 R4400 defines three coprocessors designated CPO through CP2 e Coprocessor O CPO is incorporated on the CPU chip and supports the virtual memory system and exception handling CPO is also re ferred to as the System Control Coprocessor e Coprocessor 1 CP1 is incorporated on the R4600 R4700 and imple ments the MIPS floating point instruction set e Coprocessor 2 CP2 is reserved for future use CPO and CP1 are described in the sections that follow System Control Coprocessor CPO CPO translates virtual addresses into physical addresses and manages exceptions and transitions between kernel supervisor and user states CPO also controls the cache subsystem as well as providing diagnostic control and error recovery facilities CPO is also used to control the power management for the R4600 R4700 This is the standby mode and it can be used to reduce the power consumption of the internal core of the CPU The standby mode is entered by executing the WAIT instruction with the SysAD bus idle and is exited by any interrupt This feature is discussed in Appendix G Overview Chapter 1 The CPO registers shown in Figure 1 10 and described in Table 1 18 on page 1 17 manipulate the memory management and exception handling capabilities of the CPU Note Access to reserved or undefined CPO register
356. ress space is selected Addressing of the csseg is compatible with addressing sseg in 32 bit mode The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address OxFFFF FFFF C000 0000 and runs through OxFFFF FFFF DFFF FFFF Kernel Mode Operations The processor operates in Kernel mode when the Status register contains one of the following values e KSU 00 EXL 1 e ERL 1 In conjunction with these bits the KX bit in the Status register selects between 32 or 64 bit Kernel mode addressing e when KX 0 32 bit kernel space virtual addressing is selected e when KX 1 64 bit kernel space virtual addressing is selected The processor enters Kernel mode whenever an exception is detected and it remains in Kernel mode until an Exception Return ERET instruction is executed The ERET instruction restores the processor to the mode existing prior to the exception Kernel mode virtual address space is divided into regions differentiated by the high order bits of the virtual address as shown in Figure 4 6 Memory Management E000 0000 0x C000 0000 0 5 GB Unmapped Uncached 0 5 GB Unmapped ox 8000 0000 _ Cached Ox A000 0000 Ox 0000 0000 Ox FFFF FFFF Ox FFFF FFFF E Ox FFFF Ox FFFF Ox FFFF FFFF Ox C000 OOFF Ox C000 0000 0x 8000 0000 Ox 4000 0100 Chapter 4 0 5 GB Unm
357. ress which may specify an arbitrary byte It alters only the word in memory which contains that byte From one to four bytes will be stored depending on the starting byte specified Conceptually it starts at the least significant rightmost byte of the register and copies it to the specified byte in memory then copies bytes from register to memory until it reaches the high order byte of the word in memory No address exceptions due to alignment are possible address 4 address 0 address 4 address 0 memory big endian register 5 s before A B C D 24 SWR 24 1 0 after A 128 CPU Instruction Set Detail sS Appendix A Operation T vAddr lt offset s 4 offset 45 9 GPR base pAddr uncached AddressTranslation vAddr DATA pAddr pAddrpgize 1 3 0Addr2 9 xor ReverseEndian If BigEndianMem 0 then pAddr lt pAddrs _ 07 endif byte vAddr 9 xor BigEndianCPU if vAddro xor BigEndianCPU 0 then data 0 GPRIrt 31 8 byte o 08 gt Y else data GPRI rt 31 8byte o 0 PM 0 endif StoreMemory uncached WORD byte data pAddr vAddr DATA Given a doubleword in a register and a doubleword in memory the operation of SWR is as follows TLB refill exception TLB invalid exception
358. results are unde fined An exception may or may not result Register Name Register Name Index Config LLAddr LDLo E E XContext Random EntryLo0 EntryLo1 Context PageMask Wired E BadVAddr Count EntryHi Compare SR Cause EPC PRId O DAN OO oO RA ON a o ECC CacheErr TagLo TagHi ErrorEPC DL E _ _ N o A ol Exception Processing Memory Management Figure 1 10 R4600 R4700 CPO Registers Overview Chapter 1 Number Register Description 0 Index Programmable pointer into TLB array 1 Random Pseudorandom pointer into TLB array read only 2 EntryLoO Low half of TLB entry for even virtual page VPN 3 EntryLol Low half of TLB entry for odd virtual page VPN 4 Context Pointer to kernel virtual page table entry PTE for 32 bit address spaces 5 PageMask TLB Page Mask 6 Wired Number of wired TLB entries 7 Reserved 8 BadVAddr Bad virtual address 9 Count Timer Count 10 EntryHi High half of TLB entry 11 Compare Timer Compare 12 SR Status register 13 Cause Cause of last exception 14 EPC Exception Program Counter 15 PRId Processor Revision Identifier 16 Config Configuration registe
359. rface Command Syntax 12 33 Read Requests 12 33 Write Requests 12 34 Null Requests 12 36 System Interface Data Identifier Syntax 12 36 Noncoherent Data 12 36 Data Identifier Bit Definitions 12 37 System Interface Addresses 12 38 Addressing Conventions 12 38 Subblock Ordering 12 38 Example of Sequential Ordering 12 39 Example of Subblock Ordering 12 39 Processor Internal Address Map 12 42 Table of Contents Table of Contents R4600 R4700 Processor Interrupts Introduction Hardware Interrupts Nonmaskable Interrupt NMI Asserting Interrupts R4600 R4700 Error Checking Introduction Error Checking in the Processor Types of Error Checking Parity Error Detection Error Checking Operation System Interface System Interface Command Bus Summary of Error Checking Operations CPU Instruction Set Details Introduction Instruction Classes Instruction Formats Instruction Notation Conventions Instruction Notation Examples Load and Store Instructions Jump and Branch Instructions Coprocessor Instructions System Control Coprocessor CPO Instructions CPU Instruction Opcode Bit Encoding FPU Instruction Set Details Introduction Instruction Formats Floating Point Loads Stores and Moves Floating Point Operations Instruction Notation Conventions Instruction Notation Examples Load and Store Instructions Computational Instructions FPU Instruction Opcode Bit Encoding Cache Operations Timing Introduction Caveats
360. rface to from the primary caches System Interface The processor generates correct check bits for doubleword word or partial word data transmitted to the system interface As it checks for data correctness the processor passes data check bits from the primary cache directly without changing the bits to the system interface The processor does not check data received from the system interface for external writes By setting the NChck bit in the data identifier it is possible to prevent the processor from checking read response data from the system interface For cache refill if the NChck bit is set the CPU will generally correct parity before placing data into the cache The R4600 R4700 only checks parity for the first double word returned on a block instruction fetch that is for the double word that contains the instruction that was missed on in the cache This double word is checked just as if it had been read out of the ICache This parity check is done as a byte parity check For single read and with the NChck bit set the CPU will check parity for all 64 bit even if the transfer size is less than that When the R4600 R4700 is checking parity it does not actually regenerate the word parity but rather turns the byte parity supplied by the system into word parity It XORS the bits in groups of four As a result if bad byte parity is supplied by the system bad word parity will get written into the cache This is done to be consistent
361. ribed in this section connecting to a gate array device or connecting to discrete CMOS logic devices Connecting to a Gate Array Device When connecting to a gate array device both RClock and TClock are used within the gate array The gate array internally buffers RClock and uses this buffered version to clock registers that sample processor outputs These sampling registers should be immediately followed by staging registers clocked by an internally buffered version of TClock This buffered version of TClock should be the global system clock for the logic inside the gate array and the clock for all registers that drive processor inputs Figure 10 7 on page 10 7 is a block diagram of this circuit Staging registers place a constraint on the sum of the clock to Q delay of the sample registers and the setup time of the synchronizing registers inside the gate arrays as shown in the following equation Clock to Q Delay Setup of Synch Register lt 0 25 RClock period Max Clock Jitter for RClock Max Delay Mismatch for Clock Buffers on RClock and TClock 10 6 Clock Interface Chapter 10 Figure 10 7 is a block diagram of a system without phase lock using the R4600 R4700 processor with an external agent implemented as a gate array Sampling Staging MasterClock Register Register R4600 R4700 MasterClock SysCmd SysAD SyncOut Syncin RClock TClock
362. rieve the value of an FPR or to change the value of an FGR FPU Instruction Set Details Appendix B value lt ValueFPR fpr fmt case fmt of S W if FGRp 0 value FGR fpr else value FGR fpr 1 endif D undefined for fpr not even value FGRI fpr end StoreFPR fpr fmt value case fmt of S W if FGRp 0 FGRI fpr FGRIfprlg3 39 value else FGRIfpr 1 lt value FGRI fpr 1 3 0 endif D undefined for fpr not even FGRIfpr lt value end FR 1 value ValueFPR fpr fmt case fmt of S value FGRI fpr s1 0 D L value FGRI fpr W value FGR fpr end StoreFPR fpr fmt value case fmt of S W FGRIfpr undefined value D L FGRIfpr lt value end FPU Instruction Set Details Appendix B ABS fmt Absolute Value ABS fmt 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd ABS 010001 00000 000101 5 5 5 5 6 Format ABS fmt fd fs Description The contents of the FPU register specified by fs are interpreted in the specified format and the arithmetic absolute value is taken The result is placed in the floating point register specified by fd The absolute value operation is arithmetic a NaN operand signals invalid operation This instruction is valid only for single and double precision floating point formats The operation is not defined if bit O of any register specification is
363. rrupt is caused either by an external write request to the R4600 R4700 or by a dedicated pin in the R4600 R4700 This pin is latched into an internal register by the rising edge of SClock Asserting Interrupts External writes to the CPU are directed to various internal resources based on an internal address map of the processor When SysAD 6 0 0 during an ADDR cycle of external write request an external write to any address writes to an architecturally transparent register called the Interrupt register this register is available for external write cycles but not for external reads During a data cycle SysAD 22 16 are the write enables for the seven individual Interrupt register bits 0 disabled 1 enabled and SysAD 6 0 are the values to be written into these bits 0 no interrupt 1 interrupt This allows any subset of the Interrupt register to be set or cleared with a single write request Figure 13 1 shows the mechanics of an external write to the Interrupt register Interrupt register SysAD 6 0 Interrupt Value 5 4 See Figure 13 2 and Figure 13 3 22 at 20 19 18 17 46 SysAD 22 16 Write Enables Figure 13 1 Interrupt Register Bits and Enables 13 1 R4600 R4700 Processor Interrupts Chapter 13 Figure 13 2 shows how the R4600 R4700 interrupts are readable through the Cause register The interrupt bits Int 5 0 are latched into the internal register by the rising edge of SClock e
364. rt 0 DMULT 000000 000000 0000 011100 6 5 5 10 6 Format DMULT rs rt Description The contents of general registers rs and rt are multiplied treating both operands as 2 s complement values No integer overflow exception occurs under any circumstances When the operation completes the low order word of the double result is loaded into special register LO and the high order word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of these instructions are undefined Correct operation requires separating reads of HI or LO from writes by a minimum of two other instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt undefined HI lt undefined Te ot lt GPR rs GPR rt LO lt tg3 9 HI lt ty27 64 Exceptions None CPU Instruction Set Details Appendix A DMULTU Snina O DMULTU 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 DMULTU 000000 000000 0000 011101 6 5 5 10 6 Format DMULTU rs rt Description The contents of general register rs and the contents of general register rt are multiplied treating both operands as unsigned values No overflow exception occurs under any circumstances When the operation completes the low order word of the double result is loaded into special register LO and the high order word of the double result is loaded into special register HI If
365. rtual address are 1109 the sseg virtual address space is selected it covers 279 bytes 512Mbytes of the current supervisor address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address OxCO0OO 0000 and runs through OxDFFF FFFF 64 bit Supervisor Mode User Space xsuseg In Supervisor mode when Status SX 1 and bits 63 62 of the virtual address are set to 003 the xsuseg virtual address space is selected it covers the full 27 bytes 1Tbyte of the current user address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space starts at virtual address 0x0000 0000 0000 0000 and runs through 0x0000 OOFF FFFF FFFF 64 bit Supervisor Mode Current Supervisor Space xsseg In Supervisor mode when Status SX 1 and bits 63 62 of the virtual address are set to Oly the xsseg current supervisor virtual address space is selected The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space begins at virtual address 0x4000 0000 0000 0000 and runs through 0x4000 OOFF FFFF FFFF Memory Management Chapter 4 64 bit Supervisor Mode Separate Supervisor Space csseg In Supervisor mode when Status SX 1 and bits 63 62 of the virtual address are set to 11 the csseg separate supervisor virtual add
366. ry cache data parity bits for cache initialization cache diagnostics or cache error processing Tag parity is loaded from and stored to the TagLo register The ECC register is loaded by the Index Load Tag CACHE operation Content of the ECC register is e written into the primary data cache on store instructions instead of the computed parity when the CE bit of the Status register is set e substituted for the computed instruction parity for the CACHE oper ation Fill To force a cache parity value use the Status CE bit and the ECC register 5 9 CPU Exception Processing Chapter 5 Figure 5 9 shows the format of the ECC register Table 5 7 which follows the figure describes the register fields ECC Register Field Description ECC An 8 bit field specifying the parity bits read from or written to a primary cache Reserved Must be written as zeroes and returns zeroes when read Table 5 7 ECC Register Fields Cache Error CacheErr Register 27 The 32 bit read only CacheErr register processes parity errors in the primary cache Parity errors cannot be corrected The CacheErr register holds cache index and status bits that indicate the source and nature of the error it is loaded when a Cache Error exception is asserted When a read response returns with bad parity this exception is also asserted Figure 5 10 shows the format of the CacheErr register which follows the figure des
367. s offset 0 condition GPR rs g3 1 and GPR rs 064 T 1 if condition then PC PC target else NullifyCurrentinstruction endif Exceptions None CPU Instruction Set Details BLTZ Branch On Less Than Zero Appendix A BLTZ 31 26 25 21 20 16 15 0 REGIMM rs BLTZ offset 000001 00000 6 5 5 16 Format BLTZ rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of general register rs have the sign bit set then the program branches to the target address with a delay of one instruction Operation target lt offset s offset 07 condition GPRirs g3 1 T 1 if condition then PC lt PC target endif Exceptions None CPU Instruction Set Details Appendix A BLTZAL thanZeroandtink BLTZAL 31 26 25 21 20 16 15 0 REGIMM rs BLTZAL offset 000001 10000 6 5 5 16 Format BLTZAL rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended Unconditionally the address of the instruction after the delay slot is placed in the link register r31 If the contents of general register rs have the sign bit set then the program branches to t
368. s sign extending the high order bits The result is placed in register rd Operation T s lt GPR rs 5 6 GPR rd GPR rt g3 GPRIrt e3 s Exceptions None CPU Instruction Set Details Appendix A DSRA32 inme 32 DSRA32 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa DSRA32 000000 00000 111111 6 5 5 5 5 6 Format DSRA32 rd rt sa Description The contents of general register rt are shifted right by 32 sa bits sign extending the high order bits The result is placed in register rd Operation T s lt 1 sa GPR rd lt GPR rt g3 GPR rt 63 5 Exceptions None CPU Instruction Set Details Appendix A DSRL Doubleword DSRL Shift Right Logical 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL 0 rt rd sa DSRL 000000 00000 111010 6 5 5 5 5 6 Format DSRL rd rt sa Description The contents of general register rt are shifted right by sa bits inserting zeros into the high order bits The result is placed in register rd Operation T s 0 sa GPR rd 0 GPR rt e3 s Exceptions None CPU Instruction Set Details DSRLV Doubleword Shift Right Logical Variable Appendix A DSRLV 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL rs rt rd 0 DSRLV 000000 00000 010110 6 5 5 5 5 6 Format DSRLV rd rt rs Description The contents of general
369. s a TLB hit If there is no match a TLB Miss exception is taken by the processor and software is allowed to refill the TLB from a page table of virtual physical addresses in memory If there is a virtual address match in the TLB the physical address is output from the TLB and concatenated with the Offset which represents an address within the page frame space The Offset does not pass through the TLB Virtual to physical translation is described in greater detail throughout the remainder of this chapter Figure 4 19 on page 22 is a flow diagram of the process The next two sections describe the 32 bit and 64 bit address translations Memory Management Chapter 4 32 bit Virtual Address Translation Figure 4 2 shows the virtual to physical address translation of a 32 bit virtual address e The top portion of Figure 4 2 shows a virtual address with a 12 bit or 4Kbyte page size labelled Offset The remaining 20 bits of the ad dress represent the VPN and index the 1M entry page table The bottom portion of Figure 4 2 shows a virtual address with a 24 bit or 16Mbyte page size labelled Offset The remaining 8 bits of the address represent the VPN and index the 256 entry page table Virtual Address with 1M 22 4 Kbyte pages 3231 29 28 20 bits 1M pages 12 11 VPN Offset J Virtual to physical Offset passed translation in TLB unchanged to Bits 31 30 and 29 of the virtual TLB physical address select user supervi
370. s by providing larger on chip caches that are two way set associative fewer pipeline stalls and early restart for data cache misses The result is higher performance than for an R4000 at the same frequency and for the same system latencies exact figures are system dependent Upward Compatibility The R4600 R4700 provides complete upward application software compatibility with the IDT79R3000 family of microprocessors including the IDT79R3000A and the IDT RISController family IDT79R30xx family as well the IDT79R4000 family of microprocessors Microsoft Windows NT and UNISOFT Unix V 4 operating systems insure the availability of thousands of applications programs geared to provide a complete solution to a large number of processing needs An array of development tools facilitates the rapid development of R4600 R4700 based systems enabling a wide variety of customers to take advantage of the MIPS Open Architecture philosophy Together with the R4400 the R4600 R4700 provides a compatible timely and necessary evolution path from 32 bit to true 64 bit computing The original design objectives of the R4000 clearly mandated this evolution path the result is a true 64 bit processor fully compatible with 32 bit operating systems and applications The R4600 R4700 enables 32 bit applications to access 64 bit compute power painlessly The software tools support a wide variety of models including 32 bit address and data 64 bit address and da
371. s complement format Unsigned fixed point values are not directly provided by the floating point instruction set Figure 6 8 illustrates binary fixed point format Table 6 8 which follows the figure lists the binary fixed point format fields 31 30 0 Sign Integer 1 31 Figure 6 8 Binary Fixed Point Format Field Description sign sign bit integer integer value Table 6 8 Binary Fixed Point Format Fields Floating Point Unit Chapter 6 Floating Point Instruction Set Overview All FPU instructions are 32 bits long aligned on a word boundary They can be divided into the following groups e Load Store and Move instructions move data between memory the main processor and the FPU General Purpose registers e Conversion instructions perform conversion operations between the various data formats e Computational instructions perform arithmetic operations on float ing point values in the FPU registers e Compare instructions perform comparisons of the contents of regis ters and set a conditional bit based on the results Branch on FPU Condition instructions perform a branch to the spec ified target if the specified coprocessor condition is met Table 6 9 through Table 6 12 list the instruction set of the FPU A complete description of each instruction is provided in Appendix B In the instruction formats shown in Table 6 9 through Table 6 12 the fmt appended to the instruction opco
372. s g3 1 GPR 31 PC 8 T 1 if condition then PC amp PC target else NullifyCurrentinstruction endif Exceptions None CPU Instruction Set Details Appendix A B LTZ L Branch On Less Than Zero Likely B LTZ L 31 26 25 21 20 16 15 0 REGIMM rs BLTZL offset 000001 00010 6 5 5 16 Format BLTZ rs offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended If the contents of general register rs have the sign bit set then the program branches to the target address with a delay of one instruction If the conditional branch is not taken the instruction in the branch delay slot is nullified Operation T target e offset 6 offset 07 condition GPR rs g3 1 T 1 if condition then PC PC target else NullifyCurrentInstruction endif Exceptions None CPU Instruction Set Details Appendix A BNE Branch On Not Equal BNE 31 26 25 21 20 16 15 0 BNE rs rt offset 000101 6 5 5 16 Format BNE rs rt offset Description A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16 bit offset shifted left two bits and sign extended The contents of general register rs and the contents of general register rt are compared If the two registers are not equ
373. s in an Address Error exception Figure 4 5 Supervisor Mode Virtual Address Space Memory Management Chapter 4 Status Register Bit Values Address Bit Segment Segment Values KSU EXL ERL SX Name Address Range Size 32 bit Ol 0 o o suseg 0x0000 0000 2 Gbytes A 31 0 through 23 bytes Ox7FFF FFFF 32 bit Oly O 0 0 sseg OxC000 0000 512 Mbytes A 31 29 1103 through 29 bytes OxDFFF FFFF 64 bit Ol O o 1 xsuseg 0x0000 0000 0000 0000 1 Tbyte A 63 62 003 through 27 bytes 0x0000 OOFF FFFF FFFF 64 bit Ol O o 1 xsseg 0x4000 0000 0000 0000 1 Tbyte A 63 62 O14 through 240 bytes 0x4000 OOFF FFFF FFFF 64 bit Ol O o 1 csseg OxFFFF FFFF C000 0000 512 Mbytes A 63 62 113 through 29 bytes OxFFFF FFFF DFFF FFFF Table 4 2 32 bit and 64 bit Supervisor Mode Segments 32 bit Supervisor Mode User Space suseg In Supervisor mode when Status SX 0 and the most significant bit of the 32 bit virtual address is set to 0 the suseg virtual address space is selected it covers the full 2 bytes 2Gbytes of the current user address space The virtual address is extended with the contents of the 8 bit ASID field to form a unique virtual address This mapped space starts at virtual address O0x0000 0000 and runs through Ox7FFF FFFF 32 bit Supervisor Mode Supervisor Space sseg In Supervisor mode when Status SX 0 and the three most significant bits of the 32 bit vi
374. s not exist or has failed interpretation of the coprocessor in struction is possible e If the BD bit is set in the Cause register the branch instruction must be interpreted then the coprocessor instruction can be emulated and execution resumed with the EPC register advanced past the coproces sor instruction e Ifthe process is not entitled access to the coprocessor the process ex ecuting at the time is handed an illegal instruction privileged instruc tion fault signal This error is usually fatal CPU Exception Processing Chapter 5 Floating Point Exception This sections explains the Floating Point exception Cause The Floating Point exception is used by the floating point coprocessor This exception is not maskable Processing The common exception vector is used for this exception and the FPE code in the Cause register is set The contents of the Floating Point Control Status register indicate the cause of this exception Floating Point exception processing is shown in Figure 5 15 on page 13 Servicing This exception is cleared by clearing the appropriate bit in the Floating Point Control Status register For an unimplemented instruction exception the kernel should emulate the instruction for other exceptions the kernel should pass the exception to the user program that caused the exception CPU Exception Processing Chapter 5 Interrupt Exception This sections explains the Interrupt exception
375. sented by the virtual page number VPN is compared VPN with tag in TLB If there is a match the page frame ee fraso VPN number PFN representing the upper bits of the physical address PA is output from the TLB TLB 3 The Offset which does not pass through y E i i ena j j a Physical address Figure 4 1 Overview of a Virtual to Physical Address Translation As shown in Figure 4 2 and Figure 4 3 the virtual address is extended with an 8 bit address space identifier ASID which reduces the frequency of TLB flushing when switching contexts This 8 bit ASID is in the CPO EntryHi register described later in this chapter The Global bit G is in the EntryLo0O and EntryLo1 registers described later in this chapter Physical Address Space Using a 36 bit address the processor physical address space encompasses 64Gigabytes The section following describes the translation of a virtual address to a physical address Virtual to Physical Address Translation Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual address in the TLB there is a match when the virtual page number VPN of the address is the same as the VPN field of the entry and either e the Global G bit of the TLB entry is set or e the ASID field of the virtual address is the same as the ASID field of the TLB entry This match is referred to a
376. software interrupts Cause Register 31 30 29 28 27 16 15 BD 0 CE N 12 Figure 5 6 Cause Register Format Field Description BD Indicates whether the last exception taken occurred in a branch delay slot 1 gt delay slot 0 gt normal CE Coprocessor unit number referenced when a Coprocessor Unusable excep tion is taken IP Indicates an interrupt is pending 1 gt interrupt pending O gt no interrupt ExcCode Exception code field see Table 5 5 on page 5 8 0 Reserved Must be written as zeroes and returns zeroes when read Table 5 4 Cause Register Fields CPU Exception Processing Chapter 5 Exception Mnemonic Description Code Value 0 Int Interrupt 1 Mod TLB modification exception 2 TLBL TLB exception load or instruction fetch 3 TLBS TLB exception store 4 AdEL Address error exception load or instruction fetch 5 AdES Address error exception store 6 IBE Bus error exception instruction fetch 7 DBE Bus error exception data reference load or store 8 Sys Syscall exception 9 Bp Breakpoint exception 10 RI Reserved instruction exception 11 CpU Coprocessor Unusable exception 12 Ov Arithmetic Overflow exception 13 Tr Trap exception 14 Reserved 15 FPE Floating Point exception 16 31 Reserved Table 5 5 Cause Register ExcCode Field Exception Program Counter EPC Register
377. sor 36 bit Physical Address memory or kernel address spaces y PFN Offset Virtual to physical Offset to pt passed translation in val unchanged to TLB physical oe memory TN 0 Offset 8 24 8 bits 256 pages Virtual Address with 256 2 16 Mbyte pages Figure 4 2 32 bit Virtual Address Translation 64 bit Virtual Address Translation Figure 4 3 on page 4 shows the virtual to physical address translation of a 64 bit virtual address This figure illustrates the two extremes in the range of possible page sizes a 4Kbyte page 12 bits and a 16Mbyte page 24 bits e The top portion of Figure 4 3 shows a virtual address with a 12 bit or 4Kbyte page size labelled Offset The remaining 28 bits of the address represent the VPN and index the 256M entry page table e The bottom portion of Figure 4 3 shows a virtual address with a 24 bit or 16Mbyte page size labelled Offset The remaining 16 bits of the address represent the VPN and index the 64K entry page table Memory Management Chapter 4 Virtual Address with 256M 228 4 Kbyte pages 64 636261 40 39 28 bits 256M pages 12 11 VPN Offset 24 12 LAN J Offset passed Virtual to physical TLB SEE to translation in TLB physical Bits 62 and 63 of the virtual 36 bit Physical Address memory address select user supervisor 35 or kernel address spaces PFN Offset Virtual to phys
378. sor has previously performed an uncompelled change to slave state the external agent can begin an external request immediately After issuing an external request the external agent must return the system interface to master state If the external agent does not have any additional external requests to perform ExtRqst must be deasserted two cycles after the cycle in which Release was asserted For a string of external requests the ExtRqst signal is asserted until the last request cycle whereupon it is deasserted two cycles after the cycle in which Release was asserted The processor continues to handle external requests as long as ExtRqst is asserted however the processor cannot release the system interface to slave state for a subsequent external request until it has completed the current request As long as ExtRqpst is asserted the string of external requests is not interrupted by a processor request This section describes the following external request protocols e read e null e write e read response 12 23 System Interface SCycle SClock SysAD Bus Validin ExtRast Release RdRdy DE j SysCmd Bus Cmd YNEoD Chapter 12 External Arbitration Protocol System interface arbitration uses the signals ExtRqst and Release as described above Figure 12 20 is a timing diagram of the arbitration protocol in which slave and master states are shown The arbitration cycle consists of the
379. sors from accessing shared data when the processor is in a critical section a part of program in which no more than a fixed number of processors is allowed to execute In the case of test and set only one processor can enter the critical section 11 8 Cache Organization Operation and Coherency Chapter 11 Figure 11 7 illustrates a test and set synchronization procedure that uses a semaphore when the semaphore is set to O the shared data is unlocked and when the semaphore is set to 1 the shared data is locked 1 Load semaphore 2 Unlocked 0 Yes 3 Try locking semaphore Yes 5 Execute critical section Access shared data 6 Unlock semaphore Continue processing Figure 11 7 Synchronization with Test and Set The processor begins by loading the semaphore and checking to see if it is unlocked set to 0 in steps 1 and 2 If the semaphore is not O the processor loops back to step 1 If the semaphore is O indicating the shared data is not locked the processor next tries to lock out any other access to the shared data step 3 If not successful the processor loops back to step 1 and reloads the semaphore If the processor is successful at setting the semaphore step 4 it executes the critical section of code step 5 and gains access to the shared data completes its task unlocks the semaphore step 6 and continues processing Counter Another common synchronization technique uses a counter A counter
380. ss sequence has been interrupted by resetting the processor state machines Initialization Interface Chapter 9 Initialization Sequence The boot mode initialization sequence begins immediately after VCCOk is asserted As the processor reads the serial stream of 256 bits through the Modeln pin the boot mode bits initialize all fundamental processor modes The signals used are described in Chapter 8 This is the initialization sequence 1 The system deasserts the VCCOk signal The ModeClock output is held asserted 2 The processor synchronizes the ModeClock output at the time VCCOk is asserted The first rising edge of ModeClock occurs at least 256 MasterClock cycles after VCCOk is asserted There could be more clock cycles due to internal delays on the VccOK signal After the first rising edge each additional rising edge will be 256 master clock cycles 3 Each bit of the initialization stream is presented at the Modeln pin after each rising edge of the ModeClock The processor samples 256 initialization bits from the Modeln input MasterClock MClk TDS m y gt 100ms 256 VCCOK _ 256 MCIk cycles MCIk Seles ModeClock_ aS TMDS gt a TMDH 7 ia oY 1 __ Bit ModelIn _ Bit 0 X Bit 1 j 2s ColdReset _ gt 64K MCIk cycles Reset Masterout_ Undefined recs a awe Syncout_ Undefined La UE TClock _ Undefined RClock _ Undefi
381. ssor 3 Opcode Bit Encoding LDCz Bit 31 30 29 28 27 26 0 Lbci 1 1 OJ 1 0 1 Bit 31 30 29 28 27 26 0 Lpc2 1 1 0 1 1 0 NY Opcode Coprocessor Unit Number CPU Instruction Set Details Appendix A LDL Load Doubleword Left LDL 31 26 25 21 20 1615 0 LDL 011010 base rt offset 6 5 5 16 Format LDL rt offset base Description This instruction can be used in combination with the LDR instruction to load a register with eight consecutive bytes from memory when the bytes cross a doubleword boundary LDL loads the left portion of the register with the appropriate part of the high order doubleword LDR loads the right portion of the register with the appropriate part of the low order doubleword The LDL instruction adds its sign extended 16 bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte It reads bytes only from the doubleword in memory which contains the specified starting byte From one to eight bytes will be loaded depending on the starting byte specified Conceptually it starts at the specified byte in memory and loads that byte into the high order left most byte of the register then it loads bytes from memory into the register until it reaches the low order byte of the doubleword in memory The least significant right most byte s of the register will not
382. ssor Error Cache Cache System Interface Checked Generated Checked Trap on From Primary From Primary Data Check Bits Trap on Error Error Cache Cache Table 14 1 Error Checking and Correcting Summary for Internal Transactions Read Bus Request Write Request Processor Data NA NA System Interface Address Command and Check Bits Trans Generated NA mit System Interface Address Command and Check Bits Receive Not Checked Not Checked System Interface Data From Processor Checked Trap on Error System Interface Data Check Bits Generated Checked Trap on Error Table 14 2 Error Checking and Correcting Summary for External Transactions 14 3 CPU Instruction Set Appendix A Details Integrated Device Technology Inc Introduction This appendix provides a detailed description of the operation of each R4600 R4700 instruction The instructions are listed in alphabetical order Exceptions that may occur due to the execution of each instruction are listed after the description of each instruction Descriptions of the immediate cause and manner of handling exceptions are omitted from the instruction descriptions in this appendix Figures at the end of this appendix list the bit encoding for the constant fields of each instruction and the bit encoding for each individual instruction is included with that instruction Instruction Classes CPU instructions are divided int
383. ssor units or classes and these coprocessors have two register spaces each space containing thirty two registers These registers may be either 32 bits or 64 bits wide e The first space coprocessor general registers may be directly loaded from memory and stored into memory and their contents may be transferred between the coprocessor and processor e The second space coprocessor control registers may only have their contents transferred directly between the coprocessor and the proces sor Coprocessor instructions may alter registers in either space System Control Coprocessor CPO Instructions There are some special limitations imposed on operations involving CPO that is incorporated within the CPU The move to from coprocessor instructions are the only valid mechanism for writing to and reading from the CPO registers Several CPO instructions are defined to directly read write and probe TLB entries and to modify the operating modes in preparation for returning to User mode or interrupt enabled states CPU Instruction Set Details Appendix A ADD Add ADD 31 26 25 21 20 16 15 11 10 6 5 0 SPECIAL e rt rd 0 ADD 000000 00000 100000 6 5 5 5 5 6 Format ADD rd rs rt Description The contents of general register rs and the contents of general register rt are added to form the result The result is placed into general register rd The operands must be valid sign extended 32 bit valu
384. struction Set Details Appendix B Store Doubleword from FPU SDC1 Coprocessor 1 SDC1 31 26 25 21 20 16 15 0 SDC1 base ft offset 5 5 16 Format SDC1 ft offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form an unsigned effective address When FR 0 the contents of registers ft and ft 1 from the floating point coprocessor are stored at the memory location specified by the effective address This instruction is not valid and is undefined when the least significant bit of ft is non zero When FR 1 the 64 bit register ft is stored to the contents of the doubleword at the memory location specified by the effective address The FR bit of the Status register SRog specifies whether all 32 registers of the R4600 are addressable When FR equals zero this instruction is not defined if the least significant bit of ftis non zero If FR equals one ft may specify either odd or even registers If any of the three least significant bits of the effective address are non zero an address error exception takes place Operation vAddr lt offset 5 16 offset 5 9 GPRIbase pAddr uncached AddressTranslation vAddr DATA if SRoa6 data CPR 1 ft else data CPR 1 ft I 0 endif StoreMemory uncached DOUBLEWORD data pAddr vAddr DATA Exceptions Coprocessor unusable TLB refill exception TLB invali
385. t DDIVU rs rt Description The contents of general register rs are divided by the contents of general register rt treating both operands as unsigned values No integer overflow exception occurs under any circumstances and the result of this operation is undefined when the divisor is zero This instruction is typically followed by additional instructions to check for a zero divisor When the operation completes the quotient word of the double result is loaded into special register LO and the remainder word of the double result is loaded into special register HI If either of the two preceding instructions is MFHI or MFLO the results of those instructions are undefined Correct operation requires separating reads of HI or LO from writes by two or more instructions Operation T 2 LO lt undefined HI lt undefined T 1 LO lt undefined HI lt undefined T LO lt 0 GPRirs div 0 GPRIrt HI lt 0 GPRi rs mod 0 GPR rt Exceptions None CPU Instruction Set Details Appendix A DIV Divide DIV 31 26 25 21 20 16 15 6 5 0 SPECIAL rs rt 0 DIV 000000 00 0000 0000 011010 6 5 5 10 6 Format DIV rs rt Description The contents of general register rs are divided by the contents of general register rt treating both operands as 2 s complement values No overflow exception occurs under any circumstances and the result of this operation is undefined when t
386. t Syncln SysADC 7 0 SysCmd 8 0 SysCmdP ValidIn ValidOut ExtRqst IOOut Release IOln R4600 RdRdy Fault R4700 WrRdy Clock Control Interface System Interface VccP Logic VssP Symbol ModeClock ModelIN JTDO ColdReset JTMS Reset JIG Int 5 0 Initialization Interface Interface JTAG Interrupt Interface Processor Signal Descriptions Chapter 8 System Interface Signals System interface signals provide the connection between the R4600 R4700 processor and the other components in the system Table 8 1 lists the system interface signals Name Definition Direction Description ExtRgqst External request Input An external agent asserts ExtRqst to request use of the System interface The pro cessor grants the request by asserting Release Release Release interface Output In response to the assertion of ExtRqst or a CPU read request the processor asserts Release signalling to the requesting device that the System interface is available RdRdy Read ready Input The external agent asserts RdRdy to indi cate that it can accept a processor read request SysAD 63 0 System address Input A 64 bit address and data bus for communi data bus Output cation between the processor and an external agent SysADC 7 0 System address Input An 8 bit bus containing check bits for the data check bus Output SysAD bus SysCmd 8 0 System com Input A 9
387. t SysCmd 6 Response Data Indication 0 Data is response data 1 Data is not response data SysCmd 5 Good Data Indication 0 Data is error free 1 Data is erroneous SysCmd 4 3 Reserved Table 12 14 Processor Data Identifier Encoding of SysCmd 7 3 12 37 System Interface Chapter 12 Table 12 15 lists the encoding of SysCmd 7 3 for external data identi fiers SysCmd 7 Last Data Element Indication 0 Last data element Not the last data element SysCmd 6 Response Data Indication 0 Data is response data 1 Data is not response data SysCmd 5 Good Data Indication o Data is error free 1 Data is erroneous SysCmd 4 Data Checking Enable 0 Check the data and check bits 1 Do not check the data and check bits SysCmd 3 Reserved Table 12 15 External Data Identifier Encoding of SysCmd 7 3 System Interface Addresses System interface addresses are full 36 bit physical addresses presented on the least significant 36 bits bits 35 through 0 of the SysAD bus during address cycles the remaining bits of the SysAD bus are unused during address cycles Addressing Conventions Addresses associated with doubleword partial doubleword word or partial word transactions are aligned for the size of the data element The system uses the following address conventions e Addresses associated with block requests are aligned to double word boundaries
388. t 15 the sign bit of an immediate value is extended for 16 bit positions and the result is concatenated with bits 15 through 0 of the immediate value to form a 32 bit sign extended value FPU Instruction Set Details Appendix B Load and Store Instructions In the R4600 implementation the instruction immediately following a load may use the contents of the register being loaded In such cases the hardware interlocks requiring additional real cycles so scheduling load delay slots is still desirable although not required for functional code The behavior of the load store instructions is dependent on the width of the FGRs When the FR bit in the Status register equals zero the Floating Point General registers FGRs are 32 bits wide e When the FR bit in the Status register equals one the Floating Point General registers FGRs are 64 bits wide In the load and store operation descriptions the functions listed in Table B 3 are used to summarize the handling of virtual addresses and physical memory Function Meaning AddressTranslation Uses the TLB to find the physical address given the virtual address The function fails and an exception is taken if the required translation is not present in the TLB LoadMemory Uses the cache and main memory to find the contents of the word containing the specified physical address The low order two bits of the address and the Access Type field indicates which of each of the
389. t Floating point Compare BC1T Branch on FPU True BC1F Branch on FPU False BCITL Branch on FPU True Likely BC1FL Branch on FPU False Likely Table 6 12 FPU Instruction Summary Compare and Branch Instructions Floating Point Load Store and Move Instructions This section discusses the manner in which the FPU uses the load store and move instructions listed in Table 6 9 on page 10 Appendix B provides a detailed description of each instruction Transfers Between FPU and Memory All data movement between the FPU and memory is accomplished by using one of the following instructions e Load Word To Coprocessor 1 LWC1 or Store Word To Coprocessor 1 SWC 1 instructions which reference a single 32 bit word of the FPU general registers e Load Doubleword LDC1 or Store Doubleword SDC1 instructions which reference a 64 bit doubleword These load and store operations are unformatted no format conversions are performed and therefore no floating point exceptions can occur due to these operations With the LDC1 and SDC1 instructions the R4600 R4700 floating point unit can take advantage of the 64 bit wide data cache and issue a coprocessor load or store double word instruction with every cycle Transfers Between FPU and CPU Data can also be moved directly between the FPU and the CPU by using one of the following instructions Move To Coprocessor 1 MTC1 e Move From Coprocessor 1 MFC1 e Doubleword Move To Coprocessor 1 DMTC
390. t SysCmd Bus Bit Definition Data Identifier SysCmd Bus Bit Definition Retrieving a Data Block in Sequential Order Retrieving Data in a Subblock Order Interrupt Register Bits and Enables R4600 R4700 Interrupt Signals R4600 R4700 Nonmaskable Interrupt Signal Masking of the R4600 R4700 Interrupts CPU Instruction Formats Load and Store Instruction Format Computational Instruction Format Bit Encoding for FPU Instructions Page 2 24 12 25 12 26 12 27 12 28 12 29 12 29 12 30 12 33 12 33 12 34 12 36 12 36 12 39 12 39 13 1 13 2 13 2 13 3 A 2 B 5 B 6 B 45 xviii List of Tables dt Integrated Device Technology Inc Number Table Title Page Table 1 1 CPU Instruction Set Load and Store Instructions 1 7 Table 1 2 CPU Instruction Set Arithmetic Instructions ALU Immediate 1 7 Table 1 3 CPU Instruction Set Arithmetic 3 Operand R Type 1 8 Table 1 4 CPU Instruction Set Multiply and Divide Instructions 1 8 Table 1 5 CPU Instruction Set Jump and Branch Instruction 1 8 Table 1 6 CPU Instruction Set Shift Instructions 1 9 Table 1 7 Instruction Set Coprocessor Instructions 1 9 Table 1 8 CPU Instruction Set Special Instructions 1 9 Table 1 9 MIPS 2 MIPS 3 Additional Load and Store Instructions 1 10 Table 1 10 MIPS 2 MIPS 3 Additional Arithmetic Instructions ALU Immediate 1 10 Table 1 11 MIPS 2 MIPS 3 Additional Multiply and Divide Instructions Table 1 12 MIPS 2 MIPS 3 Additional Br
391. t on first doubleword send sub sequent doublewords to response buffer Instruction Tag 2 bit cache state 1 bit cache state Cache miss overhead 5 8 cycles 3 cycles Instruction cache parity 1 parity bit per 8 data bits 1 parity bit per 32 data bits Data cache parity 1 parity bit per 8 data bits same Table 1 21 Cache Comparison Between R4400 PC and R4600 R4700 Overview Chapter 1 Item R4400 PC R4600 R4700 Instruction virtual 2 entry ITLB same address translation ITLB miss 1 cycle penalty refilled from JTLB LRU replacement 1 cycle on branch jump and ERET 2 cycles otherwise refilled from JTLB LRU replacement Data virtual address done directly in JTLB 4 entry DTLB translation DTLB miss n a 1 cycle penalty refilled from JTLB pseudo LRU replacement JTLB 48 entries of even odd page pairs fully same associative Page size 4KB 16KB 16MB same Multiple entry match in JTLB sets TS in Status and disables TLB until Reset to prevent damage no damage for multiple match no detection or shutdown implemented Virtual address size Physical address size VSIZE 40 PSIZE 36 same same Table 1 22 TLB Comparison Between R4400 PC and R4600 R4700 Item R4400 PC R4600 R4700 ALU latency 1 cycle 1 cycle Load latency 3 cycles 2 cycles Branch latency Store buffer not write buffer 4
392. t to the aligned doubleword this may be a byte halfword tribyte quadbyte word quintibyte sextibyte septibyte or an octalbyte doubleword For example in little endian mode on a byte request where the address modulo 8 is 0 SySAD 7 0 are valid during the data cycles Table 12 19 shows the byte lanes used for partial word transfers for both little and big endian Bytes Address SysAD byte lanes used big endian SysCmd 2 0 Mod 8 63 56 55 48 47 40 39 32 31 24 23 16 15 8 7 0 0 1 2 1 3 000 4 5 6 7 0 2 2 001 4 6 0 3 1 010 4 e 5 ty 011 4 e eee 100 3 e e 101 2 e 7 0 110 1 e e 8 111 0 e e 7 0 15 8 23 16 31 24 39 32 47 40 55 48 63 56 SysAD byte lanes used little endian Table 12 19 Partial Word Transfer Byte Lane Usage 12 41 System Interface Chapter 12 Processor Internal Address Map External reads and writes provide access to processor internal resources that may be of interest to an external agent The processor decodes bits SysAD 6 0 of the address associated with an external read or write request to determine which processor internal resource is the target However the R4600 R4700 does not contain any resources that are readable through an extern
393. ta 12 16 System Interface Chapter 12 Instruction Read Latency Steps for System Clock The read latency for a system clock in the divide by two mode is as follows 1 The startup overhead is one to two pipeline cycles PCycle for the CPU to transfer the address to the pads to be output The second PCycle is needed if the miss is detected on a PCycle not aligned with the rising edge of SClock 2 The CPU drives the address on the SysAD bus for two PCycles 3 The CPU tri states the SysAD bus for two PCycles 4 The CPU waits for the main memory to return the data This is expressed as n x 2 PCycles 5 The first double word is driven in the SysAD from the main memory for two PCycles 6 The remaining three double words of instruction are driven on SysAD for 3 2 PCycles Notes on the Instruction Read Latency Steps a For instruction misses the pipeline starts after all the instructions are returned b n is the total number of idle cycles even between double word instruction For zero wait state systems n 0 Example of Instruction Block Read With Zero Wait State The following example shows an instruction block read with a zero wait state StepDescriptionPCycles 1 CPU overhead for cache miss detection 1 2 2 Address driven on SysAD bus 2 3 SysAD bus tri stated 2 4 Memory latency to return the data 0 2 5 First double word driven on SysAD bus 2 6 Remaining three instructions returned 2 3 6 Total PCycles 13
394. ta and 32 bit address 64 bit data 32 bit address data enables applications to be migrated without cleaning up some software Overview Chapter 1 The R4600 R4700 offers high performance large caches and MMU and FPA functions to these systems For desktop systems the R4600 R4700 supports a full migration to 64 bit allowing 64 bit systems to execute true 64 bit or older 32 bit applications For embedded applications the power and bandwidth of 64 bit data types can be used without the memory expansion of 64 bit addressing The list on the following page summarizes the R4600 R4700 features For a feature by feature comparison with the R4000 refer to the tables beginning on page 9 23 Overview Chapter 1 Features True 64 bit microprocessor 64 bit integer operations 64 bit floating point operations 64 bit registers 64 bit virtual address space High performance microprocessor For R4600 133 peak MIPS at 133MHz For R4700 175 peak MIPS at 175MHz For R4600 44 peak MFLOP s at 133MHz For R4700 87 peak MFLOP s at 175MH For R4600 109 SPECint92 and 83 SPECfp92 at 150Mz For R4700 132 SPECint92 and 94 SPECfp92 at 175Mz Large two way set associative caches on chip Improved FPA multiply performance R4700 only 1 mul 1 add every 4 clock cycles High level of integration 64 bit integer CPU 64 bit floating point unit 16KB instruction cache 16KB data cache Flexible MMU with large
395. ta to be written either to main memory or to another system resource Processor requests are managed by the processor in the equivalent of the R4000 R4400 no secondary cache mode In no secondary cache mode the processor issues requests in a strict sequential fashion that is the processor is only allowed to have one request pending at any time For example the processor issues a read request and waits for a read response before issuing any subsequent requests The processor submits a write request only if there are no read requests pending The processor has the input signals RdRdy and WrRdy to allow an external agent to manage the flow of processor requests RdRdy controls the flow of processor read requests while WrRdy controls the flow of processor write requests The processor request cycle sequence is shown in Figure 12 8 12 7 System Interface Chapter 12 R4600 R4700 External Agent 1 Processor issues read or write request 2 External system controls acceptance of requests by asserting RdRdy or WrRdy Figure 12 8 Processor Request Processor Read Request When a processor issues a read request the external agent must access the specified resource and return the requested data Processor read requests are described in this section external read requests are described in External Requests on page 12 9 A processor read request can be split from the external agent s return of the requested
396. tails LDCz Load Doubleword To Coprocessor Appendix A LDCz 31 26 25 21 20 16 15 0 LDCz base rt offset 1101xx 6 5 5 16 Format LDCz rt offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form a virtual address The processor reads a doubleword from the addressed memory location and makes the data available to coprocessor unit z The manner in which each coprocessor uses the data is defined by the individual coprocessor specifications If any of the three least significant bits of the effective address are non zero an address error exception takes place This instruction is not valid for use with CPO This instruction is undefined when the least significant bit of the rt field is non zero Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception not a coprocessor unusable exception NOTE See the table Opcode Bit Encoding on next page or CPU Instruction Opcode Bit Encoding at the end of Appendix A Operation vAddr lt offset s offset s 9 GPR base pAddr uncached AddressTranslation vAddr DATA mem lt LoadMemory uncached DOUBLEWORD pAddr vAddr DATA COPzLD rt mem Exceptions TLB refill exception TLB invalid exception Bus error exception Address error exception Coprocessor unusable exception Reserved instruction exception coproce
397. tate a processor read request is issued by driving a read command on the SysCmd bus and a read address on the SysAD bus 3 At the same time the processor asserts ValidOut for one cycle indicating valid data is present on the SysCmd and the SysAD buses Note Only one processor read request can be pending at a time 4 The processor makes an uncompelled change to slave state at the issue cycle of the read request by asserting the Release signal for one cycle Note The external agent must not assert the signal ExtRqast for the purposes of returning a read response but rather must wait for the uncompelled change to slave state The signal ExtRqst can be asserted before or during a read response to perform an external request other than a read response 5 The processor releases the SysCmd and the SysAD buses one SCycle after the assertion of Release 6 The external agent drives the SysCmd and the SysAD buses within two cycles after the assertion of Release Once in slave state starting at cycle 5 in Figure 12 12 the external agent can return the requested data through a read response The read response can return the requested data or if the requested data could not be successfully retrieved an indication that the returned data is erro neous If the returned data is erroneous the processor takes a bus error exception Note The R4600 R4700 only check the error bit for the first doubleword of read response data all other err
398. terface Chapter 12 Load Linked Store Conditional Operation Generally the execution of a Load Linked Store Conditional instruction sequence is not visible at the system interface that is no special requests are generated due to the execution of this instruction sequence There is however one situation in which the execution of a Load Linked Store Conditional instruction sequence is visible as indicated by the link address retained bit during a processor read request as programmed by the SysCmd 2 bit This situation occurs when the data location targeted by a Load Linked Store Conditional instruction sequence maps to the same cache line to which the instruction area containing the Load Linked Store Conditional code sequence is mapped In this case immediately after executing the Load Linked instruction the cache line that contains the link location is replaced by the instruction line containing the code The link address is kept in a register separate from the cache and remains active as long as the link bit set by the Load Linked instruction is set The link bit which is set by the load linked instruction is cleared by a change of cache state for the line containing the link address or by a Return From Exception For more information refer to Chapter 11 or see the specific Load Linked and Store Conditional instructions described in Appendix A Processor and External Request Protocols The following sections contain a cycle b
399. ternal agent issues a write request the specified resource is accessed and the data is written to it An external write request is complete after the word of data has been transmitted to the processor The only processor resource available to an external write request is the IP field of the Cause register Read Response A read response returns data in response to a processor read request as shown in Figure 12 11 While a read response is technically an external request it has one characteristic that differentiates it from all other external requests it does not perform system interface arbitration For this reason read responses are handled separately from all other external requests and are simply called read responses When a read response comes back with bad parity for the first datum a cache error exception results 12 10 System Interface Chapter 12 R4600 R4700 External Agent 1 Read request 2 Read response Figure 12 11 Read Response Handling Requests This section details the sequence protocol and syntax see Termi nology on page 12 1 for definitions of these terms of both processor and external requests The following system events are discussed e load miss no secondary cache mode store miss no secondary cache mode store hit uncached loads stores CACHE operations load linked store conditional Load Miss When a processor load misses in the primary cache before the processor can pro
400. that is the low order 3 bits of address are 0 Doubleword requests set the low order 3 bits of address to 0 Word requests set the low order 2 bits of address to 0 Halfword requests set the low order bit of address to 0 Byte tribyte quintibyte sextibyte and septibyte requests use the byte address Subblock Ordering The order in which data is returned in response to a processor block read request is subblock ordering In subblock ordering the processor delivers the address of the requested doubleword within the block An external agent must return the block of data using subblock ordering starting with the addressed doubleword A block of data elements whether bytes halfwords words or double words can be retrieved from storage in two ways in sequential order or using a subblock order This section describes these retrieval methods with an emphasis on subblock ordering Note that the R4600 R4700 only uses subblock ordering for block reads 12 38 System Interface Chapter 12 Example of Sequential Ordering Sequential ordering retrieves the data elements of a block in serial or sequential order Figure 12 33 shows a sequential order in which DWO is taken first and DWS3 is taken last DWO DWO taken first DW3 taken fourth DW1 taken second DW2 taken third Figure 12 33 Retrieving a Data Block in Sequential Order Example of Subblock Ordering Subblock ordering allows the syste
401. the setup and hold requirements for the rising edge of the SClock that is identified in the preceding and following figures with an asterisk qa Master p lt Slave gt e Master gt iP Be ee lea easel es ee heel oe 10 11 12 SULCPOCe Se aWre nese ae ae y Addr Datao Read NEOD er Figure 12 14 Processor Read Cycle Processor Write Request Protocol Processor write requests are issued using one of two protocols Doubleword partial doubleword word or partial word writes use a word write request protocol e Block writes use a block write request protocol Processor word write requests are issued with the system interface in master state as described in the following steps Figure 12 15 shows a processor noncoherent word write request cycle 1 A processor single word write request is issued by driving a write command on the SysCmd bus and a write address on the SysAD bus 2 The processor asserts ValidOut 3 The processor drives a data identifier on the SysCmd bus and data on the SysAD bus 4 The data identifier associated with the data cycle must contain a last data cycle indication At the end of the cycle ValidOut is deasserted Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively Called word to distinguish it from block request protocol Data
402. there are exceptions For example the processor retains a copy of the cache line if a cache line is written back by the Hit Write back cache instruction If the W bit is set the cache line is written back and the W bit is cleared The processor signals this line retention during a write by setting SysCmd 2 to a 1 as described in Chapter 12 Cache State Transition Diagrams The following sections describe the cache state diagrams that illustrate the cache state transitions for the primary cache Figure 11 6 shows the state diagram of the primary cache When an external agent supplies a cache line it need not return the initial state of the cache line for normal operations see Chapter 12 for a definition of an external agent This is because the only read request the R4600 R4700 should issue are for non coherent data and the lower three bits for the data identifier are reserved The initial state will automatically be set to DE by the R4600 R4700 Otherwise the processor changes the state of the cache line during one of the following events e A store to a dirty exclusive line remains in a dirty exclusive state e The state is changed to invalid for A Cache invalidate operation If the line is replaced Write hit Index Invalidate Read hit Hit Invalidate Figure 11 6 Primary Data Cache State Diagram Cache Coherency Overview Systems using more than one master must have a mechanism to maintain data consistency throughout the s
403. tion Revision Register 31 FCR31 0 31 FCRO 0 Figure 6 2 FPU Registers Floating Point Registers The FPU provides e 16 Floating Point registers FPRs for Status FR O or 32 Floating Point registers FPRs for Status FR 1 These 64 bit registers hold floating point values during floating point operations and are physically formed from the General Purpose registers FGRs When the FR bit in the Status register equals 1 the FPR references a single 64 bit FGR The FPRs hold values in either single or double precision floating point format If the FR bit equals O only even numbers the least register as shown in Figure 6 2 can be used to address FPRs When the FR bit is set to a 1 all FPR register numbers are valid If the FR bit equals O during a double precision floating point operation the general registers are accessed in double pairs Thus in a double precision operation selecting Floating Point Register O FPRO actually addresses adjacent Floating Point General Purpose registers FGRO and FGR1 Floating Point Control Registers The FPU has 32 control registers FCRs that can only be accessed by move operations The FCRs are described below The Implementation Revision register FCRO holds revision informa tion about the FPU e The Control Status register FCR31 controls and monitors excep tions holds the result of compare operations and establishes round ing modes e FCRI to FCR30 are reserve
404. titute value for the exception causing result of the floating point operation The particular default action taken depends upon the type of exception Table 7 1 lists the default action taken by the FPU for each of the IEEE exceptions Field Description Rounding Default action Mode I Inexact exception Any Supply a rounded result U Underflow exception Any Take unimplemented unless FCSR FS bit is set O Overflow exception RN Modify overflow values to with the sign of the intermediate result RZ Modify overflow values to the format s largest finite number with the sign of the intermediate result RP Modify negative overflows to the format s most nega tive finite number modify positive overflows to RM Modify positive overflows to the format s largest finite number modify negative overflows to Z Division by zero Any Supply a properly signed V Invalid operation Any Supply a quiet Not a Number NaN Table 7 1 Default FPU Exception Actions The FPU detects the eight exception causes internally When the FPU encounters one of these unusual situations it causes either an IEEE exception or an Unimplemented Operation exception E Floating Point Exceptions Chapter 7 lists the exception causing conditions of the IEEE Standard 754 FPA Internal IEEE Trap Trap Notes Result Standard 754 Enable Disable Inexact result I I I Loss of a
405. to FPU LDC1 Coprocessor 1 LDC1 31 26 25 21 20 16 15 0 LD 110 C i 1 01 base ft offset 6 Format LDC1 ft offset base Description The 16 bit offset is sign extended and added to the contents of general register base to form an unsigned effective address When FR 0 the contents of the doubleword at the memory location specified by the effective address is loaded into registers ft and ft 1 of the floating point coprocessor This instruction is not valid and is undefined when the least significant bit of ft is non zero When FR 1 the contents of the doubleword at the memory location specified by the effective address are loaded into the 64 bit register ft of the floating point coprocessor The FR bit of the Status register SRg specifies whether all 32 registers of the R4600 are addressable If FR equals zero this instruction is not defined when the least significant bit of ftis non zero If FR equals one ft may specify either odd or even registers If any of the three least significant bits of the effective address are non zero an address error exception takes place Operation vAddr lt offset 5 48 1 offsety5 0 GPR base pAddr uncached AddressTranslation vAddr DATA data LoadMemory uncached DOUBLEWORD pAddr vAddr DATA if SRo6 then CPR 1 ft lt data else CPR 1 ft 0 data endif Exceptions Coprocessor unusable TLB r
406. tput signal MasterOut at the same frequency as MasterClock and aligns MasterOut with MasterClock if Syncin is properly connected to SyncOut 10 1 Clock Interface Chapter 10 MasterOut The processor generates the clock output signal MasterOut at the same frequency as MasterClock and aligns MasterOut with MasterClock if Syncin is properly connected to SyncOut MasterOut clocks certain external logic such as the reset logic SynclIn SyncOut The processor generates SyncOut at the same frequency as MasterClock and aligns Syncin with MasterClock SyncOut must be connected to Syncin either directly or through an external buffer The processor can compensate for both output driver and input buffer delays and when necessary delay caused by an external buffer according to the connections of TClock and RClock to the rest of the system when aligning Syncin with MasterClock Figure 10 8 on page 10 9 gives an illustration of SyncOut connected to Syncin through an external buffer PClock The processor generates an internal clock PClock at twice the frequency of MasterClock and precisely aligns every other rising edge of PClock with the rising edge of MasterClock All internal registers and latches use PClock which is the pipeline clock rate SClock The R4600 R4700 processor divides PClock by 2 3 4 5 6 7 or 8 programmed at boot mode initialization to generate the internal clock signal SClock The processor uses SClo
407. trol Coprocessor registers to manipulate the memory management and exception handling facilities of the processor Appendix A contains details of the CPO instructions Integrated Device Technology The CPU Pipeline Chapter 3 Inc Introduction This chapter describes the basic operation of the CPU pipeline which includes descriptions of the delay instructions instructions that follow a branch or load instruction in the pipeline interruptions to the pipeline flow caused by interlocks and exceptions and R4600 R4700 implementation of an uncached store buffer The FPU pipeline is described in a later chapter CPU Pipeline Operation The R4600 R4700 uses a 5 stage pipeline similar to the R3000 The simplicity of this pipeline allows the R4600 R4700 to be lower cost and lower power than super scalar or super pipelined processors Unlike the R3000 the R4600 R4700 does virtual to physical translation in parallel with cache access This allows the R4600 R4700 to operate at over twice the frequency of the R3000 and to support a larger TLB for address translation Compared to the 8 stage R4000 pipeline the R4600 R4700 is more efficient requires fewer stalls Once the pipeline has been filled five instructions are executed simultaneously Figure 3 1 shows the five stages of the instruction pipeline the next section describes the pipeline stages Figure Legend
408. ts Each format contains a number of different instructions which are described further in this chapter Fields of the instruction formats are described in Chapter 2 Instruction decoding is simplified by limiting the number of formats to these three This limitation means that the more complicated and less frequently used operations and addressing modes can be synthesized by the compiler using sequences of these same simple instructions The instruction set can be further divided into the following groupings e Load and Store instructions move data between memory and general registers They are all immediate I type instructions since the only addressing mode supported is base register plus 16 bit signed imme diate offset e Computational instructions perform arithmetic logical shift multi ply and divide operations on values in registers They include register R type in which both the operands and the result are stored in reg isters and immediate I type in which one operand is a 16 bit imme diate value formats Jump and Branch instructions change the control flow of a program Jumps are always made to a paged absolute address formed by com bining a 26 bit target address with the high order bits of the Program Counter J type format or register address R type format Branches have 16 bit offsets relative to the program counter I type Jump And Link instructions save their return address in register 31 e Coprocesso
409. usable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception Underflow exception FPU Instruction Set Details Appendix B CVT L fmt converticlong CVT L fmt Fixed Point Format 31 26 25 21 20 16 15 11 10 6 5 0 fmt 0 fs fd CVT L 010001 00000 100101 5 5 5 5 6 Format CVT L fmt fd fs Description The contents of the floating point register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the long fixed point format The result is placed in the floating point register specified by fd This instruction is valid only for conversions from single or double precision floating point formats When the source operand is an Infinity NaN or the correctly rounded integer result is outside of 283 to 283 1 the Invalid operation exception is raised Ifthe Invalid operation is not enabled then no exception is taken and 2 1 is returned Operation T StoreFPR fd L ConvertFmt ValueFPR fs fmt fmt L Exceptions Coprocessor unusable exception Floating Point exception Coprocessor Exceptions Invalid operation exception Unimplemented operation exception Inexact exception Overflow exception FPU Instruction Set Details Appendix B CVT S fmt convertio Single CVT S fmt Floating Point Format 26 25 21 2
410. usly or asynchronously for a cold reset or synchronously to initiate a warm reset Reset must be deasserted synchronously with MasterClock VCCOk Vcc is OK Input When asserted this signal indicates to the processor that Vcc gt Vecmin for more than 100 milliseconds and will remain stable The assertion of VCCOk initiates the initialization sequence Table 8 5 Initialization Interface Signals Processor Signal Descriptions Chapter 8 Table 8 6 lists the R4600 R4700 processor signals and their possible states Asserted Reset Description Name 1 0 State 3 State State System address data bus SysAD 63 0 I O High Yes a System address data check bus SysADC 7 0 I O High Yes a System command data identifier bus SysCmd 8 0 1 0 High Yes a System command data identifier bus parity SysCmdP I O High Yes a Valid input ValidIn I Low No NA Valid output ValidOut O Low Yes b External request ExtRqst I Low No NA Release interface Release O Low Yes b Read ready RdRdy Tt Low No NA Write ready WrRdy I Low No NA Interrupts Int 5 0 I Low No NA Nonmaskable interrupt NMI I Low No NA Boot mode data in ModelIn I High No NA Boot mode clock ModeClock O High No d JTAG datain gt JID r mh No T oa Cd JTAG data out JTDO O High Yes b JTAG command JTMS I High No NA JTAG clock i
411. wer case The instruction name such as ADD SUB and so on is shown in upper case For the sake of clarity we sometimes use an alias for a variable subfield in the formats of specific instructions For example we use rs base in the format for load and store instructions Such an alias is always lower case since it refers to a variable subfield In some instructions the instruction subfields op and function can have constant 6 bit values When reference is made to these instructions upper case mnemonics are used For instance in the floating point ADD instruction we use op COP1 and function FADD In other cases a single field has both fixed and variable subfields so the name contains both upper and lower case characters Bit encoding for mnemonics are shown in Figure B 3 at the end of this appendix and are also included with each individual instruction In the instruction description examples that follow the Operation section describes the operation performed by each instruction using a high level language notation Instruction Notation Examples The following examples illustrate the application of some of the instruction notation conventions Example 1 GPR rt lt immediate 016 Sixteen zero bits are concatenated with an immediate value typically 16 bits and the 32 bit string with the lower 16 bits set to zero is assigned to General Purpose Register rt Example 2 immediate 5 immediate 5 o Bi
412. write request 3 The processor does not complete the issue of a read request until it issues an address cycle in response to the request for which the signal RdRdy was asserted two cycles earlier 4 The processor does not complete the issue of a write request until it issues an address cycle in response to the write request for which the signal WrRdy was asserted two cycles earlier 12 22 System Interface Chapter 12 SCycle SClock Figure 12 19 illustrates two processor write requests in which the issue of the second is delayed for the assertion of WrRdy Note Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses respectively 9 10 41 12 8 VO es ey ee ee a SysAD Bus Addr Datao X Addr Datao SysCmd Bus Write NEOD ValidOut ValidIn RdRdy WrRady Release Figure 12 19 Two Processor Write Requests Second Write Delayed for the Assertion of WrRdy External Request Protocols External requests can only be issued with the system interface in slave state An external agent asserts ExtRqst to arbitrate see External Arbitration Protocol on page 12 24 for the system interface then waits for the processor to release the system interface to slave state by asserting Release before the external agent issues an external request If the system interface is already in slave state that is the proces
413. xception Cause An Integer Overflow exception occurs when an ADD ADDI SUB DADD DADDI or DSUB instruction results in a 2 s complement overflow This exception is not maskable Processing The common exception vector is used for this exception and the OV code in the Cause register is set The EPC register contains the address of the instruction that caused the exception unless the instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set Integer Overflow exception processing is shown in Figure 5 15 on page 13 Servicing The process executing at the time of the exception is handed a floating point exception integer overflow signal This error is usually fatal to the current process L See Appendix A for instruction description 5 24 CPU Exception Processing Chapter 5 Trap Exception This section explains the Trap exception Cause The Trap exception occurs when a TGE TGEU TLT TLTU TEQ TNE TGEI TGEUI TLTI TLTUI TEQI or TNEI instruction results in a TRUE condition This exception is not maskable Processing The common exception vector is used for this exception and the Tr code in the Cause register is set The EPC register contains the address of the instruction causing the exception unless the instruction is in a branch delay slot in which case the EPC register contains the addr
414. xception one for references to 32 bit virtual address spaces and one for references to 64 bit virtual address spaces The UX SX and KX bits of the Status register determine whether the user supervisor or kernel address spaces referenced are 32 bit or 64 bit spaces All references use these vectors when the EXL bit is set to O in the Status register This exception sets the TLBL or TLBS code in the ExcCode field of the Cause register This code indicates whether the instruction as shown by the EPC register and the BD bit in the Cause register caused the miss by an instruction reference load operation or store operation When this exception occurs the BadVAddr Context XContext and EntryHi registers hold the virtual address that failed address translation The EntryHi register also contains the ASID from which the translation fault occurred The Random register normally suggests a valid location in which to place the replacement TLB entry The contents of the EntryLo register are undefined The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set TLB Refill exception processing is shown in Figure 5 15 on page 13 Servicing To service this exception the contents of the Context or XContext register are used as a virtual address to f
415. y The next clock is the system interface clock SClock This is also an internal clock and is used to sample data at the system interface and to clock data into the processor system interface output registers The SClock is a divided version of the PClock The divisor is selected at boot time There are three external clocks Some outputs are replicated to minimize loading The MasterOut is at the same frequency as MasterClock and can be used to clock certain external logic The other clocks are used by the external agent These are the TClock Transmit clock and the RClock Receive clock The TClock is used to clock the output registers signals transmitted to the R4600 R4700 of the external agent and is at the same frequency as SClock The RClock is used to clock the input register signals received from the R4600 R4700 of the external agent It is also at the same frequency as the SClock but its phase leads the SClock and TClock by 25 The R4600 R4700 implements an on chip PLL to eliminate the effects of clock skew Overview Chapter 1 System Interface The R4600 R4700 supports a 64 bit system interface that is compatible with the R4000PC system interface This interface operates from two clocks provided by the R4600 R4700 TClock 1 0 and RClock 1 0 at a division of the pipeline clock The interface consists of a 64 bit Address Data bus with 8 check bits and a 9 bit command bus In addition there are 8 handshake signals and 6 i
416. y 3 2 processor clock frequency divided by 4 EC 3 processor clock frequency divided by 5 4 processor clock frequency divided by 6 5 processor clock frequency divided by 7 6 processor clock frequency divided by 8 7 Reserved Writeback data rate 0 gt DDDD Doubleword every cycle 1 gt DDxDDx 2 Doublewords every 3 cycles 2 gt DDxxDDxx 2 Doublewords every 4 cycles 3 DxDxDxDx 2 Doublewords every 4 cycles EP 4 DDxxxDDxxx 2 Doublewords every 5 cycles 5 DDxxxxDDxxxx 2 Doublewords every 6 cycles 6 DxxDxxDxxDxx 2 Doublewords every 6 cycles 7 DDxxxxxDDxxxxx 2 Doublewords every 7 cycles 8 DxxxDxxxDxxxDxxx 2 Doublewords every 8 cycles 9 15 Reserved BigEndianMem BE 0 gt Little endian 1 gt Big endian Ic Primary I cache Size I cache size 2 bytes In the R4600 R4700 processor this is set to 16 Kbytes IC 010 DC Primary D cache Size D cache size 212 DC bytes In the R4600 R4700 processor this is set to 16 Kbytes DC 010 IB Primary I cache line size 1 32 bytes 8 Words DB Primary D cache line size 1 gt 32 bytes 8 Words KO kseg0 coherency algorithm see EntryLo0 and EntryLo1 registers Others Reserved Returns indicated values when read Table 4 12 Config Register Fields Load Linked Address LLAddr Register 17 The read write Load Linked Address LLAddr register contains the physical address read by the most recent Load Linked instruction This register is for
417. y II Table 1 12 MIPS 2 MIPS 3 Additional Branch Instructions OpCode Description MIPS ISA Level DADD Doubleword Add Ill DADDU Doubleword Add Unsigned Ill DSUB Doubleword Subtract Ill DSUBU Doubleword Subtract Unsigned Il Table 1 13 MIPS 2 MIPS 3 Additional Arithmetic Instructions 3 operand R type OpCode Description MIPS ISA Level DSLL Doubleword Shift Left Logical Il DSRL Doubleword Shift Right Logical Ill DSRA Doubleword Shift Right Arithmetic Ill DSLLV Doubleword Shift Left Logical Variable HI DSRLV Doubleword Shift Right Logical Variable Ill DSRAV_ Doubleword Shift Right Arithmetic Variable III DSLL32 Doubleword Shift Left Logical 32 Ill DSRL32 Doubleword Shift Right Logical 32 Ill DSRA32_ Doubleword Shift Right Arithmetic 32 HI Table 1 14 MIPS 2 MIPS 3 Additional Shift Instructions Overview Chapter 1 OpCode Description MIPS ISA Level TGE Trap if Greater Than or Equal II TGEU Trap if Greater Than or Equal Unsigned II TLT Trap if Less Than II TLTU Trap if Less Than Unsigned II TEQ Trap if Equal II TNE Trap if Not Equal II TGEI Trap if Greater Than or Equal Immediate II TGEIU Trap if Greater Than or Equal Immediate II Unsigned TLTI Trap if Less Than Immediate II TLTIU Trap if Less Than Immediate Unsigned II TEQI Trap if Equal Immediate II TNEI Trap if Not Equal Imm
418. y be serviced again even though there should be no interrupt pending CPU Exception Processing Chapter 5 Exception Handling and Servicing Flowcharts The remainder of this chapter contains figures of flowcharts for the exceptions described in Table 5 12 and guidelines for their handlers Figure Description Figure 5 16 General exceptions and their exception handler Figure 5 17 Figure 5 18 TLB XTLB miss exception and their exception handler Figure 5 19 Figure 5 20 Cache error exception and its handler Figure 5 21 Reset soft reset and NMI exceptions and a guideline to their handler Table 5 12 List of Exception Flowcharts Generally speaking the exceptions are handled by hardware HW and then the exceptions are serviced by software SW CPU Exception Processing Chapter 5 Comments Set FP Control Status Register Fp Control Status Register is only set Enhi VPN2 ASID if the respective exception occurs Context lt VPN2 EnHi X Context are set only for Set Cause Register TLB Invalid Modified amp Refill exceptions EXCCode CE Instr in Br Dly Slot Cause 31 BD amp 1 Cause 31 BD 0 Check if exception within another exception BadVaA is set only for Set BadVA Set BadVA TLB Invalid Modified Refill and VCED I exceptions EPC lt PC 4 EPC lt PC Note not set if Bus Error h Exception Processor fo
419. y cycle description of the bus arbitration protocols for each type of processor and external request Table 12 3 lists the abbreviations and definitions for each of the buses that are used in the timing diagrams that follow Scope Abbreviation Meaning Global Unsd Unused SysAD bus Addr Physical address Data lt n gt Data element number n of a block of data SysCmd bus Cmd An unspecified system interface command Read A processor or external read request command Write A processor or external write request command SINull A system interface release external null request command NData Anoncoherent data identifier for a data element other than the last data element NEOD A noncoherent data identifier for the last data element Table 12 3 System Interface Requests Processor Request Protocols Processor request protocols described in this section include e read e write Note In the timing diagrams the two closely spaced wavy vertical lines see SCycle 2 in Figure 12 20 on page 12 24 indicate one or more identical cycles 12 14 System Interface Chapter 12 Processor Read Request Protocol Steps The following sequence describes the protocol for a processor read request the numbered steps below correspond to the numbers in Figure 12 12 on page 12 16 1 RdRdy is asserted low indicating the external agent is ready to accept a read request 2 With the system interface in master s
420. ystem This mechanism is called a cache coherency protocol The R4600 R4700 does not provide any hardware cache coherency Cache coherency must be handled with software Cache Coherency Attributes Cache coherency attributes are necessary to ensure the consistency of data throughout the system 11 7 Cache Organization Operation and Coherency Chapter 11 Bits in the translation look aside buffer TLB control coherency on a per page basis Specifically the TLB contains 3 bits per entry that provide two possible coherency attribute types they are listed below and described more fully in the following sections e uncached e noncoherent includes 3 attribute values Table 11 2 summarizes the behavior of the processor on load misses and store misses for each of the coherency attribute types listed above The following sections describe in detail these coherency attribute types Attribute Type Load Miss Store Miss Uncached Main memory read Main memory write Noncoherent Noncoherent read Noncoherent read write allocate page Main memory write no write allocate page Table 11 2 Coherency Attributes and Processor Behavior Uncached Lines within an uncached page are never in a cache When a page has the uncached coherency attribute the processor issues a doubleword partial doubleword word or partial word read or write request directly to main memory bypassing the cache for any load or store to a location w
421. ystem interface and to clock data into the processor system interface output registers see Chapter 10 for more details 12 4 System Interface Chapter 12 Master and Slave States When the R4600 R4700 processor is driving the SysAD and SysCmd buses the system interface is in master state When the external agent is driving the SysAD and SysCmd buses the system interface is in slave state In master state the processor drives the SysAD and SysCmd buses and will assert the signal ValidOut whenever these buses are valid In slave state the external agent drives the SysAD and SysCmd buses and asserts the signal ValidIn whenever these buses are valid Moving from Master to Slave State The system interface remains in master state unless one of the following occurs e The external agent requests and is granted the system interface ex ternal arbitration e The processor issues a read request and performs an uncompelled change to slave state External Arbitration The system interface must be in slave state for the external agent to issue an external request through the system interface The transition from master state to slave state is arbitrated by the processor using the system interface handshake signals ExtRqst and Release This transi tion is described by the following procedure 1 An external agent signals that it wishes to issue an external request by asserting ExtRast 2 When the processor is ready to accept a
422. yte O is the most significant leftmost byte thereby providing compatibility with MC 68000 and IBM 370 conventions Figure 1 4 shows this configuration Higher Word Bit Address Address 31 24 23 1615 8 7 12 12 14 8 8 10 4 4 6 El 5 Address Figure 1 4 Big Endian Byte Ordering When configured as a little endian system byte O is always the least significant rightmost byte which is compatible with iAPX x86 and DEC VAX conventions Figure 1 5 shows this configuration Higher Word Bit Address Address 31 24 23 1615 12 15 14 8 11 10 4 7 6 Lower 3 2 Address Figure 1 5 Little Endian Byte Ordering In this text bit O is always the least significant rightmost bit thus bit designations are always little endian although no instructions explicitly designate bit positions within words Figures 1 6 and 1 7 show little endian and big endian byte ordering in doublewords Overview Chapter 1 Most significant byte Least significant byte Word Bit 63 5655 4847 4039 3231 2423 16 15 Byte eas 6 5 4 3 2 T Halfword Byte l Bt 7 6543210 Bits in a Byte Figure 1 6 Little Endian Data in a Doubleword Most significant byte Least significant byte Bit 63 5655 4847 4039 32 31 2423

cache

Contents

Download Pdf Manuals

Related Search

Related Contents