
1. eqvx     31  S  A  B        284  Rc
extsbx   31  S  A  00000    954  Rc
fabsx    63  D              264  Rc
fcfidx   63  D  B           846  Rc
fcmpo    63  crfD  00  B    32   0
fcmpu    63  B              0    0
fctidx   [illegible]        814  Rc
fctidzx  [illegible]  00000  B  815  Rc
fctiwx   63  D  B           14   Rc
fctiwzx  63  D  B           15   Rc
fmrx     63  D  B           72   Rc
fnabsx   63  D  B           136  Rc
icbi     31  00000  A  B    982  0
lbzux    31  D  A  B        119  0
lbzx     31  D  A  B        87   0
ldarx    31  D  A  B        84   0
lfdux    31  D  A  B        631  0
lfdx     31  D  A  B        599  0
lfsux    31  D  A  B        567  0
lfsx     31  D  A  B        535  0
lhbrx    31  D  A  B        790  0
lhzux    31  D  A  B        311  0
lhzx     31  D  A  B        279  0
lswi     31  D  A  NB       597  0
lswx     [illegible]  A  B  533  0
lwaux    31  D  A  B        373  0
Appendix A. PowerPC Instruction Set Listings
lwax     31  D  A  B        341  0
lwbrx    31  D  A  B        534  0
mcrfs    63                 64   0
mcrxr    31                 512  0
mfcr     31                 19   0
mffsx    63                 583  Rc
mfsrin   95  31             659  0
mtfsb0x  63                 70   Rc
mtfsb1x  63                 38   Rc
mtfsfix  63                 134  Rc
mtmsrd   12                 178  0
mtsr     31  S              210  0
mtsrd    31  S              82   0
mtsrin   95  31  S          242  0
mtsrdin  99  31  S          114  0
nandx                       476  Rc
orx      31  S  B           444  Rc
orcx     31  S  B           412  Rc
slbia    123  31  00000  0  498  0
slbie    123  31  00000  B  434  0
sradx    31  S  A  B        794  Rc
srawx    31  S  A  B        792  Rc
srawix   31  S  A  SH       824  Rc
srdx     31  S  A  B        539  Rc
stbx     31  S  A  B        215  0
IBM PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual
stdcx.   31  S  A  B        214  1
stdux    31  S  A  B        181  0
2. [Timing diagram. Signals: L2CE, L2WE, SRAM address, SRAM memory, SRAM data.]
Note: Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.

Figure 9-29. Burst Read-Write-Write L2 Cache Access (Flow-Through)

9.1.7.2 Pipelined Burst SRAM

Pipelined burst SRAMs operate at higher frequencies than flow-through burst SRAMs by clocking the read data from the memory array into a buffer before driving the data onto the data bus. This causes initial read accesses by the pipelined burst SRAMs to occur one cycle later than flow-through burst SRAMs, but the L2 bus frequencies supported can be higher. Note that the 750's L2 cache interface requires the use of single-cycle deselect pipelined burst SRAM for proper operation. Figure 9-30 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.

[Timing diagram. Signals: SRAM clock, L2CE, L2WE, SRAM address, SRAM memory, SRAM data.]
Notes: Rary indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.

Figure 9-30. Burst Read-Write-Read L2 Cache Access (Pipelined)

Chapter 9. L2 Cache Interface Operation

Figure
3. stfiwx 5 011111 S A B 1111010111 0 extsw 011111 S A 00000 1111011010 Re dcbz 011111 00000 A B 1111110110 0 Ibz 100010 D A d Ibzu 100011 D A d stw 100100 S A d stwu 100101 S A d Ihz 101000 D A d Ihzu 101001 D A d Iha 101010 D A d Ihau 101011 D A d Imw 7 101110 D A d stmw 101111 S A d Ifs 110000 D A d Ifsu 110001 D A d stfs 110100 S A d stfsu 110101 S A d stfd 110110 S A d stfdu 110111 S A d wa 111010 D A fdivsx 111011 D A fsubsx 111011 D A faddsx 111011 D A A 14 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Name fsqrtsx gt fresx fmulsx fmsubsx fmaddsx fnmsubsx fnmaddsx std stdu fcmpu frspx fctiwx fetiwzx fdivx fsubx faddx fsqrtx gt fselx fmulx frsqrtex fmsubx fmaddx fnmsubx fnmaddx Tempo mtfsb1x fnegx mcrfs mtfsb0x fmrx mtfsfix fnabsx fabsx 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1011 D 00000 B 00000 10110 Re 1011 D 00000 B 00000 11000 Re 1011 D A 00000 C 11001 Rc 1011 D A 1011 D A 1110 S A 1110 S A ds el o o o o o 0000001100 0000001110 0000001111 00000 00000 00000 A A 00000 D A D A D 00000 D A D el gt gt U C OJO Se Ola EBEN 0000100000 00000 0000100110 EES croD 00000 D 0000101000 0001000000 H 0001000110 0001001000 00000 00000 o
4. Figure 5-3. PowerPC 750 Microprocessor DMMU Block Diagram

5.1.3 Address Translation Mechanisms

PowerPC processors support the following three types of address translation:
• Page address translation: translates the page frame address for a 4-Kbyte page size.
• Block address translation: translates the block number for blocks that range in size from 128 Kbytes to 256 Mbytes.
• Real addressing mode address translation: when address translation is disabled, the physical address is identical to the effective address.

Figure 5-4 shows the three address translation mechanisms provided by the MMUs. The segment descriptors shown in the figure control the page address translation mechanism. When an access uses page address translation, the appropriate segment descriptor is required. In 32-bit implementations, the appropriate segment descriptor is selected from the 16 on-chip segment registers by the four highest-order effective address bits.

A control bit in the corresponding segment descriptor then determines if the access is to memory (memory-mapped) or to the direct-store interface space. Note that the direct-store interface was present in the architecture only for compatibility with existing I/O devices that used this interface. However, it is being removed from the architecture, and the 750 does not support it. When an acc
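The segment register selection described above can be sketched in a few lines. This is only an illustration, not code from the manual: the function names are hypothetical, and the split into a 16-bit page index (EA[4-19]) and a 12-bit byte offset (EA[20-31]) is taken from the block diagram labels for 4-Kbyte pages.

```python
def segment_register_index(ea: int) -> int:
    """Return which of the 16 segment registers a 32-bit effective address selects.

    PowerPC numbers bits from the most significant end, so EA[0-3] are the
    four highest-order bits of the address.
    """
    if not 0 <= ea <= 0xFFFFFFFF:
        raise ValueError("effective address must fit in 32 bits")
    return ea >> 28


def split_effective_address(ea: int) -> dict:
    """Split an EA into the fields used by page address translation (hypothetical helper)."""
    return {
        "segment_register": segment_register_index(ea),  # EA[0-3]
        "page_index": (ea >> 12) & 0xFFFF,               # EA[4-19]
        "byte_offset": ea & 0xFFF,                       # EA[20-31], 4-Kbyte page
    }


if __name__ == "__main__":
    print(split_effective_address(0xC0001234))
```

For example, an effective address whose top hexadecimal digit is C selects segment register 12.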
5. Figure 8-2. PowerPC 750 Microprocessor Block Diagram

Chapter 8. Bus Interface Operation

Cache lines are selected for replacement based on a pseudo least-recently-used (PLRU) algorithm. Each time a cache line is accessed, it is tagged as the most-recently-used line of the set. When a miss occurs and all eight lines in the set are marked as valid, the least-recently-used line is replaced with the new data. When data to be replaced is in the modified state, the modified data is written into a write-back buffer while the missed data is being read from memory. When the load completes, the 750 then pushes the replaced line from the write-back buffer to the L2 cache (if enabled) or to main memory in a burst write operation.

8.1.2 Operation of the L2 Cache

The 750 provides an on-chip, two-way set-associative tag memory and a dedicated L2 cache port with support for up to 1 Mbyte of external synchronous SRAMs for data storage. The L2 cache normally operates in copy-back mode and supports system c
6. Figure 11-1. Monitor Mode Control Register 0 (MMCR0)

This register must be cleared at power-up. Reading this register does not change its contents. Table 11-2 describes the bits of the MMCR0 register.

Table 11-2. MMCR0 Bit Settings

Disables counting unconditionally.
  0 The values of the PMCn counters can be changed by hardware.
  1 The values of the PMCn counters cannot be changed by hardware.
Disables counting while in supervisor mode.
  0 The PMCn counters can be changed by hardware.
  1 If the processor is in supervisor mode (MSR[PR] is cleared), the counters are not changed by hardware.
DU: Disables counting while in user mode.
  0 The PMCn counters can be changed by hardware.
  1 If the processor is in user mode (MSR[PR] is set), the PMCn counters are not changed by hardware.
Disables counting while MSR[PM] is set.
  0 The PMCn counters can be changed by hardware.
  1 If MSR[PM] is set, the PMCn counters are not changed by hardware.
Disables counting while MSR[PM] is zero.
  0 The PMCn counters can be changed by hardware.
  1 If MSR[PM] is cleared, the PMCn counters are not changed by hardware.
ENINT: Enables performance monitor interrupt signaling.
  0 Interrupt signaling is disabled.
  1 Interrupt signaling is enabled.
  Cleared by hardware when a performance monitor interrupt is taken. To re-enable these interrupt signals, software must set this bit after servici
7. Table A-18. Memory Synchronization Instructions

Name     0-5   6-10   11-15  16-20  21-30  31
eieio    31    00000  00000         854    0
isync    19    00000  00000         150    0
ldarx    31    D             B      84     0
lwarx    31    D             B      20     0
stdcx.   31    S             B      214    1
stwcx.   31    S             B      150    1
sync     31    00000  00000         598    0
Note 1: 64-bit instruction

Table A-19. Floating-Point Load Instructions

Name     0-5   6-10   11-15  16-20  21-30  31
lfd      50    D      A      d
lfdu     51    D      A      d
lfdux    31    D      A      B      631    0
lfdx     31    D      A      B      599    0
lfs      48    D      A      d
lfsu     49    D      A      d
lfsux    31    D                    567    0
lfsx     31    D                    535    0

Table A-20. Floating-Point Store Instructions

Name     0-5   6-10   11-15  16-20  21-30  31
stfd     54    S      A
stfdu    55           A
stfdux   31           A      B             0
stfiwx                A      B
stfs                  A
stfsu                 A
stfsux                A
stfsx                 A
Note 1: Optional instruction

Table A-21. Floating-Point Move Instructions

Name     0-5   6-10   11-15  16-20  21-30  31
fabsx    63    D      00000  B      264    Rc
fmrx     63    D      00000  B      72     Rc
fnabsx   63    D      00000  B      136    Rc
fnegx    63    D      00000  B      40     Rc

Table A-22. Branch Instructions

Name     Fields
bx       18  LI  AA  LK
bcx      16  BO  BI  BD  AA  LK
bec
8. The number of instructions that can be executed after the issue of a predicted branch instruction is limited by the fact that no instruction executed after a predicted branch may actually update the register files or memory until the branch is completed. That is, instructions may be issued and executed, but cannot reach the write-back stage in the completion unit. When an instruction following a predicted branch completes execution, it does not write back its results to the architected registers; instead, it stalls in the completion queue. Of course, when the completion queue is full, no additional instructions can be dispatched, even if an execution unit is idle.

In the case of a misprediction, the 750 can easily redirect its machine state because the programming model has not been updated. When a branch is mispredicted, all instructions that were dispatched after the predicted branch instruction are flushed from the completion queue, and any results are flushed from the rename registers.

The BTIC is a cache of recently used branch target instructions. If the search for the branch target hits in the cache, the first one or two branch target instructions are available in the instruction queue on the next cycle (shown in Figure 6-5). Two instructions are fetched on a BTIC hit, unless the branch target is the last instruction in a cache block, in which case one instruction is fetched.

In some situations, an instruction sequence creates dependencies that keep
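The BTIC fetch rule above (two instructions on a hit, or one when the target is the last instruction in its cache block) can be expressed as a small check. This is a sketch under stated assumptions: the 32-byte cache block size is assumed rather than taken from this excerpt, and the function name is hypothetical.

```python
CACHE_BLOCK_BYTES = 32   # assumption: eight-word (32-byte) cache block
INSTRUCTION_BYTES = 4    # PowerPC instructions are four bytes long and word aligned


def btic_hit_fetch_count(target_address: int) -> int:
    """Return how many instructions a BTIC hit supplies for this branch target."""
    if target_address % INSTRUCTION_BYTES:
        raise ValueError("branch targets must be word aligned")
    offset_in_block = target_address % CACHE_BLOCK_BYTES
    last_slot = CACHE_BLOCK_BYTES - INSTRUCTION_BYTES
    # Only one instruction is available when the target is the final word of the block.
    return 1 if offset_in_block == last_slot else 2


if __name__ == "__main__":
    for address in (0x1000, 0x1018, 0x101C):
        print(hex(address), btic_hit_fetch_count(address))
```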
9. Figure 5-1. MMU Conceptual Block Diagram (32-Bit Implementations). Labeled elements: instruction accesses and data accesses, EA[0-19], segment registers, IBAT0U through IBAT3L, DBAT0U through DBAT3U, on-chip TLBs, optional page table search logic, SDR1 (SPR 25), upper 24 bits of virtual address, PA[0-31].

Figure 5-2. PowerPC 750 Microprocessor IMMU Block Diagram. Labeled elements: instruction unit, BPU, EA[0-19], segment registers, SDR1 (SPR 25), page table search logic, IBAT array (IBAT0L through IBAT3L), cache select, cache hit/miss, PA[0-31].

Chapter 5. Memory Management

[Block diagram residue. Labeled elements: load/store unit, EA[0-19], DBAT array (DBAT0L through DBAT3L), segment registers, D-cache select, page table search logic, SDR1 (SPR 25).]
10. For the icbi instruction, the effective address is not computed or translated, so it cannot generate a protection violation or exception. This instruction performs a virtual lookup into the instruction cache (index only). All ways of the selected instruction cache set are invalidated. The icbi instruction is not broadcast on the 60x bus. The icbi instruction invalidates the cache blocks independent of whether the cache is disabled or locked.

Chapter 3. Instruction and Data Cache Operation

3.5 Cache Operations

This section describes the 750 cache operations.

3.5.1 Cache Block Replacement/Castout Operations

Both the instruction and data caches use a pseudo least-recently-used (PLRU) replacement algorithm when a new block needs to be placed in the cache. When the data to be replaced is in the modified (M) state, that data is written into a castout buffer while the missed data is being accessed on the bus. When the load completes, the 750 then pushes the replaced cache block from the castout buffer to the L2 cache (if L2 is enabled) or to main memory (if L2 is disabled).

The replacement logic first checks to see if there are any invalid blocks in the set and chooses the lowest-order invalid block (L[0-7]) as the replacement target. If all eight blocks in the set are valid, the PLRU algorithm is used to determine which block should be replaced. The PLRU algorithm is shown in Figure 3-5.
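The victim selection described above can be modeled as follows. This is a minimal sketch, not the 750's actual logic: the invalid-block preference follows the text, but the seven-bit binary tree used here is a generic tree-PLRU arrangement assumed for illustration, since Figure 3-5 is not reproduced in this excerpt. The class and method names are hypothetical.

```python
class PlruSet:
    """One eight-way cache set with valid bits and a generic tree-PLRU state."""

    WAYS = 8

    def __init__(self):
        self.valid = [False] * self.WAYS
        # Seven tree nodes stored heap-style. 0 means "next victim is in the
        # lower half of this node's range", 1 means "upper half".
        self.bits = [0] * (self.WAYS - 1)

    def touch(self, way: int) -> None:
        """Mark a way as most recently used by pointing each node away from it."""
        node, lo, hi = 0, 0, self.WAYS
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1          # accessed lower half, victim moves to upper
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0          # accessed upper half, victim moves to lower
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Pick the replacement target: lowest-order invalid way, else follow the tree."""
        for way, is_valid in enumerate(self.valid):
            if not is_valid:
                return way
        node, lo, hi = 0, 0, self.WAYS
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

    def fill(self) -> int:
        """Allocate a block in the set and return the way that was replaced."""
        way = self.victim()
        self.valid[way] = True
        self.touch(way)
        return way


if __name__ == "__main__":
    cache_set = PlruSet()
    print([cache_set.fill() for _ in range(PlruSet.WAYS)])
    print(cache_set.victim())
```

A tree-based scheme only approximates true LRU order, which is why the algorithm is called pseudo-LRU.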
11. H
HIDn (hardware implementation-dependent) registers
  HID0
    description, 2-9
    doze bit, 10-3
    DPM enable bit, 10-2
    nap bit, 10-4
  HID1
    description, 2-13
    PLL configuration, 2-13, 7-30
HRESET (hard reset) signal, 7-23, 8-43

I
IABR (instruction address breakpoint register), 2-8
ICTC (instruction cache throttling control register), 2-21, 10-11
IEEE 1149.1-compliant interface, 8-44
Illegal instruction class, 2-33
Instruction cache
  configuration, 3-4
  instruction cache block fill operations, 3-21
  organization, 3-5
Instruction cache throttling, 10-10
Instruction timing
  examples
    cache hit, 6-12
    cache miss, 6-15
  execution unit, 6-18
  instruction flow, 6-8
  memory performance considerations, 6-27
  overview, 6-3
  terminology, 6-1
Instructions
  branch address calculation, 2-53
  branch instructions, 6-9, 6-18, 6-20, A-25
  cache control instructions, 9-4
  cache management instructions, A-27
  classes, 2-32
  condition register logical, 2-54, A-26
  defined instructions, 2-33
  external control instructions, 2-64, A-28
  floating-point
    arithmetic, 2-42, A-20
    compare, 2-43, A-21
    FP load instructions, A-24
    FP move instructions, A-25
    FP rounding and conversion, 2-43, A-21
    FP status and control register, 2-44
    FP store instructions, A-25
    FPSCR instructions, A-21
    multiply-add, 2-42, A-20
  illegal instructions, 2-33
  instruction cache throttling, 10-10
  instruction flow diagram, 6-10
  instruction serialization, 6-17
  instruction serialization types, 6-17
  instruction set summary, 2-31
IBM
12. "On-Chip Instruction and Data Caches," and Section 1.2.5, "L2 Cache Implementation (Not Supported in the PowerPC 740)." The BPU also contains a 64-entry BTIC that provides immediate access to cached target instructions. For more information, see Section 1.2.2.2, "Branch Processing Unit (BPU)."

1.7 Exception Model

The following sections describe the PowerPC exception model and the 750 implementation. A detailed description of the 750 exception model is provided in Chapter 4, "Exceptions."

1.7.1 PowerPC Exception Model

The PowerPC exception mechanism allows the processor to interrupt the instruction flow to handle certain situations caused by external signals, errors, or unusual conditions arising from the instruction execution. When exceptions occur, information about the state of the processor is saved to certain registers, and the processor begins execution at an address (exception vector) predetermined for each exception. Exception processing occurs in supervisor mode.

Chapter 1. PowerPC 740/PowerPC 750 Overview

Although multiple exception conditions can map to a single exception vector, a more specific condition may be determined by examining a register associated with the exception, for example, the DSISR and the FPSCR. Additionally, some exception conditions can be explicitly enabled or disabled by software.

The PowerPC architecture requires that exceptions be handled in program order; therefore, although a parti
13. in The Programming Environments Manual for 32-bit implementations.

Implementation Note: The 750 BAT registers are not initialized by the hardware after the power-up or reset sequence. Consequently, all valid bits in both instruction and data BATs must be cleared before setting any BAT for the first time. This is true regardless of whether address translation is enabled. Also, software must avoid overlapping blocks while updating a BAT area or areas. Even if translation is disabled, multiple BAT hits are treated as programming errors and can corrupt the BAT registers and produce unpredictable results. Always re-zero during the reset ISR. After zeroing all BATs, set them (in order) to the desired values. HRESET disorders the BATs. SRESET does not.

5.4 Memory Segment Model

The 750 adheres to the memory segment model as defined in Chapter 7, "Memory Management," in The Programming Environments Manual for 32-bit implementations. Memory in the PowerPC OEA is divided into 256-Mbyte segments. This segmented memory model provides a way to map 4-Kbyte pages of effective addresses to 4-Kbyte pages in physical memory (page address translation), while providing the programming flexibility afforded by a large virtual address space (52 bits).

The segment/page address translation mechanism may be superseded by the block address translation (BAT) mechanism described in Section 5.3, "Block Address Translation." If not, the translation proceeds in the followin
14. 0010000110   B   0010001000   B   0100001000

Appendix A. PowerPC Instruction Set Listings

Name     0-5     6-10   11-15  16-20  21-30       31
mffsx    111111  D      00000  00000  1001000111  Rc
mtfsfx   111111  FM     0      B      1011000111  Rc
fctidx   111111         00000         1100101110
fctidzx  111111         00000         1100101111
fcfidx   111111         00000         1101001110  Rc

Notes:
1 64-bit instruction
2 Supervisor-level instruction
3 Supervisor-level instruction
4 Optional 64-bit bridge instruction
5 Optional instruction
6 Supervisor- and user-level instruction
7 Load/store string/multiple instruction
32-bit instruction not implemented by the PowerPC 750

A.3 Instructions Grouped by Functional Categories

Table A-3 through Table A-30 list the PowerPC instructions grouped by function.

Key: Reserved bits

Table A-3. Integer Arithmetic Instructions

Name     0-5  6-10  11-15  16-20  21  22-30  31
addx     31   D     A      B      OE  266    Rc
addcx    31   D     A      B      OE  10     Rc
addex    31   D     A      B      OE  138    Rc
addi     14   D     A      SIMM
addic    12   D     A      SIMM
addic.   13   D     A      SIMM
addis    15   D     A      SIMM
addmex   31   D     A      00000  OE  234    Rc
addzex   31   D     A      00000  OE  202    Rc
divdx    31   D     A      B      OE  489    Rc
divdux   31   D     A      B      OE  457    Rc
divwx    31   D     A      B      OE  491    Rc
divwux   31   D     A      B      OE  459
15. Because a time base signal could have occurred along with an enabled counter overflow condition, software should always reset INTONBITTRANS to zero if the value in INTONBITTRANS was a one.

7-8 RTCSELECT: 64-bit time base, bit selection enable.
  00 Pick bit 63 to count.
  01 Pick bit 55 to count.
  10 Pick bit 51 to count.
  11 Pick bit 47 to count.
9 INTONBITTRANS: Cause interrupt signaling on bit transition (identified in RTCSELECT) from off to on.
  0 Do not allow interrupt signal if chosen bit transitions.
  1 Signal interrupt if chosen bit transitions.
  Software is responsible for setting and clearing INTONBITTRANS.
10-15 THRESHOLD: Threshold value. The 750 supports all 6 bits, allowing threshold values from 0 to 63. The intent of the THRESHOLD support is to characterize L1 data cache misses.
16 PMC1INTCONTROL: Enables interrupt signaling due to PMC1 counter overflow.
  0 Disable PMC1 interrupt signaling due to PMC1 counter overflow.
  1 Enable PMC1 interrupt signaling due to PMC1 counter overflow.
17 PMCINTCONTROL: Enable interrupt signaling due to any PMC2-PMC4 counter overflow. Overrides the setting of DISCOUNT.
  0 Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
  1 Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
18 PMCTRIGGER: Can be used to trigger counting of PMC2-PMC4 after PMC1 has overflowed or after a performance monitor interrupt is signaled.
  0 Enable PMC2-PMC4 counting.
  1 Disable PMC2-PMC4 counting until eit
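Because PowerPC registers number their bits from the most significant end (bit 0 is the MSB), pulling these MMCR0 fields out of a 32-bit value takes a small conversion. The sketch below is illustrative only: the helper names are hypothetical, the positions of RTCSELECT (7-8), THRESHOLD (10-15), and PMCTRIGGER (18) come from the excerpt, and the positions used for INTONBITTRANS (9), PMC1INTCONTROL (16), and PMCINTCONTROL (17) are assumptions inferred from the register layout.

```python
def field(value: int, first_bit: int, last_bit: int, width: int = 32) -> int:
    """Extract bits first_bit..last_bit using big-endian bit numbering (bit 0 = MSB)."""
    if not 0 <= first_bit <= last_bit < width:
        raise ValueError("invalid bit range")
    bit_count = last_bit - first_bit + 1
    shift = width - 1 - last_bit
    return (value >> shift) & ((1 << bit_count) - 1)


def decode_mmcr0(mmcr0: int) -> dict:
    """Decode some MMCR0 fields (hypothetical helper; bits 9, 16, 17 are assumed positions)."""
    return {
        "RTCSELECT": field(mmcr0, 7, 8),
        "INTONBITTRANS": field(mmcr0, 9, 9),
        "THRESHOLD": field(mmcr0, 10, 15),
        "PMC1INTCONTROL": field(mmcr0, 16, 16),
        "PMCINTCONTROL": field(mmcr0, 17, 17),
        "PMCTRIGGER": field(mmcr0, 18, 18),
    }


if __name__ == "__main__":
    # RTCSELECT = 0b10, THRESHOLD = 63, PMCTRIGGER = 1
    print(decode_mmcr0(0x013F2000))
```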
16. During the address transfer, the physical address and all attributes of the transaction are transferred from the bus master to the slave device(s). Snooping logic may monitor the transfer to enforce cache coherency; see the discussion about snooping in Section 8.3.3, "Address Transfer Termination."

The signals used in the address transfer include the following signal groups:
• Address transfer start signal: transfer start (TS)
• Address transfer signals: address bus (A[0-31]) and address parity (AP[0-3])
• Address transfer attribute signals: transfer type (TT[0-4]), transfer size (TSIZ[0-2]), transfer burst (TBST), cache inhibit (CI), write-through (WT), and global (GBL)

Figure 8-7 shows that the timing for all of these signals, except TS, is identical. All of the address transfer and address transfer attribute signals are combined into the ADDR+ grouping in Figure 8-7. The TS signal indicates that the 750 has begun an address transfer and that the address and transfer attributes are valid (within the context of a synchronous bus). The 750 always asserts TS coincident with ABB. As an input, TS need not coincide with the assertion of ABB on the bus (that is, TS can be asserted with, or on a subsequent clock cycle after, ABB is asserted; the 750 tracks this transaction correctly).

In Figure 8-7, the address transfer occurs during bus clock cycles 1 and 2 (arbitration occurs in bus clock cycle 0), and the address transfer is terminate
17. Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused by assertion of MCP, similar to how MSR[EE] can mask external interrupts.
  0 Masks MCP. Asserting MCP does not generate a machine check exception or a checkstop.
  1 Asserting MCP causes checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
Enable/disable 60x bus address and data parity generation.
  0 Parity generation is enabled.
  1 If the system does not use address or data parity and the respective parity checking is disabled (HID0[EBA] or HID0[EBD] = 0), input receivers for those signals are disabled, require no pull-up resistors, and thus should be left unconnected. If all parity generation is disabled, all parity checking should also be disabled and parity signals need not be connected.
Enable/disable 60x bus address parity checking.
  0 Prevents address parity checking.
  1 Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
  EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.

Chapter 2. Programming Model

Table 2-4. HID0 Bit Functions (Continued)

Enable 60x bus data parity checking.
  0 Parity checking is disabled.
  1 Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
  EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
HRESET s
18. May occur on any cycle.

7.2.9.7.5 TLBI Sync (TLBISYNC), Input

The TLBI Sync (TLBISYNC) signal is an input-only signal on the 750. Following are the state meaning and timing comments for the TLBISYNC signal.

State Meaning
  Asserted: Indicates that instruction execution stops after execution of a tlbsync instruction.
  Negated: Indicates that the instruction execution may continue or resume after the completion of a tlbsync instruction.
Timing Comments
  Assertion/Negation: May occur on any cycle. The TLBISYNC signal must be held negated during HRESET.
  Start-up: TLBISYNC is sampled at the negation of HRESET to select 32-bit data bus mode; if TLBISYNC is negated at start-up, 32-bit mode is disabled and the default 64-bit mode is selected.

7.2.9.7.6 L2 Cache Interface

The 750's dedicated L2 cache interface provides all the signals required for the support of up to 1 Mbyte of synchronous SRAM for data storage. The use of the L2 data parity (L2DP[0-7]) and L2 low-power mode enable (L2ZZ) signals is optional and depends on the SRAMs selected for use with the 750.

Note that the least-significant bit of the L2 address (L2ADDR[16-0]) signals is identified as bit 0, and the most-significant bit is identified as bit 16.

Note that the L2 cache interface is not implemented in the 740.

7.2.9.8 L2 Address (L2ADDR[16-0]), Output

Following are the state meaning and timing comment
19. Move to Special-Purpose Register
Move from Special-Purpose Register

Table 2-47 lists the SPR numbers for both user- and supervisor-level accesses.

Table 2-47. PowerPC Encodings

Register  spr[5-9]  spr[0-4]  Access      Architecture
DBAT1U    10000     11010     Supervisor  OEA
DBAT2L    10000     11101     Supervisor  OEA

[The remaining rows of Table 2-47 and its continuations are not recoverable from the extraction.]

Notes:
1 The order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding. For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order five bits appearing in bits 16-20 of the instruction and the low-order five bits in bits 11-15.
2 The TB registers are referred to as TBRs rather than SPRs and can be written to using the mtspr instruction in supervisor mode and the TBR numbers here. The TB registers can be read in user mode using either the mftb or mfspr instruction and specifying TBR 268 for TBL and TBR 269 for TBU.

Encodings for the 750-specific SPRs are listed in Table 2-48.

Table 2-48. SP
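The half-swapping described in note 1 is easy to get wrong by hand, so a short worked sketch may help. The function names are hypothetical. The primary opcode 31 and the extended opcodes 339 (mfspr) and 467 (mtspr) come from the PowerPC architecture rather than from this excerpt.

```python
def spr_field(spr: int) -> int:
    """Return the 10-bit SPR field as it appears in instruction bits 11-20.

    The low-order five bits of the SPR number go in bits 11-15 and the
    high-order five bits go in bits 16-20, so the two halves are swapped.
    """
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    high_half = spr >> 5
    low_half = spr & 0x1F
    return (low_half << 5) | high_half


def encode_mfspr(rd: int, spr: int) -> int:
    """Encode mfspr rD,SPR (primary opcode 31, extended opcode 339)."""
    return (31 << 26) | (rd << 21) | (spr_field(spr) << 11) | (339 << 1)


def encode_mtspr(spr: int, rs: int) -> int:
    """Encode mtspr SPR,rS (primary opcode 31, extended opcode 467)."""
    return (31 << 26) | (rs << 21) | (spr_field(spr) << 11) | (467 << 1)


if __name__ == "__main__":
    # SPR 8 is the link register, so these are the familiar mflr r0 and mtlr r0.
    print(hex(encode_mfspr(0, 8)), hex(encode_mtspr(8, 0)))
```

With SPR 538 (DBAT1U), the table's two columns 10000 and 11010 appear in the instruction as 11010 followed by 10000.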
20. Note 1: TIN and TIV are read-only status bits.

The THRM3 register, shown in Figure 2-11, is used to enable the thermal assist unit and to control the comparator output sample time. The thermal assist logic manages the thermal management interrupt generation and time-multiplexed comparisons in dual-threshold mode, as well as other control functions.

Figure 2-11. Thermal Management Register 3 (THRM3). Fields: bits 0-17, reserved; bits 18-30, sampled interval timer value; bit 31.

The bits in THRM3 are described in Table 2-17.

Table 2-17. THRM3 Bit Settings

0-17: Reserved for future use. System software should clear these bits when writing to the THRM3.
18-30 SITV: Sample interval timer value. Number of elapsed processor clock cycles before a junction temperature vs. threshold comparison result is sampled for TIN bit setting and interrupt generation. This is necessary due to the thermal sensor, DAC, and the analog comparator settling time being greater than the processor cycle time. The value should be configured to allow a sampling interval of 20 microseconds.
31: Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V] is set.

The THRM registers can be accessed with the mtspr and mfspr instructions using the following SPR numbers:
• THRM1 is SPR 1020
• THRM2 is SPR 1021
• THRM3 is SPR 1022

2.1.5 L2 Cache Control Register (L2CR)

The L2 cache control register, show
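The SITV guidance above (enough processor clock cycles to cover a 20-microsecond sampling interval) amounts to a small calculation. The sketch below is an illustration under assumptions: the function name and its parameters are hypothetical, and the field placement (SITV in bits 18-30, enable in bit 31, reserved bits cleared) follows the table above.

```python
SITV_BITS = 13  # bits 18-30 of THRM3


def thrm3_value(cpu_hz: int, interval_us: int = 20, enable: bool = True) -> int:
    """Build a THRM3 value whose SITV covers at least interval_us microseconds."""
    # Round up with integer arithmetic so the interval is never shorter than requested.
    cycles = (cpu_hz * interval_us + 999_999) // 1_000_000
    if cycles >= 1 << SITV_BITS:
        raise ValueError("interval does not fit in the 13-bit SITV field at this clock rate")
    # Bit 31 is the least significant bit, so SITV sits one position above it.
    return (cycles << 1) | int(enable)


if __name__ == "__main__":
    print(hex(thrm3_value(400_000_000)))
```

At higher clock rates the required cycle count can exceed what 13 bits hold, which the sketch reports as an error instead of silently truncating.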
21. OEA.
• Memory synchronization instructions: These instructions are used for memory synchronizing. See Section 2.3.4.7, "Memory Synchronization Instructions (UISA)," and Section 2.3.5.2, "Memory Synchronization Instructions (VEA)," for more information.
• Memory control instructions: These instructions provide control of caches, TLBs, and segment registers. For more information, see Section 2.3.5.3, "Memory Control Instructions (VEA)," and Section 2.3.6.3, "Memory Control Instructions (OEA)."
• External control instructions: These include instructions for use with special input/output devices. For more information, see Section 2.3.5.4, "Optional External Control Instructions."

Note that this grouping of instructions does not necessarily indicate the execution unit that processes a particular instruction or group of instructions. This information, which is useful for scheduling instructions most effectively, is provided in Chapter 6, "Instruction Timing."

Integer instructions operate on word operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. The PowerPC architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and word operand loads and stores between memory and a set of 32 general-purpose registers (GPRs). It also provides for word and double-word operand loads and stores
22. PTE[VSID] = VA[0-23]
PTE[API] = VA[24-29]

4. If a match is not found, step 3 is repeated for each of the other seven PTEs in the primary PTEG. If a match is found, the table search process continues as described in step 8. If a match is not found within the 8 PTEs of the primary PTEG, the address of the secondary PTEG is generated.
5. The first PTE (PTE0) in the secondary PTEG is read from memory. Again, because PTE reads have a WIM bit combination of 0b001, an entire cache line is read into the on-chip cache.
6. The PTE in the selected secondary PTEG is tested for a match with the virtual page number (VPN) of the access. For a match to occur, the following must be true:
   PTE[H] = 1
   PTE[V] = 1
   PTE[VSID] = VA[0-23]
   PTE[API] = VA[24-29]
7. If a match is not found, step 6 is repeated for each of the other seven PTEs in the secondary PTEG. If it is never found, an exception is taken (step 9).
8. If a match is found, the PTE is written into the on-chip TLB and the R bit is updated in the PTE in memory (if necessary). If there is no memory protection violation, the C bit is also updated in memory (if the access is a write operation) and the table search is complete.
9. If a match is not found within the 8 PTEs of the secondary PTEG, the search fails, and a page fault exception condition occurs (either an ISI exception or a DSI exception).

Figure 5-9 and Figu
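The matching rules in the steps above can be modeled in a few lines. This is a simplified sketch, not the hardware algorithm: it takes the two PTEGs as already-fetched lists, omits hashing, R/C bit updates, and protection checks, and assumes that primary-PTEG entries match with PTE[H] = 0 (the excerpt only shows the secondary condition, PTE[H] = 1). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PTE:
    v: int        # valid bit
    vsid: int     # 24-bit virtual segment ID
    h: int        # hash function identifier
    api: int      # 6-bit abbreviated page index
    rpn: int = 0  # physical page number (not used by the match itself)


def pte_matches(pte: PTE, va: int, h: int) -> bool:
    """Test one PTE against a 52-bit virtual address."""
    vsid = va >> 28            # VA[0-23]
    api = (va >> 22) & 0x3F    # VA[24-29]
    return pte.v == 1 and pte.h == h and pte.vsid == vsid and pte.api == api


def table_search(primary: Sequence[PTE], secondary: Sequence[PTE], va: int) -> Optional[PTE]:
    """Search the primary PTEG, then the secondary; None models a page fault."""
    for h, pteg in ((0, primary), (1, secondary)):  # H = 0 for primary is an assumption
        for pte in pteg:
            if pte_matches(pte, va, h):
                return pte
    return None


if __name__ == "__main__":
    va = (0xABCDEF << 28) | (0x2A << 22) | 0x12345
    empty = [PTE(v=0, vsid=0, h=0, api=0) for _ in range(8)]
    secondary = list(empty)
    secondary[5] = PTE(v=1, vsid=0xABCDEF, h=1, api=0x2A, rpn=0x1234)
    print(table_search(empty, secondary, va))
```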
23. • Integer load and store multiple instructions
• Floating-point load instructions
• Floating-point store instructions
• Memory synchronization instructions

Implementation Notes: The following describes how the 750 handles misalignment.

The 750 provides hardware support for misaligned memory accesses. It performs those accesses within a single cycle if the operand lies within a double-word boundary. Misaligned memory accesses that cross a double-word boundary degrade performance.

For string operations, the hardware makes no attempt to combine register values to reduce the number of discrete accesses. Combining stores enhances performance if store gathering is enabled and the accesses meet the criteria described in Section 6.4.7, "Integer Store Gathering." Note that the PowerPC architecture requires load/store multiple instruction accesses to be aligned. At a minimum, additional cache access cycles are required.

Although many unaligned memory accesses are supported in hardware, the frequent use of them is discouraged since they can compromise the overall performance of the processor.

Accesses that cross a translation boundary may be restarted. That is, a misaligned access that crosses a page boundary is completely restarted if the second portion of the access causes a page fault. This may cause the first access to be repeated.

On some processors, such as the 603, a TLB reload would cause an instruction restart. On the 750, TLB reloads a
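Whether an operand stays inside one double word (the single-cycle case above) or straddles a boundary can be checked with simple arithmetic. The function below is a hypothetical helper written for illustration. The same test with a 4096-byte boundary identifies accesses that cross a 4-Kbyte page.

```python
def crosses_boundary(address: int, size: int, boundary: int = 8) -> bool:
    """Return True if the access [address, address + size) spans a boundary.

    The default boundary of 8 bytes is a double word.
    """
    if size <= 0 or boundary <= 0:
        raise ValueError("size and boundary must be positive")
    first_unit = address // boundary
    last_unit = (address + size - 1) // boundary
    return first_unit != last_unit


if __name__ == "__main__":
    print(crosses_boundary(0x1004, 4))        # word within one double word
    print(crosses_boundary(0x1006, 4))        # word straddling two double words
    print(crosses_boundary(0x0FFE, 4, 4096))  # word straddling a page boundary
```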
24. [Timing diagram. Signals: L2WE, SRAM address, SRAM memory, SRAM data.]
Note: WA is the last previous write that was queued in the late-write RAM.

Figure 9-35. Burst Read-Write-Write L2 Cache Access (Late-Write SRAM)

Chapter 10. Power and Thermal Management

The PowerPC 750 microprocessor is specifically designed for low-power operation. It provides both automatic and program-controlled power reduction modes for progressive reduction of power consumption. It also provides a thermal assist unit (TAU) to allow on-chip thermal measurement, allowing sophisticated thermal management for high-performance portable systems. This chapter describes the hardware support provided by the 750 for power and thermal management.

10.1 Dynamic Power Management

Dynamic power management (DPM) automatically powers up and down the individual execution units of the 750, based upon the contents of the instruction stream. For example, if no floating-point instructions are being executed, the floating-point unit is automatically powered down. Power is not actually removed from the execution unit; instead, each execution unit has an independent clock input, which is automatically controlled on a clock-by-clock basis. Since CMOS circuits consume negligible power when they are not switching, stopping the clock to an execution
25. 1 Address-only operations are broadcast on the 60x bus. Affected instructions are eieio, sync, dcbi, dcbf, and dcbst. A sync instruction completes only after a successful broadcast. Execution of eieio causes a broadcast that may be used to prevent any external devices, such as a bus bridge chip, from store gathering. Note that dcbz (with M = 1, coherency required) always broadcasts on the 60x bus regardless of the setting of this bit. An icbi is never broadcast. No cache operations, except dcbz, are snooped by the 750 regardless of whether the ABE is set. Bus activity caused by these instructions results directly from performing the operation on the 750 cache.
SPD
SGE
BTIC: Branch target instruction cache enable. Used to enable use of the 64-entry branch instruction cache.
0 The BTIC is disabled, the contents are invalidated, and the BTIC behaves as if it was empty.
1 Allows the use of the 512-entry branch history table (BHT). The BHT is disabled at power-on reset. All entries are set to weakly not-taken.
31 NOOPTI: No-op the data cache touch instructions.
0 The dcbt and dcbtst instructions are enabled.
29 BHT: Branch history table enable.
0 BHT disabled. The 750 uses static branch prediction as defined by the PowerPC architecture (UISA) for those branch instructions the BHT would have otherwise used to predict (that is, those that use the CR as the only mechanism to determine direction). For more informa
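The bit numbers in this register description follow the PowerPC convention in which bit 0 is the most-significant bit of the 32-bit register. The sketch below shows how a bit number such as 29 (BHT) or 31 (NOOPTI) maps to a mask. The helper name `hid0_bit` and the constant names are hypothetical, and only the two bit positions legible in the excerpt are used.

```python
# Bit positions taken from the excerpt above (PowerPC numbering, bit 0 = MSB).
HID0_BHT = 29      # branch history table enable
HID0_NOOPTI = 31   # no-op the data cache touch instructions


def hid0_bit(hid0_value: int, bit_number: int) -> int:
    """Extract one bit from a 32-bit HID0 image using big-endian bit numbering."""
    if not 0 <= bit_number <= 31:
        raise ValueError("bit_number must be in 0..31")
    return (hid0_value >> (31 - bit_number)) & 1
```

With this numbering, bit 31 is the least-significant bit, so a HID0 image of 0x00000001 has NOOPTI set and 0x00000004 has BHT set.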
26. Contents (section-number column: 6.6.1.2, 6.6.1.3, 6.7, 7.1, 7.2, 7.2.1, 7.2.1.1, 7.2.1.2, 7.2.1.3, 7.2.1.3.1, 7.2.1.3.2, 7.2.2, 7.2.2.1, 7.2.2.1.1)
L2 Cache Access Timing Considerations (PowerPC 750 Only) 6-15
Instruction Dispatch and Completion Considerations 6-16
Rename Register Operation 6-17
Instruction Serialization 6-17
Execution Unit Timings 6-18
Branch Processing Unit Execution Timing 6-18
Branch Folding and Removal of Fall-Through Branch Instructions 6-18
Branch Instructions and Completion 6-20
Branch Prediction and Resolution 6-21
Static Branch Prediction 6-22
Predicted Branch Timing Examples 6-22
Integer Unit Execution Timing 6-24
Floating-Point Unit Execution Timing 6-24
Effect of Floating-Point Exceptions on Performance 6-25
Load/Store Unit Execution Timing 6-25
Effect of Operand Placement on Performance 6-25
[title not recoverable] 6-26
System Register Unit Execution Timing 6-27
Memory Performance Considerations 6-27
Caching and Memory Coherency 6-27
Effect of [remainder of title not recoverable] 6-28
Instruction Scheduling Guidelines 6-29
Branch, Dispatch, and Completion Unit Resource Requirements 6-29
Branch Resolution Resource
27. Burst Read Write Read L2 Cache Access Late Write SRAMI 9 13 Burst Read Modify Write L2 Cache Access Late Write SRAM ee 9 13 Burst Read Write Write L2 Cache Access Late Write SRAM 9 14 Thermal Assist Unit Block Kaeramg ere ii 10 6 Monitor Mode Control Register 0 OMMCRO 11 4 Monitor Mode Control Register 1 OMMCRTI 11 5 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Illustrations Paragraph Title Page Number Number Figure 11 3 Performance Monitor Counter Registers DMCT DMCA 11 6 Figure 11 4 Sampled instruction Address Registers GlA cece eeseeeeeeeeeeeseeeseeceteeeseeenaees 11 10 xix Illustrations Illustrations Paragraph Page Number me Number XX IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number Table 1 Table ii Table iii Table 1 1 Table 1 2 Table 1 3 Table 1 4 Table 1 5 Table 2 1 Table 2 2 Table 2 3 Table 2 4 Table 2 5 Table 2 6 Table 2 7 Table 2 8 Table 2 9 Table 2 10 Table 2 11 Table 2 12 Table 2 13 Table 2 14 Table 2 15 Table 2 16 Table 2 17 Table 2 18 Table 2 19 Table 2 20 Table 2 21 Table 2 22 Table 2 23 Table 2 24 Table 2 25 Table 2 26 Table 2 27 Table 2 28 Table 2 29 Table 2 30 Table 2 31 Table 2 32 Table 2 33 Table 2 34 Table 2 35 Tables Tables Page mle Number Acronyms and Abbreviated TENSA AAA XXX Terro pelen ii Eu eren XXXIV Instruction Field Conventions eiii ia XXXV Architecture Defined Register
28. EIEIO operation is broadcast on the external bus to enforce ordering in the external memory system. The eieio operation bypasses the L2 cache and is forwarded to the bus unit. If HID0[ABE] = 0, the operation is not broadcast. Because the 750 does not reorder noncacheable accesses, eieio is not needed to force ordering. However, if store gathering is enabled and an eieio is detected in a store queue, stores are not gathered. If HID0[ABE] = 1, broadcasting eieio prevents external devices, such as a bus bridge chip, from gathering stores.
Instruction Synchronize (isync): The isync instruction is refetch serializing; that is, it causes the 750 to purge its instruction queue and wait for all prior instructions to complete before refetching the next instruction, which is not executed until all previous instructions complete to the point where they cannot cause an exception. The isync instruction does not wait for all pending stores in the store queue to complete. Any instruction after an isync sees all effects of prior instructions.
2.3.5.3 Memory Control Instructions (VEA)
Memory control instructions can be classified as follows:
• Cache management instructions (user-level and supervisor-level)
• Segment register manipulation instructions (OEA)
• Translation lookaside buffer management instructions (OEA)
This section describes the user-level cache management instructions defined by the VEA. See Section 2.3.6.3, "Memory Control Instructions (OEA)," fo
29. No other sequences of operations cause this effect. In this case, the address tenure of the second transaction will not begin until one to three bus clocks after the end of the data tenure of the first transaction.
8.3 Address Bus Tenure
This section describes the three phases of the address tenure: address bus arbitration, address transfer, and address termination.
8.3.1 Address Bus Arbitration
When the 750 needs access to the external bus and it is not parked (BG is negated), it asserts bus request (BR) until it is granted mastership of the bus and the bus is available (see Figure 8-5). The external arbiter must grant master-elect status to the potential master by asserting the bus grant (BG) signal. The 750 requesting the bus determines that the bus is available when the ABB input is negated. When the address bus is not busy (ABB input is negated), BG is asserted, and the address retry (ARTRY) input is negated, the condition is referred to as a qualified bus grant. The potential master assumes address bus mastership by asserting ABB when it receives a qualified bus grant.
[Timing diagram; recoverable labels: Logical Bus Clock, need_bus]
Figure 8-5. Address Bus Arbitration
External arbiters must allow only one device at a time to be the address bus master. In implementations in which no other device can be a master, BG can be grounded (always asserted) to continually grant mastership of the address bus to the 750. If the 75
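The qualified bus grant condition described above is a simple conjunction of three signal states. The following sketch expresses it with logical (asserted/negated) values rather than electrical levels; the function name and parameter names are hypothetical.

```python
def qualified_bus_grant(bg_asserted: bool,
                        abb_asserted: bool,
                        artry_asserted: bool) -> bool:
    """Return True when the three conditions for a qualified bus grant hold:
    BG is asserted, ABB is negated (address bus not busy), and ARTRY is negated.

    Arguments are logical states (True = asserted), not pin voltage levels.
    """
    return bg_asserted and not abb_asserted and not artry_asserted
```

A potential master would assert ABB only in a cycle where this condition is true.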
30. PowerPC Instruction Set Listings
Name: addmex, mullwx, mtsrin, dcbtst, stbux, addx, dcbt, lhzx, eqvx, tlbie, eciwx, lhzux, xorx, mfspr, lwax, lhax, tlbia, mftb, lwaux, lhaux, sthx, orcx, sradix, slbie, ecowx, sthux, orx, divdux, divwux, mtspr, dcbi, nandx, divdx
[Instruction encoding table, bits 0-31; the bit-field columns and footnote markers are not recoverable from the extraction.]
31. Store String Word Indexed
As described in Section 4.5.6, "Alignment Exception (0x00600)," a misaligned string operation suffers a performance penalty compared to an aligned operation of the same type. A non-word-aligned string operation that crosses a 4-Kbyte boundary, or a word-aligned string operation that crosses a 256-Mbyte boundary, always causes an alignment exception. A non-word-aligned string operation that crosses a double-word boundary is also slower than a word-aligned string operation.
Implementation Note: The following describes the 750 implementation of load/store string instructions:
• For load/store string operations, the hardware does not combine register values to reduce the number of discrete accesses. However, if store gathering is enabled and the accesses fall under the criteria for store gathering, the stores may be combined to enhance performance. At a minimum, additional cache access cycles are required.
• The 750 supports misaligned, single-register load and store accesses in little-endian mode without causing an alignment exception. However, execution of misaligned load/store multiple/string operations causes an alignment exception.
2.3.4.3.9 Floating-Point Load and Store Address Generation
Floating-point load and store operations generate effective addresses using the register indirect with immediate index addressing mode and register indirect
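The two boundary rules for string operations stated above can be written as a predicate. This is a sketch of those two rules only (it does not model the little-endian case or any other alignment-exception cause), and the function name is hypothetical.

```python
def string_op_alignment_exception(effective_address: int, byte_count: int) -> bool:
    """Apply the two boundary rules for string operations quoted above.

    - A non-word-aligned operation that crosses a 4-Kbyte boundary raises
      an alignment exception.
    - A word-aligned operation that crosses a 256-Mbyte boundary raises
      an alignment exception.
    """
    if byte_count <= 0:
        raise ValueError("byte_count must be positive")
    last_byte = effective_address + byte_count - 1
    word_aligned = effective_address % 4 == 0
    if word_aligned:
        # 256 Mbyte = 2**28 bytes
        return (effective_address >> 28) != (last_byte >> 28)
    # 4 Kbyte = 2**12 bytes
    return (effective_address >> 12) != (last_byte >> 12)
```

For instance, an 8-byte string operation starting at 0x0FFE is not word aligned and runs past 0x1000, so the first rule applies; the same operation starting at 0x0FFC is word aligned and only the 256-Mbyte rule is checked.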
32. The dispatch rate depends upon the availability of resources such as the execution units, rename registers, and completion queue entries, and upon the serializing behavior of some instructions. Instructions are dispatched in program order; an instruction in IQ1 cannot be dispatched ahead of one in IQ0.
Figure 6-4 shows the paths taken by instructions.
[Flow diagram; recoverable labels: Fetch (maximum four instructions per clock cycle); Instruction Queue (in program order); Branch Processing Unit; Dispatch (maximum 2 instructions per clock cycle, 1 instruction per unit); Completion Queue Assignment; Reservation Stations; SRU; Store Queue; Completion Queue; Complete (Retire), in program order]
Figure 6-4. Instruction Flow Diagram
6.3.2 Instruction Fetch Timing
Instruction fetch latency depends on whether the fetch hits the BTIC, the on-chip instruction cache, or the L2 cache (if one is implemented). If no cache hit occurs, a memory transaction is required, in which case fetch latency is affected by bus traffic, bus
33. entries (both the ITLB and DTLB entries indexed by EA[14-19]). The architecture allows tlbie to optionally enable a TLB invalidate signaling mechanism in hardware so that other processors also invalidate their resident copies of the matching PTE. The 750 does not signal the TLB invalidation to other processors, nor does it perform any action when a TLB invalidation is performed by another processor.
The tlbsync instruction causes instruction execution to stop if the TLBISYNC signal is asserted. If TLBISYNC is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. Section 8.8.2, "TLBISYNC Input," describes the TLB synchronization mechanism in further detail.
The tlbia instruction is not implemented on the 750 and, when its opcode is encountered, an illegal instruction program exception is generated. To invalidate all entries of both TLBs, 64 tlbie instructions must be executed, incrementing the value in EA14-EA19 by one each time. See Chapter 8, "Instruction Set," in The Programming Environments Manual for detailed information about the tlbie instruction.
Software must ensure that instruction fetches or memory references to the virtual pages specified by the tlbie have been completed prior to executing the tlbie instruction.
Other than the possible TLB miss on the next instruction prefetch, the tlbie instruction does not affect the instruction fetch op
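The 64-step invalidation loop described above can be sketched by generating the effective addresses that a sequence of tlbie instructions would use. The sketch assumes PowerPC bit numbering for a 32-bit effective address (bit 0 is the most-significant bit), so the field EA[14-19] occupies the bits with weights 2^17 down to 2^12. The function names are hypothetical, and the code only computes addresses; it does not execute tlbie.

```python
def tlbie_effective_addresses() -> list[int]:
    """Return 64 effective addresses whose EA[14-19] field steps from 0 to 63.

    In a 32-bit EA with bit 0 as the MSB, bit 19 has weight 2**12, so
    incrementing EA[14-19] by one corresponds to adding 0x1000.
    """
    return [index << 12 for index in range(64)]


def ea_14_19(effective_address: int) -> int:
    """Extract the 6-bit EA[14-19] field from a 32-bit effective address."""
    return (effective_address >> 12) & 0x3F
```

Issuing one tlbie per address in this list would visit every value of EA[14-19] exactly once, which is the pattern the text calls for.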
34. occurring in parallel are ignored.
[Flowchart; recoverable content: starting from the effective address, translation is either disabled (MSR[IR] = 0 or MSR[DR] = 0: real addressing mode, effective address = physical address, see Section 5.2), matched against the BAT registers (block address translation), or resolved through a located segment descriptor (T = 1: direct-store interface translation, DSI/ISI exception; T = 0: page address translation, look up in page table, producing a physical address)]
Figure 5-4. Address Translation Types
When the processor generates an access, and the corresponding address translation enable bit in MSR is cleared, the resulting physical address is identical to the effective address and all other translation mechanisms are ignored. Instruction address translation and data address translation are enabled by setting MSR[IR] and MSR[DR], respectively.
5.1.4 Memory Protection Facilities
In addition to the translation of effective addresses to physical addresses, the MMUs provide access protection of supervisor areas from user access and can designate areas of memory as read-only as well as no-execute or guarded. Table 5-2 shows the protection options supported by the MMUs for pages.
Table 5-2. Access Protection Options for Pages
User Read Supervisor Read z Supervisor l Fetch l Fetch Sup
35. 6. Perform an L2 global invalidate. The global invalidate could be performed before enabling the DLL, or in parallel with waiting for the DLL to stabilize. Refer to Section 9.1.4, "L2 Cache Global Invalidation," for more information about L2 cache global invalidation. Note that a global invalidate always takes much longer than it takes for the DLL to stabilize.
7. After the DLL stabilizes, an L2 global invalidate has been performed, and the other L2 configuration bits have been set, enable the L2 cache for normal operation by setting the L2CR[L2E] bit to 1.
9.1.4 L2 Cache Global Invalidation
The L2 cache supports a global invalidation function in which all bits of the L2 tags (tag data bits, tag status bits, and LRU bit) are cleared. It is performed by an on-chip hardware state machine that sequentially cycles through the L2 tags. The global invalidation function is controlled through L2CR[L2I], and it must be performed only while the L2 cache is disabled. The 750 can continue operation during a global invalidation provided the L2 cache has been properly disabled before the global invalidation operation starts.
The sequence for performing a global invalidation of the L2 cache is as follows:
1. Execute a sync instruction to finish any pending store operations in the load/store unit, disable the L2 cache by clearing L2CR[L2E], and execute an additional sync instruction after disabling the L2 cache to ensure that any pending operati
36. xvii Paragraph Number Figure 6 6 Figure 6 7 Figure 6 8 Figure 6 9 Figure 6 10 Figure 7 1 Figure 8 1 Figure 8 2 Figure 8 3 Figure 8 4 Figure 8 5 Figure 8 6 Figure 8 7 Figure 8 8 Figure 8 9 Figure 8 10 Figure 8 11 Figure 8 12 Figure 8 13 Figure 8 14 Figure 8 15 Figure 8 16 Figure 8 17 Figure 8 18 Figure 8 19 Figure 8 20 Figure 8 21 Figure 8 22 Figure 8 23 Figure 8 24 Figure 8 25 Figure 9 26 Figure 9 27 Figure 9 28 Figure 9 29 Figure 9 30 Figure 9 31 Figure 9 32 Figure 9 33 Figure 9 34 Figure 9 35 Figure 10 1 Figure 11 1 Figure 11 2 xviii Illustrations Page HES Number Instruction Timing Cache Miss cidad dada e eegen RENE 6 15 Branch Folding miii ri 6 19 Removal of Fall Through Branch Instruction ooonononnnncnonoconcnnacnnonncnoncnannnnnnnon 6 19 Branch Completion a oa 6 20 Branch Instruction TIAS 6 23 PowerPG 750 Signal Groups EE 7 3 Bus Interface Address Butters siii lali 8 2 PowerPC 750 Microprocessor Block Diagram oooonnococnnocccooncccnonanononcnonanccinnnac ns 8 5 A A a e E A A E N 8 8 Overlapping Tenures on the 750 Bus for a Single Beat Transfer 0 0 0 8 9 Address BUS ATDiIrauOn EN 8 12 Address Bus Arbitration Showing Bus Parking 8 13 Address B s Re 8 15 Snooped Address Cycle with ARTRY oooooccconococonocononocononcnononcnonnnnnonnnccnonncnnnnnnnos 8 23 Data EE 8 24 Normal Single Beat Read Termination ooooonnncccnnnccconocncnnncnononcnononanonanccnnnnccinnnoss 8 27 Norm
37. 11 2 1 2 User Monitor Mode Control Register 0 UMMCRO oocooocccocccoconcconanccinnnnos 11 5 11 2 1 3 Monitor Mode Control Register 1 MMCR1 ee eeecceceeeeeesteeeesteeeenaees 11 5 11 2 1 4 User Monitor Mode Control Register 1 UMMCR1 sssssessesssssessssesses 11 6 11 2 1 5 Performance Monitor Counter Registers DMCT DM 11 6 11 2 1 6 User Performance Monitor Counter Registers UPMC1 UPMC4 11 10 11 2 1 7 Sampled Instruction Address Register GlA eee eeeeeeeseeeseeeeeeeteeeees 11 10 11 2 1 8 User Sampled Instruction Address Register USIA 11 11 11 3 Eyent EE eene 11 11 11 4 VENUE Te E 11 12 11 5 NOTES ue cae tt AA E SAL E a E ie aes 11 12 Appendix A PowerPC Instruction Set Listings Al Instructions Sorted by Mnemonic A 1 A 2 Instructions Sorted by Opcode ui sunasdesdnassanssvensdtonsectueaeens A 9 A 3 Instructions Grouped by Functional Categories A 17 AA Instructions Sorted e CEET A 29 A 5 Instruction Eege A 41 Appendix B Instructions Not Implemented B 1 Lists Of ee B 1 Glossary of Terms and Abbreviations G 1 OS G 1 xvi IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number Figure 1 1 Figure 1 2 Figure 1 3 Figure 1 4 Figure 1 5 Figure 1 6 Figure 2 1 Figure 2 2 Figure 2 3 Figure 2 4 Figure 2 5 Figure 2 6 Figure 2 7 Figure 2 8 Figure 2 9 Figure 2 10 Figure 2 11 Figure 2 12 Figure 3 1 Figure 3 2 Figure 3 3 Figure 3 4 Figure 3 5 Figure 3 6 Figure 4 1 Figur
38. 2 Signal Descriptions
This section describes individual 750 signals, grouped according to Figure 7-1. Note that the following sections summarize signal functions; Chapter 8, "Bus Interface Operation," describes many of these signals in greater detail, both with respect to how individual signals function and how groups of signals interact.
7.2.1 Address Bus Arbitration Signals
The address arbitration signals are input and output signals the 750 uses to request the address bus, recognize when the request is granted, and indicate to other devices when mastership is granted. For a detailed description of how these signals interact, see Section 8.3.1, "Address Bus Arbitration."
7.2.1.1 Bus Request (BR): Output
Following are the state meaning and timing comments for the BR output signal.
State Meaning
Asserted: Indicates that the 750 is requesting mastership of the address bus. Note that BR may be asserted for one or more cycles, and then de-asserted due to an internal cancellation of the bus request (for example, due to a load hit in the touch load buffer). See Section 8.3.1, "Address Bus Arbitration."
Negated: Indicates that the 750 is not requesting the address bus. The 750 may have no bus operation pending, it may be parked, or the ARTRY input was asserted on the previous bus clock cycle.
Timing Comments
Assertion: Occurs when the 750 is not parked and a bus transaction is needed. This may occur even if the two possible p
39. 20 4 23 THRMn 2 21 10 7 UMMCRO 2 15 UMMCRI 2 16 UPMCn 2 20 USIA 2 20 performance monitor registers 2 14 programming model 2 2 SPR encodings 2 58 supervisor level BAT registers 2 5 DABR 2 7 DAR 2 6 DEC 2 7 DSISR 2 6 EAR 2 7 HIDO 2 9 10 2 HID1 2 13 IABR 2 8 ICTC 2 21 10 11 L2CR 2 24 9 5 MMCRO 2 14 4 23 11 3 MMCRI 2 16 4 23 11 5 MSR 2 4 PMC1 and PMC2 1 26 PMCn 2 16 4 23 PVR 2 5 SDRI1 2 5 SIA 2 20 4 23 11 10 SPRGn 2 6 SPRs for performance monitor 11 1 SRn 2 5 SRRO SRR1 2 6 THRMn 2 21 10 7 Index 8 time base TB 2 6 user level CR 2 3 CTR 2 4 FPRn 2 3 FPSCR 2 3 GPRn 2 3 LR 2 3 time base TB 2 4 2 6 UMMCRO 2 15 UMMCRI 2 16 UPMCn 2 20 USIA 2 20 11 11 XER 2 3 Rename buffer definition 6 2 Rename register operation 6 17 Reservation station definition 6 2 Reserved instruction class 2 34 Reset HRESET signal 7 23 8 43 reset exception 4 13 SRESET signal 7 23 8 43 Retirement definition 6 2 rfi 4 11 Rotate shift instructions 2 40 A 19 RSRV reserve signal 7 24 8 43 S SDRI1 register 2 5 Segment registers SR description 2 5 SR manipulation instructions 2 67 A 28 Segmented memory model see Memory man agement unit Serializing instructions 6 17 Shift rotate instructions 2 40 A 19 SIA sampled instruction address register 2 20 4 23 11 10 Signals AACK 7 14 ABB 7 5 8 10 address arbitration 7 4 8 10 address transfer 8 14 address transfer attribute 8 15 An7 7 APn 7 7 IBM Pow
40. 21 L2 Cache Control Register OR iaa 2 24 Operand Conventions cissie oiii ai n i esiin einas 2 28 Floating Point Execution Models UISA coooocccccoccccconccononcnononcnnnnnccnnnnccnnnncnnns 2 28 Data Organization in Memory and Data Transfers AA 2 28 Alignment and Misaligned Accesges kee 2 29 Floating Point Operands eege 2 29 Instruction Set Summary pocieranie annii as 2 31 Classes f Instru tionS erer 2 32 Definition of Boundedly Undefined A 2 33 Dehned Instruction Class ee etree aie Bae dae ee 2 33 Ilegal Instruction WE 2 33 Reserved Instruction Class iii Silane nin gets 2 34 Addressing Modes vts anio 2 35 Memory Addressing iii ii recedes t ii etna ied 2 35 M mory CHE 2 35 Effective Address Calcula eect Ga hie nee eke 2 35 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number 2 3 2 4 2 3 2 4 1 2 3 2 4 2 2 245 2 3 3 2 3 4 2 3 4 1 2 3 4 1 1 2 3 4 1 2 2 3 4 1 3 2 3 4 1 4 2 3 4 2 2 3 4 2 1 2 3 4 2 2 2 3 4 2 3 2 3 4 2 4 2 3 4 2 5 2 3 4 2 6 2 3 4 3 2 3 4 3 1 2 3 4 3 2 2 3 4 3 3 2 3 4 3 4 2 3 4 3 5 2 3 4 3 6 2 3 4 3 7 2 3 4 3 8 2 3 4 3 9 2 3 4 3 10 2 3 4 4 2 3 4 4 1 2 3 4 4 2 2 3 4 4 3 2 3 4 4 4 2 3 4 5 2 3 4 6 2 3 4 6 1 2 3 4 6 2 2 3 4 7 Did 2 3 5 1 2 3 5 2 2 3 5 3 Contents Contents Page Nie Number RUE E 2 36 Context VNCHOMI ZATION EE 2 36 Execution Synchronization cccivsveisssiessecaseis tevesdsessaceasedcaetaseapestasonsadeesns 2 36 Instruction Related Exc
41. 37 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number Table A 40 Table A 41 Table A 42 Table A 43 Table A 44 Table A 45 Table A 46 Table A 47 Table B 1 Table B 2 Tables Tables Page TINE Number KE A 37 IS A aie le alt rg he La nse ee A 37 Ee EE A 38 NEE ege eege bebe A 39 RER EE eege eet one reer deeg A 39 MIDS Perini A AS A 40 PowerPC Instruction Set Legend ME A 41 PowerPC Instruction Set Legend a icjsieiissscaisscssccsatscuseadisassdentsseceadstaupeaesanteaeens A 47 32 Bit Instructions Not Implemented AA B 1 64 Bit Instructions Not Implemented AA B 1 XXV Tables Paragraph Page Number Ss Number xxvi IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual About This Book The primary objective of this user s manual is to define the functionality of the PowerPC 750 and PowerPC 740 microprocessors for use by software and hardware developers Although the emphasis of this manual is upon the 750 unless otherwise noted all information here applies to 740 This book is intended as a companion to the PowerPC Microprocessor Family The Programming Environments referred to as The Programming Environments Manual Note Soft copies of the latest version of this manual and documents referred to in this manual that are produced by IBM can be accessed on the world wide web as follows http www chips ibm com Note A vertical bar located to the left of
42. 6, "Exceptions," of The Programming Environments Manual.
4.4 Process Switching
The following instructions are useful for restoring proper context during process switching:
• The sync instruction orders the effects of instruction execution. All instructions previously initiated appear to have completed before the sync instruction completes, and no subsequent instructions appear to be initiated until the sync instruction completes. For an example showing use of sync, see Chapter 2, "PowerPC Register Set," of The Programming Environments Manual.
• The isync instruction waits for all previous instructions to complete and then discards any fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context (privilege, translation, and protection) established by the previous instructions.
• The stwcx. instruction clears any outstanding reservations, ensuring that an lwarx instruction in an old process is not paired with an stwcx. instruction in a new one.
The operating system should set MSR[RI] as described in Section 4.3.3.
4.5 Exception Definitions
Table 4-6 shows all the types of exceptions that can occur with the 750 and MSR settings when the processor goes into supervisor mode due to an exception. Depending on the exception, certain of these bits are stored in SRR1 when an exception is taken.
Table 4-6. MSR Setting Due to Exception
Exception Type S
43. At any given time, the L1 instruction cache may have one instruction fetch request, and the L1 data cache may have one load and two stores requesting L2 cache access. The L2 cache also services snoop requests from the 60x bus. When there are multiple pending requests to the L2 cache, snoop requests have highest priority, followed by data load and store requests (serviced on a first-in, first-out basis). Instruction fetch requests have the lowest priority in accessing the L2 cache when there are multiple accesses pending.
If read requests from both the L1 instruction and data caches are pending, the L2 cache can perform hit-under-miss and supplies the available instruction or data while a bus transaction for the previous L2 cache miss is performed. The L2 cache does not support miss-under-miss, and the second instruction fetch or data load stalls until the bus operation resulting from the first L2 miss completes.
All requests to the L2 cache that are marked cacheable (even if the respective L1 cache is disabled or locked) cause tag lookup and will be serviced if the instructions or data are in the L2 cache. Burst and single-beat read requests from the L1 caches that hit in the L2 cache are forwarded instructions or data, and the L2 LRU bit for that tag is updated. Burst writes from the L1 data cache due to a castout or replacement copyback are written only to the L2 cache, and the L2 cache sector is
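The request priority described above (snoops first, then data loads and stores in arrival order, then instruction fetches) can be sketched as a small selection function. This is an illustrative model only, not a description of the actual arbiter hardware; the function name and the use of three separate queues are assumptions.

```python
from collections import deque
from typing import Optional


def next_l2_request(snoops: deque,
                    data_requests: deque,
                    instruction_fetches: deque) -> Optional[str]:
    """Pick the next pending L2 request using the priority order in the text:
    snoop requests, then data load/store requests (first in, first out),
    then instruction fetch requests. Returns None if nothing is pending.
    """
    for queue in (snoops, data_requests, instruction_fetches):
        if queue:
            return queue.popleft()  # oldest entry of the highest-priority queue
    return None
```

Usage: with one pending fetch, two pending data requests, and one snoop, repeated calls return the snoop, then the data requests in arrival order, then the fetch.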
44. 750
3.3.5 PowerPC 750-Initiated Load/Store Operations
Load and store operations are assumed to be weakly ordered on the 750. The load/store unit (LSU) can perform load operations that occur later in the program ahead of store operations, even when the data cache is disabled (see Section 3.3.5.2, "Sequential Consistency of Memory Accesses"). However, strongly ordered load and store operations can be enforced through the setting of the I bit (of the page WIMG bits) when address translation is enabled. Note that when address translation is disabled (real addressing mode), the default WIMG bits cause the I bit to be cleared (accesses are assumed to be cacheable), and thus the accesses are weakly ordered. Refer to Section 5.2, "Real Addressing Mode," for a description of the WIMG bits when address translation is disabled.
The 750 does not provide support for direct-store segments. Operations attempting to access a direct-store segment will invoke a DSI exception. For additional information about DSI exceptions, refer to Section 4.5.3, "DSI Exception (0x00300)."
3.3.5.1 Performed Loads and Stores
The PowerPC architecture defines a performed load operation as one that has the addressed memory location bound to the target register of the load instruction. The architecture defines a performed store operation as one where the stored value is the value that any other process
45. 1.2.9 Clocking
The 750 requires a single system clock input, SYSCLK, that represents the bus interface frequency. Internally, the processor uses a phase-locked loop (PLL) circuit to generate a master core clock that is frequency-multiplied and phase-locked to the SYSCLK input. This core frequency is used to operate the internal circuitry.
The PLL is configured by the PLL_CFG[0-3] signals, which select the multiplier that the PLL uses to multiply the SYSCLK frequency up to the internal core frequency. The feedback in the PLL guarantees that the processor clock is phase locked to the bus clock, regardless of process variations, temperature changes, or parasitic capacitances. The PLL also ensures a 50% duty cycle for the processor clock.
The 750 supports various processor-to-bus clock frequency ratios, although not all ratios are available for all frequencies. Configuration of the processor/bus clock ratios is displayed through a 750-specific register, HID1. For information about supported clock frequencies, see the 750 hardware specifications.
1.3 PowerPC 750 Microprocessor Implementation
The PowerPC architecture is derived from the POWER architecture (Performance Optimized with Enhanced RISC architecture). The PowerPC architecture shares the benefits of the POWER architecture optimized for single-chip implementations. The PowerPC architecture design facilitates parallel instruction execution and is scalab
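The relationship between SYSCLK and the core clock is a plain multiplication by the ratio selected through PLL_CFG[0-3]. The sketch below shows only that arithmetic. It deliberately does not map PLL_CFG encodings to ratios, because that mapping is given in the hardware specifications and not in this excerpt; the function name and the example values are illustrative assumptions.

```python
def core_frequency_mhz(sysclk_mhz: float, pll_multiplier: float) -> float:
    """Return the core clock frequency given SYSCLK and the PLL multiplier.

    `pll_multiplier` is the processor-to-bus ratio already decoded from
    PLL_CFG[0-3]; the decoding itself is outside the scope of this sketch.
    """
    if sysclk_mhz <= 0 or pll_multiplier <= 0:
        raise ValueError("frequency and multiplier must be positive")
    return sysclk_mhz * pll_multiplier
```

As an illustration with made-up numbers, a 100 MHz bus clock and a ratio of 3.5 would give a 350 MHz core clock; whether a given combination is actually supported must be checked against the hardware specifications.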
46. 750 exceptions and conditions that cause them Exceptions specific to the 750 are indicated Table 1 5 Exceptions and Conditions Exception Type Ken est Causing Conditions System reset 00100 Assertion of either HRESET or SRESET or at power on reset Machine check 00200 Assertion of TEA during a data bus transaction assertion of MCP or an address data or L2 bus parity error MSR ME must be set DSI 00300 As specified in the PowerPC architecture For TLB misses on load store or cache operations a DSI exception occurs if a page fault occurs 00400 As defined by the PowerPC architecture External interrupt 00500 MSR EE 1 and INT is asserted Chapter 1 PowerPC 740 PowerPC 750 Overview 1 31 Exception Type Floating point unavailable Decrementer Reserved System call Reserved Reserved Performance monitor Instruction address breakpoint System management interrupt Reserved Thermal management interrupt Reserved Note 1750 specific Table 1 5 Exceptions and Conditions Continued Vector Offset hex 00600 A floating point load store stmw stwcx Imw lwarx eciwx or ecowx instruction operand is not word aligned A multiple string load store operation is attempted in little endian mode The operand of dcbz is in memory that is write through required or caching inhibited or the cache is disabled 00700 0 Causing Conditions 00700 As defined by the PowerPC architecture
47. 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 3 ptr ridelx riderx ridicx ridiclx ridicrx ridimix rlwimix rlwinmx rlwnmx Note 1 64 bit instruction Table A 7 Integer Shift Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 sidx 31 S A B 27 Rc slwx 31 S A B 24 Rc sradx 31 S A B 794 Rc sradix 31 S A sh 413 sh Rc srawx 31 S A B 792 Re srawix 31 S A SH 824 Re srdx 31 S A B 539 Rc srwx 31 S A B 536 Rc Note 1 64 bit instruction Appendix A PowerPC Instruction Set Listings A 19 Table A 8 Floating Point Arithmetic Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 faddx 63 D 21 Re faddsx 59 D 21 Re fdivx 63 D 18 Re fdivsx 59 D 18 Re fmulx 63 D 25 Re fmulsx 59 D 25 Re fresx 59 D 24 Re frsqrtex 63 D 26 Re fsubx 63 D 20 Re fsubsx 59 D 20 Re fselx 63 D 23 Re fsqrtx 1 63 D 22 Re fsqrtsx 1 59 D 22 Rc Note 1 Optional instruction 2 32 bit instruction not implemented by the PowerPC 750 Table A 9 Floating Point Multiply Add Instructions Name 0 5 6 7 8 9 1011 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 fmsubx 63 D A B C 28 Rc fmsubsx 59 D A B C 28 Re fnmaddx 63 D A B C 31 Rc fnmaddsx 59 D A B C 31 Re A 20 IBM P
48. 9-31 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.
[Timing diagram; recoverable signal names: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData]
Notes: Rary indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-31. Burst Read-Modify-Write L2 Cache Access (Pipelined)
Figure 9-32 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.
[Timing diagram; recoverable signal names: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData]
Notes: Rary indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-32. Burst Read-Write-Write L2 Cache Access (Pipelined)
9.1.7.3 Late-Write SRAM
Late-write SRAMs offer improved performance when compared to pipelined burst SRAMs by not requiring an extra read cycle during read operations, and requiring one cycle less when transitioning from a read to write operation. Late-write SRAMs implement an internal write queue, allowing write da
49. Block Diagram. The TAU provides thermal control by periodically comparing the 750's junction temperature against user-programmed thresholds and generating a thermal management interrupt if the threshold values are crossed. The TAU also enables the user to determine the junction temperature through a software successive-approximation routine. The TAU is controlled through three supervisor-level SPRs, accessed through the mtspr/mfspr instructions. Two of the SPRs (THRM1 and THRM2) provide temperature threshold values that can be compared to the junction temperature value, and control bits that enable comparison and thermal interrupt generation. The third SPR (THRM3) provides a TAU enable bit and a sample interval timer. Note that all the bits in THRM1, THRM2, and THRM3 are cleared to 0 during a hard reset, and the TAU remains idle and in a low-power state until configured and enabled. The bit fields in the THRM1 and THRM2 SPRs are described in Table 10-2. Table 10-2. THRM1 and THRM2 Bit Field Settings. TIN: Thermal management interrupt bit. Read only. This bit is set if the thermal sensor output crosses the threshold specified in the SPR. The state of this bit is valid only if TIV is set. The interpretation of the TIN bit is controlled by the TID bit. TIV: Thermal management interrupt valid. Read only. This bit is set by the thermal assist logic to indicate that the thermal mana
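The software successive-approximation routine mentioned in this excerpt can be sketched as a binary search over the threshold value. This is a minimal illustration, not the manual's routine: it assumes a 7-bit threshold field, and `junction_exceeds` is a hypothetical callback standing in for "program a threshold, wait for a valid comparison, read back the result".

```python
def approximate_temperature(junction_exceeds, bits=7):
    """Return the largest threshold the junction temperature exceeds.

    junction_exceeds(threshold) is a hypothetical stand-in for one
    hardware comparison; it returns True if the temperature is above
    the given threshold. One comparison is made per threshold bit.
    """
    estimate = 0
    for bit in reversed(range(bits)):
        trial = estimate | (1 << bit)
        if junction_exceeds(trial):
            # Temperature is above the trial value, so keep this bit.
            estimate = trial
    return estimate


# Simulated sensor: the "real" temperature is 63 degrees.
print(approximate_temperature(lambda threshold: 63 > threshold))
```

With seven comparisons the search narrows the temperature to a one-step window just above the returned threshold.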
50. CQ 0 must not cause an exception. Requirements for completing an instruction from CQ 1 are as follows: the instruction in CQ 0 must complete in the same cycle; the instruction in CQ 1 must be finished; the instruction in CQ 1 must not follow an unresolved predicted branch; the instruction in CQ 1 must not cause an exception; the instruction in CQ 1 must be an integer or load instruction; the number of CR updates from both CQ 0 and CQ 1 must not exceed two; the number of GPR updates from both CQ 0 and CQ 1 must not exceed two; the number of FPR updates from both CQ 0 and CQ 1 must not exceed two. 6.7 Instruction Latency Summary. Table 6-3 through Table 6-8 list latencies associated with instructions executed by each execution unit. Table 6-3 describes branch instruction latencies. Table 6-3. Branch Instructions. Unless these instructions update either the CTR or the LR, branch operations are folded if they are either taken or predicted as taken. They fall through if they are not taken or predicted as not taken. Table 6-4 lists system register instruction latencies. Table 6-4. System Register Instructions. Instructions listed: isync; mfspr (DBATs); mfspr (IBATs); mfspr (not I/DBATs); mtspr (DBATs); mtspr (IBATs). Table 6 4 System Re
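The CQ 1 completion requirements listed above can be read as a single predicate. The sketch below models only those listed rules; the `Entry` record and its field names are illustrative assumptions, not structures from the manual.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical model of one completion queue entry."""
    finished: bool
    after_unresolved_branch: bool
    causes_exception: bool
    kind: str            # e.g. "integer", "load", "store", "float"
    cr_updates: int = 0
    gpr_updates: int = 0
    fpr_updates: int = 0


def cq1_can_complete(cq0_completes, cq0, cq1):
    """Apply the listed CQ 1 rules; cq0_completes says whether CQ 0
    completes in this same cycle."""
    return (
        cq0_completes
        and cq1.finished
        and not cq1.after_unresolved_branch
        and not cq1.causes_exception
        and cq1.kind in ("integer", "load")
        and cq0.cr_updates + cq1.cr_updates <= 2
        and cq0.gpr_updates + cq1.gpr_updates <= 2
        and cq0.fpr_updates + cq1.fpr_updates <= 2
    )


add = Entry(True, False, False, "integer", gpr_updates=1)
load = Entry(True, False, False, "load", gpr_updates=1)
print(cq1_can_complete(True, add, load))
```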
51. Data Bus Parity DP O 7 Output coooocccnoncccnnoccconanccnoncnonancncnnncconnncninnss 7 18 Data bus East ee td 7 18 Data SERRURIER Input sico iii ninas 7 19 Data Transfer errereen anne vole va 7 19 Transfer Acknowledge TA Input cccccccccccscscssssssssesesessesesescseseeeesesees 7 19 Data Retry OR RK Eeer Ee 7 20 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number 7 2 8 3 7 2 9 7 2 9 1 7 2 9 2 7 2 9 3 7 2 9 4 7 2 9 5 7 2 9 6 7 2 9 6 1 7 2 9 6 2 7 2 9 7 7 2 9 7 1 T2942 EE 129 74 7 2 9 7 5 7 2 9 7 6 7 2 9 8 7 2 9 9 7 2 9 9 1 7 2 9 9 2 7 2 9 10 7 2 9 10 1 7 2 9 10 2 7 2 9 11 7 2 9 12 7 2 9 13 7 2 9 14 7 2 9 15 7 2 9 16 7 2 9 17 7 2 10 7 2 11 7 2 11 1 7 2 11 2 7 2 11 3 RES Ee 8 1 Contents Contents Page mts Number Transfer Error Acknowledge TEA Input cocooococococococinnnncnnncnninininicininnoos 7 20 System tatus ST OT AIS id il ege 7 21 Interrupt NT nput E 7 21 System Management Interrupt GGMlJnpmt 7 21 Machine Check Interrupt MCP Input coocococccncccnoccconncnonaconnnonnncnnnocnocnnos 7 21 Cheekstop Input CKS TP IN Inputs aid 7 22 Checkstop Output CKSTP_OUT Output coccocccnoccconncnonoconcninnncnnnacanacnnos 7 22 LEE 7 22 Hard Reset HRESET Input E 7 23 Soft Reset REI eech 7 23 Proc ssor SAUS OS A 7 23 Quiescent Request ORPO Out 7 23 Quiescent Acknowledge QACK Input cooconocccccccnonnconncconocanncn
52. For information on how cache control instructions affect the L2, see Chapter 9, L2 Cache Interface Operation. Table 2-52 summarizes the cache instructions defined by the VEA. Note that these instructions are accessible to user-level programs. Table 2-52. User-Level Cache Instructions. Data Cache Block Touch (dcbt rA,rB): The VEA defines this instruction to allow for potential system performance enhancements through the use of software-initiated prefetch hints. Implementations are not required to take any action based on execution of this instruction, but they may prefetch the cache block corresponding to the EA into their cache. When dcbt executes, the 750 checks for protection violations (as for a load instruction). This instruction is treated as a no-op for the following cases: a valid translation is not found either in BAT or TLB; the access causes a protection violation; the page is mapped cache-inhibited, G = 1 (guarded), or T = 1; the cache is locked or disabled; HID0[NOOPTI] = 1. Otherwise, if no data is in the cache location, the 750 requests a cache line fill (with intent to modify). Data brought into the cache is validated as if it were a load instruction. The memory reference of a dcbt sets the reference bit. Data Cache Block Touch for Store (dcbtst rA,rB): This instruction behaves like dcbt. Data Cache Block Set to Zero (dcbz rA,rB): The EA is computed, translated, and checked for protection violations. For cache hits, four beats of zeros are wr
53. 6.4.1.3 Branch Prediction and Resolution. The 750 supports the following two types of branch prediction. Static branch prediction: this is defined by the PowerPC architecture as part of the encoding of branch instructions. Dynamic branch prediction: this is a processor-specific mechanism implemented in hardware (in particular, the branch history table, or BHT) that monitors branch instruction behavior and maintains a record from which the next occurrence of the branch instruction is predicted. When a conditional branch cannot be resolved due to a CR data dependency, the BPU predicts whether it will be taken, and instruction fetching proceeds down the predicted path. If the branch prediction resolves as incorrect, the instruction queue and all subsequently executed instructions are purged, instructions executed prior to the predicted branch are allowed to complete, and instruction fetching resumes down the correct path. The 750 executes through two levels of prediction. Instructions from the first unresolved branch can execute, but they cannot complete until the branch is resolved. If a second branch instruction is encountered in the predicted instruction stream, it can be predicted and instructions can be fetched, but not executed, from the second branch. No action can be taken for a third branch instruction until at least one of the two previous branch instructions is resolved.
54. Imput oocooconnnococnoncccnoncnononcnonanccnonnccnannnos 7 8 Address Transfer Attribute Signals xz 3 cc 2 yescsesesteasceacctdsccedsasscotaaccaesectesceseesas 7 8 Transter Lp TC Ti gene eseaa screen aetna E 7 8 Transfer Type CUT LR 7 8 Transfer Type CTTIO Al Input 7 8 Transfer Size CTS EI Output os 30 acesdedghics iii Zeiten Kee 7 11 Transfer Burst TBS Diada aida 7 12 Transfer Burst BS OMPI ca 7 12 Transfer Burst CUBS APURO eee 7 12 Cache Inhibit El Output Ee e 7 12 Write Through OW Oummt 7 13 Global GBL EE 7 13 Global GBL Output EE 7 13 Global GBI lap EE 7 13 Address Transfer Termination Signals ccescccsesccecsseceeeseceeseeeeesneeeeneeees 7 13 Address Acknowledge A ACK Input 7 14 Address Retry ARTR Vinicio lides 7 14 Address Retry ARTRY Oumut 7 14 Address Retry ARTRY Input oooocnccnnoccnococonononnnonnonnnaconocancconncc nncnnn 7 15 Data Bus Arbitration son EE 7 15 Data Bus Grant DBG Input EE 7 15 Data Bus Write Only DBWO Input oooonconccnnoccnoconnconnnonnnancnnncnnccnncnnncnnos 7 16 Data Bus Busy DBB ca ia 7 16 Data Bus Busy DBB QutpUt ccscccsccssssssssssssssessssssesessssseseeseees 7 16 Data Bus Busy OBB Input iia 7 16 Data RE 7 17 Data B s DH 0 31 DL 0 3 TDi eN 7 17 Data Bus DH 0 31 DL 0 31 OUtpUt cooconononocinoninccconcnnnconcconcnnnonnos 7 17 Data Bus DH 0 31 DUIO 21l Jnput 7 18 Data Bis Parity KT L E EE EE 7 18
55. Table A-46. PowerPC Instruction Set Legend (continued). [Table body not recoverable from the extracted text.] Notes: 1. Supervisor- and user-level instruction. 2. Load/store string or multiple instruction. 3. Optional instruction provided to support temporary 64-bit bridge. 4. Defined for the 32-bit architecture and by the temporary 64-bit bridge. Table A 47 PowerPC Instruction Set Legend. Column headings: Supervisor Level, 64-Bit Only, 64-Bit Bridge, Optional, Form. [Table body not recoverable from the extracted text.]
56. Iwzx 011111 D A B 0000010111 0 sws 011111 S A B 0000011000 Re entizwx 011111 0000011010 andx 011111 S A B 0000011100 Re cmpl 011111 ef O L A B 0000100000 0 subfx 011111 D A B 0000101000 Re Idux 011111 D A B 0000110101 0 A 10 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 dcbst 011111 00000 A B 0000110110 0 Iwzux 011111 D A B 0000110111 0 entizdx 011111 S A 00000 0000111010 Re andex 011111 pos a B 0000111100 a omm vo a e f ovorovoros fol mulhdx 011111 D B 0 0001001001 Re mulhwx 011111 D B 0001001011 Re mtsrd 24 011111 S 00000 0001010010 0 mifmsr 9 011111 D 00000 0001010011 0 wen ose e a e Los fo wn or ooo a a Loo fol Ibzx 011111 D A B 0001010111 0 negx 011111 D A 00000 0001101000 Re mtsrdin 2 4 011111 S 00000 B 0001110010 0 Ibzux 011111 D A B 0001110111 0 addex 011111 D A 0010001010 Re mert 011111 S 0010010000 0 mtmsr 2 4 011111 S 0010010010 0 stdx 011111 S 0010010101 0 wl ome os a e ooweoror fo mtmsrd 1 011111 S 00000 00000 0010110010 0 stdux 011111 S A B 0010110101 0 stwux 011111 S A B 0010110111 0 subfzex 011111 D A 00000 0011001000 Re stdcx 011111 S A B 0011010110 1 stbx 011111 S A B 0011010111 0 subfmex 011111 D A 00000 0011101000 Re mulld 011111 D A B 0011101001 Re Appendix A
57. Little-endian: A byte-ordering method in memory where the address n of a word corresponds to the least-significant byte. In an addressed memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 being the most-significant byte. See Big-endian. MESI (modified/exclusive/shared/invalid): Cache coherency protocol used to manage caches on different devices that share a memory system. Note that the PowerPC architecture does not specify the implementation of a MESI protocol to ensure cache coherency. Memory access ordering: The specific order in which the processor performs load and store memory accesses and the order in which those accesses complete. Memory-mapped accesses: Accesses whose addresses use the page or block address translation mechanisms provided by the MMU and that occur externally with the bus protocol defined for memory. Memory coherency: An aspect of caching in which it is ensured that an accurate view of memory is provided to all devices that share system memory. Memory consistency: Refers to agreement of levels of memory with respect to a single processor and system memory (for example, on-chip cache, secondary cache, and system memory). Memory management unit (MMU): The functional unit that is capable of translating an effective (logical) address to a physical address, providing protection mechanisms, and defining caching methods. Modified state: MEI state (M) in which one, and only one, caching device has the v
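The little-endian definition above can be checked with a short example. This is only an illustration using Python's built-in integer-to-bytes conversion; the sample value is arbitrary.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit word; 0x0A is the most-significant byte

little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

# Index 0 plays the role of address n, index 3 of address n + 3.
print("little-endian bytes at n..n+3:", little.hex(" "))
print("big-endian bytes at n..n+3:   ", big.hex(" "))
```

In the little-endian layout the byte at address n is the least-significant byte (0x0D here), while the big-endian layout places the most-significant byte (0x0A) there.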
58. Logical Instructions. Condition Register AND: crand crbD,crbA,crbB. Condition Register OR: cror crbD,crbA,crbB. Condition Register XOR: crxor crbD,crbA,crbB. Table 2-42. Condition Register Logical Instructions (continued). Condition Register NAND: crnand crbD,crbA,crbB. Condition Register NOR: crnor crbD,crbA,crbB. Note that if the LR update option is enabled for any of these instructions, the PowerPC architecture defines these forms of the instructions as invalid. 2.3.4.4.4 Trap Instructions. The trap instructions shown in Table 2-43 are provided to test for a specified set of conditions. If any of the conditions tested by a trap instruction are met, the system trap type program exception is taken. For more information, see Section 4.5.7, Program Exception (0x00700). If the tested conditions are not met, instruction execution continues normally. Table 2-43. Trap Instructions. Trap Word Immediate: twi TO,rA,SIMM. Trap Word: tw TO,rA,rB. See Appendix F, Simplified Mnemonics, in The Programming Environments Manual for a complete set of simplified mnemonics. 2.3.4.5 System Linkage Instruction (UISA). The System Call (sc) instruction permits a program to call on the system to perform a service; see Table 2-44. See also Section 2.3.6.1, System Linkage Instructions (OEA), for
59. MODEL UISA Memory Management Registers Count General Purpose Instruction BAT Data BAT Segment Register Registers Registers Registers Registers CTR GPRO IBATOU SPR 528 DBATOU SPR536 SRO XER GPR1 IBATOL SPR 529 DBATOL SPR 537 SR1 IBAT1U SPR 530 DBATIU SPR 538 IBATIL SPR 531 DBATIL SPR 539 GPR31 IBAT2U SPR 532 DBAT2U SPR 540 LR SPR 8 A para SPR533 DBAT2L SPR 541 Floating Point Performance Registers IBAT3U SPR 534 DBAT3U SPR 542 Monitor Registers FPRO IBAT3L SPR 535 DBAT3L SPR 543 For Reading Performance Counters XER Link Register FPRI Exception Handling Registers Data Address Save and Restore UPMC1 SPR 937 e Register Registers SPR 272 UPMC2 SPR 938 FPR31 SPR 273 DAR SRRO SPR 26 UPMC3 SPR 941 Condition SPR 274 DSISR A Ee Register SPR 275 DSISR SPR18 Sampled Instruction CR Address Miscellaneous Registers USIA SPR 939 Floating Point External Access Time Base Decrementer Status and Register For Writing Monitor Control Control Register DEC EAR SPR 282 TBL SPR 284 UMMCRO SPR 936 FPSCR TBU SPR 285 UMMCR1 SPR 940 Data Address L2 Control Instruction Addr
60. 3.3.2 MEI Protocol. The 750 data cache coherency protocol is a coherent subset of the standard MESI four-state cache protocol that omits the shared state. The 750's data cache characterizes each 32-byte block it contains as being in one of three MEI states. Addresses presented to the cache are indexed into the cache directory with bits A[20-26], and the upper-order 20 bits from the physical address translation (PA[0-19]) are compared against the indexed cache directory tags. If neither of the indexed tags matches, the result is a cache miss. If a tag matches, a cache hit occurred and the directory indicates the state of the cache block through two state bits kept with the tag. The three possible states for a cache block in the cache are the modified state (M), the exclusive state (E), and the invalid state (I). The three MEI states are defined in Table 3-1. Table 3-1. MEI State Definitions. Modified (M): The addressed cache block is present in the cache and is modified with respect to system memory; that is, the modified data in the cache block has not been written back to memory. The cache block may be present in the 750's L2 cache, but it is not present in any other coherent cache. Exclusive (E): The addressed cache block is present in the cache, and this cache has exclusive ownership of the addressed block. The addressed block may be present in the 750's L2 cache, but it is not presen
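The indexing described above can be illustrated by splitting a 32-bit physical address into its fields. This sketch assumes PowerPC bit numbering (bit 0 is the most-significant bit), so A[20-26] sits five bits above the 5-bit offset within a 32-byte block; the function name is illustrative.

```python
def split_physical_address(pa):
    """Split a 32-bit physical address into (tag, index, offset).

    tag    = PA[0-19], the upper 20 bits compared against directory tags
    index  = A[20-26], the 7 bits that select the cache directory entry
    offset = A[27-31], the byte position within the 32-byte block
    """
    tag = (pa >> 12) & 0xFFFFF
    index = (pa >> 5) & 0x7F
    offset = pa & 0x1F
    return tag, index, offset


tag, index, offset = split_physical_address(0x12345678)
print(hex(tag), index, offset)
```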
61. Reourementz 6 30 Dispatch Unit Resource Requirements ococooccccnoncccnoncnononcncnoncnononccononaninnnos 6 30 Completion Unit Resource Regurements 6 30 Instruction Latency Summary iii ais 6 31 Chapter 7 Signal Descriptions Signal CONTISUTAION een nennst orn iai E E AEE aae 7 3 siena Descriptions eege 7 4 Address Bus Arbitration Signals ccccesscccssccecssccecssccecssccecsscceesseeeeesceeees 7 4 Bus Request BR Oumut 7 4 Bus Grant BG EE 7 4 Address Bus Busy ABB NEE 7 5 Address Bus Busy ABB QuUtpUt oococccccncncononononononononononononononononanananananon 7 5 Address Bus Busy LADB I Jop 7 5 Address Transter Start a AREA 7 6 Transfer Start E EE 7 6 Transfer Statt ATTEN 7 6 xi Paragraph Number Deane 7 2 3 F231 7 2 3 1 1 7 2 3 1 2 E232 123 21 TZ 7 2 4 7 2 4 1 7 2 4 1 1 7 2 4 1 2 7 2 4 2 7 2 4 3 7 2 4 3 1 1243 2 7 2 4 4 7 2 4 5 7 2 4 6 7 2 4 6 1 7 2 4 6 2 12 3 7 2 5 1 7 2 5 2 SAN K ee 7 2 6 7 2 6 1 7 2 6 2 7 2 6 3 7 2 6 3 1 7 2 6 3 2 7 2 1 Tesk T2714 2 AZ Toe Fidel 2A TAIT 22 7 2 1 3 7 2 8 7 2 8 1 7 2 8 2 xii Contents Page HES Number Transfer Start E E EE 7 6 A 7 6 Address Bus AIO 211 i a n ied 7 7 Address Bus A 0 31 Output EE 7 7 Address Bus CAOS EE 7 7 Address Bus Parity CSR Meet gie Eeer deg tele 7 7 Address Bus Parity AP 0 3 Output cocoooocnnocccnnoccconancconnccnnonanonnnccinnnoss 7 7 Address Bus Parity AP 0 3
62. Reservation instruction to write-through segment or block: DSI exception, DSISR[5] = 1. lwarx, stwcx., eciwx, or ecowx instruction to direct-store segment (reservation instruction or external control instruction when SR[T] = 1): DSI exception, DSISR[5] = 1. Floating-point load or store to direct-store segment (FP memory access when SR[T] = 1): see data access to direct-store segment in Table 5-3. Load or store that results in a direct-store error: does not occur in the 750; does not apply. eciwx or ecowx attempted when external control facility disabled (eciwx or ecowx attempted with EAR[E] = 0): DSI exception, DSISR[11] = 1. lmw, stmw, lswi, lswx, stswi, or stswx instruction attempted in little-endian mode (instruction attempted while MSR[LE] = 1): alignment exception. Operand misalignment (translation enabled and a floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx, or ecowx instruction operand is not word-aligned): alignment exception; some of these cases are implementation-specific. 5.1.8 MMU Instructions and Register Summary. The MMU instructions and registers allow the operating system to set up the block address translation areas and the page tables in memory. Note that because the implementation of TLBs is optional, the instructions that refer to these structures are also optional. However, as these structures serve as caches of the page table, the architecture specifies a software protocol for maintaining coh
63. Table A-46. PowerPC Instruction Set Legend (continued). [Table body not recoverable from the extracted text.]
64. TRST signal assures that the JTAG logic does not interfere with the normal operation of the chip, and must be asserted and deasserted coincident with the assertion of the HRESET signal. 7.2.11 Clock Signals. The 750 clock signal inputs determine the system clock frequency and provide a flexible clocking scheme that allows the processor to operate at an integer multiple of the system clock frequency. Refer to the 750 hardware specifications for exact timing relationships of the clock signals. 7.2.11.1 System Clock (SYSCLK): Input. The 750 requires a single system clock (SYSCLK) input. This input sets the frequency of operation for the bus interface. Internally, the 750 uses a phase-locked loop (PLL) circuit to generate a master clock for all of the CPU circuitry (including the bus interface circuitry), which is phase-locked to the SYSCLK input. The master clock may be set to an integer or half-integer multiple (2:1, 2.5:1, 3:1, 3.5:1, 4:1, 4.5:1, 5:1, 5.5:1, 6:1, 6.5:1, or 7:1) of the SYSCLK frequency, allowing the CPU core to operate at an equal or greater frequency than the bus interface. State Meaning, Asserted/Negated: The SYSCLK input is the primary clock input for the 750 and represents the bus clock frequency for 750 bus operation. Internally, the 750 may be operating at an integer or half-integer multiple of the bus clock frequency. Timing Comments, Duty cycle: Refer to the 750 hardware specifications for timing comments. Note SY
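The relationship between SYSCLK and the core clock described above is simple arithmetic, sketched here under the assumption that only the multipliers listed in this excerpt are allowed. The function name and the sample bus frequency are illustrative, not values from the hardware specifications.

```python
from fractions import Fraction

# The integer and half-integer multipliers listed in the text: 2, 2.5, ..., 7.
VALID_MULTIPLIERS = [Fraction(n, 2) for n in range(4, 15)]


def core_frequency_mhz(sysclk_mhz, multiplier):
    """Return the core clock for a given bus clock and PLL multiplier."""
    ratio = Fraction(multiplier)
    if ratio not in VALID_MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    return Fraction(sysclk_mhz) * ratio


# Example: a 66 MHz bus with a 3.5:1 setting.
print(core_frequency_mhz(66, 3.5))
```

Fractions are used so that half-integer ratios stay exact rather than accumulating floating-point error.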
65. Table 3 7 MEI State Transitions Continued Current Buia Cache Cache Actions Oberation State p Data cache Same Cast out of modified Write with kill Cache Bus Operation Operation sync block touch block as required Pass four beat read to Read memory queue Data cache x0x E M Same No action block touch Single beat Reload XXX Same Forward data_in read dump 1 Four beat read Reload XXX E Write data_in to cache double word al dump igned Four beat write Reload XXX M Write data_in to cache double word al dump igned EI Snoop No XXX E State change only write or kill committed M gt 1 Snoop XXX M State change only kill committed Push Snoop XXX M Conditionally push Write with kill M gt 1 flush Push Snoop XXX M E Conditionally push Write with kill M gt E clean TLB xx x x CRTRY TLBI Il invalidate ECC E Synchroni XXX D D CRTRY sync Il zation sm EN EE Note that single beat writes are not snooped in the write queue Chapter 3 Instruction and Data Cache Operation 3 33 3 34 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Chapter 4 Exceptions The OEA portion of the PowerPC architecture defines the mechanism by which PowerPC processors implement exceptions referred to as interrupts in the architecture specification Exception conditions may be defined at other levels of the architecture For example the UISA defines conditions that may cause floating point exceptions th
66. The OEA defines the registers an operating system uses for memory management, configuration, exception handling, and other operating system functions. The OEA defines the following supervisor-level registers for 32-bit implementations. Configuration registers. Machine state register (MSR): The MSR defines the state of the processor. The MSR can be modified by the Move to Machine State Register (mtmsr), System Call (sc), and Return from Exception (rfi) instructions. It can be read by the Move from Machine State Register (mfmsr) instruction. When an exception is taken, the contents of the MSR are saved to the machine status save/restore register 1 (SRR1), which is described below. See Machine State Register (MSR) in Chapter 2, PowerPC Register Set, of The Programming Environments Manual for more information. Implementation Note: Table 2-1 describes MSR bits the 750 implements that are not required by the PowerPC architecture. Table 2-1. Additional MSR Bits. Power management enable: Optional to the PowerPC architecture. 0: Power management is disabled. 1: Power management is enabled. The processor can enter a power-saving mode when additional conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the hardware implementation-dependent register 0 (HID0), described in Table 2-4. Performance monitor marked mode: This bit is specific to the 750 and is defined as reserved by the Power
67. The interpretation of TIN is controlled by TID. See Table 2-16. TIV: Thermal management interrupt valid. Read only. This bit is set by the thermal assist logic to indicate that the thermal management interrupt (TIN) state is valid. See Table 2-16. Threshold: Threshold that the thermal sensor output is compared to. The range is 0-127°C, and each bit represents 1°C. Note that this is not the resolution of the thermal sensor. Reserved: System software should clear these bits when writing to the THRMn SPRs. TID: Thermal management interrupt direction bit. Selects the result of the temperature comparison to set TIN and to assert a thermal management interrupt if TIE is set. If TID is cleared, TIN is set and an interrupt occurs if the junction temperature exceeds the threshold. If TID is set, TIN is set and an interrupt is indicated if the junction temperature is below the threshold. See Table 2-16. TIE: Thermal management interrupt enable. The thermal management interrupt is maskable by the MSR[EE] bit. If TIE is cleared and THRMn is valid, the TIN bit records the status of the junction temperature vs. threshold comparison without causing an exception. This lets system software successively approximate the junction temperature. See Table 2-16. V: SPR valid bit. Setting this bit indicates the SPR contains a valid threshold, TID, and TIE controls bits. Setting THRM1/2[V] = 1 and THRM3[E] = 1 enables the thermal sensor operation. See Table 2-16. If an mtspr affects a TH
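The interaction of the TID, TIE, and TIN bits described above can be summarized in a small model. This is a sketch of the stated rules only; the function names are illustrative, and treating the interrupt as the conjunction of TIN, TIE, and MSR[EE] is an assumption drawn from the wording that the interrupt is "maskable by the MSR[EE] bit".

```python
def tin_state(junction_temp, threshold, tid):
    """Model of TIN: TID = 0 flags temperatures above the threshold,
    TID = 1 flags temperatures below it."""
    if tid == 0:
        return junction_temp > threshold
    return junction_temp < threshold


def interrupt_taken(tin, tie, msr_ee):
    """Assumed gating: an interrupt needs TIN set, TIE set, and
    external interrupts enabled in the MSR."""
    return bool(tin and tie and msr_ee)


tin = tin_state(junction_temp=85, threshold=80, tid=0)
print(tin, interrupt_taken(tin, tie=1, msr_ee=1))
```

With TIE cleared, `tin_state` alone corresponds to the polling use described in the text, where software reads TIN without taking an exception.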
68. a eee 5 12 General Flow of MMU Address Translaton cc ceecceeeseceeeeeeeeeteeeeseeeenes 5 12 Real Addressing Mode and Block Address Translation Selection 5 12 Page Address Translation legt arias 5 14 MMU Exc ptions Summary aeeie ar T ae 5 16 MMU Instructions and Register Summary 5 18 Real Addressing Mode suscrita sedate rado denia seccatid dicas 5 20 Block Address Translation rodas atan EEN 5 21 Memory Sep ment Modele OS In aid 5 21 Pare History Recordin tidad 5 21 Referenced EE 5 22 Changed Bit anni bene 5 23 Scenarios for Referenced and Changed Bit Recording oococnoccconocccinnncn n 5 23 Page Memory Protection gend SSEAdE nde ENEE EAR Ee NEEN Edge deg 5 25 TEB Description O 5 25 AUD IS OO Th AVN E 5 25 TEBA AL ET E 5 27 Page Address Translation Summary ke 5 28 Page Table Search Operation seed NEEN Deeg 5 30 Page Table BE 5 34 pegment Register IPS a lia 5 34 Chapter 6 Instruction Timing Terminology EL 6 1 Instruction Ti Overview oeni ce Gata diac a 6 3 Timing Considerations EE 6 7 General Instruction FLOW iniciadas 6 8 Instruction erch Vine scesi esn a e io 6 11 Cache EE E 6 11 Cache E EE 6 11 Cache MISS eee an eRe le A ie ah oN ee ees 6 14 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number 6 3 2 4 6 3 3 6 3 3 1 6 3 3 2 6 4 6 4 1 6 4 1 1 6 4 1 2 6 4 1 3 6 4 1 3 1 6 4 1 3 2 6 4 2 6 4 3 6 4 4 6 4 5 6 4 6 6 4 7 6 4 8 6 5 6 5 1 6 5 2 6 6 6 6 1 6 6 1 1
69. a paragraph such as this one indicates that a change has been made to the paragraph since the 8/97 release of this document. About the Companion Programming Environments Manual. The PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual, which describes 750 features not defined by the architecture, is to be used with the PowerPC Microprocessor Family: The Programming Environments, Rev. 1, referred to as The Programming Environments Manual. Because the PowerPC architecture is designed to be flexible to support a broad range of processors, The Programming Environments Manual provides a general description of features that are common to PowerPC processors and indicates those features that are optional or that may be implemented differently in the design of each processor. Contact your sales representative for a copy of The Programming Environments Manual. This document and The Programming Environments Manual distinguish between the three levels, or programming environments, of the PowerPC architecture, which are as follows. PowerPC user instruction set architecture (UISA): The UISA defines the level of the architecture to which user-level software should conform. The UISA defines the base user-level instruction set, user-level registers, data types, memory conventions, and the memory and programming models seen by application programmers. PowerPC virtual environment architecture (VEA): The VEA, which is the
70. a read operation. TEA should be asserted for one cycle only. Negation: TEA must be negated no later than the negation of DBB. 7.2.9 System Status Signals. Most system status signals are input signals that indicate when exceptions are received, when checkstop conditions have occurred, and when the 750 must be reset. The 750 generates the output signal, CKSTP_OUT, when it detects a checkstop condition. For a detailed description of these signals, see Section 8.7, Interrupt, Checkstop, and Reset Signals. 7.2.9.1 Interrupt (INT): Input. Following are the state meaning and timing comments for the INT signal. State Meaning, Asserted: The 750 initiates an interrupt if MSR[EE] is set; otherwise, the 750 ignores the interrupt. To guarantee that the 750 will take the external interrupt, INT must be held active until the 750 takes the interrupt; otherwise, whether the 750 takes an external interrupt depends on whether the MSR[EE] bit was set while the INT signal was held active. Negated: Indicates that normal operation should proceed. See Section 8.7.1, External Interrupts. Timing Comments, Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The INT input is level-sensitive. Negation: Should not occur until the interrupt is taken. 7.2.9.2 System Management Interrupt (SMI): Input. Following are the state meaning and timing co
71. a rollover or terminal count of the DLL to checkstop the processor, independent of the MSR[ME] bit. Bits 24-30, L2CTR: L2 DLL counter value (read only, for chip revisions 3.0 and later). These bits indicate the current value of the DLL counter (0 to 127). They are asynchronously read when the L2CR is read, and as such should be read at least twice with the same value in case the value is asynchronously caught in transition. These bits are intended to provide observability of where in the 128-bit delay chain the DLL is at any given time. Generally, the DLL operation should be considered at risk if it is found to be within a couple of taps of its beginning or end point (tap 0 or tap 128). Bit 31, L2IP: L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been initiated by the L2I bit to determine when it has completed. The L2CR register can be accessed with the mtspr and mfspr instructions using SPR 1017. 2.2 Operand Conventions. This section describes the operand conventions as they are represented in two levels of the PowerPC architecture: UISA and VEA. Detailed descriptions are provided of conventions used for storing values in registers and memory, accessing PowerPC registers, and representation of data in these registers. 2.2.1 Floating-Point Execution Models (UISA). The IEEE 754 standa
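The advice to read the DLL counter "at least twice with the same value" can be sketched as a small retry loop. This is only an illustration: `read_counter` is a hypothetical callback standing in for reading the L2CR and extracting the L2CTR field, and the retry limit is an arbitrary choice.

```python
def read_stable(read_counter, max_tries=8):
    """Return a counter value once two consecutive reads agree.

    read_counter is a hypothetical stand-in for one register read.
    A value caught mid-transition will differ from the next read,
    so it is discarded and the read is repeated.
    """
    previous = read_counter()
    for _ in range(max_tries):
        current = read_counter()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("counter did not settle")


# Simulated reads: the first sample is caught in transition.
samples = iter([5, 6, 6])
print(read_stable(lambda: next(samples)))
```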
72. and Data Cache Operation, for more information. Table 2-50 shows the mftb instruction. Table 2-50. Move from Time Base Instruction. Move from Time Base: mftb rD,TBR. Simplified mnemonics are provided for the mftb instruction so it can be coded with the TBR name as part of the mnemonic rather than requiring it to be coded as an operand. See Appendix F, Simplified Mnemonics, in The Programming Environments Manual for simplified mnemonic examples and for simplified mnemonics for Move from Time Base (mftb) and Move from Time Base Upper (mftbu), which are variants of the mftb instruction rather than of mfspr. The mftb instruction serves as both a basic and simplified mnemonic. Assemblers recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the simplified form. Note that the 750 ignores the extended opcode differences between mftb and mfspr by ignoring bit 25 and treating both instructions identically. Implementation Notes: The following information is useful with respect to using the time base implementation in the 750. The 750 allows user-mode read access to the time base counter through the use of the Move from Time Base (mftb) and the Move from Time Base Upper (mftbu) instructions. As a 32-bit PowerPC implementation, the 750 can access TBU and TBL only separately, whereas 64-bit implementations can access
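Because TBU and TBL can only be read separately, a 64-bit time base value has to be assembled from two reads, and a carry from TBL into TBU between them would give an inconsistent result. A common way to guard against this, sketched here as an assumption rather than quoted from this manual, is to read the upper half again and retry if it changed. `read_tbu` and `read_tbl` are hypothetical callbacks standing in for mftbu and mftb.

```python
def read_time_base(read_tbu, read_tbl):
    """Assemble a consistent 64-bit time base from two 32-bit halves."""
    while True:
        upper = read_tbu()
        lower = read_tbl()
        # If the upper half is unchanged, no carry occurred in between.
        if read_tbu() == upper:
            return (upper << 32) | lower


# Simulated registers: TBL wraps around during the first attempt.
tbu_reads = iter([1, 2, 2, 2])
tbl_reads = iter([0xFFFFFFFF, 0x00000003])
print(hex(read_time_base(lambda: next(tbu_reads), lambda: next(tbl_reads))))
```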
73. and changed C bits in the page address translation mechanism that can be used as history information relevant to the page The operating system can use these bits to determine which areas of memory to write back to disk when new pages must be allocated in main memory While these bits are initially programmed by the operating system into the page table the architecture specifies that they can be maintained either by the processor hardware automatically or by some software assist mechanism Implementation Note When loading the TLB the 750 checks the state of the changed and referenced bits for the matched PTE If the referenced bit is not set and the table search operation is initially caused by a load operation or by an instruction fetch the 750 automatically sets the referenced bit in the translation table Similarly if the table search operation is caused by a store operation and either the referenced bit or the changed bit is not set the hardware automatically sets both bits in the translation table In addition when the address translation of a store operation hits in the DTLB the 750 checks the state of the changed bit If the bit is not already set the hardware automatically updates the DTLB and the translation table in memory to set the changed bit For more information see Section 5 4 1 Page History Recording 5 1 6 General Flow of MMU Address Translation The following sections describe the general flow used by PowerPC pr
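The referenced/changed update rules in the implementation note of snippet 73 can be restated as a small function. This is a sketch of the stated rules only; the function name and the boolean encoding of the bits are assumptions made for illustration.

```python
def update_history_bits(referenced, changed, is_store):
    """Return the (R, C) bits after a table search operation, per the note above.

    Loads and instruction fetches set R if it is clear.
    Stores set both R and C if either one is clear.
    """
    if is_store:
        if not referenced or not changed:
            return True, True
        return referenced, changed  # both already set, nothing to update
    # Load or instruction fetch: only the referenced bit is affected.
    return True, changed
```

For example, a store to a page that has been read but never written (R set, C clear) leaves both bits set afterwards.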
74. and hardware developers and applications programmers who want to develop products for the 750 It is assumed that the reader understands operating systems microprocessor system design basic principles of RISC processing and details of the PowerPC architecture Organization Following is a summary and a brief description of the major sections of this manual e Chapter 1 PowerPC 740 PowerPC 750 Overview is useful for readers who want a general understanding of the features and functions of the PowerPC architecture and the 750 This chapter describes the flexible nature of the PowerPC architecture definition and provides an overview of how the PowerPC architecture defines the register set operand conventions addressing modes instruction set cache model exception model and memory management model e Chapter 2 Programming Model is useful for software engineers who need to understand the 750 specific registers operand conventions and details regarding how PowerPC instructions are implemented on the 750 Instructions are organized by function e Chapter 3 Instruction and Data Cache Operation discusses the cache and memory model as implemented on the 750 e Chapter 4 Exceptions describes the exception model defined in the PowerPC OEA and the specific exception model implemented on the 750 e Chapter 5 Memory Management describes the 750 s implementation of the memory management unit specifications
75. attribute pertains to out of order execution When a page is designated as guarded instructions and data cannot be accessed out of order H Harvard architecture An architectural model featuring separate caches for instruction and data Hashing An algorithm used in the page table search process I IEEE 754 A standard written by the Institute of Electrical and Electronics Engineers that defines operations and representations of binary floating point numbers Illegal instructions A class of instructions that are not implemented for a particular PowerPC processor These include instructions not defined by the PowerPC architecture In addition for 32 bit implementations instructions that are defined only for 64 bit implementations are considered to be illegal instructions For 64 bit implementations instructions that are defined only for 32 bit implementations are considered to be illegal instructions Glossary of Terms and Abbreviations Glossary 5 Glossary 6 Implementation A particular processor that conforms to the PowerPC architecture but may differ from other architecture compliant implementations for example in design feature set and implementation of optional features The PowerPC architecture has many different implementations Imprecise exception A type of synchronous exception that is allowed not to adhere to the precise exception model see Precise exception The PowerPC architecture allows only floating point except
76. base decrementer registers and the bus snooping logic. Nap: The nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the PLL in a powered state. Sleep: All internal functional units are disabled, after which external system logic may disable the PLL and SYSCLK. Thermal management facility provides software-controllable thermal management. Thermal management is performed through the use of three supervisor-level registers and a 750-specific thermal management exception. Instruction cache throttling provides control of instruction fetching to limit power consumption. • Performance monitor can be used to help debug system designs and improve software efficiency. • In-system testability and debugging features through JTAG boundary-scan capability. 1.2.2 Instruction Flow As shown in Figure 1-1, the 750 instruction unit provides centralized control of instruction flow to the execution units. The instruction unit contains a sequential fetcher, six-entry instruction queue (IQ), dispatch unit, and BPU. It determines the address of the next instruction to be fetched based on information from the sequential fetcher and from the BPU. See Chapter 6, Instruction Timing, for a detailed discussion of instruction timing. Chapter 1 PowerPC 740/PowerPC 750 Overview 1-7 The sequential fetcher loads instructions from the instruction cache into the instruction queue. The B
77. be automatically stopped whenever the 750 enters nap or sleep modes and automatically restarted when exiting those modes including snooping during nap mode The L2 SYNC_OUT SYNC_IN path will remain operating to keep the DLL in sync This bit is provided as a power saving alternative to the L2CTL bit and its corresponding ZZ pin which may not be useful for dynamic stopping restarting of the L2 interface from nap and sleep modes due to the relatively long recovery time from ZZ negation that many SRAM vendors require 2 26 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Table 2 18 L2CR Bit Settings Continued 23 L2DRO L2 DLL Rollover Checkstop Enable for chip revisions 3 0 and later Asserting this bit enables a potential actual rollover condition of the DLL to cause a checkstop for the processor A potential rollover condition occurs when the DLL is selecting the last tap of the delay line and thus may risk rolling over to the first tap with one adjustment while in the process of keeping in sync Such a condition is improper operation for the DLL and while this condition is not expected this bit allows detection for added security This bit should be set when the DLL is first enabled set with the L2CLK bits to detect rollover during initial synchronization It could also be set when the L2 cache is enabled with L2E bit after the DLL has achieved initial lock 0 Prevents DLL rollover to checkstop 1 Enable
78. bits are ignored and all accesses are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI (cache inhibit) signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit), independent of the state of HID0[DCE]. Also note that disabling the data cache does not affect the translation logic; translation for data accesses is controlled by MSR[DR]. The setting of the DCE bit must be preceded by a sync instruction to prevent the cache from being enabled or disabled in the middle of a data access. In addition, the cache must be globally flushed before it is disabled to prevent coherency problems when it is re-enabled. Snooping is not performed when the data cache is disabled. The dcbz instruction will cause an alignment exception when the data cache is disabled. The touch load (dcbt and dcbtst) instructions are no-ops when the data cache is disabled. Other cache operations (caused by the dcbf, dcbst, and dcbi instructions) are not affected by disabling the cache. This can potentially cause coherency errors. For example, a dcbf instruction that hits a modified cache block in the disabled cache will cause a copyback to memory of potentially stale data. Chapter 3 Instruction and Data Cache Operation 3-13 3.4.1.3 Data Cache Locking The contents of the data cache can be locked by setting the data cache lock bit, HID0[DLOCK]. A data access that hits in a locked data cache is serviced by th
79. bus configurations that is selected during the negation of the HRESET signal. The operation and selection of the optional bus configuration is described in the following sections. 8.6.1 32-Bit Data Bus Mode The 750 supports an optional 32-bit data bus mode. The 32-bit data bus mode operates the same as the 64-bit data bus mode, with the exception of the byte lanes involved in the transfer and the number of data beats that are performed. When in 32-bit data bus mode, only byte lanes 0 through 3 are used, corresponding to DH0-DH31 and DP0-DP3. Byte lanes 4 through 7, corresponding to DL0-DL31 and DP4-DP7, are never used in this mode. The unused data bus signals are not sampled by the 750 during read operations, and they are driven low during write operations. The number of data beats required for a data tenure in the 32-bit data bus mode is one, two, or eight beats, depending on the size of the program transaction and the cache mode for the address. Data transactions of one or two data beats are performed for caching-inhibited load/store or write-through store operations. These transactions do not assert the TBST signal even though a two-beat burst may be performed (having the same TBST and TSIZ[0-2] encodings as the 64-bit data bus mode). Single-beat data transactions are performed for bus operations of 4 bytes or less, and double-beat data transactions are performed for 8-byte operations only. The 750 only generates an 8-byte operation for a double
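The beat counts described for 32-bit data bus mode in snippet 79 can be summarized in a short helper. This is an illustrative sketch; the function name and the `burst` flag are assumptions, and sizes of 5 to 7 bytes are rejected because the text gives a beat count only for operations of 4 bytes or less and for 8-byte operations.

```python
def data_beats_32bit_bus(size_bytes, burst=False):
    """Number of data beats for one data tenure in 32-bit data bus mode."""
    if burst:
        return 8  # a burst needs eight beats on the narrower bus
    if 1 <= size_bytes <= 4:
        return 1  # single-beat: 4 bytes or less
    if size_bytes == 8:
        return 2  # double-beat: 8-byte operations only
    raise ValueError("transfer size not covered by the description above")
```

The eight-beat case follows from the same 32-byte cache block that takes four 64-bit beats in the default bus mode.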
80. bus tenure is used for the load operation. The enveloped high-priority cache block push feature defines a bus signal, data bus write only (DBWO), which when asserted with a qualified data bus grant indicates that the resulting data tenure should be used for the store operation instead. This signal is described in Section 8.10, Using Data Bus Write Only. Note that the enveloped copy-back operation is an internally pipelined bus operation. 3.6 L1 Caches and 60x Bus Transactions The 750 transfers data to and from the cache in single-beat transactions of two words, or in four-beat transactions of eight words which fill a cache block. Single-beat bus transactions can transfer from one to eight bytes to or from the 750 and can be misaligned. Single-beat transactions can be caused by cache write-through accesses, caching-inhibited accesses (WIMG = x1xx), accesses when the cache is disabled (HID0[DCE] bit is cleared), or accesses when the cache is locked (HID0[DLOCK] bit is set). Burst transactions on the 750 always transfer eight words of data at a time and are aligned to a double-word boundary. The 750 transfer burst (TBST) output signal indicates to the system whether the current transaction is a single-beat transaction or four-beat burst transfer. Burst transactions have an assumed address order. For cacheable read operations, instruction fetches, or cacheable non-write-through write operations that miss the cache, the 750 presents the dou
81. bytes of the physical address for a transaction. Odd parity means that an odd number of bits, including the parity bit, are driven high. The signal assignments correspond to the following: AP0: A[0-7]; AP1: A[8-15]; AP2: A[16-23]; AP3: A[24-31]. For more information, see Section 8.3.2.1, Address Bus Parity. Timing Comments: Assertion/Negation: The same as A[0-31]. High Impedance: The same as A[0-31]. Chapter 7 Signal Descriptions 7-7 7.2.3.2.2 Address Bus Parity (AP[0-3]), Input Following are the state meaning and timing comments for the AP[0-3] input signal on the 750. State Meaning: Asserted/Negated: Represents odd parity for each of the 4 bytes of the physical address for snooping operations. Detected even parity causes the processor to take a machine check exception or enter the checkstop state if address parity checking is enabled in the HID0 register; see Section 2.1.2.2, Hardware Implementation-Dependent Register 0. Timing Comments: Assertion/Negation: The same as A[0-31]. 7.2.4 Address Transfer Attribute Signals The transfer attribute signals are a set of signals that further characterize the transfer, such as the size of the transfer, whether it is a read or write operation, and whether it is a burst or single-beat transfer. For a detailed description of how these signals interact, see Section 8.3.2, Address Transfer. Note that some signal functions vary depending on whether the transaction is a memor
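The odd-parity rule and the AP[0-3] byte assignments in snippet 81 can be worked through in a few lines. This is a sketch for illustration; the function names are assumptions, and the only convention relied on is the PowerPC one in which A[0] is the most significant address bit.

```python
def odd_parity_bit(byte):
    """Parity bit that makes the total count of 1s (data plus parity) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 1 if ones % 2 == 0 else 0


def address_parity(addr):
    """Return [AP0, AP1, AP2, AP3] for a 32-bit physical address.

    AP0 covers A[0-7], the most significant byte; AP3 covers A[24-31].
    """
    return [odd_parity_bit((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0)]
```

A snooping device that recomputes these four bits and finds a mismatch has detected even parity on that byte, which is the error condition the input description refers to.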
82. cache performs as if the cache were not locked. A cache block invalidated by a snoop remains invalid until the cache is unlocked. To prevent locking during a cache access, a sync instruction must precede the setting of DLOCK. Instruction cache flash invalidate. 0 The instruction cache is not invalidated. The bit is cleared when the invalidation operation begins (usually the next cycle after the write operation to the register). The instruction cache must be enabled for the invalidation to occur. 1 An invalidate operation is issued that marks the state of each instruction cache block as invalid without writing back modified cache blocks to memory. Cache access is blocked during this time. Bus accesses to the cache are signaled as a miss during invalidate-all operations. Setting ICFI clears all the valid bits of the blocks and the PLRU bits to point to way L0 of each set. Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically resets these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0). Note: in the PowerPC 603 and PowerPC 603e processors, the proper use of the ICFI and DCFI bits was to set them and clear them in two consecutive mtspr operations. Software that already has this sequence of operations does not need to be changed to run on the 750. Chapter 2 Programming Model 2-11 Data cache flash invalidate. 0 The data cache is not invalidated. The bit is cleared when
83. calculates effective addresses for data loads and stores the instruction unit calculates effective addresses for instruction fetching The MMU translates the effective address to determine the correct physical address for the memory access The 750 supports the following types of memory translation e Real addressing mode In this mode translation is disabled by clearing bits in the machine state register MSR MSR IR for instruction fetching or MSR DR for data accesses When address translation is disabled the physical address is identical to the effective address e Page address translation translates the page frame address for a 4 Kbyte page size e Block address translation translates the base address for blocks 128 Kbytes to 256 Mbytes If translation is enabled the appropriate MMU translates the higher order bits of the effective address into physical address bits The lower order address bits that are untranslated and therefore considered both logical and physical are directed to the on chip caches where they form the index into the eight way set associative tag array After translating the address the MMU passes the higher order physical address bits to the cache and the cache lookup completes For caching inhibited accesses or accesses that miss in the cache the untranslated lower order address bits are concatenated with the translated higher order address bits the resulting 32 bit physical address is used by the memory
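The concatenation step described in snippet 83 for page address translation can be sketched as follows. This is a simplified illustration for 4-Kbyte pages only: `rpn` is a hypothetical name for the translated higher-order physical page number that the MMU would supply, and block address translation and the table search itself are not modeled.

```python
PAGE_SHIFT = 12                      # 4-Kbyte pages
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def physical_address(ea, rpn, translation_enabled=True):
    """Form a 32-bit physical address from an effective address.

    With translation disabled (real addressing mode), the physical
    address is identical to the effective address. Otherwise the
    untranslated low-order bits of ea are concatenated with rpn.
    """
    if not translation_enabled:
        return ea & 0xFFFFFFFF
    offset = ea & PAGE_OFFSET_MASK   # low-order bits pass through unchanged
    return ((rpn << PAGE_SHIFT) | offset) & 0xFFFFFFFF
```

Because the low-order bits are the same before and after translation, the cache can start indexing with them while the MMU is still producing the higher-order bits.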
84. cause improper operation by the 750 When the 750 is following normal bus protocol data may be cancelled the bus cycle after TA by either of two means late cancellation by DRTRY or late cancellation by ARTRY When no DRTRY mode is selected both cancellation cases must be disallowed in the system design for the bus protocol When no DRTRY mode is selected for the 750 the system must ensure that DRTRY is not asserted to the 750 If it is asserted it may cause improper operation of the bus interface The system must also ensure that an assertion of ARTRY by a snooping device must occur before or coincident with the first assertion of TA to the 750 but not on the cycle after the first assertion of TA Other than the inability to cancel data that was read by the master on the bus cycle after TA was asserted the bus protocol for the 750 is identical to that for the basic transfer bus protocols described in this chapter including 32 bit data bus mode The 750 selects the desired DRTRY mode at startup by sampling the state of the DRTRY signal itself at the negation of the HRESET signal If the DRTRY signal is negated at the negation of HRESET normal operation is selected If the DRTRY signal is asserted at the negation of HRESET no DRTRY mode is selected 8 6 3 Reduced Pinout Mode This mode is not supported on the 750 Chapter 8 Bus Interface Operation 8 41 8 7 Interrupt Checkstop and Reset Signals This sec
85. caused directly by the execution of an instruction and those caused by an asynchronous event, or interrupts. Either may cause components of the system software to be invoked. Exceptions can be caused directly by the execution of an instruction as follows: • An attempt to execute an illegal instruction causes the illegal instruction (program exception) handler to be invoked. An attempt by a user-level program to execute the supervisor-level instructions listed below causes the privileged instruction (program exception) handler to be invoked. The 750 provides the following supervisor-level instructions: dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr, mtsr, mtsrin, rfi, tlbie, and tlbsync. Note that the privilege level of the mfspr and mtspr instructions depends on the SPR encoding. • Any mtspr, mfspr, or mftb instruction with an invalid SPR (or TBR) field causes an illegal type program exception. Likewise, a program exception is taken if user-level software tries to access a supervisor-level SPR. An mtspr instruction executing in supervisor mode (MSR[PR] = 0) with the SPR field specifying HID1 or PVR (read-only registers) executes as a no-op. • An attempt to access memory that is not available (page fault) causes the ISI or DSI exception handler to be invoked. • The execution of an sc instruction invokes the system call exception handler that permits a program to request the system to perform a service. • The execution of a trap instruction inv
86. completion queue. Instructions 2 and 3 drop into the two dispatch positions in the instruction queue. Because there were two positions available in the instruction queue in clock cycle 0, two instructions (4 and 5) are fetched into the instruction queue. Instruction 4 is a branch unconditional instruction, which resolves immediately as taken. Because the branch is taken, it can therefore be folded from the instruction queue. In cycle 2, assume a BTIC hit occurs and target instructions 6 and 7 are fetched into the instruction queue, replacing the folded b instruction (4) and instruction 5. Instruction 0 completes, writes back its results, and vacates the completion queue by the end of the clock cycle. Instruction 1 enters the second FPU execute stage, instruction 2 is dispatched to the IU2, and instruction 3 is dispatched into the first FPU execute stage. Because the taken branch instruction (4) does not update either CTR or LR, it does not require a position in the completion queue and can be folded. In cycle 3, target instructions (6 and 7) are fetched, replacing instructions 4 and 5 in IQ0 and IQ1. This replacement on taken branches is called branch folding. Instruction 1 proceeds through the last of the three FPU execute stages. Instruction 2 has executed but must remain in the completion queue until instruction 1 completes. Instruction 3 replaces instruction 1 in the second stage of the FPU, and instruction 6 replaces instruction 3 in the first s
87. d; lwz 32 D A d; lwzu 33 D A d; mulli 7 D A SIMM; ori 24 S A UIMM; oris 25 S A UIMM; stb 38 S A d; stbu 39 S A d; stfd 54 S A d; stfdu 55 S A d; stfs 52 S A d; stfsu 53 S A d; sth 44 S A d; sthu 45 S A d; stmw 47 S A d. A-30 IBM PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual stw 36 A d; stwu 37 A d; subfic 08 D A SIMM; tdi 02 TO A SIMM; twi 03 TO A SIMM; xori 26 S A UIMM; xoris 27 S A UIMM. Notes: 1 Load/store string/multiple instruction; 2 64-bit instruction. Table A-35. DS-Form. Specific Instructions (Name, bit positions 0-31): ld 58 D A ds 0; ldu 58 D A ds 1; lwa 58 D A ds 2; std 62 S A ds 0; stdu 62 S A ds 1. Note: 1 64-bit instruction. Table A-36. X-Form. Appendix A PowerPC Instruction Set Listings A-31 Specific Instructions (Name, bit positions 0-31): andx 31 S A B 28 Rc; andcx A B 60 Rc; cmp crfD H L A B 0 0; cmpl 31 0 A 32 0; cntlzdx 31 A 58 Rc; cntlzwx 31 A 26 Rc; dcba 26 31 A 758 0; dcbst 31 A B 54 0; dcbt 31 A B 278 0; dcbtst 31 A B 246 0; dcbz 31 A B 1014 0; eieio 31 00000 00000 00000 854 0. A-32 IBM PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual
88. • Load/store instructions: These include integer and floating-point load and store instructions. Integer load and store instructions. Integer load and store multiple instructions. Floating-point load and store. Primitives used to construct atomic memory operations (lwarx and stwcx. instructions). • Flow control instructions: These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow. Branch and trap instructions. Condition register logical instructions. • Processor control instructions: These instructions are used for synchronizing memory accesses and management of caches, TLBs, and the segment registers. Move to/from SPR instructions. Move to/from MSR. Synchronize. Instruction synchronize. Order loads and stores. Chapter 1 PowerPC 740/PowerPC 750 Overview 1-27 • Memory control instructions: These instructions provide control of caches, TLBs, and SRs. Supervisor-level cache management instructions. User-level cache instructions. Segment register manipulation instructions. Translation lookaside buffer management instructions. This grouping does not indicate the execution unit that executes a particular instruction or group of instructions. Integer instructions operate on byte, half-word, and word operands. Floating-point instructions operate on single-precision (one word) and
89. • The operating system writes the WIMG bits for each page into the PTEs in system memory as it sets up the page tables. When an access requires coherency, the processor performing the access must inform the coherency mechanisms throughout the system that the access requires memory coherency. The M attribute determines the kind of access performed on the bus (global or local). Software must exercise care with respect to the use of these bits if coherent memory support is desired. Careless specification of these bits may create situations that present coherency paradoxes to the processor. In particular, this can happen when the state of these bits is changed without appropriate precautions (such as flushing the pages that correspond to the changed bits from the caches of all processors in the system) or when the address translations of aliased real addresses specify different values for any of the WIMG bits. These coherency paradoxes can occur within a single processor or across several processors. It is important to note that in the presence of a paradox, the operating system software is responsible for correctness. For real addressing mode (that is, for accesses performed with address translation disabled: MSR[IR] = 0 or MSR[DR] = 0 for instruction or data access, respectively), the WIMG bits are automatically generated as 0b0011 (the data is write-back, caching is enabled, memory coherency is enforced, and memory is guarded). 3-6 IBM PowerPC 740
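The real-addressing-mode default of 0b0011 in snippet 89 can be checked by decoding the four WIMG bits. This is a small sketch; the function and key names are chosen for illustration, and the bit order (W as the most significant of the four) follows the way the snippet reads the value.

```python
def decode_wimg(wimg):
    """Decode a 4-bit WIMG value, with W as the most significant bit."""
    return {
        "write_through": bool(wimg & 0b1000),      # W
        "caching_inhibited": bool(wimg & 0b0100),  # I
        "memory_coherent": bool(wimg & 0b0010),    # M
        "guarded": bool(wimg & 0b0001),            # G
    }
```

Decoding 0b0011 gives write-back (W clear), caching enabled (I clear), coherency enforced (M set), and guarded (G set), which matches the parenthetical description in the text.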
90. eciwx or ecowx instruction operand is not word-aligned. A multiple or string load/store operation is attempted in little-endian mode. An operand of a dcbz instruction is on a page that is write-through or cache-inhibited for a virtual mode access. An attempt to execute a dcbz instruction occurs when the cache is disabled. Program, 00700: As defined by the PowerPC architecture. Floating-point unavailable, 00800: As defined by the PowerPC architecture. Decrementer, 00900: As defined by the PowerPC architecture, when the most significant bit of the DEC register changes from 0 to 1 and MSR[EE] = 1. System call, 00C00: Execution of the System Call (sc) instruction. Trace, 00D00: MSR[SE] = 1 or a branch instruction is completing and MSR[BE] = 1. The 750 differs from the OEA by not taking this exception on an isync. Reserved, 00E00: The 750 does not generate an exception to this vector. Other PowerPC processors may use this vector for floating-point assist exceptions. Performance monitor, 00F00: The limit specified in PMCn is met and MMCR0[ENINT] = 1 (750-specific). Instruction address breakpoint, 01300: IABR[0-29] matches EA[0-29] of the next instruction to complete, IABR[TE] matches MSR[IR], and IABR[BE] = 1 (750-specific). System management interrupt, 01400: MSR[EE] = 1 and SMI is asserted (750-specific). Chapter 4 Exceptions 4-3 Table 4-2. Exceptions and Conditions (Continued). Columns: Exception Type, Vector Offset (hex), Causing Conditions. Thermal: Thermal management is enabled, junction temperature ex
91. exception may occur. For example, if both PMCn INTCONTROL and MMCR0[ENINT] are set and mtspr loads an overflow value, an interrupt signal may be generated without any event counting having taken place. The event to be monitored can be chosen by setting MMCR0[0-9]. The selected events are counted beginning when MMCR0 is set until either MMCR0 is reset or a performance monitor interrupt is generated. Table 2-10 lists the selectable events and their encodings. Table 2-10. PMC1 Events, MMCR0[19-25] Select Encodings. 0000011: Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]): 00 = 15, 01 = 19, 10 = 23, 11 = 31. 0000100: Number of instructions dispatched (0, 1, or 2 instructions per cycle). 0000101: Number of eieio instructions completed. 0000110: Number of cycles spent performing table search operations for the ITLB. Chapter 2 Programming Model 2-17 Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 2-11. Table 2-11. PMC2 Events, MMCR0[26-31] Select Encodings. 00 0000: Register holds current value. 00 0001: Number of processor cycles. 00 0010: Number of completed instructions. Does not include folded branches. 00 0011: Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]): 00 = 15, 01 = 19, 10 = 23, 11 = 31. 00 0100: Number of instructions dispatched (0, 1
92. fields in the segment register are interpreted differently depending on the value of bit 0. The segment registers are accessed by the mtsr, mtsrin, mfsr, and mfsrin instructions. BAT registers (IBAT0U-IBAT3U, IBAT0L-IBAT3L, DBAT0U-DBAT3U, DBAT0L-DBAT3L): There are 16 BAT registers, organized as four pairs of instruction BAT registers (IBAT0U-IBAT3U paired with IBAT0L-IBAT3L) and four pairs of data BAT registers (DBAT0U-DBAT3U paired with DBAT0L-DBAT3L). The BAT registers are defined as 32-bit registers in 32-bit implementations. These are special-purpose registers that are accessed by the mtspr and mfspr instructions. SDR1: The SDR1 register specifies the variables used in accessing the page tables in memory. SDR1 is defined as a 32-bit register for 32-bit implementations. This special-purpose register is accessed by the mtspr and mfspr instructions. 5.2 Real Addressing Mode If address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0) for a particular access, the effective address is treated as the physical address and is passed directly to the memory subsystem, as described in Chapter 7, Memory Management, in The Programming Environments Manual. Note that the default WIMG bits (0b0011) cause data accesses to be considered cacheable (I = 0), and thus load and store accesses are weakly ordered. This is the case even if the data cache is disabled in the HID0 register (as it is out of hard reset). If I/O devices require load and store a
93. filled in four beats of 64 bits each with the critical double word loaded first The data cache is not blocked to internal accesses while the load caused by a cache miss completes This functionality is sometimes referred to as hits under misses because the cache can service a hit while a cache miss fill is waiting to complete The critical double word read from memory is simultaneously written to the data cache and forwarded to the requesting unit thus minimizing stalls due to cache fill latency A cache block is filled after a read miss or write miss read with intent to modify occurs in the cache The cache block that corresponds to the missed address is updated by a burst transfer of the data from the L2 or system memory Note that if a read miss occurs in a system with multiple bus masters and the data is modified in another cache the modified data is first written to external memory before the cache fill occurs 3 5 4 Instruction Cache Block Fill Operations The 750 s instruction cache blocks are loaded in four beats of 64 bits each with the critical double word loaded first The instruction cache is not blocked to internal accesses while the fetch caused by a cache miss completes On a cache miss the critical and following double words read from memory are simultaneously written to the instruction cache and forwarded to the instruction queue thus minimizing stalls due to cache fill latency There is no snooping of the instruc
94. forwards the required data at completion When the source data reaches the rename register execution can begin Instruction results are transferred from the rename registers to the architected registers by the completion unit when an instruction is retired from the completion queue without exceptions and after any predicted branch conditions preceding it in the completion queue have been resolved correctly If a branch prediction was incorrect the instructions following the branch are flushed from the completion queue and any results of those instructions are flushed from the rename registers 6 3 3 2 Instruction Serialization Although the 750 can dispatch and complete two instructions per cycle so called serializing instructions limit dispatch and completion to one instruction per cycle There are three types of instruction serialization e Execution serialization Execution serialized instructions are dispatched held in the functional unit and do not execute until all prior instructions have completed A functional unit holding an execution serialized instruction will not accept further instructions from the dispatcher For example execution serialization is used for instructions that modify nonrenamed resources Results from these instructions are generally not available or forwarded to subsequent instructions until the instruction completes using mtspr to write to LR or CTR does provide forwarding to branch instructions e Completio
95. from the L2 cache; they are forwarded to the 60x bus interface for address-only broadcast if HID0[ABE] is set to 1. The L2 flush mechanism is similar to the L1 data cache flush mechanism. L2 flush requires that the entire L1 data cache be flushed prior to flushing the L2 cache. Also, interrupts must be disabled during the L2 flush so that the LRU algorithm does not get disturbed. The L2 can be flushed by executing uniquely addressed load instructions to each of the 32-byte blocks of the L2 cache. This requires a load to each of the 2 sets (2-way set associative) of the 32-byte block (sector) within each 64- or 128-byte line of the L2 cache. The loads must not hit in the L1 cache in order to effect a flush of the L2 cache. The dcbi instruction is always forwarded to the L2 cache and causes a segment invalidation if a hit occurs. The instruction is also forwarded to the 60x bus interface for broadcast if HID0[ABE] is set to 1. The icbi instruction invalidates only L1 cache blocks and is never forwarded to the L2 cache. Any dcbz instructions marked global do not affect the L2 cache state. If an instruction hits in the L1 and L2 caches, the L1 data cache block is cleared and the instruction completes. If an instruction misses in the L2 cache, it is forwarded to the 60x bus interface for broadcast. Any dcbz instructions that are marked nonglobal act only on the L1 data cache without reference to the state of the L2. The sync and eieio instructions bypass
96. [Table residue: byte-lane usage for half-word and double-word (second beat) transfers; cell contents not recoverable. Notes: A = byte lane used; x = byte lane not used in 32-bit bus mode; a third symbol marks a byte lane not used.] Misaligned data transfers when the 750 is configured with a 32-bit data bus operate in the same way as when configured with a 64-bit data bus, with the exception that only the DH[0-31] data bus is used. See Table 8-7 for an example of a 4-byte misaligned transfer starting at each possible byte address within a double word. Table 8-7. Misaligned 32-Bit Data Bus Transfer (Four-Byte Examples) [columns: Transfer Size, Data Bus Byte Lanes; rows give the misaligned first access and second access for each starting byte address; cell contents not recoverable, with the same notes as above]. 8.3.2.5 Alignment of External Control Instructions: The size of the data transfer associated with the eciwx and ecowx instructions is always 4 bytes. If the eciwx or ecowx instruction is misaligned and crosses any word bo
97. implemented. Out-of-order: An aspect of an operation that allows it to be performed ahead of one that may have preceded it in the sequential model, for example, speculative operations. An operation is said to be performed out-of-order if, at the time that it is performed, it is not known to be required by the sequential execution model. See In-order. Out-of-order execution: A technique that allows instructions to be issued and completed in an order that differs from their sequence in the instruction stream. Overflow: A condition that occurs during arithmetic operations when the result cannot be stored accurately in the destination register(s). For example, if two 32-bit numbers are multiplied, the result may not be representable in 32 bits. Packet: A term used in the 750 with respect to direct-store operations. Page: A region in memory. The OEA defines a page as a 4-Kbyte area of memory, aligned on a 4-Kbyte boundary. Page access history bits: The changed and referenced bits in the PTE keep track of the access history within the page. The referenced bit is set by the MMU whenever the page is accessed for a read or write operation. The changed bit is set when the page is stored into. See Changed bit and Referenced bit. Page fault: A page fault is a condition that occurs when the processor attempts to access a memory location that does not reside within a page not currentl
98. in memory are numbered consecutively starting with zero. Each number is the address of the corresponding byte. 2.3.2.2 Memory Operands: Memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and load/store string instructions, a sequence of bytes or words. The address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). Operand length is implicit for each instruction. The PowerPC architecture supports both big-endian and little-endian byte ordering. The default byte and bit ordering is big-endian. See "Byte Ordering" in Chapter 3, "Operand Conventions," of The Programming Environments Manual for more information about big- and little-endian byte ordering. The operand of a single-register memory access instruction has a natural alignment boundary equal to the operand length. In other words, the natural address of an operand is an integral multiple of the operand length. A memory operand is said to be aligned if it is aligned at its natural boundary; otherwise it is misaligned. For a detailed discussion about memory operands, see Chapter 3, "Operand Conventions," of The Programming Environments Manual. 2.3.2.3 Effective Address Calculation: An effective address is the 32-bit sum computed by the processor when executing a memory access or branch instruction or when fetching the next sequential instruction. For a memory access instruction, if the sum of
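The natural-alignment rule in the memory operands excerpt can be sketched as a one-line check. This is a minimal illustration only; the function name `is_naturally_aligned` and the restriction to operand lengths of 1, 2, 4, and 8 bytes (byte, half word, word, double word) are assumptions made for the example, not definitions from the manual.

```python
def is_naturally_aligned(address: int, operand_length: int) -> bool:
    """Return True if address is an integral multiple of operand_length.

    operand_length is in bytes: 1 (byte), 2 (half word), 4 (word),
    or 8 (double word). The name and interface are hypothetical.
    """
    if operand_length not in (1, 2, 4, 8):
        raise ValueError("operand length must be 1, 2, 4, or 8 bytes")
    return address % operand_length == 0


# A word at 0x1002 is misaligned, but a half word at the same address is aligned.
word_ok = is_naturally_aligned(0x1002, 4)
half_ok = is_naturally_aligned(0x1002, 2)
```

A byte operand is always aligned under this rule, because every address is a multiple of 1.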
99. in the invalid (I) state. If the address misses in the cache, no action is taken. The execution of dcbf does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG bit settings of the block containing the effective address. The dcbf instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception. 3.4.2.5 Data Cache Block Invalidate (dcbi): The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. This instruction is treated as a store with respect to address translation and memory protection. If the address hits in the cache, the cache block is placed in the invalid (I) state, regardless of whether the data is modified. Because this instruction may effectively destroy modified data, it is privileged (that is, dcbi is available to programs at the supervisor privilege level, MSR[PR] = 0). The execution of dcbi does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG bit settings of the block containing the effective address. The dcbi instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception. 3.4.2.6 Instruction Cache Block Invalidate (icbi)
100. instructions to ensure the data correctness. For updating the IBATs and SRs, the sequencer classifies those operations as fetch serializing. After such an instruction is dispatched, the instruction buffer is flushed and the fetch stalls until the instruction completes. However, for reading from the IBATs, the operation is classified as execution serializing. As long as the LSU ensures that all previous instructions can be executed, subsequent instructions can be fetched and dispatched. 5.4.6 Page Table Updates: When TLBs are implemented (as in the 750), they are defined as noncoherent caches of the page tables. TLB entries must be flushed explicitly with the TLB invalidate entry instruction (tlbie) whenever the corresponding PTE is modified. As the 750 is intended primarily for uniprocessor environments, it does not provide coherency of TLBs between multiple processors. If the 750 is used in a multiprocessor environment where TLB coherency is required, all synchronization must be implemented in software. Processors may write referenced and changed bits with unsynchronized, atomic byte store operations. Note that the V, R, and C bits each reside in a distinct byte of a PTE. Therefore, extreme care must be taken to use byte writes when updating only one of these bits. Explicitly altering certain MSR bits (using the mtmsr instruction), or explicitly altering PTEs or certain system registers, may have the side effect o
101. internal pipelines and the reporting of exceptions. When an instruction or data access occurs, the effective address is routed to the appropriate MMU. EA0-EA3 select one of the 16 segment registers, and the remaining effective address bits and the VSID field from the segment register are passed to the TLB. EA[14-19] then select two entries in the TLB; the valid bits are checked, and the 40-bit virtual page number (24-bit VSID and EA4-EA19) must match the VSID, EAPI, and API fields of the TLB entries. If one of the entries hits, the PP bits are checked for a protection violation. If these bits don't cause an exception, the C bit is checked, and a table search operation is initiated if C must be updated. If C does not require updating, the RPN value is passed to the memory subsystem and the WIMG bits are then used as attributes for the access. Although address translation is disabled on a reset condition, the valid bits of TLB entries are not automatically cleared. Thus, TLB entries must be explicitly cleared by the system software (with the tlbie instruction) before the valid entries are loaded and address translation is enabled. Also, note that the segment registers do not have a valid bit, and so they should also be initialized before translation is enabled. 5.4.3.2 TLB Invalidation: The 750 implements the optional tlbie and tlbsync instructions, which are used to invalidate TLB entries. The execution of the tlbie instruction always invalidates four
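The field selection in the TLB lookup excerpt can be sketched as bit slicing on a 32-bit effective address, using PowerPC numbering in which bit 0 is the most significant bit. This is a minimal sketch: the function names and dictionary keys are hypothetical, and treating EA20-EA31 as the byte offset is an assumption that follows from the 4-Kbyte page size rather than from this excerpt.

```python
def split_effective_address(ea: int) -> dict:
    """Split a 32-bit effective address into the fields used for TLB lookup.

    Bit numbering is big-endian (bit 0 = most significant bit).
    All key names are illustrative, not taken from the manual.
    """
    if not 0 <= ea <= 0xFFFF_FFFF:
        raise ValueError("effective address must fit in 32 bits")
    return {
        "segment_register": ea >> 28,        # EA0-EA3: one of 16 segment registers
        "page_index": (ea >> 12) & 0xFFFF,   # EA4-EA19: 16-bit page index
        "tlb_set": (ea >> 12) & 0x3F,        # EA14-EA19: selects a pair of TLB entries
        "byte_offset": ea & 0xFFF,           # EA20-EA31: assumed offset in a 4-Kbyte page
    }


def virtual_page_number(vsid: int, ea: int) -> int:
    """Form the 40-bit virtual page number: 24-bit VSID followed by EA4-EA19."""
    page_index = split_effective_address(ea)["page_index"]
    return ((vsid & 0xFF_FFFF) << 16) | page_index


fields = split_effective_address(0x3012_3456)
vpn = virtual_page_number(0xABCDEF, 0x3012_3456)
```

The VSID itself would come from the segment register selected by `segment_register`; the sketch takes it as a plain argument to stay independent of any register model.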
102. is handled through MMCR0 and MMCR1, described in Table 11-2 and Table 11-3, respectively. The four event-select fields in MMCR0 and MMCR1 are as follows: • MMCR0[19-25], PMC1SELECT: PMC1 input selector, 128 events selectable, 25 defined. See Table 11-5. • MMCR0[26-31], PMC2SELECT: PMC2 input selector, 64 events selectable, 21 defined. See Table 11-6. • MMCR1[0-4], PMC3SELECT: PMC3 input selector, 32 events selectable. See Table 11-7. • MMCR1[5-9], PMC4SELECT: PMC4 input selector, 32 events selectable. See Table 11-8. In the tables, a correlation is established between each counter, the events to be traced, and the pattern required for the desired selection. The first five events are common to all four counters and are considered to be reference events. These are as follows: • 00000: Register holds current value. • 00001: Number of processor cycles. • 00010: Number of completed instructions, not including folded branches. • 00011: Number of TBL bit transitions from 0 to 1 of specified bits in the time base lower register. Bits are specified through RTCSELECT, MMCR0[7-8] (0 = 47, 1 = 51, 2 = 55, 3 = 63). • 00100: Number of instructions dispatched (0, 1, or 2 per cycle). Some events can have multiple occurrences per cycle and therefore need two or three bits to represent them. 11.5 Notes: The following warnings should be noted: Only those load and stor
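The event-select fields listed above can be pulled out of 32-bit register images with a small big-endian bit-field helper. This is a sketch under stated assumptions: `be_field` and `event_selects` are hypothetical names, the register values are passed in as plain integers rather than read with mfspr, and PMC3SELECT and PMC4SELECT are taken from MMCR1[0-4] and MMCR1[5-9], as the PMC3 and PMC4 event tables state.

```python
def be_field(value: int, first: int, last: int, width: int = 32) -> int:
    """Extract bits first..last (inclusive), where bit 0 is the most significant."""
    if not 0 <= first <= last < width:
        raise ValueError("invalid bit range")
    shift = width - 1 - last
    mask = (1 << (last - first + 1)) - 1
    return (value >> shift) & mask


def event_selects(mmcr0: int, mmcr1: int) -> dict:
    """Return the four event-select fields from MMCR0 and MMCR1 images."""
    return {
        "PMC1SELECT": be_field(mmcr0, 19, 25),  # 7 bits, 128 encodings
        "PMC2SELECT": be_field(mmcr0, 26, 31),  # 6 bits, 64 encodings
        "PMC3SELECT": be_field(mmcr1, 0, 4),    # 5 bits, 32 encodings
        "PMC4SELECT": be_field(mmcr1, 5, 9),    # 5 bits, 32 encodings
    }


# Example images: PMC1 counts event 1, PMC2 event 2, PMC3 event 3, PMC4 event 4.
example_mmcr0 = (1 << 6) | 2
example_mmcr1 = (3 << 27) | (4 << 22)
selects = event_selects(example_mmcr0, example_mmcr1)
```

With the reference encodings quoted above, this example would have PMC1 counting processor cycles and PMC2 counting completed instructions.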
103. mantissa than the single-precision format can provide. In other words, the result is too small to be represented accurately. User mode: The operating state of a processor used typically by application software. In user mode, software can access only certain control registers and can access only user memory space. No privileged operations can be performed. Also referred to as problem state. V. VEA (virtual environment architecture): The level of the architecture that describes the memory model for an environment in which multiple devices can access memory, defines aspects of the cache model, defines cache control instructions, and defines the time base facility from a user-level perspective. Implementations that conform to the PowerPC VEA also adhere to the UISA, but may not necessarily adhere to the OEA. Virtual address: An intermediate address used in the translation of an effective address to a physical address. Virtual memory: The address space created using the memory management facilities of the processor. Program access to virtual memory is possible only when it coincides with physical memory. W. Word: A 32-bit data element. Write-back: A cache memory update policy in which processor write cycles are directly written only to the cache. External memory is updated only indirectly, for example, when a modified cache block is cast out to make room for newer data. Write-through: A cache memory update policy in which all processor write cy
104. memory access protocol. The phases of the data tenure are identical to those of the address tenure, underscoring the symmetry in the control of the two buses. 8.4.1 Data Bus Arbitration: Data bus arbitration uses the data arbitration signal group: DBG, DBWO, and DBB. Additionally, the combination of TS and TT[0-4] provides information about the data bus request to external logic. The TS signal is an implied data bus request from the 750; the arbiter must qualify TS with the transfer type (TT) encodings to determine if the current address transfer is an address-only operation, which does not require a data bus transfer (see Figure 8-8). If the data bus is needed, the arbiter grants data bus mastership by asserting the DBG input to the 750. As with the address bus arbitration phase, the 750 must qualify the DBG input with a number of input signals before assuming bus mastership, as shown in Figure 8-9. [Figure 8-9. Data Bus Arbitration] A qualified data bus grant can be expressed as the following: QDBG = DBG asserted while DBB, DRTRY, and ARTRY (associated with the data bus operation) are negated. When a data tenure overlaps with its associated address tenure, a qualified ARTRY assertion coincident with a data bus grant signal does not result in data bus mastership (DBB is not asserted). Otherwise, the 750 always asserts DBB on the bus clock cycle after recognition of a qualified data bus gra
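The qualified data bus grant condition quoted above is a simple boolean expression, sketched here for illustration. The function name is hypothetical, and every argument is the logical assertion state of the signal (True = asserted), not its electrical level on the bus.

```python
def qualified_data_bus_grant(dbg: bool, dbb: bool, drtry: bool, artry: bool) -> bool:
    """QDBG: DBG asserted while DBB, DRTRY, and ARTRY are all negated.

    artry refers to the ARTRY associated with the data bus operation.
    Arguments are logical states (True = asserted); the name is illustrative.
    """
    return dbg and not (dbb or drtry or artry)


# Grant offered while another master still holds the data bus: not qualified.
blocked = qualified_data_bus_grant(dbg=True, dbb=True, drtry=False, artry=False)
# Grant offered with the bus idle and no retry pending: qualified.
granted = qualified_data_bus_grant(dbg=True, dbb=False, drtry=False, artry=False)
```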
105. miss in every case. Note that setting L2CR[TS] inhibits L2 cache misses from being forwarded to the 60x bus interface, thereby avoiding the potential for bus errors due to addressing hardware or nonexistent memory. The L2 cache then can be further verified by reading the previously loaded addresses and observing whether all the tags hit and that the associated data compares correctly. The performance monitor can also be used to verify whether the proper number of L2 cache hits and misses correspond to the test operations performed. 4. The entire L2 cache can be tested by clearing L2CR[DO] and L2CR[TS], restoring the L1 and L2 caches to their normal operational state, and executing a comprehensive test program designed to exercise all the caches. The test program should include operations that cause L2 hit, reload, and castout activity that can be subsequently verified through the performance monitor. 9.1.6 L2 Clock Configuration: The 750 provides a programmable clock for the L2 external synchronous data RAM. The clock frequency for the external SRAM is provided by dividing the 750's internal clock by ratios of 1, 1.5, 2, 2.5, or 3, programmed through the L2CR[CLK] bits. The L2 clock is phase-adjusted to synchronize the clocking of the latches in the 750's L2 cache interface with the clocking of the external SRAM by means of an on-chip delay-locked loop (DLL). The rati
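The divider arithmetic in the L2 clock configuration excerpt can be sketched as follows. This is only a worked illustration: the helper name `l2_clock_hz` is hypothetical, the 300 MHz core frequency is an arbitrary example value, and the sketch deliberately does not model the L2CR[CLK] bit encodings, which this excerpt does not give.

```python
from fractions import Fraction

# Divide ratios named in the text: 1, 1.5, 2, 2.5, and 3.
L2_DIVIDERS = (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(3))


def l2_clock_hz(core_hz: int, divider) -> Fraction:
    """Return the L2 SRAM clock for a core clock and one of the supported dividers."""
    ratio = Fraction(divider)
    if ratio not in L2_DIVIDERS:
        raise ValueError("divider must be 1, 1.5, 2, 2.5, or 3")
    return Fraction(core_hz) / ratio


# Example only: a 300 MHz core clock with the divide-by-1.5 setting.
example_l2_hz = l2_clock_hz(300_000_000, Fraction(3, 2))
```

Using `Fraction` keeps the half-integer ratios exact, so a result such as 300 MHz / 2.5 is not subject to floating-point rounding.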
106. mode. 0 = The PMCn counters can be changed by hardware. 1 = If the processor is in user mode (MSR[PR] is set), the PMCn counters are not changed by hardware. Disables counting while MSR[PM] is set. 0 = The PMCn counters can be changed by hardware. 1 = If MSR[PM] is set, the PMCn counters are not changed by hardware. Disables counting while MSR[PM] is zero. 0 = The PMCn counters can be changed by hardware. 1 = If MSR[PM] is cleared, the PMCn counters are not changed by hardware. Table 2-7. MMCR0 Bit Settings (Continued). Bit 5, ENINT: Enables performance monitor interrupt signaling. 0 = Interrupt signaling is disabled. 1 = Interrupt signaling is enabled. Cleared by hardware when a performance monitor interrupt is signaled. To reenable these interrupt signals, software must set this bit after handling the performance monitor interrupt. The IPL ROM code clears this bit before passing control to the operating system. DISCOUNT: Disables counting of PMCn when a performance monitor interrupt is signaled (that is, (PMCnINTCONTROL = 1) & (PMCn[0] = 1) & (ENINT = 1)) or the occurrence of an enabled time base transition with (INTONBITTRANS = 1) & (ENINT = 1). 0 = Signaling a performance monitor interrupt does not affect the counting status of PMCn. 1 = The signaling of a performance monitor interrupt prevents changing of the PMC1 counter. The PMCn counters do not change if PMC2COUNTCTL = 0
107. model, with a defined exception vector offset (0x00F00). The priority of the performance monitor interrupt lies between the external interrupt and the decrementer interrupt (see Table 4-3). The contents of the SIA are described in 2.1.2.4. The performance monitor is described in Chapter 11. 4.5.14 Instruction Address Breakpoint Exception (0x01300): An instruction address breakpoint interrupt occurs when the following conditions are met: • The instruction breakpoint address IABR[0-29] matches EA[0-29] of the next instruction to complete in program order. The instruction that triggers the instruction address breakpoint exception is not executed before the exception handler is invoked. • The translation enable bit (IABR[TE]) matches MSR[IR]. • The breakpoint enable bit (IABR[BE]) is set. The address match is also reported to the JTAG/COP block, which may subsequently generate a soft or hard reset. The instruction tagged with the match does not complete before the breakpoint exception is taken. Table 4-14 lists register settings when an instruction address breakpoint exception is taken. Table 4-14. Instruction Address Breakpoint Exception Register Settings (Register: Setting Description). SRR0: Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present. SRR1: bit 0, loaded with equivalent MSR bits; bits 1-4, cleared; bits 5-9, loaded with equivalent MS
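The three breakpoint conditions listed above can be combined into a single predicate. This is a sketch, not the hardware's implementation: the function name and parameters are hypothetical, and the IABR[BE] and IABR[TE] flags are passed as separate booleans because this excerpt does not give their bit positions within the register.

```python
WORD_ADDRESS_MASK = 0xFFFF_FFFC  # keeps bits 0-29 of a 32-bit address (bit 0 = MSB)


def iabr_breakpoint_hit(iabr_address: int, iabr_be: bool, iabr_te: bool,
                        next_ea: int, msr_ir: bool) -> bool:
    """Return True when all three instruction address breakpoint conditions hold.

    iabr_address: 32-bit value whose bits 0-29 hold the breakpoint address.
    next_ea: effective address of the next instruction to complete.
    """
    address_match = (iabr_address & WORD_ADDRESS_MASK) == (next_ea & WORD_ADDRESS_MASK)
    translation_match = iabr_te == msr_ir
    return iabr_be and translation_match and address_match


# Breakpoint armed for translated fetches at 0x0001_0040.
hit = iabr_breakpoint_hit(0x0001_0040, True, True, 0x0001_0040, True)
# Same address, but instruction translation is off, so TE no longer matches IR.
miss = iabr_breakpoint_hit(0x0001_0040, True, True, 0x0001_0040, False)
```

Masking off the two low-order bits reflects that only bits 0-29 are compared, which is sufficient because instructions are word-aligned.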
108. more error bits in SRR1. Table 2-2 describes SRR1 bits the 750 implements that are not required by the PowerPC architecture. Table 2-2. Additional SRR1 Bits (Bit, Name: Description). 11, L2DP: Set by a data parity error on the L2 bus. The PowerPC 740 does not implement the L2 cache interface. 12, MCPIN: Set by the assertion of MCP. 13, TEA: Set by a TEA assertion on the 60x bus. 14, DP: Set by a data parity error on the 60x bus. A further bit is set by an address parity error on the 60x bus. Miscellaneous registers: • Time base (TB): The TB is a 64-bit structure provided for maintaining the time of day and operating interval timers. The TB consists of two 32-bit registers, time base upper (TBU) and time base lower (TBL). The time base registers can be written to only by supervisor-level software, but can be read by both user- and supervisor-level software. See "Time Base Facility (TB), OEA," in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. • Decrementer register (DEC): This register is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay; the frequency is a subdivision of the processor clock. See "Decrementer Register (DEC)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Imple
109. of the TLBISYNC signal at the negation of HRESET. If the TLBISYNC signal is negated at the negation of HRESET, 64-bit data mode is entered by the 750. If TLBISYNC is asserted at the negation of HRESET, 32-bit data mode is entered. 8.6.2 No-DRTRY Mode: The 750 supports an optional mode to disable the use of the data retry function provided through the DRTRY signal. The no-DRTRY mode allows the forwarding of data during load operations to the internal CPU one bus cycle sooner than in the normal bus protocol. The 60x bus protocol specifies that, during load operations, the memory system normally has the capability to cancel data that was read by the master on the bus cycle after TA was asserted. In the 750 implementation, this late cancellation protocol requires the 750 to hold any loaded data at the bus interface for one additional bus clock to verify that the data is valid before forwarding it to the internal CPU. For systems that do not implement the DRTRY function, the 750 provides an optional no-DRTRY mode that eliminates this one-cycle stall during all load operations and allows for the forwarding of data to the internal CPU immediately when TA is recognized. When the 750 is in the no-DRTRY mode, data can no longer be cancelled the cycle after it is acknowledged by an assertion of TA. Data is immediately forwarded to the CPU internally, and any attempt at late cancellation by the system may
110. of the lscbx instruction defined by the POWER architecture, XER[16-23] are implemented so that they can be read with mfspr[XER] and written with mtxer[XER] instructions. • Link register (LR): The LR provides the branch target address for the Branch Conditional to Link Register (bclrx) instruction, and can be used to hold the logical address of the instruction that follows a branch and link instruction, typically used for linking to subroutines. See "Link Register (LR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. • Count register (CTR): The CTR holds a loop count that can be decremented during execution of appropriately coded branch instructions. The CTR can also provide the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. See "Count Register (CTR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. User-level registers (VEA): The PowerPC VEA defines the time base facility (TB), which consists of two 32-bit registers, time base upper (TBU) and time base lower (TBL). The time base registers can be written to only by supervisor-level instructions but can be read by both user- and supervisor-level software. For more information, see "PowerPC VEA Register Set: Time Base" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. Supervisor-level registers (OEA)
111. on reset (POR), the 750 immediately branches to 0xFFF0_0100 without attempting to reach a recoverable state. A hard reset has the highest priority of any exception. It is always nonrecoverable. Table 4-9 shows the state of the machine just before it fetches the first instruction of the system reset handler after a hard reset. In Table 4-9, the term "Unknown" means that the content may have been disordered. These facilities must be properly initialized before use. The FPRs, BATs, and TLBs may have been disordered. To initialize the BATs, first set them all to zero, then to the correct values before any address translation occurs. Note also that the 750 does not implement a synchronous error capability for memory accesses. This means that the exception instruction pointer saved into the SRR0 register does not point to the memory operation that caused the assertion of TEA, but to the instruction about to be executed (perhaps several instructions later). However, assertion of TEA does not invalidate data entering the GPR or the cache. Additionally, the address corresponding to the access that caused TEA to be asserted is not latched by the 750. To recover, the exception handler must determine and remedy the cause of the TEA, or the 750 must be reset; therefore, this function should only be used to indicate fatal system conditions to the processor (such as parity or uncorrectable ECC errors). After the 750 has committed to run a transaction, that tran
112. or 2 instructions per cycle. 00 0101: Number of eieio instructions completed. Bits MMCR1[0-4] specify events associated with PMC3, as shown in Table 2-12. Table 2-12. PMC3 Events, MMCR1[0-4] Select Encodings [only the following entries are recoverable; most encodings were lost]: Number of transitions from 0 to 1 of specified bits in the time base lower register (bits are specified through RTCSELECT, MMCR0[7-8]: 0 = 47, 1 = 51, 2 = 55, 3 = 63); number of MSR[PM] toggles while the processor is in user mode; 0 1010: Number of store conditional instructions completed. Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 2-13. Table 2-13. PMC4 Events, MMCR1[5-9] Select Encodings [only the following entries are recoverable; the encodings were lost]: Number of transitions from 0 to 1 of specified bits in the time base lower register (bits are specified through RTCSELECT, MMCR0[7-8]: 0 = 47, 1 = 51, 2 = 55, 3 = 63); number of mispredicted branches; number of MSR[PM] toggles while the processor is in supervisor mode. The PMC registers can be accessed with mtspr and mfspr using the following SPR numbers: • PMC1 is SPR 953 • PMC2 is SPR 954 • PMC3 is SPR 957 • PMC4 is SPR 958. 2.1.2.4.6 User Performance Monitor Counter Registers (UPMC1-UPMC4): The contents of the PMC1-PMC4 are ref
113. out of order and are held in the store queue until the completion logic signals that the store operation is to be completed to memory. The 750 executes store instructions with a maximum throughput of one per cycle and a three-cycle total latency to the data cache. The time required to perform the actual load or store operation depends on the processor/bus clock ratio and whether the operation involves the on-chip cache, the L2 cache, system memory, or an I/O device. 1.2.2.4.4 System Register Unit (SRU): The SRU executes various system-level instructions, as well as condition register logical operations and move to/from special-purpose register instructions. To maintain system state, most instructions executed by the SRU are execution-serialized; that is, the instruction is held for execution in the SRU until all previously issued instructions have executed. Results from execution-serialized instructions executed by the SRU are not available or forwarded for subsequent instructions until the instruction completes. 1.2.3 Memory Management Units (MMUs): The 750's MMUs support up to 4 Petabytes (2^52) of virtual memory and 4 Gigabytes (2^32) of physical memory for instructions and data. The MMUs also control access privileges for these spaces on block and page granularities. Referenced and changed status is maintained by the processor for each page to support demand-paged virtual memory systems. The LSU
114. process. Depending on the exception, this may be the address in SRR0 or at the next address in the program flow. All instructions in the program flow preceding this one will have completed execution and no subsequent instruction will have begun execution. This may be the address of the instruction that caused the exception or the next one (as in the case of a system call, trace, or trap exception). The SRR0 register is shown in Figure 4-1. [Figure 4-1. Machine Status Save/Restore Register 0 (SRR0): SRR0 holds the EA for the instruction in the interrupted program flow.] SRR1 is used to save machine status (selected MSR bits and possibly other status bits as well) on exceptions and to restore those values when an rfi instruction is executed. SRR1 is shown in Figure 4-2. [Figure 4-2. Machine Status Save/Restore Register 1 (SRR1): exception-specific information and MSR bit values, bits 0-31.] For most exceptions, bits 2-4 and 10-12 of SRR1 are loaded with exception-specific information, and MSR[5-9, 16-31] are placed into the corresponding bit positions of SRR1. The 750's MSR is shown in Figure 4-3. [Figure 4-3. Machine State Register (MSR): bit-field layout not recoverable.] The MSR bits are defined in Table 4-4. Table 4-4. MSR Bit Settings. POW: Power management enable. 0 = Power management disabled (normal operation mode). 1
115. signals; it is also an implied data bus request for a memory transaction (unless it is an address-only operation). Negated: Indicates that no bus transaction is occurring during normal operation. Timing Comments: Assertion: Coincides with the assertion of ABB. Negation: Occurs one bus clock cycle after TS is asserted. High Impedance: Coincides with the negation of ABB. 7.2.2.1.2 Transfer Start (TS), Input: Following are the state meaning and timing comments for the TS input signal. State Meaning: Asserted: Indicates that another master has begun a bus transaction and that the address bus and transfer attribute signals are valid for snooping (see GBL). Negated: Indicates that no bus transaction is occurring. Timing Comments: Assertion: May occur during the assertion of ABB. Negation: Must occur one bus clock cycle after TS is asserted. 7.2.3 Address Transfer Signals: The address transfer signals are used to transmit the address and to generate and monitor parity for the address transfer. For a detailed description of how these signals interact, refer to Section 8.3.2, "Address Transfer." 7.2.3.1 Address Bus (A[0-31]): The address bus (A[0-31]) consists of 32 signals that are both input and output signals. 7.2.3.1.1 Address Bus (A[0-31]), Output: Following are the state meaning and timing comments for the A[0-31] output signals. State Meaning: Assert
116. slave is identified in the address tenure and is responsible for supplying or latching the requested data for the master during the data tenure. Snooping: Monitoring addresses driven by a bus master to detect the need for coherency actions. Snoop push: Write-backs due to a snoop hit. The block will transition to an invalid or exclusive state. Split-transaction: A transaction with independent request and response tenures. Split-transaction bus: A bus that allows address and data transactions from different processors to occur independently. Static branch prediction: Mechanism by which software (for example, compilers) can hint to the machine hardware about the direction a branch is likely to take. Superscalar machine: A machine that can issue multiple instructions concurrently from a conventional linear instruction stream. Supervisor mode: The privileged operation state of a processor. In supervisor mode, software, typically the operating system, can access all control registers and can access the supervisor memory space, among other privileged operations. Synchronization: A process to ensure that operations occur strictly in order. See Context synchronization and Execution synchronization. Synchronous exception: An exception that is generated by the execution of a particular instruction or instruction sequence. There are two types of synchronous exceptions, precise and imprecise
117. that normal operation should proceed. See Section 8.7.3, "Reset Inputs." Timing Comments: Assertion: May occur at any time and may be asserted asynchronously to the 750 input clock. The SRESET input is negative-edge sensitive. Negation: May be negated two bus cycles after assertion. This input has additional functionality in certain test modes. 7.2.9.7 Processor Status Signals: Processor status signals indicate the state of the processor. This includes the memory reservation signal, machine quiesce control signals, time base enable signal, and TLBISYNC signal. 7.2.9.7.1 Quiescent Request (QREQ), Output: Following are the state meaning and timing comments for QREQ. State Meaning: Asserted: Indicates that the 750 is requesting all bus activity normally required to be snooped to terminate or to pause so the 750 may enter a quiescent (low-power) state. When the 750 has entered a quiescent state, it no longer snoops bus activity. Negated: Indicates that the 750 is not making a request to enter the quiescent state. Timing Comments: Assertion/Negation: May occur on any cycle. QREQ will remain asserted for the duration of the quiescent state. 7.2.9.7.2 Quiescent Acknowledge (QACK), Input: Following are the state meaning and timing comments for the QACK signal. State Meaning: Asserted: Indicates that all bus activity that requires snooping has terminated or paused, and
118. that the 750 may enter the quiescent (or low-power) state. Negated: Indicates that the 750 may not enter a quiescent state and must continue snooping the bus. Timing Comments: Assertion/Negation: May occur on any cycle following the assertion of QREQ, and must be held asserted for at least one bus clock cycle. Start-Up: QACK is sampled at the negation of HRESET to select reduced-pinout mode; if QACK is asserted at start-up, reduced-pinout mode is disabled. Note: Since the 750 does not support reduced-pinout mode, QACK must be asserted during start-up. 7.2.9.7.3 Reservation (RSRV), Output: Following are the state meaning and timing comments for RSRV. State Meaning: Asserted/Negated: Represents the state of the reservation coherency bit in the reservation address register that is used by the lwarx and stwcx. instructions. See Section 8.8.1, "Support for the lwarx/stwcx. Instruction Pair." Timing Comments: Assertion/Negation: Occurs synchronously with respect to bus clock cycles. The execution of an lwarx instruction sets the internal reservation condition. 7.2.9.7.4 Time Base Enable (TBEN), Input: Following are the state meaning and timing comments for the TBEN signal. State Meaning: Asserted: Indicates that the time base should continue clocking. This input is essentially a count enable control for the time base counter. Negated: Indicates the time base should stop clocking. Timing Comments: Assertion/Negation:
119. the QREQ signal. This signal allows the system to terminate or pause any bus activities that are normally snooped. When the system is ready to enter the system quiesce state, it asserts the QACK signal. At this time the 750 may enter a quiescent (low power) state. When the 750 is in the quiescent state, it stops snooping bus activity. While the 750 is in the nap power state, the system power controller can enable snooping by the 750 by deasserting the QACK signal for at least eight bus clock cycles, after which the 750 is capable of snooping bus transactions. The reassertion of QACK following the snoop transactions will cause the 750 to reenter the nap power state. 8.8 Processor State Signals. This section describes the 750's support for atomic update of memory through the use of the lwarx/stwcx. opcode pair, and includes a description of the TLBISYNC input. 8.8.1 Support for the lwarx/stwcx. Instruction Pair. The Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed (stwcx.) instructions provide a means for atomic memory updating. Memory can be updated atomically by setting a reservation on the load and checking that the reservation is still valid before the store is performed. In the 750, the reservations are made on behalf of aligned 32-byte sections of the memory address space. The reservation (RSRV) output signal is driven synchronously with the bus clock and reflects the status of the reservation coherency bi
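The 32-byte reservation granularity described in Section 8.8.1 can be illustrated with a small C sketch. It only models the address arithmetic (which aligned 32-byte section an address falls in, and whether two addresses would share one); it does not model the processor's internal reservation logic. The function and macro names are hypothetical and exist only for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Reservations cover aligned 32-byte sections, as stated in the text. */
#define RESERVATION_GRANULE_BYTES 32u

/* Base address of the aligned 32-byte section containing addr
   (hypothetical helper, not part of any real API). */
static uint32_t reservation_granule_base(uint32_t addr)
{
    return addr & ~(RESERVATION_GRANULE_BYTES - 1u);
}

/* True if both addresses fall in the same aligned 32-byte section. */
static bool same_reservation_granule(uint32_t a, uint32_t b)
{
    return reservation_granule_base(a) == reservation_granule_base(b);
}
```

One consequence worth noting, under the assumption that a store by another bus master to any byte of the reserved section affects the reservation: unrelated variables placed in the same 32-byte section as a lock word can cause the conditional store to fail more often than expected.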
120. the effective address and the operand length exceeds the maximum effective address, the memory operand is considered to wrap around from the maximum effective address through effective address 0, as described in the following paragraphs. Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. A carry from bit 0 is ignored. Load and store operations have the following modes of effective address generation: • EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index) • EA = (rA|0) + rB (register indirect with index). Refer to Section 2.3.4.3.2, "Integer Load and Store Address Generation," for a detailed description of effective address generation for load and store operations. Branch instructions have three categories of effective address generation: • Immediate • Link register indirect • Count register indirect. 2.3.2.4 Synchronization. The synchronization described in this section refers to the state of the processor that is performing the synchronization. 2.3.2.4.1 Context Synchronization. The System Call (sc) and Return from Interrupt (rfi) instructions perform context synchronization by allowing previously issued instructions to complete before performing a change in context. Execution of one of these instructions ensures the following: • No higher priority exception exists (sc). • All previous instructions have c
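The two load/store effective address modes listed above can be sketched in C as follows. This is a minimal model under stated assumptions: the function and parameter names are hypothetical, the immediate offset is taken to be the sign-extended 16-bit displacement of a D-form instruction, and `uint32_t` addition stands in for "32-bit unsigned binary arithmetic" with the carry out of bit 0 discarded.

```c
#include <stdint.h>

/* (rA|0): when the rA field is 0, the base is the value 0,
   not the contents of GPR0. */
static uint32_t base_ra_or_zero(unsigned ra_field, uint32_t ra_contents)
{
    return (ra_field == 0u) ? 0u : ra_contents;
}

/* EA = (rA|0) + offset: register indirect with immediate index.
   The displacement is sign-extended, then added modulo 2^32. */
static uint32_t ea_immediate_index(unsigned ra_field, uint32_t ra_contents,
                                   int16_t d)
{
    return base_ra_or_zero(ra_field, ra_contents) + (uint32_t)(int32_t)d;
}

/* EA = (rA|0) + rB: register indirect with index. */
static uint32_t ea_indexed(unsigned ra_field, uint32_t ra_contents,
                           uint32_t rb_contents)
{
    return base_ra_or_zero(ra_field, ra_contents) + rb_contents;
}
```

Because unsigned overflow in C wraps modulo 2^32, a base near the top of the address space plus a positive offset wraps through address 0, which matches the wrap-around behavior the text describes.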
121. the invalidation operation begins (usually the next cycle after the write operation to the register). The data cache must be enabled for the invalidation to occur. An invalidate operation is issued that marks the state of each data cache block as invalid without writing back modified cache blocks to memory. Cache access is blocked during this time. Bus accesses to the cache are signaled as a miss during invalidate-all operations. Setting DCFI clears all the valid bits of the blocks and the PLRU bits to point to way L0 of each set. Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically resets these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0). Setting this bit clears all the valid bits of the blocks and the PLRU bits to point to way L0 of each set. Note: In the PowerPC 603 and PowerPC 603e processors, the proper use of the ICFI and DCFI bits was to set them and clear them in two consecutive mtspr operations. Software that already has this sequence of operations does not need to be changed to run on the 750. Table 2-4. HID0 Bit Functions (Continued). Speculative cache access disable. 0: Speculative bus accesses to nonguarded space (G = 0) from both the instruction and data caches are enabled. 1: Speculative bus accesses to nonguarded space in both caches are disabled. IFEM: Enable M bit on bus for instruction fetches. 0: M bit disabled. Instruction fetches are treated as
122. these execution units in any combination. When an instruction is dispatched, it is assigned a position in the six-entry completion queue. A branch instruction can be issued on the same clock cycle for a maximum three-instruction dispatch. • During the execute pipeline stage, each execution unit that has an executable instruction executes the selected instruction (perhaps over multiple cycles), writes the instruction's result into the appropriate rename register, and notifies the completion stage that the instruction has finished execution. In the case of an internal exception, the execution unit reports the exception to the completion pipeline stage and (except for the FPU) discontinues instruction execution until the exception is handled. The exception is not signaled until that instruction is the next to be completed. Execution of most floating-point instructions is pipelined within the FPU, allowing up to three instructions to be executing in the FPU concurrently. The FPU stages are multiply, add, and round-convert. Execution of most load/store instructions is also pipelined. The load/store unit has two pipeline stages. The first stage is for effective address calculation and MMU translation, and the second stage is for accessing the data in the cache. • The complete pipeline stage maintains the correct architectural machine state and transfers execution results from the rename registers to the GPRs and FPRs (and CTR and LR, for some instructions) as
123. throughout this chapter. [Figure 5-6. General Flow of Page and Direct-Store Interface Address Translation] If SR[T] = 0, page address translation is selected. The information in the segment descriptor is then used to generate the 52-bit virtual address. The virtual address is then used to identify the page address translation information (stored as page table entries, PTEs) in a page table in memory. For increased performance, the 750 has two on-chip TLBs to cache recently used translations on-chip. If an access hits in the appro
124. transfer the data and to ensure the integrity of the transfer. • Data transfer termination: Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated. • L2 cache address/data: The 750 has separate address and data buses for accessing the L2 cache (not supported in the PowerPC 740). • L2 cache clock/control: These signals provide clocking and control for the L2 cache (not supported in the 740). • Interrupts/resets: These signals include the external interrupt signal, checkstop signals, and both soft reset and hard reset signals. They are used to interrupt and, under various conditions, to reset the processor. • Processor status and control: These signals are used to set the reservation coherency bit, enable the time base, and other functions. They are also used in conjunction with such resources as secondary caches and the time base facility. • Clock control: These signals determine the system clock frequency. They can also be used to synchronize multiprocessor systems. • Test interface: The JTAG (IEEE 1149.1a-1993) interface and the common on-chip process
125. unit effectively eliminates its power consumption. The operation of DPM is completely transparent to software or any external hardware. Dynamic power management is enabled by setting HID0[DPM] to 1. 10.2 Programmable Power Modes. The 750 provides four programmable power states: full power, doze, nap, and sleep. Software selects these modes by setting one (and only one) of the three power saving mode bits in the HID0 register. Hardware can enable a power management state through external asynchronous interrupts. Such a hardware interrupt causes the transfer of program flow to interrupt handler code that then invokes the appropriate power saving mode. The 750 provides a separate interrupt and interrupt vector for power management: the system management interrupt (SMI). The 750 also contains a decrementer, which allows it to enter the nap or doze mode for a predetermined amount of time and then return to full power operation through a decrementer interrupt. Note that the 750 cannot switch from one power management mode to another without first returning to full power mode. The sleep mode disables bus snooping; therefore, a hardware handshake is provided to ensure coherency before the 750 enters this power management mode. Table 10-1 summarizes the four power states. Table 10-1. PowerPC 750 Microprocessor Programmable Power Modes. Columns: PM Mode, Functioning Units, Activation Method, Full-Power Wake-Up Method. F
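The "one and only one" rule for the three power saving mode bits can be checked in software before a value is written. The sketch below is an illustration only: the mask values are placeholders chosen for the example, not the real HID0 bit positions, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Placeholder masks for illustration only; consult the HID0 bit
   definitions for the actual positions of the doze, nap, and sleep bits. */
#define PM_DOZE  0x4u
#define PM_NAP   0x2u
#define PM_SLEEP 0x1u
#define PM_ALL   (PM_DOZE | PM_NAP | PM_SLEEP)

/* True if exactly one of the three power saving mode bits is set.
   Bits outside PM_ALL are ignored. */
static bool power_mode_selection_valid(uint32_t mode_bits)
{
    uint32_t m = mode_bits & PM_ALL;
    /* m & (m - 1) clears the lowest set bit; the result is zero
       only when at most one bit was set. */
    return m != 0u && (m & (m - 1u)) == 0u;
}
```

A request with no mode bit set is rejected here because the function validates a power saving request; leaving all three bits clear corresponds to staying in full power mode.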
126. word-aligned load or store double operation to or from the floating-point GPRs. All cache-inhibited instruction fetches are performed as word (single-beat) operations. Data transactions of eight data beats are performed for burst operations that load into or store from the 750's internal caches. These transactions transfer 32 bytes in the same way as in 64-bit data bus mode, asserting the TBST signal and signaling a transfer size of 2 (TSIZ[0–2] = 0b010). The same bus protocols apply for arbitration, transfer, and termination of the address and data tenures in the 32-bit data bus mode as they apply to the 64-bit data bus mode. Late ARTRY cancellation of the data tenure applies on the bus clock after the first data beat is acknowledged (after the first TA) for word or smaller transactions, or on the bus clock after the second data beat is acknowledged (after the second TA) for double-word or burst operations (or coincident with the respective TA if no-DRTRY mode is selected). An example of an eight-beat data transfer while the 750 is in 32-bit data bus mode is shown in Figure 8-22. Figure 8-22. 32-Bit Data Bus Transfer (Eight-Beat Burst). An example of a two-beat data transfer (with DRTRY asserted during each data tenure) is shown in Figure 8-23. Figure 8-23. 32-Bit Data Bus Transfer (Two-Beat Burst with DRTRY). The 750 selects 64-bit or 32-bit data bus mode at startup by sampling the state
127. 0 SITV: Sample interval timer value. Number of elapsed processor clock cycles before a junction temperature vs. threshold comparison result is sampled for TIN bit setting and interrupt generation. This is necessary due to the thermal sensor, DAC, and the analog comparator settling time being greater than the processor cycle time. The value should be configured to allow a sampling interval of 20 microseconds. Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V] is set to 1. 10.3.2 Thermal Assist Unit Operation. The TAU can be programmed to operate in single- or dual-threshold modes, which results in the TAU generating a thermal management interrupt when one or both threshold values are crossed. In addition, with the appropriate software routine, the TAU can also directly determine the junction temperature. The following sections describe the configuration of the TAU to support these modes of operation. 10.3.2.1 TAU Single-Threshold Mode. When the TAU is configured for single-threshold mode, either THRM1 or THRM2 can be used to contain the threshold value, and a thermal management interrupt is generated when the threshold value is crossed. To configure the TAU for single-threshold operation, set the desired temperature threshold, TID, TIE, and V bits for either THRM1 or THRM2. The unused THRMn threshold SPR should be disabled by clearing the V bit to 0. In this discussion THRM
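Since SITV counts processor clock cycles and the text recommends a 20-microsecond sampling interval, the value depends on the processor clock frequency. The following C sketch shows the arithmetic; the function name is hypothetical, and the sketch does not check that the result fits the width of the SITV field, which the caller would need to verify against the register definition.

```c
#include <stdint.h>

/* Number of processor clock cycles in a 20-microsecond interval
   for a processor clock of cpu_hz hertz (rounded down). */
static uint32_t sitv_cycles_for_20us(uint32_t cpu_hz)
{
    /* 64-bit intermediate avoids overflow of cpu_hz * 20. */
    return (uint32_t)(((uint64_t)cpu_hz * 20u) / 1000000u);
}
```

For example, a 400 MHz processor clock gives 400,000,000 × 20 / 1,000,000 = 8000 cycles.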
128. 0 Integer Rotate Instructions s ciss sadscasdcatavssieaisadedaacnpadavesndentascedea cata DEEN EA 2 41 Integer SHITE Instructions EE EE 2 41 Floating Point Arithmetic Instructions ooooccnoccnoncnnnoncnoncnonononononn conocio nono nancnnnnnos 2 42 Floating Point Multiply Add Instructions ooococcconccnocanonncconcnonnnonn nono nononcnnnncnnnno 2 43 Floating Point Rounding and Conversion Instructions cooonnoccnononnoncnonanannninnno 2 43 Floating Point Compare Instructions oocoonoccnonononcnonannnoncnonoconanonn nono n co nncnnanrnnno 2 43 Floating Point Status and Control Register Instructions oooccnonnnnnccnonnnannnnnnos 2 44 Floating Point Move Instructions ooooccnoccnnoncnonononcnonnnon nono nononn conc conocio no ncnanncnnnno 2 44 Integer Load Instructions dead 2 47 Integer Store Ee e 2 48 Integer Load and Store with Byte Reverse Instructions ooncccnnnnnnoninonnnacnnnnnos 2 49 Integer Load and Store Multiple Instructions 00 0 ee eee eeeeeeeseeeseceseeeeeeeenees 2 49 xxi Paragraph Number Table 2 36 Table 2 37 Table 2 38 Table 2 39 Table 2 40 Table 2 41 Table 2 42 Table 2 43 Table 2 44 Table 2 45 Table 2 46 Table 2 47 Table 2 48 Table 2 49 Table 2 50 Table 2 51 Table 2 52 Table 2 53 Table 2 54 Table 2 55 Table 2 56 Table 2 57 Table 2 58 Table 2 59 Table 3 1 Table 3 2 Table 3 3 Table 3 4 Table 3 5 Table 3 6 Table 3 7 Table 4 1 Table 4 2 Table 4 3 Table 4 4 Table 4 5 Table 4 6 Table 4 7 T
129. 0 Number of fall through branches Indicates the number of branches that were predicted not taken 00 1001 Switches between Privileged and User Counts the number of times that the MSR PR bit toggles 00 1010 Reserved loads Incremented every time that a reserved load completes 00 1011 Loads and stores Counts all load and store instructions completed 00 1100 Number of snoops Gives the total number of snoops to the L1 and the 001101 L1 castouts to L2 Number of times the L1 castout goes to the L2 001110 System Unit Instructions Number of system unit instructions completed 001111 Instruction Miss cycles Counts the total number of L1 miss cycles of instruction fetches 010000 First speculative branch resolved correctly Indicates the number of branches that allow speculative execution beyond those that resolved correctly All others RESERVED May be used in a later revision Bits MMCR1 0 4 specify events associated with PMC3 as shown in Table 11 7 Table 11 7 PMC3 Events MMCR1 0 4 Select Encodings EE 0 0011 Number of TBL bit transitions from 0 to 1 of specified bits in time base lower register Bits are specified through RTCSELECT MMRCO 7 8 0 47 1 51 2 55 3 63 0 0100 Number of instructions dispatched 0 1 or 2 per cycle 11 8 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Table 11 7 PMC3 Events MMCR1 0 4 Select Encodings Continued E OA number of MSR PM toggles w
130. 0 asserts BR before the external arbiter asserts BG, the 750 is considered to be unparked, as shown in Figure 8-5. Figure 8-6 shows the parked case, where a qualified bus grant exists on the clock edge following a need_bus condition. Notice that the bus clock cycle required for arbitration is eliminated if the 750 is parked, reducing overall memory latency for a transaction. The 750 always negates ABB for at least one bus clock cycle after AACK is asserted, even if it is parked and has another transaction pending. Typically, bus parking is provided to the device that was the most recent bus master; however, system designers may choose other schemes, such as providing unrequested bus grants in situations where it is easy to correctly predict the next device requesting bus mastership. Figure 8-6. Address Bus Arbitration Showing Bus Parking. When the 750 receives a qualified bus grant, it assumes address bus mastership by asserting ABB and negating the BR output signal. Meanwhile, the 750 drives the address for the requested access onto the address bus and asserts TS to indicate the start of a new transaction. When designing external bus arbitration logic, note that the 750 may assert BR without using the bus after it receives the qualified bus grant. For example, in a system using bus snooping, if the 750 asserts BR to perform a replacement copy-back operation, anothe
131. 00800: As defined by the PowerPC architecture. 00900: As defined by the PowerPC architecture, when the most significant bit of the DEC register changes from 0 to 1 and MSR[EE] = 1. 00C00: Execution of the System Call (sc) instruction. 00D00: MSR[SE] = 1, or a branch instruction completes and MSR[BE] = 1. Unlike the architecture definition, isync does not cause a trace exception. 00E00: The 750 does not generate an exception to this vector. Other PowerPC processors may use this vector for floating-point assist exceptions. 00F00: The limit specified in a PMC register is reached and MMCR0[ENINT] = 1. 01300: IABR[0–29] matches EA[0–29] of the next instruction to complete, IABR[TE] matches MSR[IR], and IABR[BE] = 1. 01400: MSR[EE] = 1 and SMI is asserted. 01700: Thermal management is enabled, the junction temperature exceeds the threshold specified in THRM1 or THRM2, and MSR[EE] = 1. 1.8 Memory Management. The following subsections describe the memory management features of the PowerPC architecture and the 750 implementation, respectively. A detailed description of the 750 MMU implementation is provided in Chapter 5, "Memory Management." 1.8.1 PowerPC Memory Management Model. The primary functions of the MMU are to translate logical (effective) addresses to physical addresses for memory accesses and to provide access protection on blocks and pages of memory. There are tw
132. 1 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ridicx 30 S 3 sh Rc ridiclx 30 S D sh Rc ridicrx 30 S A sh me 1 sh Re rldimix 30 S A sh mb 3 shiRe Note 1 64 bit instruction Appendix A PowerPC Instruction Set Listings A 39 A 40 Table A 45 MDS Form Specific Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 rldclx 1 30 S A B mb 8 Re riderx Note 1 64 bit instruction IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual A 5 Instruction Set Legend Table A 46Table A 47 provides general information on the PowerPC instruction set such as the architectural level privilege level and form Table A 46 PowerPC Instruction Set Legend eee C a AAA ade LL TI TI se KEE ET E AA awo WEE ST WE NEE NEE e E AE O E ERA ps TT E TS Tes asame II po vo A A Appendix A PowerPC Instruction Set Listings A 41 Table A 46 PowerPC Instruction Set Legend Continued ese Tee E opin re eso O EE EE AA e M E DE es EEN ER B e ee ERT E EE EEN NC E AE E AE gt 1 3 ESE AE AER EE e AE O AA Bo a A A A ao SA E AA a A AE e E NAAA AA AAA a Y E TE E ERA ES FA EE EU SO PS E A E E AS SEA A AN gt UE E SP A E EEE AA gt gt E DEE A AA COM Aa WE AA O A EE e O AH APA e O NE E e AA A A E EA E EA AE AER gt AO EA HE AA a E O AAA AER A A EA EAS ESSE PE 2 OE ERA RS XLS EorE EENS e TT
133. 10110 01 0111011100 Re 01 U Oo o 0 0111101001 Re IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 divwx 01 D A B oH 0111101011 Rc slbia 125 01 00000 00000 00000 0111110010 0 merxr 01 cr D 00 00000 00000 1000000000 0 ve on e CK mon on a LI mens fo Ifsx 01 B 1000010111 0 srwx 01 B 1000011000 Re srdx 01 B 1000011011 Re tlbsync 925 01 00000 1000110110 0 SE e menon fe e on onoo onser fo Iswi 01 NB 1001010101 0 sync 01 00000 1001010110 0 Ifdx 01 B 1001010111 0 Ifdux 01 D B 1001110111 0 wu 3 o eooo e towooroor o wel 217 os a e gt ooon fo stwbrx 01 S A B 1010010110 0 stfsx 01 S A B 1010010111 0 stfsux 01 S A B 1010110111 0 stswi 01 S A NB 1011010101 0 wel onn s a f e f reneo fo swar mu ooon a e os fo stfdux 01 S A B 1011110111 0 Ihbrx 01 D A B 1100010110 0 srawx 01 S A B 1100011000 Re sradx 01 S A B 1100011010 Re nl ormi 90000 ooooo ooooo Gegen fol sthbrx 01 S A B 1110010110 0 extshx 01 S A 00000 1110011010 Re extsbx 01 S A 00000 1110111010 Re icbi 01 00000 A B 1111010110 0 Appendix A PowerPC Instruction Set Listings A 13 Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
134. 2 4 Address Transfer Attribute Signals, describes the encodings for the address transfer attribute signals. 8.3.2.2.1 Transfer Type (TT[0–4]) Signals. Snooping logic should fully decode the transfer type signals if the GBL signal is asserted. Slave devices can sometimes use the individual transfer type signals without fully decoding the group. For a complete description of the encoding for TT[0–4], refer to Table 8-1 and Table 8-2. 8.3.2.2.2 Transfer Size (TSIZ[0–2]) Signals. The TSIZ[0–2] signals indicate the size of the requested data transfer, as shown in Table 8-1. The TSIZ[0–2] signals may be used along with TBST and A[29–31] to determine which portion of the data bus contains valid data for a write transaction or which portion of the bus should contain valid data for a read transaction. Note that for a burst transaction (as indicated by the assertion of TBST), TSIZ[0–2] are always set to 0b010. Therefore, if the TBST signal is asserted, the memory system should transfer a total of eight words (32 bytes), regardless of the TSIZ[0–2] encodings. [Table 8-1. Transfer Size Signal Encodings] The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cache line). Data transfers that cross an aligned, 32-byte boundary either must present a new add
135. 2 4 L2 Cache Access Timing Considerations (PowerPC 750 Only). If an instruction fetch misses both the BTIC and the on-chip instruction cache, the 750 next looks in the L2 cache. If the requested instructions are there, they are burst into the 750 in much the same way as shown in Figure 6-6. The formula for the L2 cache latency for instruction accesses is as follows: 1 processor clock + 3 L2 clocks + 1 processor clock. Therefore, if the L2 is operating in 2:1 mode, the instruction fetch takes 8 processor clock cycles. Additional factors can also affect this latency, including the type of memory used to implement the L2 and whether the processor clock and L2 clocks are aligned immediately. For more information about the L2 cache implementation, see Chapter 9, "L2 Cache Interface Operation." 6.3.3 Instruction Dispatch and Completion Considerations. Several factors affect the 750's ability to dispatch instructions at a peak rate of two per cycle: the availability of the execution unit, destination rename registers, and completion queue, as well as the handling of completion-serialized instructions. Several of these limiting factors are illustrated in the previous instruction timing examples. To reduce dispatch unit stalls due to instruction data dependencies, the 750 provides a single-entry reservation station for the FPU, SRU, and each IU, and a two-entry reservation station for the LSU. If a data depend
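The L2 instruction-fetch latency formula quoted above can be worked through in a few lines of C. This is a sketch of the arithmetic only; the function name is hypothetical, the ratio parameter is the number of processor clocks per L2 clock, and the additional factors the text mentions (memory type, clock alignment) are not modeled.

```c
/* Latency in processor clocks:
   1 processor clock + 3 L2 clocks + 1 processor clock.
   core_to_l2_ratio is processor clocks per L2 clock (2.0 for 2:1 mode). */
static double l2_ifetch_latency_cycles(double core_to_l2_ratio)
{
    return 1.0 + 3.0 * core_to_l2_ratio + 1.0;
}
```

With a 2:1 ratio this gives 1 + 3 × 2 + 1 = 8 processor clock cycles, matching the figure stated in the text.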
136. 23 24 25 26 27 28 29 30 31 000010 TO A SIMM 000011 TO A SIMM 000111 D A SIMM cover ff af 001011 crfD OIL A 001100 D A 001101 D A 001110 D A 001111 D A 010000 010001 010010 010011 crfD 00 op 00 0000000000 0 010011 BO Bl 0000010000 LK E O O wove oooooroore fo moon ow on Les ooooroooo fo 010011 0000110010 010011 0010000001 010011 0010010110 010011 0011000001 moon ew on ees zemmer fo moon ew on o mammen fo 010011 croD crbA crobB 0100100001 0 010011 croD crbA croB 0110100001 0 010011 croD crbA croB 0111000001 0 A 9 Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 bectrx 010011 BO BI 00000 1000010000 LK rlwimix 010100 S A SH MB ME Re rlwinmx 010101 S A SH MB ME Re oris 011001 S A xori 011010 S A xoris 011011 S A andi 011100 S A andis 011101 ridielx1 011110 mb 000 ridicrx 011110 S A sh Rc ridicx 011110 S A sh sh Re ridimix 011110 S A sh sh Rc rldcix 011110 S A B mb Re ool 211 em efe a L mmer fo tw 011111 TO A B 0000000100 0 subfex 011111 D A B 0000001000 Re mulhdux 011111 D A B 0000001001 Rc addex 011111 D A B 0000001010 Rc mum 22117 1 o f a e o mom fe wel orm o ooooo ope mme fol Iwarx 011111 D A B 0000010100 0 Idx 011111 D A B 0000010101 0
137. 3 lwarx stwcx support 8 43 M Machine check exception 4 17 MCP machine check interrupt signal 7 21 MEI protocol hardware considerations 3 9 read operations 3 23 state transitions 3 31 Memory accesses 8 6 Memory coherency bit M bit cache interactions 3 6 timing considerations 6 27 Memory control instructions description 2 62 2 66 segment register manipulation A 28 SLB management A 28 Memory management unit address translation flow 5 12 address translation mechanisms 5 9 5 12 block address translation 5 9 5 12 5 21 block diagrams 32 bit implementations 5 6 DMMU 5 8 IMMU 5 7 exceptions summary 5 16 features summary 5 3 implementation specific features 5 2 instructions and registers 5 18 memory protection 5 11 overview 1 12 5 2 Index 6 page address translation 5 9 5 12 5 28 page history status 5 12 5 21 5 25 real addressing mode 5 12 5 20 segment model 5 21 Memory synchronization instructions 2 59 2 61 A 24 Misaligned data transfer 8 21 Misalignment misaligned accesses 2 29 misaligned data transfer 8 19 MMCRN monitor mode control registers 2 14 4 23 11 3 MSR machine state register bit settings 4 8 FEO FE1 bits 4 10 IP bit 4 13 PM bit 2 4 RI bit 4 11 settings due to exception 4 12 Multiple precision shifts 2 41 Multiply add instructions A 20 N No DRTRY mode 8 41 O OEA exception mechanism 4 1 memory management specifications 5 1 registers 2 4 Operand conventions 2 28 Operand placeme
138. 3 21 Data Cache Block Push Operation ccesscccesseecseececeeeeecseeeecseeeeeseeeenaeees 3 22 Enveloped High Priority Cache Block Push Operation oooocnoccccnonccnnnns 3 22 L1 Caches and 60x Bus Eecher edd 3 22 Read Operations and the MEI Protocol ooooccnococonoccconoccconacccnoncnononcnonanccinnnnnos 3 23 Bus Operations Caused by Cache Control Instructions ooooocccnoncccnoncninnnnnos 3 24 SACA Ee Ee 3 25 Snoop Response to 60x Bus Transactions oocoocoocccconcccnoncncnonanononccononcconnnacinns 3 26 Transfer Attributes ir eege ee eae eee 3 29 IAE AAA n E AREE a p aieka as 3 31 Chapter 4 Exceptions PowerPC 750 Microprocessor Exceptions sssssssessssssesssesesseeesseessresseesseesseee 4 2 Exception Recognition and Protes 4 4 Exception Processing eorna nrin incio 4 7 Enabling and Disabling Exceptions oooooccoococonococonancconnncnonnncnnonononnncconnncnnnnnnnos 4 10 Steps for Exception Processing seseesesesseessesssersseressetrssressresserssseesseeesseese 4 10 Seti MSRIR en ere 4 11 Returning from an Exception Handler 4 11 ageet 4 12 Exception MEIER eege eene Eeer 4 12 System Reset Exception 0x00100 0 ee ceecceeeeseeceececeeeeeceeeeecseeeeceeeeeeseeeee 4 13 BOLERO EE 4 14 Hard Restan ts is e la Tocata teenie 4 15 Machine Check Exception OxD00200 siii il i 4 17 Machine Check Exception Enabled MSR ME IN 4 18 Cheekstop State MSR ME O sccisssiccsvsscecsssavcenetsscadcees sececdenassac
139. Table 4-4. MSR Bit Settings (Continued). Branch trace enable. 0: The processor executes branch instructions normally. 1: The processor generates a branch type trace exception when a branch instruction executes successfully. IEEE floating-point exception mode 1 (see Table 4-5). Reserved. This bit corresponds to the AL bit of the POWER architecture. Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended with Fs or 0s. In the following description, nnnnn is the offset of the exception. 0: Exceptions are vectored to the physical address 0x000n_nnnn. 1: Exceptions are vectored to the physical address 0xFFFn_nnnn. Instruction address translation. 0: Instruction address translation is disabled. 1: Instruction address translation is enabled. For more information, see Chapter 5. Data address translation. 0: Data address translation is disabled. 1: Data address translation is enabled. For more information, see Chapter 5. Reserved. Full function. Performance monitor marked mode. 0: Process is not a marked process. 1: Process is a marked process. 750-specific; defined as reserved by the PowerPC architecture. For more information about the performance monitor, see 4.5.13. Indicates whether system reset or machine check exception is recoverable. 0: Exception is not recoverable. 1: Exception is recoverable. The RI bit indicates whether from th
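The exception prefix rule in the table above (offset prepended with 0s or Fs) amounts to a simple address computation, sketched below in C. The function name is hypothetical; the offsets used in the examples are exception vector offsets that appear elsewhere in this manual.

```c
#include <stdint.h>
#include <stdbool.h>

/* Physical address of an exception vector.
   prefix_bit = false -> 0x000n_nnnn, prefix_bit = true -> 0xFFFn_nnnn,
   where n_nnnn is the 20-bit exception vector offset. */
static uint32_t exception_vector_address(bool prefix_bit, uint32_t offset)
{
    uint32_t prefix = prefix_bit ? 0xFFF00000u : 0x00000000u;
    return prefix | (offset & 0x000FFFFFu);
}
```

For instance, the system reset offset 0x00100 maps to 0x00000100 with the prefix bit cleared and to 0xFFF00100 with it set.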
140. 41 Floating Point Performance Registers IBAT3U SPR534 DBAT3U SPR542 SDRi Monitor Registers FPRO IBAT3L SPR 535 DBAT3L SPR 543 SDR1 SPR 25 For Reading Eege FERI Exception Handling Registers SPRGs Data Address Save and Restore UPMC1 SPR 937 See eens Register Registers UPMC2 SPR 938 FPR31 spe epp zs DAR SPR 19 SRRO 1 SPR 26 UPMC3 SPR 941 Condition SPRG2 SPR274 DSISR See UPMC4 SPR 942 Register SPRG3 SPR 275 DSISR SPR 18 Sampled Instruction CR Address Miscellaneous Registers USIA SPR 939 Be Bical External Address Time Base Decrementer atus an Register For Writin Monitor Control Control Register 9 For Writing DEC SPR 22 EAR SPR 282 TBL SPR 284 UMMCRO SPR 936 FPSCR c TBU SPR 285 UMMCR1 SPR 940 So Data Address L2 Control Instruction Address Breakpoint Register Register Breakpoint Register DABR SPR 1013 L2CR SPR 1017 IABR SPR 1010 Performance Monitor Registers Power Thermal Management Registers Performance Sampled Counters Instruction Thermal Assist Instruction Cache Address Unit Registers Throttling Control PMC1 SPR 953 S E Register lA PR 955 PMC SPR 954 THRM1 SPR 1020 ICTC SPR 1019 i THRM2 GPP 1021 THRM3 SPR 1022 PMC3 SPR 957 Monitor Contro PMC4 SPR 958 MMCRO 1 SPR 952 MMCR1 SPR 956 These registers are 750 specific registers They may not be supported by other PowerPC pro
141. 420 2677 International. PowerPC Documentation. The PowerPC documentation is available from the sources listed inside the front cover of this manual; the document order numbers are included in parentheses for ease in ordering. • Programming environments manuals: This book provides information about resources defined by the PowerPC architecture that are common to PowerPC processors. PowerPC Microprocessor Family: The Programming Environments (G522-0290-00). • Implementation Variances Relative to Rev. 1 of The Programming Environments Manual is available via the world-wide web at http://www.chips.ibm.com. • Hardware specifications: Hardware specifications provide specific data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations for each PowerPC implementation. These include the following: PowerPC 740 and PowerPC 750 Embedded RISC Microprocessor Hardware Specifications, available via the world-wide web at http://www.chips.ibm.com; PowerPC 750 SCM RISC Microprocessor Hardware Specification (G522-0324-00). • Technical summaries: Each PowerPC implementation has a technical summary that provides an overview of its features. This document is roughly the equivalent of the overview (Chapter 1) of an implementation's user's manual. PowerPC 750 RISC Microprocessor Technical Summa
142. 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 addcx 31 D A B OE 10 Re addex 31 D A B OE 138 Re addi 14 D A SIMM addic 12 D A SIMM addic 13 D A SIMM addis 15 D A SIMM addmex 31 D A 00000 OE 234 Re addzex 31 D A 00000 OE 202 Re Appendix A PowerPC Instruction Set Listings A 1 Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 andx 31 S A B 28 Rc andcx 31 andi 28 andis 29 bx 18 bcx 16 bectrx 19 belrx 19 cmp 31 cmpi 11 cmpl 31 cmpli 10 entlzdx 31 cntlzwx 31 crand 19 crandc 19 creqv 19 crnand 19 crnor 19 cror 19 crorc 19 crxor 19 dcba 31 dcbf 31 debi 31 dcbst 31 dcbt 31 dcbtst 31 dcbz 31 divdx 31 D A divdux 31 D A divwx 31 D A B OE 491 Re divwux 31 D A B OE 459 Re A 2 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 eciwx 31 D A B 310 0 ecowx 31 S A B 438 0 eieio 31 eqvx 31 extsbx 31 extshx 31 extswx 31 fabsx 63 faddx 63 faddsx 59 fcfidx 63 Tempo 63 fcmpu 63 fetidx 63 fetidzx 63 fetiwx 63 fctiwzx 63 fdivx 63 B fdivsx 59 B fmaddx 63 B fmaddsx 59 B fmsubsx 59 fmulx 63 fmulsx 59 fnabsx 63 fnegx 63 fnmaddx 63 D fnmaddsx 59 D fnmsubx 63 D fnmsub
143. 6 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31; eciwx 31 D A B 310 0; ecowx 31 S A B 438 0. A.4 Instructions Sorted by Form: Table A-31 through Table A-45 list the PowerPC instructions grouped by form. Key: Reserved bits. Table A-31. I-Form. Specific Instruction: Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31; bx 18 LI AA LK. Table A-32. B-Form: OPCD BO BI BD AA LK. Specific Instruction: Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31. Table A-33. SC-Form: OPCD 00000 00000 000000000000000 1 0. Specific Instruction: Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31; sc 17 00000 00000 000000000000000 1 0. Table A-34. D-Form. Specific Instructions: Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31; addi 14 D A SIMM; addic 12 D A SIMM; addic. 13 D A SIMM; addis 15 D A SIMM; andi. 28 S A UIMM; andis. 29 S A UIMM; cmpi 11 crfD 0 L A SIMM; cmpli 10 crfD 0 L A UIMM; lbz 34 D A d; lbzu 35 D A d; lfd 50 D A d; lfdu 51 D A d; lfs 48 D A d; lfsu 49 D A d; lha 42 D A d; lhau 43 D A d; lhz 40 D A d; lhzu 41 D A d; lmw 46 D A
144. 61 0; stw 36 S A d; Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31; stwbrx 31 S A B 662 0; stwcx. 31 S A B 150 1; stwu 37 S A d; subfx 31 D A B OE 40 Rc; subfcx 31 D A B OE 8 Rc; subfex 31 D A B OE 136 Rc; subfic 08 D A SIMM; sync 31 00000 00000 00000 598 0; td 31 TO A B 68 0; tdi 02 TO A SIMM; tlbia 297 31 00000 00000 00000 370 0; tlbie 2 31 00000 00000 B 306 0; tlbsync 9 31 00000 00000 00000 566 0; tw 31 TO A B 4 0; twi 03 TO A SIMM; xorx 31 S A B 316 Rc; xori 26 S A UIMM; xoris 27 S A UIMM. Notes: 1. 64-bit instruction. 2. Optional instruction. 3. Supervisor-level instruction. 4. Load/store string/multiple instruction. 5. Supervisor- and user-level instruction. 6. Optional 64-bit bridge instruction. 7. 32-bit instruction not implemented by the PowerPC 750. A.2 Instructions Sorted by Opcode: Table A-2 lists the instructions defined in the PowerPC architecture in numeric order by opcode. Name: tdi 1, twi, mulli, subfic, cmpli, cmpi, addic, addic., addi, addis, bcx, sc, bx, mcrf, bclrx, rfid 12, crnor, rfi 324, crandc, isync, crxor, crnand, crand, creqv, crorc, cror. Key: Reserved bits. Table A-2. Complete Instruction List Sorted by Opcode. 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
145. 64-bit data path for read and write operations. The 750 transfers data in either single-beat or four-beat burst transfers. Single-beat operations can transfer from 1 to 8 bytes at a time and can be misaligned; see Section 8.3.2.4, "Effect of Alignment in Data Transfers." Burst operations always transfer eight words and are aligned on eight-word address boundaries. Burst transfers can achieve significantly higher bus throughput than single-beat operations. The type of transaction initiated by the 750 depends on whether the code or data is cacheable and, for store operations, whether the cache is in write-back or write-through mode, which software controls on either a page or block basis. Burst transfers support cacheable operations only; that is, memory structures must be marked as cacheable (and write-back for data store operations) in the respective page or block descriptor to take advantage of burst transfers. The 750 output TBST indicates to the system whether the current transaction is a single-beat or four-beat transfer (except during eciwx/ecowx transactions, when it signals the state of EAR[28]). A burst transfer has an assumed address order. For load or store operations that miss in the cache (and are marked as cacheable and, for stores, write-back in the MMU), the 750 uses the double-word-aligned address associated with the critical code or data that initiated the transaction. This minimizes latency by allowing the critical code or data to be forwa
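As a rough illustration of the burst addressing described in item 145, the following Python sketch lists the double-word addresses of the four beats of a burst, starting with the critical double word. It assumes that the remaining beats follow sequentially and wrap around within the eight-word-aligned block; that wrap rule, and the names `burst_order`, `LINE_BYTES`, and `BEAT_BYTES`, are illustrative assumptions rather than text from this excerpt.

```python
LINE_BYTES = 32   # a burst always transfers eight 4-byte words
BEAT_BYTES = 8    # the data bus is 64 bits wide, so each beat is one double word

def burst_order(addr):
    """Return the double-word-aligned address of each of the four beats, in order.

    Assumption (not stated in the excerpt): after the critical double word,
    the remaining beats follow sequentially and wrap within the 32-byte block.
    """
    line_base = addr & ~(LINE_BYTES - 1)               # eight-word-aligned block
    first = (addr & (LINE_BYTES - 1)) // BEAT_BYTES    # index of the critical double word
    beats = LINE_BYTES // BEAT_BYTES
    return [line_base + ((first + i) % beats) * BEAT_BYTES for i in range(beats)]
```

For example, a miss at byte address 0x1014 falls in the third double word of the block at 0x1000, so under the stated assumption that double word is transferred first.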
146. [Figure 8-4. Overlapping Tenures on the 750 Bus for a Single-Beat Transfer; phases shown: arbitration, single-beat transfer, termination.] The basic functions of the address and data tenures are as follows: • Address tenure. Arbitration: During arbitration, address bus arbitration signals are used to gain mastership of the address bus. Transfer: After the 750 is the address bus master, it transfers the address on the address bus. The address signals and the transfer attribute signals control the address transfer. The address parity and address parity error signals ensure the integrity of the address transfer. Termination: After the address transfer, the system signals that the address tenure is complete or that it must be repeated. • Data tenure. Arbitration: To begin the data tenure, the 750 arbitrates for mastership of the data bus. Transfer: After the 750 is the data bus master, it samples the data bus for read operations or drives the data bus for write operations. The data parity and data parity error signals ensure the integrity of the data transfer. Termination: Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. The 750 generates an address o
147. BL reflects the M bit value specified for the memory reference in the corresponding translation descriptor(s). Note that care must be taken to minimize the number of pages marked as global, because the retry protocol discussed in the previous section is used to enforce coherency and can require significant bus bandwidth. When the 750 is not the address bus master, GBL is an input. The 750 snoops a transaction if TS and GBL are asserted together in the same bus clock cycle (this is a qualified snooping condition). No snoop update to the 750 cache occurs if the snooped transaction is not marked global. This includes invalidation cycles. When the 750 detects a qualified snoop condition, the address associated with the TS is compared against the data cache tags. Snooping completes if no hit is detected. If, however, the address hits in the cache, the 750 reacts according to the MEI protocol shown in Figure 8-15, assuming the WIM bits are set to write-back, caching-allowed, and coherency-enforced modes (WIM = 001). [Figure 8-15. MEI Cache Coherency Protocol State Diagram (WIM = 001). Legend: SH = Snoop Hit; RH = Read Hit; WH = Write Hit; WM = Write Miss; RM = Read Miss; SH/CRW = Snoop Hit, Cacheable Read/Write; SH/CIR = Snoop Hit, Caching-Inhibited Read. Bus transactions: Snoop Push, Cache Line Fill.]
148. BR: Breakpoint is disabled. Address is unknown. L2CR: 00000000. MMCRn: 00000000. THRMn: 00000000. UMMCRn: 00000000. UPMCn: 00000000. USIA: 00000000. XER: 00000000. PMCn: Unknown. ICTC: 00000000. The following is also true after a hard reset operation: • External checkstops are enabled. • The on-chip test interface has given control of the I/Os to the rest of the chip for functional use. • Since the reset exception has data and instruction translation disabled (MSR[DR] and MSR[IR] both cleared), the chip operates in direct address translation mode (referred to as the real addressing mode in the architecture specification). • Time from HRESET deassertion until the 750 asserts the first TS (bus parked on the 750) or BG is 8 to 12 bus clocks (SYSCLK). 4.5.2 Machine Check Exception (0x00200): The 750 implements the machine check exception as defined in the PowerPC architecture (OEA). It conditionally initiates a machine check exception after an address or data parity error occurs on the bus or in either the L1 or L2 cache, after receiving a qualified transfer error acknowledge (TEA) indication on the 750 bus, or after the machine check interrupt (MCP) signal has been asserted. As defined in the OEA, the exception is not taken if MSR[ME] is cleared, in which case the processor enters checkstop state. Certain machine check conditions can be enabled and disabled
149. Because this bit is cleared by a hard reset (but not by a soft reset), software can set this bit after a hard reset and tell whether a subsequent reset is a hard or soft reset by examining whether this bit is still set. See 2.1.2.2. The first bus operation following the negation of HRESET or the assertion of SRESET will be a single-beat instruction fetch (caching will be inhibited) to 0x00100. Table 4-7 lists register settings when a system reset exception is taken. Table 4-7. System Reset Exception Register Settings. SRR0: Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present. SRR1: bit 0, loaded with equivalent MSR bits; bits 1-4, cleared; bits 5-9, loaded with equivalent MSR bits; bits 10-15, cleared; bits 16-31, loaded with equivalent MSR bits. Note that if the processor state is corrupted to the extent that execution cannot resume reliably, MSR[RI] (SRR1[30]) is cleared. Table 4-7. System Reset Exception Register Settings (Continued): DR 0; PM 0; RI 0; LE set to value of ILE. 4.5.1.1 Soft Reset: If SRESET is asserted, the processor is first put in a recoverable state. To do this, the 750 allows any instruction at the point of completion to either complete or take an exception, blocks completion of any following instructions, and allows the completion queue to drain. The state before the exception occurred is then s
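The SRR1 settings listed for Table 4-7 in item 149 amount to a bit mask applied to the MSR. The Python sketch below is one way to express that, assuming the usual PowerPC convention that bit 0 is the most significant bit of the 32-bit register; the helper names `ppc_bits`, `SRR1_FROM_MSR`, and `srr1_after_system_reset` are made up for illustration, and the sketch does not model the case where MSR[RI] (SRR1[30]) is cleared because the state is unrecoverable.

```python
def ppc_bits(first, last, width=32):
    """Mask covering bits first..last inclusive, where bit 0 is the MSB."""
    count = last - first + 1
    return ((1 << count) - 1) << (width - 1 - last)

# SRR1 bits that are loaded from the MSR; bits 1-4 and 10-15 are cleared.
SRR1_FROM_MSR = ppc_bits(0, 0) | ppc_bits(5, 9) | ppc_bits(16, 31)

def srr1_after_system_reset(msr):
    """Value saved in SRR1 for a given MSR, per the bit ranges in Table 4-7."""
    return msr & SRR1_FROM_MSR
```

Under these assumptions the three ranges combine into the single constant 0x87C0FFFF.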
150. Both instructions must use the same EA. Reservation granularity is implementation-dependent. The 750 makes reservations on behalf of aligned 32-byte sections of the memory address space. If the W bit is set, executing lwarx and stwcx. to a page marked write-through does not cause a DSI exception, but DSI exceptions can result for other reasons. If the location is not word-aligned, an alignment exception occurs. The stwcx. instruction is the only load/store instruction with a valid form if Rc is set. If Rc is zero, executing stwcx. sets CR0 to an undefined value. In general, stwcx. always causes a transaction on the external bus and thus operates with slightly worse performance characteristics than normal store operations. (Store Word Conditional Indexed: stwcx. rS,rA,rB.) Table 2-49. Memory Synchronization Instructions, UISA (Continued). Synchronize (sync): Because it delays subsequent instructions until all previous instructions complete to where they cannot cause an exception, sync is a barrier against store gathering. Additionally, all load/store cache/bus activities initiated by prior instructions are completed. Touch load operations (dcbt, dcbtst) must complete address translation, but need not complete on the bus. If HID0[ABE] = 1, sync completes after a successful broadcast. The latency of sync depends on the processor state when it is dispatched and on various system-level situations. Therefore, frequent us
151. Instructions cannot be dispatched to an execution unit unless there is a vacancy in the completion queue. Branch instructions that do not update the CTR or LR are removed from the instruction stream and do not take an entry in the completion queue. Instructions that update the CTR and LR follow the same dispatch and completion procedures as non-branch instructions, except that they are not issued to an execution unit. Completing an instruction commits execution results to architected registers (GPRs, FPRs, LR, and CTR). In-order completion ensures the correct architectural state when the 750 must recover from a mispredicted branch or any exception. Retiring an instruction removes it from the completion queue. For a more detailed discussion of instruction completion, see Section 6.3.3, "Instruction Dispatch and Completion Considerations." 1.2.2.4 Independent Execution Units: In addition to the BPU, the 750 provides the five execution units described in the following sections. 1.2.2.4.1 Integer Units (IUs): The integer units IU1 and IU2 are shown in Figure 1-1. The IU1 can execute any integer instruction; the IU2 can execute any integer instruction except multiplication and division instructions. Each IU has a single-entry reservation station that can receive instructions from the dispatch unit and operands from the GPRs or the rename buffers. Each IU consists of three single-cycle subunits: a fast adder compa
152. CFI, DCE, DLOCK bits, 3-13; organization, 3-4. Data organization in memory, 2-28. Data transfers: alignment, 8-18; burst ordering, 8-17; eciwx and ecowx instructions, alignment, 8-21; operand conventions, 2-28; signals, 8-25. DBB (data bus busy) signal, 7-16, 8-10, 8-24. DBDIS (data bus disable) signal, 7-19. DBG (data bus grant) signal, 7-15, 8-10. DBWO (data bus write only) signal, 7-16, 8-10, 8-25, 8-45. dcbi, 2-66. dcbt, 2-63. DEC (decrementer register), 2-7. Decrementer exception, 4-21. Defined instruction class, 2-33. DHn/DLn (data bus) signals, 7-17. Dispatch: considerations, 6-16; dispatch unit resource requirements, 6-30. DPn (data bus parity) signals, 7-18. DRTRY (data retry) signal, 7-20, 8-26, 8-29. DSI exception, 4-19. DSISR register, 2-6. DTLB organization, 5-25. Dynamic branch prediction, 6-9. E: EAR (external access register), 2-7. Effective address calculation: address translation, 5-4; branches, 2-35; loads and stores, 2-35, 2-46, 2-51. eieio, 2-62. MEI protocol, enforcing memory coherency, 8-30. Enveloped high-priority cache block push operation, 3-22. Error termination, 8-30. Event counting, 11-11. Event selection, 11-12. Exceptions: alignment exception, 4-20; decrementer exception, 4-21; definitions, 4-12; DSI exception, 4-19; enabling and disabling exceptions, 4-10; exception classes, 4-2; exception prefix (IP) bit, 4-13; exception priorities, 4-4; exception processing, 4-7, 4-10; external interrupt, 4-20; FP assist exception, 4-22; FP unavailable exception, 4-21; instruct
153. Cache Testing: A typical test for verifying the proper operation of the 750's L2 cache memory (external SRAM and tag) would perform the following steps: 1. Initialize the L2 test sequence by disabling address translation to invoke the default WIMG setting (0b0011). Set L2CR[DO] and L2CR[TS] and perform a global invalidation of the L1 data cache and the L2 cache. The L1 instruction cache can remain enabled to improve execution efficiency. 2. Test the L2 cache external SRAM by enabling the L1 data cache and executing a sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a desired range of consecutive addresses and with cache data consisting of zeros. Once the L2 cache holds a sequential range of addresses, disable the L1 data cache and execute a series of single-beat load and store operations employing a variety of bit patterns to test for stuck bits and pattern sensitivities in the L2 cache SRAM. The performance monitor can be used to verify whether the number of L2 cache hits or misses corresponds to the tests performed. 3. Test the L2 cache tag memory by enabling the L1 data cache and executing a sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a wide range of addresses and cache data. Once the L2 cache is populated with a known range of addresses and data, disable the L1 data cache and execute a series of store operations to addresses not previously in the L2 cache. These store operations should
154. Convert to Integer Double Word. Table B-2. 64-Bit Instructions Not Implemented (Continued). Glossary of Terms and Abbreviations: The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this book. Some of the terms and definitions included in the glossary are reprinted from IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright 1985 by the Institute of Electrical and Electronics Engineers, Inc., with the permission of the IEEE. A. Architecture: A detailed specification of requirements for a processor or computer system. It does not specify details of how the processor or computer system must be implemented; instead it provides a template for a family of compatible implementations. Asynchronous exception: Exceptions that are caused by events external to the processor's execution. In this document, the term "asynchronous exception" is used interchangeably with the word "interrupt." Atomic access: A bus access that attempts to be part of a read-write operation to the same addr
155. Data address register (DAR): After a DSI or an alignment exception, DAR is set to the effective address (EA) generated by the faulting instruction. See "Data Address Register (DAR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. SPRG0-SPRG3: The SPRG0-SPRG3 registers are provided for operating system use. See "SPRG0-SPRG3" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. DSISR: The DSISR register defines the cause of DSI and alignment exceptions. See "DSISR" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Machine status save/restore register 0 (SRR0): The SRR0 register is used to save the address of the instruction at which execution continues when rfi executes at the end of an exception handler routine. See "Machine Status Save/Restore Register 0 (SRR0)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Machine status save/restore register 1 (SRR1): The SRR1 register is used to save machine status on exceptions and to restore machine status when rfi executes. See "Machine Status Save/Restore Register 1 (SRR1)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Implementation Note: When a machine check exception occurs, the 750 sets one or
156. FPSCR[29] is set. In this mode, denormalized numbers, NaNs, and some IEEE invalid operations are treated in a non-IEEE-conforming manner. This is accomplished by delivering results that approximate the values required by the IEEE standard. Table 2-19 summarizes the conditions and mode behavior for operands.
Table 2-19. Floating-Point Operand Data Type Behavior
Operand A Data Type | Operand B Data Type | Operand C Data Type | IEEE Mode (NI = 0) | Non-IEEE Mode (NI = 1)
Single denormalized, Double denormalized | Single denormalized, Double denormalized | Single denormalized, Double denormalized | Normalize all three | Zero all three
Single denormalized, Double denormalized | Single denormalized, Double denormalized | Normalized or zero | Normalize A and B | Zero A and B
Normalized or zero | Single denormalized, Double denormalized | Single denormalized, Double denormalized | Normalize B and C | Zero B and C
Single denormalized, Double denormalized | Normalized or zero | Single denormalized, Double denormalized | Normalize A and C | Zero A and C
Single denormalized, Double denormalized | Normalized or zero | Normalized or zero | Normalize A | Zero A
Normalized or zero | Single denormalized, Double denormalized | Normalized or zero | Normalize B | Zero B
Normalized or zero | Normalized or zero | Single denormalized, Double denormalized | Normalize C | Zero C
Single QNaN, Single SNaN, Double QNaN, Double SNaN | Don't care | Don't care | QNaN | QNaN
Don't care | Single QNaN | Don't care | QNaN | QN
157. Floating-Point Rounding and Conversion Instructions. Floating Convert to Integer Word with Round toward Zero: fctiwz (fctiwz.). 2.3.4.2.4 Floating-Point Compare Instructions: Floating-point compare instructions compare the contents of two floating-point registers. The comparison ignores the sign of zero (that is, +0 = -0). The floating-point compare instructions are summarized in Table 2-29. Table 2-29. Floating-Point Compare Instructions: Floating Compare Unordered, fcmpu crfD,frA,frB; Floating Compare Ordered, fcmpo crfD,frA,frB. The PowerPC architecture allows an fcmpu or fcmpo instruction with the Rc bit set to produce a boundedly undefined result, which may include an illegal instruction program exception. In the 750, crfD should be treated as undefined. 2.3.4.2.5 Floating-Point Status and Control Register Instructions: Every FPSCR instruction appears to synchronize the effects of all floating-point instructions executed by a given processor. Executing an FPSCR instruction ensures that all floating-point instructions previously initiated by the given processor appear to have completed before the FPSCR instruction is initiated, and that no subsequent floating-point instructions appear to be initiated by the given processor until the FPSCR instruction has completed. The FPSCR instructions are summarized in Table 2-30. Table 2-30. Floatin
158. GBL is not asserted for the transaction, that transaction is not snooped by the 750. Note that the GBL signal is not asserted for instruction fetches, and that GBL is asserted for all data read or write operations when using real addressing mode (that is, address translation is disabled). Normally, GBL reflects the M bit value specified for the memory reference in the corresponding translation descriptor(s). Care should be taken to minimize the number of pages marked as global, because the retry protocol enforces coherency and can use considerable bus bandwidth if much data is shared. Therefore, available bus bandwidth decreases as more memory is marked as global. The 750 snoops a transaction if the transfer start (TS) and GBL signals are asserted together in the same bus clock (this is a qualified snooping condition). No snoop update to the 750 cache occurs if the snooped transaction is not marked global. Also, because cache block castouts and snoop pushes do not require snooping, the GBL signal is not asserted for these operations. When the 750 detects a qualified snoop condition, the address associated with the TS signal is compared with the cache tags. Snooping finishes if no hit is detected. If, however, the address hits in the cache, the 750 reacts according to the MEI protocol shown in Figure 3-4. 3.3.3 Coherency Precautions in Single-Processor Systems: The following coherenc
159. GK21-0263-00, 2/23/99. PowerPC 740 / PowerPC 750 RISC Microprocessor User's Manual. © IBM 1999. Portions hereof © Motorola Inc. 1999. All rights reserved. This document contains information on a new product under development by IBM. IBM reserves the right to change or discontinue this product without notice. Information in this document is provided solely to enable system and software implementers to use PowerPC microprocessors. There are no express or implied copyright or patent licenses granted hereunder by IBM to design, modify the design of, or fabricate circuits based on the information in this document. The PowerPC 750 microprocessor embodies the intellectual property of IBM. However, IBM does not assume any responsibility or liability as to any aspects of the performance, operation, or other attributes of the microprocessor as marketed by the other party or by any third party. IBM has neither assumed, created, or granted hereby any right or authority to any third party to assume or create any express or implied obligations on its behalf. Information such as data sheets, as well as sales terms and conditions such as prices, schedules, and support, for the product may vary as between parties selling the product. Accordingly, customers wishing to learn more information about the products as marketed by a given party should contact that party. IBM reserves the right to modify this manual and/or any of the products as descri
160. Table 6-6. Integer Instructions (Continued). Table 6-7 shows latencies for floating-point instructions. Pipelined floating-point instructions are shown with the number of clocks in each pipeline stage separated by dashes. Floating-point instructions with a single entry in the cycles column are not pipelined; when the FPU executes these nonpipelined instructions, it remains busy for the full duration of the instruction execution and is not available for subsequent instructions. Table 6-7. Floating-Point Instructions. Table 6-8 shows load and store instruction latencies. Pipelined load/store instructions are shown with cycles of total latency and throughput c
161. INT. The INT signal is expected to remain asserted until the 750 takes the external interrupt exception. If INT is negated early, recognition of the interrupt request is not guaranteed. After the 750 begins execution of the external interrupt handler, the system can safely negate INT. When the 750 detects assertion of INT, it stops dispatching and waits for all pending instructions to complete. This allows any instructions in progress that need to take an exception to do so before the external interrupt is taken. After all instructions have vacated the completion buffer, the 750 takes the external interrupt exception as defined in the PowerPC architecture (OEA). An external interrupt may be delayed by other higher-priority exceptions or if MSR[EE] is cleared when the exception occurs. Register settings for this exception are described in Chapter 6, "Exceptions," in The Programming Environments Manual. When an external interrupt exception is taken, instruction fetching resumes at offset 0x00500 from the physical base address indicated by MSR[IP]. 4.5.6 Alignment Exception (0x00600): The 750 implements the alignment exception as defined by the PowerPC architecture (OEA). An alignment exception is initiated when any of the following occurs: • The operand of a floating-point load or store is not word-aligned. • The operand of lmw, stmw, lwarx, or stwcx. is not word-aligned. • The operand of dcbz is in a page that is write-through or cache
162. If no match is found in the L2 cache tags, the physical address is used to access system memory. In addition to the loads, stores, and instruction fetches, the 750 performs hardware table search operations following TLB misses, L2 cache cast-out operations when least-recently used cache lines are written to memory after a cache miss, and cache-line snoop push-out operations when a modified cache line experiences a snoop hit from another bus master. Figure 8-2 shows the address path from the execution units and instruction fetcher, through the translation logic, to the caches and bus interface logic. The 750 uses separate address and data buses and a variety of control and status signals for performing reads and writes. The address bus is 32 bits wide and the data bus is 64 bits wide. The interface is synchronous: all 750 inputs are sampled at, and all outputs are driven from, the rising edge of the bus clock. The processor runs at a multiple of the bus clock speed. 8.1.1 Operation of the Instruction and Data L1 Caches: The 750 provides independent instruction and data L1 caches. Each cache is a physically addressed, 32-Kbyte cache with eight-way set associativity. Both caches consist of 128 sets of eight cache lines, with eight words in each cache line. Because the data cache on the 750 is an on-chip, write-back primary cache, the predominant type of transaction for most applications is burst-read memory operations, followed by burst-write memory o
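The L1 cache geometry given in item 162 (128 sets, eight lines per set, eight words per line) can be checked with a little arithmetic. The Python sketch below confirms that it adds up to 32 Kbytes and splits a 32-bit physical address into tag, set index, and byte offset. It assumes that the set index is taken from the address bits immediately above the line offset, which is the conventional arrangement but is not stated in this excerpt; `split_address` is a hypothetical helper name.

```python
WORD_BYTES = 4
LINE_BYTES = 8 * WORD_BYTES   # eight words per cache line
WAYS = 8                      # eight cache lines per set
SETS = 128

# 128 sets x 8 lines x 32 bytes gives the stated 32-Kbyte capacity.
assert SETS * WAYS * LINE_BYTES == 32 * 1024

def split_address(phys_addr):
    """Split a physical address into (tag, set_index, byte_offset).

    Assumption: the set index uses the bits directly above the 5-bit line
    offset, leaving the upper 20 bits of a 32-bit address as the tag.
    """
    offset = phys_addr & (LINE_BYTES - 1)
    set_index = (phys_addr // LINE_BYTES) % SETS
    tag = phys_addr // (LINE_BYTES * SETS)
    return tag, set_index, offset
```

With this split, two addresses that differ by a multiple of 4 Kbytes (128 sets of 32 bytes) map to the same set and compete for its eight lines.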
163. MCR1. 2.1.2.4.4 User Monitor Mode Control Register 1 (UMMCR1): The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level software. MMCR1 can be accessed with mfspr using SPR 940. 2.1.2.4.5 Performance Monitor Counter Registers (PMC1-PMC4): PMC1-PMC4, shown in Figure 2-7, are 32-bit counters that can be programmed to generate interrupt signals when they overflow. [Figure 2-7. Performance Monitor Counter Registers (PMC1-PMC4): bit 0, overflow; bits 1-31, Counter Value.] The bits contained in the PMCn registers are described in Table 2-9. Table 2-9. PMCn Bit Settings. Bit 0: Overflow. When this bit is set, it indicates that this counter has reached its maximum value. Bits 1-31: Indicates the number of occurrences of the specified event. Counters are considered to overflow when the high-order bit (the sign bit) becomes set; that is, they reach the value 2147483648 (0x8000_0000). However, an interrupt is not signaled unless both PMCn[INTCONTROL] and MMCR0[ENINT] are also set. Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition may occur with MSR[EE] cleared, but the exception is not taken until EE is set. Setting MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt occurs. Software is expected to use mtspr to set PMC explicitly to nonoverflow values. If software sets an overflow value, an erroneous
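The overflow rule in item 163 can be sketched in a few lines of Python. The example models a PMC as a plain 32-bit integer and treats the two enable bits as booleans; the function names are hypothetical, and `pmc_preload` reflects a common usage pattern (preloading a counter so that it overflows after a chosen number of events) that is offered as an illustration rather than as something stated in the excerpt.

```python
OVERFLOW_BIT = 0x8000_0000  # bit 0 of a PMC: the high-order (sign) bit

def pmc_overflowed(pmc):
    """A counter is considered to have overflowed once its high-order bit is set."""
    return bool(pmc & OVERFLOW_BIT)

def pmc_interrupt_signaled(pmc, intcontrol, enint):
    """An interrupt is signaled only if the counter overflowed and both enables are set."""
    return pmc_overflowed(pmc) and bool(intcontrol) and bool(enint)

def pmc_preload(events_until_overflow):
    """Nonoverflow start value that reaches 0x8000_0000 after the given number of events."""
    if not 0 < events_until_overflow <= OVERFLOW_BIT:
        raise ValueError("event count must be between 1 and 2**31")
    return OVERFLOW_BIT - events_until_overflow
```

Because `pmc_preload` never returns a value with the high-order bit set, it stays within the nonoverflow values that software is expected to write with mtspr.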
164. NaN SNaN Return QNaN Return QNaN Not supported by 750 Not supported by 750. 2.3 Instruction Set Summary: This chapter describes instructions and addressing modes defined for the 750. These instructions are divided into the following functional categories: • Integer instructions: These include arithmetic and logical instructions. For more information, see Section 2.3.4.1, "Integer Instructions." • Floating-point instructions: These include floating-point arithmetic instructions, as well as instructions that affect the floating-point status and control register (FPSCR). For more information, see Section 2.3.4.2, "Floating-Point Instructions." • Load and store instructions: These include integer and floating-point load and store instructions. For more information, see Section 2.3.4.3, "Load and Store Instructions." • Flow control instructions: These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow. For more information, see Section 2.3.4.4, "Branch and Flow Control Instructions." • Processor control instructions: These instructions are used for synchronizing memory accesses and managing caches, TLBs, and segment registers. For more information, see Section 2.3.4.6, "Processor Control Instructions (UISA)," Section 2.3.5.1, "Processor Control Instructions (VEA)," and Section 2.3.6.2, "Processor Control Instructions
165. OE 235 Re negx 31 D A 00000 OE 104 Re subfx 31 D A B OE 40 Re subfcx 31 D A B OE 8 Re subfex 31 D A B OE 136 Re subfmex 31 D A 00000 OE 232 Re subfzex 31 D A 00000 OE 200 Re Note 1 64 bit instruction Table A 42 A Form OPCD D A B 00000 XO Re OPCD D XO Re OPCD D XO Re OPCD D XO Re Specific Instructions Name 0 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 faddx 63 D A B 00000 21 Re faddsx 59 D A B 21 Re fdivx 63 D A B 18 Re fdivsx 59 D A 18 Re fmaddx 63 D A 29 Re fmaddsx 59 D A C 29 Re fmsubx 63 D A C 28 Re fmulx 63 D A C 25 Rc fmulsx 59 D A 00000 C 25 Rc fnmaddx 63 D A B C 31 Rc fnmaddsx 59 D A B C 31 Re A 38 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual fnmsubx fnmsubsx fresx frsqrtex fselx fsqrtx 1 fsqrtsx 12 fsubx fsubsx 63 D 30 Re 59 D 30 Re 59 D 24 Re 63 D 26 Re 63 D 23 Re 63 D 22 Re 59 D 22 Re 63 D 20 Re 59 D 20 Re Note 1 Optional instruction 2 32 bit instruction not implemented by the PowerPC 750 Table A 43 M Form OPCD S A SH MB ME Rc OPCD S A B MB ME Rc Specific Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 rlwimix rlwinmx rlwnmx Specific Instructions Name 0 5 6 7 8 9 10 1
166. OEA This section describes the processor control instructions used to access the MSR and the SPRs. Table 2-55 lists instructions for accessing the MSR.

Table 2-55. Move to/from Machine State Register Instructions
Move to Machine State Register
Move from Machine State Register

The OEA defines encodings of mtspr and mfspr to provide access to supervisor-level registers. The instructions are listed in Table 2-56.

Table 2-56. Move to/from Special-Purpose Register Instructions (OEA)
Move from Special Purpose Register
Move to Special Purpose Register: SPR,rS

Encodings for the architecture-defined SPRs are listed in Table 2-47. Encodings for 750-specific, supervisor-level SPRs are listed in Table 2-48. Simplified mnemonics are provided for mtspr and mfspr in Appendix F, "Simplified Mnemonics," in The Programming Environments Manual. For a discussion of context synchronization requirements when altering certain SPRs, refer to Appendix E, "Synchronization Programming Examples," in The Programming Environments Manual.

2.3.6.3 Memory Control Instructions OEA

Memory control instructions include the following:
• Cache management instructions (supervisor-level and user-level)
• Segment register manipulation instructions
• Translation lookaside buffer management instructions

This section describes supervisor-level memory control instructions. Section 2.3.5.3, "Memory Control I
167. Table A-47. PowerPC Instruction Set Legend (Continued), pages A-47 through A-53. Column headings: Form, Optional, 64-Bit Bridge, 64-Bit Only, Supervisor Level. Mnemonics listed: fmaddsx, fmsubsx, fnmaddx, fnmaddsx, fnmsubx, fnmsubsx, frsqrtex, mtfsb0x, mtfsb1x, mulhdux, mulhwux, rlwinmx, srawix.
168. PC architecture. See Chapter 11, "Performance Monitor."
0 Process is not a marked process.
1 Process is a marked process.

Note that setting MSR[EE] masks not only the architecture-defined external interrupt and decrementer exceptions but also the 750-specific system management, performance monitor, and thermal management exceptions.

Processor version register (PVR). This register is a read-only register that identifies the version (model) and revision level of the PowerPC processor. For more information, see "Processor Version Register (PVR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual.

Implementation Note: The processor version information is listed in the PowerPC 740 and PowerPC 750 Embedded Microprocessor Hardware Specifications. The processor revision level starts at 0x0100 and is updated for each silicon revision.

Memory management registers

Block address translation (BAT) registers. The PowerPC OEA includes an array of block address translation registers that can be used to specify four blocks of instruction space and four blocks of data space. The BAT registers are implemented in pairs: four pairs of instruction BATs (IBAT0U-IBAT3U and IBAT0L-IBAT3L) and four pairs of data BATs (DBAT0U-DBAT3U and DBAT0L-DBAT3L). Figure 2-1 lists the SPR numbers for the BAT registers. For more information, see "BAT Regis
169. PU extracts branch instructions from the sequential fetcher. Branch instructions that cannot be resolved immediately are predicted using either the 750-specific dynamic branch prediction or the architecture-defined static branch prediction.

Branch instructions that do not affect the LR or CTR are removed from the instruction stream. The BPU folds branch instructions when a branch is taken (or predicted as taken); branch instructions that are not taken, or predicted as not taken, are removed from the instruction stream through the dispatch mechanism.

Instructions issued beyond a predicted branch do not complete execution until the branch is resolved, preserving the programming model of sequential execution. If branch prediction is incorrect, the instruction unit flushes all predicted path instructions, and instructions are fetched from the correct path.

1.2.2.1 Instruction Queue and Dispatch Unit

The instruction queue (IQ), shown in Figure 1-1, holds as many as six instructions and loads up to four instructions from the instruction cache during a single processor clock cycle. The instruction fetcher continuously attempts to load as many instructions as there were vacancies in the IQ in the previous clock cycle. All instructions except branch instructions are dispatched to their respective execution units from the bottom two positions in the instruction queue (IQ0 and IQ1) at a maximum rate of two instructions per cycle. Reservation stations are provide
170. 13 POW: 1 Power management enabled (reduced power mode). Power management functions are implementation-dependent. See Chapter 10.
14 Reserved. Implementation-specific.
15 ILE: Exception little-endian mode. When an exception occurs, this bit is copied into MSR[LE] to select the endian mode for the context established by the exception.
16 EE: External interrupt enable.
  0 The processor delays recognition of external interrupts and decrementer exception conditions.
  1 The processor is enabled to take an external interrupt or the decrementer exception.
17 PR: Privilege level.
  0 The processor can execute both user- and supervisor-level instructions.
  1 The processor can only execute user-level instructions.
18 FP: Floating-point available.
  0 The processor prevents dispatch of floating-point instructions, including floating-point loads, stores, and moves.
  1 The processor can execute floating-point instructions and can take floating-point enabled program exceptions.
19 ME: Machine check enable.
  0 Machine check exceptions are disabled.
  1 Machine check exceptions are enabled.
20 FE0: IEEE floating-point exception mode 0 (see Table 4-5).
21 SE: Single-step trace enable.
  0 The processor executes instructions normally.
  1 The processor generates a single-step trace exception upon the successful execution of every instruction except rfi, isync, and sc. Successful execution means that the instruction caused no other exception.
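The bit descriptions above use PowerPC bit numbering, in which bit 0 is the most significant bit of the 32-bit MSR. The following is a minimal sketch of how software might extract those fields from an MSR image. The helper names (`msr_bit`, `decode_msr`, `MSR_FIELDS`) are hypothetical, and the positions 16 (EE) and 17 (PR) are assumptions based on the architecture's MSR layout that should be checked against the table.

```python
def msr_bit(msr: int, bit: int) -> int:
    """Return MSR[bit] using PowerPC numbering (bit 0 is the most significant of 32)."""
    if not 0 <= bit <= 31:
        raise ValueError("MSR bit index must be 0..31")
    return (msr >> (31 - bit)) & 1


# Bit positions: FP, ME and SE use the 18, 19 and 21 given in the table.
# EE = 16 and PR = 17 are assumptions taken from the architecture's MSR layout.
MSR_FIELDS = {"EE": 16, "PR": 17, "FP": 18, "ME": 19, "SE": 21}


def decode_msr(msr: int) -> dict:
    """Map each named field to its current 0/1 value in the given MSR image."""
    return {name: msr_bit(msr, bit) for name, bit in MSR_FIELDS.items()}


if __name__ == "__main__":
    # 0x0000B000 has PowerPC bits 16, 18 and 19 set.
    print(decode_msr(0x0000B000))
```

Numbering from the most significant end is the only non-obvious step: MSR[16] is the mask 0x00008000, not bit 16 counted from the least significant end.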
171. PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual instructions not implemented B 1 integer arithmetic 2 38 A 17 compare 2 39 A 18 load A 22 load store multiple A 23 load store string A 24 load store with byte reverse A 23 logical 2 40 A 18 rotate and shift 2 40 A 19 store A 23 integer instructions 6 33 isync 4 12 latency summary 6 31 load and store address generation floating point 2 51 integer 2 46 byte reverse instructions 2 49 A 23 floating point load A 24 floating point move 2 44 A 25 floating point store 2 51 handling misalignment 2 45 integer load 2 46 A 22 integer multiple 2 49 integer store 2 47 A 23 memory synchronization 2 59 2 61 A 24 multiple instructions A 23 string instructions 2 50 A 24 lookaside buffer management instructions A 28 memory control instructions 2 62 2 66 memory synchronization instructions 2 59 2 61 A 24 PowerPC instructions list A 1 A 9 A 17 processor control instructions 2 55 2 60 2 65 A 27 reserved instructions 2 34 rfi 4 11 segment register manipulation instructions A 28 SLB management instructions A 28 stwex 4 12 support for lwarx stwex 8 43 Index sync 4 12 system linkage instructions 2 55 A 26 TLB management instructions A 28 tlbie 2 67 tlbsync 2 67 trap instructions 2 55 A 26 INT interrupt signal 7 21 8 42 Integer arithmetic instructions 2 38 A 17 Integer compare instructions 2 39 A 18 Integer load instructions 2 46 A 22 Integer log
172. R 11 Lower-priority exception conditions are shown below.
Any alignment exception condition, prioritized as follows:
• Floating-point access not word-aligned
• lmw, stmw, lwarx, stwcx. not word-aligned
• eciwx or ecowx not word-aligned
• Multiple or string access with MSR[LE] set
• dcbz to write-through or cache-inhibited page, or cache is disabled
Any access except cache operations to a segment where SR[T] = 1 (DSISR[5]), or an access crosses from a T = 0 segment to one where T = 1 (DSISR[5]).

Table 4-3. PowerPC 750 Exception Priorities (Continued)
Post-Instruction Execution Exceptions: MSR[SE] = 1 (or MSR[BE] = 1 for branches)

System reset and machine check exceptions may occur at any time and are not delayed even if an exception is being handled. As a result, state information for an interrupted exception may be lost; therefore, these exceptions are typically nonrecoverable. An exception may not be taken immediately when it is recognized.

4.3 Exception Processing

When an exception is taken, the processor uses SRR0 and SRR1 to save the contents of the MSR for the current context and to identify where instruction execution should resume after the exception is handled.

When an exception occurs, the address saved in SRR0 helps determine where instruction processing should resume when the exception handler returns control to the interrupted
173. R Encodings for PowerPC 750-Defined Registers (mfspr)

Table 2-48. SPR Encodings for PowerPC 750-Defined Registers (mfspr) (Continued)

Note: The order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding.

For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16-20 of the instruction and the low-order 5 bits in bits 11-15.

2.3.4.7 Memory Synchronization Instructions UISA

Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, "Instruction and Data Cache Operation," for additional information about these instructions and about related aspects of memory synchronization. See Table 2-49 for a summary.

Table 2-49. Memory Synchronization Instructions UISA
Load Word and Reserve Indexed (lwarx rD,rA,rB): Programmers can use lwarx with stwcx. to emulate common semaphore operations such as test and set, compare and swap, exchange memory, and fetch and add
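The split-and-swap of the SPR number described in the note above can be illustrated with a short sketch. The function names are hypothetical. The primary opcode (31) and extended opcodes (339 for mfspr, 467 for mtspr) are not given in this excerpt and are assumptions taken from the PowerPC architecture.

```python
def spr_field(spr: int) -> int:
    """Return the 10-bit SPR field as it appears in instruction bits 11-20.

    The low-order 5 bits of the SPR number go to bits 11-15 and the
    high-order 5 bits go to bits 16-20, so the two halves are swapped.
    """
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    low5 = spr & 0x1F
    high5 = (spr >> 5) & 0x1F
    return (low5 << 5) | high5


def encode_mfspr(rd: int, spr: int) -> int:
    """Encode mfspr rD,SPR (assumed: primary opcode 31, extended opcode 339)."""
    return (31 << 26) | (rd << 21) | (spr_field(spr) << 11) | (339 << 1)


def encode_mtspr(spr: int, rs: int) -> int:
    """Encode mtspr SPR,rS (assumed: primary opcode 31, extended opcode 467)."""
    return (31 << 26) | (rs << 21) | (spr_field(spr) << 11) | (467 << 1)


if __name__ == "__main__":
    # SPR 8 is the link register; SPR 1008 is HID0.
    print(hex(encode_mfspr(0, 8)))
    print(hex(encode_mfspr(3, 1008)))
```

A disassembler has to apply the same swap in reverse, which is why a raw dump of bits 11-20 does not read as the SPR number.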
174. R bits 10-15: Cleared. Bits 16-31: Loaded with equivalent MSR bits.
MSR: LE is set to the value of ILE.

The 750 requires that an mtspr to the IABR be followed by a context-synchronizing instruction. The 750 cannot generate a breakpoint response for that context-synchronizing instruction if the breakpoint is enabled by the mtspr (IABR) immediately preceding it. The 750 also cannot block a breakpoint response on the context-synchronizing instruction if the breakpoint was disabled by the mtspr (IABR) instruction immediately preceding it. The format of the IABR register is shown in 2.1.2.1.

When an instruction address breakpoint exception is taken, instruction fetching resumes at offset 0x01300 from the base address indicated by MSR[IP].

4.5.15 System Management Interrupt (0x01400)

The 750 implements a system management interrupt exception, which is not defined by the PowerPC architecture. The system management exception is very similar to the external interrupt exception and is particularly useful in implementing the nap mode. It has priority over an external interrupt (see Table 4-3), and it uses a different vector in the exception table (offset 0x01400).

Table 4-15 lists register settings when a system management interrupt exception is taken.

Table 4-15. System Management Interrupt Exception: Register Settings
Register | Setting Description
SRR0 | Set to the effective address of the instruction that
175. RESET and they must be initialized with some valid value after POR/HRESET and before being stored.

Table 2-40 shows the conversions made when performing a Store Floating-Point Double instruction. Most entries in the table indicate that the floating-point value is simply stored. Only in a few cases are any other actions taken.

Table 2-40. Store Floating-Point Double Behavior

Architecturally, all floating-point numbers are represented in double-precision format within the 750. Execution of a store floating-point single (stfs, stfsu, stfsx, stfsux) instruction requires conversion from double- to single-precision format. If the exponent is not greater than 896, this conversion requires denormalization. The 750 supports this denormalization by shifting the mantissa one bit at a time. Anywhere from 1 to 23 clock cycles are required to complete the denormalization, depending upon the value to be stored.

Because of how floating-point numbers are implemented in the 750, there is also a case when execution of a store floating-point double (stfd, stfdu, stfdx, stfdux) instruction can require internal shifting of the mantissa. This case occurs when the operand of a store floating-point double instruction is a denormalized single-precision value. The value could be the result of a load floating-point single instruction, a single-precision arithmetic instruction, or
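The exponent test for store floating-point single described above can be checked on a host machine with a small sketch. It assumes that "exponent" means the 11-bit biased exponent of the IEEE 754 double-precision encoding. The function names are hypothetical, and `shift_estimate` is an illustrative model (one shift per exponent step below 897), not a statement of how the 750 hardware counts cycles.

```python
import struct


def biased_exponent(x: float) -> int:
    """Return the 11-bit biased exponent of x in IEEE 754 double format."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def needs_denormalization(x: float) -> bool:
    """True if a double-to-single store would need denormalization.

    Follows the rule in the text (biased exponent not greater than 896).
    Treating zero as needing no shift is an assumption of this sketch.
    """
    return x != 0.0 and biased_exponent(x) <= 896


def shift_estimate(x: float):
    """Hypothetical shift count: 897 minus the biased exponent, if within 1..23."""
    shifts = 897 - biased_exponent(x)
    if needs_denormalization(x) and 1 <= shifts <= 23:
        return shifts
    return None


if __name__ == "__main__":
    for value in (1.0, 2.0 ** -126, 2.0 ** -127, 2.0 ** -149):
        print(value, needs_denormalization(value), shift_estimate(value))
```

The threshold is consistent with the formats: the smallest normalized single-precision value is 2^-126, whose double-precision biased exponent is 1023 - 126 = 897, so any exponent of 896 or below falls in the single-precision denormal range.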
176. RM register that contains operating parameters for an ongoing comparison during operation of the thermal assist unit, the respective TIV bits are cleared and the comparison is restarted. Changing THRM3 forces the TIV bits of both THRM1 and THRM2 to 0 and restarts the comparison if THRM3[E] is set.

Examples of valid THRM1/THRM2 bit settings are shown in Table 2-16.

Table 2-16. Valid THRM1/THRM2 States
• Invalid entry. The threshold in the SPR is not used for comparison.
• Disable thermal management interrupt assertion.
• Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature exceeds the threshold.
• Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature is less than the threshold.
• The state of the TIN bit is not valid.
• The junction temperature is less than the threshold and, as a result, the thermal management interrupt is not generated for TIE = 1.
• The junction temperature is greater than the threshold and, as a result, the thermal management interrupt is generated if TIE = 1.
• The junction temperature is greater than the threshold and, as a result, the thermal management interrupt is not generated for TIE = 1.
• The junction temperature is less than the threshold and, as a result, the thermal management interrupt is generated if TIE = 1.
177. RR1 3 1 matching BAT entry and PTE[G] = 1

In addition to the translation exceptions, there are other MMU-related conditions (some of them defined as implementation-specific and therefore not required by the architecture) that can cause an exception to occur. These exception conditions map to processor exceptions as shown in Table 5-4. The only MMU exception conditions that occur when MSR[DR] = 0 are those that cause an alignment exception for data accesses. For more detailed information about the conditions that cause an alignment exception (in particular for string/multiple instructions), see Section 4.5.6, "Alignment Exception (0x00600)."

Note that some exception conditions depend upon whether the memory area is set up as write-through (W = 1) or cache-inhibited (I = 1). These bits are described fully in "Memory/Cache Access Attributes" in Chapter 5, "Cache Model and Memory Coherency," of The Programming Environments Manual.

Refer to Chapter 4, "Exceptions," and to Chapter 6, "Exceptions," in The Programming Environments Manual for a complete description of the SRR1 and DSISR bit settings for these exceptions.

Table 5-4. Other MMU Exception Conditions for the PowerPC 750 Processor
dcbz with W = 1 or I = 1 | dcbz instruction to write-through or cache-inhibited segment or block | Alignment exception (not required by architecture for this condition)
lwarx or stwcx. with W = 1
178. Rc
mulhdx (note 1) 31 D A B 0 73 Rc
mulhdux 31 D A B 0 9 Rc
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mulld 31 D A B OE 233 Rc
mulli 07 D A SIMM
mullwx 31 D A B OE 235 Rc
negx 31 D A 00000 OE 104 Rc
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subficx 08 D A SIMM
subfex 31 D A B OE 136 Rc
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc
Note: 1. 64-bit instruction

Table A-4. Integer Compare Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cmp 31 crfD 0 L A B 0000000000 0
cmpi 11 crfD 0 L A SIMM
cmpl 31 crfD A B 32 0
cmpli 10 crfD A UIMM

Table A-5. Integer Logical Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
andi 28 S A UIMM
andis 29 S A UIMM
cntlzdx 31 S A 00000 58 Rc
cntlzwx 31 S A 00000 26 Rc
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
extswx 31 S A 00000 986 Rc
nandx 31 S A B 476 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
ori 24 S A UIMM
oris 25 S A UIMM
xorx 31 S A B 316 Rc
xori 26 S A UIMM
xoris 27 S A UIMM
Note: 1. 64-bit instruction

Table A-6. Integer Rotate Instructions
Name 0 5 6 7 8
179. Real Addressing Mode and Block

Note that if the BAT array search results in a hit, the access is qualified with the appropriate protection bits. If the access violates the protection mechanism, an exception (ISI or DSI exception) is generated.

5.1.6.2 Page Address Translation Selection

If address translation is enabled and the effective address information does not match a BAT array entry, the segment descriptor must be located. When the segment descriptor is located, the T bit in the segment descriptor selects whether the translation is to a page or to a direct-store segment, as shown in Figure 5-6.

For 32-bit implementations, the segment descriptor for an access is contained in one of 16 on-chip segment registers; effective address bits EA[0-3] select one of the 16 segment registers.

Note that the 750 does not implement the direct-store interface, and accesses to these segments cause a DSI or ISI exception. In addition, Figure 5-6 also shows the way in which the no-execute protection is enforced: if the N bit in the segment descriptor is set and the access is an instruction fetch, the access is faulted as described in Chapter 7, "Memory Management," in The Programming Environments Manual. Note that the figure shows the flow for these cases as described by the PowerPC OEA, and so the TLB references are shown as optional. Because the 750 implements TLBs, these branches are valid and are described in more detail
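The segment register selection described above amounts to taking the four most significant bits of the 32-bit effective address. The sketch below shows that step and the subsequent T-bit check. The function names are hypothetical, and the position of the T bit (bit 0, the most significant bit of the segment register) is an assumption from the PowerPC OEA segment register format rather than something stated in this excerpt.

```python
def segment_register_index(ea: int) -> int:
    """Return the segment register number (0-15) selected by EA[0-3]."""
    return (ea >> 28) & 0xF


def t_bit(segment_register: int) -> int:
    """Return the T bit, assumed here to be bit 0 (MSB) of the segment register."""
    return (segment_register >> 31) & 1


def translation_kind(ea: int, segment_registers: list) -> str:
    """Classify an access that missed the BAT array, following the text."""
    sr = segment_registers[segment_register_index(ea)]
    if t_bit(sr):
        # The 750 does not implement the direct-store interface.
        return "direct-store segment (DSI or ISI exception on the 750)"
    return "page address translation"


if __name__ == "__main__":
    srs = [0x00000000] * 16
    srs[12] = 0x80000000  # mark segment 12 as T = 1 for illustration
    print(translation_kind(0xC0001234, srs))
    print(translation_kind(0x10001234, srs))
```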
180. SCLK is used as the frequency reference for the internal PLL clock generator and must not be suspended or varied during normal operation to ensure proper PLL operation.

7.2.11.2 Clock Out (CLK_OUT), Output

The clock out (CLK_OUT) signal is an output-only signal on the 750. Following are the state meaning and timing comments for the CLK_OUT signal.

State Meaning
Asserted/Negated: Provides PLL clock output for PLL testing and monitoring. The configuration of the HID0[SBCLK] and HID0[ECLK] bits determines whether the CLK_OUT signal clocks at the processor clock frequency, the bus clock frequency, or half of the bus clock frequency. See Table 2-5 for HID0 register configuration of the CLK_OUT signal. The CLK_OUT signal defaults to a high-impedance state following the assertion of HRESET. The CLK_OUT signal is provided for testing only.

Timing Comments
Assertion/Negation: Refer to the 750 hardware specifications for timing comments.

7.2.11.3 PLL Configuration (PLL_CFG[0-3]), Input

The PLL (phase-locked loop) is configured by the PLL_CFG[0-3] signals. For a given SYSCLK (bus) frequency, the PLL configuration signals set the internal CPU frequency of operation. Refer to the 750 hardware specifications for PLL configuration. Following are the state meaning and timing comments for the PLL_CFG[0-3] signals.

State Meaning
Asserted/Negated: Configures the operation of the PLL and the i
181. SIA | User | The user sampled instruction address register (USIA) provides user-level read access to the SIA register.
MMCR0-MMCR1 | Supervisor | The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitoring interrupt functions. UMMCR0-UMMCR1 provide user-level read access.

1.5 Instruction Set

All PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formats are consistent among all instruction types, permitting efficient decoding to occur in parallel with operand accesses. This fixed instruction length and consistent format greatly simplifies instruction pipelining.

For more information, see Chapter 2, "Programming Model."

1.5.1 PowerPC Instruction Set

The PowerPC instructions are divided into the following categories:
• Integer instructions. These include computational and logical instructions.
  - Integer arithmetic instructions
  - Integer compare instructions
  - Integer logical instructions
  - Integer rotate and shift instructions
• Floating-point instructions. These include floating-point computational instructions, as well as instructions that affect the FPSCR.
  - Floating-point arithmetic instructions
  - Floating-point multiply/add instructions
  - Floating-point rounding and conversion instructions
  - Floating-point compare instructions
  - Floating-point status and control instructions
182. SRR1[1-4,10-15] are loaded with information specific to the exception type.
3. SRR1[5-9,16-31] are loaded with a copy of the corresponding MSR bits. Depending on the implementation, reserved bits may not be copied.
4. The MSR is set as described in Table 4-4. The new values take effect as the first instruction of the exception handler routine is fetched.

Note that MSR[IR] and MSR[DR] are cleared for all exception types; therefore, address translation is disabled for both instruction fetches and data accesses beginning with the first instruction of the exception handler routine.

Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception type. The location is determined by adding the exception's vector (see Table 4-2) to the base address determined by MSR[IP]. If IP is cleared, exceptions are vectored to the physical address 0x000n_nnnn. If IP is set, exceptions are vectored to the physical address 0xFFFn_nnnn. For a machine check exception that occurs when MSR[ME] = 0 (machine check exceptions are disabled), the checkstop state is entered (the machine stops executing instructions).

4.3.3 Setting MSR[RI]

An operating system may handle MSR[RI] as follows:
• In the machine check and system reset exceptions: If MSR[RI] is cleared, the exception is not recoverable. If it is set, the exception is recoverable with respect
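The vectoring rule above (exception vector offset added to a base selected by MSR[IP]) is simple enough to express directly. This is a minimal sketch with a hypothetical function name. The base values follow the 0x000n_nnnn and 0xFFFn_nnnn patterns given in the text, and the offsets in the demonstration (0x00600 for alignment, 0x01400 for the system management interrupt) are the ones quoted elsewhere in this manual.

```python
def exception_vector_address(vector_offset: int, msr_ip: int) -> int:
    """Return the physical address at which the exception handler is fetched.

    vector_offset is the 20-bit offset from the exception table
    (the n_nnnn part of 0x000n_nnnn / 0xFFFn_nnnn).
    """
    if not 0 <= vector_offset <= 0xFFFFF:
        raise ValueError("vector offset must fit in 20 bits")
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base | vector_offset


if __name__ == "__main__":
    print(hex(exception_vector_address(0x00600, 0)))
    print(hex(exception_vector_address(0x01400, 1)))
```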
183. Set

The registers implemented on the 750 are shown in Figure 2-1. The number to the right of the special-purpose registers (SPRs) indicates the number that is used in the syntax of the instruction operands to access the register (for example, the number used to access the integer exception register (XER) is SPR 1). These registers can be accessed using the mtspr and mfspr instructions.

[Figure 2-1, register set diagram; recoverable labels follow]
SUPERVISOR MODEL (OEA)
Configuration registers: hardware implementation registers HID0 (SPR 1008), HID1 (SPR 1009); processor version register PVR (SPR 287); machine state register MSR
Memory management registers:
  Instruction BAT registers: IBAT0U (SPR 528), IBAT0L (SPR 529), IBAT1U (SPR 530), IBAT1L (SPR 531), IBAT2U (SPR 532), IBAT2L (SPR 533)
  Data BAT registers: DBAT0U (SPR 536), DBAT0L (SPR 537), DBAT1U (SPR 538), DBAT1L (SPR 539), DBAT2U (SPR 540), DBAT2L (SPR 5
  Segment registers: SR0, SR1, ..., SR15
USER MODEL (VEA)
Time base facility (for reading): TBL (TBR 268), TBU (TBR 269)
USER MODEL (UISA)
Count register CTR; XER; link register LR (SPR 8); general-purpose registers GPR0, GPR1, ..., GPR31
184. System memory. The physical memory available to a processor.

T

Tenure. A tenure consists of three phases: arbitration, transfer, termination. There can be separate address bus tenures and data bus tenures.

TLB (translation lookaside buffer). A cache that holds recently used page table entries.

Throughput. The measure of the number of instructions that are processed per clock cycle.

Transaction. A complete exchange between two bus devices. A transaction is minimally comprised of an address tenure; one or more data tenures may be involved in the exchange.

Transfer termination. Signal that refers to both signals that acknowledge the transfer of individual beats (of both single-beat transfer and individual beats of a burst transfer) and to signals that mark the end of the tenure.

U

UISA (user instruction set architecture). The level of the architecture to which user-level software should conform. The UISA defines the base user-level instruction set, user-level registers, data types, floating-point memory conventions and exception model as seen by user programs, and the memory and programming models.

Underflow. A condition that occurs during arithmetic operations when the result cannot be represented accurately in the destination register. For example, underflow can happen if two floating-point fractions are multiplied and the result requires a smaller exponent and/or
185. T1 are dispatched.
4. In clock cycle 4, instruction 3, on which the second branch instruction depended, writes back and the branch prediction is proven incorrect. Even though T0 is in CQ1, from which it could be written back, it is not written back because the branch prediction was incorrect. All target instructions are flushed from their positions in the pipeline at the end of this clock cycle, as are any results in the rename registers.

After one clock cycle required to refetch the original instruction stream, instruction 5, the same instruction that was fetched in clock cycle 1, is brought back into the IQ from the instruction cache, along with three others (not all of which are shown).

6.4.2 Integer Unit Execution Timing

The 750 has two integer units. The IU1 can execute all integer instructions, and the IU2 can execute all integer instructions except multiply and divide instructions. As shown in Figure 6-2, each integer unit has one execute pipeline stage; thus, when a multicycle integer instruction is being executed, no other integer instructions can begin to execute. Table 6-6 lists integer instruction latencies.

Most integer instructions have an execution latency of one clock cycle.

6.4.3 Floating-Point Unit Execution Timing

The floating-point unit on the 750 executes all floating-point instructions. Execution of most floating-point instructions is pipelined within the FPU, allowing up to three instructions to be executing in the FPU con
186. The external SRAMs are accessed through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache normally operates in write-back mode and supports system cache coherency through snooping.

Depending on its size, the L2 cache is organized into 64- or 128-byte lines, which in turn are subdivided into 32-byte sectors (blocks), the unit at which cache coherency is maintained.

The L2 cache controller contains the L2 cache control register (L2CR), which includes bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. The L2 cache controller also manages the L2 cache tag array, two-way set-associative with 4K tags per way. Each sector (32-byte cache block) has its own valid and modified status bits.

Requests from the L1 cache generally result from instruction misses, data load or store misses, write-through operations, or cache management instructions. Requests from the L1 cache are looked up in the L2 tags and serviced by the L2 cache if they hit; they are forwarded to the bus interface if they miss.

The L2 cache can accept multiple, simultaneous accesses. The L1 instruction cache can request an instruction at the same time that the L1 data cache is requesting one load and two store operations. The L2 cache also services snoop requests from the bus. If there are multiple pending requests to the L2 cache, snoop requests
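The organization figures above can be cross-checked with a little arithmetic. The sketch below assumes that each tag covers exactly one cache line, which is an interpretation of the text rather than something it states. The function names are hypothetical.

```python
def l2_capacity_bytes(line_size: int, tags_per_way: int = 4096, ways: int = 2) -> int:
    """Capacity covered by the tag array, assuming one tag per cache line."""
    return tags_per_way * ways * line_size


def sectors_per_line(line_size: int, sector_size: int = 32) -> int:
    """Number of 32-byte sectors (coherency blocks) in one line."""
    if line_size % sector_size:
        raise ValueError("line size must be a multiple of the sector size")
    return line_size // sector_size


if __name__ == "__main__":
    for line in (64, 128):
        print(line, sectors_per_line(line), l2_capacity_bytes(line) // 1024, "Kbytes")
```

Under that assumption, 128-byte lines give 4096 x 2 x 128 bytes = 1 Mbyte, which matches the stated maximum, and 64-byte lines give 512 Kbytes.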
187. The 750 does not accept a BG in the cycles between the assertion of any TS and AACK.
Negation: May occur at any time to indicate the 750 cannot use the bus. The 750 may still assume bus mastership on the bus clock cycle of the negation of BG because, during the previous cycle, BG indicated to the 750 that it could take mastership (if qualified).

7.2.1.3 Address Bus Busy (ABB)

The address bus busy (ABB) signal is both an input and an output signal.

7.2.1.3.1 Address Bus Busy (ABB), Output

Following are the state meaning and timing comments for the ABB output signal.

State Meaning
Asserted: Indicates that the 750 is the address bus master. See Section 8.3.1, "Address Bus Arbitration."
Negated: Indicates that the 750 is not using the address bus. If ABB is negated during the bus clock cycle following a qualified bus grant, the 750 did not accept mastership even if BR was asserted. This can occur if a potential transaction is aborted internally before the transaction begins.

Timing Comments
Assertion: Occurs on the bus clock cycle following a qualified BG that is accepted by the processor (see Negated).
Negation: Occurs for a minimum of one-half bus clock cycle following the assertion of AACK. If ABB is negated during the bus clock cycle after a qualified bus grant, the 750 did not accept mastership, even if BR was asserted.
High Impedance: Occurs after ABB is negated.

7.2.1.3.2 Address Bus Busy (ABB), Input

F
188. The operations specified by an instruction are being performed by the appropriate execution unit. The black stripe is a reminder that the instruction occupies an entry in the completion queue, described in Figure 6-3.

Complete: The instruction is in the completion queue. In the final stage, the results of the executed instruction are written back and the instruction is retired. The completion queue has six entries, CQ0-CQ5.

In retirement entry: Completed instructions can be retired from CQ0 and CQ1. Like dispatch, retirement is an event that in this case occurs at the end of the final cycle of the complete stage.

Figure 6-3 shows the stages of 750 execution units.

[Figure 6-3, pipeline diagram; recoverable labels follow]
IU1/IU2/SRU instructions: Fetch, Dispatch (in dispatch entry), Execute (note 1), Complete (in retirement entry)
LSU instructions: Fetch, Dispatch (in dispatch entry), Execute (EA Calculation, Cache Align), Complete (in retirement entry)
FPU instructions: Fetch, Dispatch (in dispatch entry), Execute (stages include Round and Normalize), Complete (in retirement entry)
BPU instructions: Fetch, Predict, Dispatch (in dispatch entry), Complete (in completion queue, in retirement entry) (note 2)
1. Several integer instructions, such as multiply and divide instructions, require multiple cycles in the execute stage.
2. Only those branch instructions that update the LR or CTR take an entry in the completion queue.

Figure 6-3. PowerPC 750 Microprocessor Pipeline Stages

6.3 Ti
189. Figure 1-1. PowerPC 750 Microprocessor Block Diagram

1.2 PowerPC 750 Microprocessor Features

This section lists features of the 750. The interrelationship of these features is shown in Figure 1-1.

1.2.1 Overview of the PowerPC 750 Microprocessor Features

Major features of the 750 are as follows:
• High-performance, superscalar microprocessor
  - As many as four instructions can be fetched from the instruction cache per clock cycle
  - As many as two instructions can be dispatched per clock
  - As many as six instructions can execute per clock (including two integer instructions)
  - Single-clock-cycle execution for most instructions
• Six independent execution units and two register files
  - BPU featuring both static and dynamic branch prediction
  - 64-entry (16-set, four-way set-associative) branch target instruction cache (BTIC), a cache of branch instructions that have been encountered in branch/loop code sequences. If a target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typ
190. a branch instruction from being resolved immediately, thereby delaying execution of the subsequent instruction stream based on the predicted outcome of the branch instruction. The instruction sequences and the resulting action of the branch instruction are described as follows:
• An mtspr (LR) followed by a bclr: Fetching stops and the branch waits for the mtspr to execute.
• An mtspr (CTR) followed by a bcctr: Fetching stops and the branch waits for the mtspr to execute.
• An mtspr (CTR) followed by a bc (CTR decrement): Fetching stops and the branch waits for the mtspr to execute.
• A third bc (based-on-CR) is encountered while there are two unresolved bc (based-on-CR): The third bc (based-on-CR) is not executed and fetching stops until one of the previous bc (based-on-CR) is resolved.
Note that branch conditions can be a function of the CTR and the CR; if the CTR condition is sufficient to resolve the branch, then a CR dependency is ignored.
6.4.1.3.1 Static Branch Prediction
The PowerPC architecture provides a field in branch instructions (the BO field) to allow software to hint whether a branch is likely to be taken. Rather than delaying instruction processing until the condition is known, the 750 uses the instruction encoding to predict whether the branch is likely to be taken and begins fetching and executing along that path. When the branch condition is known, the prediction is evaluated. If the predic
191. a floating round to single-precision instruction. In these cases, shifting the mantissa takes from 1 to 23 clock cycles, depending upon the value to be stored. These cycles are incurred during the store.
2.3.4.4 Branch and Flow Control Instructions
Some branch instructions can redirect instruction execution conditionally based on the value of bits in the CR. When the processor encounters one of these instructions, it scans the execution pipelines to determine whether an instruction in progress may affect the particular CR bit. If no interlock is found, the branch can be resolved immediately by checking the bit in the CR and taking the action defined for the branch instruction.
2.3.4.4.1 Branch Instruction Address Calculation
Branch instructions can alter the sequence of instruction execution. Instruction addresses are always assumed to be word aligned; the PowerPC processors ignore the two low-order bits of the generated branch target address.
Branch instructions compute the EA of the next instruction address using the following addressing modes:
• Branch relative
• Branch conditional to relative address
• Branch to absolute address
• Branch conditional to absolute address
• Branch conditional to link register
• Branch conditional to count register
Note that in the 750, all branch instructions (b, ba, bl, bla, bc, bca, bcl, bcla, bclr, bclrl, bcctr, bcctrl) and condition register logical instructions
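The word-alignment rule for branch targets can be illustrated with a short sketch. It assumes a 32-bit address space and takes the displacement as an already sign-extended byte offset; the function names are hypothetical and the instruction field decoding is left out.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit effective addresses


def branch_target(cia, displacement, absolute=False):
    """Target of a relative or absolute branch.

    cia          -- address of the branch instruction itself
    displacement -- signed byte offset (relative) or address (absolute)
    The two low-order bits of the result are ignored (cleared).
    """
    base = 0 if absolute else cia
    return (base + displacement) & MASK32 & ~0x3


def register_target(reg_value):
    """Target of a branch to the link or count register, low bits cleared."""
    return reg_value & MASK32 & ~0x3
```

For example, a relative branch at 0x1000 with displacement -8 lands at 0xFF8, and a link register holding 0x2003 yields the target 0x2000.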
192. aN, Single SNaN, Double QNaN, Double SNaN | Don't care | Don't care | QNaN (1) | QNaN (1)
Single QNaN, Single SNaN, Double QNaN, Double SNaN (remaining operand positions: don't care) | QNaN (1) | QNaN (1)
Single normalized, Single infinity, Single zero, Double normalized, Double infinity, Double zero (each of the three operands) | Do the operation | Do the operation
(1) Prioritize according to Chapter 3, "Operand Conventions," in The Programming Environments Manual.
Table 2-20 summarizes the mode behavior for results.
Table 2-20. Floating-Point Result Data Type Behavior
Precision | Data Type | IEEE Mode (NI = 0) | Non-IEEE Mode (NI = 1)
Single | Denormalized | Return single-precision denormalized number with trailing zeros | Return zero
Single | Normalized, infinity, zero | Return the result | Return the result
Single | QNaN, SNaN | Return QNaN | Return QNaN
Single | INT | Place integer into low word of FPR | If (Invalid Operation) then place (0x8000) into FPR[32-63], else place integer into FPR[32-63]
Double | Denormalized | Return double-precision denormalized number | Return zero
Double | Normalized, infinity, zero | Return the result | Return the result
Double | Q
193. ability, including the following:
• Debug control/observation (COP)
• Boundary scan (standard IEEE 1149.1a-1993 (JTAG) compliant interface)
• Support for manufacturing test
The COP and boundary scan logic are not used under typical operating conditions. Detailed discussion of the 750 test functions is beyond the scope of this document; however, sufficient information has been provided to allow the system designer to disable the test functions that would impede normal operation.
The JTAG/COP interface is shown in Figure 8-24. For more information, refer to IEEE Standard Test Access Port and Boundary Scan Architecture, IEEE STD 1149.1a-1993.
[Figure 8-24. IEEE 1149.1a-1993 Compliant Boundary Scan Interface. Signals: TDI (Test Data Input), TMS (Test Mode Select), TCK (Test Clock Input), TDO (Test Data Output), TRST (Test Reset)]
8.10 Using Data Bus Write Only
The 750 supports split-transaction pipelined transactions. It supports a limited out-of-order capability for its own pipelined transactions through the data bus write only (DBWO) signal. When recognized on the clock of a qualified DBG, the assertion of DBWO directs the 750 to perform the next pending data write tenure (if any), even if a pending read tenure would have normally been performed because of address pipelining. The DBWO signal does not change the order of write tenures with respect to other write tenures from the same 750. It only allows that a wr
194. Tables (list of table titles)
Integer Load and Store String Instructions
Floating-Point Load Instructions
Floating-Point Store Instructions
Store Floating-Point Single Behavior
Store Floating-Point Double Behavior
Branch Instructions
Condition Register Logical Instructions
Trap Instructions
System Linkage Instruction (UISA)
Move to/from Condition Register Instructions
Move to/from Special-Purpose Register Instructions (UISA)
PowerPC Encodings
SPR Encodings for PowerPC 750-Defined Registers (mfspr)
Memory Synchronization Instructions (UISA)
Move from Time Base Instruction
Memory Synchronization Instructions (VEA)
User-Level Cache Instructions
External Control Instructions
System Linkage Instructions (OEA)
Move to/from Machine State Register Instructions
Move to/from Special-Purpose Register Instructions (OEA)
Super
195. able as no-execute
Pages selectable as user/supervisor and read-only or guarded
Blocks selectable as user/supervisor and read-only or guarded
Page history | Architecturally defined | Referenced and changed bits defined and maintained
Page address translation | Architecturally defined | Translations stored as PTEs in hashed page tables in memory; page table size determined by mask in SDR1 register
TLBs | Architecturally defined | Instructions for maintaining TLBs (tlbie and tlbsync instructions in 750)
TLBs | 750-specific | 128-entry, two-way set-associative ITLB; 128-entry, two-way set-associative DTLB; LRU replacement algorithm
Segment descriptors | Architecturally defined | Stored as segment registers on-chip (two identical copies maintained)
Page table search support | 750-specific | The 750 performs the table search operation in hardware
5.1.1 Memory Addressing
A program references memory using the effective (logical) address computed by the processor when it executes a load, store, branch, or cache instruction, and when it fetches the next instruction. The effective address is translated to a physical address according to the procedures described in Chapter 7, "Memory Management," in The Programming Environments Manual, augmented with information in this chapter. The memory subsystem uses the physical address for the access.
For a complete discussion of effective address calculation, see Section 2.3.2.3, "Effective Ad
196. aced in the invalid (I) state.
• If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the invalid (I) state.
If the address misses in the cache, no action is taken. Any reservation is canceled, regardless of the address.
Reserved
Read-atomic: Read atomic operations appear on the bus in response to lwarx instructions and generate the same snooping responses as read operations.
Read with intent to modify (atomic): The RWITM atomic operations appear on the bus in response to stwcx. instructions and generate the same snooping responses as RWITM operations.
Read with no intent to cache (RWNITC): A RWNITC operation is issued to acquire exclusive use of a memory location with no intention of modifying the location.
• If the addressed cache block is in the exclusive (E) state, the cache block remains in the exclusive (E) state.
• If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the exclusive (E) state.
If the address misses in the cache, no action is taken.
Table 3-5. Response to Snooped Bus Transactions (Continued). Columns: Snooped Transaction, TT[0-4], 750 Response.
3.6.5 Transfer Attributes
In addition to the address and transfer t
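The RWNITC row of the table can be written out as a tiny lookup function. This is a sketch of the three cases listed above only; the function name and the tuple layout are hypothetical, and an invalid (I) block is treated as a miss, which is an assumption.

```python
def snoop_rwnitc(state):
    """Response to a snooped RWNITC for a block in MEI state `state`.

    state is "M", "E", or "I" (assumption: "I" stands for a miss).
    Returns (new_state, artry_asserted, push_modified_block).
    """
    if state == "E":
        return ("E", False, False)   # block stays exclusive, no bus action
    if state == "M":
        return ("E", True, True)     # ARTRY, push the block, end in E
    return ("I", False, False)       # miss: no action is taken
```

Note the contrast with the truncated transaction at the top of the table, where the same M-state push ends in the invalid state rather than the exclusive state.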
197. ache. A cache memory that is typically larger and has a longer access time than the primary cache. A secondary cache may be shared by multiple devices. Also referred to as L2, or level-2, cache.
Set (v). To write a nonzero value to a bit or bit field; the opposite of clear. The term "set" may also be used to generally describe the updating of a bit or bit field.
Set (n). A subdivision of a cache. Cacheable data can be stored in a given location in any one of the sets, typically corresponding to its lower-order address bits. Because several memory locations can map to the same location, cached data is typically placed in the set whose cache block corresponding to that address was used least recently. See Set-associative.
Set-associative. Aspect of cache organization in which the cache space is divided into sections, called sets. The cache controller associates a particular main memory address with the contents of a particular set, or region, within the cache.
Signaling NaN. A type of NaN that generates an invalid operation program exception when it is specified as arithmetic operands. See Quiet NaN.
Significand. The component of a binary floating-point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right.
Simplified mnemonics. Assembler mnemonics that represent a more complex form of a common operation.
Slave. The device addressed by a master device. The
198. ache are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit), independent of the state of HID0[ILOCK]. The setting of the ILOCK bit must be preceded by an isync instruction to prevent the instruction cache from being locked during an instruction fetch.
3.4.2 Cache Control Instructions
The PowerPC architecture defines instructions for controlling both the instruction and data caches (when they exist). The cache control instructions, dcbt, dcbtst, dcbz, dcbst, dcbf, dcbi, and icbi, are intended for the management of the local L1 and L2 caches. The 750 interprets the cache control instructions as if they pertain only to its own L1 or L2 caches. These instructions are not intended for managing other caches in the system (except to the extent necessary to maintain coherency).
The 750 does not snoop cache control instruction broadcasts, except for dcbz when M = 1. The dcbz instruction is the only cache control instruction that causes a broadcast on the 60x bus (when M = 1) to maintain coherency. All other data cache control instructions (dcbi, dcbf, dcbst, and dcbz) are not broadcast, unless broadcast is enabled through the HID0[ABE] configuration bit. Note that dcbi, dcbf, dcbst, and dcbz do broadcast to the 750's L2 cache, regardless of HID0[ABE]. The icbi instruction is never broadcast.
3.4.2.1 Data Cache Block Touch (d
199. ache coherency through snooping. Designers should note that the PowerPC 740 does not implement the on-chip L2 tag memory, or the signals required for the support of the external SRAMs, and memory accesses go directly to the bus interface unit.
The L2 cache receives independent memory access requests from both the L1 instruction and data caches. The L1 accesses are compared to the L2 cache tags, and the data or instructions are forwarded from the L2 to the L1 cache if there is a cache hit, or are forwarded on to the bus interface unit if there is an L2 cache miss, or if the address being accessed is from a page marked as caching-inhibited. Burst read accesses that miss in the L2 cache initiate a load operation from the bus interface. As the load operation transfers data to the L1 cache, the data is also loaded into the L2 cache and marked as valid unmodified in the L2 cache tags.
An L1 load, store, or castout operation can cause an L2 cache block allocation, resulting in the castout of an L2 cache block marked modified to the bus interface. For additional information about the operation of the L2 cache, refer to Chapter 9, "L2 Cache Interface Operation."
8.1.3 Operation of the Bus Interface
Memory accesses can occur in single-beat (1, 2, 3, 4, and 8 bytes) and four-beat (32 bytes) burst data transfers. The address and data buses are independent for memory accesses to support pipelining and split transactions. The 750 can pipeline as many as two trans
200. ache requires a configuration of 64 Kbyte x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128K x 64 bits.
00 Reserved
01 256 Kbyte
10 512 Kbyte
11 1 Mbyte
L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio, based from the core clock frequency, that the L2 data RAM interface is to operate at. When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For nonzero values, the processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must stabilize before the L2 interface can be enabled. (See the hardware specifications.) The resulting L2 clock frequency cannot be slower than the clock frequency of the 60x bus interface.
000 L2 clock and DLL disabled
001 ÷1
010 ÷1.5
011 Reserved
100 ÷2
101 ÷2.5
110 ÷3
111 Reserved
L2 RAM type. Configures the L2 RAM interface for the type of synchronous SRAMs used:
• Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow data out
• Pipelined (register-register) synchronous burst SRAMs that clock addresses in and clock data out
• Late-write synchronous SRAMs, for which the 750 requires a pipelined (register-register) configuration. Late-write RAMs require write data to be valid on the cycle after WE is asserted, rather than on the same cycle as the write enable as with traditional burst RAMs.
For burst RAM selections, the 750 does not burst data i
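The encodings above lend themselves to a small decoding sketch. It works on the field values only; where the fields sit within the register is not stated in this excerpt and is deliberately left out. The names `L2SIZ_BYTES`, `L2CLK_DIV_X2`, `l2_clock_hz`, and `l2_clock_ok` are hypothetical, and the trailing "Reserved" entry is assumed to be encoding 111.

```python
# L2 size field: encoding -> bytes (None marks a reserved encoding)
L2SIZ_BYTES = {0b00: None, 0b01: 256 * 1024, 0b10: 512 * 1024, 0b11: 1024 * 1024}

# L2 clock ratio field: encoding -> divider * 2 (keeps 1.5 and 2.5 as integers)
L2CLK_DIV_X2 = {0b001: 2, 0b010: 3, 0b100: 4, 0b101: 5, 0b110: 6}


def l2_clock_hz(core_hz, l2clk):
    """Return the L2 clock frequency, or None when clock and DLL are disabled."""
    if l2clk == 0b000:
        return None                      # L2 clock stopped, DLL disabled
    if l2clk not in L2CLK_DIV_X2:
        raise ValueError("reserved L2 clock ratio encoding: %s" % bin(l2clk))
    return core_hz * 2 / L2CLK_DIV_X2[l2clk]


def l2_clock_ok(core_hz, l2clk, bus_hz):
    """Check the rule that the L2 clock may not be slower than the 60x bus clock."""
    hz = l2_clock_hz(core_hz, l2clk)
    return hz is not None and hz >= bus_hz
```

For instance, a 400 MHz core with the ÷2 encoding gives a 200 MHz L2 clock, which satisfies the rule against a 100 MHz bus; the frequencies here are illustrative values, not figures from the manual.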
201. actions and has limited support for out-of-order split-bus transactions.
Access to the bus interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the 750 to be integrated into systems that implement various fairness and bus parking procedures to avoid arbitration overhead.
Typically, memory accesses are weakly ordered to maximize the efficiency of the bus without sacrificing coherency of the data. The 750 allows load operations to bypass store operations (except when a dependency exists). In addition, the 750 can be configured to reorder high-priority store operations ahead of lower-priority store operations. Because the processor can dynamically optimize run-time ordering of load/store traffic, overall performance is improved.
Note that the synchronize (sync) and enforce in-order execution of I/O (eieio) instructions can be used to enforce strong ordering.
The following sections describe how the 750 interface operates, providing detailed timing diagrams that illustrate how the signals interact. A collection of more general timing diagrams are included as examples of typical bus operations. Figure 8-3 is a legend of the conventions used in the timing diagrams.
This is a synchronous interface: all 750 input signals are sampled and output signals are driven on the risin
202. additional information.
Table 2-44. System Linkage Instruction (UISA)
Executing this instruction causes the system call exception handler to be evoked. For more information, see Section 4.5.10, "System Call Exception (0x00C00)."
2.3.4.6 Processor Control Instructions (UISA)
Processor control instructions are used to read from and write to the condition register (CR), machine state register (MSR), and special-purpose registers (SPRs). See Section 2.3.5.1, "Processor Control Instructions (VEA)," for the mftb instruction and Section 2.3.6.2, "Processor Control Instructions (OEA)," for information about the instructions used for reading from and writing to the MSR and SPRs.
2.3.4.6.1 Move to/from Condition Register Instructions
Table 2-45 summarizes the instructions for reading from or writing to the condition register.
Table 2-45. Move to/from Condition Register Instructions
Implementation Note: The PowerPC architecture indicates that in some implementations the Move to Condition Register Fields (mtcrf) instruction may perform more slowly when only a portion of the fields are updated as opposed to all of the fields. The condition register access latency for the 750 is the same in both cases.
2.3.4.6.2 Move to/from Special-Purpose Register Instructions (UISA)
Table 2-46 lists the mtspr and mfspr instructions.
Table 2-46. Move to/from Special-Purpose Register Instructions (UISA)
203. address of the failing instruction. Refer to Chapter 4, "Exceptions," for a more detailed description of exception processing.
Table 5-3. Translation Exception Conditions
Condition | Description | Exception
Page fault (no PTE found) | No matching PTE found in page tables (and no matching BAT array entry) | I access: ISI exception, SRR1[1] = 1. D access: DSI exception, DSISR[1] = 1
Block protection violation | Conditions described for block in "Block Memory Protection" in Chapter 7, "Memory Management," in The Programming Environments Manual | I access: ISI exception, SRR1[4] = 1. D access: DSI exception, DSISR[4] = 1
Page protection violation | Conditions described for page in "Page Memory Protection" in Chapter 7, "Memory Management," in The Programming Environments Manual | I access: ISI exception, SRR1[4] = 1. D access: DSI exception, DSISR[4] = 1
No-execute protection violation | Attempt to fetch instruction when SR[N] = 1 | ISI exception, SRR1[3] = 1
Instruction fetch from direct-store segment | Attempt to fetch instruction when SR[T] = 1 | ISI exception, SRR1[3] = 1
Data access to direct-store segment (including floating-point accesses) | Attempt to perform load or store (including FP load or store) when SR[T] = 1 | DSI exception, DSISR[5] = 1
Instruction fetch from guarded memory | Attempt to fetch instruction when MSR[IR] = 1 and either matching xBAT[G] = 1 or no S
204. aded in four beats of 64 bits each. The burst load is performed as critical double word first. The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays. If subsequent loads follow in sequential order, the instructions or data will be forwarded to the requesting unit as the cache block is written.
[Figure: block diagram; caption and labels not recoverable from the source]
205. aken and handled sequentially. Likewise, exceptions that are asynchronous and precise are recognized when they occur, but are not handled until all instructions currently in the execute stage successfully complete execution and report their results.
To prevent loss of state information, exception handlers must save the information stored in the machine status save/restore registers, SRR0 and SRR1, soon after the exception is taken to prevent this information from being lost due to another exception being taken. Because exceptions can occur while an exception handler routine is executing, multiple exceptions can become nested. It is up to the exception handler to save the necessary state information if control is to return to the excepting program.
In many cases, after the exception handler handles an exception, there is an attempt to execute the instruction that caused the exception. Instruction execution continues until the next exception condition is encountered. Recognizing and handling exception conditions sequentially guarantees that the machine state is recoverable and processing can resume without losing instruction results.
In this book, the following terms are used to describe the stages of exception processing:
Recognition: Exception recognition occurs when the condition that can cause an exception is identified by the processor.
Taken: An exception is said to be taken when control of instruction execution is p
206. al
[Figure 6-2. Superscalar Pipeline Diagram. Stage labels: Decode/Dispatch, Execute Stage, Complete (Write-Back). Annotations: maximum four-instruction fetch per clock cycle; maximum three-instruction dispatch per clock cycle (includes one branch instruction); maximum two-instruction completion per clock cycle.]
The instruction pipeline stages are described as follows:
• The instruction fetch stage includes the clock cycles necessary to request instructions from the memory system and the time the memory system takes to respond to the request. Instruction fetch timing depends on many variables, such as whether the instruction is in the branch target instruction cache, the on-chip instruction cache, or the L2 cache. Those factors increase when it is necessary to fetch instructions from system memory, and include the processor-to-bus clock ratio, the amount of bus traffic, and whether any cache coherency operations are required. Because there are so many variables, unless otherwise specified, the instruction timing examples below assume optimal performance: that the instructions are available in the instruction queue in the same clock cycle that they are requested. The fetch stage ends when the instruction is dispatched.
• The decode/dispatch stage consists of the time it takes to fully decode the instruction and dispatch it from the instruction queue to the appropriate execution unit. Instruction dispatch requires the following: Instruction
207. al Single-Beat Write Termination
Normal Burst Transaction
Termination with DRTRY
Read Burst with TA Wait States and DRTRY
MEI Cache Coherency Protocol State Diagram (WIM = 001)
Fastest Single-Beat Reads
Fastest Single-Beat Writes
Single-Beat Reads Showing Data-Delay Controls
Single-Beat Writes Showing Data-Delay Controls
Burst Transfers with Data-Delay Controls
Use of Transfer Error Acknowledge (TEA)
32-Bit Data Bus Transfer (Eight-Beat Burst)
32-Bit Data Bus Transfer (Two-Beat Burst with DRTRY)
IEEE 1149.1a-1993 Compliant Boundary Scan Interface
Data Bus Write Only Transaction
Typical 1-Mbyte L2 Cache Configuration
Burst Read-Write-Read L2 Cache Access (Flow-Through)
Burst Read-Modify-Write L2 Cache Access (Flow-Through)
Burst Read-Write-Write L2 Cache Access (Flow-Through)
Burst Read-Write-Read L2 Cache Access (Pipelined)
Burst Read-Modify-Write L2 Cache Access (Pipelined)
Burst Read-Write-Write L2 Cache Access (Pipelined)
208. al and synchronous with the L2CLK_OUTA signal, and provides the capability to drive up to four L2 cache memory devices. If differential L2 clocking is configured through the setting of the L2CR, the L2CLK_OUTA signal is driven phase inverted with relation to the L2CLK_OUTB signal.
Assertion/Negation: Refer to the 750 hardware specifications for timing comments. The L2CLK_OUTB signal is driven low during assertion of HRESET.
7.2.9.15 L2 Sync Out (L2SYNC_OUT), Output
Following are the state meaning and timing comments for the L2SYNC_OUT signal.
State Meaning: Asserted/Negated: Clock output for L2 clock synchronization. The L2SYNC_OUT signal should be routed half of the trace length to the L2 cache memory devices and returned to the L2SYNC_IN signal input.
Timing Comments: Assertion/Negation: Refer to the 750 hardware specifications for timing comments. The L2SYNC_OUT signal is driven low during assertion of HRESET.
7.2.9.16 L2 Sync In (L2SYNC_IN), Input
Following are the state meaning and timing comments for the L2SYNC_IN signal.
State Meaning: Asserted/Negated: Clock input for L2 clock synchronization. The L2SYNC_IN signal is driven by the L2SYNC_OUT signal output.
Timing Comments: Assertion/Negation: Refer to the 750 hardware specifications for timing comments. The routing of this signal on the printed circuit board should ensure that the rising edge at L2SYNC_IN is coincident with the rising e
209. alid data for that address. The data at this address in external memory is not valid. See MESI.
Most-significant bit (msb). The highest-order bit in an address, registers, data element, or instruction encoding.
Most-significant byte (MSB). The highest-order byte in an address, registers, data element, or instruction encoding.
NaN. An abbreviation for not a number; a symbolic entity encoded in floating-point format. There are two types of NaNs: signaling NaNs and quiet NaNs.
No-op. No-operation. A single-cycle operation that does not affect registers or generate bus activity.
Normalization. A process by which a floating-point value is manipulated such that it can be represented in the format for the appropriate precision (single- or double-precision). For a floating-point value to be representable in the single- or double-precision format, the leading implied bit must be a 1.
OEA (operating environment architecture). The level of the architecture that describes PowerPC memory management model, supervisor-level registers, synchronization requirements, and the exception model. It also defines the time-base feature from a supervisor-level perspective. Implementations that conform to the PowerPC OEA also conform to the PowerPC UISA and VEA.
Optional. A feature, such as an instruction, a register, or an exception, that is defined by the PowerPC architecture but not required to be
210. alignment exception. The PowerPC architecture defines the load multiple word (lmw) instruction with rA in the range of registers to be loaded as an invalid form.
Table 2-35. Integer Load and Store Multiple Instructions
Name | Mnemonic | Operand Syntax
Load Multiple Word | lmw | rD,d(rA)
Store Multiple Word | stmw | rS,d(rA)
2.3.4.3.8 Integer Load and Store String Instructions
The integer load and store string instructions allow movement of data from memory to registers or from registers to memory without concern for alignment. These instructions can be used for a short move between arbitrary memory locations or to initiate a long move between misaligned memory fields. However, in some implementations, these instructions are likely to have greater latency and take longer to execute, perhaps much longer, than a sequence of individual load or store instructions that produce the same results. Table 2-36 summarizes the integer load and store string instructions.
In other PowerPC implementations operating with little-endian byte order, execution of a load or store string instruction invokes the alignment error handler; see "Byte Ordering" in The Programming Environments Manual for more information.
Table 2-36. Integer Load and Store String Instructions
Name | Mnemonic | Operand Syntax
Load String Word Immediate | lswi | rD,rA,NB
Store String Word Immediate | stswi | rS,rA,NB
Load string and store string instructions may involve operands that are not word-aligned.
211. an alignment exception in little-endian mode.
Table 6-1. Performance Effects of Memory Operand Placement
[Table body not recoverable from the source. Column groups: operand size and byte alignment; boundary crossing (none, 8-byte, cache block, protection boundary). Row groups include integer and floating-point operands.]
Notes:
1. Optimal means one EA calculation occurs.
2. Not supported in little-endian mode; causes an alignment exception.
3. Good means multiple EA calculations occur that may cause additional bus activities with multiple bus transfers.
4. Poor means that an alignment exception occurs.
6.4.7 Integer Store Gathering
The 750 performs store gathering for write-through operations to nonguarded space. It performs cache-inhibited stores to nonguarded space for 4-byte, word-aligned stores. These stores are combined in the LSU to form a double word and are sent out on the 60x bus as a single-beat operation. However, stores are gathered only if the successive stores meet the criteria and are queued and pending. Store gathering occurs regardless of the address order of the stores. Store gathering is enabled by setting HID0[SGE]. Stores can be gathered in both endian modes.
Store gathering is not done for the following:
• Cacheable store operations
• Stores to guarded cache-inhibited or write-through space
• Byte-reverse store operations
• stwcx. instructio
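The gathering criteria listed above can be collected into one predicate. This is a rough sketch, not the LSU's actual logic: the `Store` record and its field names are hypothetical, and the requirement that both words fall in the same aligned double word is an assumption drawn from the statement that the stores are combined to form a double word.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr: int
    size: int = 4
    guarded: bool = False
    write_through: bool = False
    cache_inhibited: bool = False
    byte_reversed: bool = False
    is_stwcx: bool = False


def eligible(s):
    """One store meets the per-store criteria from the text."""
    return (s.size == 4 and s.addr % 4 == 0          # 4-byte, word-aligned
            and not s.guarded                        # nonguarded space only
            and (s.write_through or s.cache_inhibited)  # not plain cacheable
            and not s.byte_reversed and not s.is_stwcx)


def can_gather(a, b, sge_enabled=True):
    """Two queued stores may be gathered; address order does not matter."""
    return (sge_enabled and eligible(a) and eligible(b)
            and a.addr != b.addr
            and a.addr // 8 == b.addr // 8)  # assumed: same aligned double word
```

Two cache-inhibited word stores to 0x1000 and 0x1004 qualify in either order, while a store to 0x1008 falls in the next double word and does not pair with them under this assumption.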
212. aning: Asserted: Indicates (for a write transaction) that the 750 must release the data bus and the data bus parity to high impedance during the following cycle. The data tenure remains active, DBB remains driven, and the transfer termination signals are still monitored by the 750. Negated: Indicates the data bus should remain normally driven. DBDIS is ignored during read transactions.
Timing Comments: Assertion/Negation: May be asserted on any clock cycle when the 750 is driving, or will be driving, the data bus; may remain asserted multiple cycles.
7.2.8 Data Transfer Termination Signals
Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat.
For a detailed description of how these signals interact, see Section 8.4.4, "Data Transfer Termination."
7.2.8.1 Transfer Acknowledge (TA), Input
Following are the state meaning and timing comments for the TA signal.
State Meaning: Asserted: Indicates that a single-beat data transfer completed successfully or that a data beat in a burst transfer completed successfully (unless DRTRY is asserted on the next bus clock cycle). Note that TA must be asserted for each data beat in a burst transaction and must be asserted dur
213. apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated. L2 cache clock/control signals: These signals provide clocking and control for the L2 cache (not supported in the 740). L2 cache address/data: The 750 has separate address and data buses for accessing the L2 cache (not supported in the 740). Interrupt signals: These signals include the interrupt signal, checkstop signals, and both soft reset and hard reset signals. These signals are used to generate interrupt exceptions and, under various conditions, to reset the processor. Processor status/control signals: These signals are used to set the reservation coherency bit, enable the time base, and other functions. Miscellaneous signals: These signals are used in conjunction with such resources as secondary caches and the time base facility. JTAG/COP interface signals: The common on-chip processor (COP) unit provides a serial interface to the system for performing board-level boundary-scan interconnect tests. Clock signals: These signals determine the system clock frequency. These signals can also be used to synchronize multiprocessor systems. NOTE: A bar over a signal name indicates that the signal is active low, for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and ne
214. ardware handshake sequence using the quiesce request (QREQ) and quiesce acknowledge (QACK) signals. The 750 asserts the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it asserts QACK and the 750 enters nap mode. If the system determines that a bus snoop cycle is required, QACK is deasserted to the 750 for at least eight bus clock cycles, and the 750 is then able to respond to a snoop cycle. Assertion of QACK following the snoop cycle again disables the 750's snoop capability. The 750's power dissipation while in nap mode with QACK deasserted is the same as the power dissipation while in doze mode. The 750 (2.0 and later) also allows dynamic switching between nap and doze modes to allow the use of nap mode without sacrificing hardware snoop coherency. For this operation, negating QACK at any time for at least 8 bus cycles guarantees that the 750 has transitioned from nap mode to doze mode in order to snoop. Reasserting QACK then allows the 750 to return to nap mode. This sequencing could be used by the system at any time with knowledge of what power management mode, if any, the 750 is currently in. Note that when in nap mode, the DLL should be kept locked to enable a quick recovery to full-power mode without having to wait for the DLL to relock. Additionally, an L2ZZ signal is provided by the 750's L2 cache interface to
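The nap/doze switching rule described above can be sketched as a small cycle-by-cycle model. This is only an illustration, assuming the 750 has already asserted QREQ and entered nap mode; the class and method names are hypothetical, and the only figure taken from the text is the minimum of 8 bus cycles with QACK negated.

```python
from dataclasses import dataclass

# From the text: QACK negated for at least 8 bus cycles guarantees doze mode.
MIN_NEGATED_CYCLES = 8


@dataclass
class NapDozeModel:
    """Toy model (hypothetical names) of what the system may assume."""
    mode: str = "nap"          # "nap" = not snooping, "doze" = snooping
    negated_cycles: int = 0    # consecutive bus cycles with QACK negated

    def clock(self, qack_asserted: bool) -> str:
        """Advance one bus clock and return the mode the system may assume."""
        if qack_asserted:
            # Reasserting QACK allows the processor to return to nap mode.
            self.negated_cycles = 0
            self.mode = "nap"
        else:
            self.negated_cycles += 1
            if self.negated_cycles >= MIN_NEGATED_CYCLES:
                # Only now is the nap-to-doze transition guaranteed.
                self.mode = "doze"
        return self.mode


model = NapDozeModel()
trace = [model.clock(qack_asserted=False) for _ in range(8)]
print(trace[-2], trace[-1])  # mode after the 7th and 8th negated cycles
```

The model deliberately reports "nap" until the eighth negated cycle, because the text guarantees the transition only after that point; a real system should not run a snoop cycle earlier.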
215. Table 7-1. Transfer Type Encodings for PowerPC 750 Bus Master. Columns: PowerPC 750 bus master transaction, transaction source, TT0 through TT4, 60x bus specification command, transaction [table body not recoverable]. Note: 1. Address-only transaction occurs if enabled by setting the HID0[ABE] bit to 1. Table 7-2 describes the 60x bus specification transfer encodings and the 750 bus snoop response on an address hit. Table 7-2. PowerPC 750 Snoop Hit Response. Columns: 60x bus specification command, transaction, TT0 through TT4, PowerPC 750 bus sno
216. assed to the exception handler; that is, the context is saved, the instruction at the appropriate vector offset is fetched, and the exception handler routine is begun in supervisor mode. Handling: Exception handling is performed by the software linked to the appropriate vector offset. Exception handling is begun in supervisor mode (referred to as privileged state in the architecture specification). Note that the PowerPC architecture documentation refers to exceptions as interrupts. In this book, the term interrupt is reserved to refer to asynchronous exceptions and sometimes to the event that causes the exception. Also, the PowerPC architecture uses the word exception to refer to IEEE-defined floating-point exception conditions that may cause a program exception to be taken; see Section 4.5.7. The occurrence of these IEEE exceptions may not cause an exception to be taken. IEEE-defined exceptions are referred to as IEEE floating-point exceptions or floating-point exceptions. 4.1 PowerPC 750 Microprocessor Exceptions. As specified by the PowerPC architecture, exceptions can be either precise or imprecise and either synchronous or asynchronous. Asynchronous exceptions are caused by events external to the processor's execution; synchronous exceptions are caused by instructions. The types of exceptions are shown in Table 4-1. Note that all exceptions except for the system management interrupt, thermal management, and performance monitor except
217. ating-point instruction can occupy only one of the three stages at a time, freeing the previous stage to work on the next floating-point instruction. Thus, three single-precision floating-point instructions can be in the FPU execute stage at a time. Double-precision add instructions have a three-cycle latency; double-precision multiply and multiply-add instructions have a four-cycle latency. Figure 1-1 shows the parallel organization of the execution units (shaded in the diagram). The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is a conceptual model that shows basic features rather than attempting to show how features are implemented physically. The 750 has independent on-chip, 32-Kbyte, eight-way set-associative, physically addressed caches for instructions and data and independent instruction and data memory management units (MMUs). Each MMU has a 128-entry, two-way set-associative translation lookaside buffer (DTLB and ITLB) that saves recently used page address translations. Block address translation is done through the four-entry instruction and data block address translation (IBAT and DBAT) arrays, defined by the PowerPC architecture. During block translation, effective addresses are compared simultaneously with all four BAT entries. For information about the L1 cache, see Chapter 3, "Instruction and Data Cache Operation." The L2 cache is implemen
218. aved as specified in the PowerPC architecture, and instruction fetching begins at the system reset interrupt vector offset, 0x00100. The vector address on a soft reset depends on the setting of MSR[IP] (either 0x0000_0100 or 0xFFF0_0100). Soft resets are third in priority, after hard reset and machine check. This exception is recoverable provided attaining a recoverable state does not generate a machine check. SRESET is an effectively edge-sensitive signal that can be asserted and deasserted asynchronously, provided the minimum pulse width specified in the hardware specifications is met. Asserting SRESET causes the 750 to take a system reset exception. This exception modifies the MSR, SRR0, and SRR1, as described in The Programming Environments Manual. Unlike hard reset, soft reset does not directly affect the states of output signals. Attempts to use SRESET during a hard reset sequence or while the JTAG logic is non-idle cause unpredictable results (see Section 7.2.9.6.2 for more information on soft reset). SRESET can be asserted during HRESET assertion (see Figure 4-4). In all three cases shown in Figure 4-4, the SRESET assertion and deassertion have no effect on the operation or state of the machine. SRESET asserted coincident to, or after the assertion of, HRESET will also have no effect on the operation or state of the machine. Figure 4-4. SRESET Asserted During HRESET
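As a small worked example of the vector selection described above, the following sketch computes the system reset vector address from the MSR[IP] setting. The function and constant names are illustrative, and MSR[IP] is passed as a plain boolean rather than extracted from a register image.

```python
SYSTEM_RESET_OFFSET = 0x00100      # vector offset given in the text
PREFIX_IP_CLEAR = 0x0000_0000      # exception prefix when MSR[IP] = 0
PREFIX_IP_SET = 0xFFF0_0000        # exception prefix when MSR[IP] = 1


def system_reset_vector(msr_ip: bool) -> int:
    """Return the physical address fetched on a system reset exception."""
    prefix = PREFIX_IP_SET if msr_ip else PREFIX_IP_CLEAR
    return prefix | SYSTEM_RESET_OFFSET


print(hex(system_reset_vector(False)))  # 0x100
print(hex(system_reset_vector(True)))   # 0xfff00100
```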
219. beat burst load. Misaligned accesses across a page boundary can incur a performance penalty. Caches are nonblocking, write-back caches with hardware support for reloading on cache misses. The critical double word is transferred on the first beat and is simultaneously written to the cache and forwarded to the requesting unit, minimizing stalls due to load delays. The cache being loaded is not blocked to internal accesses while the load completes. The 750 cache organization is shown in Figure 1-2. Figure 1-2. Cache Organization: each of ways 0 through 7 holds an address tag and words 0 through 7 (8 words per block). Within one cycle, the data cache provides double-word access to the LSU. Like the instruction cache, the data cache can be invalidated all at once or on a per-cache-block basis. The data cache can be disabled and invalidated by clearing HID0[DCE] and setting HID0[DCFI]. The data cache can be locked by setting HID0[DLOCK]. To ensure cache coherency, the data cache supports the three-state MEI protocol. The data cache tags are single-ported, so a simultaneous load or store and a snoop access represent a resource collision. If a snoop hit occurs, the LSU is blocked internally for one cycle to allow the eigh
220. bed herein without further notice. NOTHING IN THIS MANUAL, NOR IN ANY OF THE ERRATA SHEETS, DATA SHEETS, AND OTHER SUPPORTING DOCUMENTATION, SHALL BE INTERPRETED AS THE CONVEYANCE BY IBM AN EXPRESS WARRANTY OF ANY KIND OR IMPLIED WARRANTY, REPRESENTATION, OR GUARANTEE REGARDING THE MERCHANTABILITY OR FITNESS OF THE PRODUCTS FOR ANY PARTICULAR PURPOSE. IBM does not assume any liability or obligation for damages of any kind arising out of the application or use of these materials. Any warranty or other obligations as to the products described herein shall be undertaken solely by the marketing party to the customer, under a separate sale agreement between the marketing party and the customer. In the absence of such an agreement, no liability is assumed by IBM or the marketing party for any damages, actual or otherwise. "Typical" parameters can and do vary in different applications. All operating parameters, including "Typicals," must be validated for each customer application by customer's technical experts. IBM does not convey any license under their respective intellectual property rights nor the rights of others. IBM makes no claim, warranty, or representation, express or implied, that the products described in this manual are designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the prod
221. bed in Table 11-4. Table 11-4. PMCn Bit Settings. Bit 0: Overflow. When this bit is set, it indicates this counter has reached its maximum value. Bits 1 through 31: Indicates the number of occurrences of the specified event. Counters overflow when the high-order bit (the sign bit) becomes set; that is, they reach the value 2147483648 (0x8000_0000). However, an interrupt is not signaled unless both MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL in the MMCR0 register are also set as appropriate. Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition may occur with MSR[EE] cleared, but the exception is not taken until MSR[EE] is set. Setting MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt occurs. Software is expected to use the mtspr instruction to explicitly set PMC to non-overflowed values. Setting an overflowed value may cause an erroneous exception. For example, if both MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL are set and the mtspr instruction loads an overflow value, an interrupt signal may be generated without an event counting having taken place. The event to be monitored can be chosen by setting MMCR0[19-31]. The selected events are counted beginning when MMCR0 is set until either MMCR0 is reset or a performance monitor interrupt is generated. Table 11-5 lists the selectable events and
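The overflow and interrupt conditions above reduce to a few boolean checks, sketched below. The function names are hypothetical. The mapping of PMC1INTCONTROL to PMC1 and PMCINTCONTROL to the other counters is an assumption made to model the phrase "as appropriate"; the excerpt does not spell it out.

```python
PMC_OVERFLOW_BIT = 0x8000_0000  # high-order (sign) bit of a 32-bit counter


def pmc_overflowed(value: int) -> bool:
    """A counter has overflowed once its high-order bit is set."""
    return bool(value & PMC_OVERFLOW_BIT)


def interrupt_signaled(counter: int, value: int, enint: bool,
                       pmc1intcontrol: bool, pmcintcontrol: bool) -> bool:
    """Interrupt condition for PMC<counter> (1 through 4).

    Assumption: PMC1INTCONTROL gates PMC1 and PMCINTCONTROL gates the
    remaining counters.
    """
    control = pmc1intcontrol if counter == 1 else pmcintcontrol
    return pmc_overflowed(value) and enint and control


def exception_taken(signaled: bool, msr_ee: bool) -> bool:
    """The signal may be pending with MSR[EE] = 0; it is taken only when set."""
    return signaled and msr_ee


# Loading an overflowed value with mtspr signals an interrupt immediately,
# even though no event was counted.
print(interrupt_signaled(1, 0x8000_0000, True, True, False))
```

A handler that reloads a counter would therefore write a value below 0x8000_0000, for example `0x8000_0000 - n` to request an interrupt after `n` further events.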
222. User-Level Cache Instructions (VEA); Optional External Control Instructions; PowerPC OEA Instructions; System Linkage Instructions (OEA); Processor Control Instructions (OEA); Memory Control Instructions (OEA); Supervisor-Level Cache Management Instruction (OEA); Segment Register Manipulation Instructions (OEA); Translation Lookaside Buffer Management Instructions (OEA); Recommended Simplified Mnemonics. Chapter 3, Instruction and Data Cache Operation: Data Cache Organization; Instruction Cache Organization; Memory and Cache Coherency; Memory/Cache Access Attributes (WIMG Bits); MEI Hardware Considerations; Coherency Precautions in Single-Processor Systems; Coherency Precautions in Multiprocessor Systems; PowerPC 750-Initiated Load/Store Operations; Performed Loads and Stores; Sequential Consistency of Memory Accesses; Atomic Memory References
223. between memory and a set of 32 floating-point registers (FPRs). Arithmetic and logical instructions do not read or modify memory. To use the contents of a memory location in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written to the target location using load and store instructions. The description of each instruction includes the mnemonic and a formatted list of operands. To simplify assembly language programming, a set of simplified mnemonics and symbols is provided for some of the frequently used instructions; see Appendix F, "Simplified Mnemonics," in The Programming Environments Manual for a complete list of simplified mnemonics. Note that the architecture specification refers to simplified mnemonics as extended mnemonics. Programs written to be portable across the various assemblers for the PowerPC architecture should not assume the existence of mnemonics not described in that document. 2.3.1 Classes of Instructions. The 750 instructions belong to one of the following three classes: defined, illegal, and reserved. Note that while the definitions of these terms are consistent among the PowerPC processors, the assignment of these classifications is not. For example, PowerPC instructions defined for 64-bit implementations are treated as illegal by 32-bit implementations suc
224. PMC1 Events: MMCR0[19-25] Select Encodings; PMC2 Events: MMCR0[26-31] Select Encodings; PMC3 Events: MMCR1[0-4] Select Encodings; PMC4 Events: MMCR1[5-9] Select Encodings; Complete Instruction List Sorted by Mnemonic; Complete Instruction List Sorted by Opcode; Integer Arithmetic Instructions; Integer Compare Instructions; Integer Logical Instructions; Integer Rotate Instructions; Floating-Point Arithmetic Instructions; Floating-Point Multiply-Add Instructions; Floating-Point Rounding and Conversion Instruc
225. ble-word-aligned address associated with the load/store instruction or instruction fetch that initiated the transaction. As shown in Figure 3-6, the first quad word contains the address of the load/store or instruction fetch that missed the cache. This minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the block is filled. For all other burst operations, however, the entire block is transferred in order (oct-word aligned). Critical-double-word-first fetching on a cache miss applies to both the data and instruction cache. Figure 3-6. Double-Word Address Ordering: Critical Double Word First (indexed by 750 cache address bits 27 and 28). If the address requested is in double word A, the address placed on the bus is that of double word A, and the four data beats are ordered in the following manner [beat order not recoverable]. If the address requested is in double word C, the address placed on the bus is that of double word C, and the four data beats are ordered in the following manner [beat order not recoverable]. 3.6.1 Read Operations and the MEI Protocol. The MEI coherency protocol affects how the 750 data cache performs read operations on the 60x bus. All reads (except for caching-inhibited reads) are encoded on the bus as read with intent to modify (RWITM) to force flushing of the addressed cac
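Critical-double-word-first ordering can be illustrated with a short sketch. The beat tables of Figure 3-6 are not legible in this excerpt, so the wraparound order used here (the critical double word first, then the following double words modulo four) is an assumption based on the usual 60x burst convention; the function names are hypothetical. Address bits 27 and 28 use PowerPC numbering, where bit 0 is the most significant bit of a 32-bit address.

```python
DOUBLE_WORDS = "ABCD"  # the four 8-byte double words of a 32-byte cache block


def critical_double_word(address: int) -> int:
    """Index 0..3 selected by address bits 27-28 (bits 29-31 are the byte offset)."""
    return (address >> 3) & 0x3


def bus_address(address: int) -> int:
    """Double-word-aligned address placed on the bus for the burst."""
    return address & ~0x7


def burst_order(address: int) -> list:
    """Assumed beat order: critical double word first, then wrap around."""
    first = critical_double_word(address)
    return [DOUBLE_WORDS[(first + beat) % 4] for beat in range(4)]


print(burst_order(0x1000))  # miss in double word A
print(burst_order(0x1014))  # miss in double word C
```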
226. bus busy (DBB) signal is both an input and output signal on the 750. 7.2.6.3.1 Data Bus Busy (DBB): Output. Following are the state meaning and timing comments for the DBB output signal. State Meaning: Asserted: Indicates that the 750 is the data bus master. The 750 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG). Negated: Indicates that the 750 is not using the data bus. Timing Comments: Assertion: Occurs during the bus clock cycle following a qualified DBG. Negation: Occurs for a minimum of one-half bus clock cycle (dependent on clock mode) following the assertion of the final TA. High Impedance: Occurs after DBB is negated. 7.2.6.3.2 Data Bus Busy (DBB): Input. Following are the state meaning and timing comments for the DBB input signal. State Meaning: Asserted: Indicates that another device is bus master. Negated: Indicates that the data bus is free (with proper qualification, see DBG) for use by the 750. Timing Comments: Assertion: Must occur when the 750 must be prevented from using the data bus. Negation: May occur whenever the data bus is available. 7.2.7 Data Transfer Signals. Like the address transfer signals, the data transfer signals are used to transmit data and to generate and monitor parity for the data transfer. For a detailed description of how the data trans
227. by the operating system) and user mode of operation (used by the application software). The programming models incorporate 32 GPRs, 32 FPRs, special-purpose registers (SPRs), and several miscellaneous registers. Each PowerPC microprocessor also has its own unique set of hardware implementation-dependent (HID) registers. Having access to privileged instructions, registers, and other resources allows the operating system to control the application environment (providing virtual memory and protecting operating system and critical machine resources). Instructions that control the state of the processor, the address translation mechanism, and supervisor registers can be executed only when the processor is operating in supervisor mode. Figure 1-5 shows all the 750 registers available at the user and supervisor level. The numbers to the right of the SPRs indicate the number that is used in the syntax of the instruction operands to access the register. For more information, see Chapter 2, "Programming Model." Figure 1-5 (partial). User model (VEA), time base facility (for reading): TBL (TBR 268), TBU (TBR 269). Supervisor model (OEA), configuration registers: hardware implementation registers HID0 (SPR 1008) and HID1 (SPR 1009), processor version register PVR (SPR 287), and machine state register MSR.
228. cbt) and Data Cache Block Touch for Store (dcbtst). The Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) instructions provide potential system performance improvement through the use of software-initiated prefetch hints. The 750 treats these instructions identically (that is, a dcbtst instruction behaves exactly the same as a dcbt instruction on the 750). Note that PowerPC implementations are not required to take any action based on the execution of these instructions, but they may choose to prefetch the cache block corresponding to the effective address into their cache. The 750 loads the data into the cache when the address hits in the TLB or the BAT, is permitted load access from the addressed page, is not directed to a direct-store segment, and is directed at a cacheable page. Otherwise, the 750 treats these instructions as no-ops. The data brought into the cache as a result of this instruction is validated in the same manner that a load instruction would be (that is, it is marked as exclusive). The memory reference of a dcbt (or dcbtst) instruction causes the reference bit to be set. Note also that the successful execution of the dcbt (or dcbtst) instruction affects the state of the TLB and cache LRU bits as defined by the PLRU algorithm. 3.4.2.2 Data Cache Block Zero (dcbz). The effective address is computed, translated, and checked for protection violations as
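The four conditions under which the 750 acts on a dcbt or dcbtst hint can be written as a single predicate. The sketch below is illustrative only; the `TouchContext` type and its field names are hypothetical and simply name the conditions listed in the text.

```python
from dataclasses import dataclass


@dataclass
class TouchContext:
    """Hypothetical summary of the translation result for a dcbt/dcbtst address."""
    translation_hit: bool       # address hits in the TLB or the BAT
    load_permitted: bool        # load access is permitted for the page
    direct_store_segment: bool  # address maps to a direct-store segment
    cacheable: bool             # page is cacheable


def touch_action(ctx: TouchContext) -> str:
    """Return "prefetch" when all conditions hold, otherwise "no-op"."""
    if (ctx.translation_hit and ctx.load_permitted
            and not ctx.direct_store_segment and ctx.cacheable):
        return "prefetch"
    return "no-op"


# dcbt and dcbtst are treated identically on the 750, so one function serves both.
print(touch_action(TouchContext(True, True, False, True)))
print(touch_action(TouchContext(False, True, False, True)))
```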
229. Branch Processing Unit (BPU); Completion Unit; Independent Execution Units; Floating-Point Unit (FPU); Load/Store Unit (LSU); System Register Unit (SRU); Memory Management Units (MMUs); On-Chip Instruction and Data Caches; L2 Cache Implementation (Not Supported in the PowerPC 740); System Interface/Bus Interface Unit (BIU); PowerPC 750 Microprocessor Implementation; PowerPC Registers and Programming Model; Instruction Set; PowerPC Instruction Set; PowerPC 750 Microprocessor Instruction Set; On-Chip Cache Implementation; PowerPC Cache Model; PowerPC 750 Microprocessor Cache Implementation; Exception Model; PowerPC Exception Model; PowerPC 750 Microprocessor Exception Implementation; Memory Management; PowerPC Memory Management Model; PowerPC 750 Microprocessor Memory Manage
230. ccess and the R bit is zero, the 750 sets the R bit in the page table. The OEA specifies that the referenced bit may be set immediately, or the setting may be delayed until the memory access is determined to be successful. Because the reference to a page is what causes a PTE to be loaded into the TLB, the referenced bit in all 750 TLB entries is effectively always set. The processor never automatically clears the referenced bit. The referenced bit is only a hint to the operating system about the activity of a page. At times, the referenced bit may be set although the access was not logically required by the program or even if the access was prevented by memory protection. Examples of this in PowerPC systems include the following: fetching of instructions not subsequently executed; a memory reference caused by a speculatively executed instruction that is mispredicted; accesses generated by an lswx or stswx instruction with a zero length; accesses generated by an stwcx. instruction when no store is performed because a reservation does not exist; accesses that cause exceptions and are not completed. 5.4.1.2 Changed Bit. The changed bit of a page is located both in the PTE in the page table and in the copy of the PTE loaded into the TLB (if a TLB is implemented, as in the 750). Whenever a data store instruction is executed successfully, if the TLB search (for page addre
231. ccesses to occur in strict program order (strongly ordered), translation must be enabled so that the corresponding I bit can be set. Note also that the G bit must be set to ensure that the accesses are strongly ordered. For instruction accesses, the default memory access mode bits (WIMG) are also 0b0011. That is, instruction accesses are considered cacheable (I = 0), and the memory is guarded. Again, instruction accesses are considered cacheable even if the instruction cache is disabled in the HID0 register (as it is out of hard reset). The W and M bits have no effect on the instruction cache. For information on the synchronization requirements for changes to MSR[IR] and MSR[DR], refer to Section 2.3.2.4, "Synchronization," in this manual, and "Synchronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," in The Programming Environments Manual. 5.3 Block Address Translation. The block address translation (BAT) mechanism in the OEA provides a way to map ranges of effective addresses larger than a single page into contiguous areas of physical memory. Such areas can be used for data that is not subject to normal virtual memory handling (paging), such as a memory-mapped display buffer or an extremely large array of numerical data. Block address translation in the 750 is described in Chapter 7, "Memory Management
232. The execution of the stwcx. instruction results in single-beat writes from the L1 data cache. These single-beat writes are processed by the L2 cache according to hit/miss status, L1 and L2 write-through configuration, and reservation-active status. If the address associated with the stwcx. instruction misses in the L2 cache or if the reservation is no longer active, the stwcx. instruction bypasses the L2 cache and is forwarded to the 60x bus interface. If the stwcx. hits in the L2 cache and the reservation is still active, one of the following actions occurs: If the stwcx. hits a modified sector in the L2 cache (independent of write-through status), or if the stwcx. hits both the L1 and L2 caches in copy-back mode, the stwcx. is written to the L2 and the reservation completes. If the stwcx. hits an unmodified sector in the L2 cache, and either the L1 or L2 is in write-through mode, the stwcx. is forwarded to the 60x bus interface and the sector hit in the L2 cache is invalidated. L1 cache-block-push operations generated by the execution of dcbf and dcbst instructions write through to the 60x bus interface and invalidate the L2 cache sector if they hit. The execution of dcbf and dcbst instructions that do not cause a cache-block-push from the L1 cache are forwarded to the L2 cache to perform a sector invalidation and/or push from the L2 cache to the 60x bus as required. If the dcbf and dcbst instructions do not cause a sector push
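The stwcx. handling rules above form a small decision tree, sketched below. The enum and parameter names are hypothetical. One combination (an unmodified L2 sector with both caches in copy-back mode and an L1 miss) is not addressed by the excerpt, so the sketch reports it as not covered instead of guessing.

```python
from enum import Enum


class StwcxAction(Enum):
    BYPASS_TO_BUS = "bypass L2, forward to 60x bus"
    WRITE_L2 = "write to L2, reservation completes"
    FORWARD_AND_INVALIDATE = "forward to 60x bus, invalidate L2 sector"
    NOT_COVERED = "combination not described in the excerpt"


def stwcx_l2_action(l2_hit: bool, reservation_active: bool,
                    l2_sector_modified: bool, l1_hit: bool,
                    l1_write_through: bool, l2_write_through: bool) -> StwcxAction:
    """Classify how the L2 cache treats the single-beat write of a stwcx."""
    if not l2_hit or not reservation_active:
        return StwcxAction.BYPASS_TO_BUS
    both_copy_back = not l1_write_through and not l2_write_through
    if l2_sector_modified or (l1_hit and both_copy_back):
        return StwcxAction.WRITE_L2
    if l1_write_through or l2_write_through:
        return StwcxAction.FORWARD_AND_INVALIDATE
    return StwcxAction.NOT_COVERED


# Unmodified L2 sector with the L2 in write-through mode.
print(stwcx_l2_action(True, True, False, True, False, True).value)
```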
233. ceeds the threshold management specified in THRM1 or THRM2 and MSR[EE] = 1 (750-specific interrupt). 4.2 Exception Recognition and Priorities. Exceptions are roughly prioritized by exception class, as follows: 1. Nonmaskable, asynchronous exceptions have priority over all other exceptions: system reset and machine check exceptions (although the machine check exception condition can be disabled so the condition causes the processor to go directly into the checkstop state). These exceptions cannot be delayed and do not wait for completion of any precise exception handling. 2. Synchronous, precise exceptions are caused by instructions and are taken in strict program order. 3. Imprecise exceptions (imprecise mode floating-point enabled exceptions) are caused by instructions, and they are delayed until higher priority exceptions are taken. Note that the 750 does not implement an exception of this type. 4. Maskable asynchronous exceptions (external, decrementer, thermal management, system management, performance monitor, and interrupt exceptions) are delayed until higher priority exceptions are taken. The following list of exception categories describes how the 750 handles exceptions up to the point of signaling the appropriate interrupt to occur. Note that a recoverable state is reached if the completed store queue is empty (drained, not canceled) and any instruction that is next in program order and has been signaled to complete has complet
234. ces for the 750 are as follows: TEA_ assertion on the 60x bus; address parity error on the 60x bus; data parity error on the 60x bus; data parity error on the L2 bus; machine check input pin (MCP_); checkstop input pin (CKSTP_IN_); DLL rollover for chip revision 3.0 and later for the 750 (see Table 2-18). 8.7.3 Reset Inputs. The 750 has two reset inputs, described as follows: HRESET (hard reset): The HRESET signal is used for power-on reset sequences or for situations in which the 750 must go through the entire cold-start sequence of internal hardware initializations. SRESET (soft reset): The soft reset input provides warm reset capability. This input can be used to avoid forcing the 750 to complete the cold-start sequence. When either HRESET is negated or SRESET transitions to asserted, the processor attempts to fetch code from the system reset exception vector. The vector is located at offset 0x00100 from the exception prefix (all zeros or ones, depending on the setting of the exception prefix bit in the machine state register, MSR[IP]). The MSR[IP] bit is set for HRESET. 8.7.4 System Quiesce Control Signals. The system quiesce control signals (QREQ and QACK) allow the processor to enter the nap or sleep low-power states and bring bus activity to a quiescent state in an orderly fashion. Prior to entering the nap or sleep power state, the 750 asserts
235. Figure 3-5. PLRU Replacement Algorithm: if any block L0 through L7 in the set is invalid, the first invalid block is allocated; otherwise, the PLRU bits select the block to replace. Each cache is organized as eight blocks per set by 128 sets. There is a valid bit for each block in the cache, L[0-7]. When all eight blocks in the set are valid, the PLRU algorithm is used to select the replacement target. There are seven PLRU bits, B[0-6], for each set in the cache. For every hit in the cache, the PLRU bits are updated using the rules specified in Table 3-2. Table 3-2. PLRU Bit Update Rules (x means the bit does not change) [table body not recoverable]. If all eight blocks are valid, then a block is selected for replacement according to the PLRU bit encodings shown in Table 3-3. Table 3-3. PLRU Replacement Block Selection [table body not recoverable]. During power-up or hard res
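A pseudo-LRU scheme with seven bits per eight-way set can be sketched as a three-level binary tree. The bodies of Tables 3-2 and 3-3 are not legible in this excerpt, so the bit assignments below are an assumption following a conventional tree layout (B0 chooses between L0-L3 and L4-L7, B1 and B2 choose a pair, B3 through B6 choose a block within a pair) and should be checked against the manual. The function names are hypothetical.

```python
# Assumed update rules: on a hit to block Ln, write these B-bit values so
# that the tree points away from Ln. Bits not listed are left unchanged.
UPDATE_RULES = {
    0: {0: 1, 1: 1, 3: 1},
    1: {0: 1, 1: 1, 3: 0},
    2: {0: 1, 1: 0, 4: 1},
    3: {0: 1, 1: 0, 4: 0},
    4: {0: 0, 2: 1, 5: 1},
    5: {0: 0, 2: 1, 5: 0},
    6: {0: 0, 2: 0, 6: 1},
    7: {0: 0, 2: 0, 6: 0},
}


def touch(bits: list, block: int) -> list:
    """Return a new B[0-6] list after a hit to block L<block>."""
    updated = list(bits)
    for index, value in UPDATE_RULES[block].items():
        updated[index] = value
    return updated


def select_victim(bits: list, valid: list) -> int:
    """Allocate the first invalid block, else follow the PLRU tree."""
    for block, is_valid in enumerate(valid):
        if not is_valid:
            return block
    if bits[0] == 0:
        if bits[1] == 0:
            return 0 if bits[3] == 0 else 1
        return 2 if bits[4] == 0 else 3
    if bits[2] == 0:
        return 4 if bits[5] == 0 else 5
    return 6 if bits[6] == 0 else 7


bits = touch([0] * 7, 0)                 # hit on L0
print(select_victim(bits, [True] * 8))   # victim comes from the other half
```

Under these assumed rules a block that was just accessed is never the next victim, which is the property a PLRU approximation is meant to preserve.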
236. cessors. May not be supported by the 740. Figure 2-1. Programming Model: PowerPC 750 Microprocessor Registers. The PowerPC UISA registers are user-level. General-purpose registers (GPRs) and floating-point registers (FPRs) are accessed through instruction operands. Access to registers can be explicit (by using instructions for that purpose, such as the Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit, as part of the execution of an instruction. Some registers are accessed both explicitly and implicitly. Implementation Note: The 750 fully decodes the SPR field of the instruction. If the SPR specified is undefined, the illegal instruction program exception occurs. The PowerPC's user-level registers are described as follows. User-level registers (UISA): The user-level registers can be accessed by all software with either user or supervisor privileges. They include the following. General-purpose registers (GPRs): The thirty-two GPRs (GPR0-GPR31) serve as data source or destination registers for integer instructions and provide data for generating addresses. See "General-Purpose Registers (GPRs)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Floating-point registers (FPRs): The thirty-two FPRs (FPR0-FPR31) serve as the data source or destinatio
237. ch rate for limiting power dissipation. Power management is described in Chapter 10, "Power and Thermal Management." The 750 uses an advanced CMOS process technology and is fully compatible with TTL devices. [Figure: block diagram of the 750, with rotated labels for units such as the 60x bus interface unit, L2 bus interface unit, completion unit, floating-point unit, and load/store unit; the remaining labels are not recoverable from this extraction.]
238. chronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for serialization requirements and other recommended precautions to observe when manipulating the segment registers. Table 2-58. Segment Register Manipulation Instructions: Move to Segment Register; Move from Segment Register; Move from Segment Register Indirect; Move to Segment Register Indirect. Only one operand entry (rD,SR) and one implementation note survive in this extraction: the shadow SRs in the instruction MMU can be read by setting HID0[RISEG] before executing mfsr. 2.3.6.3.3 Translation Lookaside Buffer Management Instructions (OEA). The address translation mechanism is defined in terms of the segment descriptors and page table entries (PTEs) PowerPC processors use to locate the logical-to-physical address mapping for a particular access. These segment descriptors and PTEs reside in segment registers and page tables in memory, respectively. See Chapter 7, "Memory Management," for more information about TLB operations. Table 2-59 summarizes the operation of the TLB instructions in the 750. Table 2-59. Translation Lookaside Buffer Management Instruction. TLB Invalidate Entry: Invalidates both ways in both instruction and data TLB entries at the index provided by EA[14-19]. It executes regardless of the MSR[DR] and MSR[IR] settings. To invalidate all entries in both TLBs, the programmer should i
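The index selection described for TLB Invalidate Entry can be illustrated with a short Python sketch. The helper names are hypothetical. The sketch relies on PowerPC big-endian bit numbering (bit 0 is the most significant bit of the 32-bit effective address), so EA[14-19] corresponds to a 6-bit field starting 12 bits above the least significant bit. The list of 64 addresses follows from the 6-bit width and is an inference, since the excerpt's own sentence on invalidating all entries is truncated.

```python
def tlb_index(ea: int) -> int:
    """Extract EA[14-19] (big-endian bit numbering) from a 32-bit effective address."""
    return (ea >> 12) & 0x3F


def addresses_covering_all_indices():
    """One effective address per possible 6-bit index value (64 in total)."""
    return [index << 12 for index in range(64)]


print(tlb_index(0x0003F000))
print(len(addresses_covering_all_indices()))
```

Stepping the address by 0x1000 (one 4-Kbyte page) changes only the index field, which is why consecutive pages map to consecutive indices in this sketch.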
239. cleared; set for a store instruction, otherwise cleared; cleared; set for lswx or stswx, otherwise cleared; set for mtspr to SDR1, EAR, HID0, PIR, IBATs, DBATs, SRs; set for taken branch, otherwise cleared; 13-15: cleared; 16-31: MSR[16-31]. Implementation Note: The 750 processor diverges from the PowerPC architecture in that it does not take trace exceptions on the isync instruction. When a trace exception is taken, instruction fetching resumes at offset 0x00D00 from the base address indicated by MSR[IP]. 4.5.12 Floating-Point Assist Exception (0x00E00). The optional floating-point assist exception defined by the PowerPC architecture is not implemented in the 750. 4.5.13 Performance Monitor Interrupt (0x00F00). The 750 microprocessor provides a performance monitor facility to monitor and count predefined events such as processor clocks, misses in either the instruction cache or the data cache, instructions dispatched to a particular execution unit, mispredicted branches, and other occurrences. The count of such events can be used to trigger the performance monitor exception. The performance monitor facility is not defined by the PowerPC architecture. The performance monitor can be used for the following: To increase system performance with efficient software, especially in a multiprocessing system. Memory hierarchy behavior must be monitored and studied to develop algo
240. cles are written to both the cache and memory Glossary of Terms and Abbreviations Glossary 13 Glossary 14 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Index A AACK address acknowledge signal 7 14 ABB address bus busy signal 7 5 8 10 Address bus address tenure 8 9 address transfer An7 7 APE 8 15 APn 7 7 address transfer attribute CI 7 12 GBL 7 13 TBST 7 12 8 16 TSIZn 7 11 8 15 TTN 7 8 8 15 WT 7 13 address transfer start TS 7 6 8 14 address transfer termination AACK 7 14 ARTRY 7 14 terminating address transfer 8 21 arbitration signals 7 4 8 10 bus parking 8 13 Address translation see Memory management unit Addressing modes 2 35 Aligned data transfer 8 18 8 21 Alignment data transfers 8 18 exception 4 20 misaligned accesses 2 29 rules 2 29 An address bus signals 7 7 APE address parity error signal 8 15 APn address parity signals 7 7 Arbitration system bus 8 12 8 23 Arithmetic instructions floating point A 20 integer A 17 ARTRY address retry signal 7 14 Index B BG bus grant signal 7 4 8 10 Block address translation block address translation flow 5 12 definition 1 12 registers description 2 5 initialization 5 21 selection of block address translation 5 9 Boundedly undefined definition 2 33 BR bus request signal 7 4 8 10 Branch fall through 6 18 Branch folding 6 18 Branch instructions address calculation 2 53 condition register logical 2 54 A 26
241. clock speed and memory translation. These issues are discussed further in the following sections. 6.3.2.1 Cache Arbitration. When the instruction fetcher requests instructions from the instruction cache, two things may happen. If the instruction cache is idle and the requested instructions are present, they are provided on the next clock cycle. However, if the instruction cache is busy due to a cache-line-reload operation, instructions cannot be fetched until that operation completes. 6.3.2.2 Cache Hit. If the instruction fetch hits the instruction cache, it takes only one clock cycle after the request for as many as four instructions to enter the instruction queue. Note that the cache is not blocked to internal accesses while a cache reload completes (hits under misses). The critical double word is written simultaneously to the cache and forwarded to the requesting unit, minimizing stalls due to load delays. Figure 6-5 shows a simple example of instruction fetching that hits in the on-chip cache. This example uses a series of integer add and double-precision floating-point add instructions to show how the number of instructions to be fetched is determined, how program order is maintained by the instruction and completion queues, how instructions are dispatched and retired in pairs (maximum), and how the FPU, IU1, and IU2 pipelines function. The following instruction sequence is examined: 3 add, 4 fadd, 5 add, 6 fadd, 7 br 6, 8 fsub, 9 fadd, 10
242. coherency of instruction caches and data memory. Since instruction fetching may bypass the data cache, changes made to items in the data cache may not be reflected in memory until after the instruction fetch completes. [Figure 3-3. Instruction Cache Organization: eight ways (Way 0 through Way 7), each holding an address tag and Words 0-7; 8 words per block.] 3.3 Memory and Cache Coherency. The primary objective of a coherent memory system is to provide the same image of memory to all devices using the system. Coherency allows synchronization and cooperative use of shared resources. Otherwise, multiple copies of a memory location, some containing stale values, could exist in a system, resulting in errors when the stale values are used. Each potential bus master must follow rules for managing the state of its cache. This section describes the coherency mechanisms of the PowerPC architecture and the three-state cache coherency protocol of the 750 data cache. Note that unless specifically noted, the discussion of coherency in this section applies to the 750's data cache only. The instruction cache is not snooped. Instruction cache coherency must be maintained by software. However, the 750 does support a fast instr
243. crand, cror, crxor, crnand, crnor, crandc, creqv, crorc, and mcrf are executed by the BPU. Some of these instructions can redirect instruction execution conditionally based on the value of bits in the CR. Whenever the CR bits resolve, the branch direction is either marked as correct or mispredicted. Correcting a mispredicted branch requires that the 750 flush speculatively executed instructions and restore the machine state to immediately after the branch. This correction can be done immediately upon resolution of the condition register bits. 2.3.4.4.2 Branch Instructions. Table 2-41 lists the branch instructions provided by the PowerPC processors. To simplify assembly language programming, a set of simplified mnemonics and symbols is provided for the most frequently used forms of branch conditional, compare, trap, rotate and shift, and certain other instructions. See Appendix F, "Simplified Mnemonics," in The Programming Environments Manual for a list of simplified mnemonic examples. Table 2-41. Branch Instructions: Branch Conditional: bc (bca, bcl, bcla) BO,BI,target_addr; Branch Conditional to Link Register: bclr (bclrl) BO,BI; Branch Conditional to Count Register: bcctr (bcctrl) BO,BI. 2.3.4.4.3 Condition Register Logical Instructions. Condition register logical instructions, shown in Table 2-42, and the Move Condition Register Field (mcrf) instruction are also defined as flow control instructions. Table 2-42. Condition Register
244. ction cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries. Also, address bits A[20-26] provide the index to select a set, and bits A[27-29] select a word within a block. The tags consist of bits PA[0-19]. Address translation occurs in parallel with set selection (from A[20-26]), and the higher-order address bits (the tag bits in the cache) are physical. The instruction cache differs from the data cache in that it does not implement the MEI cache coherency protocol, and a single state bit is implemented that indicates only whether a cache block is valid or invalid. The instruction cache is not snooped, so if a processor modifies a memory location that may be contained in the instruction cache, software must ensure that such memory updates are visible to the instruction fetching mechanism. This can be achieved with the following instruction sequence: dcbst (update memory); sync (wait for update); icbi (remove (invalidate) copy in instruction cache); sync (wait for icbi operation to be globally performed); isync (remove copy in own instruction buffer). These operations are necessary because the processor does not maintain instruction memory coherent with data memory. Software is responsible for enforcing
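The address fields named above can be pulled apart with simple shifts and masks. The Python sketch below is illustrative only; the function name is hypothetical, and it relies on PowerPC big-endian bit numbering, in which bit 0 is the most significant bit of a 32-bit address. The set index and word select come from the effective address, while the tag comes from the translated physical address.

```python
def icache_lookup_fields(ea: int, pa: int):
    """Split an access into (tag, set_index, word) for the cache organization in the text.

    tag       = PA[0-19]  -> upper 20 bits of the physical address
    set_index = A[20-26]  -> 7 bits, selects one of 128 sets
    word      = A[27-29]  -> 3 bits, selects one of 8 words in the block
    """
    tag = (pa >> 12) & 0xFFFFF
    set_index = (ea >> 5) & 0x7F
    word = (ea >> 2) & 0x7
    return tag, set_index, word


# Example with an identity-mapped address (ea == pa is an assumption for the demo).
print(icache_lookup_fields(0x12345678, 0x12345678))
```

Because the index and word bits all lie within the low 12 bits of the address, they are unaffected by page translation, which is what allows set selection to proceed in parallel with translation.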
245. ction dispatch 6 16 instruction flow 6 8 instruction scheduling guidelines 6 29 IU execution timing 6 24 latency summary 6 31 load store unit execution timing 6 25 overview 6 3 SRU execution timing 6 27 stage definition 6 2 TLB description 5 25 invalidate tlbie instruction 5 27 5 34 Index 10 LRU replacement 5 27 organization for ITLB and DTLB 5 25 TLB miss and table search operation 5 26 5 30 TLB invalidate description 5 27 TLB management instructions 2 67 A 28 TLB miss effect 6 28 tlbie 2 67 TLBISYNC TLBI sync signal 7 25 tlbsync 2 67 Transactions data cache 3 22 Transfer 8 14 8 25 Trap instructions 2 55 TS transfer start signal 7 6 8 14 TSIZn transfer size signals 7 11 8 15 TTn transfer type signals 7 8 8 15 U UMMCRO user monitor mode control register 0 2 15 11 5 UMMCRI user monitor mode control register 1 2 16 11 6 UPMCn user performance monitor counter registers 2 20 11 10 Use of TEA timing 8 38 User instruction set architecture UISA description 1 21 registers 2 3 User instruction set architecture UISA description xxv USIA user sampled instruction address regis ter 2 20 11 11 Using DBWO timing 8 45 V Virtual environment architecture VEA 1 21 Virtual environment architecture VEA xxvi W WIMG bits 8 30 Write back definition 6 3 Write through mode W bit cache interactions 3 6 Write with Atomic operation 3 27 IBM PowerPC 740 PowerPC 750 RISC Micro
246. ction of the MMU in a PowerPC processor is the translation of logical (effective) addresses to physical addresses (referred to as real addresses in the architecture specification) for memory accesses and I/O accesses (I/O accesses are assumed to be memory-mapped). In addition, the MMU provides access protection on a segment, block, or page basis. This chapter describes the specific hardware used to implement the MMU model of the OEA in the 750. Refer to Chapter 7, "Memory Management," in The Programming Environments Manual for a complete description of the conceptual model. Note that the 750 does not implement the optional direct-store facility, and it is not likely to be supported in future devices. Two general types of memory accesses generated by PowerPC processors require address translation: instruction accesses, and data accesses generated by load and store instructions. Generally, the address translation mechanism is defined in terms of the segment descriptors and page tables PowerPC processors use to locate the effective-to-physical address mapping for memory accesses. The segment information translates the effective address to an interim virtual address, and the page table information translates the interim virtual address to a physical address. The segment descriptors used to generate the interim virtual addresses are stored as on-chip segment registers on 32-bit implementations (such as the 750). In addition, two translation lookaside buf
247. ction to pass through all the stages, when the pipeline has been filled, one instruction can complete its work on every clock cycle. Figure 6-1 represents a generic pipelined execution unit. [Figure 6-1. Pipelined Execution Unit: Stages 1 through 3, with Instruction A and Instruction B advancing through them over Clocks 1 through 3.] The entire path that instructions take through the fetch, decode/dispatch, execute, complete, and write-back stages is considered the 750's master pipeline, and two of the 750's execution units (the FPU and LSU) are also multiple-stage pipelines. The 750 contains the following execution units that operate independently and in parallel: Branch processing unit (BPU); Integer unit 1 (IU1), which executes all integer instructions; Integer unit 2 (IU2), which executes all integer instructions except multiplies and divides; 64-bit floating-point unit (FPU); Load/store unit (LSU); System register unit (SRU). The 750 can retire two instructions on every clock cycle. In general, the 750 processes instructions in four stages (fetch, decode/dispatch, execute, and complete), as shown in Figure 6-2. Note that the example of a pipelined execution unit in Figure 6-1 is similar to the three-stage FPU pipeline in Figure 6-2.
248. cular implementation may recognize exception conditions out of order, they are handled in order. When an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that are undispatched, are required to complete before the exception is taken, and any exceptions those instructions cause must also be handled first; likewise, asynchronous, precise exceptions are recognized when they occur but are not handled until the instructions currently in the completion queue successfully retire or generate an exception, and the completion queue is emptied. Unless a catastrophic condition causes a system reset or machine check exception, only one exception is handled at a time. For example, if one instruction encounters multiple exception conditions, those conditions are handled sequentially. After the exception handler handles an exception, the instruction processing continues until the next exception condition is encountered. Recognizing and handling exception conditions sequentially guarantees that exceptions are recoverable. When an exception is taken, information about the processor state before the exception was taken is saved in SRR0 and SRR1. Exception handlers must save the information stored in SRR0 and SRR1 early to prevent the program state from being lost due to a system reset and machine check exception or due to an instruction-caused exception in the exception handler, and before
249. currently. While most floating-point instructions execute with three- or four-cycle latency and one- or two-cycle throughput, three instructions (fdivs, fdiv, and fres) execute with latencies of 11 to 33 cycles. The fdivs, fdiv, fres, mtfsb0, mtfsb1, mtfsfi, mffs, and mtfsf instructions block the floating-point unit pipeline until they complete execution, and thereby inhibit the dispatch of additional floating-point instructions. See Table 6-7 for floating-point instruction execution timing. 6.4.4 Effect of Floating-Point Exceptions on Performance. Floating-point operations that reset the exception sticky bits in the FPSCR may suffer a performance penalty. When an exception is disabled in the FPSCR and MSR[FE0] and MSR[FE1] are both 0, updates to the FPSCR exception sticky bits are completion-serializing, which may delay execution by one or two cycles. The penalty occurs only when the exception bit is toggled and not on subsequent operations with the same exception. When an exception is enabled in the FPSCR, the instruction traps to the floating-point assist handler without updating the FPSCR or the target FPR. The floating-point assist handler is required to complete the instruction and is invoked regardless of the setting of MSR[FEn]. For the fastest and most predictable floating-point performance, all except
250. d. In addition to the double-word straddle boundary condition, the address translation logic can generate substantial exception overhead when the load/store multiple and load/store string instructions access misaligned data. It is strongly recommended that software attempt to align data where possible. [Table 8-5. Misaligned Data Transfers (Four-Byte Examples): one aligned four-byte transfer followed by misaligned cases, each split into a first and a second access; the transfer size, address, and byte-lane columns are not recoverable from this extraction. Notes: A = byte lane used; the marker for an unused byte lane did not survive.] 8.3.2.4.1 Effect of Alignment in Data Transfers (32-Bit Bus). The aligned data transfer cases for 32-bit data bus mode are shown in Table 8-6. All of the transfers require a single data beat (if caching-inhibited or write-through), except for double-word cases, which require two data beats. The double-word case is only generated by the 750 for load or store double operations to/from the floating-point registers (FPRs). All caching-inhibited instruction fetches are performed as word operations. Table 8-6. Aligned Data Transfers (32-Bit Bus Mode): Data Bus Byte Lane(s), Transfer Size, TSIZ0, TSIZ1, TSIZ2
251. d Indexed (ecowx); Floating Select (fsel); Floating Reciprocal Estimate Single-Precision (fres); Floating Reciprocal Square Root Estimate (frsqrte); Store Floating-Point as Integer Word (stfiwx). 1.6 On-Chip Cache Implementation. The following subsections describe the PowerPC architecture's treatment of cache in general, and the 750-specific implementation, respectively. A detailed description of the 750 cache implementation is provided in Chapter 3, "Instruction and Data Cache Operation." 1.6.1 PowerPC Cache Model. The PowerPC architecture does not define hardware aspects of cache implementations. For example, PowerPC processors can have unified caches, separate instruction and data caches (Harvard architecture), or no cache at all. PowerPC microprocessors control the following memory access modes on a page or block basis: write-back/write-through mode; caching-inhibited mode; memory coherency. The caches are physically addressed, and the data cache can operate in either write-back or write-through mode, as specified by the PowerPC architecture. The PowerPC architecture defines the term "cache block" as the cacheable unit. The VEA and OEA define cache management instructions that a programmer can use to affect cache contents. 1.6.2 PowerPC 750 Microprocessor Cache Implementation. The 750 cache implementation is described in Section 1.2.4
252. d Negated: Used when snooping for single-beat reads (read with no intent to cache). Timing Comments: Assertion/Negation: The same as A[0-31]. 7.2.4.4 Cache Inhibit (CI): Output. The cache inhibit (CI) signal is an output signal on the 750. Following are the state meaning and timing comments for the CI signal. State Meaning: Asserted: Indicates that a single-beat transfer will not be cached, reflecting the setting of the I bit for the block or page that contains the address of the current transaction. Negated: Indicates that a burst transfer will allocate a 750 data cache block. Timing Comments: Assertion/Negation: The same as A[0-31]. High Impedance: The same as A[0-31]. 7.2.4.5 Write-Through (WT): Output. The write-through (WT) signal is an output signal on the 750. Following are the state meaning and timing comments for the WT signal. State Meaning: Asserted: Indicates that a single-beat write transaction is write-through, reflecting the value of the W bit for the block or page that contains the address of the current transaction. Assertion during a read operation indicates instruction fetching. Negated: Indicates that a write transaction is not write-through; during a read operation, negation indicates a data load. Timing Comments: Assertion/Negation: The same as A[0-31]. High Impedance: The same as A[0-31]. 7.2.4.6 Global (GBL). The global (GBL) sig
253. d and when the signal is an input and an output. NOTE: A bar over a signal name indicates that the signal is active low, for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active low, such as AP[0-3] (address bus parity signals) and TT[0-4] (transfer type signals), are referred to as asserted when they are high and negated when they are low. The 750 signals are grouped as follows. Address arbitration: The 750 uses these signals to arbitrate for address bus mastership. Address transfer start: These signals indicate that a bus master has begun a transaction on the address bus. Address transfer: These signals include the address bus and address parity signals. They are used to transfer the address and to ensure the integrity of the transfer. Transfer attribute: These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited. Address transfer termination: These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated. Data arbitration: The 750 uses these signals to arbitrate for data bus mastership. Data transfer: These signals, which consist of the data bus and data parity, are used to
254. d bus transactions 3 22 L2 interface cache configuration 9 2 cache global invalidation 9 7 cache initialization 9 6 cache testing 9 8 clock configuration 9 9 dcbi 9 4 eieio 9 4 L2 cache considerations 6 15 L2 cache interface signals 7 25 operation 9 2 Index 2 overview 9 1 SRAM timing examples 9 9 stwcx execution 9 4 sync 9 4 load store operations processor initiated 3 10 PLRU replacement 3 19 stwcx execution 9 4 Changed C bit maintenance recording 5 12 5 23 Checkstop signal 7 22 8 42 state 4 19 CI cache inhibit signal 7 12 CKSTP_IN CKSTP_OUT lt Default Para Font checkstop input output sig nals gt 7 22 Classes of instructions 2 32 Clean block operation 3 27 CLK_OUT signal 7 29 Clock signals PLL_CFGn 7 30 SYSCLK 7 29 Compare instructions floating point A 21 integer A 18 Completion completion unit resource requirements 6 30 considerations 6 16 definition 6 1 Context synchronization 2 36 Conventions xxx xxxiv 6 1 COP scan interface 8 44 Copy back mode 6 27 CR condition register CR logical instructions 2 54 A 26 CR description 2 3 CTR register 2 4 D DABR data address breakpoint register 2 7 DAR data address register 2 6 Data bus arbitration signals 7 15 8 10 bus arbitration 8 23 data tenure 8 9 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual data transfer 7 17 8 25 data transfer termination 7 19 8 26 Data cache block push operation 3 22 configuration 3 3 D
255. d execution. MSR[ME] is set to allow the processor to continue execution at the machine check exception vector address. Typically, earlier processes cannot resume; however, operating systems can use the machine check exception handler to try to identify and log the cause of the machine check condition. When a machine check exception is taken, instruction fetching resumes at offset 0x00200 from the physical base address indicated by MSR[IP]. 4.5.2.2 Checkstop State (MSR[ME] = 0). If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state. In addition, the assertion of CKSTP_IN to the 750 causes checkstop. Also, if enabled by L2CR[L2DRO], a DLL rollover causes checkstop. When a processor is in checkstop state, instruction processing is suspended and generally cannot resume without the processor being reset. The contents of all latches are frozen within two cycles upon entering checkstop state. 4.5.3 DSI Exception (0x00300). A DSI exception occurs when no higher-priority exception exists and an error condition related to a data memory access occurs. The DSI exception is implemented as it is defined in the PowerPC architecture (OEA). In case of a TLB miss for a load, store, or cache operation, a DSI exception is taken if the resulting hardware table search causes a page fault. On the 750, a DSI exception is taken when a load or store is attempted to a direct-store segment (SR[T] = 1). In the 750, a floating-point load or st
256. d for the IU1, IU2, FPU, LSU, and SRU. The dispatch unit checks for source and destination register dependencies, determines whether a position is available in the completion queue, and inhibits subsequent instruction dispatching as required. Branch instructions can be detected, decoded, and predicted from anywhere in the instruction queue. For a more detailed discussion of instruction dispatch, see Section 6.3.3, "Instruction Dispatch and Completion Considerations." 1.2.2.2 Branch Processing Unit (BPU). The BPU receives branch instructions from the sequential fetcher and performs CR lookahead operations on conditional branches to resolve them early, achieving the effect of a zero-cycle branch in many cases. Unconditional branch instructions and conditional branch instructions in which the condition is known can be resolved immediately. For unresolved conditional branch instructions, the branch path is predicted using either the architecture-defined static branch prediction or the 750-specific dynamic branch prediction. Dynamic branch prediction is enabled if HID0[BHT] = 1. When a prediction is made, instruction fetching, dispatching, and execution continue from the predicted path, but instructions cannot complete and write back results to architected registers until the prediction is determined to be correct (resolved). When a prediction is incorrect, the instructions from
257. d in bus clock 3. In this diagram, the address bus termination input, AACK, is asserted to the 750 on the bus clock following assertion of TS (as shown by the dependency line). This is the minimum duration of the address transfer for the 750; the duration can be extended by delaying the assertion of AACK for one or more bus clocks. [Figure 8-7. Address Bus Transfer: timing diagram; the signal traces (including ADDR, AACK, and ARTRY) are not recoverable from this extraction.] 8.3.2.1 Address Bus Parity. The 750 always generates 1 bit of correct odd-byte parity for each of the 4 bytes of address when a valid address is on the bus. The calculated values are placed on the AP[0-3] outputs when the 750 is the address bus master. If the 750 is not the master and TS and GBL are asserted together (qualified condition for snooping memory operations), the calculated values are compared with the AP[0-3] inputs. If there is an error, and address parity checking is enabled (HID0[EBA] set to 1), a machine check exception is generated. An address bus parity error causes a checkstop condition if MSR[ME] is cleared to 0. For more information about checkstop conditions, see Chapter 4, "Exceptions." 8.3.2.2 Address Transfer Attribute Signals. The transfer attribute signals include several encoded signals, such as the transfer type (TT[0-4]) signals, transfer burst (TBST) signal, transfer size (TSIZ[0-2]) signals, write-through (WT), and cache inhibit (CI). Section 7
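Odd byte parity as described for the address bus can be sketched as follows in Python. The function names are hypothetical. The sketch assumes that odd parity means the eight address bits plus the parity bit contain an odd number of ones, and that AP0 covers the most significant address byte, A[0-7].

```python
def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes (8 data bits + parity bit) contain an odd number of ones."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 else 1


def address_parity(addr: int):
    """Return [AP0, AP1, AP2, AP3] for a 32-bit address.

    Assumption: AP0 covers A[0-7], the most significant byte.
    """
    return [odd_parity_bit((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0)]


def parity_error(addr: int, ap_inputs) -> bool:
    """True if the sampled AP inputs disagree with the locally calculated values."""
    return address_parity(addr) != list(ap_inputs)


print(address_parity(0x00000001))
```

A snooping device would compare its own calculation against the sampled parity inputs in this way; whether a mismatch is reported depends on the enable bit described in the text.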
258. d process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR[PR] and MSR[PM] together define a state that the processor (supervisor or program) and the process (marked or unmarked) may be in at any time. If this state matches a state specified by the MMCR (the state for which monitoring is enabled), counting is enabled. The following are states that can be monitored: supervisor only; user only; marked and user only; not marked and user only; marked and supervisor only; not marked and supervisor only; marked only; not marked only. In addition, one of two unconditional counting modes may be specified. Counting is unconditionally enabled regardless of the states of MSR[PM] and MSR[PR]; this can be accomplished by clearing MMCR0[0-4]. Counting is unconditionally disabled regardless of the states of MSR[PM] and MSR[PR]; this is done by setting MMCR0[0]. The performance monitor counters count specified events and are used to generate performance monitor exceptions when an overflow (most significant bit is a 1) situation occurs. The 750 performance monitor has four 32-bit registers that can count up to 0x7FFFFFFF (2,147,483,647 in decimal) before overflowing. Bit 0 of the registers is used to determine when an interrupt condition exists. 11.4 Event Selection. Event selection
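The overflow rule for the counters (an interrupt condition exists once the most significant bit becomes 1) can be modeled with a small Python sketch. All names are hypothetical, and the preload helper reflects a common usage pattern rather than anything stated in this excerpt: a counter is loaded so that a chosen number of events drives it to 0x80000000.

```python
PMC_MAX_BEFORE_OVERFLOW = 0x7FFFFFFF  # largest count with bit 0 (the MSB) still clear


def pmc_add(value: int, events: int = 1) -> int:
    """Advance a 32-bit counter by `events`, wrapping at 32 bits."""
    return (value + events) & 0xFFFFFFFF


def pmc_overflowed(value: int) -> bool:
    """Overflow condition from the text: the most significant bit is 1."""
    return bool(value & 0x80000000)


def pmc_preload(events_until_overflow: int) -> int:
    """Hypothetical helper: start value that overflows after the given number of events."""
    if not 1 <= events_until_overflow <= 0x80000000:
        raise ValueError("event count out of range for a 32-bit counter")
    return 0x80000000 - events_until_overflow


counter = pmc_preload(1000)
counter = pmc_add(counter, 999)
print(pmc_overflowed(counter))   # still below 0x80000000
counter = pmc_add(counter)
print(pmc_overflowed(counter))   # the 1000th event sets the MSB
```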
259. date PTE[R] in memory if R_Flag = 1; page table search complete; memory protection violation.] Figure 5-9. Primary Page Table Search. [Figure 5-10. Secondary Page Table Search Flow: Secondary Page Table Search; Generate PA Using Primary Hash Function (PA = base PA of PTEG); fetch PTE (64 bits) from PA; if PTE [VSID, API, H, V] = [segment descriptor VSID, EA[API], 1, 1], secondary page table search hit (see Figure 5-9); otherwise, if not the last PTE in the PTEG, PA = PA + 8 and fetch the next PTE in the PTEG; after the last PTE, for an instruction access set SRR1[1] = 1 and take an ISI exception, and for a data access set DSISR[1] = 1 and take a DSI exception.] The LSU initiates out-of-order accesses without knowledge of whether it is legal to do so. Therefore, the MMU does not perform hardware table search due to TLB misses until the request is required by the program flow. In these out-of-order cases, the MMU does detect protection violations and whether a dcbz instruction specifies a page marked as write-through or cache-inhibited. The MMU also detects alignment exceptions caused by the dcbz instruction and prevents the changed bit in the PTE from being updated erroneously in these cases. If an MMU register is being accessed by an instruction in the instruction stream, the IMMU stalls for one translation cycle to perform that operation. The sequencer serializes
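The PTEG scan in the flow above (fetch a 64-bit PTE, compare VSID, API, H, and V, step by 8 bytes, stop after the last entry) can be modeled in Python. This is a sketch under stated assumptions: the upper PTE word layout (V in the most significant bit, then a 24-bit VSID, the H bit, and a 6-bit API) follows the 32-bit PowerPC page table entry format as commonly documented, and all function names are hypothetical.

```python
PTES_PER_PTEG = 8  # eight 8-byte PTEs, matching the "PA + 8" step in the flow


def pte_upper_word(vsid: int, api: int, h: int, v: int = 1) -> int:
    """Build the upper PTE word (assumed layout: V | VSID[24] | H | API[6])."""
    return ((v & 1) << 31) | ((vsid & 0xFFFFFF) << 7) | ((h & 1) << 6) | (api & 0x3F)


def search_pteg(pteg, vsid: int, api: int, h: int):
    """Return the index of the first valid matching PTE in one PTEG, or None."""
    wanted = pte_upper_word(vsid, api, h, v=1)
    for index, (upper, _lower) in enumerate(pteg[:PTES_PER_PTEG]):
        if upper == wanted:
            return index
    return None


def table_search(primary_pteg, secondary_pteg, vsid: int, api: int):
    """Primary search uses H = 0, secondary search uses H = 1; None models a page fault."""
    hit = search_pteg(primary_pteg, vsid, api, h=0)
    if hit is not None:
        return ("primary", hit)
    hit = search_pteg(secondary_pteg, vsid, api, h=1)
    if hit is not None:
        return ("secondary", hit)
    return None  # the flow then raises ISI (instruction) or DSI (data)


empty = [(0, 0)] * PTES_PER_PTEG
secondary = list(empty)
secondary[3] = (pte_upper_word(0x123456, 0x2A, h=1), 0)
print(table_search(empty, secondary, 0x123456, 0x2A))
```

Comparing the whole upper word at once works here because V, VSID, H, and API together fill all 32 bits of that word in the assumed layout.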
260. de is invoked the TAU is rendered inactive If the 750 is entering sleep mode with SYSCLK disabled the TAU should be configured to disable thermal management interrupts to avoid an unwanted thermal management interrupt when the SYSCLK input signal is restored Note For 750 revision 3 0 and later the TAU will no longer be operational in sleep mode 10 4 Instruction Cache Throttling The 750 provides an instruction cache throttling mechanism to effectively reduce the instruction execution rate without the complexity and overhead of dynamic clock control Instruction cache throttling when used in conjunction with the TAU and the dynamic power management capability of the 750 provides the system designer with a flexible means of controlling device temperature while allowing the processor to continue operating The instruction cache throttling mechanism simply reduces the instruction forwarding rate from the instruction cache to the instruction dispatcher Normally the instruction cache forwards four instructions to the instruction dispatcher every clock cycle if all the instructions hit in the cache For thermal management the 750 provides a supervisor level instruction cache throttling control ICTC SPR The instruction forwarding rate is reduced by writing a nonzero value into the ICTC FI field and enabling instruction cache throttling by setting the ICTC E bit to 1 The overall junction temperature reduction results from dynamic power mana
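As a rough illustration of programming the throttling register, the Python sketch below composes an ICTC value from a forwarding interval and the enable bit. The field layout is an assumption not stated in this section (an 8-bit FI field in bits 23–30 and the E bit in bit 31, using PowerPC bit numbering) and should be checked against the ICTC register description; `ictc_value` is a hypothetical helper name.

```python
# Assumed layout (PowerPC numbering, bit 0 = most significant bit):
#   bits 23-30: FI, instruction forwarding interval
#   bit  31:    E, enable instruction cache throttling
ICTC_E = 0x1
ICTC_FI_SHIFT = 1
ICTC_FI_MAX = 0xFF


def ictc_value(interval: int, enable: bool = True) -> int:
    """Compose an ICTC value from a forwarding interval and the enable bit."""
    if not 0 <= interval <= ICTC_FI_MAX:
        raise ValueError("FI is assumed to be an 8-bit field")
    return (interval << ICTC_FI_SHIFT) | (ICTC_E if enable else 0)
```

On hardware, the resulting value would be written to the ICTC SPR by supervisor-level code; this sketch covers only the arithmetic.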
261. defined in the PowerPC architecture. The dcbz instruction is treated as a store to the addressed byte with respect to address translation and protection. If the block containing the byte addressed by the EA is in the data cache, all bytes are cleared and the tag is marked as modified (M). If the block containing the byte addressed by the EA is not in the data cache and the corresponding page is caching-allowed, the block is established in the data cache without fetching the block from main memory, all bytes of the block are cleared, and the tag is marked as modified (M). If the contents of the cache block are from a page marked memory coherence required (M = 1), an address-only bus transaction is run prior to clearing the cache block. The dcbz instruction is the only cache control instruction that causes a broadcast on the 60x bus (when M = 1) to maintain coherency. The other cache control instructions are not broadcast unless broadcasting is specifically enabled through the HID0[ABE] configuration bit. The dcbz instruction executes regardless of whether the cache is locked, but if the cache is disabled, an alignment exception is generated. If the page containing the byte addressed by the EA is caching-inhibited or write-through, then the system alignment exception handler is invoked. BAT and TLB protection violations generate DSI exceptions. 3.4.2.3 Data Cache Block Store (dcbst) The effective address is computed, translated, and checked for p
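The dcbz rules above can be summarized as a small decision function. The Python sketch below is a behavioral model only. The names are hypothetical, and the order in which the conditions are tested (alignment conditions before protection violations) is an assumption, since this passage does not give the relative priority.

```python
from enum import Enum


class DcbzOutcome(Enum):
    ALIGNMENT_EXCEPTION = "alignment exception"
    DSI_EXCEPTION = "DSI exception"
    BLOCK_CLEARED = "block cleared and tag marked modified (M)"


def dcbz_outcome(cache_enabled: bool, caching_inhibited: bool,
                 write_through: bool, protection_violation: bool) -> DcbzOutcome:
    """Classify a dcbz; a locked cache does not change the result."""
    # Assumed priority: alignment conditions are tested first.
    if not cache_enabled or caching_inhibited or write_through:
        return DcbzOutcome.ALIGNMENT_EXCEPTION
    if protection_violation:  # BAT or TLB protection violation
        return DcbzOutcome.DSI_EXCEPTION
    return DcbzOutcome.BLOCK_CLEARED


def dcbz_broadcasts(outcome: DcbzOutcome, coherence_required: bool) -> bool:
    """An address-only bus transaction is run only when M = 1."""
    return outcome is DcbzOutcome.BLOCK_CLEARED and coherence_required
```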
262. definition 6 3 sync 4 12 SYNC operation 3 27 Synchronization context execution synchronization 2 36 execution of rfi 4 11 memory synchronization instructions 2 59 2 61 A 24 SYSCLK system clock signal 7 29 System call exception 4 21 System linkage instructions 2 55 2 65 list of instructions A 26 System management interrupt 4 25 10 1 System quiesce control signals QACK QREQ 8 43 System register unit execution timing 6 27 latency CR logical instructions 6 32 latency system register instructions 6 31 Index 9 T TA transfer acknowledge signal 7 19 Table search flow primary and secondary 5 31 TBEN time base enable signal 7 24 TBL TBU time base lower and upper regis ters 2 4 2 6 TBST transfer burst signal 7 12 8 16 8 25 TEA transfer error acknowledge signal 7 20 8 30 Termination 8 21 8 26 Thermal assist unit TAU 10 6 Thermal management interrupt exception 4 26 THRMn thermal management registers 2 21 10 7 Throughput definition 6 3 Timing considerations 6 7 Timing diagrams interface address transfer signals 8 14 burst transfers with data delays 8 37 L2 cache SRAM timing 9 9 single beat reads 8 33 single beat reads with data delays 8 35 single beat writes 8 34 single beat writes with data delays 8 36 use of TEA 8 38 using DBWO 8 45 Timing instruction BPU execution timing 6 18 branch timing example 6 23 cache hit 6 12 cache miss 6 15 execution unit 6 18 FPU execution timing 6 24 instru
263. dependent register 1 (HID1). This register reflects the state of the PLL_CFG[0–3] clock signals. The L2 cache control register (L2CR) is used to configure and operate the L2 cache. It includes bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. (Not supported in the 740.) Performance monitor registers. The following registers are used to define and count events for use by the performance monitor: The performance monitor counter registers (PMC1–PMC4) are used to record the number of times a certain event has occurred. UPMC1–UPMC4 provide user-level read access to these registers. The monitor mode control registers (MMCR0–MMCR1) are used to enable various performance monitor interrupt functions. UMMCR0–UMMCR1 provide user-level read access to these registers. The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. USIA provides user-level read access to the SIA. The 750 does not implement the sampled data address register (SDA) or the user-level, read-only USDA registers. However, for compatibility with processors that do, those registers can be written to by boot code without causing an exception. SDA is SPR 959; USDA is SPR 943. The instruct
264. description A 25 list of instructions 2 54 A 25 system linkage 2 55 2 65 A 26 trap 2 55 A 26 Branch prediction 6 1 6 22 Branch processing unit branch instruction timing 6 23 execution timing 6 18 latency branch instructions 6 31 overview 1 9 Branch resolution definition 6 1 resource requirements 6 30 BTIC branch target instruction cache 6 9 Burst data transfers 32 bit data bus 8 17 64 bit data bus 8 17 transfers with data delays timing 8 37 Bus arbitration see Data bus Bus configurations 8 41 Bus interface unit BIU 3 2 8 1 Bus transactions and L1 cache 3 22 Byte ordering 2 35 C Cache Index 1 bus interface unit 3 2 8 1 cache arbitration 6 11 cache block definition 3 3 cache characteristics 3 1 cache coherency description 3 5 overview 3 25 reaction to bus operations 3 26 cache control 3 13 cache control instructions bus operations 3 24 cache control 3 13 dcbi 2 66 dcbt 2 63 cache hit 6 11 cache integration 3 2 cache management instructions A 27 cache miss 6 14 cache operations cache block push operations 9 4 data cache transactions 3 22 instruction cache block fill 3 21 load store operations processor initiat ed 3 10 operations 3 18 overview 3 1 8 3 snoop response to bus transactions 3 26 cache unit overview 3 3 cache inhibited accesses 1 bit 3 6 data cache configuration 3 3 dcbf dcbst execution 9 4 icbi 9 4 instruction cache configuration 3 4 instruction cache throttling 10 10 Ll cache an
265. dge of the clock at the clock input of the L2 cache memory devices 7 2 9 17 L2 Low Power Mode Enable L2ZZ Output Following are the state meaning and timing comments for the L2ZZ signal State Meaning Asserted Negated Enables low power mode for certain L2 cache memory devices Operation of the signal is enabled through the L2CR Timing Comments Assertion Negation Occurs synchronously with the L2 clock when the 750 enters and exits the nap or sleep power modes after negation of this signal at least two L2 clock cycles will elapse before L2 cache operations resume The L2ZZ signal is driven low during assertion of HRESET 7 2 10 IEEE 1149 1a 1993 Interface Description The 750 has five dedicated JTAG signals which are described in Table 7 6 The test data input TDI and test data output TDO scan ports are used to scan instructions as well as data into the various scan registers for JTAG operations The scan operation is controlled by the test access port TAP controller which in turn is controlled by the test mode select TMS input sequence The scan data is latched in at the rising edge of test clock TCK Table 7 6 IEEE Interface Pin Descriptions e Weak Pullup A Signal Name Input Output IEEE 1149 1a Function Output Serial scan output signal 7 28 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Test reset TRST is a JTAG optional signal which is used to reset the TAP controller asynchronously The
266. dicates if the instruction is 64 bit and optional e Appendix B Instructions Not Implemented provides a list of the 32 bit and 64 bit PowerPC instructions that are not implemented in the 750 e This manual also includes a glossary and an index Suggested Reading This section lists additional reading that provides background for the information in this manual as well as general information about the PowerPC architecture General Information The following documentation provides useful information about the PowerPC architecture and computer architecture in general e The following books are available from the Morgan Kaufmann Publishers 340 Pine Street Sixth Floor San Francisco CA 94104 Tel 800 745 7323 U S A 415 392 2665 International internet address mkp mkp com The PowerPC Architecture A Specification for a New Family of RISC Processors Second Edition by International Business Machines Inc Updates to the architecture specification are accessible via the world wide web at http www austin ibm com tech ppc chg html e PowerPC Programming for Intel Programmers by Kip McClanahan IDG Books Worldwide Inc 919 East Hillsdale Boulevard Suite 400 Foster City CA 94404 Tel 800 434 3422 U S A 415 655 3022 International e PowerPC System Architecture by Tom Shanley Mindshare Inc 2202 Buttercup Drive Richardson TX 75082 Tel 214 231 2216 U S A 021 706 6000 United Kingdom 800
267. double precision one double word floating point operands The PowerPC architecture uses instructions that are four bytes long and word aligned It provides for byte half word and word operand loads and stores between memory and a set of 32 GPRs It also provides for word and double word operand loads and stores between memory and a set of 32 floating point registers FPRs Computational instructions do not modify memory To use a memory operand in a computation and then modify the same or another memory location the memory contents must be loaded into a register modified and then written back to the target location with distinct instructions PowerPC processors follow the program flow when they are in the normal execution state however the flow of instructions can be interrupted directly by the execution of an instruction or by an asynchronous event Either kind of exception may cause one of several components of the system software to be invoked Effective address computations for both data and instruction accesses use 32 bit unsigned binary arithmetic A carry from bit 0 is ignored in 32 bit implementations 1 5 2 PowerPC 750 Microprocessor Instruction Set The 750 instruction set is defined as follows e The 750 provides hardware support for all 32 bit PowerPC instructions e The 750 implements the following instructions optional to the PowerPC architecture External Control In Word Indexed eciwx External Control Out Wor
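The statement that effective address computation uses 32-bit unsigned arithmetic and ignores the carry out of bit 0 amounts to arithmetic modulo 2^32. A minimal Python sketch, with the hypothetical helper name `effective_address`:

```python
MASK32 = 0xFFFF_FFFF


def effective_address(base: int, offset: int) -> int:
    """Add base and offset modulo 2**32; the carry out of bit 0 is dropped."""
    return (base + offset) & MASK32
```

A base of 0xFFFFFFFC plus an offset of 8 therefore wraps around to 0x00000004 rather than producing a 33-bit result.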
268. dress Calculation 5.1.2 MMU Organization Figure 5-1 shows the conceptual organization of a PowerPC MMU in a 32-bit implementation; note that it does not describe the specific hardware used to implement the memory management function for a particular processor. Processors may optionally implement on-chip TLBs, hardware support for the automatic search of the page tables for PTEs, and other hardware features (invisible to the system software) not shown. The 750 maintains two on-chip TLBs with the following characteristics: • 128 entries, two-way set associative (64 x 2), LRU replacement • Data TLB supports the DMMU; instruction TLB supports the IMMU • Hardware TLB update • Hardware update of referenced (R) and changed (C) bits in the translation table. In the event of a TLB miss, the hardware attempts to load the TLB based on the results of a translation table search operation. Figure 5-2 and Figure 5-3 show the conceptual organization of the 750 instruction and data MMUs, respectively. The instruction addresses shown in Figure 5-2 are generated by the processor for sequential instruction fetches and addresses that correspond to a change of program flow. Data addresses shown in Figure 5-3 are generated by load, store, and cache instructions. As shown in the figures, after an address is generated, the high-order bits of the effective address, EA[0–19] (or a smaller set of address bits, EA[0–n], in the cases of blocks), are translated
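To make the 64 x 2 TLB organization concrete, the Python sketch below splits an effective address into its page number, EA[0–19], and a set index. The choice of index bits is an assumption: the sketch takes the low-order six bits of the effective page number (EA[14–19]), which is not stated in this passage. Both function names are hypothetical.

```python
PAGE_SHIFT = 12   # 4-Kbyte pages: EA[20-31] is the byte offset
TLB_SETS = 64     # 128 entries, two-way set associative


def effective_page_number(ea: int) -> int:
    """Return EA[0-19], the 20 high-order bits of a 32-bit address."""
    return (ea & 0xFFFF_FFFF) >> PAGE_SHIFT


def tlb_set_index(ea: int) -> int:
    """Assumed indexing: low-order 6 bits of the page number (EA[14-19])."""
    return effective_page_number(ea) % TLB_SETS
```

Under this assumption, consecutive pages map to consecutive sets, and each set holds two candidate translations.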
269. drive external SRAM into a low-power mode when the nap or sleep modes are invoked. The L2ZZ signal is enabled by setting the L2CR[CTL] bit to 1. Note that if bus snooping is to be performed through deassertion of the QACK signal, the L2CR[CTL] bit should always be cleared to 0. • Time base/decrementer still enabled • Thermal management unit enabled • Most functional units disabled • All nonessential input receivers disabled • Nap mode sequence: Set nap bit (HID0[9] = 1), clear doze and sleep bits (HID0[8] and HID0[10] = 0). The 750 asserts the quiesce request (QREQ) signal. The system asserts the quiesce acknowledge (QACK) signal. The 750 enters nap mode after several processor clocks. • Nap mode bus snoop sequence: The system deasserts the QACK signal for eight or more bus clock cycles. The 750 snoops address tenure(s) on the bus. The system asserts the QACK signal to restore full nap mode. • Several methods of returning to full-power mode: Assert INT, SMI, MCP (machine check), or decrementer interrupts. Assert hard reset or soft reset. • Transition to full-power mode takes no more than a few processor cycles • PLL and DLL running and locked to SYSCLK 10.2.1.5 Sleep Mode Sleep mode consumes the least amount of power of the four modes since all functional units are disabled. To conserve the maximum amount of power, the PLL may be disabled by placing the PLL_CFG signals in the PLL bypass mode a
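The first step of the nap sequence, setting HID0[9] while clearing HID0[8] and HID0[10], is easy to get wrong because PowerPC numbers bits from the most significant end. The Python sketch below shows only the mask arithmetic; `ppc_bit` and `select_nap` are hypothetical helper names, and the register write itself (an mtspr on hardware) is outside the sketch.

```python
def ppc_bit(n: int, width: int = 32) -> int:
    """Mask for PowerPC bit n, where bit 0 is the most significant bit."""
    return 1 << (width - 1 - n)


HID0_DOZE = ppc_bit(8)
HID0_NAP = ppc_bit(9)
HID0_SLEEP = ppc_bit(10)


def select_nap(hid0: int) -> int:
    """Return hid0 with nap selected and the doze and sleep bits cleared."""
    cleared = hid0 & ~(HID0_DOZE | HID0_SLEEP)
    return (cleared | HID0_NAP) & 0xFFFF_FFFF
```

All other HID0 bits pass through unchanged, which matters because the same register also holds cache control bits.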
270. e Store stwex Write Same CRTRY write Ee T 0 l Push block to write Write with kill queue Store T 0 Write Same Pass single beat write Write with flus or stwex to memory queue h WIM 10x p Store T 0 Write CRTRY write or stwex WIM 10x Chapter 3 Instruction and Data Cache Operation 3 31 Table 3 7 MEI State Transitions Continued A Cache Bus Current Bus Operation P Cache Cache Actions Operation sync State Operation Store T 0 Write xix M CRTRY write Il or stwcx a S WIM 10x Push block to write Write with kill queue Ee If the reserved bit is set this operation is like other writes except the bus operation Ee uses a special encoding Datacache LE Game CRTRY dcbf Il block flush Data cache Push block to write Write with kill block flush queue Data cache Same CRTRY debst dcbst block store CES clean Clean Same Same No action action Datacache a block to write E with kill block store queue Datacache Alignment trap block set to Data cache Alignment trap block set to Datacache E M Clear block block set to Datacache block touch Datacache block touch Datacache block touch Same Pass single beatreadto Read memory queue Push block to write Write with kill queue Datacache Yes Same CRTRY dcbz E block set to Cast out of modified Write with kill block 3 32 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual
271. e QACK. The 750 enters sleep mode after several processor clocks. • Several methods of returning to full-power mode: Assert INT, SMI, or MCP interrupts. Assert hard reset or soft reset. • PLL and DLL may be disabled and SYSCLK may be removed while in sleep mode • Return to full-power mode after PLL and SYSCLK are disabled in sleep mode: Enable SYSCLK. Reconfigure PLL into desired processor clock mode. System logic waits for PLL startup and relock time (100 µsec). System logic asserts one of the sleep recovery signals (for example, INT or SMI). Reconfigure DLL, wait for DLL relock (640 L2 clock cycles), and re-enable L2 cache through the L2CR. 10.2.2 Power Management Software Considerations Since the 750 is a dual-issue processor with out-of-order execution capability, care must be taken in how the power management mode is entered. Furthermore, nap and sleep modes require all outstanding bus operations to be completed before these power management modes are entered. Normally, during system configuration time, one of the power management modes would be selected by setting the appropriate HID0 mode bit. Later on, the power management mode is invoked by setting the MSR[POW] bit. To ensure a clean transition into and out of a power management mode, set the MSR[EE] bit to 1 and execute the following code sequence: sync; mtmsr[POW = 1]; isync; continue 10.3 Thermal A
272. [Figure 3-1. Cache Integration: 32-Kbyte, eight-way set-associative instruction and data caches; EA = effective address, PA = physical address] Both caches are tightly coupled to the 750's bus interface unit to allow efficient access to the system memory controller and other bus masters. The bus interface unit receives requests for bus operations from the instruction and data caches and executes the operations per the 60x bus protocol. The BIU provides address queues, prioritizing logic, and bus control logic. The BIU captures snoop addresses for data cache, address queue, and memory reservation (lwarx and stwcx. instruction) operations. The data cache provides buffers for load and store bus operations. All the data for the corresponding address queues (load and store data queues) is located in the data cache. The data queues are considered temporary storage for the cache and not part of the BIU. The data cache also provides storage for the cache tags required for memory coherency and performs the cache block replacement PLRU function. The data cache supplies data to the GPRs and FPRs by means of the load/store unit. The 750's LSU is directly coupled to the data cache to allow efficient movement of data to and from the general-purpose and floating-point registers. The load/store unit provides all logic r
274. e OEA defines the mechanism by which the exception is taken The PowerPC exception mechanism allows the processor to change to supervisor state as a result of unusual conditions arising in the execution of instructions and from external signals bus errors or various internal conditions When exceptions occur information about the state of the processor is saved to certain registers and the processor begins execution at an address exception vector predetermined for each exception Processing of exceptions begins in supervisor mode Although multiple exception conditions can map to a single exception vector often a more specific condition may be determined by examining a register associated with the exception for example the DSISR and the floating point status and control register FPSCR Also software can explicitly enable or disable some exception conditions The PowerPC architecture requires that exceptions be taken in program order therefore although a particular implementation may recognize exception conditions out of order they are handled strictly in order with respect to the instruction stream When an instruction caused exception is recognized any unexecuted instructions that appear earlier in the instruction stream including any that have not yet entered the execute state are required to complete before the exception is taken For example if a single instruction encounters multiple exception conditions those exceptions are t
275. e Write through Store operations to memory marked write through always update both system memory and the on chip cache on cache hits Because valid cache contents always match system memory marked write through cache hits from other devices do not cause modified data to be copied back as they do for locations marked write back However all write operations are passed to the bus which can limit performance Load operations that miss the on chip cache must wait for the external store operation Write through configuration is useful when cached data must agree with external memory for example video memory when shared global data may be needed often or when it is undesirable to allocate a cache block on a cache miss Chapter 3 Instruction and Data Cache Operation describes the caches memory configuration and snooping in detail 6 5 2 Effect of TLB Miss If a page address translation is not in a TLB the 750 hardware searches the page tables and updates the TLB when a translation is found Table 6 2 shows the estimated latency for the hardware TLB load for different cache configurations and conditions Table 6 2 TLB Miss Latencies Instruction and Data Clock Ratio Clock Ratio Cycles 1 1 12 2 100 cache miss 100 cache miss ae 4 1 5 2 2 2 memory The PTE table search assumes a hit in the first entry of the primary PTEG 6 28 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual 6 6 Instruction Schedul
276. e address EA The 32 or 64 bit address specified for a load store or an instruction fetch This address is then submitted to the MMU for translation to either a physical memory address or an I O address Exception A condition encountered by the processor that requires special supervisor level processing Exception handler A software routine that executes when an exception is taken Normally the exception handler corrects the condition that caused the exception or performs some other meaningful task that may include aborting the program that caused the exception The address for each exception handler is identified by an exception vector offset defined by the architecture and a prefix selected via the MSR Exclusive state MEI state E in which only one caching device contains data that is also in system memory Execution synchronization A mechanism by which all instructions in execution are architecturally complete before beginning execution appearing to begin execution of the next instruction Similar to context synchronization but doesn t force the contents of the instruction buffers to be deleted and refetched Exponent In the binary representation of a floating point number the exponent is the component that normally signifies the integer power to which the value two is raised in determining the value of the represented number See also Biased exponent Fall through branch fall through A not taken branch On the Pow
277. e arbitration signals refer to Section 7 2 1 Address Bus Arbitration Signals and Section 7 2 6 Data Bus Arbitration Signals IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual 8 2 2 Address Pipelining and Split Bus Transactions The 750 protocol provides independent address and data bus capability to support pipelined and split bus transaction system organizations Address pipelining allows the address tenure of a new bus transaction to begin before the data tenure of the current transaction has finished Split bus transaction capability allows other bus activity to occur either from the same master or from different masters between the address and data tenures of a transaction While this capability does not inherently reduce memory latency support for address pipelining and split bus transactions can greatly improve effective bus memory throughput For this reason these techniques are most effective in shared memory multimaster implementations where bus bandwidth is an important measurement of system performance External arbitration is required in systems in which multiple devices must compete for the system bus The design of the external arbiter affects pipelining by regulating address bus grant BG data bus grant DBG and address acknowledge AACK signals For example a one level pipeline is enabled by asserting AACK to the current address bus master and granting mastership of the address bus to the n
278. e as follows: 1. BAT protection violation (DSI exception) 2. TLB protection violation (DSI exception). Data Cache Block Flush (dcbf rA,rB): The EA is computed, translated, and checked for protection violations. For cache hits with the tag marked M, the cache block is written back to memory and the cache entry is invalidated. For cache hits with the tag marked E, the entry is invalidated. For cache misses, no further action is taken. A dcbf is not broadcast unless HID0[ABE] = 1, regardless of WIMG settings. The instruction acts like a load with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked. The exception priorities (from highest to lowest) for dcbf are as follows: 1. BAT protection violation (DSI exception) 2. TLB protection violation (DSI exception). Instruction Cache Block Invalidate (icbi rA,rB): This instruction performs a virtual lookup into the instruction cache (index only). The address is not translated, so it cannot cause an exception. All ways of a selected set are invalidated regardless of whether the cache is disabled or locked. The 750 never broadcasts icbi onto the 60x bus. Note 1: A program that uses dcbt and dcbtst instructions improperly performs less efficiently. To improve performance, HID0[NOOPTI] may be set, which causes dcbt and dcbtst to be no-oped at the cache. They do not cause bus activity and cause only a 1-clock execution latency. The default state of this bit
279. e block is in the exclusive (E) state, the cache block remains in the exclusive (E) state. • If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the exclusive (E) state. • If the address misses in the cache, no action is taken. For burst read transactions: • If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state. • If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the invalid (I) state. • If the address misses in the cache, no action is taken. Read with intent to modify (RWITM): A RWITM operation is issued to acquire exclusive use of a memory location for the purpose of modifying it. • If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state. • If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the invalid (I) state. • If the address misses in the cache, no action is taken. Write with flush atomic: Write-with-flush-atomic operations occur after the processor issues an stwcx. instruction. • If the addressed cache block is in the exclusive (E) state, the cache block is pl
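The snoop responses listed above form a small state table, sketched below in Python as a behavioral model. The key name `single_beat_read` for the first group of bullets is an assumption, since the heading for that group is cut off in this passage. The function name and the tuple layout are hypothetical.

```python
# (snooped operation, current MEI state) -> (next state, ARTRY asserted and block pushed)
SNOOP_TABLE = {
    ("single_beat_read", "E"): ("E", False),  # operation name assumed
    ("single_beat_read", "M"): ("E", True),
    ("burst_read", "E"): ("I", False),
    ("burst_read", "M"): ("I", True),
    ("rwitm", "E"): ("I", False),
    ("rwitm", "M"): ("I", True),
}


def snoop_response(operation: str, state: str) -> tuple:
    """Return (next_state, artry_and_push) for a snooped bus operation."""
    if state == "I":
        # A miss in the cache: no action is taken.
        return ("I", False)
    return SNOOP_TABLE[(operation, state)]
```

In every modified (M) case the block is pushed out with ARTRY asserted; only the first read case leaves a valid copy behind, in the exclusive (E) state.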
280. e burst and size signals on the 60x bus are used to select the device; these four signals output the 4-bit resource ID (RID) field located in the EAR. The eciwx instruction also loads a word from the data bus that is output by the special device. For more information about the relationship between these instructions and the system interface, refer to Chapter 7, Signal Descriptions. 2.3.6 PowerPC OEA Instructions The PowerPC operating environment architecture (OEA) includes the structure of the memory management model, supervisor-level registers, and the exception model. Implementations that conform to the OEA also adhere to the UISA and the VEA. This section describes the instructions provided by the OEA. 2.3.6.1 System Linkage Instructions (OEA) This section describes the system linkage instructions; see Table 2-54. The user-level sc instruction lets a user program call on the system to perform a service and causes the processor to take a system call exception. The supervisor-level rfi instruction is used for returning from an exception handler. Table 2-54. System Linkage Instructions (OEA): System Call (sc): The sc instruction is context-synchronizing. Return from Interrupt (rfi): The rfi instruction is context-synchronizing. For the 750, this means the rfi instruction works its way to the final stage of the execution pipeline, updates architected registers, and redirects the instruction flow. 2.3.6.2 Processor Control Instructions
281. e cache. However, all accesses that miss in the locked cache are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of HID0[DLOCK]. The 750 treats snoop hits to a locked data cache the same as snoop hits to an unlocked data cache. However, any cache block invalidated by a snoop hit remains invalid until the cache is unlocked. The setting of the DLOCK bit must be preceded by a sync instruction to prevent the data cache from being locked during a data access. 3.4.1.4 Instruction Cache Flash Invalidation The instruction cache is automatically invalidated when the 750 is powered up and during a hard reset. However, a soft reset does not automatically invalidate the instruction cache. Software must use the HID0 instruction cache flash invalidate bit (HID0[ICFI]) if instruction cache invalidation is desired after a soft reset. Once HID0[ICFI] is set through an mtspr operation, the 750 automatically clears this bit in the next clock cycle (provided that the instruction cache is enabled in the HID0 register). Note that some PowerPC microprocessors accomplish instruction cache flash invalidation by setting and clearing HID0[ICFI] with two consecutive mtspr instructions (that is, the bit is not automatically cleared by the microprocessor). Software that has this sequence of operations does not need t
282. e endian modes SRU handles miscellaneous instructions Executes CR logical and Move to Move from SPR instructions mtspr and mfspr Single entry reservation station e Rename buffers Six GPR rename buffers Six FPR rename buffers Condition register buffering supports two CR writes per clock e Completion unit The completion unit retires an instruction from the six entry reorder buffer completion queue when all instructions ahead of it have been completed the instruction has finished execution and no exceptions are pending Guarantees sequential programming model precise exception model Monitors all dispatched instructions and retires them in order Tracks unresolved branches and flushes instructions from the mispredicted branch Retires as many as two instructions per clock e Separate on chip instruction and data caches Harvard architecture 32 Kbyte eight way set associative instruction and data caches Pseudo least recently used PLRU replacement algorithm 32 byte eight word cache block Physically indexed physical tags Note that the PowerPC architecture refers to physical address space as real address space Cache write back or write through operation programmable on a per page or per block basis Instruction cache can provide four instructions per clock data cache can provide two words per clock Caches can be disabled in software Chapter 1 Pow
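The cache parameters above (32 Kbytes, eight ways, 32-byte blocks) determine the number of sets and how a physical address divides into tag, set index, and byte offset. A minimal Python sketch of that arithmetic follows; the function name `split_physical_address` is hypothetical, and the split assumes a conventional set-associative layout with the index taken from the bits just above the block offset.

```python
CACHE_BYTES = 32 * 1024
WAYS = 8
BLOCK_BYTES = 32
SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)  # 128 sets


def split_physical_address(pa: int) -> tuple:
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    pa &= 0xFFFF_FFFF
    byte_offset = pa % BLOCK_BYTES
    set_index = (pa // BLOCK_BYTES) % SETS
    tag = pa // (BLOCK_BYTES * SETS)
    return tag, set_index, byte_offset
```

Because 128 sets of 32 bytes span exactly 4 Kbytes, the index and offset together cover the 12 low-order address bits, leaving a 20-bit tag.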
283. e in queue position 0 of their respective load/store queues are monitored when a threshold event is selected in PMC1. The 750 cannot accurately track threshold events with respect to the following types of loads and stores:
• Unaligned load and store operations that cross a word boundary
• Load and store multiple operations
• Load and store string operations

IBM PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual

Appendix A. PowerPC Instruction Set Listings
This appendix lists the PowerPC 750 microprocessor's instruction set as well as the additional PowerPC instructions not implemented in the 750. Instructions are sorted by mnemonic, opcode, function, and form. Also included in this appendix is a quick reference table that contains general information, such as the architecture level, privilege level, and form, and indicates if the instruction is 64-bit and optional. Note that the 750 is a 32-bit microprocessor and does not implement any 64-bit instructions. Note that split fields, which represent the concatenation of sequences from left to right, are shown in lowercase. For more information, refer to Chapter 8, Instruction Set, in The Programming Environments Manual.

A.1 Instructions Sorted by Mnemonic
Table A-1 lists the instructions implemented in the PowerPC architecture in alphabetical order by mnemonic.
Table A-1. Complete Instruction List Sorted by Mnemonic
284. e of sync may degrade performance. System designs with an L2 cache should take special care to recognize the hardware signaling caused by a SYNC bus operation and perform the appropriate actions to guarantee that memory references that may be queued internally to the L2 cache have been performed globally. See Section 2.3.5.2, Memory Synchronization Instructions (VEA), for details about additional memory synchronization (eieio and isync) instructions. In the PowerPC architecture, the Rc bit must be zero for most load and store instructions. If Rc is set, the instruction form is invalid for sync and lwarx instructions. If the 750 encounters one of these invalid instruction forms, it sets CR0 to an undefined value.

2.3.5 PowerPC VEA Instructions
The PowerPC virtual environment architecture (VEA) describes the semantics of the memory model that can be assumed by software processes, and includes descriptions of the cache model, cache control instructions, address aliasing, and other related issues. Implementations that conform to the VEA also adhere to the UISA, but may not necessarily adhere to the OEA. This section describes additional instructions that are provided by the VEA.

2.3.5.1 Processor Control Instructions (VEA)
In addition to the move to condition register instructions specified by the UISA, the VEA defines the mftb instruction (user-level instruction) for reading the contents of the time base register; see Chapter 3 Instruction
285. e perspective of the processor, it is safe to continue (that is, processor state data such as that saved to SRR0 is valid), but it does not guarantee that the interrupted process is recoverable.

Little-endian mode enable:
0 The processor runs in big-endian mode.
1 The processor runs in little-endian mode.

Note: Full-function reserved bits are saved in SRR1 when an exception occurs; partial-function reserved bits are not saved.

The IEEE floating-point exception mode bits (FE0 and FE1) together define whether floating-point exceptions are handled precisely, imprecisely, or whether they are taken at all. As shown in Table 4-5, if either FE0 or FE1 is set, the 750 treats exceptions as precise. MSR bits are guaranteed to be written to SRR1 when the first instruction of the exception handler is encountered. For further details, see Chapter 6, Exceptions, of The Programming Environments Manual.

Table 4-5. IEEE Floating-Point Exception Mode Bits
FE0  FE1  Mode
0    0    Floating-point exceptions disabled
0    1    Imprecise nonrecoverable. For this setting, the 750 operates in floating-point precise mode.
1    0    Imprecise recoverable. For this setting, the 750 operates in floating-point precise mode.
1    1    Floating-point precise mode

4.3.1 Enabling and Disabling Exceptions
When a condition exists that may cause an exception to be generated, it must be determined whether the exception is enabled for that condi
286. e that although memory accesses that miss in the cache are forwarded to the memory queue for future arbitration for the external bus, all potential synchronous exceptions have been resolved before the cache. In addition, although subsequent memory accesses can address the cache, full coherency checking between the cache and the memory queue is provided to avoid dependency conflicts.

3.3.5.3 Atomic Memory References
The PowerPC architecture defines the Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed (stwcx.) instructions to provide an atomic update function for a single, aligned word of memory. These instructions can be used to develop a rich set of multiprocessor synchronization primitives. Note that atomic memory references constructed using lwarx/stwcx. instructions depend on the presence of a coherent memory system for correct operation. These instructions should not be expected to provide atomic access to noncoherent memory. For detailed information on these instructions, refer to Chapter 2, Programming Model, in this book and Chapter 8, Instruction Set, in The Programming Environments Manual.

The lwarx instruction performs a load word from memory operation and creates a reservation for the 32-byte section of memory that contains the accessed word. The reservation granularity is 32 bytes. The lwarx instruction makes a nonspecific reservation with resp
287. e that the rising edge of the L2 clock is coincident at the K input of all SRAMs and at the L2Sync_In input of the 750. The clock A network can be used solely, or the clock B network can also be used, depending on loading, frequency, and number of SRAMs. No pull-up resistors are normally required for the L2 interface. The 750 supports only one bank of SRAMs. For high-speed operation, no more than two loads should be presented on each L2 interface signal.

[Figure 9-26. Typical 1-Mbyte L2 Cache Configuration]

9.1.1 L2 Cache Operation
The 750's L2 cache is a combined instruction and data cache that receives memory requests from both L1 instruction and data caches independently. The L1 requests are generally the result of instruction fetch misses, data load or store misses, write-through operations, or cache management instructions. Each L1 request generates an address lookup in the L2 tags. If a hit occurs, the instructions or data are forwarded to the L1 cache. A miss in the L2 tags causes the L1 request to be forwarded to the 60x bus interface. The cache block received from the bus is forwarded to the L1 cache immediately, and is also loaded into the L2 cache with the tag marked valid and unmodified. If the cache block loaded into the L2 causes a new tag entry to be allocated and the current tag entry is marked valid modified, the modified sectors of the tag to be replaced are cast out from the L2 cache to the 60x bus.
288. each instruction implemented on the 750, the latency for each instruction, and other information that is useful for the assembly language programmer.

6.1 Terminology and Conventions
This section provides an alphabetical glossary of terms used in this chapter. These definitions are provided as a review of commonly used terms and as a way to point out specific ways these terms are used in this chapter.
• Branch prediction: The process of guessing whether a branch will be taken. Such predictions can be correct or incorrect; the term "predicted" as it is used here does not imply that the prediction is correct (successful). The PowerPC architecture defines a means for static branch prediction as part of the instruction encoding.
• Branch resolution: The determination of whether a branch is taken or not taken. A branch is said to be resolved when the processor can determine which instruction path to take. If the branch is resolved as predicted, the instructions following the predicted branch that may have been speculatively executed can complete (see completion). If the branch is not resolved as predicted, instructions on the mispredicted path, and any results of speculative execution, are purged from the pipeline and fetching continues from the nonpredicted path.
• Completion: Completion occurs when an instruction has finished executing, written back any results, and is removed from the completion queue. When an instruction completes, it is gua
289. each a recoverable state. A hard reset has the highest priority of any exception. It is always nonrecoverable. Table 4-9 shows the state of the machine just before it fetches the first instruction of the system reset handler after a hard reset. In Table 4-9, the term "Unknown" means that the content may have been disordered. These facilities must be properly initialized before use. The FPRs, BATs, and TLBs may have been disordered. To initialize the BATs, first set them all to zero, then to the correct values before any address translation occurs.

Table 4-9. Settings Caused by Hard Reset
Register           Setting
GPRs               Unknown
FPRs               Unknown
FPSCR              00000000
CR                 All 0s
SRs                Unknown
MSR                00000040 (only IP set)
XER                00000000
TBU                00000000
TBL                00000000
LR                 00000000
CTR                00000000
SDR1               00000000
SRR0               00000000
SRR1               00000000
SPRGs              00000000
Tag directory (Icache and Dcache)   All entries are marked invalid, all LRU bits are set to 0, and caches are disabled
PVR                See the PowerPC 740 and PowerPC 750 Embedded Microprocessor Hardware Specifications
HID0               00000000
HID1               00000000
DMISS and IMISS    All 0s
DCMP and ICMP      All 0s
RPA                All 0s
IABR               All 0s (break point disabled)
DSISR              00000000
DAR                00000000
DEC                FFFFFFFF
HASH1              00000000
HASH2              00000000
TLBs               Unknown
Reservation        Address: Unknown; reservation flag cleared
BATs               Unknown
Cache (Icache and Dcache)   All blocks are unchanged from before HRESET
DA
290. eat Writes Showing Data Delay Controls

Figure 8-20 shows the use of data delay controls with burst transfers. Note that all bidirectional signals are three-stated between bus tenures. Note the following:
• The first data beat of bursted read data (clock 0) is the critical quad word.
• The write burst shows the use of TA signal negation to delay the third data beat.
• The final read burst shows the use of DRTRY on the third data beat.
• The address for the third transfer is delayed until the first transfer completes.

[Figure 8-20. Burst Transfers with Data Delay Controls]

Figure 8-21 shows the use of the TEA signal. Note that all bidirectional signals are three-stated between bus tenures. Note the following:
• The first data beat of the read burst (in clock 0) is the critical quad word.
• The TEA signal truncates the burst write transfer on the third data beat.
• The 750 eventually causes an exception to be taken on the TEA event.

[Figure 8-21. Use of Transfer Error Acknowledge (TEA)]

8.6 Optional Bus Configuration
The 750 supports optional
291. ecific Instructions (fields listed in bit order, bit 0 through bit 31)
Name    Fields
mfspr   31  D  spr  339  0
mftb    31  D  tbr  371  0
mtcrf   31  S  0  CRM  0  144  0
mtspr   31  D  spr  467  0
Note: Supervisor- and user-level instruction

Table A-39. XFL Form
Specific Instructions
Name    Fields
mtfsfx  63  0  FM  0  B  711  Rc

Table A-40. XS Form
OPCD  S  A  sh  XO  sh  Rc
Note 1: 64-bit instruction

Table A-41. XO Form
OPCD  D  A  B      OE  XO  Rc
OPCD  D  A  B      0   XO  Rc
OPCD  D  A  00000  OE  XO  Rc
Specific Instructions
Name     Fields
addx     31  D  A  B      OE  266  Rc
addcx    31  D  A  B      OE  10   Rc
addex    31  D  A  B      OE  138  Rc
addmex   31  D  A  00000  OE  234  Rc
addzex   31  D  A  00000  OE  202  Rc
divdx    31  D  A  B      OE  489  Rc
divdux   31  D  A  B      OE  457  Rc
divwx    31  D  A  B      OE  491  Rc
divwux   31  D  A  B      OE  459  Rc
mulhdx   31  D  A  B      0   73   Rc
mulhdux  31  D  A  B      0   9    Rc
mulhwx   31  D  A  B      0   75   Rc
mulhwux  31  D  A  B      0   11   Rc
mulldx   31  D  A  B      OE  233  Rc
mullwx   31  D  A  B
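The XO-form layout in Table A-41 can be turned into a 32-bit instruction word by packing the fields in order. The following is a minimal sketch, not part of the manual: the helper names `encode_xo` and `primary_opcode` are hypothetical, and the field widths (6-bit OPCD, 5-bit D, A, and B, 1-bit OE, 9-bit XO, 1-bit Rc) are assumed from the standard PowerPC XO form, with bit 0 as the most significant bit.

```python
def encode_xo(opcd, rd, ra, rb, oe, xo, rc):
    """Pack XO-form fields into a 32-bit word (bit 0 is the most significant bit).

    Assumed field positions: OPCD 0-5, D 6-10, A 11-15, B 16-20,
    OE 21, XO 22-30, Rc 31.
    """
    assert 0 <= opcd < 64 and 0 <= xo < 512
    assert all(0 <= r < 32 for r in (rd, ra, rb))
    assert oe in (0, 1) and rc in (0, 1)
    return (opcd << 26) | (rd << 21) | (ra << 16) | (rb << 11) | (oe << 10) | (xo << 1) | rc


def primary_opcode(word):
    """Return the primary opcode (bits 0-5) of a 32-bit instruction word."""
    return (word >> 26) & 0x3F


# addx is listed with primary opcode 31 and extended opcode 266.
ADD_R3_R4_R5 = encode_xo(31, 3, 4, 5, oe=0, xo=266, rc=0)
print(hex(ADD_R3_R4_R5))
```

Setting `rc=1` or `oe=1` in the same call gives the record and overflow-enabled variants of the instruction, which is why the table lists OE and Rc as fields rather than fixed values.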
292. eckstop Input (CKSTP_IN): Input
Following are the state meaning and timing comments for the CKSTP_IN signal.
State Meaning
  Asserted: Indicates that the 750 must terminate operation by internally gating off all clocks, and release all outputs (except CKSTP_OUT) to the high-impedance state. Once CKSTP_IN has been asserted, it must remain asserted until the system has been reset.
  Negated: Indicates that normal operation should proceed. See Section 8.7.2, Checkstops.
Timing Comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks.
  Negation: May occur any time after the CKSTP_OUT output signal has been asserted.

7.2.9.5 Checkstop Output (CKSTP_OUT): Output
Note that the CKSTP_OUT signal is an open-drain type output and requires an external pull-up resistor (for example, 10 kΩ to V) to assure proper de-assertion of the CKSTP_OUT signal. Following are the state meaning and timing comments for the CKSTP_OUT signal.
State Meaning
  Asserted: Indicates that the 750 has detected a checkstop condition and has ceased operation.
  Negated: Indicates that the 750 is operating normally. See Section 8.7.2, Checkstops.
Timing Comments
  Assertion: May occur at any time and may be asserted asynchronously to the 750 input clocks.
  Negation: Is negated upon assertion of HRESET.

7.2.9.6 Reset Signals
There are two reset signals on the 750: hard reset (HRESET) and soft reset (SRESET). De
293. ect to the executing processor and a specific reservation with respect to other masters. This means that any subsequent stwcx. executed by the same processor, regardless of address, will cancel the reservation. Also, any bus write or invalidate operation from another processor to an address that matches the reservation address will cancel the reservation. The stwcx. instruction does not check the reservation for a matching address. The stwcx. instruction is only required to determine whether a reservation exists. The stwcx. instruction performs a store word operation only if the reservation exists. If the reservation has been cancelled for any reason, then the stwcx. instruction fails and clears the CR0[EQ] bit in the condition register. The architectural intent is to follow the lwarx/stwcx. instruction pair with a conditional branch which checks to see whether the stwcx. instruction failed.

If the page table entry is marked caching-allowed (WIMG = x0xx), and an lwarx access misses in the cache, then the 750 performs a cache block fill. If the page is marked caching-inhibited (WIMG = x1xx) or the cache is locked, and the access misses, then the lwarx instruction appears on the bus as a single-beat load. All bus operations that are a direct result of either an lwarx instruction or an stwcx. instruction are placed on the bus with a special encoding. Note that this does not force all lwarx instructions to generate bus transactions, but rather provides a mea
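The reservation rules above can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the 750's hardware: the `Reservation` class, the dictionary standing in for memory, and the `snoop_write` and `atomic_increment` names are all hypothetical, and the boolean returned by `stwcx` stands in for CR0[EQ].

```python
class Reservation:
    """Toy model of one processor's lwarx/stwcx. reservation (illustrative only)."""

    GRANULE = 32  # reservation granularity in bytes, as stated in the text

    def __init__(self):
        self.granule = None  # index of the reserved 32-byte section, or None

    def lwarx(self, memory, addr):
        """Load a word and reserve the 32-byte section that contains it."""
        self.granule = addr // self.GRANULE
        return memory.get(addr, 0)

    def stwcx(self, memory, addr, value):
        """Store only if a reservation exists; the address is not compared.

        Returns True on success (models CR0[EQ] set) and False on failure.
        Either way, the reservation is gone afterwards.
        """
        if self.granule is None:
            return False
        memory[addr] = value
        self.granule = None
        return True

    def snoop_write(self, addr):
        """A write or invalidate from another master cancels a matching reservation."""
        if self.granule is not None and addr // self.GRANULE == self.granule:
            self.granule = None


def atomic_increment(res, memory, addr):
    """Retry loop: the conditional branch after stwcx. becomes a while loop here."""
    while True:
        old = res.lwarx(memory, addr)
        if res.stwcx(memory, addr, old + 1):
            return old + 1
```

In this model a second `stwcx` without an intervening `lwarx` always fails, and a snooped write anywhere in the reserved 32-byte section makes the next `stwcx` fail, which is what forces the retry in `atomic_increment`.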
294. ed. If MSR[RI] = 0, the 750 is in a nonrecoverable state. Also, instruction completion is defined as updating all architectural registers associated with that instruction, and then removing that instruction from the completion buffer.
• Exceptions caused by asynchronous events (interrupts). These exceptions are further distinguished by whether they are maskable and recoverable.
  - Asynchronous, nonmaskable, nonrecoverable: System reset for assertion of HRESET. Has highest priority and is taken immediately regardless of other pending exceptions or recoverability. (Includes power-on reset.)
  - Asynchronous, maskable, nonrecoverable: Machine check exception. Has priority over any other pending exception except system reset for assertion of HRESET. Taken immediately regardless of recoverability.
  - Asynchronous, nonmaskable, recoverable: System reset for SRESET. Has priority over any other pending exception except system reset for HRESET (or power-on reset), or machine check. Taken immediately when a recoverable state is reached.
  - Asynchronous, maskable, recoverable: System management, performance monitor, thermal management, external, and decrementer interrupts. Before handling this type of exception, the next instruction in program order must complete. If that instruction causes another type of exception, that exception is taken and the asynchronous maskable rec
295. ed/Negated: Represents the physical address (real address in the architecture specification) of the data to be transferred. On burst transfers, the address bus presents the double-word-aligned address containing the critical code/data that missed the cache on a read operation, or the first double word of the cache line on a write operation. Note that the address output during burst operations is not incremented. See Section 8.3.2, Address Transfer.
Timing Comments
  Assertion/Negation: Occurs on the bus clock cycle after a qualified bus grant (coincides with assertion of ABB and TS).
  High Impedance: Occurs one bus clock cycle after AACK is asserted.

7.2.3.1.2 Address Bus (A[0-31]): Input
Following are the state meaning and timing comments for the A[0-31] input signals.
State Meaning
  Asserted/Negated: Represents the physical address of a snoop operation.
Timing Comments
  Assertion/Negation: Must occur on the same bus clock cycle as the assertion of TS; is sampled by the 750 only on this cycle.

7.2.3.2 Address Bus Parity (AP[0-3])
The address bus parity (AP[0-3]) signals are both input and output signals reflecting one bit of odd-byte parity for each of the 4 bytes of address when a valid address is on the bus.

7.2.3.2.1 Address Bus Parity (AP[0-3]): Output
Following are the state meaning and timing comments for the AP[0-3] output signals on the 750.
State Meaning
  Asserted/Negated: Represents odd parity for each of the 4
296. ed opcodes for instructions defined only for 64-bit implementations are illegal in 32-bit implementations, and vice versa. The following primary opcodes have unused extended opcodes: 17, 19, 31, 59, 63. (Primary opcodes 30 and 62 are illegal for all 32-bit implementations, but as 64-bit opcodes they have some unused extended opcodes.)
• An instruction consisting of only zeros is guaranteed to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized memory invokes the system illegal instruction error handler (a program exception). Note that if only the primary opcode consists of all zeros, the instruction is considered a reserved instruction, as described in Section 2.3.1.4, Reserved Instruction Class.

The 750 invokes the system illegal instruction error handler (a program exception) when it detects any instruction from this class or any instructions defined only for 64-bit implementations. See Section 4.5.7, Program Exception (0x00700), for additional information about illegal and invalid instruction exceptions. Except for an instruction consisting of binary zeros, illegal instructions are available for additions to the PowerPC architecture.

2.3.1.4 Reserved Instruction Class
Reserved instructions are allocated to specific implementation-dependent purposes not defined by the PowerPC architecture. Attempting to execute an unimplemented reserved instruction invokes the illegal instructio
297. ed to be set by the processor; in some scenarios, the architecture allows that the bits may be set (not absolutely required); and in some scenarios, the bits are guaranteed to not be set. Note that when the 750 updates the R and C bits in memory, the accesses are performed as if MSR[DR] = 0 and G = 0 (that is, as nonguarded cacheable operations in which coherency is required).

Table 5-8 defines a prioritized list of the R and C bit settings for all scenarios. The entries in the table are prioritized from top to bottom, such that a matching scenario occurring closer to the top of the table takes precedence over a matching scenario closer to the bottom of the table. For example, if an stwcx. instruction causes a protection violation and there is no reservation, the C bit is not altered, as shown for the protection violation case. Note that in the table, load operations include those generated by load instructions, by the eciwx instruction, and by the cache management instructions that are treated as a load with respect to address translation. Similarly, store operations include those operations generated by store instructions, by the ecowx instruction, and by the cache management instructions that are treated as a store with respect to address translation.

Table 5-8. Model for Guaranteed R and C Bit Settings
Columns: Causes Setting of R Bit; Causes Setting of C Bit
1. No-execute protection violation
2. Page protect
298. edicted branch instruction.

6.6.1.2 Dispatch Unit Resource Requirements
The following is a list of resources required to avoid stalls in the dispatch unit. IQ0 and IQ1 are the two dispatch entries in the instruction queue.
• Requirements for dispatching from IQ0 are as follows:
  - Needed execution unit available
  - Needed GPR rename registers available
  - Needed FPR rename registers available
  - Completion queue is not full
  - A completion-serialized instruction is not being executed
• Requirements for dispatching from IQ1 are as follows:
  - Instruction in IQ0 must dispatch
  - Instruction dispatched by IQ0 is not completion- or refetch-serialized
  - Needed execution unit is available (after dispatch from IQ0)
  - Needed GPR rename registers are available (after dispatch from IQ0)
  - Needed FPR rename register is available (after dispatch from IQ0)
  - Completion queue is not full (after dispatch from IQ0)

6.6.1.3 Completion Unit Resource Requirements
The following is a list of resources required to avoid stalls in the completion unit; note that the two completion entries are described as CQ0 and CQ1, where CQ0 is the completion queue located at the end of the completion queue (see Figure 6-4).
• Requirements for completing an instruction from CQ0 are as follows:
  - Instruction in CQ0 must be finished
  - Instruction in CQ0 must not follow an unresolved predicted branch
  - Instruction in
299. egation: Driven valid by L2 cache memory during read operations.

7.2.9.10 L2 Data Parity (L2DP[0-7])
The eight data bus parity (L2DP[0-7]) signals on the 750 are both output and input signals.

7.2.9.10.1 L2 Data Parity (L2DP[0-7]): Output
Following are the state meaning and timing comments for the L2 data parity output signals.
State Meaning
  Asserted/Negated: Represents odd parity for each of the 8 bytes of L2 cache data during write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. Note that parity bit 0 is associated with bits 0-7 (byte lane 0) of the L2DATA bus.
Timing Comments
  Assertion/Negation: The same as L2DATA[0-63].
  High Impedance: The same as L2DATA[0-63].

7.2.9.10.2 L2 Data Parity (L2DP[0-7]): Input
Following are the state meaning and timing comments for the L2 parity input signals.
State Meaning
  Asserted/Negated: Represents odd parity for each byte of L2 cache read data.
Timing Comments
  Assertion/Negation: The same as L2DATA[0-63].

7.2.9.11 L2 Chip Enable (L2CE): Output
Following are the state meaning and timing comments for the L2CE signal.
State Meaning
  Asserted: Indicates that the L2 cache memory devices are being selected for a read or write operation.
  Negated: Indicates that the 750 is not selecting the L2 cache memory devices for a read or write operation.
Timing Comments
  Assertion/Negation: May occur on any cycle. L2CE is driven hig
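The odd-parity rule described for L2DP[0-7] can be worked through in a few lines. The sketch below is illustrative only: the function name `odd_parity_bits` is hypothetical, and it assumes the 64-bit L2DATA value is held in an integer whose most significant byte is byte lane 0 (bits 0-7 in PowerPC bit numbering).

```python
def odd_parity_bits(data64):
    """Return the eight odd-parity bits for a 64-bit data value.

    Element i of the result is the parity bit for byte lane i, where lane 0
    is assumed to be the most significant byte. Each bit is chosen so that
    the byte plus its parity bit contains an odd number of 1s.
    """
    data64 &= 0xFFFFFFFFFFFFFFFF
    parity = []
    for lane in range(8):
        byte = (data64 >> (8 * (7 - lane))) & 0xFF
        ones = bin(byte).count("1")
        # An odd count is already odd, so the parity bit is 0; otherwise 1.
        parity.append(0 if ones % 2 == 1 else 1)
    return parity


def parity_ok(data64, parity):
    """Check received data against received parity bits (read-side check)."""
    return odd_parity_bits(data64) == list(parity)
```

A byte of all zeros therefore carries a parity bit of 1, which is one reason odd parity is useful for catching a bus that is stuck low.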
300. eld in an instruction encoding for these cases is considered undefined. The 750 does not support either of the two floating-point imprecise modes supported by the PowerPC architecture. Unless exceptions are disabled (MSR[FE0] = MSR[FE1] = 0), all floating-point exceptions are treated as precise. When a program exception is taken, instruction fetching resumes at offset 0x00700 from the physical base address indicated by MSR[IP]. Chapter 6, Exceptions, in The Programming Environments Manual describes register settings for this exception.

4.5.8 Floating-Point Unavailable Exception (0x00800)
The floating-point unavailable exception is implemented as defined in the PowerPC architecture. A floating-point unavailable exception occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction (including floating-point load, store, or move instructions), and the floating-point available bit in the MSR is disabled (MSR[FP] = 0). Register settings for this exception are described in Chapter 6, Exceptions, in The Programming Environments Manual. When a floating-point unavailable exception is taken, instruction fetching resumes at offset 0x00800 from the physical base address indicated by MSR[IP].

4.5.9 Decrementer Exception (0x00900)
The decrementer exception is implemented in the 750 as it is defined by the PowerPC architecture. The decrementer exception occurs when no higher priority exception exist
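The phrase "offset from the physical base address indicated by MSR[IP]" can be made concrete with a short calculation. This is a sketch with assumptions stated up front: the function name `exception_vector` is hypothetical, the IP mask 0x00000040 is taken from the hard-reset MSR value described elsewhere in this manual ("only IP set"), and the two base addresses (0x00000000 for IP = 0, 0xFFF00000 for IP = 1) follow the PowerPC architecture's exception prefix convention rather than anything stated in this excerpt.

```python
MSR_IP = 0x00000040  # exception prefix bit; assumed mask, see lead-in

# Vector offsets named in the section headings of this chapter.
PROGRAM_EXCEPTION = 0x00700
FP_UNAVAILABLE = 0x00800
DECREMENTER = 0x00900


def exception_vector(msr, offset):
    """Return the physical address where fetching resumes for an exception.

    The base is assumed to be 0xFFF00000 when MSR[IP] is set and
    0x00000000 when it is clear.
    """
    base = 0xFFF00000 if msr & MSR_IP else 0x00000000
    return base + offset


# After a hard reset only IP is set, so vectors are in the high region.
print(hex(exception_vector(0x00000040, PROGRAM_EXCEPTION)))
```

Clearing IP once low memory holds valid handlers moves every vector to the low region without changing any offset.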
301. eloped between address and data tenures of a read operation. Because the 750 can dynamically optimize run-time ordering of load/store traffic, overall performance is improved. The system interface is specific for each PowerPC microprocessor implementation. The 750 signals are grouped as shown in Figure 1-3. Signals are provided for clocking and control of the L2 caches, as well as separate L2 address and data buses. Test and control signals provide diagnostics for selected internal circuits.

[Figure 1-3. System Interface. Signal groups shown: address arbitration, address start, address transfer, transfer attribute, address termination, clocks, processor status/control, data arbitration, data transfer, data termination, L2 cache clock/control, L2 cache address/data, system status, test and control. Items marked with the figure's footnote are not supported in the 740.]

The system interface supports address pipelining, which allows the address tenure of one transaction to overlap the data tenure of another. The extent of the pipelining depends on external arbitration and control circuitry. Similarly, the 750 supports split-bus transactions for systems with multiple potential bus masters: one device can have mastership of the address bus while another has mastership of the data bus. Allowing multiple bus transactions to occur simultaneously increases the available bus bandwidth for other activity. The 750's clocking structure supports a wide ran
302. ems that do not generate parity.

Not hard reset (software use only):
0 A hard reset occurred if software had previously set this bit.
1 A hard reset has not occurred.

A TEA indication on the bus can result from any load or store operation initiated by the processor. In general, TEA is expected to be used by a memory controller to indicate that a memory parity error or an uncorrectable memory ECC error has occurred. Note that the resulting machine check exception is imprecise and unordered with respect to the instruction that originated the bus operation. If MSR[ME] and the appropriate HID0 bits are set, the exception is recognized and handled; otherwise, the processor generates an internal checkstop condition. When the exception is recognized, all incomplete stores are discarded. The bus protocol operates normally. A machine check exception may result from referencing a nonexistent physical address, either directly (with MSR[DR] = 0) or through an invalid translation. If a dcbz instruction introduces a block into the cache associated with a nonexistent physical address, a machine check exception can be delayed until an attempt is made to store that block to main memory. Not all PowerPC processors provide the same level of error checking. Checkstop sources are implementation-dependent. Machine check exceptions are enabled when MSR[ME] = 1; this is described in the following section, 4.5.2.1. If MSR[ME] = 0 and a
303. enabling external interrupts. The PowerPC architecture supports four types of exceptions:
• Synchronous, precise: These are caused by instructions. All instruction-caused exceptions are handled precisely; that is, the machine state at the time the exception occurs is known and can be completely restored. This means that (excluding the trap and system call exceptions) the address of the faulting instruction is provided to the exception handler, and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the exception is taken. Once the exception is processed, execution resumes at the address of the faulting instruction (or at an alternate address provided by the exception handler). When an exception is taken due to a trap or system call instruction, execution resumes at an address provided by the handler.
• Synchronous, imprecise: The PowerPC architecture defines two imprecise floating-point exception modes, recoverable and nonrecoverable. Even though the 750 provides a means to enable the imprecise modes, it implements these modes identically to the precise mode (that is, enabled floating-point exceptions are always precise).
• Asynchronous, maskable: The PowerPC architecture defines external and decrementer interrupts as maskable, asynchronous exceptions. When these exceptions occur, their handling is postponed unt
304. ency keeps an instruction from starting execution, that instruction is dispatched to the reservation station associated with its execution unit (and the rename registers are assigned), thereby freeing the positions in the instruction queue so instructions can be dispatched to other execution units. Execution begins during the same clock cycle that the rename buffer is updated with the data the instruction is dependent on. If both instructions in IQ0 and IQ1 require the same execution unit, the instruction in IQ1 cannot be dispatched until the first instruction proceeds through the pipeline and provides the subsequent instruction with a vacancy in the requested execution unit.

The completion unit maintains program order after instructions are dispatched from the instruction queue, guaranteeing in-order completion and a precise exception model. Completing an instruction implies committing execution results to the architected destination registers. In-order completion ensures the correct architectural state when the 750 must recover from a mispredicted branch or an exception. Instruction state and all information required for completion is kept in the six-entry, first-in/first-out completion queue. A completion queue entry is allocated for each instruction when it is dispatched to an execute unit; if no entry is available, the dispatch unit stalls. A maximum of two instructions per cycle may be completed and retired from the completion queue, and the flo
305. endix F, Simplified Mnemonics, in The Programming Environments Manual for examples. The UISA states that an implementation that executes instructions that set the overflow enable bit (OE) or the carry bit (CA) may either execute these instructions slowly or prevent execution of the subsequent instruction until the operation completes. Chapter 6, Instruction Timing, describes how the 750 handles CR dependencies. The summary overflow bit (SO) and overflow bit (OV) in the integer exception register are set to reflect an overflow condition of a 32-bit result. This can happen only when OE = 1.

2.3.4.1.2 Integer Compare Instructions
The integer compare instructions algebraically or logically compare the contents of register rA with either the zero-extended value of the UIMM operand, the sign-extended value of the SIMM operand, or the contents of register rB. The comparison is signed for the cmpi and cmp instructions, and unsigned for the cmpli and cmpl instructions. Table 2-22 summarizes the integer compare instructions.

Table 2-22. Integer Compare Instructions
Name                       Mnemonic  Syntax
Compare Immediate          cmpi      crfD,L,rA,SIMM
Compare                    cmp       crfD,L,rA,rB
Compare Logical Immediate  cmpli     crfD,L,rA,UIMM
Compare Logical            cmpl      crfD,L,rA,rB

The crfD operand can be omitted if the result of the comparison is to be placed in CR0. Otherwise, the target CR field must be specified in crfD, using an explicit field number. For information on simplified mnemonics for the integer compare instructi
306. eptions; tt ON Set ONCE ai ai cata; PowerPC UISA Instructions; Integer Instructions; Integer Arithmetic Instructions; Integer Compare Instructions; Integer Logical Instructions; Integer Rotate and Shift Instructions; Floating-Point Instructions; Floating-Point Arithmetic Instructions; Floating-Point Multiply-Add Instructions; Floating-Point Rounding and Conversion Instructions; Floating-Point Compare Instructions; Floating-Point Status and Control Register Instructions; Floating-Point Move Instructions; Load and Store Instructions; Self-Modifying Code; Integer Load and Store Address Generation; Register Indirect Integer Load Instructions; Integer Store Instructions; et St re Ee; Integer Load and Store with Byte-Reverse Instructions; Integer Load and Store Multiple Instructions; Integer Load and Store String Instructio
307. equired to calculate effective addresses, handles data alignment to and from the data cache, and provides sequencing for load and store string and multiple operations. Write operations to the data cache can be performed on a byte, half-word, word, or double-word basis. The instruction cache provides a 128-bit interface to the instruction unit, so four instructions can be made available to the instruction unit in a single clock cycle. The instruction unit accesses the instruction cache frequently in order to sustain the high throughput provided by the six-entry instruction queue. 3.1 Data Cache Organization. The data cache is organized as 128 sets of eight ways, as shown in Figure 3-2. Each way consists of 32 bytes, two state bits, and an address tag. Note that in the PowerPC architecture, the term "cache block," or simply "block," when used in the context of cache implementations, refers to the unit of memory at which coherency is maintained. For the 750, this is the eight-word (32-byte) cache line. This value may be different for other PowerPC implementations. Each cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries. Note that address bits A[20-26] provide the index to select a cache set. Bits A[27-31] select a byte within a block. The two state bits implement a t
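The index and byte-select fields described above can be sketched in Python. The function name is hypothetical, and treating the remaining bits A[0-19] as the tag is an inference from the field widths, not something the excerpt states. PowerPC numbers bit 0 as the most significant bit, so A[27-31] are the five low-order bits.

```python
SETS = 128        # sets in the data cache
WAYS = 8          # ways per set
BLOCK_BYTES = 32  # bytes per cache block

def split_cache_address(addr):
    """Split a 32-bit address into (tag, set_index, byte_offset)."""
    addr &= 0xFFFFFFFF
    byte_offset = addr & 0x1F        # A[27-31]: byte within the block
    set_index = (addr >> 5) & 0x7F   # A[20-26]: one of 128 sets
    tag = addr >> 12                 # A[0-19]: assumed tag bits
    return tag, set_index, byte_offset

# Total capacity implied by the organization, in bytes.
CACHE_BYTES = SETS * WAYS * BLOCK_BYTES
```

For example, address 0x12345678 falls in set 0x33 at byte offset 0x18, and the organization implies a 32-Kbyte data cache.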
308. er in the instruction unit. This helps control the 750's overall junction temperature. L2CR (supervisor): The L2 cache control register (L2CR) is used to configure and operate the L2 cache. It has bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. (The L2 cache feature is not supported in the 740.) ... access to MMCR0-MMCR1. PMC1-PMC4 (supervisor): The performance monitor counter registers (PMC1-PMC4) are used to count specified events. UPMC1-UPMC4 provide user-level read access to these registers. SIA (supervisor): The sampled instruction address register (SIA) holds the EA of an instruction executing at or around the time the processor signals the performance monitor interrupt condition. The USIA register provides user-level read access to the SIA. THRM1-THRM3 (supervisor): THRM1 and THRM2 provide a way to compare the junction temperature against two user-provided thresholds. The thermal assist unit (TAU) can be operated so that the thermal sensor output is compared to only one threshold, selected in THRM1 or THRM2. THRM3 is used to enable the TAU and to control the output sample time. UMMCR0-UMMCR1 (user): The user monitor mode control registers (UMMCR0-UMMCR1) provide user-level read access to MMCR0-MMCR1. UPMC1-UPMC4 (user): The user performance monitor counter registers (UPMC1-UPMC4) provide user-level read access to PMC1-PMC4. U
309. er recognizing an assertion of ARTRY and aborting the transaction in progress, the 750 is not guaranteed to run the same transaction the next time it is granted the bus, due to internal reordering of load and store operations. If an address retry is required, the ARTRY response will be asserted by a bus snooping device as early as the second cycle after the assertion of TS. Once asserted, ARTRY must remain asserted through the cycle after the assertion of AACK. The assertion of ARTRY during the cycle after the assertion of AACK is referred to as a qualified ARTRY. An earlier assertion of ARTRY during the address tenure is referred to as an early ARTRY. As a bus master, the 750 recognizes either an early or qualified ARTRY and prevents the data tenure associated with the retried address tenure. If the data tenure has already begun, the 750 aborts and terminates the data tenure immediately, even if the burst data has been received. If the assertion of ARTRY is received up to or on the bus cycle following the first (or only) assertion of TA for the data tenure, the 750 ignores the first data beat, and if it is a load operation, does not forward data internally to the cache and execution units. If ARTRY is asserted after the first (or only) assertion of TA, improper operation of the bus interface may result. During the clock of a qualified ARTRY, the 750 also determines if it should negate BR and ignore BG on the followi
310. erPC 750, fall-through branch instructions are removed from the instruction stream at dispatch. That is, these instructions are allowed to fall through the instruction queue via the dispatch mechanism without being passed to an execution unit or given a position in the completion queue. Fetch. Retrieving instructions from either the cache or main memory and placing them into the instruction queue. Floating-point register (FPR). Any of the 32 registers in the floating-point register file. These registers provide the source operands and destination results for floating-point instructions. Load instructions move data from memory to FPRs, and store instructions move data from FPRs to memory. The FPRs are 64 bits wide and store floating-point values in double-precision format. Flush. An operation that causes a modified cache block to be invalidated and the data to be written to memory. Fraction. In the binary representation of a floating-point number, the field of the significand that lies to the right of its implied binary point. G. General-purpose register (GPR). Any of the 32 registers in the general-purpose register file. These registers provide the source operands and destination results for all integer data manipulation instructions. Integer load instructions move data from memory to GPRs, and store instructions move data from GPRs to memory. Guarded. The guarded
311. ARTRY, 7-14, 8-26; BG, 7-4, 8-10; BR, 7-4, 8-10; checkstop, 8-42; CI, 7-12; CKSTP_IN/CKSTP_OUT, 7-22; CLK_OUT, 7-29; configuration, 7-3; COP/scan interface, 8-44; data arbitration, 8-10, 8-23; data transfer termination, 8-26; DBB, 7-16, 8-10, 8-24; DBDIS, 7-19; DBG, 7-15, 8-10; DBWO, 7-16, 8-10, 8-25, 8-45; DHn/DLn, 7-17; DPn, 7-18; DRTRY, 7-20, 8-26, 8-29; GBL, 7-13; HRESET, 7-23; INT, 7-21, 8-42; L2 cache interface signals, 7-25; L2ADDRn, 7-25; L2CE, 7-26; L2CLK_OUTA, 7-27; L2CLK_OUTB, 7-27; L2DATAn, 7-25; L2DP, 7-26; L2SYNC_IN, 7-28; L2SYNC_OUT, 7-27; L2WE, 7-27; L2ZZ, 7-28; MCP, 7-21; PLL_CFGn, 7-30; power and ground signals, 7-30; QACK, 7-24; QREQ, 7-23, 8-43; reset, 8-43; RSRV, 7-24, 8-43; SMI, 4-25, 7-21; SRESET, 7-23, 8-43; system quiesce control, 8-43; TA, 7-19; TBEN, 7-24; TBST, 7-12, 8-16, 8-25; TEA, 7-20, 8-26, 8-30; TLBISYNC, 7-25; transfer encoding, 7-9; TS, 7-6; TSIZn, 7-11, 8-15; TTn, 7-8, 8-15; WT, 7-13. Single-beat transfer: reads with data delays, timing, 8-35; reads, timing, 8-33; termination, 8-26; writes, timing, 8-34. SLB management instructions, A-28. SMI (system management interrupt) signal, 4-25, 7-21. Snooping, 3-25. Split-bus transaction, 8-11. SPRGn registers, 2-6. SRESET (soft reset) signal, 7-23, 8-43. SRR0/SRR1 (status save/restore registers): description, 2-6; exception processing, 4-7. Stage, definition, 6-2. Stall, definition, 6-3. Static branch prediction, 6-9, 6-22. stwcx., 4-12. Superscalar
312. Caches can be locked in software. Data cache coherency (MEI) maintained in hardware. The critical double word is made available to the requesting unit when it is burst into the line-fill buffer. The cache is nonblocking, so it can be accessed during this operation. Level 2 (L2) cache interface (the L2 cache interface is not supported in the 740): on-chip two-way set-associative L2 cache controller and tags; external data SRAMs; support for 256-Kbyte, 512-Kbyte, and 1-Mbyte L2 caches; 64-byte (256-Kbyte/512-Kbyte) and 128-byte (1-Mbyte) sectored line size; supports flow-through (register-buffer), pipelined (register-register), and pipelined late-write (register-register) synchronous burst SRAMs. Separate memory management units (MMUs) for instructions and data: 52-bit virtual address; 32-bit physical address; address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte segments; memory programmable as write-back/write-through, cacheable/noncacheable, and coherency enforced/coherency not enforced on a page or block basis; separate IBATs and DBATs (four each) also defined as SPRs; separate instruction and data translation lookaside buffers (TLBs). Both TLBs are 128-entry, two-way set associative, and use the LRU replacement algorithm. TLBs are hardware-reloadable (that is, the page table search is performed in hardware). Separate bus int
313. eration. Exception model: Section 1.7, "Exception Model," describes the exception model of the PowerPC operating environment architecture and the differences in the 750 exception model. The information in this section is described more fully in Chapter 4, "Exceptions." Memory management: Section 1.8, "Memory Management," describes generally the conventions for memory management among the PowerPC processors. This section also describes the 750's implementation of the 32-bit PowerPC memory management specification. The information in this section is described more fully in Chapter 5, "Memory Management." Instruction timing: Section 1.9, "Instruction Timing," provides a general description of the instruction timing provided by the superscalar, parallel execution supported by the PowerPC architecture and the 750. The information in this section is described more fully in Chapter 6, "Instruction Timing." Power management: Section 1.10, "Power Management," describes how the power management can be used to reduce power consumption when the processor, or portions of it, are idle. The information in this section is described more fully in Chapter 10, "Power and Thermal Management." Thermal management: Section 1.11, "Thermal Management," describes how the thermal management unit and its associated registers (THRM1-THRM3) and exception can be used to
314. eration (that is, the prefetch buffer is not purged) and does not cause these instructions to be refetched. 5.4.4 Page Address Translation Summary. Figure 5-8 provides the detailed flow for the page address translation mechanism. The figure includes the checking of the N bit in the segment descriptor and then expands on the "TLB Hit" branch of Figure 5-6. The detailed flow for the "TLB Miss" branch of Figure 5-6 is described in Section 5.4.5, "Page Table Search Operation." Note that, as in the case of block address translation, if an attempt is made to execute a dcbz instruction to a page marked either write-through or caching-inhibited (W = 1 or I = 1), an alignment exception is generated. The checking of memory protection violation conditions is described in Chapter 7, "Memory Management," in The Programming Environments Manual. Figure 5-8 (flowchart; box labels only): Effective Address Generated (see Figure 5-6); Instruction Fetch with N Bit Set in Segment Descriptor (No-Execute), otherwise Page Address Translation; Generate 52-Bit Virtual Address from Segment Descriptor; Compare Virtual Address with TLB Entries; TLB Hit Case; dcbz Instruction with W or I = 1 (Alignment Exception), otherwise Check Page Memory Protection Violation Conditions (see The Programming Environments Manual); Access Permitted; Access Prohibited; see The Programming Env
315. eration. These transactions are snooped regardless of whether GBL is asserted, to support reservations in the MEI cache protocol. The state of ABB is not sampled to determine a qualified snoop condition. All transactions snooped by the 750 are checked for correct address bus parity. Every assertion of TS detected by the 750 (whether snooped or not) must be followed by an accompanying assertion of AACK. Once a qualified snoop condition is detected on the bus, the snooped address associated with TS is compared against the data cache tags, memory queues, and/or other storage elements as appropriate. The L1 data cache tags and L2 cache tags are snooped for standard data cache coherency support. No snooping is done in the instruction cache for coherency. The memory queues are snooped for pipeline collisions and memory coherency collisions. A pipeline collision is detected when another bus master addresses any portion of a line that this 750's data cache is currently in the process of loading (L1 loading from L2, or L1/L2 loading from memory). A memory coherency collision occurs when another bus master addresses any portion of a line that the 750 has currently queued to write to memory from the data cache (castout or copy-back), but has not yet been granted bus access to perform. If a snooped transaction results in a cache hit or pipeline collision or memory queue collision, the 750 asserts ARTRY
316. erations to Update History Bits, TLB Hit Case. R and C bits 00: combination doesn't occur. 01: combination doesn't occur. 10: Read, no special action; Write, the 750 initiates a table search operation to update C. 11: no special action for read or write. The table shows that the status of the C bit in the TLB entry (in the case of a TLB hit) is what causes the processor to update the C bit in the PTE (the R bit is assumed to be set in the page tables if there is a TLB hit). Therefore, when software clears the R and C bits in the page tables in memory, it must invalidate the TLB entries associated with the pages whose referenced and changed bits were cleared. The dcbt and dcbtst instructions can execute if there is a TLB/BAT hit or if the processor is in real addressing mode. In case of a TLB or BAT miss, these instructions are treated as no-ops; they do not initiate a table search operation and they do not set either the R or C bits. As defined by the PowerPC architecture, the referenced and changed bits are updated as if address translation were disabled (real addressing mode). If these update accesses hit in the data cache, they are not seen on the external bus. If they miss in the data cache, they are performed as typical cache line fill accesses on the bus (assuming the data cache is enabled). 5.4.1.1 Referenced Bit. The referenced (R) bit of a page is located in the PTE in the page table. Every time a page is referenced (with a read or write a
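The TLB-hit table above can be read as a small decision function. The following Python sketch uses a hypothetical function name and returns descriptive strings; it models only the table as stated in the excerpt, not the hardware's internal behavior.

```python
def tlb_hit_history_action(r_bit, c_bit, is_write):
    """Return the action for a TLB hit, given the R and C bits in the TLB entry."""
    if r_bit == 0:
        # R is assumed set whenever there is a TLB hit, so R = 0 never appears.
        raise ValueError("R = 0 does not occur on a TLB hit")
    if c_bit == 0 and is_write:
        # Only a write to a page not yet marked changed needs a PTE update.
        return "table search to update C"
    return "no special action"
```

Only one of the cases costs a table search: a store to a page whose TLB entry still has C = 0.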
317. erency between these caches and the tables in memory whenever the tables in memory are modified. When the tables in memory are changed, the operating system purges these caches of the corresponding entries, allowing the translation caching mechanism to refetch from the tables when the corresponding entries are required. Note that the 750 implements all TLB-related instructions except tlbia, which is treated as an illegal instruction. Because the MMU specification for PowerPC processors is so flexible, it is recommended that the software that uses these instructions and registers be encapsulated into subroutines to minimize the impact of migrating across the family of implementations. Table 5-5 summarizes 750 instructions that specifically control the MMU. For more detailed information about the instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in The Programming Environments Manual. Table 5-5. PowerPC 750 Microprocessor Instruction Summary, Control MMUs: mtsr SR,rS, Move to Segment Register, SR[SR#] ← rS; mtsrin rS,rB, Move to Segment Register Indirect, SR[rB[0-3]] ← rS; mfsr rD,SR, Move from Segment Register, rD ← SR[SR#]; mfsrin rD,rB, Move from Segment Register Indirect, rD ← SR[rB[0-3]]; tlbie rB, TLB Invalidate Entry, for the effective address specified by rB, TLB[V] ← 0. The tlbie instruction invalidates a
318. erface. These ground signals are isolated from the GND and OGND ground signals. These signals are not implemented on the 740. Chapter 8 Bus Interface Operation. This chapter describes the PowerPC 750 microprocessor bus interface and its operation. It shows how the 750 signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers. The bus interface buffers bus requests from the instruction and data caches, and executes the requests per the 60x bus protocol. It includes address register queues, prioritizing logic, and bus control logic. It captures snoop addresses for snooping in the cache and in the address register queues. It also snoops for reservations and holds the touch load address for the cache. All data storage for the address register buffers (load and store data buffers) is located in the cache section. The data buffers are considered temporary storage for the cache and not part of the bus interface. The general functions and features of the bus interface are as follows. Seven address register buffers that include the following: instruction cache load address buffer; data cache load address buffer; two data cache castout/store address buffers (associated data block buffers located in cache); data cache snoop copy-back address buffer (associated data block buffer located in cache); reservation address buffer fo
319. erface units for system memory and for the L2 cache. Bus interface features include the following: selectable bus-to-core clock frequency ratios of 2x, 2.5x, 3x, 3.5x, 4x, 4.5x ... 8x (2x to 8x, all half-clock multipliers in between); a 64-bit, split-transaction external data bus with burst transfers; support for address pipelining and limited out-of-order bus transactions; single-entry load queue; single-entry instruction fetch queue; two-entry L1 cache castout queue; no-DRTRY mode, which eliminates the DRTRY signal from the qualified bus grant. This allows the forwarding of data during load operations to the internal core one bus cycle sooner than if the use of DRTRY is enabled. L2 cache interface features (which are not implemented on the 740) include the following: core-to-L2 frequency divisors of 1, 1.5, 2, 2.5, and 3; four-entry L2 cache castout queue in L2 cache BIU; 17-bit address bus; 64-bit data bus. Multiprocessing support features include the following: hardware-enforced, three-state cache coherency protocol (MEI) for data cache; load/store with reservation instruction pair for atomic memory references, semaphores, and other multiprocessor operations. Power and thermal management: three static modes, doze, nap, and sleep, progressively reduce power dissipation. Doze: all the functional units are disabled except for the time
320. errupt exception is taken. Table 4-16. Thermal Management Interrupt Exception, Register Settings. SRR0: set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present. SRR1: bit 0, loaded with equivalent MSR bits; bits 1-4, cleared; bits 5-9, loaded with equivalent MSR bits; bits 10-15, cleared; bits 16-31, loaded with equivalent MSR bits. MSR: LE set to value of ILE. The thermal management interrupt is similar to the system management and external interrupts. The 750 requires the next instruction in program order to complete or take an exception, blocks completion of any following instructions, and allows the completed store queue to drain. Any exceptions encountered in this process are taken first, and the thermal management interrupt exception is delayed until a recoverable halt is achieved, at which point the 750 saves the machine state, as shown in Table 4-16. When a thermal management interrupt exception is taken, instruction fetching resumes at offset 0x01700 from the base address indicated by MSR[IP]. Chapter 10 gives details about thermal management. Chapter 5 Memory Management. This chapter describes the PowerPC 750 microprocessor's implementation of the memory management unit (MMU) specifications provided by the operating environment architecture (OEA) for PowerPC processors. The primary fun
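The SRR1 settings in Table 4-16 amount to a bit mask applied to the MSR. The Python sketch below derives that mask from the listed bit ranges, using PowerPC numbering in which bit 0 is the most significant bit. The function names are hypothetical, and the two base addresses used for MSR[IP] are an assumption taken from the PowerPC architecture's exception prefix, not from this excerpt.

```python
def ppc_bit(n):
    """Mask for PowerPC bit n of a 32-bit register (bit 0 is the MSB)."""
    return 1 << (31 - n)

# Bits loaded from the MSR per Table 4-16: 0, 5-9, and 16-31.
SRR1_COPY_BITS = [0] + list(range(5, 10)) + list(range(16, 32))
SRR1_MSR_MASK = sum(ppc_bit(n) for n in SRR1_COPY_BITS)

def srr1_from_msr(msr):
    """Model SRR1: copied MSR bits are kept, bits 1-4 and 10-15 are cleared."""
    return msr & SRR1_MSR_MASK

def thermal_vector(msr_ip, offset=0x01700):
    """Vector address for the thermal management interrupt.

    Assumption: MSR[IP] = 0 selects base 0x00000000 and MSR[IP] = 1 selects
    base 0xFFF00000.
    """
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + offset
```

The derived mask is 0x87C0FFFF, so an MSR of all ones would be saved as 0x87C0FFFF.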
321. ers (TLBs). The 128-entry, two-way set-associative ITLBs and DTLBs keep recently used page address translations on-chip. Table search operations performed in hardware: the 52-bit virtual address is formed and the MMU attempts to fetch the PTE, which contains the physical address, from the appropriate TLB on-chip. If the translation is not found in a TLB (that is, a TLB miss occurs), the hardware performs a table search operation (using a hashing function) to search for the PTE. TLB invalidation: the 750 implements the optional TLB Invalidate Entry (tlbie) and TLB Synchronize (tlbsync) instructions, which can be used to invalidate TLB entries. For more information on the tlbie and tlbsync instructions, see Section 5.4.3.2, "TLB Invalidation." Table 5-1 summarizes the 750 MMU features, including those defined by the PowerPC architecture (OEA) for 32-bit processors and those specific to the 750. Table 5-1. MMU Feature Summary. Address ranges (architecturally defined): 2^32 bytes of effective address; 2^52 bytes of virtual address; 2^32 bytes of physical address. Page size (architecturally defined): 4 Kbytes. Segment size (architecturally defined): 256 Mbytes. Block address translation (architecturally defined): range of 128-Kbyte to 256-Mbyte sizes; implemented with IBAT and DBAT registers in BAT array. Memory protection (architecturally defined): segments select
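The sizes in Table 5-1 determine how a 32-bit effective address is carved up and how wide the segment identifier must be. The Python sketch below works this out; the function names are hypothetical, and the 24-bit width of the virtual segment ID is inferred from 52 - 28 rather than quoted from the excerpt.

```python
SEGMENT_BITS = 28   # 256-Mbyte segments
PAGE_BITS = 12      # 4-Kbyte pages
VIRTUAL_BITS = 52   # virtual address width
VSID_BITS = VIRTUAL_BITS - SEGMENT_BITS  # inferred: 24 bits

def split_effective_address(ea):
    """Return (segment_register, page_index, byte_offset) for a 32-bit EA."""
    ea &= 0xFFFFFFFF
    segment_register = ea >> SEGMENT_BITS            # EA[0-3]
    page_index = (ea >> PAGE_BITS) & 0xFFFF          # EA[4-19]
    byte_offset = ea & ((1 << PAGE_BITS) - 1)        # EA[20-31]
    return segment_register, page_index, byte_offset

def virtual_address(vsid, ea):
    """Concatenate the segment ID with the low 28 bits of the EA."""
    vsid &= (1 << VSID_BITS) - 1
    return (vsid << SEGMENT_BITS) | (ea & ((1 << SEGMENT_BITS) - 1))
```

For instance, effective address 0x12345678 selects segment register 1, page index 0x2345, and byte offset 0x678; with a segment ID of 0xABCDEF the 52-bit virtual address is 0xABCDEF2345678.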
322. ervisor-only; Supervisor-only, no-execute; Supervisor-write-only; Supervisor-write-only, no-execute; Both user/supervisor; Both user/supervisor, no-execute; Both user/supervisor, read-only; Both user/supervisor, read-only, no-execute. (Legend: access permitted; protection violation.) The no-execute option provided in the segment register lets the operating system program determine whether instructions can be fetched from an area of memory. The remaining options are enforced based on a combination of information in the segment descriptor and the page table entry. Thus, the supervisor-only option allows only read and write operations generated while the processor is operating in supervisor mode (MSR[PR] = 0) to access the page. User accesses that map into a supervisor-only page cause an exception. Finally, a facility in the VEA and OEA allows pages or blocks to be designated as guarded, preventing out-of-order accesses that may cause undesired side effects. For example, areas of the memory map used to control I/O devices can be marked as guarded so accesses do not occur unless they are explicitly required by the program. For more information on memory protection, see "Memory Protection Facilities" in Chapter 7, "Memory Management," in The Programming Environments Manual. 5.1.5 Page History Information. The MMUs of PowerPC processors also define referenced (R
323. Figure 1-5. PowerPC 750 Microprocessor Programming Model, Registers (register labels recovered from the figure): DABR (SPR 1013); L2CR (SPR 1017); IABR (SPR 1010); performance monitor counters PMC1 (SPR 953), PMC2 (SPR 954), PMC3 (SPR 957), PMC4 (SPR 958); sampled instruction address SIA (SPR 955); monitor control MMCR0 (SPR 952), MMCR1 (SPR 956); thermal assist unit registers THRM1 (SPR 1020), THRM2 (SPR 1021), THRM3; instruction cache throttling control register ICTC (SPR 1019). These registers are 750-specific registers. They may not be supported by other PowerPC processors. L2CR is not supported by the 740. The following tables summarize the PowerPC registers implemented in the 750. Table 1-1 describes registers (excluding SPRs) defined by the architecture. Table 1-1. Architecture-Defined Registers (Excluding SPRs). CR (user): The condition register (CR) consists of eight four-bit fields that reflect the results of certain operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and provide a mechanism for testing and branching. FPRs (user): The 32 floating-point registers (FPRs) serve as the data source or destination for floating-point instructions. These 64-bit re
324. ess is determined to be to the direct-store interface space, the 750 takes a DSI exception if it is a data access (see Section 4.5.3, "DSI Exception (0x00300)"), and takes an ISI exception if it is an instruction access (see Section 4.5.4, "ISI Exception (0x00400)"). For memory accesses translated by a segment descriptor, the interim virtual address is generated using the information in the segment descriptor. Page address translation corresponds to the conversion of this virtual address into the 32-bit physical address used by the memory subsystem. In most cases, the physical address for the page resides in an on-chip TLB and is available for quick access. However, if the page address translation misses in the on-chip TLB, the MMU causes a search of the page tables in memory (using the virtual address information and a hashing function) to locate the required physical address. Because blocks are larger than pages, there are fewer upper-order effective address bits to be translated into physical address bits (more low-order address bits, at least 17, are untranslated to form the offset into a block) for block address translation. Also, instead of segment descriptors and a TLB, block address translations use the on-chip BAT registers as a BAT array. If an effective address matches the corresponding field of a BAT register, the information in the BAT register is used to generate the physical address; in this case, the results of the page translation
325. ess uninterrupted by any other access to that address (the term refers to the fact that the transactions are indivisible). The PowerPC architecture implements atomic accesses through the lwarx/stwcx. instruction pair. B. BAT (block address translation) mechanism. A software-controlled array that stores the available block address translations on-chip. Biased exponent. An exponent whose range of values is shifted by a constant (bias). Typically a bias is provided to allow a range of positive values to express a range that includes both positive and negative values. Big-endian. A byte-ordering method in memory where the address n of a word corresponds to the most significant byte. In an addressed memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 being the most significant byte. See Little-endian. Block. (Memory) An area of memory that ranges from 128 Kbyte to 256 Mbyte, whose size, translation, and protection attributes are controlled by the BAT mechanism (see Cache Block). Boundedly undefined. A characteristic of certain operation results that are not rigidly prescribed by the PowerPC architecture. Boundedly undefined results for a given operation may vary among implementations and between execution attempts in the same implementation. Although the architecture does not prescribe the exact behavior for when results are allowed to be boundedly undefined, the re
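The big-endian entry above can be checked with Python's standard struct module. This is only an illustration of the byte ordering on a host machine; the variable names are arbitrary and nothing here is specific to the 750.

```python
import struct

WORD = 0x0A0B0C0D

# Big-endian: the byte at the lowest address (index 0) is the most significant.
big_endian_bytes = struct.pack(">I", WORD)

# Little-endian stores the same word with the byte order reversed.
little_endian_bytes = struct.pack("<I", WORD)

def byte_at(word_bytes, n):
    """Return the byte stored at offset n within the word."""
    return word_bytes[n]
```

Reading offset 0 of the big-endian image yields 0x0A, the most significant byte, while the little-endian image yields 0x0D there.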
326. et, all the valid bits of the blocks are cleared and the PLRU bits cleared to point to block L0 of each set. Note that this is also the state of the data or instruction cache after setting their respective flash invalidate bit (HID0[DCFI] or HID0[ICFI]). 3.5.2 Cache Flush Operations. The instruction cache can be invalidated by executing a series of icbi instructions or by setting HID0[ICFI]. The data cache can be invalidated by executing a series of dcbi instructions or by setting HID0[DCFI]. Any modified entries in the data cache can be copied back to memory (flushed) by using the dcbf instruction or by executing a series of 12 uniquely addressed load or dcbz instructions to each of the 128 sets. The address space should not be shared with any other process, to prevent snoop hit invalidations during the flushing routine. Exceptions should be disabled during this time so that the PLRU algorithm does not get disturbed. The data cache flush assist bit, HID0[DCFA], simplifies the software flushing process. When set, HID0[DCFA] forces the PLRU replacement algorithm to ignore the invalid entries and follow the replacement sequence defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions to eight per set. HID0[DCFA] should be set just prior to the beginning of the cache flush routine and cleared after the series of instructions is complete. 3.5.3 Data Cache Block Fill Operations. The 750's data cache blocks are
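The flush-by-loading procedure above needs a set of uniquely addressed blocks for every one of the 128 sets. The Python sketch below only generates such an address list; it does not touch any cache. The function name and the base address are hypothetical, and the layout assumes 32-byte blocks with the set selected by address bits A[20-26], so addresses 4 Kbytes apart fall in the same set.

```python
SETS = 128
BLOCK_BYTES = 32
WAY_SPAN = SETS * BLOCK_BYTES  # 4096 bytes between blocks of the same set

def flush_addresses(base, loads_per_set):
    """Yield loads_per_set distinct block addresses for each of the 128 sets.

    Use 12 loads per set in the default case, or 8 when HID0[DCFA] is set.
    base must be 4-Kbyte aligned so that the set index matches the loop index.
    """
    if base % WAY_SPAN:
        raise ValueError("base must be aligned to 4 Kbytes")
    for set_index in range(SETS):
        for n in range(loads_per_set):
            yield base + n * WAY_SPAN + set_index * BLOCK_BYTES
```

With HID0[DCFA] set the routine touches 128 x 8 = 1024 blocks (32 Kbytes); without it, 128 x 12 = 1536 blocks (48 Kbytes).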
327. eters in HID0," describes the HID0 cache control bits, and Section 3.4.2, "Cache Control Instructions," describes the cache control instructions. 3.4.1 Cache Control Parameters in HID0. The HID0 special-purpose register contains several bits that invalidate, disable, and lock the instruction and data caches. The following sections describe these facilities. 3.4.1.1 Data Cache Flash Invalidation. The data cache is automatically invalidated when the 750 is powered up and during a hard reset. However, a soft reset does not automatically invalidate the data cache. Software must use the HID0 data cache flash invalidate bit (HID0[DCFI]) if data cache invalidation is desired after a soft reset. Once HID0[DCFI] is set through an mtspr operation, the 750 automatically clears this bit in the next clock cycle (provided that the data cache is enabled in the HID0 register). Note that some PowerPC microprocessors accomplish data cache flash invalidation by setting and clearing HID0[DCFI] with two consecutive mtspr instructions (that is, the bit is not automatically cleared by the microprocessor). Software that has this sequence of operations does not need to be changed to run on the 750. 3.4.1.2 Data Cache Enabling/Disabling. The data cache may be enabled or disabled by using the data cache enable bit, HID0[DCE]. HID0[DCE] is cleared on power-up, disabling the data cache. When the data cache is in the disabled state (HID0[DCE] = 0), the cache tag state
328. ext requesting master before the current data bus tenure has completed. Two address tenures can occur before the current data bus tenure completes. The 750 can pipeline its own transactions to a depth of one level (intraprocessor pipelining); however, the 750 bus protocol does not constrain the maximum number of levels of pipelining that can occur on the bus between multiple masters (interprocessor pipelining). The external arbiter must control the pipeline depth and synchronization between masters and slaves. In a pipelined implementation, data bus tenures are kept in strict order with respect to address tenures. However, external hardware can further decouple the address and data buses, allowing the data tenures to occur out of order with respect to the address tenures. This requires some form of system tag to associate the out-of-order data transaction with the proper originating address transaction (not defined for the 750 interface). Individual bus requests and data bus grants from each processor can be used by the system to implement tags to support interprocessor, out-of-order transactions. The 750 supports a limited intraprocessor out-of-order, split-transaction capability via the data bus write only (DBWO) signal. For more information about using DBWO, see Section 8.10, "Using Data Bus Write Only." Note that the 750 drops out of pipeline mode between consecutive burst data reads and between consecutive burst instruction fetches
329. extend bus mastership for write operations. Figure 8-13. Termination with DRTRY. Figure 8-14 shows the effect of using DRTRY during a burst read. It also shows the effect of using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 8-14, TA is negated for the second data beat. The 750 data pipeline does not proceed until bus clock cycle 4, when TA is reasserted. Figure 8-14. Read Burst with TA Wait States and DRTRY. Note that DRTRY is useful for systems that implement predicted forwarding of data, such as those with direct-mapped, third-level caches where hit/miss is determined on the following bus clock cycle, or for parity- or ECC-checked memory systems. Note that DRTRY may not be implemented on other PowerPC processors. 8.4.4.2 Data Transfer Termination Due to a Bus Error: The TEA signal indicates that a bus error occurred. It may be asserted while DBB (and/or DRTRY for read operations) is asserted. Asserting TEA to the 750 terminates the transaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated. Assertion of the TEA signal causes a machine check exception (and possibly a checkstop condition within the 750). For more information, see Section The hard reset exception is a nonrecoverable, nonmaskable asynchronous exception. When HRESET is asserted or at power
330. f changing the effective or physical addresses from which the current instruction stream is being fetched. This kind of side effect is defined as an implicit branch. Implicit branches are not supported, and an attempt to perform one causes boundedly undefined results. Therefore, PTEs must not be changed in a manner that causes an implicit branch. Chapter 2, "PowerPC Register Set," in The Programming Environments Manual lists the possible implicit branch conditions that can occur when system registers and MSR bits are changed. 5.4.7 Segment Register Updates: Synchronization requirements for using the move to segment register instructions are described in "Synchronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," in The Programming Environments Manual. Chapter 6 Instruction Timing: This chapter describes how the PowerPC 750 microprocessor fetches, dispatches, and executes instructions and how it reports the results of instruction execution. It gives detailed descriptions of how the 750 execution units work and how those units interact with other parts of the processor, such as the instruction fetching mechanism, register files, and caches. It gives examples of instruction sequences, showing potential bottlenecks and how to minimize their effects. Finally, it includes tables that identify the unit that executes
331. f this section. Conventions: This document uses the following notational conventions. mnemonics: Instruction mnemonics are shown in lowercase bold. italics: Italics indicate variable command parameters, for example, bcctrx. Book titles in text are set in italics. 0x0: Prefix to denote hexadecimal number. 0b: Prefix to denote binary number. rA, rB: Instruction syntax used to identify a source GPR. rD: Instruction syntax used to identify a destination GPR. frA, frB, frC: Instruction syntax used to identify a source FPR. frD: Instruction syntax used to identify a destination FPR. REG[FIELD]: Abbreviations or acronyms for registers are shown in uppercase text. Specific bits, fields, or ranges appear in brackets. For example, MSR[LE] refers to the little-endian mode enable bit in the machine state register. x: In certain contexts, such as a signal encoding, this indicates a don't care. n: Used to express an undefined numerical value. NOT logical operator. AND logical operator. OR logical operator. 0000: Indicates reserved bits or bit fields in a register. Although these bits may be written to as either ones or zeros, they are always read as zeros. Acronyms and Abbreviations: Table i contains acronyms and abbreviations that are used in this document. Table i. Acronyms and Abbreviated Terms: BAT: Block address translation. BIST: Built-in self test. BHT: Branch history table.
332. [Figure 6-5. Instruction Timing, Cache Hit. Pipeline diagram for a sequence of add, fadd, and fsub instructions; legend: Fetch, In dispatch entry (IQ0/IQ1), Execute, Complete (In CQ), In retirement entry (CQ0/CQ1); the lower panels show the per-cycle contents of the instruction queue and the completion queue.] The instruction timing for this example is described, cycle by cycle, as follows: 0. In cycle 0, instructions 0-3 are fetched from the instruction cache. Instructions 0 and 1 are placed in the two entries in the instruction queue from which they can be dispatched on the next clock cycle. 1. In cycle 1, instructions 0 and 1 are dispatched to the IU2 and FPU, respectively. Notice that for instructions to be dispatched, they must be assigned positions in the completion queue. In this case, since the completion queue was empty, instructions 0 and 1 take the two lowest entries in the
333. fect instruction forwarding immediately. The ICTC register can be accessed with the mtspr and mfspr instructions using SPR 1019. 2.1.4 Thermal Management Registers (THRM1-THRM3): The on-chip thermal management assist unit provides the following functions: it compares the junction temperature against user-programmed thresholds; it generates a thermal management interrupt if the temperature crosses the threshold; and it provides a way for a successive approximation routine to estimate junction temperature. Control and access to the thermal management assist unit is through the privileged mtspr/mfspr instructions to the three THRM registers. THRM1 and THRM2, shown in Figure 2-10, provide the ability to compare the junction temperature against two user-provided thresholds. Having dual thresholds allows thermal management software differing degrees of action in reducing junction temperature. Thermal management can use a single-threshold mode in which the thermal sensor output is compared to only one threshold in either THRM1 or THRM2. [Figure 2-10. Thermal Management Registers 1-2 (THRM1-THRM2); field boundaries at bits 0, 1, 2, 8, 9, 28, 29, 30, 31, with a reserved field.] The bits in THRM1 and THRM2 are described in Table 2-15. Table 2-15. THRM1-THRM2 Bit Settings: TIN: Thermal management interrupt bit. Read-only. This bit is set if the thermal sensor output crosses the threshold specified in the SPR. The state of TIN is valid only if TIV is set
334. fer signals interact, see Section 8.4.3, "Data Transfer." 7.2.7.1 Data Bus (DH[0-31], DL[0-31]): The data bus (DH[0-31] and DL[0-31]) consists of 64 signals that are both inputs and outputs on the 750. Following are the state meaning and timing comments for the DH and DL signals. State Meaning: The data bus has two halves, data bus high (DH) and data bus low (DL). See Table 7-4 for the data bus lane assignments. Timing Comments: The data bus is driven once for noncached transactions and four times for cache transactions (bursts). Table 7-4. Data Bus Lane Assignments. 7.2.7.1.1 Data Bus (DH[0-31], DL[0-31]), Output: Following are the state meaning and timing comments for the DH and DL output signals. State Meaning: Asserted/Negated: Represents the state of data during a data write. Byte lanes not selected for data transfer will not supply valid data. Timing Comments: Assertion/Negation: Initial beat coincides with DBB and, for bursts, transitions on the bus clock cycle following each assertion of TA. High Impedance: Occurs on the bus clock cycle after the final assertion of TA, following the assertion of TEA, or in certain ARTRY cases. 7.2.7.1.2 Data Bus (DH[0-31], DL[0-31]), Input: Following are the state meaning and timing comments for the DH and DL input signals. State Meaning: Asserted/Negated: Represents the state of data during a data read transaction. Timi
335. fers (TLBs) are implemented on the 750 to keep recently used page address translations on-chip. Although the PowerPC OEA describes one MMU (conceptually), the 750 hardware maintains separate TLBs and table search resources for instruction and data accesses that can be performed independently (and simultaneously). Therefore, the 750 is described as having two MMUs, one for instruction accesses (IMMU) and one for data accesses (DMMU). The block address translation (BAT) mechanism is a software-controlled array that stores the available block address translations on-chip. BAT array entries are implemented as pairs of BAT registers that are accessible as supervisor special-purpose registers (SPRs). There are separate instruction and data BAT mechanisms, and in the 750, they reside in the instruction and data MMUs, respectively. The MMUs, together with the exception processing mechanism, provide the necessary support for the operating system to implement a paged virtual memory environment and for enforcing protection of designated memory areas. Exception processing is described in Chapter 4, "Exceptions." Section 4.3, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs. 5.1 MMU Overview: The 750 implements the memory management specification of the PowerPC OEA for 32-bit implementations. Thus, it provides 4 Gbytes of effective address space accessible to superv
336. from one stage to the next. When it does, the stage becomes available for the next instruction. Although an individual instruction may take many cycles to complete (the number of cycles is called instruction latency), pipelining makes it possible to overlap the processing so that the throughput (number of instructions completed per cycle) is greater than if pipelining were not implemented. Program order: The order of instructions in an executing program. More specifically, this term is used to refer to the original order in which program instructions are fetched into the instruction queue from the cache. Rename register: Temporary buffers used by instructions that have finished execution but have not completed. Reservation station: A buffer between the dispatch and execute stages that allows instructions to be dispatched even though the results of instructions on which the dispatched instruction may depend are not available. Retirement: Removal of the completed instruction from the completion queue. Stage: The term stage is used in two different senses, depending on whether the pipeline is being discussed as a physical entity or a sequence of events. In the latter case, a stage is an element in the pipeline during which certain actions are performed, such as decoding the instruction, performing an arithmetic operation, or writing back the results. A stage is typically described as taking a processor clock cycle to perform its operati
337. g Point Status and Control Register Instructions: Move from FPSCR (mffs); Move to Condition Register from FPSCR (mcrfs crfD,crfS). Implementation Note: The PowerPC architecture states that in some implementations, the Move to FPSCR Fields (mtfsf) instruction may perform more slowly when only some of the fields are updated as opposed to all of the fields. In the 750, there is no degradation of performance. 2.3.4.2.6 Floating-Point Move Instructions: Floating-point move instructions copy data from one FPR to another. The floating-point move instructions do not modify the FPSCR. The CR update option in these instructions controls the placing of result status into CR1. Table 2-31 summarizes the floating-point move instructions. Table 2-31. Floating-Point Move Instructions: Floating Move Register (fmr); Floating Negate (fneg); Floating Absolute Value (fabs); Floating Negative Absolute Value (fnabs). 2.3.4.3 Load and Store Instructions: Load and store instructions are issued and translated in program order; however, the accesses can occur out of order. Synchronizing instructions are provided to enforce strict ordering. This section describes the load and store instructions, which consist of the following: integer load instructions; integer store instructions; integer load and store with byte-reverse instructions
338. g edge of the bus clock cycle (see the 750 hardware specifications for exact timing information). 8.1.4 Optional 32-Bit Data Bus Mode: The 750 supports an optional 32-bit data bus mode. The 32-bit data bus mode operates the same as the 64-bit data bus mode with the exception of the byte lanes involved in the transfer and the number of data beats that are performed. The number of data beats required for a data tenure in the 32-bit data bus mode is one, two, or eight beats, depending on the size of the program transaction and the cache mode for the address. For additional information about 32-bit data bus mode, see Section 8.6.1, "32-Bit Data Bus Mode." 8.1.5 Direct-Store Accesses: The 750 does not support the extended transfer protocol for accesses to the direct-store storage space. The transfer protocol used for any given access is selected by the T bit in the MMU segment registers; if the T bit is set, the memory access is a direct-store access. An attempt to access instructions or data in a direct-store segment will result in the 750 taking an ISI or DSI exception. Bar over signal name indicates active low. ap: 750 input (while 750 is a bus master). BR: 750 output (while 750 is a bus master). ADDR: 750 output (grouped here, address plus attributes). qual BG: 750 internal signal (inaccessible to the user, but used in diagrams to clarify operations). Compelling dependency: event will occur on the next cl
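The beat counts quoted for the two bus widths follow from dividing the transfer size by the bus width. The sketch below is illustrative only: the function name and parameters are hypothetical, it assumes a burst always moves one 32-byte cache block, and it assumes naturally aligned single-beat transfers.

```python
def data_beats(transfer_bytes, burst, bus32):
    """Number of data beats in one data tenure (illustrative model only).

    transfer_bytes: size of a non-burst transfer (1 to 8 bytes)
    burst:          True for a cache-block burst (assumed 32 bytes)
    bus32:          True when the optional 32-bit data bus mode is selected
    """
    bus_bytes = 4 if bus32 else 8
    if burst:
        return 32 // bus_bytes
    # Ceiling division: an 8-byte transfer needs two beats on a 4-byte bus.
    return max(1, -(-transfer_bytes // bus_bytes))
```

With these assumptions the 64-bit bus yields one or four beats, and the 32-bit bus yields one, two, or eight, matching the counts stated in Section 8.1.4.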
339. g in real address mode if its ability to perform address translation has been disabled through the MSR register's IR and/or DR bits. Record bit: Bit 31 (or the Rc bit) in the instruction encoding. When it is set, the instruction updates the condition register (CR) to reflect the result of the operation. Referenced bit: One of two page history bits found in each page table entry (PTE). The processor sets the referenced bit whenever the page is accessed for a read or write. See also Page access history bits. Register indirect addressing: A form of addressing that specifies one GPR that contains the address for the load or store. Register indirect with immediate index addressing: A form of addressing that specifies an immediate value to be added to the contents of a specified GPR to form the target address for the load or store. Register indirect with index addressing: A form of addressing that specifies that the contents of two GPRs be added together to yield the target address for the load or store. Reservation: The processor establishes a reservation on a cache block of memory space when it executes an lwarx instruction to read a memory semaphore into a GPR. RISC (reduced instruction set computing): An architecture characterized by fixed-length instructions with nonoverlapping functionality and by a separate set of load and store instructions that perform memory accesses. S Secondary c
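The three addressing forms defined in the glossary entries above differ only in what is added to the base register. The following sketch computes the effective address for each form on a 32-bit machine. The function names are hypothetical, and two details are assumptions not stated in this excerpt: the 16-bit immediate is sign-extended, and register number 0 in the base position is read as the value 0.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Interpret the low 16 bits of d as a signed displacement (assumed)."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def base(gpr, ra):
    # ASSUMPTION: register number 0 as a base means the literal value 0.
    return 0 if ra == 0 else gpr[ra]

def ea_register_indirect(gpr, ra):
    """Register indirect: one GPR holds the address."""
    return base(gpr, ra) & MASK32

def ea_immediate_index(gpr, ra, d):
    """Register indirect with immediate index: GPR contents plus an immediate."""
    return (base(gpr, ra) + sign_extend16(d)) & MASK32

def ea_index(gpr, ra, rb):
    """Register indirect with index: the contents of two GPRs are added."""
    return (base(gpr, ra) + gpr[rb]) & MASK32
```

Here `gpr` is any sequence of 32 integers standing in for the general-purpose registers.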
340. g two steps: (1) from effective address to the virtual address (which never exists as a specific entity but can be considered to be the concatenation of the virtual page number and the byte offset within a page), and (2) from virtual address to physical address. This section highlights those areas of the memory segment model defined by the OEA that are specific to the 750. 5.4.1 Page History Recording: Referenced (R) and changed (C) bits in each PTE keep history information about the page. They are maintained by a combination of the 750 table search hardware and the system software. The operating system uses this information to determine which areas of memory to write back to disk when new pages must be allocated in main memory. Referenced and changed recording is performed only for accesses made with page address translation and not for translations made with the BAT mechanism or for accesses that correspond to direct-store (T = 1) segments. Furthermore, R and C bits are maintained only for accesses made while address translation is enabled (MSR[IR] = 1 or MSR[DR] = 1). In the 750, the referenced and changed bits are updated as follows: for TLB hits, the C bit is updated according to Table 5-7; for TLB misses, when a table search operation is in progress to locate a PTE, the R and C bits are updated (set, if required) to reflect the status of the page based on this access. Table 5-7. Table Search Op
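Step (1) of the translation, forming the virtual address as the concatenation of the virtual page number and the byte offset, can be sketched as follows. The field widths are assumptions taken from the 32-bit PowerPC segmented model rather than from this excerpt: the upper 4 bits of the effective address select a segment register, the low 24 bits of that register are the virtual segment ID, the page index is 16 bits, and the byte offset is 12 bits. The function name is hypothetical.

```python
def effective_to_virtual(ea, segment_regs):
    """Return (virtual page number, virtual address) for a 32-bit EA.

    segment_regs: sequence of sixteen 32-bit segment register values.
    Field widths are assumptions noted in the accompanying text.
    """
    sr = segment_regs[(ea >> 28) & 0xF]   # top 4 EA bits pick the segment register
    vsid = sr & 0x00FFFFFF                # assumed 24-bit virtual segment ID
    page_index = (ea >> 12) & 0xFFFF      # assumed 16-bit page index
    offset = ea & 0xFFF                   # byte offset within a 4-Kbyte page
    vpn = (vsid << 16) | page_index       # virtual page number
    return vpn, (vpn << 12) | offset      # VA = VPN concatenated with offset
```

Step (2), locating the PTE for the virtual page number, is the job of the TLBs and the table search hardware and is not modeled here.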
341. gated when they are high. Signals that are not active low, such as AP[0-3] (address bus parity signals) and TT[0-4] (transfer type signals), are referred to as asserted when they are high and negated when they are low. 1.2.8 Signal Configuration: Figure 1-4 shows the 750's logical pin configuration. The signals are grouped by function. [Figure 1-4. PowerPC 750 Microprocessor Signal Groups. Labels recoverable from the diagram: address arbitration, address start, address transfer, address termination, data arbitration, data transfer, data termination, interrupts, resets (HRESET), CKSTP_IN, CKSTP_OUT, processor status/control (RSRV, TBEN, TLBISYNC), clock control (SYSCLK, PLL_CFG[0-3], CLK_OUT), JTAG/COP and factory test interface, L2 cache address/data (L2ADDR[16-0], L2DATA[0-63], L2DP[0-7]), and L2 cache clock/control (L2CE, L2WE, L2CLK_OUT A/B, L2SYNC_OUT, L2SYNC_IN, L2ZZ). The L2 cache signals are marked as not supported in the PowerPC 740.] Signal functionality is described in detail in Chapter 7, "Signal Descriptions," and Chapter 8, "Bus Interface Operation."
342. ge of processor-to-bus clock ratios. 1.2.7 Signals: The 750's signals are grouped as follows: Address arbitration signals: The 750 uses these signals to arbitrate for address bus mastership. Address start signals: These signals indicate that a bus master has begun a transaction on the address bus. Address transfer signals: These signals include the address bus and address parity signals. They are used to transfer the address and to ensure the integrity of the transfer. Transfer attribute signals: These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or caching-inhibited. Address termination signals: These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated. Data arbitration signals: The 750 uses these signals to arbitrate for data bus mastership. Data transfer signals: These signals, which consist of the data bus and data parity signals, are used to transfer the data and to ensure the integrity of the transfer. Data termination signals: Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, a data termination signal also indicates the end of the tenure; in burst accesses, data termination signals
343. 2.3.4.3.4 Integer Store Instructions: For integer store instructions, the contents of rS are stored into the byte, half word, word, or double word in memory addressed by the EA (effective address). Many store instructions have an update form, in which rA is updated with the EA. For these forms, the following rules apply: If rA is not 0, the effective address is placed into rA. If rS = rA, the contents of register rS are copied to the target memory element, then the generated EA is placed into rA (rS). The PowerPC architecture defines store with update instructions with rA = 0 as an invalid form. In addition, it defines integer store instructions with the CR update option enabled (Rc field, bit 31, in the instruction encoding = 1) to be an invalid form. Table 2-33 summarizes the integer store instructions. Table 2-33. Integer Store Instructions: Store Byte; Store Byte Indexed; Store Byte with Update; Store Byte with Update Indexed; Store Half Word; Store Half Word Indexed; Store Half Word with Update; Store Half Word with Update Indexed; Store Word; Store Word Indexed; Store Word with Update; Store Word with Update Indexed. 2.3.4.3.5 Integer Store Gathering: The 750 performs store gathering for write-through accesses to nonguarded space or to cache-inhibited stores to nonguarded space if the stores are 4 bytes and they are word-aligned. These store
344. gement interrupt (TIN) state is valid. THRESHOLD: Threshold value that the output of the thermal sensor is compared to. The threshold range is between 0 and 127 °C, and each bit represents 1 °C. Note that this is not the resolution of the thermal sensor. Reserved: System software should clear these bits to 0. TID: Thermal management interrupt direction bit. Selects the result of the temperature comparison to set TIN. If TID is cleared to 0, TIN is set and an interrupt occurs if the junction temperature exceeds the threshold. If TID is set to 1, TIN is set and an interrupt is indicated if the junction temperature is below the threshold. TIE: Thermal management interrupt enable. Enables assertion of the thermal management interrupt signal. The thermal management interrupt is maskable by the MSR[EE] bit. If TIE is cleared to 0 and THRMn is valid, the TIN bit records the status of the junction temperature vs. threshold comparison without asserting an interrupt signal. This feature allows system software to make a successive approximation to estimate the junction temperature. V: SPR valid bit. This bit is set to indicate that the SPR contains a valid threshold, TID, and TIE controls bits. Setting THRM1/2[V] and THRM3[E] to 1 enables operation of the thermal sensor. The bit fields in the THRM3 SPR are described in Table 10-3. Table 10-3. THRM3 Bit Field Settings: Bits 0-17: Reserved for future use. System software should clear these bits to 0. 18 3
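The successive approximation mentioned for the TIE = 0 case amounts to a binary search over the 0 to 127 °C threshold range. The sketch below is one possible arrangement, not the manual's routine. It assumes a hypothetical callback `tin_exceeds(threshold)` that programs the given threshold with TID = 0 and TIE = 0, waits for a valid result, and returns the TIN bit; how that callback talks to the THRM registers is outside the sketch.

```python
def estimate_junction_temp(tin_exceeds):
    """Binary-search the threshold range for the junction temperature.

    tin_exceeds(threshold) is a hypothetical callback returning True when
    the junction temperature exceeds the threshold (TIN with TID = 0).
    Returns the smallest threshold in 0..127 that the temperature does not
    exceed, or 128 if it exceeds every programmable threshold.
    """
    lo, hi = 0, 128
    while lo < hi:
        mid = (lo + hi) // 2          # mid never exceeds 127
        if tin_exceeds(mid):
            lo = mid + 1              # temperature is above this threshold
        else:
            hi = mid                  # temperature is at or below it
    return lo
```

Because the text notes that 1 °C per bit is not the resolution of the sensor, the result should be read as a coarse bracket rather than a measurement.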
345. gement, reducing the power to the execution units while waiting for instructions to be forwarded from the instruction cache; thus, instruction cache throttling does not provide thermal reduction unless HID0[DPM] is set to 1. Note that during instruction cache throttling, the configuration of the PLL and DLL remains unchanged. The bit field settings of the ICTC SPR are shown in Table 10-5. Table 10-5. ICTC Bit Field Settings: Instruction forwarding interval, expressed in processor clocks: 0x00 = 0 clock cycles, 0x01 = 1 clock cycle, 0xFF = 255 clock cycles. Cache throttling enable: 0 = disable instruction cache throttling, 1 = enable instruction cache throttling. Chapter 11 Performance Monitor: The performance monitor facility provides the ability to monitor and count predefined events such as processor clocks, misses in the instruction cache, data cache, or L2 cache, types of instructions dispatched, mispredicted branches, and other occurrences. The count of such events (which may be an approximation) can be used to trigger the performance monitor exception. The performance monitor facility is not defined by the PowerPC architecture. The performance monitor can be used for the following: To increase system performance with efficient software, es
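The two ICTC fields in Table 10-5 can be packed into a register value as sketched below. The bit layout is an assumption, since the bit-number column of the table did not survive in this excerpt: the sketch places the 8-bit forwarding interval directly above the enable bit, with the enable bit as the least significant bit. The function names are hypothetical.

```python
def ictc_encode(interval, enable):
    """Pack an ICTC value. ASSUMED layout: interval in the 8 bits above the
    least significant bit, enable in the least significant bit."""
    if not 0 <= interval <= 0xFF:
        raise ValueError("forwarding interval must fit in 8 bits")
    return (interval << 1) | (1 if enable else 0)

def ictc_decode(value):
    """Unpack (interval, enable) using the same assumed layout."""
    return (value >> 1) & 0xFF, bool(value & 1)
```

The resulting value would be written with mtspr; as the text notes, throttling only reduces power when HID0[DPM] is also set.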
346. gister Instructions (Continued). Notes: 1. This assumes no pending stores in the store queue. If there are, the sync completes after they complete to memory. If broadcast is enabled on the 60x bus, sync completes only after a successful broadcast. 2. tlbsync is dispatched only to the completion buffer (not to any execution unit) and is marked finished as it is dispatched. Upon retirement, it waits for an external TLBISYNC signal to be asserted. In most systems TLBISYNC is always asserted, so the instruction is a no-op. Table 6-5 lists condition register logical instruction latencies. Table 6-5. Condition Register Logical Instructions. Table 6-6 shows integer instruction latencies. Note that the IU1 executes all integer arithmetic instructions: multiply, divide, shift, rotate, add, subtract, and compare. The IU2 executes all integer instructions except multiply and divide (that is, shift, rotate, add, subtract, and compare). Table 6-6. Integer Instructions
347. gisters can hold either single- or double-precision floating-point values. FPSCR (User): The floating-point status and control register (FPSCR) contains the floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE 754 standard. The 32 GPRs serve as the data source or destination for integer instructions. MSR (Supervisor): The machine state register (MSR) defines the processor state. Its contents are saved when an exception is taken and restored when exception handling completes. The 750 implements MSR[POW] (defined by the architecture as optional), which is used to enable the power management feature. The 750-specific MSR[PM] bit is used to mark a process for the performance monitor. SR0-SR15 (Supervisor): The sixteen 32-bit segment registers (SRs) define the 4-Gbyte space as sixteen 256-Mbyte segments. The 750 implements segment registers as two arrays: a main array (for data accesses) and a shadow array (for instruction accesses); see Figure 1-1. Loading a segment entry with the Move to Segment Register (mtsr) instruction loads both arrays. The mfsr instruction reads the master register, shown as part of the data MMU in Figure 1-1. The OEA defines numerous special-purpose registers that serve a variety of functions, such as providing controls, indicating status, configuring the processor, and performing special operations. During normal execution, a program ca
348. h during HRESET assertion. 7.2.9.12 L2 Write Enable (L2WE), Output: Following are the state meaning and timing comments for the L2WE signal. State Meaning: Asserted: Indicates that the 750 is performing a write operation to the L2 cache memory. Negated: Indicates that the 750 is not performing an L2 cache memory write operation. Timing Comments: Assertion/Negation: May occur on any cycle. L2WE is driven high during HRESET assertion. 7.2.9.13 L2 Clock Out A (L2CLK_OUTA), Output: Following are the state meaning and timing comments for the L2CLK_OUTA signal. State Meaning: Asserted/Negated: Clock output for L2 cache memory devices. The L2CLK_OUTA signal is identical and synchronous with the L2CLK_OUTB signal and provides the capability to drive up to four L2 cache memory devices. If differential L2 clocking is configured through the setting of the L2CR, the L2CLK_OUTB signal is driven phase-inverted with relation to the L2CLK_OUTA signal. Timing Comments: Assertion/Negation: Refer to the 750 hardware specifications for timing comments. The L2CLK_OUTA signal is driven low during assertion of HRESET. 7.2.9.14 L2 Clock Out B (L2CLK_OUTB), Output: Following are the state meaning and timing comments for the L2CLK_OUTB signal. State Meaning: Asserted/Negated: Clock output for L2 cache memory devices. The L2CLK_OUTB signal is identic
349. h as the 750. The class is determined by examining the primary opcode and the extended opcode, if any. If the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or of a reserved instruction, the instruction is illegal. Instruction encodings that are now illegal may become assigned to instructions in the architecture or may be reserved by being assigned to processor-specific instructions. 2.3.1.1 Definition of Boundedly Undefined: If instructions are encoded with incorrectly set bits in reserved fields, the results on execution can be said to be boundedly undefined. If a user-level program executes the incorrectly coded instruction, the resulting undefined results are bounded in that a spurious change from user to supervisor state is not allowed, and the level of privilege exercised by the program in relation to memory access and other system resources cannot be exceeded. Boundedly undefined results for a given instruction may vary between implementations and between execution attempts in the same implementation. 2.3.1.2 Defined Instruction Class: Defined instructions are guaranteed to be supported in all PowerPC implementations, except as stated in the instruction descriptions in Chapter 8, "Instruction Set," in The Programming Environments Manual. The 750 provides hardware support for all instructions defined for 32-bit implementations. It does not support the optional fsqrt, fsqrts, and tlbia instructi
350. h no address retry. 3. At this point, if DBWO is asserted with a qualified data bus grant to the 750, the 750 asserts DBB and drives the write data onto the data bus, out of order with respect to the address pipeline. The write transaction concludes with the 750 negating DBB. 4. The next qualified data bus grant signals the 750 to complete the outstanding read transaction by latching the data on the bus. This assertion of DBG should not be accompanied by an asserted DBWO. Any number of bus transactions by other bus masters can be attempted between any of these steps. Note the following regarding DBWO: DBWO can be asserted if no data bus read is pending, but it has no effect on write ordering. The ordering and presence of data bus writes is determined by the writes in the write queues at the time BG is asserted for the write address (not DBG). If a particular write is desired (for example, a cache-line snoop push-out operation), then BG must be asserted after that particular write is in the queue, and it must be the highest priority write in the queue at that time. A cache-line snoop push-out operation may be the highest priority write, but more than one may be queued. Because more than one write may be in the write queue when DBG is asserted for the write address, more than one data bus write may be enveloped by a pending data bus read. The arbiter must monitor bus operations a
351. hat a coherency state describes, also referred to as a cache line. Two coherency state bits for each data cache block allow encoding for three states: Modified (Exclusive) (M), Exclusive (Unmodified) (E), and Invalid (I). A single coherency state bit for each instruction cache block allows encoding for two possible states: Invalid (INV) and Valid (VAL). Each cache can be invalidated or locked by setting the appropriate bits in the hardware implementation-dependent register 0 (HID0), a special-purpose register (SPR) specific to the 750. The 750 supports a fully coherent 4-Gbyte physical memory address space. Bus snooping is used to drive the MEI three-state cache coherency protocol that ensures the coherency of global memory with respect to the processor's data cache. The MEI protocol is described in Section 3.3.2, "MEI Protocol." On a cache miss, the 750's cache blocks are filled in four beats of 64 bits each. The burst fill is performed as a critical-double-word-first operation; the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. The instruction and data caches are integrated into the 750 as shown in Figure 3-1. [Figure 3-1 labels: Load/Store Unit (LSU), Instruction Unit, Instructions (0-127), EA (20-26), Data (0-63), Cache Tags, Cache Tags, I-Cache, D-Cach
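The critical-double-word-first fill can be illustrated by listing the order in which the four double words of a 32-byte block would arrive. The text above only states that the critical double word comes first; the sketch assumes the remaining beats follow sequentially and wrap around within the block. The function name is hypothetical.

```python
def burst_order(address):
    """Double-word indices (0..3) of a 32-byte block in arrival order.

    The double word containing `address` is delivered first, as the text
    states. ASSUMPTION: later beats are sequential with wrap-around.
    """
    first = (address >> 3) & 0x3      # which 8-byte double word in the block
    return [(first + i) % 4 for i in range(4)]
```

For example, a miss on an address in the third double word of a block gives `[2, 3, 0, 1]` under this assumption, so the requesting unit receives its data on the first beat instead of the third.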
352. have highest priority. The next priority consists of load and store requests from the L1 data cache. The next priority consists of instruction fetch requests from the L1 instruction cache. For more information, see Chapter 9, "L2 Cache Interface Operation." The L2 cache interface is physically present in the 740, but the I/Os are not brought out to the package. Initially, the 740 uses a 255-pin CBGA package; the 750 uses a 360-pin CBGA package.
1.2.6 System Interface/Bus Interface Unit (BIU)
The address and data buses operate independently; address and data tenures of a memory access are decoupled to provide a more flexible control of memory traffic. The primary activity of the system interface is transferring data and instructions between the processor and system memory. There are two types of memory accesses:
• Single-beat transfers: These memory accesses allow transfer sizes of 8, 16, 24, 32, or 64 bits in one bus clock cycle. Single-beat transactions are caused by uncacheable read and write operations that access memory directly (that is, when caching is disabled), cache-inhibited accesses, and stores in write-through mode.
• Four-beat burst (32 bytes) data transfers: Burst transactions, which always transfer an entire cache block (32 bytes), are initiated when an entire cache block is transferred. Because the first-level caches on the 750 are write-back caches, bu
353. he snoop. The snoop is not given priority into the tags when the snoop coincides with a tag write (for example, validation after a cache block load). In these situations, the snoop is retried and must re-arbitrate before the lookup is possible. Occasionally, cache snoops cannot be serviced and must be retried. These retries occur if the cache is busy with a burst read or write when the snoop operation takes place. Note that it is possible for a snoop to hit a modified cache block that is already in the process of being written to the copy-back buffer for replacement purposes. If this happens, the 750 retries the snoop and raises the priority of the castout operation to allow it to go to the bus before the cache block fill. Another consideration is page table aliasing. If a store hits to a modified cache block but the page table entry is marked write-through (WIMG = 1xxx), then the page has probably been aliased through another page table entry which is marked write-back (WIMG = 0xxx). If this occurs, the 750 ignores the modified bit in the cache tag. The cache block is updated during the write-through operation and the block remains in the modified state. The global (GBL) signal, asserted as part of the address attribute field during a bus transaction, enables the snooping hardware of the 750. Address bus masters assert GBL to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). If
354. he address W M WIM state from address translation complement Dor 1 WIM state implied by transaction type in table For instruction fetches reflection of the M bit must be enabled through HIDO IFEM A Atomic high if lwarx low otherwise S Transfer size Special instructions listed may not generate bus transactions depending on cache state 3 30 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual 3 7 MEI State Transactions Table 3 7 shows MEI state transitions for various operations Bus operations are described in Table 3 5 Table 3 7 MEI State Transitions Current A Cache Bus Bus Operation Operation sync Cache Cache Actions Operation State Load Read x0x Same 1 Cast out of modified Write with kill T 0 block as required 2 Pass four beat read Read to memory queue Load Read x0x EM Same Read data from cache T 0 Load T 0 Read No x1x Same Passsingle beatreadto Read memory queue Load T 0 Read No x1x M CRTRY read push Write with kill sector to write queue Read Acts like other reads but bus operation uses special encoding en A Same Cast out of modified Write with kill T 0 block if necessary Pass RWITM to RWITM memory queue Store Write EM Write data to cache T 0 Store stwex Write Same Pass single beat write pene with flus T 0 to memory queue Store stwex Write Same Write data to cache Em T 0 Pass single beat write Ve to memory queu
355. he block from other caches in the system. The MEI coherency protocol also affects how the 750 snoops read operations on the 60x bus. All reads snooped from the 60x bus (except for caching-inhibited reads) are interpreted as RWITM to cause flushing from the 750's cache. Single-beat reads (TBST negated) are interpreted by the 750 as caching-inhibited. These actions for read operations allow the 750 to operate successfully (coherently) on the bus with other bus masters that implement either the three-state MEI or a four-state MESI cache coherency protocol. [Chapter 3, Instruction and Data Cache Operation, 3-23]
3.6.2 Bus Operations Caused by Cache Control Instructions
The cache control, TLB management, and synchronization instructions supported by the 750 may affect or be affected by the operation of the 60x bus. The operation of the instructions may also indirectly cause bus transactions to be performed, or their completion may be linked to the bus. The dcbz instruction is the only cache control instruction that causes an address-only broadcast on the 60x bus. All other data cache control instructions (dcbi, dcbf, dcbst, and dcbz) are not broadcast unless specifically enabled through the HID0[ABE] configuration bit. Note that dcbi, dcbf, dcbst, and dcbz do broadcast to the 750's L2 cache, regardless of HID0[ABE]. HID0[ABE] also controls the broadcast of the sync and eieio instructions. The icbi instruction is never broadcast. No broadcasts by ot
356. he following:
• Registers implemented in the 750
• Operand conventions
• The 750 instruction set
For detailed information about architecture-defined features, see The Programming Environments Manual.
2.1 The PowerPC 750 Processor Register Set
This section describes the registers implemented in the 750. It includes an overview of registers defined by the PowerPC architecture, highlighting differences in how these registers are implemented in the 750, and a detailed description of 750-specific registers. Full descriptions of the architecture-defined register set are provided in Chapter 2, "PowerPC Register Set," in The Programming Environments Manual. Registers are defined at all three levels of the PowerPC architecture: user instruction set architecture (UISA), virtual environment architecture (VEA), and operating environment architecture (OEA). The PowerPC architecture defines register-to-register operations for all computational instructions. Source data for these instructions are accessed from the on-chip registers or are provided as immediate values embedded in the opcode. The three-register instruction format allows specification of a target register distinct from the two source registers, thus preserving the original data for use by other instructions and reducing the number of instructions required for certain operations. Data is transferred between memory and registers with explicit load and store instructions only.
2.1.1 Register
357. he performance monitor counts events during execution of code, relating to dispatch, execution, completion, and memory accesses. The performance monitor incorporates several registers that can be read and written to by supervisor-level software. User-level versions of these registers provide read-only access for user-level applications. These registers are described in Section 1.4, "PowerPC Registers and Programming Model." Performance monitor control registers, MMCR0 or MMCR1, can be used to specify which events are to be counted and the conditions for which a performance monitoring interrupt is taken. Additionally, the sampled instruction address register, SIA (USIA), holds the address of the first instruction to complete after the counter overflowed. Attempting to write to a user read-only performance monitor register causes a program exception, regardless of the MSR[PR] setting. When a performance monitoring interrupt occurs, program execution continues from vector offset 0x00F00. Chapter 11, "Performance Monitor," describes the operation of the performance monitor diagnostic tool incorporated in the 750.
Chapter 2 Programming Model
This chapter describes the PowerPC 750 programming model, emphasizing those features specific to the 750 processor and summarizing those that are common to PowerPC processors. It consists of three major sections, which describe t
358. he processor sets the changed bit if any store is performed into the page. See also Page access history bits and Referenced bit.
Clear. To cause a bit or bit field to register a value of zero. See also Set.
Completion. Completion occurs when an instruction has finished executing, written back any results, and is removed from the completion queue. When an instruction completes, it is guaranteed that this instruction and all previous instructions can cause no exceptions.
Context synchronization. An operation that ensures that all instructions in execution complete past the point where they can produce an exception, that all instructions in execution complete in the context in which they began execution, and that all subsequent instructions are fetched and executed in the new context. Context synchronization may result from executing specific instructions (such as isync or rfi) or when certain events occur (such as an exception).
Copy-back. An operation in which modified data in a cache block is copied back to memory. [Glossary of Terms and Abbreviations, Glossary-3, Glossary-4]
Denormalized number. A nonzero floating-point number whose exponent has a reserved value, usually the format's minimum, and whose explicit or implicit leading significand bit is zero.
Direct-mapped cache. A cache in which each main memory address can appear in only one location within the cache; operates more quickly when the memory request is a cache hit.
Effectiv
359. her PMC1[0] = 1 or a performance monitor interrupt is signaled.
Bits 19-25, PMC1SELECT: PMC1 input selector, 128 events selectable. See Table 2-10.
Bits 26-31, PMC2SELECT: PMC2 input selector, 64 events selectable. See Table 2-11.
MMCR0 can be accessed with mtspr and mfspr using SPR 952.
2.1.2.4.2 User Monitor Mode Control Register 0 (UMMCR0)
The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level software. UMMCR0 can be accessed with mfspr using SPR 936. [Chapter 2, Programming Model, 2-15]
2.1.2.4.3 Monitor Mode Control Register 1 (MMCR1)
The monitor mode control register 1 (MMCR1) functions as an event selector for performance monitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register is shown in Figure 2-6. [Figure 2-6. Monitor Mode Control Register 1 (MMCR1); only the label "Reserved" survives.] Bit settings for MMCR1 are shown in Table 2-8. The corresponding events are described in Section 2.1.2.4.5, "Performance Monitor Counter Registers (PMC1-PMC4)."
Table 2-8. MMCR1 Bit Settings
PMC3SELECT: PMC3 input selector, 32 events selectable. See Table 2-12 for defined selections.
PMC4SELECT: PMC4 input selector, 32 events selectable. See Table 2-13 for defined selections.
MMCR1 can be accessed with mtspr and mfspr using SPR 956. User-level software can read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1, described in Section 2.1.2.4.4, "User Monitor Mode Control Register 1 (UM
360. her masters are snooped by the 750 (except for dcbz kill block transactions). For detailed information on the cache control instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in The Programming Environments Manual. Table 3-4 provides an overview of the bus operations initiated by cache control instructions. Note that Table 3-4 assumes that the WIM bits are set to 001; that is, the cache is operating in write-back mode, caching is permitted, and coherency is enforced.
Table 3-4. Bus Operations Caused by Cache Control Instructions (WIM = 001)
[Flattened table; instruction names for several rows did not survive extraction.]
sync: current cache state: don't care; next cache state: no change; bus operation: sync (if enabled in HID0[ABE]); waits for memory queues to complete bus activity.
tlbsync: bus operation: none; waits for the negation of the TLBSYNC input signal to complete.
eieio: current cache state: don't care; next cache state: no change; bus operation: eieio (if enabled in HID0[ABE]); address-only bus operation.
(name lost): current cache state: don't care; bus operation: kill block (if enabled in HID0[ABE]); address-only bus operation.
(name lost): bus operation: flush block (if enabled in HID0[ABE]); address-only bus operation.
(name lost): next cache state: no change; bus operation: clean block (if enabled in HID0[ABE]); address-only bus operation.
Table 3-4. Bus Operations Caused by Cache Control Instructions (WIM = 001) (continued)
Column headings: Current Cache State, Next Cache State, Bus Operation. Row residue: Kill block; Writes over modified data; with intent
361. hile the processor is in user mode. RESERVED. Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 11-8.
Table 11-8. PMC4 Events: MMCR1[5-9] Select Encodings
(encoding lost): Number of TBL bit transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]): 0 = 47, 1 = 51, 2 = 55, 3 = 63.
00100: Number of instructions dispatched: 0, 1, or 2 per cycle.
00101: Number of L2 castouts.
[Chapter 11, Performance Monitor, 11-9]
01101: Number of completed integer operations.
01110: Number of cycles the BPU cannot process new branches due to having two unresolved branches.
11111: Number of L1 data cache misses. Does not include cache ops.
All others: Reserved. May be used in a later revision.
The PMC registers can be accessed with the mtspr and mfspr instructions using the following SPR numbers:
• PMC1 is SPR 953
• PMC2 is SPR 954
• PMC3 is SPR 957
• PMC4 is SPR 958
11.2.1.6 User Performance Monitor Counter Registers (UPMC1-UPMC4)
The contents of PMC1-PMC4 are reflected to UPMC1-UPMC4, which can be read by user-level software. The UPMC registers can be read with the mfspr instruction using the following SPR numbers:
• UPMC1 is SPR 937
• UPMC2 is SPR 938
• UPMC3 is SPR 941
• UPMC4 is SPR 942
11.2.1.7 Sampled Instruction Address Register (SIA)
The sampled instruction address register (SIA) is a
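As an illustration of how the SPR numbers listed above end up in an instruction word, the following sketch encodes an mfspr instruction. The SPR numbers come from the text; the primary opcode (31), the extended opcode (339), and the swapped 5-bit halves of the SPR field are taken from the PowerPC architecture's instruction encoding rather than from this passage, and the helper name `encode_mfspr` is hypothetical.

```python
# SPR numbers as listed in the text.
PMC_SPRS = {"PMC1": 953, "PMC2": 954, "PMC3": 957, "PMC4": 958}
UPMC_SPRS = {"UPMC1": 937, "UPMC2": 938, "UPMC3": 941, "UPMC4": 942}

MFSPR_XO = 339  # extended opcode of mfspr (architecture-defined, not from the text)


def encode_mfspr(rd: int, spr: int) -> int:
    """Build the 32-bit instruction word for `mfspr rD,SPR`."""
    if not 0 <= rd < 32:
        raise ValueError("rD must be in 0..31")
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    # The instruction stores the two 5-bit halves of the SPR number swapped.
    spr_field = ((spr & 0x1F) << 5) | (spr >> 5)
    return (31 << 26) | (rd << 21) | (spr_field << 11) | (MFSPR_XO << 1)


# In the numbers listed, each user-level counter sits 16 below its
# supervisor-level counterpart.
for name, spr in PMC_SPRS.items():
    assert UPMC_SPRS["U" + name] == spr - 16

# Instruction word a user-level program could use to read UPMC1 into r3.
print(hex(encode_mfspr(3, UPMC_SPRS["UPMC1"])))
```

In practice an assembler performs this encoding; the sketch only makes the mapping from SPR number to instruction bits visible.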
362. hing-inhibited. Accesses to such memory locations never update the on-chip cache. If a cache-inhibited access hits the on-chip cache, the cache block is invalidated. If the cache block is marked modified, it is copied back to memory before being invalidated. Where caching is permitted, memory is configured as either write-back or write-through, which are described as follows:
• Write-back: Configuring a memory region as write-back lets a processor modify data in the cache without updating system memory. For such locations, memory updates occur only on modified cache block replacements, cache flushes, or when one processor needs data that is modified in another's cache. Therefore, configuring memory as write-back can help when bus traffic could cause bottlenecks, especially for multiprocessor systems and for regions in which data, such as local variables, is used often and is coupled closely to a processor. If multiple devices use data in a memory region marked write-through, snooping must be enabled to allow the copy-back and cache invalidation operations necessary to ensure cache coherency. The 750's snooping hardware keeps other devices from accessing invalid data. When snooping is enabled, the 750 monitors transactions of other bus devices. For example, if another device needs data that is modified on the 750's cache, the access is delayed so the 750 can copy the modified data to memory [Chapter 6, Instruction Timing, 6-27]
363. hree-state MEI (modified/exclusive/invalid) protocol, a coherent subset of the standard four-state MESI (modified/exclusive/shared/invalid) protocol. The MEI protocol is described in Section 3.3.2, "MEI Protocol." The tags consist of bits PA[0-19]. Address translation occurs in parallel with set selection (from A[20-26]), and the higher-order address bits (the tag bits in the cache) are physical. The 750's on-chip data cache tags are single-ported, and load or store operations must be arbitrated with snoop accesses to the data cache tags. Load or store operations can be performed to the cache on the clock cycle immediately following a snoop access if the snoop misses; snoop hits may block the data cache for two or more cycles, depending on whether a copy-back to main memory is required. [Chapter 3, Instruction and Data Cache Operation, 3-3]
[Figure 3-2. Data Cache Organization: Way 0 through Way 7, each with an address tag and Words 0-7; 8 words per block.]
3.2 Instruction Cache Organization
The instruction cache also consists of 128 sets of eight ways, as shown in Figure 3-3. Each way consists of 32 bytes, a single state bit, and an address tag. As with the data cache, each instru
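The cache organization above (tag bits PA[0-19], set selection from A[20-26], 32-byte blocks) can be sketched as a simple address split. The sketch assumes PowerPC bit numbering, in which bit 0 is the most significant bit of the 32-bit address, so A[27-31] form the byte offset within the block; the function name `split_physical_address` is hypothetical.

```python
SETS = 128         # selected by A[20-26], 7 bits
WAYS = 8           # eight ways per set
BLOCK_BYTES = 32   # byte offset taken from A[27-31], 5 bits


def split_physical_address(pa: int) -> tuple[int, int, int]:
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    tag = (pa >> 12) & 0xFFFFF       # PA[0-19], the 20 most significant bits
    set_index = (pa >> 5) & 0x7F     # A[20-26]
    byte_offset = pa & 0x1F          # A[27-31]
    return tag, set_index, byte_offset


tag, set_index, offset = split_physical_address(0x12345678)
print(hex(tag), set_index, offset)

# 128 sets x 8 ways x 32 bytes gives the total capacity of one cache.
print(SETS * WAYS * BLOCK_BYTES)
```

Because the set index and byte offset together use only the 12 untranslated low-order bits, set selection can proceed while the upper bits are still being translated, which matches the parallel lookup the text describes.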
364. hzux 31 D A B 311 0 Ihzx 31 D A B 279 0 Iwa 58 D A ds 2 Iwaux 31 D A B 373 0 Iwax 31 D A B 341 0 Iwz 32 D A d Iwzu 33 D A d Iwzux 31 D A B 55 0 Iwzx 31 D A B 23 0 Note 1 64 bit instruction A 22 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Table A 14 Integer Store Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 stb 38 S A d stbu 39 S A d stbux 31 S A B 247 0 stbx 31 S 215 0 std 62 S A ds 0 stdu 62 S A ds 1 stdux 31 S A B 181 0 stdx 31 S A B 149 0 sth 44 S A d sthu 45 S A d sthux 31 S 439 0 sthx 31 S A B 407 0 stw 36 S A d stwu 37 S A d stwux 31 S A B 183 0 Note 1 64 bit instruction Table A 15 Integer Load and Store with Byte Reverse Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Ihbrx 31 D A B 790 0 Iwbrx 31 D A B 534 0 sthbrx 31 S A B 918 0 stwbrx 31 S A B 662 0 Table A 16 Integer Load and Store Multiple Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Appendix A PowerPC Instruction Set Listings A 23 Table A 17 Integer Load and Store String Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Iswi 31 D A NB 597 0 Iswx 31 D A B 533 0 stswi 31 S A NB 725 0 stswx 31 S A B 661 0
365. ical instructions 2 40 A 18 Integer rotate shift instructions 2 40 A 19 Integer store gathering 6 26 Integer store instructions 2 47 A 23 Integer unit execution timing 6 24 Interrupt external 4 20 ISI exception 4 19 isync 2 62 4 12 ITLB organization 5 25 K Kill block operation 3 27 L L1 L2 interface operation see Cache L2ADDRN L2 address signals 7 25 L2CE L2 chip enable signals 7 26 L2CLK_OUTA L2 clock out A signal 7 27 L2CLK_OUTB L2 clock out B signal 7 27 L2CR L2 cache control register 2 24 9 5 L2DATAn L2 data signals 7 25 L2DPn L2 data parity signals 7 26 L2SYNC_IN L2 sync in signal 7 28 L2SYNC_OUT L2 sync out signal 7 27 L2WE L2 write enable signal 7 27 L2ZZ L2 low power mode enable signal 7 28 Latency load store instructions 6 36 Latency definition 6 2 Load store address generation 2 46 byte reverse instructions 2 49 A 23 execution timing 6 25 floating point load instructions 2 51 A 24 Index 5 floating point move instructions 2 44 A 25 floating point store instructions 2 52 A 25 handling misalignment 2 45 integer load instructions 2 46 A 22 integer store instructions 2 47 A 23 latency load store instructions 6 36 load store multiple instructions 2 49 A 23 memory synchronization instructions A 24 string instructions 2 50 A 24 Logical address translation 5 1 Logical instructions integer A 18 Lookaside buffer management instructions A 28 LR link register 2
366. ically, if a fetch access hits the BTIC, it provides the first two instructions in the target stream.
• 512-entry branch history table (BHT) with two bits per entry for four levels of prediction: not-taken, strongly not-taken, taken, strongly taken
• Branch instructions that do not update the count register (CTR) or link register (LR) are removed from the instruction stream.
Two integer units (IUs) that share thirty-two GPRs for integer operands
• IU1 can execute any integer instruction.
• IU2 can execute all integer instructions except multiply and divide instructions (multiply, divide, shift, rotate, arithmetic, and logical instructions). Most instructions that execute in the IU2 take one cycle to execute. The IU2 has a single-entry reservation station.
Three-stage FPU
• Fully IEEE 754-1985 compliant FPU for both single- and double-precision operations
• Supports non-IEEE mode for time-critical operations
• Hardware support for denormalized numbers
• Single-entry reservation station
• Thirty-two 64-bit FPRs for single- or double-precision operands
Two-stage LSU
• Two-entry reservation station
• Single-cycle, pipelined cache access
• Dedicated adder performs EA calculations
• Performs alignment and precision conversion for floating-point data
• Performs alignment and sign extension for integer data
• Three-entry store queue
• Supports both big- and littl
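The two-bit, four-level prediction scheme listed for the BHT can be illustrated with a generic saturating-counter sketch. This is a textbook model, not the 750's documented implementation: the indexing function, the initial counter value, and the update rule are all assumptions, and the class name `BranchHistoryTable` is hypothetical.

```python
BHT_ENTRIES = 512
STATES = ("strongly not-taken", "not-taken", "taken", "strongly taken")


class BranchHistoryTable:
    """Generic 2-bit saturating-counter predictor with 512 entries."""

    def __init__(self) -> None:
        # Assumption: every entry starts in the weak "not-taken" state.
        self.counters = [1] * BHT_ENTRIES

    def _index(self, branch_address: int) -> int:
        # Assumption: index by word address modulo the table size.
        return (branch_address >> 2) % BHT_ENTRIES

    def predict(self, branch_address: int) -> bool:
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        """Move the counter one step toward the observed outcome."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
bht.update(0x100, taken=True)
bht.update(0x100, taken=True)
print(STATES[bht.counters[bht._index(0x100)]], bht.predict(0x100))
```

With two bits per entry, a single mispredicted iteration (for example, a loop exit) moves a strongly taken entry only to taken, so the next encounter of the loop branch is still predicted taken.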
367. ignal to configure CLK_OUT. See Table 2-5.
Bit 5: Not used. Defined as EICE on some earlier processors.
ECLK: CLK_OUT output enable and clock type selection. Used in conjunction with HID0[BCLK] and the HRESET signal to configure CLK_OUT. See Table 2-5.
Disable precharge of ARTRY. 0 = Precharge of ARTRY enabled. 1 = Alters bus protocol slightly by preventing the processor from driving ARTRY to high (negated) state. If this is done, the system must restore the signals to the high state.
BCLK: CLK_OUT output enable and clock type selection. Used in conjunction with HID0[ECLK] and the
Doze mode enable. Operates in conjunction with MSR[POW]. 0 = Doze mode disabled. 1 = Doze mode enabled. Doze mode is invoked by setting MSR[POW] while this bit is set. In doze mode, the PLL, time base, and snooping remain active.
Nap mode enable. Operates in conjunction with MSR[POW]. 0 = Nap mode disabled. 1 = Nap mode enabled. Nap mode is invoked by setting MSR[POW] while this bit is set. In nap mode, the PLL and the time base remain active.
Sleep mode enable. Operates in conjunction with MSR[POW]. 0 = Sleep mode disabled. 1 = Sleep mode enabled. Sleep mode is invoked by setting MSR[POW] while this bit is set. QREQ is asserted to indicate that the processor is ready to enter sleep mode. If the system logic determines that the processor may enter sleep mode, the quiesce acknowledge signal, QACK, is asserted back to the processor. Once QACK assertion is detected, the processo
368. il the next instruction (and any exceptions associated with that instruction) completes execution. If no instructions are in the execution units, the exception is taken immediately upon determination of the correct restart address (for loading SRR0). As shown in Table 1-4, the 750 implements additional asynchronous, maskable exceptions.
• Asynchronous, nonmaskable: There are two nonmaskable asynchronous exceptions: system reset and the machine check exception. These exceptions may not be recoverable, or may provide a limited degree of recoverability. Exceptions report recoverability through the MSR[RI] bit.
1.7.2 PowerPC 750 Microprocessor Exception Implementation
The 750 exception classes described above are shown in Table 1-4.
Table 1-4. PowerPC 750 Microprocessor Exception Classifications
Columns: Synchronous/Asynchronous; Precise/Imprecise; Exception Type
Asynchronous, nonmaskable: Machine check, system reset
Asynchronous, maskable: Precise: External, decrementer, system management, performance monitor, and thermal management interrupts
Synchronous: Precise: Instruction-caused exceptions
Although exceptions have other characteristics, such as priority and recoverability, Table 1-4 describes categories of exceptions the 750 handles uniquely. Table 1-4 includes no synchronous imprecise exceptions; although the PowerPC architecture supports imprecise handling of floating-point exceptions, the 750 implements these exception modes precisely. Table 1-5 lists
369. ing Guidelines
The performance of the 750 can be improved by avoiding resource conflicts and scheduling instructions to take fullest advantage of the parallel execution units. Instruction scheduling on the 750 can be improved by observing the following guidelines:
• To reduce mispredictions, separate the instruction that sets CR bits from the branch instruction that evaluates them. Because there can be no more than 12 instructions in the processor (with the instruction that sets CR in CQ0 and the dependent branch instruction in IQ5), there is no advantage to having more than 10 instructions between them.
• Likewise, when branching to a location specified by the CTR or LR, separate the mtspr instruction that initializes the CTR or LR from the dependent branch instruction. This ensures the register values are immediately available to the branch instruction.
• Schedule instructions such that two can be dispatched at a time.
• Schedule instructions to minimize stalls due to execution units being busy.
• Avoid scheduling high-latency instructions close together. Interspersing single-cycle latency instructions between longer-latency instructions minimizes the effect that instructions such as integer divide and multiply can have on throughput.
• Avoid using serializing instructions.
• Schedule instructions to avoid dispatch stalls. Six instructions can be tracked in the completion queue; therefore, only six instructions can be in the execute stages at any o
370. ing assertion of DRTRY. For more information, see Section 8.4.4, "Data Transfer Termination."
Negated: (During DBB) indicates that, until TA is asserted, the 750 must continue to drive the data for the current write or must wait to sample the data for reads.
Timing Comments
Assertion: Must not occur before AACK for the current transaction (if the address retry mechanism is to be used to prevent invalid data from being used by the processor); otherwise, assertion may occur at any time during the assertion of DBB. The system can withhold assertion of TA to indicate that the 750 should insert wait states to extend the duration of the data beat.
Negation: Must occur after the bus clock cycle of the final (or only) data beat of the transfer. For a burst transfer, the system can assert TA for one bus clock cycle and then negate it to advance the burst transfer to the next beat and insert wait states during the next beat. [Chapter 7, Signal Descriptions, 7-19]
7.2.8.2 Data Retry (DRTRY): Input
Following are the state meaning and timing comments for the DRTRY signal.
State Meaning
Asserted: Indicates that the 750 must invalidate the data from the previous read operation.
Negated: Indicates that data presented with TA on the previous read operation is valid. Note that DRTRY is ignored for write transactions.
Timing Comments
Assertion: Must occur during the bus clock cycle immediately after TA is asserted if a retry is re
371. inhibited.
• An attempt is made to execute dcbz when the data cache is disabled.
• An eciwx or ecowx is not word-aligned.
• A multiple or string access is attempted with MSR[LE] set.
Note that in the 750, a floating-point load or store to a direct-store segment causes a DSI exception rather than an alignment exception, as specified by the PowerPC architecture. For more information, see 4.5.3.
4.5.7 Program Exception (0x00700)
The 750 implements the program exception as it is defined by the PowerPC architecture (OEA). A program exception occurs when no higher-priority exception exists and one or more of the exception conditions defined in the OEA occur. The 750 invokes the system illegal instruction program exception when it detects any instruction from the illegal instruction class. The 750 fully decodes the SPR field of the instruction. If an undefined SPR is specified, a program exception is taken. The UISA defines mtspr and mfspr with the record bit (Rc) set as causing a program exception or giving a boundedly undefined result. In the 750, the appropriate condition register (CR) should be treated as undefined. Likewise, the PowerPC architecture states that the Floating Compared Unordered (fcmpu) or Floating Compared Ordered (fcmpo) instruction with the record bit set can either cause a program exception or provide a boundedly undefined result. In the 750 an the BF fi
372. instructions are retired. As with dispatching instructions from the instruction queue, instructions are retired from the two bottom positions in the completion queue. If completion logic detects an instruction causing an exception, all following instructions are cancelled, their execution results in rename registers are discarded, and instructions are fetched from the appropriate exception vector. [Chapter 1, PowerPC 740/PowerPC 750 Overview, 1-35] Because the PowerPC architecture can be applied to such a wide variety of implementations, instruction timing varies among PowerPC processors. For a detailed discussion of instruction timing with examples and a table of latencies for each execution unit, see Chapter 6, "Instruction Timing."
1.10 Power Management
The 750 provides four power modes, selectable by setting the appropriate control bits in the MSR and HID0 registers. The four power modes are as follows:
• Full-power: This is the default power state of the 750. The 750 is fully powered and the internal functional units are operating at the full processor clock speed. If the dynamic power management mode is enabled, functional units that are idle will automatically enter a low-power state without affecting performance, software execution, or external hardware.
• Doze: All the functional units of the 750 are disabled except for the time base/decrementer registers and the bus snooping logic. When the processor is in doze mode, an external asy
373. into physical address bits PA[0-19]. The low-order address bits, A[20-31], are untranslated and are therefore identical for both effective and physical addresses. After translating the address, the MMUs pass the resulting 32-bit physical address to the memory subsystem. The MMUs record whether the translation is for an instruction or data access, whether the processor is in user or supervisor mode and, for data accesses, whether the access is a load or a store operation. The MMUs use this information to appropriately direct the address translation and to enforce the protection hierarchy programmed by the operating system. Section 4.3, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs. The figures show how address bits A[20-26] index into the on-chip instruction and data caches to select a cache set. The remaining physical address bits are then compared with the tag fields (comprised of bits PA[0-19]) of the two selected cache blocks to determine if a cache hit has occurred. In the case of a cache miss on the 750, the instruction or data access is then forwarded to the L2 interface tags to check for an L2 cache hit. In case of a miss (and in all cases of an on-chip cache miss on the PowerPC 740), the access is forwarded to the bus interface unit, which initiates an external memory access. [Chapter 5, Memory Management, 5-5]
374. ion are defined, at least to some extent, by the PowerPC architecture.
Table 4-1. PowerPC 750 Microprocessor Exception Classifications
Columns: Synchronous/Asynchronous; Precise/Imprecise; Exception Types
Asynchronous, nonmaskable: Machine check, system reset
Asynchronous, maskable: Precise: External interrupt, decrementer, system management interrupt, performance monitor interrupt, thermal management interrupt
Synchronous: Precise: Instruction-caused exceptions
These classifications are discussed in greater detail in 4.2. For a better understanding of how the 750 implements precise exceptions, see Chapter 6. Exceptions implemented in the 750, and conditions that cause them, are listed in Table 4-2.
Table 4-2. Exceptions and Conditions
Columns: Exception Type; Vector Offset (hex); Causing Conditions
Reserved: 00000
System reset: 00100: Assertion of either HRESET or SRESET, or at power-on reset.
Machine check: 00200: Assertion of TEA during a data bus transaction, assertion of MCP, or an address, data, or L2 bus parity error. MSR[ME] must be set.
[DSI]: 00300: As specified in the PowerPC architecture. For TLB misses on load, store, or cache operations, a DSI exception occurs if a page fault occurs.
[ISI]: 00400: As defined by the PowerPC architecture.
External interrupt: 00500: MSR[EE] = 1 and INT is asserted.
Alignment: 00600: A floating-point load/store, stmw, stwcx., lmw, lwarx
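The vector offsets in Table 4-2 can be captured in a small lookup, sketched below. The offsets are those listed in the table; the exception names for offsets 00300 and 00400 (DSI and ISI) and the MSR[IP] prefix rule (0x000 when IP = 0, 0xFFF when IP = 1) come from the PowerPC architecture rather than from this passage, and the function name `vector_address` is hypothetical.

```python
# Vector offsets from Table 4-2.
VECTOR_OFFSETS = {
    "system reset": 0x00100,
    "machine check": 0x00200,
    "DSI": 0x00300,                # name per the PowerPC architecture
    "ISI": 0x00400,                # name per the PowerPC architecture
    "external interrupt": 0x00500,
    "alignment": 0x00600,
}


def vector_address(exception: str, msr_ip: bool = False) -> int:
    """Return the physical address of an exception vector.

    Assumption (architecture-defined, not from the table): MSR[IP] selects
    the prefix, 0x000n_nnnn when clear and 0xFFFn_nnnn when set.
    """
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base | VECTOR_OFFSETS[exception]


print(hex(vector_address("external interrupt")))
print(hex(vector_address("system reset", msr_ip=True)))
```

A handler table built this way keeps the offsets in one place, which is convenient when the same code must run with either prefix setting.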
375. ion temperature is less than the threshold.

Note: The TIN and TIV bits are read-only status bits.

10.3.2.2 TAU Dual-Threshold Mode

The configuration and operation of the TAU's dual-threshold mode is similar to single-threshold mode, except both THRM1 and THRM2 are configured with desired threshold and TID values, and the TIE and V bits are set to 1. When the THRM3[E] bit is set to 1 to enable temperature measurement and comparison, the first comparison is made with THRM1. If no thermal management interrupt results from the comparison, the number of processor cycles specified in THRM3[SITV] elapses, and the next comparison is made with THRM2. If no thermal management interrupt results from the THRM2 comparison, the time specified by THRM3[SITV] again elapses, and the comparison returns to THRM1.

This sequence of comparisons continues until a thermal management interrupt occurs or the TAU is disabled. When a comparison results in an interrupt, the comparison with the threshold SPR causing the interrupt is halted, but comparisons continue with the other threshold SPR. Following a thermal management interrupt, the interrupt service routine must read both THRM1 and THRM2 to determine which threshold was crossed. Note that it is possible for both threshold values to have been crossed, in which case the TAU ceases making temperature comparisons until an mtspr instruction is executed to one or both of the threshold SPRs.
376. • Instruction cache throttling control register (ICTC): Has bits for enabling the instruction cache throttling feature and for controlling the interval at which instructions are forwarded to the instruction buffer in the fetch unit. This provides control over the processor's overall junction temperature.
• Thermal management registers (THRM1, THRM2, and THRM3): Used to enable and set thresholds for the thermal management facility.
  - THRM1 and THRM2 provide the ability to compare the junction temperature against two user-provided thresholds. The dual thresholds allow the thermal management software differing degrees of action in lowering the junction temperature. The TAU can also be operated in a single-threshold mode, in which the thermal sensor output is compared to only one threshold in either THRM1 or THRM2.
  - THRM3 is used to enable the thermal management assist unit (TAU) and to control the comparator output sample time.

Note that while it is not guaranteed that the implementation of 750-specific registers is consistent among PowerPC processors, other processors may implement similar or identical registers.

2.1.2 PowerPC 750-Specific Registers

This section describes registers that are defined for the 750 but are not included in the PowerPC architecture.

2.1.2.1 Instruction Address Breakpoint Register (IABR)

The instruction address breakpoint register (IABR), shown in Figure 2-2, supports the instruction address breakpoint exception. When this exceptio
378. [Table 5-8. Model for Guaranteed R and C Bit Settings: cell values not recoverable. Scenarios listed: out-of-order instruction fetch or load operation; out-of-order store operation that would be required by the sequential execution model in the absence of system-caused or imprecise exceptions, or of floating-point assist exception for instructions that would cause no other kind of precise exception; all other out-of-order store operations; zero-length store (stswx); store conditional (stwcx.) that does not store; dcbi instruction.]

Notes:
1. If C is set, R is guaranteed to be set also.
2. Includes the case in which the instruction is fetched out of order and R is not set (does not apply for 750).

For more information, see "Page History Recording" in Chapter 7, "Memory Management," of The Programming Environments Manual.

5.4.2 Page Memory Protection

The 750 implements page memory protection as it is defined in Chapter 7, "Memory Management," in The Programming Environments Manual.

5.4.3 TLB Description

The 750 implements separate 128-entry data and instruction TLBs to maximize performance. This section describes the hardware resources provided in the 750 to facilitate page address translation. Note that the hardware implementation of the MMU is no
379. ional units are clocked only when needed
• No software or hardware intervention is required after mode is set
• Software/hardware and performance transparent

10.2.1.3 Doze Mode

Doze mode disables most functional units but maintains cache coherency by enabling the bus interface unit and snooping. A snoop hit causes the 750 to enable the data cache, copy the data back to memory, disable the cache, and fully return to the doze state.
• Most functional units disabled
• Bus snooping and time base/decrementer still enabled
• Doze mode sequence
  - Set doze bit (HID0[8] = 1), clear nap and sleep bits (HID0[9] and HID0[10] = 0)
  - The 750 enters doze mode after several processor clocks
• Several methods of returning to full-power mode
  - Assert INT, SMI, MCP, decrementer, performance monitor, machine check, or thermal management interrupts
  - Assert hard reset or soft reset
• Transition to full-power state takes no more than a few processor cycles
• PLL running and locked to SYSCLK

10.2.1.4 Nap Mode

The nap mode disables the 750 but still maintains the phase-locked loop (PLL), delay-locked loop (DLL), L2CLK_OUTA and L2CLK_OUTB output signals, and the time base/decrementer. The time base can be used to restore the 750 to full-power state after a programmed amount of time. To maintain data coherency, bus snooping is disabled for nap and sleep modes through a h
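The HID0 bit manipulation in the doze mode sequence can be sketched as follows. This is an illustration only: the helper and constant names are hypothetical, the sketch merely computes a register value, and the actual register access (for example via mtspr) is not modeled. PowerPC bit numbers count from the most significant bit, so HID0[8] is not the same as `1 << 8`.

```python
def ppc_bit(n: int, width: int = 32) -> int:
    """Return the mask for PowerPC-numbered bit n (bit 0 = most significant bit)."""
    return 1 << (width - 1 - n)


# Bit positions as given in the doze mode sequence above.
HID0_DOZE = ppc_bit(8)
HID0_NAP = ppc_bit(9)
HID0_SLEEP = ppc_bit(10)


def select_doze(hid0: int) -> int:
    """Set the doze bit and clear the nap and sleep bits, leaving other bits unchanged."""
    return ((hid0 & ~(HID0_NAP | HID0_SLEEP)) | HID0_DOZE) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(HID0_DOZE), hex(select_doze(0x00600000)))
```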
380. ions should be disabled in the FPSCR and MSR.

6.4.5 Load/Store Unit Execution Timing

The execution of most load and store instructions is pipelined. The LSU has two pipeline stages. The first is for effective address calculation and MMU translation, and the second is for accessing data in the cache. Load and store instructions have a two-cycle latency and one-cycle throughput.

If operands are misaligned, additional latency may be required, either for an alignment exception to be taken or for additional bus accesses. Load instructions that miss in the cache block subsequent cache accesses during the cache line refill. Table 6-8 gives load and store instruction execution latencies.

6.4.6 Effect of Operand Placement on Performance

The PowerPC VEA states that the placement (location and alignment) of operands in memory may affect the relative performance of memory accesses, and in some cases affect it significantly. The effects memory operand placement has on performance are shown in Table 6-1.

The best performance is guaranteed if memory operands are aligned on natural boundaries. For the best performance across the widest range of implementations, the programmer should assume the performance model described in Chapter 3, "Operand Conventions," in The Programming Environments Manual.

The effect of misalignment on memory access latency is the same for big- and little-endian addressing modes, except for multiple and string operations that cause
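The two-cycle latency and one-cycle throughput quoted in Section 6.4.5 imply a simple best-case cycle count for a run of independent, aligned loads or stores that all hit in the cache. The sketch below works through that arithmetic; the function name is hypothetical, and the model deliberately ignores misalignment, cache misses, and dependencies between instructions.

```python
def pipelined_cycles(n_ops: int, latency: int = 2, issue_interval: int = 1) -> int:
    """Best-case cycles for n_ops back-to-back operations in a simple pipeline.

    The first result appears after `latency` cycles; each further operation
    completes `issue_interval` cycles after the one before it.
    """
    if n_ops <= 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, pipelined_cycles(n))
```

Under these assumptions four cache-hit loads take five cycles rather than eight, which is the benefit of overlapping address calculation for one access with the cache access of the previous one.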
381. ions to be handled imprecisely.

Instruction queue. A holding place for instructions fetched from the current instruction stream.

Integer unit. A functional unit in the 750 responsible for executing integer instructions.

In-order. An aspect of an operation that adheres to a sequential model. An operation is said to be performed in-order if, at the time that it is performed, it is known to be required by the sequential execution model. See Out-of-order.

Instruction latency. The total number of clock cycles necessary to execute an instruction and make ready the results of that instruction.

Interrupt. An asynchronous exception. On PowerPC processors, interrupts are a special case of exceptions. See also asynchronous exception.

Invalid state. State of a cache entry that does not currently contain a valid copy of a cache block from memory.

Key bits. A set of key bits referred to as Ks and Kp in each segment register and each BAT register. The key bits determine whether supervisor or user programs can access a page within that segment or block.

Kill. An operation that causes a cache block to be invalidated.

L2 cache. See Secondary cache.

Least-significant bit (lsb). The bit of least value in an address, register, data element, or instruction encoding.

Least-significant byte (LSB). The byte of least value in an address, register, data element, or instruction encoding.
382. ipeline accesses have occurred. BR will also be asserted for one cycle during the execution of a dcbz instruction, and during the execution of a load instruction which hits in the touch load buffer.
  Negation: Occurs for at least one bus clock cycle after an accepted, qualified bus grant (see BG and ABB), even if another transaction is pending. It is also negated for at least one bus clock cycle when the assertion of ARTRY is detected on the bus.

7.2.1.2 Bus Grant (BG), Input

Following are the state meaning and timing comments for the BG input signal.

State Meaning
  Asserted: Indicates that the 750 may, with proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are not asserted the bus cycle following the assertion of AACK. The ABB and ARTRY signals are driven by the 750 or other bus masters. If the 750 is parked, BR need not be asserted for the qualified bus grant. See Section 8.3.1, "Address Bus Arbitration."
  Negated: Indicates that the 750 is not the next potential address bus master.

Timing Comments
  Assertion: May occur at any time to indicate the 750 can use the address bus. After the 750 assumes bus mastership, it does not check for a qualified bus grant again until the cycle during which the address bus tenure completes (assuming it has another transaction to run
383. ironments Manual.

[Figure 5-8. Page Address Translation Flow: TLB Hit. Recoverable labels: store access with page memory protection violation; otherwise, if PTE[C] = 0, page table search operation (see Figure 5-9); PA[0-31] ← RPN || A[20-31]; continue access to memory subsystem with WIMG bits from PTE.]

5.4.5 Page Table Search Operation

If the translation is not found in the TLBs (a TLB miss), the 750 initiates a table search operation, which is described in this section. Formats for the PTE are given in "PTE Format for 32-Bit Implementations," in Chapter 7, "Memory Management," of The Programming Environments Manual.

The following is a summary of the page table search process performed by the 750:
1. The 32-bit physical address of the primary PTEG is generated as described in "Page Table Addresses" in Chapter 7, "Memory Management," of The Programming Environments Manual.
2. The first PTE (PTE0) in the primary PTEG is read from memory. PTE reads occur with an implied WIM memory/cache mode control bit setting of 0b001. Therefore, they are considered cacheable and read (burst) from memory and placed in the cache.
3. The PTE in the selected PTEG is tested for a match with the virtual page number (VPN) of the access. The VPN is the VSID concatenated with the page index field of the virtual address. For a match to occur, the following must be true:
   • PTE[H] = 0
   • PTE[V] = 1
384. ironments Manual. The integer shift instructions are summarized in Table 2-25.

[Table 2-25. Integer Shift Instructions: only partly recoverable. Entries include Shift Left Word (slw) and Shift Right Algebraic Word Immediate (operands rA,rS,SH).]

2.3.4.2 Floating-Point Instructions

This section describes the floating-point instructions, which include the following:
• Floating-point arithmetic instructions
• Floating-point multiply-add instructions
• Floating-point rounding and conversion instructions
• Floating-point compare instructions
• Floating-point status and control register instructions
• Floating-point move instructions

See Section 2.3.4.3, "Load and Store Instructions," for information about floating-point loads and stores.

The PowerPC architecture supports a floating-point system as defined in the IEEE 754 standard, but requires software support to conform with that standard. All floating-point operations conform to the IEEE 754 standard, except if software sets the non-IEEE mode bit, FPSCR[NI].

2.3.4.2.1 Floating-Point Arithmetic Instructions

The floating-point arithmetic instructions are summarized in Table 2-26.

Table 2-26. Floating-Point Arithmetic Instructions
  Floating Add (Double-Precision)
  Floating Add Single
  Floating Subtract (Double-Precision)
  Floating Subtract Single
  Floating Multiply (Double-Precision)
  Floating Multiply Single
  Floating Divide (Double-Precision)
  Floating Divide Single
  Floating Reciprocal Es
385. is negated and there is no outstanding attempt to perform an ARTRY of the associated address tenure.
  Negated: Indicates that the 750 must hold off its data tenures.

Timing Comments
  Assertion: May occur any time to indicate the 750 is free to take data bus mastership. It is not sampled until TS is asserted.
  Negation: May occur at any time to indicate the 750 cannot assume data bus mastership.

7.2.6.2 Data Bus Write Only (DBWO), Input

The data bus write only (DBWO) signal is an input-only signal on the 750. Following are the state meaning and timing comments for the DBWO signal.

State Meaning
  Asserted: Indicates that the 750 may run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. Refer to Section 8.10, "Using Data Bus Write Only," for detailed instructions for using DBWO.
  Negated: Indicates that the 750 must run the data bus tenures in the same order as the address tenures.

Timing Comments
  Assertion: Must occur no later than a qualified DBG for an outstanding write tenure. DBWO is sampled by the 750 on the clock of a qualified DBG. If no write requests are pending, the 750 will ignore DBWO and assume data bus ownership for the next pending read request.
  Negation: May occur any time after a qualified DBG and before the next assertion of DBG.

7.2.6.3 Data Bus Busy (DBB)

The data
386. is zero, which enables the use of these instructions.

2.3.5.4 Optional External Control Instructions

The PowerPC architecture defines an optional external control feature that, if implemented, is supported by the two external control instructions, eciwx and ecowx. These instructions allow a user-level program to communicate with a special-purpose device. These instructions are provided and are summarized in Table 2-53.

Table 2-53. External Control Instructions

Name | Operand Syntax
External Control In Word Indexed | rD,rA,rB
External Control Out Word Indexed | rS,rA,rB

Implementation notes (both instructions): A transfer size of 4 bytes is implied; the TBST and TSIZ[0-2] signals are redefined to specify the Resource ID (RID), copied from bits EAR[28-31]. For these operations, TBST carries the EAR[28] data. Misaligned operands for these instructions cause an alignment exception. Addressing a location where SR[T] = 1 causes a DSI exception. If MSR[DR] = 0, a programming error occurs and the physical address on the bus is undefined.

Note: These instructions are optional to the PowerPC architecture.

The eciwx/ecowx instructions let a system designer map special devices in an alternative way. The MMU translation of the EA is not used to select the special device, as it is used in most instructions such as loads and stores. Rather, it is used as an address operand that is passed to the device over the address bus. Four other signals th
387. isor and user programs with a 4-Kbyte page size and 256-Mbyte segment size. In addition, the MMUs of 32-bit PowerPC processors use an interim virtual address (52 bits) and hashed page tables in the generation of 32-bit physical addresses. PowerPC processors also have a BAT mechanism for mapping large blocks of memory. Block sizes range from 128 Kbyte to 256 Mbyte and are software-programmable.

Basic features of the 750 MMU implementation defined by the OEA are as follows:
• Support for real addressing mode: Effective-to-physical address translation can be disabled separately for data and instruction accesses.
• Block address translation: Each of the BAT array entries (four IBAT entries and four DBAT entries) provides a mechanism for translating blocks as large as 256 Mbytes from the 32-bit effective address space into the physical memory space. This can be used for translating large address ranges whose mappings do not change frequently.
• Segmented address translation: The 32-bit effective address is extended to a 52-bit virtual address by substituting 24 bits of upper address bits from the segment register for the 4 upper bits of the EA, which are used as an index into the segment register file. This 52-bit virtual address space is divided into 4-Kbyte pages, each of which can be mapped to a physical page.

The 750 also provides the following features that are not required by the PowerPC architecture:
• Separate translation lookaside buff
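The segmented address translation step described above (a 24-bit value from the segment register replacing the upper 4 bits of the EA) can be illustrated with a minimal sketch. The function name and the segment register contents are hypothetical, and the segment register file is modeled as a plain list of sixteen 24-bit values rather than full register images.

```python
def virtual_address(ea: int, segment_vsids) -> int:
    """Form the 52-bit virtual address from a 32-bit effective address.

    EA bits 0-3 (the upper 4 bits) index the segment register file; the
    24-bit value found there replaces them in front of the low 28 EA bits.
    """
    ea &= 0xFFFFFFFF
    sr_index = ea >> 28                      # upper 4 bits of the EA
    vsid = segment_vsids[sr_index] & 0xFFFFFF  # 24 bits from the segment register
    return (vsid << 28) | (ea & 0x0FFFFFFF)  # 24 + 28 = 52 bits


if __name__ == "__main__":
    vsids = [0] * 16
    vsids[1] = 0xABCDEF  # made-up value for segment 1
    print(hex(virtual_address(0x12345678, vsids)))
```

The low 28 bits carried over from the EA hold the page index and the 12-bit byte offset of a 4-Kbyte page, so each 256-Mbyte segment maps onto its own region of the 52-bit virtual space.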
388. ite tenure be performed ahead of a pending read tenure from the same 750.

In general, an address tenure on the bus is followed strictly in order by its associated data tenure. Transactions pipelined by the 750 complete strictly in order. However, the 750 can run bus transactions out of order only when the external system allows the 750 to perform a cache-line-snoop-push-out operation (or other write transaction, if pending in the 750 write queues) between the address and data tenures of a read operation, through the use of DBWO. This effectively envelopes the write operation within the read operation. Figure 8-25 shows how the DBWO signal is used to perform an enveloped write transaction.

[Figure 8-25. Data Bus Write Only Transaction. Recoverable labels: read address, write address, enveloped write transaction; signals BG, ABB, AACK.]

Note that although the 750 can pipeline any write transaction behind the read transaction, special care should be used when using the enveloped write feature. It is envisioned that most system implementations will not need this capability; for these applications, DBWO should remain negated. In systems where this capability is needed, DBWO should be asserted under the following scenario:
1. The 750 initiates a read transaction (either single-beat or burst) by completing the read address tenure with no address retry.
2. Then, the 750 initiates a write transaction by completing the write address tenure wit
389. itten to the cache block and the tag is marked M. For cache misses with the replacement block marked E, the zero line fill is performed and the cache block is marked M. However, if the replacement block is marked M, the contents are written back to memory first. The instruction executes regardless of whether the cache is locked; if the cache is disabled, an alignment exception occurs. If M = 1 (coherency enforced), the address is broadcast to the bus before the zero line fill.

The exception priorities (from highest to lowest) are as follows:
1. Cache disabled: Alignment exception
2. Page marked write-through or cache-inhibited: Alignment exception
3. BAT protection violation: DSI exception
4. TLB protection violation: DSI exception

dcbz is the only cache instruction that broadcasts even if HID0[ABE] = 0.

Table 2-52. User-Level Cache Instructions (Continued)

Data Cache Block Store (operands rA,rB): The EA is computed, translated, and checked for protection violations. For cache hits with the tag marked E, no further action is taken. For cache hits with the tag marked M, the cache block is written back to memory and marked E. A dcbst is not broadcast unless HID0[ABE] = 1, regardless of WIMG settings. The instruction acts like a load with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked. The exception priorities (from highest to lowest) for dcbst ar
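The dcbz exception priorities listed above amount to an ordered series of checks in which the first matching condition decides the outcome. The following sketch models only that ordering; the function name, parameter names, and the returned strings are hypothetical and are not part of any real API.

```python
from typing import Optional


def dcbz_exception(cache_enabled: bool,
                   write_through: bool,
                   caching_inhibited: bool,
                   bat_violation: bool,
                   tlb_violation: bool) -> Optional[str]:
    """Return the exception a dcbz would take, or None if it takes none.

    Conditions are tested from highest to lowest priority, so a disabled
    cache is reported even when a protection violation is also present.
    """
    if not cache_enabled:
        return "alignment"   # priority 1: cache disabled
    if write_through or caching_inhibited:
        return "alignment"   # priority 2: page marked W or I
    if bat_violation:
        return "DSI"         # priority 3: BAT protection violation
    if tlb_violation:
        return "DSI"         # priority 4: TLB protection violation
    return None


if __name__ == "__main__":
    print(dcbz_exception(False, False, False, True, True))
    print(dcbz_exception(True, False, False, False, True))
```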
390. ken write

TLB invalidate (TT[0-4] = 11000): No action is taken.

External control word read (TT[0-4] = 11100): No action is taken.

Write-with-flush: A write-with-flush operation is a single-beat or burst transaction initiated when a caching-inhibited or write-through store instruction is executed.
• If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
• If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the invalid (I) state.
• If the address misses in the cache, no action is taken.
Any reservation associated with the address is canceled.

Write-with-kill: A write-with-kill operation is a burst transaction initiated due to a castout, caching-allowed push, or snoop copy-back.
• If the address hits in the cache, the cache block is placed in the invalid (I) state (killing modified data that may have been in the block).
• If the address misses in the cache, no action is taken.
Any reservation associated with the address is canceled.

Table 3-5. Response to Snooped Bus Transactions (Continued)

Snooped Transaction | TT[0-4] | 750 Response

Read (TT[0-4] = 01010): A read operation is used by most single-beat and burst load transactions on the bus. For single-beat, caching-inhibited read transaction:
• If the addressed cach
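The write-with-flush and write-with-kill responses described above can be summarized as a small state function. This is an illustrative sketch, not the processor's logic: the names are hypothetical, a cache miss is modeled as the invalid (I) state, and the cancellation of any reservation on the address, which applies to both transactions, is noted in a comment rather than modeled.

```python
from typing import NamedTuple


class SnoopResult(NamedTuple):
    new_state: str      # resulting cache block state: "M", "E", or "I"
    assert_artry: bool  # whether ARTRY is asserted
    push_block: bool    # whether the modified block is pushed to memory


def snoop_write(transaction: str, state: str) -> SnoopResult:
    """Response to a snooped write-with-flush or write-with-kill.

    In both cases any reservation associated with the address is canceled.
    """
    if transaction not in ("write-with-flush", "write-with-kill"):
        raise ValueError("unsupported transaction: " + transaction)
    if state not in ("M", "E", "I"):
        raise ValueError("unknown cache state: " + state)
    if state == "I":
        return SnoopResult("I", False, False)  # miss: no action is taken
    if transaction == "write-with-flush" and state == "M":
        return SnoopResult("I", True, True)    # retry and push the modified block
    return SnoopResult("I", False, False)      # hit: block is invalidated


if __name__ == "__main__":
    print(snoop_write("write-with-flush", "M"))
    print(snoop_write("write-with-kill", "M"))
```

The contrast between the two calls in the usage lines is the point of the sketch: a modified block is pushed out on write-with-flush but simply discarded on write-with-kill.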
391. ld have been asserted in clock cycle 6.
• In the third access, DRTRY is asserted in clock cycle 11 to flush the previous data.

Note that all bidirectional signals are three-stated between bus tenures. The pipelining shown in Figure 8-18 can occur if the second access is not another load (for example, an instruction fetch).

[Figure 8-18. Single-Beat Reads Showing Data-Delay Controls: timing diagram not recoverable.]

Figure 8-19 shows data-delay controls in a single-beat write operation. Note that all bidirectional signals are three-stated between bus tenures. Data transfers are delayed in the following ways:
• The TA signal is held negated to insert wait states in clocks 3 and 4.
• In clock 6, DBG is held negated, delaying the start of the data tenure.

The last access is not delayed (DRTRY is valid only for read operations).

[Figure 8-19: timing diagram not recoverable.]
392. Tables (titles recoverable from the list of tables; table numbers and page numbers are omitted)
  Performance Monitor Interrupt Exception: Register Settings
  Instruction Address Breakpoint Exception: Register Settings
  System Management Interrupt Exception: Register Settings
  Thermal Management Interrupt Exception: Register Settings
  MMU Feature Summary
  Access Protection Options for Pages
  Translation Exception Conditions
  Other MMU Exception Conditions for the PowerPC 750 Processor
  PowerPC 750 Microprocessor Instruction Summary: Control MMUs
  PowerPC 750 Microprocessor MMU Registers
  Table Search Operations to Update History Bits: TLB Hit Case
  Model for Guaranteed R and C Bit Settings
  Performance Effects of Memory Operand Placement
  TLB Miss
  Branch Instructions
  System Register Instr
393. le to take advantage of future technological gains.

This section describes the PowerPC architecture in general, and specific details about the implementation of the 750 as a low-power, 32-bit member of the PowerPC processor family. The structure of this section follows the organization of the user's manual; each subsection provides an overview of each chapter.
• Registers and programming model: Section 1.4, "PowerPC Registers and Programming Model," describes the registers for the operating environment architecture common among PowerPC processors and describes the programming model. It also describes the registers that are unique to the 750. The information in this section is described more fully in Chapter 2, "Programming Model."
• Instruction set and addressing modes: Section 1.5, "Instruction Set," describes the PowerPC instruction set and addressing modes for the PowerPC operating environment architecture, and defines and describes the PowerPC instructions implemented in the 750. The information in this section is described more fully in Chapter 2, "Programming Model."
• Cache implementation: Section 1.6, "On-Chip Cache Implementation," describes the cache model that is defined generally for PowerPC processors by the virtual environment architecture. It also provides specific details about the 750 cache implementation. The information in this section is described more fully in Chapter 3, "Instruction and Data Cache Op
394. lected to UPMC1-UPMC4, which can be read by user-level software. The UPMC registers can be read with mfspr using the following SPR numbers:
• UPMC1 is SPR 937
• UPMC2 is SPR 938
• UPMC3 is SPR 941
• UPMC4 is SPR 942

2.1.2.4.7 Sampled Instruction Address Register (SIA)

The sampled instruction address register (SIA) is a supervisor-level register that contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The SIA is shown in Figure 2-8.

[Figure 2-8. Sampled Instruction Address Register (SIA): a single instruction address field occupying bits 0-31.]

If the performance monitor interrupt is triggered by a threshold event, the SIA contains the address of the exact instruction (called the sampled instruction) that caused the counter to overflow. If the performance monitor interrupt was caused by something besides a threshold event, the SIA contains the address of the last instruction completed during that cycle. SIA can be accessed with the mtspr and mfspr instructions using SPR 955.

2.1.2.4.8 User Sampled Instruction Address Register (USIA)

The contents of SIA are reflected to USIA, which can be read by user-level software. USIA can be accessed with the mfspr instruction using SPR 939.

2.1.2.4.9 Sampled Data Address Register (SDA) and User Sampled Data Address Register (USDA)

The 750 does not implement the sampled data address register (SDA) or the user-level, read-only USDA regis
395. ll TLB entries indexed by the EA, and operates on both the instruction and data TLBs simultaneously, invalidating four TLB entries. The index corresponds to bits 14-19 of the EA. In addition, depending on the setting of HID0[ABE], execution of this instruction causes all entries in the congruence class corresponding to the EA to be invalidated in the other processors attached to the same bus. Software must ensure that instruction fetches or memory references to the virtual pages specified by the tlbie instruction have been completed prior to executing the tlbie instruction.

tlbsync (TLB Synchronize): Synchronizes the execution of all other tlbie instructions in the system. In the 750, when the TLBISYNC signal is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. When the TLBISYNC signal is asserted, instruction execution stops after the completion of a tlbsync instruction.

Note: These instructions are defined by the PowerPC architecture, but are optional.

Table 5-6 summarizes the registers that the operating system uses to program the 750 MMUs. These registers are accessible to supervisor-level software only. These registers are described in Chapter 2, "Programming Model."

Table 5-6. PowerPC 750 Microprocessor MMU Registers

Segment registers (SR0-SR15): The sixteen 32-bit segment registers are present only in 32-bit implementations of the PowerPC architecture. The
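The statement that the tlbie index corresponds to EA bits 14-19 can be turned into a one-line computation. The sketch below is illustrative only, with a hypothetical function name; it relies on PowerPC bit numbering, where bit 0 is the most significant bit, so bits 14-19 are the six bits immediately above the 12-bit page offset.

```python
def tlb_index(ea: int) -> int:
    """Return the 6-bit TLB index taken from EA bits 14-19 (bit 0 = MSB)."""
    return (ea >> 12) & 0x3F


if __name__ == "__main__":
    for ea in (0x12345678, 0x0003F000, 0x00040000):
        print(hex(ea), tlb_index(ea))
```

A 6-bit index selects one of 64 congruence classes, so effective addresses that differ only above bit 14 (for example 0x00000000 and 0x00040000) fall into the same class and are invalidated together.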
396. lock

Cache. High-speed memory containing recently accessed data and/or instructions (subset of main memory).

Cache block. A small region of contiguous memory that is copied from memory into a cache. The size of a cache block may vary among processors; the maximum block size is one page. In PowerPC processors, cache coherency is maintained on a cache-block basis. Note that the term cache block is often used interchangeably with cache line.

Cache coherency. An attribute wherein an accurate and common view of memory is provided to all devices that share the same memory system. Caches are coherent if a processor performing a read from its cache is supplied with data corresponding to the most recent value written to memory or to another processor's cache.

Cache flush. An operation that removes from a cache any data from a specified address range. This operation ensures that any modified data within the specified address range is written back to main memory. This operation is generated typically by a Data Cache Block Flush (dcbf) instruction.

Caching-inhibited. A memory update policy in which the cache is bypassed and the load or store is performed to or from main memory.

Cast-outs. Cache blocks that must be written to memory when a cache miss causes a cache block to be replaced.

Changed bit. One of two page history bits found in each page table entry (PTE). T
397. ls. HID1 can be accessed with mtspr and mfspr using SPR 1009.

2.1.2.4 Performance Monitor Registers

This section describes the registers used by the performance monitor, which is described in Chapter 11, "Performance Monitor."

2.1.2.4.1 Monitor Mode Control Register 0 (MMCR0)

The monitor mode control register 0 (MMCR0), shown in Figure 2-5, is a 32-bit SPR provided to specify events to be counted and recorded. The MMCR0 can be accessed only in supervisor mode. User-level software can read the contents of MMCR0 by issuing an mfspr instruction to UMMCR0, described in Section 2.1.2.4.2, "User Monitor Mode Control Register 0 (UMMCR0)."

[Figure 2-5. Monitor Mode Control Register 0 (MMCR0). Recoverable field labels: INTONBITTRANS, RTCSELECT, DISCOUNT, PMC2INTCONTROL, ENINT, PMC1INTCONTROL, PMCTRIGGER.]

This register must be cleared at power up. Reading this register does not change its contents. The bits of the MMCR0 register are described in Table 2-7.

Table 2-7. MMCR0 Bit Settings (bit numbers and field names not recoverable for the entries below)
• Disables counting unconditionally.
  0 The values of the PMCn counters can be changed by hardware.
  1 The values of the PMCn counters cannot be changed by hardware.
• Disables counting while in supervisor mode.
  0 The PMCn counters can be changed by hardware.
  1 If the processor is in supervisor mode (MSR[PR] is cleared), the counters are not changed by hardware.
• Disables counting while in user
398. m 0 to 63. The intent of the THRESHOLD support is to characterize L1 data cache misses. PMC1INTCONTROL: Enables interrupt signaling due to PMC1 counter overflow. 0 = Disable PMC1 interrupt signaling due to PMC1 counter overflow. 1 = Enable PMC1 interrupt signaling due to PMC1 counter overflow. PMCINTCONTROL: Enables interrupt signaling due to any PMC2-PMC4 counter overflow. Overrides the setting of DISCOUNT. 0 = Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow. 1 = Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow. PMCTRIGGER: Can be used to trigger counting of PMC2-PMC4 after PMC1 has overflowed or after a performance monitor interrupt is signaled. 0 = Enable PMC2-PMC4 counting. 1 = Disable PMC2-PMC4 counting until either PMC1[0] = 1 or a performance monitor interrupt is signaled. Bits 19-25, PMC1SELECT: PMC1 input selector; 128 events selectable, 25 defined. See Table 11-5. Bits 26-31, PMC2SELECT: PMC2 input selector; 64 events selectable, 21 defined. See Table 11-6. MMCR0 can be accessed with the mtspr and mfspr instructions using SPR 952. 11.2.1.2 User Monitor Mode Control Register 0 (UMMCR0). The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level software. UMMCR0 can be accessed with the mfspr instruction using SPR 936. 11.2.1.3 Monitor Mode Control Register 1 (MMCR1). The monitor mode control register 1 (MMCR1) functions as an event selector for performance mo
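The PMC1SELECT (bits 19-25) and PMC2SELECT (bits 26-31) positions above can be turned into shifts and masks. The following is a minimal sketch, not code from the manual: it assumes the PowerPC convention that bit 0 is the most significant bit of the 32-bit register, and the function names and the event numbers used in any call are hypothetical.

```python
# Sketch: place event selectors into a 32-bit MMCR0 image.
# PowerPC numbers bits from the most significant end (bit 0 = MSB),
# so a field whose last bit is `end_bit` is shifted left by (31 - end_bit).

def field_shift(end_bit: int) -> int:
    """Left shift for a field ending at big-endian bit number end_bit."""
    return 31 - end_bit

PMC1SELECT_SHIFT = field_shift(25)  # bits 19-25, 7 bits wide
PMC1SELECT_MASK = 0x7F
PMC2SELECT_SHIFT = field_shift(31)  # bits 26-31, 6 bits wide
PMC2SELECT_MASK = 0x3F

def mmcr0_with_selectors(mmcr0: int, pmc1_event: int, pmc2_event: int) -> int:
    """Return mmcr0 with both selector fields replaced (hypothetical helper)."""
    if not 0 <= pmc1_event <= PMC1SELECT_MASK:
        raise ValueError("PMC1SELECT is a 7-bit field")
    if not 0 <= pmc2_event <= PMC2SELECT_MASK:
        raise ValueError("PMC2SELECT is a 6-bit field")
    field_bits = (PMC1SELECT_MASK << PMC1SELECT_SHIFT) | (
        PMC2SELECT_MASK << PMC2SELECT_SHIFT
    )
    cleared = mmcr0 & ~field_bits & 0xFFFFFFFF
    return cleared | (pmc1_event << PMC1SELECT_SHIFT) | (pmc2_event << PMC2SELECT_SHIFT)
```

On real hardware the resulting value would be written with mtspr to SPR 952 in supervisor mode; this sketch only computes the register image.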
399. machine check occurs, the processor enters the checkstop state. Checkstop state is described in 4.5.2.2. 4.5.2.1 Machine Check Exception Enabled (MSR[ME] = 1). Machine check exceptions are enabled when MSR[ME] = 1. When a machine check exception is taken, registers are updated as shown in Table 4-11. Table 4-11. Machine Check Exception Register Settings (Register: Setting Description). SRR0: On a best-effort basis, the 750 can set this to an EA of some instruction that was executing or about to be executing when the machine check condition occurred. SRR1: Bits 0-10: Cleared. Then, in order: set when an L2 data cache parity error is detected, otherwise zero; set when the MCP signal is asserted, otherwise zero; set when the TEA signal is asserted, otherwise zero; set when a data bus parity error is detected, otherwise zero; set when an address bus parity error is detected, otherwise zero. Bits 16-31: MSR[16-31]. MSR: Set to value of ILE. Note that to handle another machine check exception, the exception handler should set MSR[ME] as soon as it is practical after a machine check exception is taken. Otherwise, subsequent machine check exceptions cause the processor to enter the checkstop state. The machine check exception is usually unrecoverable in the sense that execution cannot resume in the context that existed before the exception. If the condition that caused the machine check does not otherwise prevent continue
400. manage system activity in a way that prevents exceeding system and junction temperature thresholds. This is particularly useful in high-performance portable systems, which cannot use the same cooling mechanisms (such as fans) that control overheating in desktop systems. The information in this section is described more fully in Chapter 10, "Power and Thermal Management." • Performance monitor: Section 1.12, "Performance Monitor," describes the performance monitor facility, which system designers can use to help bring up, debug, and optimize software performance. The information in this section is described more fully in Chapter 11, "Performance Monitor." The following sections summarize the features of the 750, distinguishing those that are defined by the architecture from those that are unique to the 750 implementation. The PowerPC architecture consists of the following layers, and adherence to the PowerPC architecture can be described in terms of which of the following levels of the architecture is implemented: • PowerPC user instruction set architecture (UISA): Defines the base user-level instruction set, user-level registers, data types, floating-point exception model, memory models for a uniprocessor environment, and programming model for a uniprocessor environment. • PowerPC virtual environment architecture (VEA): Describes the memory model for a multip
401. marked modified. Designers should note that during burst transfers into and out of the L2 cache SRAM array, an address is generated by the 750 for each data beat. If the L2 cache is configured as write-through, the L2 sector is marked unmodified and the write is forwarded to the 60x bus. If the L1 castout requires a new L2 tag entry to be allocated and the current tag is marked modified, any modified sectors of the tag to be replaced are cast out of the L2 cache to the 60x bus. Single-beat read requests from the L1 caches that miss in the L2 cache do not cause any state changes in the L2 cache and are forwarded on the 60x bus interface. Cacheable single-beat store requests marked copy-back that hit in the L2 are allowed to update the L2 cache sector, but do not cause L2 cache sector allocation or deallocation. Cacheable single-beat store requests that miss in the L2 are forwarded to the 60x bus. Single-beat store requests marked write-through (through address translation or through the configuration of L2CR[L2WT]) are written to the L2 cache if they hit and are written to the 60x bus independent of the L2 hit/miss status. If the store hits in the L2 cache, the modified/unmodified status of the tag remains unchanged. All requests to the L2 cache that are marked cache-inhibited by address translation (through either the MMU or by default WIMG configuration) bypass the L2 cache and do not cause any L2 cache tag state change.
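The single-beat store rules above can be summarized as a small decision function. This is an illustrative sketch only: the function name and the tuple it returns are hypothetical, and two outcomes are assumptions rather than statements from the text (that a cache-inhibited store goes to the 60x bus, and that a copy-back store that hits is not also forwarded to the bus).

```python
# Sketch: where a single-beat store ends up, per the rules in the text.
# Returns (updates_l2_data, written_to_60x_bus).

def route_single_beat_store(l2_hit: bool, write_through: bool,
                            cache_inhibited: bool) -> tuple[bool, bool]:
    if cache_inhibited:
        # Text: bypasses the L2 with no tag state change.
        # Assumption: the store is then performed on the 60x bus.
        return (False, True)
    if write_through:
        # Text: written to the L2 if it hits, and always written to the bus.
        return (l2_hit, True)
    if l2_hit:
        # Text: copy-back store that hits may update the L2 sector.
        # Assumption: it is not also forwarded to the bus.
        return (True, False)
    # Text: cacheable store that misses is forwarded to the 60x bus,
    # with no sector allocation.
    return (False, True)
```

Keeping the policy in one pure function makes each sentence of the description correspond to one branch, which is convenient when checking a simulator or test bench against the manual.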
402. ment Implementation 1 33 V Paragraph Number 1 9 1 10 1 11 1 12 2 1 2 1 1 DAD 2 1 2 1 222 2 1 2 3 2 1 2 4 2 1 2 4 1 2 1 2 4 2 2 1 2 4 3 2 1 2 4 4 2 1 2 4 5 2 1 2 4 6 2 1 2 4 7 2 1 2 4 8 2 1 2 4 9 21 3 2 1 4 2 1 5 2 2 221 22 2 23 2 2 4 2 3 23 1 2 3 1 1 2 3 1 2 2 3 1 3 2 3 1 4 2 32 2 3 2 1 Dad 23 233 vi Contents Page HES Number ee ee br 1 34 Power Management ui rain 1 36 eng TT 1 37 Performance MONO a 1 38 Chapter 2 Programming Model The PowerPC 750 Processor Register e aan 2 1 IO 2 1 PowerPC 750 Specific Regist iii dida sahaedesnsotads nata 2 8 Instruction Address Breakpoint Register IABR A 2 8 Hardware Implementation Dependent Register OU 2 9 Hardware Implementation Dependent Register 1 2 13 Performance Monitor Resist 2 14 Monitor Mode Control Register 0 OMMCRO 2 14 User Monitor Mode Control Register 0 UMMCRO cc eeeeeeeeeeeeeeee 2 15 Monitor Mode Control Register 1 OMMCRTI 2 16 User Monitor Mode Control Register 1 UMMCR1 An 2 16 Performance Monitor Counter Registers PMC1 PMC4 eee 2 16 User Performance Monitor Counter Registers UPMC1 UPMCA 2 20 Sampled Instruction Address Register GlA 2 20 User Sampled Instruction Address Register USIA 2 20 Sampled Data Address Register SDA and User Sampled Data Address Register USDA 2 20 Instruction Cache Throttling Control Register OCT 2 21 Thermal Management Registers CTHRMI THRM N 2
403. mentation Note: In the 750, the decrementer register is decremented at a speed that is one-fourth the speed of the bus clock. Data address breakpoint register (DABR): This optional register is used to cause a breakpoint exception if a specified data address is encountered. See "Data Address Breakpoint Register (DABR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. External access register (EAR): This optional register is used in conjunction with eciwx and ecowx. Note that the EAR register and the eciwx and ecowx instructions are optional in the PowerPC architecture and may not be supported in all PowerPC processors that implement the OEA. See "External Access Register (EAR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. • 750-specific registers: The PowerPC architecture allows implementation-specific SPRs. Those incorporated in the 750 are described as follows. Note that in the 750, these registers are all supervisor-level registers. Instruction address breakpoint register (IABR): This register can be used to cause a breakpoint exception if a specified instruction address is encountered. Hardware implementation-dependent register 0 (HID0): This register controls various functions, such as enabling checkstop conditions, and locking, enabling, and invalidating the instruction and data caches. Hardware implementation
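Because the decrementer counts at one-fourth of the bus clock, a desired delay translates into a DEC load value by simple arithmetic. The sketch below shows that conversion; the function name and the 100 MHz bus frequency used in the check are illustrative assumptions, not values from the manual.

```python
# Sketch: convert a delay into a decrementer (DEC) tick count.
# The DEC is a 32-bit counter that ticks at bus_hz / 4 on the 750.

DEC_DIVIDER = 4          # decrementer runs at one-fourth the bus clock
DEC_MAX = 0xFFFFFFFF     # 32-bit register

def dec_ticks_for_delay(delay_us: int, bus_hz: int) -> int:
    """Number of DEC ticks that elapse in delay_us microseconds (rounded down)."""
    ticks = delay_us * bus_hz // (DEC_DIVIDER * 1_000_000)
    if ticks > DEC_MAX:
        raise ValueError("delay does not fit in the 32-bit decrementer")
    return ticks
```

For example, with an assumed 100 MHz bus the decrementer ticks at 25 MHz, so a 1 ms delay corresponds to 25,000 ticks.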
404. ming Considerations. The 750 is a superscalar processor; as many as three instructions can be issued to the execution units (one branch instruction to the branch processing unit, and two instructions issued from the dispatch queue to the other execution units) during each clock cycle. Only one instruction can be dispatched to each execution unit. Although instructions appear to the programmer to execute in program order, the 750 improves performance by executing multiple instructions at a time, using hardware to manage dependencies. When an instruction is dispatched, the register file provides the source data to the execution unit. The register files and rename register have sufficient bandwidth to allow dispatch of two instructions per clock under most conditions. The 750's BPU decodes and executes branches immediately after they are fetched. When a conditional branch cannot be resolved due to a CR data dependency, the branch direction is predicted and execution continues from the predicted path. If the prediction is incorrect, the following steps are taken: 1. The instruction queue is purged and fetching continues from the correct path. 2. Any instructions ahead of the predicted branch in the completion queue are allowed to complete. 3. Instructions after the mispredicted branch are purged. 4. Dispatching resumes from the correct path. After an execution unit finishes executing an instruction, it
405. ming Environments Manual. Chapter 3 Instruction and Data Cache Operation. The PowerPC 750 microprocessor contains separate 32-Kbyte, eight-way set-associative instruction and data caches to allow the execution units and registers rapid access to instructions and data. This chapter describes the organization of the on-chip instruction and data caches, the MEI cache coherency protocol, cache control instructions, various cache operations, and the interaction between the caches, the load/store unit (LSU), the instruction unit, and the bus interface unit (BIU). Note that in this chapter, the term "multiprocessor" is used in the context of maintaining cache coherency. These multiprocessor devices could be actual processors or other devices that can access system memory, maintain their own caches, and function as bus masters requiring cache coherency. The 750 cache implementation has the following characteristics: • There are two separate 32-Kbyte instruction and data caches (Harvard architecture). • Both instruction and data caches are eight-way set-associative. • The caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each set. • The cache directories are physically addressed. The physical (real) address tag is stored in the cache directory. • Both the instruction and data caches have 32-byte cache blocks. A cache block is the block of memory t
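The cache geometry listed above (32 Kbytes, eight ways, 32-byte blocks) implies 32768 / (8 x 32) = 128 sets, so a 32-bit physical address splits into a 5-bit byte offset, a 7-bit set index, and a 20-bit tag. The following sketch works through that arithmetic; the function and constant names are hypothetical, and the exact bit assignment is derived from the stated sizes rather than quoted from the manual.

```python
# Sketch: split a 32-bit physical address for a 32-Kbyte,
# eight-way set-associative cache with 32-byte blocks.

CACHE_BYTES = 32 * 1024
WAYS = 8
BLOCK_BYTES = 32

NUM_SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)   # 128 sets
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1       # 5 bits
INDEX_BITS = NUM_SETS.bit_length() - 1           # 7 bits

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    offset = addr & (BLOCK_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = (addr & 0xFFFFFFFF) >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset
```

Two addresses that share a set index but differ in tag compete for the same eight ways, which is where the PLRU replacement policy comes into play.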
406. mments for SMI. State Meaning: Asserted: The 750 initiates a system management interrupt operation if MSR[EE] is set; otherwise, the 750 ignores the exception condition. The system must hold SMI active until the exception is taken. Negated: Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts." Timing Comments: Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The SMI input is level-sensitive. Negation: Should not occur until interrupt is taken. 7.2.9.3 Machine Check Interrupt (MCP): Input. Following are the state meaning and timing comments for the MCP signal. State Meaning: Asserted: The 750 initiates a machine check interrupt operation if MSR[ME] and HID0[EMCP] are set; if MSR[ME] is cleared and HID0[EMCP] is set, the 750 must terminate operation by internally gating off all clocks and releasing all outputs (except CKSTP_OUT) to the high-impedance state. If HID0[EMCP] is cleared, the 750 ignores the interrupt condition. The MCP signal must be held asserted for two bus clock cycles. Negated: Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts." Timing Comments: Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The MCP input is negative edge-sensitive. Negation: May be negated two bus cycles after assertion. 7.2.9.4 Ch
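The three outcomes of an MCP assertion described above depend only on MSR[ME] and HID0[EMCP]. A minimal sketch of that decision follows; the function name and the string labels are hypothetical and chosen for illustration.

```python
# Sketch: the 750's response to an asserted MCP signal,
# as a function of MSR[ME] and HID0[EMCP].

def mcp_response(msr_me: bool, hid0_emcp: bool) -> str:
    if not hid0_emcp:
        return "ignored"          # HID0[EMCP] cleared: condition is ignored
    if msr_me:
        return "machine_check"    # both set: machine check interrupt
    return "checkstop"            # EMCP set, ME cleared: clocks gated off
```

Writing the check for HID0[EMCP] first mirrors the text: with EMCP cleared, the state of MSR[ME] does not matter.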
407. mplementation complies with the PowerPC architecture definition. The term "750" is used herein to refer to both the 740 and 750 processors. Differences between the two processors are indicated where appropriate. 1.1 PowerPC 750 Microprocessor Overview. This section describes the features and general operation of the 750 and provides a block diagram showing major functional units. The 750 is an implementation of the PowerPC microprocessor family of reduced instruction set computer (RISC) microprocessors. The 750 implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits. The 750 is a superscalar processor that can complete two instructions simultaneously. It incorporates the following six execution units: • Floating-point unit (FPU) • Branch processing unit (BPU) • System register unit (SRU) • Load/store unit (LSU) • Two integer units (IUs): IU1 executes all integer instructions; IU2 executes all integer instructions except multiply and divide instructions. The ability to execute several instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency and throughput for 750-based systems. Most integer instructions execute in one clock cycle. The FPU is pipelined; the tasks it performs are broken into subtasks, then implemented as three successive stages. Typically, a flo
408. mtfsb1x 63 38 Re mtfsfx 63 711 Rc mtfstix 63 00 00000 IMM al 1 Re mtmsr 98 31 146 0 mtmsrd 19 31 178 0 mtspr 5 31 467 0 mtsr 9 31 210 0 Appendix A PowerPC Instruction Set Listings A 5 Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 mtsrd 36 31 s 0 SR 00000 82 0 mtsrdin 96 31 S 00000 B 114 0 mtsrin 9 31 S 00000 B 242 0 mulhwx 31 D A B 0 75 Re mulhwux 31 D A B 0 11 Re mulldx 31 D A B OE 233 Re mulli 7 D A SIMM negx 31 D A norx 31 S A orx 31 S A orcx 31 S A ori 24 S A oris 25 S A rfi 36 19 rfid 13 19 rldclx 30 riderx 30 ridicx 30 ridiclx 30 ridicrx 30 ridimix 30 rlwimix 20 rlwinmx 21 rlwnmx 23 sc 17 slbia 1 23 31 slbie 1 23 31 sldx 31 slwx 31 A 6 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual o Name 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 srwx 31 S A B 536 Rc stb 38 S A d stbu 39 S A d stbux 31 S A B 247 0 stdex 31 S A B 214 1 stdu 62 s A ds 1 stdux 31 S A B 181 0 stdx 31 S A B 149 0 stfd 54 S A d stfdu 55 S A d stfdux 31 S A B 759 0 stfdx 31 S A B 727 0 stfiwx 31 S A B 983 0 stfs 52 S A d stfsu 53 S A d stfsx 31 S A B 663 0 sth 44 S A d sthbrx 31 S A B 918 0 sthu 45 S A d stmw 4 47 S A d stswi 31 S A NB 725 0 stswx 31 S A B 6
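Rows such as "stbux 31 S A B 247 0" in the listing above give a primary opcode, three 5-bit register fields, an extended opcode, and a final bit. The sketch below assembles such a row into a 32-bit instruction word. It assumes the standard PowerPC X-form layout (bits 0-5 primary opcode, 6-10 S or D, 11-15 A, 16-20 B, 21-30 extended opcode, bit 31 Rc or a fixed value, with bit 0 as the MSB); the function name and the register numbers in the check are illustrative.

```python
# Sketch: encode an X-form instruction from its table row.
# Big-endian bit numbering: a field ending at bit e is shifted left by 31 - e.

def encode_x_form(primary: int, rs: int, ra: int, rb: int,
                  xo: int, last_bit: int = 0) -> int:
    fields = (
        ("primary", primary, 6),
        ("rs", rs, 5),
        ("ra", ra, 5),
        ("rb", rb, 5),
        ("xo", xo, 10),
        ("last_bit", last_bit, 1),
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((primary << 26) | (rs << 21) | (ra << 16)
            | (rb << 11) | (xo << 1) | last_bit)
```

With the row "stbux 31 S A B 247 0" and registers 3, 4, and 5 chosen arbitrarily for S, A, and B, the call `encode_x_form(31, 3, 4, 5, 247)` yields the word 0x7C6429EE.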
409. n BATs (IBATs). BATs are used to define and configure blocks of memory. The count register (CTR) is decremented and tested by branch-and-count instructions. DABR (Supervisor): The optional data address breakpoint register (DABR) supports the data address breakpoint facility. DAR (User): The data address register (DAR) holds the address of an access after an alignment or DSI exception. DEC (Supervisor): The decrementer register (DEC) is a 32-bit decrementing counter that provides a way to schedule decrementer exceptions. DSISR: The DSISR defines the cause of data access and alignment exceptions. EAR (Supervisor): The external access register (EAR) controls access to the external access facility through the External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx) instructions. The processor version register (PVR) is a read-only register that identifies the processor. SDR1: SDR1 specifies the page table format used in virtual-to-physical page address translation. SRR0 (Supervisor): The machine status save/restore register 0 (SRR0) saves the address used for restarting an interrupted program when a Return from Interrupt (rfi) instruction executes. SRR1 (Supervisor): The machine status save/restore register 1 (SRR1) is used to save machine status on exceptions and to restore machine status when an rfi instruction is executed. SPRG0-SPRG3: SPRG0-SPRG3 are provided for operating system use. User: read. The time base regi
410. n access the registers shown in Figure 1-5, depending on the program's access privilege (supervisor or user, determined by the privilege-level (PR) bit in the MSR). GPRs and FPRs are accessed through operands that are part of the instructions. Access to registers can be explicit (that is, through the use of specific instructions for that purpose, such as the Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit, as part of the execution of an instruction. Some registers can be accessed both explicitly and implicitly. In the 750, all SPRs are 32 bits wide. Table 1-2 describes the architecture-defined SPRs implemented by the 750. The Programming Environments Manual describes these registers in detail, including bit descriptions. Section 2.1.1, "Register Set," describes how these registers are implemented in the 750. In particular, this section describes which of the features that the PowerPC architecture defines as optional are implemented on the 750. Table 1-2. Architecture-Defined SPRs Implemented. LR (User): The link register (LR) can be used to provide the branch target address and to hold the return address after branch and link instructions. BATs (Supervisor): The architecture defines 16 block address translation registers (BATs), which operate in pairs. There are four pairs of data BATs (DBATs) and four pairs of instructio
411. n an exception was signaled, the address of the last completed instruction during that cycle is saved in the SIA. The SIA is not updated if no instruction completed the cycle in which the exception was taken. Exception handling for the performance monitor interrupt exception is described in Section 4.5.13, "Performance Monitor Interrupt (0x00F00)." 11.2 Special-Purpose Registers Used by Performance Monitor. The performance monitor incorporates the SPRs listed in Table 11-1. All of these supervisor-level registers are accessed through mtspr and mfspr instructions. The following table shows more information about all performance monitor SPRs. Table 11-1. Performance Monitor SPRs. 11.2.1 Performance Monitor Registers. This section describes the registers used by the performance monitor. 11.2.1.1 Monitor Mode Control Register 0 (MMCR0). The monitor mode control register 0 (MMCR0), shown in Figure 11-1, is a 32-bit SPR provided to specify events to be counted and recorded. MMCR0 can be written to only in supervisor mode. User-level software can read the contents of MMCR0 by issuing an mfspr instruction to UMMCR0, described in Section 11.2.1.2, "User Monitor Mode Control Register 0 (UMMCR0)." [Figure 11-1. Monitor Mode Control Register 0 (MMCR0); labeled fields include INTONBITTRANS, RTCSELECT, DISCOUNT, ENINT, PMC2INTCONTROL, and PMC1INTCONTROL.]
412. n error handler (a program exception). See "Program Exception (0x00700)" in Chapter 6, "Exceptions," in The Programming Environments Manual for information about illegal and invalid instruction exceptions. The PowerPC architecture defines four types of reserved instructions: • Instructions in the POWER architecture not part of the PowerPC UISA. For details on POWER architecture incompatibilities and how they are handled by PowerPC processors, see Appendix B, "POWER Architecture Cross Reference," in The Programming Environments Manual. • Implementation-specific instructions required for the processor to conform to the PowerPC architecture (none of these are implemented in the 750). • All other implementation-specific instructions. • Architecturally allowed extended opcodes. 2.3.2 Addressing Modes. This section provides an overview of conventions for addressing memory and for calculating effective addresses as defined by the PowerPC architecture for 32-bit implementations. For more detailed information, see "Conventions" in Chapter 4, "Addressing Modes and Instruction Set Summary," of The Programming Environments Manual. 2.3.2.1 Memory Addressing. A program references memory using the effective (logical) address computed by the processor when it executes a memory access or branch instruction or when it fetches the next sequential instruction. Bytes
413. n for all floating-point instructions. See "Floating-Point Registers (FPRs)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. Condition register (CR): The 32-bit CR consists of eight 4-bit fields, CR0-CR7, that reflect results of certain arithmetic operations and provide a mechanism for testing and branching. See "Condition Register (CR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. Floating-point status and control register (FPSCR): The FPSCR contains all floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE 754 standard. See "Floating-Point Status and Control Register (FPSCR)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. The remaining user-level registers are SPRs. Note that the PowerPC architecture provides a separate mechanism for accessing SPRs (the mtspr and mfspr instructions). These instructions are commonly used to explicitly access certain registers, while other SPRs may be more typically accessed as the side effect of executing other instructions. Integer exception register (XER): The XER indicates overflow and carries for integer operations. See "XER Register (XER)" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Implementation Note: To allow emulation
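Since the CR packs eight 4-bit fields into 32 bits, extracting field CRn is a shift and a mask. The sketch below assumes the PowerPC convention that CR0 occupies the four most significant bits; the function name is hypothetical.

```python
# Sketch: extract one 4-bit condition register field.
# CR0 is the most significant nibble, CR7 the least significant.

def cr_field(cr: int, n: int) -> int:
    """Return the 4-bit field CRn (0 <= n <= 7) of a 32-bit CR value."""
    if not 0 <= n <= 7:
        raise ValueError("CR field index must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF
```

A conditional branch that tests a bit in CR0 is therefore looking at the top nibble of the register image.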
414. n in Figure 2-12, is a supervisor-level, implementation-specific SPR used to configure and operate the L2 cache. It is cleared by a hard reset or power-on reset. [Figure 2-12. L2 Cache Control Register (L2CR); labeled fields include L2WT, L2DF, L2DRO, L2PE, L2DR, L2CTL, L2TS, L2SL, L2BYP, L2CS, and L2IP.] The L2 cache interface is described in Chapter 9, "L2 Cache Interface Operation." The L2CR bits are described in Table 2-18. Table 2-18. L2CR Bit Settings. L2 enable: Enables L2 cache operation (including snooping) starting with the next transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be configured through L2CR[L2CLK], and the L2 DLL must stabilize (see the hardware specifications). All other L2CR bits must be set appropriately. The L2 cache may need to be invalidated globally. L2 data parity generation and checking enable: Enables parity generation and checking for the L2 data RAM interface. When disabled, generated parity is always zeros. 0 = Prevents L2 data parity checking. 1 = Allows a data parity error on the L2 bus to cause a checkstop if MSR[ME] = 0, or a machine check interrupt if MSR[ME] = 1. L2 size: Should be set according to the size of the L2 data RAMs used. A 256-Kbyte L2 cache requires a data RAM configuration of 32 Kbytes x 64 bits; a 512-Kbyte L2 c
415. n is enabled, instruction fetch addresses are compared with an effective address stored in the IABR. If the word specified in the IABR is fetched, the instruction breakpoint handler is invoked. The instruction that triggers the breakpoint does not execute before the handler is invoked. For more information, see Section 4.5.14, "Instruction Address Breakpoint Exception (0x01300)." The IABR can be accessed with mtspr and mfspr using SPR 1010. [Figure 2-2. Instruction Address Breakpoint Register; bit positions 0-29, 30, and 31.] The IABR bits are described in Table 2-3. Table 2-3. Instruction Address Breakpoint Register Bit Settings. Bits 0-29, Address: Word address to be compared. Bit 30, BE: Breakpoint enabled. Setting this bit indicates that breakpoint checking is to be done. Bit 31, TE: Translation enabled. An IABR match is signaled if this bit matches MSR[IR]. 2.1.2.2 Hardware Implementation-Dependent Register 0. The hardware implementation-dependent register 0 (HID0) controls the state of several functions within the 750. The HID0 register is shown in Figure 2-3. [Figure 2-3. Hardware Implementation-Dependent Register 0 (HID0); labeled fields include DLOCK, EMCP, BCLK, ECLK, DOZE, SLEEP, ILOCK, and NOOPTI.] The HID0 bits are described in Table 2-4. Table 2-4. HID0 Bit Functions
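Given the IABR layout (word address in bits 0-29, BE in bit 30, TE in bit 31), a register value can be composed from a breakpoint address and the two flags. This is a minimal sketch assuming bit 0 is the most significant bit, so BE and TE are the two least significant bits; the function name and the sample address are hypothetical.

```python
# Sketch: compose an IABR value from a word-aligned instruction address.
# Bits 0-29 hold the word address, bit 30 is BE, bit 31 is TE (bit 0 = MSB).

IABR_BE = 0x2   # bit 30: breakpoint enabled
IABR_TE = 0x1   # bit 31: translation enabled (compared against MSR[IR])

def make_iabr(address: int, enable: bool, translation: bool) -> int:
    value = address & 0xFFFFFFFC      # keep the 30-bit word address
    if enable:
        value |= IABR_BE
    if translation:
        value |= IABR_TE
    return value
```

The resulting value would be written with mtspr to SPR 1010 in supervisor mode; the low two address bits are dropped because instructions are word-aligned.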
416. n refers to the THRM threshold SPR (THRM1 or THRM2) selected to contain the active threshold value. After setting the desired operational parameters, the TAU is enabled by setting the THRM3[E] bit to 1 and placing a value allowing a sample interval of 20 microseconds or greater in the THRM3[SITV] field. The THRM3[SITV] setting determines the number of processor clock cycles between input to the DAC and sampling of the comparator output; accordingly, the use of a value smaller than recommended in the THRM3[SITV] field can cause inaccuracies in the sensed temperature. If the junction temperature does not cross the programmed threshold, the THRMn[TIN] bit is cleared to 0 to indicate that no interrupt is required, and the THRMn[TIV] bit is set to 1 to indicate that the TIN bit state is valid. If the threshold value has been crossed, the THRMn[TIN] and THRMn[TIV] bits are set to 1, and a thermal management interrupt is generated if both the THRMn[TIE] and MSR[EE] bits are set to 1. A thermal management interrupt is held asserted internally until recognized by the 750's interrupt unit. Once a thermal management interrupt is recognized, further temperature sampling is suspended, and the THRMn[TIN] and THRMn[TIV] values are held until an mtspr instruction is executed to THRMn. The execution of an mtspr instruction to THRMn anytime during TAU operation will clear the THRMn[TIV] bit to 0 and restart the temperature comparison. Executing an mtspr instructio
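The 20-microsecond minimum sample interval and the interrupt condition above can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the processor frequency in any call is only an example, and the width of the SITV field is not given in this excerpt, so no upper bound is checked.

```python
# Sketch: smallest SITV cycle count for a >= 20 us sample interval,
# and the outcome of one completed threshold comparison.

def min_sitv_cycles(cpu_hz: int, interval_us: int = 20) -> int:
    """Processor clock cycles needed to cover interval_us (rounded up)."""
    return (interval_us * cpu_hz + 999_999) // 1_000_000

def tau_sample_result(threshold_crossed: bool, tie: bool, msr_ee: bool) -> dict:
    """TIN/TIV state and interrupt decision after one comparison."""
    return {
        "TIN": 1 if threshold_crossed else 0,
        "TIV": 1,  # the TIN state is valid once the comparison completes
        "interrupt": threshold_crossed and tie and msr_ee,
    }
```

At an assumed 400 MHz processor clock, `min_sitv_cycles(400_000_000)` gives 8000 cycles. Rounding up rather than down keeps the interval at or above the recommended minimum.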
417. n serialization (also referred to as post-dispatch or tail serialization): Completion-serialized instructions inhibit dispatching of subsequent instructions until the serialized instruction completes. Completion serialization is used for instructions that bypass the normal rename mechanism. • Refetch serialization (flush serialization): Refetch-serialized instructions inhibit dispatch of subsequent instructions and force refetching of subsequent instructions after completion. 6.4 Execution Unit Timings. The following sections describe instruction timing considerations within each of the respective execution units in the 750. 6.4.1 Branch Processing Unit Execution Timing. Flow control operations (conditional branches, unconditional branches, and traps) are typically expensive to execute in most machines because they disrupt normal flow in the instruction stream. When a change in program flow occurs, the IQ must be reloaded with the target instruction stream. Previously issued instructions will continue to execute while the new instruction stream makes its way into the IQ, but depending on whether the target instruction is in the BTIC, instruction cache, L2 cache, or in system memory, some opportunities may be missed to execute instructions, as the example in Section 6.3.2.3, "Cache Miss," shows. Performance features such as the branch folding, removal of fall-through branch instructions, BTIC, dynamic b
418. n to THRM3 will clear both THRM1[TIV] and THRM2[TIV] bits to 0 and restart temperature comparison in THRMn if the THRM3[E] bit is set to 1. Examples of valid THRM1 and THRM2 bit settings are shown in Table 10-4. Table 10-4. Valid THRM1 and THRM2 Bit Settings (row descriptions): The threshold in the SPR will not be used for comparison; Threshold is used for comparison, thermal management interrupt assertion is disabled; Set TIN and do not assert thermal management interrupt if the junction temperature exceeds the threshold; Set TIN and assert thermal management interrupt if the junction temperature exceeds the threshold; Set TIN and do not assert thermal management interrupt if the junction temperature is less than the threshold; The state of the TIN bit is not valid; The junction temperature is less than the threshold and, as a result, the thermal management interrupt is not generated for TIE = 1; The junction temperature is greater than the threshold and, as a result, the thermal management interrupt is generated if TIE = 1; The junction temperature is greater than the threshold and, as a result, the thermal management interrupt is not generated for TIE = 1; The junction temperature is less than the threshold and, as a result, the thermal management interrupt is generated if TIE = 1; Set TIN and assert thermal management interrupt if the junct
419. nal is an input/output signal on the 750. 7.2.4.6.1 Global (GBL): Output. Following are the state meaning and timing comments for the GBL output signal. State Meaning: Asserted: Indicates that a transaction is global, reflecting the setting of the M bit for the block or page that contains the address of the current transaction (except in the case of copy-back operations and instruction fetches, which are nonglobal). Negated: Indicates that a transaction is not global. Timing Comments: Assertion/Negation: The same as A[0-31]. High Impedance: The same as A[0-31]. 7.2.4.6.2 Global (GBL): Input. Following are the state meaning and timing comments for the GBL input signal. State Meaning: Asserted: Indicates that a transaction must be snooped by the 750. Negated: Indicates that a transaction is not snooped by the 750. Timing Comments: Assertion/Negation: The same as A[0-31]. 7.2.5 Address Transfer Termination Signals. The address transfer termination signals are used to indicate either that the address phase of the transaction has completed successfully or must be repeated, and when it should be terminated. For detailed information about how these signals interact, see Section 8.3.3, "Address Transfer Termination." 7.2.5.1 Address Acknowledge (AACK): Input. The address acknowledge (AACK) signal is an input-only signal on the 750. Following are the state meaning and timing comments for the AACK
420. nchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or machine check brings the 750 into the full-power state. The 750 in doze mode maintains the PLL in a fully powered state and locked to the system external clock input (SYSCLK), so a transition to the full-power state takes only a few processor clock cycles. Nap: The nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the PLL in a powered state. The 750 returns to the full-power state upon receipt of an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input (MCP). A return to full-power state from a nap state takes only a few processor clock cycles. When the processor is in nap mode, if QACK is negated, the processor is put in doze mode to support snooping. Sleep: Sleep mode minimizes power consumption by disabling all internal functional units, after which external system logic may disable the PLL and SYSCLK. Returning the 750 to the full-power state requires the enabling of the PLL and SYSCLK, followed by the assertion of an external asynchronous interrupt, a system management interrupt, a hard or soft reset, or a machine check input (MCP) signal after the time required to relock the PLL. Chapter 10, "Power and Thermal Management," provides information about power saving and thermal management modes fo
421. Contents (continued)
Data Bus Tenure
Data Bus Arbitration
Using the DBB Signal
Data Bus Write Only
Data Transfer Termination
Normal Single-Beat Termination
Data Transfer Termination Due to a Bus Error
Memory Coherency: MEI Protocol
Timing Examples
Optional Bus Configuration
32-Bit Data Bus Mode
No-DRTRY Mode
Reduced-Pinout Mode
Interrupt, Checkstop, and Reset Signals
External Interrupts
System Quiesce
Processor State Signals
Support for the lwarx/stwcx. Instruction Pair
TLBISYNC Input
IEEE 1149.1a-1993 Compliant Interface
JTAG/COP Interface
Using Data Bus Write Only
Chapter 9 L2 Cache Interface Ope
422. nd coordinate the various masters and slaves with respect to the use of the data bus when DBWO is used. Individual DBG signals associated with each bus device should allow the arbiter to synchronize both pipelined and split-transaction bus organizations. Individual DBG and DBWO signals provide a primitive form of source-level tagging for the granting of the data bus. Note that use of the DBWO signal allows some operation-level tagging with respect to the 750 and the use of the data bus.

Chapter 9 L2 Cache Interface Operation
This chapter describes the PowerPC 750 microprocessor L2 cache interface and its configuration and operation. It describes how the 750 signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers to and from the L2 cache. Note that the PowerPC 740 microprocessor does not implement the L2 cache interface.

9.1 L2 Cache Interface Overview
The 750's L2 cache interface is implemented with an on-chip, two-way set-associative tag memory with 4096 tags per way, and a dedicated interface with support for up to 1 Mbyte of external synchronous SRAM for data storage. The tags are sectored to support either two cache blocks per tag entry (two sectors, 64 bytes) or four cache blocks per tag entry (four sectors, 128 bytes), depending on the L2 cache size. If the L2 cache is configured for 256 Kbytes or 512 Kbytes of external SRAM, the tags are c
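The tag geometry described above can be checked with a little arithmetic. The following Python sketch is illustrative only: the 32-byte cache block size is inferred from "two sectors, 64 bytes," and the function name `l2_bytes_covered` is hypothetical, not something defined by the manual.

```python
# Worked arithmetic for the L2 tag geometry (illustrative sketch).
WAYS = 2                 # two-way set-associative tag memory
TAGS_PER_WAY = 4096      # tags per way, as stated in the overview
CACHE_BLOCK_BYTES = 32   # assumption: inferred from "two sectors, 64 bytes"


def l2_bytes_covered(sectors_per_tag: int) -> int:
    """Bytes of external SRAM that the full tag array can describe."""
    return WAYS * TAGS_PER_WAY * sectors_per_tag * CACHE_BLOCK_BYTES


two_sector_bytes = l2_bytes_covered(2)    # two cache blocks per tag entry
four_sector_bytes = l2_bytes_covered(4)   # four cache blocks per tag entry
```

With two sectors per tag the full tag array covers 512 Kbytes, and with four sectors it covers the 1-Mbyte maximum, which is consistent with the sector count depending on the configured L2 size.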
423. nd disabling SYSCLK. Note that forcing the SYSCLK signal into a static state does not disable the 750's PLL, which will continue to operate internally at an undefined frequency unless placed in PLL-bypass mode. Additionally, if the PLL is not disabled, the L2 cache interface DLL will remain locked and the L2CLK_OUTA and L2CLK_OUTB signals will remain active. The DLL is disabled by clearing the L2CR[L2E] bit to 0.

Due to the fully static design of the 750, internal processor state is preserved when no internal clock is present. Because the time base and decrementer are disabled while the 750 is in sleep mode, the 750's time base contents will have to be updated from an external time base after exiting sleep mode if maintaining an accurate time of day is required. Before entering the sleep mode, the 750 asserts the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it asserts QACK and the 750 will enter sleep mode.

• All functional units disabled (including bus snooping and time base)
• All nonessential input receivers disabled
  - Internal clock regenerators disabled
  - PLL and DLL still running (see below)
• Sleep mode sequence
  - Set sleep bit (HID0[10] = 1); clear doze and nap bits (HID0[8] and HID0[9])
  - The 750 asserts quiesce request (QREQ)
  - System asserts quiesce acknowledg
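The first step of the sleep sequence is a read-modify-write of HID0. The Python sketch below models only the bit arithmetic, assuming the usual PowerPC convention that bit 0 is the most significant bit of the 32-bit register. The helper names (`ppc_bit`, `select_sleep`) are hypothetical, and the sketch does not model the QREQ/QACK handshake or any other step needed to actually enter sleep mode.

```python
# Bit arithmetic for selecting sleep mode in HID0 (illustrative sketch).

def ppc_bit(n: int, width: int = 32) -> int:
    """Mask for PowerPC bit n, where bit 0 is the most significant bit."""
    return 1 << (width - 1 - n)


HID0_DOZE = ppc_bit(8)    # HID0[8]
HID0_NAP = ppc_bit(9)     # HID0[9]
HID0_SLEEP = ppc_bit(10)  # HID0[10]


def select_sleep(hid0: int) -> int:
    """Return hid0 with the sleep bit set and the doze and nap bits cleared."""
    hid0 &= ~(HID0_DOZE | HID0_NAP) & 0xFFFFFFFF
    return hid0 | HID0_SLEEP
```

All other HID0 bits pass through unchanged, which matters because the same register also holds unrelated controls such as the cache enable bits.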
424. [Figure 6-7. Branch Folding]

Figure 6-8 shows the removal of fall-through branch instructions, which occurs when a branch is not taken or is predicted as not taken.

Branch Fall-Through (Not-Taken Branch)
        Clock 0   Clock 1   Clock 2
IQ5     add5
IQ4     add4
IQ3     add3      add5      add7
IQ2     b         add4      add6
IQ1     add2      add3      add5
IQ0     add1      b         add4
Figure 6-8. Removal of Fall-Through Branch Instruction

In this case the branch instruction remains in the instruction queue and is removed from the instruction stream as if it were dispatched. However, it is not dispatched to an execution unit and is not assigned an entry in the completion queue.

When a branch instruction is detected before it reaches a dispatch position, and if the branch is correctly predicted as taken, folding the branch instruction (and any instructions from the incorrect path) reduces the latency required for flow control to zero; instruction execution proceeds as though the branch was never there.

The advantage of removing the fall-through branch instructions at dispatch is only marginally less than that of branch folding. Because the branch is not taken, only the branch instruction needs to be discarded. The only cost of expelling the branch instruction from one of the dispatch entries rather than folding it is missing a chance to dispatch an executable instruction from that posi
425. ne time. There are six GPR rename registers; therefore, only six GPRs can be specified as destination operands at any time. If no rename registers are available, instructions cannot enter the execute stage and remain in the reservation station or instruction queue until they become available. Note that load with update address instructions use two destination registers. Similarly, there are six FPR rename registers, so only six FPR destination operands can be in the execute and complete stages at any time.

6.6.1 Branch, Dispatch, and Completion Unit Resource Requirements
This section describes the specific resources required to avoid stalls during branch resolution, instruction dispatching, and instruction completion.

6.6.1.1 Branch Resolution Resource Requirements
The following is a list of branch instructions and the resources required to avoid stalling the fetch unit in the course of branch resolution:
• The bclr instruction requires LR availability.
• The bcctr instruction requires CTR availability.
• Branch and link instructions require shadow LR availability.
• The "branch conditional on counter decrement and the CR condition" requires CTR availability, or the CR condition must be false, and the 750 cannot execute instructions after an unresolved predicted branch when the BPU encounters a branch.
• A branch conditional on CR condition cannot be executed following an unresolved pr
426. ng Comments
Assertion/Negation: Data must be valid on the same bus clock cycle that TA is asserted.

7.2.7.2 Data Bus Parity (DP[0-7])
The eight data bus parity (DP[0-7]) signals on the 750 are both output and input signals.

7.2.7.2.1 Data Bus Parity (DP[0-7]) Output
Following are the state meaning and timing comments for the DP output signals.
State Meaning
Asserted/Negated: Represents odd parity for each of the 8 bytes of data write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. The generation of parity is enabled through HID0. The signal assignments are listed in Table 7-5.
Timing Comments
Assertion/Negation: The same as DL[0-31].
High Impedance: The same as DL[0-31].

[Table 7-5. DP[0-7] Signal Assignments]

7.2.7.2.2 Data Bus Parity (DP[0-7]) Input
Following are the state meaning and timing comments for the DP input signals.
State Meaning
Asserted/Negated: Represents odd parity for each byte of read data. Parity is checked on all data byte lanes, regardless of the size of the transfer. Detected even parity causes a checkstop if data parity errors are enabled in the HID0 register.
Timing Comments
Assertion/Negation: The same as DL[0-31].

7.2.7.3 Data Bus Disable (DBDIS) Input
Following are the state meaning and timing comments for the DBDIS signal.
State Me
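The odd-parity rule above can be made concrete with a short Python sketch. It is illustrative only: the function names are hypothetical, and the mapping of list index i to parity signal DP[i] is an assumption, because the Table 7-5 signal assignments did not survive in this text.

```python
# Odd parity per byte lane (illustrative sketch).
from typing import List


def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes the count of 1s (data + parity) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def dp_for_double_word(data: bytes) -> List[int]:
    """Parity bits for an 8-byte transfer; index i is assumed to map to DP[i]."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return [odd_parity_bit(b) for b in data]


def has_parity_error(byte: int, dp: int) -> bool:
    """True when the byte plus its parity bit contain an even number of 1s."""
    return (bin(byte & 0xFF).count("1") + (dp & 1)) % 2 == 0
```

A receiver applying `has_parity_error` to every byte lane mirrors the statement that parity is checked on all lanes regardless of transfer size.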
427. ng cycle. On the following cycle, only the snooping master that asserted ARTRY and needs to perform a snoop copy-back operation is allowed to assert BR. This guarantees the snooping master an opportunity to request and be granted the bus before the just-retried master can restart its transaction. Note that a nonclocked bus arbiter may detect the assertion of address bus request by the bus master that asserted ARTRY, and return a qualified bus grant one cycle earlier than shown in Figure 8-8.

Note that if the 750 asserts ARTRY due to a snoop operation, and asserts BR in the bus cycle following ARTRY in order to perform a snoop push to memory, it may be several bus cycles later before the 750 will be able to accept a BG. The delay in responding to the assertion of BG only occurs during snoop pushes from the L2 cache. The bus arbiter should keep BG asserted until it detects BR negated or TS asserted from the 750, indicating that the snoop copy-back has begun. The system should ensure that no other address tenures occur until the current snoop push from the 750 is completed. Snoop push delays can also be avoided by operating the L2 cache in write-through mode so no snoop pushes are required by the L2 cache.

[Figure 8-8. Snooped Address Cycle with ARTRY]

8.4 Data Bus Tenure
This section describes the data bus arbitration, transfer, and termination phases defined by the 750
428. ng the performance monitor interrupt. The IPL ROM code clears this bit before passing control to the operating system.

DISCOUNT: Disables counting of PMCn when a performance monitor interrupt is signaled (that is, PMCnINTCONTROL = 1 & PMCn[0] = 1 & ENINT = 1) or on the occurrence of an enabled time base transition with (INTONBITTRANS = 1 & ENINT = 1).
0 Signaling a performance monitor interrupt does not affect counting status of PMCn.
1 The signaling of a performance monitor interrupt prevents changing of PMC1 counter. The PMCn counter does not change if PMC2COUNTCTL = 0.
Because a time base signal could have occurred along with an enabled counter overflow condition, software should always reset INTONBITTRANS to zero if the value in INTONBITTRANS was a one.

Table 11-2. MMCR0 Bit Settings (Continued)
7-8 RTCSELECT: 64-bit time base, bit selection enable.
Pick bit 63 to count.
Pick bit 55 to count.
Pick bit 51 to count.
Pick bit 47 to count.
INTONBITTRANS: Causes interrupt signaling on bit transition (identified in RTCSELECT) from off to on.
0 Do not allow interrupt signal on the transition of a chosen bit.
1 Signal interrupt on the transition of a chosen bit.
Software is responsible for setting and clearing INTONBITTRANS.
10-15 THRESHOLD: Threshold value. All 6 bits are supported by the 750, allowing threshold values fro
429. nitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register is shown in Figure 11-2.

[Figure 11-2. Monitor Mode Control Register 1 (MMCR1): PMC3SELECT in bits 0-4, PMC4SELECT in bits 5-9, bits 10-31 reserved]

Bit settings for MMCR1 are shown in Table 11-3. The corresponding events are described in Section 11.2.1.5, "Performance Monitor Counter Registers (PMC1-PMC4)."

Table 11-3. MMCR1 Bit Settings
0-4 PMC3SELECT: PMC3 input selector, 32 events selectable. See Table 11-7 for defined selections.
5-9 PMC4SELECT: PMC4 input selector, 32 events selectable. See Table 11-8 for defined selections.
10-31 Reserved

MMCR1 can be accessed with the mtspr and mfspr instructions using SPR 956. User-level software can read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1, described in Section 11.2.1.4, "User Monitor Mode Control Register 1 (UMMCR1)."

11.2.1.4 User Monitor Mode Control Register 1 (UMMCR1)
The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level software. UMMCR1 can be accessed with the mfspr instruction using SPR 940.

11.2.1.5 Performance Monitor Counter Registers (PMC1-PMC4)
PMC1-PMC4, shown in Figure 11-3, are 32-bit counters that can be programmed to generate interrupt signals when they overflow.

[Figure 11-3. Performance Monitor Counter Registers (PMC1-PMC4)]

The bits contained in the PMC registers are descri
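As a rough illustration of the MMCR1 layout, the Python sketch below packs two 5-bit event selectors into a 32-bit value. The field positions (PMC3SELECT in bits 0-4, PMC4SELECT in bits 5-9, with bit 0 as the most significant bit) are an assumption read from the surviving Figure 11-2 bit labels, and the function names are hypothetical. The event numbers themselves come from Tables 11-7 and 11-8, which are not reproduced here.

```python
# Packing the MMCR1 selector fields (illustrative sketch under stated assumptions).
from typing import Tuple

PMC3SELECT_SHIFT = 27  # bits 0-4 in PowerPC (MSB = bit 0) numbering
PMC4SELECT_SHIFT = 22  # bits 5-9 in PowerPC numbering
SELECT_MASK = 0x1F     # 5 bits, hence 32 selectable events


def pack_mmcr1(pmc3_event: int, pmc4_event: int) -> int:
    """Build an MMCR1 value; bits 10-31 are reserved and left as zero."""
    for event in (pmc3_event, pmc4_event):
        if not 0 <= event <= SELECT_MASK:
            raise ValueError("event selector must be in the range 0..31")
    return (pmc3_event << PMC3SELECT_SHIFT) | (pmc4_event << PMC4SELECT_SHIFT)


def unpack_mmcr1(value: int) -> Tuple[int, int]:
    """Return (PMC3SELECT, PMC4SELECT) from an MMCR1 value."""
    return ((value >> PMC3SELECT_SHIFT) & SELECT_MASK,
            (value >> PMC4SELECT_SHIFT) & SELECT_MASK)
```

On real hardware the packed value would be written with mtspr to SPR 956 from supervisor-level code; that step is outside what this sketch models.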
430. nly bus transfer during the execution of the dcbz instruction (and for the dcbi, dcbf, dcbst, sync, and eieio instructions, if HID0[ABE] is enabled), which uses only the address bus with no data transfer involved. Additionally, the 750's retry capability provides an efficient snooping protocol for systems with multiple memory systems (including caches) that must remain coherent.

8.2.1 Arbitration Signals
Arbitration for both address and data bus mastership is performed by a central, external arbiter and, minimally, by the arbitration signals shown in Section 7.2.1, "Address Bus Arbitration Signals." Most arbiter implementations require additional signals to coordinate bus master/slave/snooping activities. Note that address bus busy (ABB) and data bus busy (DBB) are bidirectional signals. These signals are inputs unless the 750 has mastership of one or both of the respective buses; they must be connected high through pull-up resistors so that they remain negated when no devices have control of the buses.

The following list describes the address arbitration signals:
• BR (bus request): Assertion indicates that the 750 is requesting mastership of the address bus.
• BG (bus grant): Assertion indicates that the 750 may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are negated. If the 750 is parked, BR need no
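The qualified bus grant condition stated above reduces to a single boolean expression. The Python sketch below is a minimal model that uses logical "asserted" values rather than electrical pin levels; the function name is hypothetical.

```python
# Qualified address bus grant (illustrative sketch; True means "asserted").

def qualified_bus_grant(bg: bool, abb: bool, artry: bool) -> bool:
    """BG asserted while ABB and ARTRY are both negated."""
    return bg and not abb and not artry
```

An arbiter model can evaluate this once per bus clock to decide whether a master may begin an address tenure.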
431. Contents (continued)
Reservation (RSRV) Output
Time Base Enable (TBEN) Input
TLBI Sync (TLBISYNC) Input
L2 Address (L2ADDR) Output
L2 Data (L2DATA[0-63])
L2 Data (L2DATA[0-63]) Output
L2 Data (L2DATA[0-63]) Input
L2 Data Parity (L2DP[0-7])
L2 Data Parity (L2DP[0-7]) Output
L2 Data Parity (L2DP[0-7]) Input
L2 Chip Enable
L2 Write Enable (L2WE) Output
L2 Clock Out A (L2CLK_OUTA) Output
L2 Clock Out B (L2CLK_OUTB) Output
L2 Sync Out (L2SYNC_OUT) Output
L2 Sync In (L2SYNC_IN) Input
L2 Low-Power Mode Enable (L2ZZ) Output
IEEE 1149.1a-1993 Interface Description
Clock Signals
System Clock (SYSCLK) Input
Clock Out (CLK_OUT) Output
PLL Configuration (PLL_CFG[0-3]) Input
Power and Ground Signals
Chapter 8 Bus Interface Operation
432. nonglobal on the bus.
1 Instruction fetches reflect the M bit from the WIM settings.

Store gathering enable.
0 Store gathering is disabled.
1 Integer store gathering is performed for write-through to nonguarded space or for cache-inhibited stores to nonguarded space for 4-byte, word-aligned stores. The LSU combines stores to form a double word that is sent out on the 60x bus as a single-beat operation. Stores are gathered only if successive, eligible stores are queued and pending. Store gathering is performed regardless of address order or endian mode.

DCFA: Data cache flush assist. Force data cache to ignore invalid sets on miss replacement selection.
0 The data cache flush assist facility is disabled.
1 The miss replacement algorithm ignores invalid entries and follows the replacement sequence defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions to eight per set. The bit should be set just before beginning a cache flush routine and should be cleared when the series of instructions is complete.

New entries cannot be added until the BTIC is enabled.
1 The BTIC is enabled and new entries can be added.

Not used. Defined as FBIOB on earlier 603-type processors.

Address broadcast enable: controls whether certain address-only operations (such as cache operations, eieio, and sync) are broadcast on the 60x bus.
0 Address-only operations affect only local L1 and L2 caches and are not broadcast
433. Contents (continued)
Floating-Point Load and Store Address Generation
Floating-Point Store Instructions
Branch and Flow Control Instructions
Branch Instruction Address Calculation
Branch Instructions
Condition Register Logical Instructions
Trap Instructions
System Linkage Instruction (UISA)
Processor Control Instructions (UISA)
Move to/from Condition Register Instructions
Move to/from Special-Purpose Register Instructions (UISA)
Memory Synchronization Instructions (UISA)
PowerPC VEA Instructions
Processor Control Instructions (VEA)
Memory Synchronization Instructions (VEA)
Memory Control Instructions (VEA)
434. ns
• ecowx instructions
• A store that occurs during a table search operation
• Floating-point store operations
If store gathering is enabled and the stores do not fall under the above categories, an eieio or sync instruction must be used to prevent two stores from being gathered.

6.4.8 System Register Unit Execution Timing
Most instructions executed by the SRU either directly access renamed registers or access or modify nonrenamed registers. They generally execute in a serial manner. Results from these instructions are not available to subsequent instructions until the instruction completes and is retired. See Section 6.3.3.2, "Instruction Serialization," for more information on serializing instructions executed by the SRU, and refer to Table 6-4 and Table 6-5 for SRU instruction execution timings.

6.5 Memory Performance Considerations
Because the 750 can have a maximum instruction throughput of three instructions per clock cycle, lack of memory bandwidth can affect performance. For the 750 to maximize performance, it must be able to read and write data efficiently. If a system has multiple bus devices, one of them may experience long memory latencies while another bus master (for example, a direct memory access controller) is using the external bus.

6.5.1 Caching and Memory Coherency
To minimize the effect of bus contention, the PowerPC architecture defines WIM bits that are used to configure memory regions as caching-enforced or cac
435. ns for identifying when an lwarx instruction does generate a bus transaction. If an implementation requires that all lwarx instructions generate bus transactions, then the associated pages should be marked as caching-inhibited.

The state of the reservation is always presented onto the RSRV output signal. This can be used to determine when an internal condition has caused a change in the reservation state.

The 750's data cache treats all stwcx. operations as write-through, independent of the WIMG settings. However, if the stwcx. operation hits in the 750's L2 cache, then the operation completes with the reservation intact in the L2 cache. See Chapter 9, "L2 Cache Interface Operation," for more information. Otherwise, the stwcx. operation continues to the bus interface unit for completion. When the write-through operation completes successfully, either in the L2 cache or on the 60x bus, then the data cache entry is updated (assuming it hits), and CR0[EQ] is modified to reflect the success of the operation. If the reservation is not intact, the stwcx. completes in the bus interface unit without performing a bus transaction, and without modifying either of the caches.

3.4 Cache Control
The 750's L1 caches are controlled by programming specific bits in the HID0 special-purpose register and by issuing dedicated cache control instructions. Section 3.4.1, "Cache Control Param
436. nstructions (VEA)," describes user-level memory control instructions.

2.3.6.3.1 Supervisor-Level Cache Management Instruction (OEA)
Table 2-57 lists the only supervisor-level cache management instruction.

Table 2-57. Supervisor-Level Cache Management Instruction
Data Cache Block Invalidate (dcbi), operands rA,rB: The EA is computed, translated, and checked for protection violations. For cache hits, the cache block is marked I regardless of whether it was marked E or M. A dcbi is not broadcast unless HID0[ABE] = 1, regardless of WIMG settings. The instruction acts like a store with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked.
The exception priorities (from highest to lowest) for dcbi are as follows:
1 BAT protection violation: DSI exception
2 TLB protection violation: DSI exception

See Section 2.3.5.3.1, "User-Level Cache Instructions (VEA)," for cache instructions that provide user-level programs the ability to manage the on-chip caches. If the effective address references a direct-store segment, the instruction is treated as a no-op.

2.3.6.3.2 Segment Register Manipulation Instructions (OEA)
The instructions listed in Table 2-58 provide access to the segment registers for 32-bit implementations. These instructions operate completely independently of the MSR[IR] and MSR[DR] bit settings. Refer to Syn
437. nt. Since the 750 can pipeline transactions, there may be an outstanding data bus transaction when a new address transaction is retried. In this case, the 750 becomes the data bus master to complete the previous transaction.

8.4.1.1 Using the DBB Signal
The DBB signal should be connected between masters if data tenure scheduling is left to the masters. Optionally, the memory system can control data tenure scheduling directly with DBG. However, it is possible to ignore the DBB signal in the system if the DBB input is not used as the final data bus allocation control between data bus masters, and if the memory system can track the start and end of the data tenure. If DBB is not used to signal the end of a data tenure, DBG is only asserted to the next bus master the cycle before the cycle that the next bus master may actually begin its data tenure, rather than asserting it earlier (usually during another master's data tenure) and allowing the negation of DBB to be the final gating signal for a qualified data bus grant. Even if DBB is ignored in the system, the 750 always recognizes its own assertion of DBB, and requires one cycle after data tenure completion to negate its own DBB before recognizing a qualified data bus grant for another data tenure. If DBB is ignored in the system, it must still be connected to a pull-up resistor on the 750 to ensure proper operation.
438. Index (continued)
nt and performance
Operating environment architecture (OEA)
Operations: bus operations caused by cache control instructions; cache operations; data cache block push; enveloped high-priority cache block push; instruction cache block fill; read operation; response to snooped bus transactions; single-beat write operations
Optional instructions
Overview
P
Page address translation: definition; page address translation flow; page size; selection of page address translation; TLB organization
Page history status: cases of dcbt and dcbtst misses; R and C bit recording
Page table updates
Performance monitor: event counting; event selecting; performance monitor interrupt; performance monitor SPRs; purposes; registers; warnings
Phase-locked loop
Physical address generation
Pipeline: instruction timing, definition; pipeline stages; pipelined execution unit; superscalar pipeline diagram
PMC1 and PMC2 registers
PMCn (performance monitor counter) registers
Power and ground signals
Power management: doze mode; doze, nap, sleep, DPM bits; dynamic power management; full-power mode; nap mode; programmable power mode
439. nternal processor clock frequency. Settings are based on the desired bus and internal frequency of operation.
Timing Comments
Assertion/Negation: Must remain stable during operation; should only be changed during the assertion of HRESET or during sleep mode. These bits may be read through the PC[0-3] bits in the HID1 register.

7.2.12 Power and Ground Signals
The 750 provides the following connections for power and ground:
• VDD: The VDD signals provide the supply voltage connection for the processor core.
• OVDD: The OVDD signals provide the supply voltage connection for the system interface drivers.
• L2VDD: The L2VDD signals provide the supply voltage connection for the L2 cache interface drivers. These power supply signals are isolated from the VDD and OVDD power supply signals. These signals are not implemented on the 740.
• AVDD: The AVDD power signal provides power to the clock generation phase-locked loop. See the 750 hardware specifications for information on how to use this signal.
• L2AVDD: The L2AVDD power signal provides power to the L2 delay-locked loop. See the 750 hardware specifications for information on how to use this signal. This signal is not implemented on the 740.
• GND and OGND: The GND and OGND signals provide the connection for grounding the 750. On the 750, there is no electrical distinction between the GND and OGND signals.
• L2GND: The L2GND signals provide the ground connection for the L2 cache int
440. nto the L2 cache, it generates an address for each access. Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used only for L2 clock modes divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed).
00 Flow-through (register-buffer) synchronous burst SRAM
01 Reserved
10 Pipelined (register-register) synchronous burst SRAM
11 Pipelined (register-register) synchronous late-write SRAM

L2 data only. Setting this bit enables data-only operation in the L2 cache. For this operation, only transactions from the L1 data cache can be cached in the L2 cache, which treats all transactions from the L1 instruction cache as cache-inhibited (bypass L2 cache, no L2 checking done). This bit is provided for L2 testing only.

L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 bits, including status bits. This bit must not be set while the L2 cache is enabled.

Table 2-18. L2CR Bit Settings (Continued)
11 L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function. While L2CTL is asserted, L2ZZ asserts automatically when the 750 enters nap or sleep mode and negates automatically when the 750 exits nap or sleep mode. This bit should not be set when the 750 is in nap mode and snooping is to be performed through deassertion of QACK. Additionally the relati
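The RAM-type encodings and the clock restriction for flow-through parts can be expressed as a small validity check. This Python sketch is illustrative only: the names are hypothetical, and representing the L2 clock mode as a numeric divider (1, 1.5, 2, and so on) is an assumption made for the example.

```python
# Validity check for L2 RAM type versus clock divider (illustrative sketch).

L2_RAM_TYPES = {
    0b00: "flow-through (register-buffer) synchronous burst SRAM",
    0b10: "pipelined (register-register) synchronous burst SRAM",
    0b11: "pipelined (register-register) synchronous late-write SRAM",
}  # encoding 0b01 is reserved, so it is deliberately absent


def is_valid_l2_config(ram_type: int, clock_divider: float) -> bool:
    """Reject the reserved encoding and flow-through parts faster than divide-by-2."""
    if ram_type not in L2_RAM_TYPES:
        return False
    if ram_type == 0b00 and clock_divider < 2:
        return False  # divide-by-1 and divide-by-1.5 are not allowed
    return True
```

The check covers only the two rules quoted in the table; the set of dividers the L2 clock actually supports is not modeled.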
441. 8.5 Timing Examples
This section shows timing diagrams for various scenarios. Figure 8-16 illustrates the fastest single-beat reads possible for the 750. This figure shows both minimal latency and maximum single-beat throughput. By delaying the data bus tenure, the latency increases, but, because of split-transaction pipelining, the overall throughput is not affected unless the data bus latency causes the third address tenure to be delayed. Note that all bidirectional signals are three-stated between bus tenures.

[Figure 8-16. Fastest Single-Beat Reads]

Figure 8-17 illustrates the fastest single-beat writes supported by the 750. All bidirectional signals are three-stated between bus tenures.

[Figure 8-17. Fastest Single-Beat Writes]

Figure 8-18 shows three ways to delay single-beat reads, showing data-delay controls:
• The TA signal can remain negated to insert wait states in clock cycles 3 and 4.
• For the second access, DBG cou
442. o be changed to run on the 750.

3.4.1.5 Instruction Cache Enabling/Disabling
The instruction cache may be enabled or disabled through the use of the instruction cache enable bit, HID0[ICE]. HID0[ICE] is cleared on power-up, disabling the instruction cache. When the instruction cache is in the disabled state (HID0[ICE] = 0), the cache tag state bits are ignored, and all instruction fetches are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of HID0[ICE]. Also note that disabling the instruction cache does not affect the translation logic; translation for instruction accesses is controlled by MSR[IR].

The setting of the ICE bit must be preceded by an isync instruction to prevent the cache from being enabled or disabled in the middle of an instruction fetch. In addition, the cache must be globally flushed before it is disabled to prevent coherency problems when it is re-enabled. The icbi instruction is not affected by disabling the instruction cache.

3.4.1.6 Instruction Cache Locking
The contents of the instruction cache can be locked by setting the instruction cache lock bit, HID0[ILOCK]. An instruction fetch that hits in a locked instruction cache is serviced by the cache. However, all accesses that miss in the locked c
443. o selected for the L2 clock is dependent on the frequency supported by the external SRAMs, the 750's internal frequency of operation, and the range of phase adjustment supported by the L2 DLL. Refer to the 750 hardware specifications for additional information about L2 clock configuration.

9.1.7 L2 Cache SRAM Timing Examples
This section describes the signal timing for the three types of SRAM (flow-through burst SRAM, pipelined burst SRAM, and late-write SRAM) supported by the 750's L2 cache interface. The timing diagrams illustrate the best-case logical (ideal, non-AC-timing accurate) interface operations. For proper interface operation, the designer must select SRAMs that support the signal sequencing illustrated in the timing diagrams. Designers should also note that during burst transfers into and out of the L2 cache SRAM array, an address is generated by the 750 for each data beat.

The SRAM selected for a system design is usually a function of desired system performance, L2 bus frequency, and SRAM unit cost. The following sections describe the operation of the three SRAM types supported by the 750 and the design trade-offs associated with each.

9.1.7.1 Flow-Through Burst SRAM
Flow-through burst SRAMs operate by clocking in the address and driving the data directly to the bus from the SRAM memory array. This behavior allows the flow-through burst SRAMs to provide initial read data one cycle sooner than pipelined burst SRAMs, but the fl
444. o types of accesses generated by the 750 that require address translation: instruction accesses, and data accesses to memory generated by load, store, and cache control instructions. The PowerPC architecture defines different resources for 32- and 64-bit processors; the 750 implements the 32-bit memory management model. The memory management model provides 4 Gbytes of logical address space accessible to supervisor and user programs, with a 4-Kbyte page size and 256-Mbyte segment size. BAT block sizes range from 128 Kbytes to 256 Mbytes and are software-selectable. In addition, it defines an interim 52-bit virtual address and hashed page tables for generating 32-bit physical addresses. The architecture also provides independent four-entry BAT arrays for instructions and data that maintain address translations for blocks of memory. These entries define blocks that can vary from 128 Kbytes to 256 Mbytes. The BAT arrays are maintained by system software. The PowerPC MMU and exception model support demand-paged virtual memory. Virtual memory management permits execution of programs larger than the size of physical memory; demand-paged implies that individual pages are loaded into physical memory from system memory only when they are first accessed by an executing program. The hashed page table is a variable-sized data structure that defines the mapping between virtual page numbers and physical page numbers. The page table size is a power of 2, and its star
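The address-space figures quoted above (4-Gbyte effective space, 256-Mbyte segments, 4-Kbyte pages, 52-bit virtual address) imply a fixed split of the 32-bit effective address. The following Python sketch illustrates that arithmetic only; the function names are hypothetical, and the 24-bit VSID width is an assumption derived from 52 - 28 rather than something stated in this excerpt.

```python
def split_effective_address(ea: int) -> dict:
    """Split a 32-bit effective address into segment, page index, byte offset.

    256-Mbyte segments -> 28 bits below the segment select (4 bits remain).
    4-Kbyte pages      -> 12-bit byte offset, leaving a 16-bit page index.
    """
    if not 0 <= ea <= 0xFFFFFFFF:
        raise ValueError("effective address must fit in 32 bits")
    return {
        "segment": ea >> 28,                 # selects 1 of 16 segments
        "page_index": (ea >> 12) & 0xFFFF,   # page within the segment
        "byte_offset": ea & 0xFFF,           # byte within the 4-Kbyte page
    }


def virtual_address(vsid: int, ea: int) -> int:
    """Form the interim 52-bit virtual address (assumed 24-bit VSID)."""
    parts = split_effective_address(ea)
    return ((vsid & 0xFFFFFF) << 28) | (parts["page_index"] << 12) | parts["byte_offset"]
```

For instance, effective address 0x12345678 falls in segment 1, page index 0x2345, byte offset 0x678.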
445. ocessors to translate effective addresses to virtual and then physical addresses. 5.1.6.1 Real Addressing Mode and Block Address Translation Selection When an instruction or data access is generated and the corresponding instruction or data translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode is used (physical address equals effective address), and the access continues to the memory subsystem as described in Section 5.2, "Real Addressing Mode." Figure 5-5 shows the flow the MMUs use in determining whether to select real addressing mode, block address translation, or the segment descriptor to select page address translation. [Figure 5-5. General Flow of Address Translation (Real Addressing Mode and Block). Flow: effective address generated; instruction access (MSR[IR]) or data access (MSR[DR]); translation disabled: perform real addressing mode translation; translation enabled: compare address with instruction or data BAT array, as appropriate; BAT array hit: access protected (access faulted) or translate address and continue access to memory subsystem; BAT array miss: perform address translation with segment descriptor (see Figure 5-6).]
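The selection flow described for Figure 5-5 can be summarized as a small decision function. This is a minimal sketch under the assumption that the caller already knows whether the relevant MSR bit (IR for instruction accesses, DR for data accesses) is set and whether the BAT array comparison hit; the function name and return strings are illustrative, not part of the manual.

```python
def select_translation(translation_enabled: bool, bat_hit: bool) -> str:
    """Return which translation path an access takes.

    translation_enabled: MSR[IR] for instruction accesses, MSR[DR] for data.
    bat_hit: result of comparing the address with the matching BAT array.
    """
    if not translation_enabled:
        # Real addressing mode: physical address equals effective address.
        return "real addressing mode"
    if bat_hit:
        # A BAT array hit selects block address translation.
        return "block address translation"
    # On a BAT miss the segment descriptor selects page address translation.
    return "segment descriptor"
```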
446. ock cycle. Prerequisite (dependency) event will occur on an undetermined subsequent clock cycle. 750 three-state output or input. 750 nonsampled input. Signal with sample point. A sampled condition (dot on high or low state) with multiple dependencies. Timing for a signal had it been asserted (it is not actually asserted). Figure 8-3. Timing Diagram Legend 8.2 Memory Access Protocol Memory accesses are divided into address and data tenures. Each tenure has three phases: bus arbitration, transfer, and termination. The 750 also supports address-only transactions. Note that address and data tenures can overlap, as shown in Figure 8-4. Figure 8-4 shows that the address and data tenures are distinct from one another and that both consist of three phases: arbitration, transfer, and termination. Address and data tenures are independent (indicated in Figure 8-4 by the fact that the data tenure begins before the address tenure ends), which allows split-bus transactions to be implemented at the system level in multiprocessor systems. Figure 8-4 shows a data transfer that consists of a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cache lines require data transfer termination signals for each beat of data. [Figure 8-4 labels: ADDRESS TENURE (arbitration, transfer, termination); DATA TENURE; independent address and data.]
447. of the adjusted exponent value in the following examples when the corresponding exception enable bit is one: • Underflow during multiplication using a denormalized operand • Overflow during division using a denormalized divisor 2.2.2 Data Organization in Memory and Data Transfers Bytes in memory are numbered consecutively starting with 0. Each number is the address of the corresponding byte. Memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and load/store string instructions, a sequence of bytes or words. The address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). Operand length is implicit for each instruction. 2.2.3 Alignment and Misaligned Accesses The operand of a single-register memory access instruction has an alignment boundary equal to its length. An operand's address is misaligned if it is not a multiple of its width. Operands for single-register memory access instructions have the characteristics shown in Table 2-19. Although not permitted as memory operands, quad words are shown because quad-word alignment is desirable for certain memory operands. The concept of alignment is also applied more generally to data in memory. For example, a 12-byte data item is said to be word-aligned if its address is a multiple of four. Some instructions require their memory operand
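The alignment rule above (an address is aligned when it is a multiple of the operand width) reduces to a single modulo test. The sketch below is illustrative; the dictionary and function names are hypothetical, and the byte widths are the usual PowerPC operand sizes.

```python
# Operand widths in bytes; quad word is listed only as an alignment boundary.
OPERAND_BYTES = {
    "byte": 1,
    "half word": 2,
    "word": 4,
    "double word": 8,
    "quad word": 16,
}


def is_aligned(address: int, boundary: str) -> bool:
    """True if address is a multiple of the named alignment boundary."""
    return address % OPERAND_BYTES[boundary] == 0
```

A 12-byte data item placed at address 0x100C is word-aligned, because 0x100C is a multiple of four, but it is not double-word-aligned.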
448. oherency Protocol State Diagram (WIM = 001) 3-8; PLRU Replacement 3-19; Double-Word Address Ordering, Critical Double Word First 3-23; Machine Status Save/Restore Register 0 (SRR0) 4-7; Machine Status Save/Restore Register 1 (SRR1) 4-7; Machine State Register (MSR) 4-8; SRESET Asserted During HRESET 4-14; MMU Conceptual Block Diagram, 32-Bit Implementations 5-6; PowerPC 750 Microprocessor IMMU Block Diagram 5-7; PowerPC 750 Microprocessor DMMU Block Diagram 5-8; Address Translation Types 5-10; General Flow of Address Translation (Real Addressing Mode and Block) 5-13; General Flow of Page and Direct-Store Interface Address Translation 5-15; Segment Register and DTLB Organization 5-26; Page Address Translation Flow, TLB Hit 5-29; Primary Page Table Search 5-32; Secondary Page Table Search Flow 5-33; Pipelined Execution Unit 6-4; Superscalar/Pipeline Diagram 6-5; PowerPC 750 Microprocessor Pipeline Stages 6-7; Instruction Flow Diagram 6-10; Instruction Timing, Cache Hit 6-12
449. okes the program exception trap handler. • The execution of an instruction that causes a floating-point exception while exceptions are enabled in the MSR invokes the program exception handler. A detailed description of exception conditions is provided in Chapter 4, "Exceptions." 2.3.3 Instruction Set Overview This section provides a brief overview of the PowerPC instructions implemented in the 750 and highlights any special information with respect to how the 750 implements a particular instruction. Note that the categories used in this section correspond to those used in Chapter 4, "Addressing Modes and Instruction Set Summary," in The Programming Environments Manual. These categorizations are somewhat arbitrary, are provided for the convenience of the programmer, and do not necessarily reflect the PowerPC architecture specification. Note that some instructions have the following optional features: • CR Update: The dot (.) suffix on the mnemonic enables the update of the CR. • Overflow option: The o suffix indicates that the overflow bit in the XER is enabled. 2.3.4 PowerPC UISA Instructions The PowerPC UISA includes the base user-level instruction set (excluding a few user-level cache control, synchronization, and time base instructions), user-level registers, programming model, data types, and addressing modes. This section discusses the instructions defined in the UISA. 2.3.4.1 Integer Ins
450. ollowing are the state meaning and timing comments for the ABB input signal. State Meaning: Asserted: Indicates that the address bus is in use. This condition effectively blocks the 750 from assuming address bus ownership, regardless of the BG input; see Section 8.3.1, "Address Bus Arbitration." Negated: Indicates that the address bus is not owned by another bus master and that it is available to the 750 when accompanied by a qualified bus grant. Timing Comments: Assertion: May occur when the 750 must be kept from using the address bus (and the processor is not currently asserting ABB). Negation: May occur whenever the 750 can use the address bus. 7.2.2 Address Transfer Start Signals Address transfer start signals are input and output signals that indicate that an address bus transfer has begun. The transfer start (TS) signal identifies the operation as a memory transaction. For detailed information about how TS interacts with other signals, refer to Section 8.3.2, "Address Transfer." 7.2.2.1 Transfer Start (TS) The TS signal is both an input and an output signal on the 750. 7.2.2.1.1 Transfer Start (TS) Output Following are the state meaning and timing comments for the TS output signal. State Meaning: Asserted: Indicates that the 750 has begun a memory bus transaction and that the address bus and transfer attribute signals are valid. When asserted with the appropriate TT[0–4]
451. ollowing cycle. The TEA signal is used to signal a nonrecoverable error during the data transaction. It may be asserted on any cycle during DBB, or on the cycle after a qualified TA during a read operation, except when no-DRTRY mode is selected (where no-DRTRY mode cancels checking the cycle after TA). The assertion of TEA terminates the data tenure immediately, even if in the middle of a burst; however, it does not prevent incorrect data that has just been acknowledged with TA from being written into the 750's cache or GPRs. The assertion of TEA initiates either a machine check exception or a checkstop condition, based on the setting of the MSR[ME] bit. An assertion of ARTRY causes the data tenure to be terminated immediately if the ARTRY is for the address tenure associated with the data tenure in operation. If ARTRY is connected for the 750, the earliest allowable assertion of TA to the 750 is directly dependent on the earliest possible assertion of ARTRY to the 750; see Section 8.3.3, "Address Transfer Termination." 8.4.4.1 Normal Single-Beat Termination Normal termination of a single-beat data read operation occurs when TA is asserted by a responding slave. The TEA and DRTRY signals must remain negated during the transfer (see Figure 8-10). Figure 8-10. Normal Single-Beat Read Termination The DRTRY signal is not sampled during data writes, as sho
452. ompleted to a point where they can no longer cause an exception. If a prior memory access instruction causes direct-store error exceptions, the results are guaranteed to be determined before this instruction is executed. • Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued. • The instructions following the sc or rfi instruction execute in the context established by these instructions. 2.3.2.4.2 Execution Synchronization An instruction is execution synchronizing if all previously initiated instructions appear to have completed before the instruction is initiated or, in the case of sync and isync, before the instruction completes. For example, the Move to Machine State Register (mtmsr) instruction is execution synchronizing. It ensures that all preceding instructions have completed execution and cannot cause an exception before the instruction executes, but does not ensure subsequent instructions execute in the newly established environment. For example, if the mtmsr sets the MSR[PR] bit, unless an isync immediately follows the mtmsr instruction, a privileged instruction could be executed or privileged access could be performed without causing an exception even though the MSR[PR] bit indicates user mode. 2.3.2.4.3 Instruction-Related Exceptions There are two kinds of exceptions in the 750: those
453. on; however, some events, such as dispatch and write-back, happen instantaneously and may be thought to occur at the end of the stage. An instruction can spend multiple cycles in one stage. An integer multiply, for example, takes multiple cycles in the execute stage. When this occurs, subsequent instructions may stall. In some cases, an instruction may also occupy more than one stage simultaneously, especially in the sense that a stage can be seen as a physical resource; for example, when instructions are dispatched they are assigned a place in the completion queue at the same time they are passed to the execute stage. They can be said to occupy both the complete and execute stages in the same clock cycle. • Stall: An occurrence when an instruction cannot proceed to the next stage. • Superscalar: A superscalar processor is one that can issue multiple instructions concurrently from a conventional linear instruction stream. In a superscalar implementation, multiple instructions can be in the execute stage at the same time. • Throughput: A measure of the number of instructions that are processed per cycle. For example, a series of double-precision floating-point multiply instructions has a throughput of one instruction per clock cycle. • Write-back: Write-back (in the context of instruction handling) occurs when a result is written into the architectural registers (typicall
454. on the 60x bus. The current bus master, detecting the assertion of the ARTRY signal, should abort the transaction and retry it at a later time, so that the 750 can first perform a write operation back to memory from its cache or memory queues. The 750 may also retry a bus transaction if it is unable to snoop the transaction on that cycle due to internal resource conflicts. Additional snoop action may be forwarded to the cache as a result of a snoop hit in some cases (a cache push of modified data or a cache block invalidation). There is no immediate way for another CPU bus agent to determine the cause of the 750 ARTRY. Implementation Note: Snooping of the memory queues for pipeline collisions, as described above, is performed for burst read operations in progress only. In this case, the read address has completed on the bus; however, the data tenure may be either in progress or not yet started by the processor. During this time the 750 will retry any other global access to that line by another bus master until all data has been received in its L1 cache. Pipeline collisions, however, do not apply for burst write operations in progress. If the 750 has completed an address tenure for a burst write and is currently waiting for a data bus grant or is currently transferring data to memory, it will not generate an address retry to another bus master that addresses the line. It is the responsibility of the memory system to handle this collision (usually by kee
455. onfigured for two sectors per L2 cache block. The L2 tags are configured for four sectors per L2 cache block when 1 Mbyte of external SRAM is used. Each sector (32-byte L1 cache block) in the L2 cache has its own valid and modified bits. The L2 cache control register (L2CR) allows control of the following: • L2 cache configuration and timing • Byte-level data parity generation and checking • Global invalidation of L2 contents • Write-through operation • L2 test support The L2 cache interface provides two clock outputs that allow the clock inputs of the SRAMs to be driven at frequency divisions of 1, 1.5, 2, 2.5, and 3 of the processor core frequency. The 750's L2 cache maintains cache coherency through snooping and is normally configured to operate in copy-back mode. Figure 9-26 shows the 750 configured with a 1-Mbyte L2 cache. [Figure 9-26 residue: the 750 connected to two 128K x 36 SRAMs, each with ADDR[16–0], DATA[0–31], and PARITY[0–3]; 750 signals L2DP[0–7], L2CE, L2WE, L2ZZ (optional), L2CLK_OUTA, L2CLK_OUTB, L2SYNC_OUT, L2SYNC_IN.] Notes: For a 1-Mbyte L2, use address bits 16–0 (bit 0 is LSB). For a 512-Kbyte L2, use address bits 15–0 (bit 0 is LSB). For a 256-Kbyte L2, use address bits 14–0 (bit 0 is LSB). External clock routing should ensur
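The figure notes above tie each L2 size to an address range, and the text lists the available clock divisions. The sketch below checks that arithmetic, assuming the 64-bit (8-byte) L2 data path formed by the two 32-bit SRAM data ports in the figure; the function names and the 400 MHz core frequency in the usage note are hypothetical.

```python
from fractions import Fraction

# Core-to-L2 clock divisions listed in the text.
L2_DIVIDERS = (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(3))


def l2_clock_mhz(core_mhz: int, divider: Fraction) -> Fraction:
    """L2 clock frequency for a given core frequency and supported divider."""
    if divider not in L2_DIVIDERS:
        raise ValueError("divider must be 1, 1.5, 2, 2.5, or 3")
    return core_mhz / divider


def l2_address_bits(l2_bytes: int, bus_bytes: int = 8) -> int:
    """Number of SRAM address bits needed, assuming an 8-byte-wide data path."""
    entries = l2_bytes // bus_bytes
    return entries.bit_length() - 1   # log2 for power-of-two sizes
```

Under these assumptions a 1-Mbyte L2 needs 17 address bits (16–0), a 512-Kbyte L2 needs 16 (15–0), and a 256-Kbyte L2 needs 15 (14–0), which agrees with the notes; a hypothetical 400 MHz core with the 2.5 divider would clock the SRAMs at 160 MHz.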
456. ons. A PowerPC processor invokes the illegal instruction error handler (part of the program exception) when the unimplemented PowerPC instructions are encountered so they may be emulated in software, as required. Note that the architecture specification refers to exceptions as interrupts. A defined instruction can have invalid forms. The 750 provides limited support for instructions represented in an invalid form. 2.3.1.3 Illegal Instruction Class Illegal instructions can be grouped into the following categories: • Instructions not defined in the PowerPC architecture. The following primary opcodes are defined as illegal but may be used in future extensions to the architecture: 1, 4, 5, 6, 9, 22, 56, 57, 60, 61. Future versions of the PowerPC architecture may define any of these instructions to perform new functions. • Instructions defined in the PowerPC architecture but not implemented in a specific PowerPC implementation. For example, instructions that can be executed on 64-bit PowerPC processors are considered illegal by 32-bit processors such as the 750. The following primary opcodes are defined for 64-bit implementations only and are illegal on the 750: 2, 30, 58, 62. • All unused extended opcodes are illegal. The unused extended opcodes can be determined from information in Section A.2, "Instructions Sorted by Opcode," and Section 2.3.1.4, "Reserved Instruction Class." Notice that extend
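The two primary-opcode lists above lend themselves to a simple lookup. The sketch below assumes the standard PowerPC encoding in which the primary opcode occupies the six most significant bits of the 32-bit instruction word; the function names and return strings are illustrative, and the check deliberately ignores extended opcodes, which the text says must be looked up separately.

```python
# Primary opcodes taken from the two lists in the text.
RESERVED_PRIMARY = {1, 4, 5, 6, 9, 22, 56, 57, 60, 61}
ONLY_64BIT_PRIMARY = {2, 30, 58, 62}


def primary_opcode(instruction: int) -> int:
    """Extract the primary opcode (top 6 bits of a 32-bit instruction word)."""
    return (instruction >> 26) & 0x3F


def classify_primary(instruction: int) -> str:
    """Classify an instruction word by primary opcode only."""
    opcode = primary_opcode(instruction)
    if opcode in RESERVED_PRIMARY:
        return "illegal: not defined in the architecture"
    if opcode in ONLY_64BIT_PRIMARY:
        return "illegal: 64-bit implementations only"
    # An unused extended opcode can still make the instruction illegal.
    return "primary opcode defined"
```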
457. ons (Continued) Data cache enable: 0 The data cache is neither accessed nor updated. All pages are accessed as if they were marked cache-inhibited (WIM = X1X). Potential cache accesses from the bus (snoop and cache operations) are ignored. In the disabled state for the L1 caches, the cache tag state bits are ignored and all accesses are propagated to the L2 cache or bus as single-beat transactions. For those transactions, however, CI reflects the original state determined by address translation regardless of cache disabled status. DCE is zero at power-up. 1 The data cache is enabled. Instruction cache lock: 0 Normal operation. 1 Instruction cache is locked. A locked cache supplies data normally on a hit, but a miss is treated as a cache-inhibited transaction. On a miss, the transaction to the bus or the L2 cache is single-beat; however, CI still reflects the original state as determined by address translation, independent of cache locked or disabled status. To prevent locking during a cache access, an isync instruction must precede the setting of ILOCK. 19 Data cache lock: 0 Normal operation. 1 Data cache is locked. A locked cache supplies data normally on a hit, but a miss is treated as a cache-inhibited transaction. On a miss, the transaction to the bus or the L2 cache is single-beat; however, CI still reflects the original state as determined by address translation, independent of cache locked or disabled status. A snoop hit to a locked L1 data
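Because the register tables number bits with bit 0 as the most significant bit, converting a bit number such as 19 (data cache lock) into a mask is a common stumbling block. The sketch below shows the conversion. Only bit 19 appears in this excerpt; the positions used for ICE, DCE, and ILOCK (16, 17, 18) are assumptions based on the usual 750 HID0 layout and should be checked against the full register table. All names are illustrative.

```python
def ibm_bit_mask(bit: int, width: int = 32) -> int:
    """Mask for a bit numbered MSB-first (bit 0 is the most significant)."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


HID0_DLOCK = ibm_bit_mask(19)   # bit 19, data cache lock (from the table)
HID0_ILOCK = ibm_bit_mask(18)   # assumed position
HID0_DCE = ibm_bit_mask(17)     # assumed position
HID0_ICE = ibm_bit_mask(16)     # assumed position


def set_bits(value: int, mask: int) -> int:
    """Return value with the masked bits set."""
    return (value | mask) & 0xFFFFFFFF


def clear_bits(value: int, mask: int) -> int:
    """Return value with the masked bits cleared."""
    return value & ~mask & 0xFFFFFFFF
```

On real hardware the new value would be written with mtspr, preceded by isync where the text requires it; this sketch models only the bit arithmetic.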
458. ons in the L2 cache unit have completed. 2. Initiate the global invalidation operation by setting the L2CR[L2I] bit to 1. 3. Monitor the L2CR[L2IP] bit to determine when the global invalidation operation is completed (indicated by the clearing of L2CR[L2IP]). The global invalidation requires approximately 32K core clock cycles to complete. 4. After detecting the clearing of L2CR[L2IP], clear L2CR[L2I] and re-enable the L2 cache for normal operation by setting L2CR[L2E]. 9.1.5 L2 Cache Test Features and Methods In the course of system power-up, testing may be required to verify the proper operation of the L2 tag memory, external SRAM, and overall L2 cache system. The following sections describe the 750's features and methods for testing the L2 cache. The L2 cache address space should be marked as guarded (G = 1) so spurious load operations are not forwarded to the 60x bus interface before branch resolution during L2 cache testing. 9.1.5.1 L2CR Support for L2 Cache Testing L2CR[DO] and L2CR[TS] support the testing of the L2 cache. L2CR[DO] prevents instructions from being cached in the L2. This allows the L1 instruction cache to remain enabled during the testing process without having L1 instruction misses affect the contents of the L2 cache, and allows all L2 cache activity to be controlled by program-specified load and store operations. L2CR[TS] is used with the dcbf and dcbst instructions to p
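Steps 2 through 4 of the global invalidation sequence above form a set-poll-clear pattern. The sketch below models that pattern against a simulated register so it can be run anywhere; `SimulatedL2CR` is a hypothetical stand-in for mfspr/mtspr access, the mask values for L2E, L2I, and L2IP are assumptions that should be checked against the L2CR register description, and step 1 (truncated in this excerpt) is not modeled.

```python
L2E = 0x80000000    # assumed mask: L2 enable
L2I = 0x00200000    # assumed mask: global invalidate
L2IP = 0x00000001   # assumed mask: invalidate in progress (read-only)


class SimulatedL2CR:
    """Toy register: L2IP reads as 1 for a few polls after L2I is set."""

    def __init__(self, busy_polls: int = 3):
        self.value = 0
        self._busy_polls = busy_polls
        self._busy = 0

    def write(self, new_value: int) -> None:
        starting = bool(new_value & L2I) and not (self.value & L2I)
        self.value = new_value & ~L2IP & 0xFFFFFFFF
        if starting:
            self._busy = self._busy_polls

    def read(self) -> int:
        if self._busy > 0:
            self._busy -= 1
            return self.value | L2IP
        return self.value


def global_invalidate(l2cr, max_polls: int = 100_000) -> int:
    """Run steps 2-4; return the number of polls that saw L2IP set."""
    l2cr.write(l2cr.read() | L2I)            # step 2: start invalidation
    polls = 0
    while l2cr.read() & L2IP:                # step 3: wait for L2IP to clear
        polls += 1
        if polls > max_polls:
            raise TimeoutError("L2IP did not clear")
    l2cr.write(l2cr.read() & ~L2I)           # step 4: clear L2I ...
    l2cr.write(l2cr.read() | L2E)            # ... then re-enable the L2
    return polls
```

The poll limit is a design choice of the sketch: since the text gives only an approximate duration (about 32K core clock cycles), a bounded loop avoids hanging if the bit never clears.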
459. ons, see Appendix F, "Simplified Mnemonics," in The Programming Environments Manual. 2.3.4.1.3 Integer Logical Instructions The logical instructions shown in Table 2-23 perform bit-parallel operations on the specified operands. Logical instructions with CR updating enabled (that is, those using the dot suffix) and the instructions andi. and andis. set CR field CR0 to characterize the result of the logical operation. Logical instructions do not affect XER[SO], XER[OV], or XER[CA]. See Appendix F, "Simplified Mnemonics," in The Programming Environments Manual for simplified mnemonic examples for integer logical operations. Table 2-23. Integer Logical Instructions (rows recoverable from the extraction): OR Immediate, ori rA,rS,UIMM (the PowerPC architecture defines ori r0,r0,0 as the preferred form for the no-op instruction; the dispatcher discards this instruction, except for pending trace or breakpoint exceptions); XOR Immediate Shifted, xoris rA,rS,UIMM; AND with Complement; OR with Complement; Extend Sign Byte; Extend Sign Half Word; Count Leading Zeros Word. 2.3.4.1.4 Integer Rotate and Shift Instructions Rotation operations are performed on data from a GPR, and the result, or a portion of the result, is returned to a GPR. See Appendix F, "Simplified Mnemonics," in The Programming Environments Manual for a complete list of simplified mnemonics that allows simpler coding of often-used functions such as clearing the leftmost
460. oped as if they were writes, causing the 750 to flush the cache block (write the cache block back to memory and invalidate the cache block if it is modified, or simply invalidate the cache block if it is unmodified). The exception to this rule occurs when a snooped transaction is a caching-inhibited read (either burst or single-beat, where TT[0–4] = X1010; see Table 7-1 for clarification), in which case the 750 does not invalidate the snooped cache block. If the cache block is modified, the block is written back to memory, and the cache block is marked exclusive. If the cache block is marked exclusive, no bus action is taken, and the cache block remains in the exclusive state. This treatment of caching-inhibited reads decreases the possibility of data thrashing by allowing noncaching devices to read data without invalidating the entry from the 750's data cache. Section 3.7, "MEI State Transactions," provides a detailed list of MEI transitions for various operations and WIM bit settings. 3.3.2.1 MEI Hardware Considerations While the 750 provides the hardware required to monitor bus traffic for coherency, the 750 data cache tags are single-ported, and a simultaneous load or store and snoop access represents a resource conflict. In general, the snoop access has highest priority and is given first access to the tags. The load or store access will then occur on the clock following t
461. oper Action on Hit [Table 7-2 rows; the TT0–TT4 bit columns did not survive extraction] Clean block: address only. Kill block: address only; flush, cancel reservation. External control word write. External control word read: single-beat read. lwarx reservation set: address only. Write-with-flush: single-beat write or burst; flush, cancel reservation. Write-with-kill: single-beat write or burst; kill, cancel reservation. Read: single-beat read or burst; clean or flush. Table 7-2. PowerPC 750 Snoop Hit Response (Continued) Columns: 60x Bus Specification Command; Transaction; TT0; TT1; TT2; TT3; TT4; PowerPC 750 Bus Snooper Action on Hit. Write-with-flush-atomic: single-beat write; flush, cancel reservation. Read-atomic: single-beat read or burst; clean or flush. Read-with-intent-to-modify-atomic. 7.2.4.2 Transfer Size (TSIZ[0–2]) Output Following are the state meaning and timing comments for the transfer size (TSIZ[0–2]) output signals on the 750. State Meaning: Asserted/Negated: For memory accesses, these signals, along with TBST, indicate the data transfer size for the current bus operation, as shown in Table 7-3. Table 8-4 shows how the transfer size signals are used with the address signals for aligned transfers. Table 8-5 shows how the t
462. or (COP) unit provide a serial interface to the system for performing board-level boundary-scan interconnect tests. 7.1 Signal Configuration Figure 7-1 illustrates the 750's signal configuration, showing how the signals are grouped. A pinout showing pin numbers is included in the 750 hardware specifications. [Figure 7-1. PowerPC 750 Signal Groups: address arbitration (BR, BG, ABB); address start (TS); address bus (A[0–31], AP[0–3]); transfer attributes (TT[0–4], TBST, TSIZ[0–2], GBL, CI); address termination (AACK, ARTRY); data arbitration (DBG, DBWO, DBB); data transfer (D[0–63], DP[0–7], DBDIS); data termination (TEA); L2 cache address/data (L2ADDR[16–0], L2DATA[0–63], L2DP[0–7]); L2 cache clock/control (L2CE, L2WE, L2CLK_OUT[A–B], L2SYNC_OUT, L2SYNC_IN, L2ZZ); interrupts and resets (INT, SMI, MCP, SRESET, HRESET, CKSTP_IN, CKSTP_OUT); processor status/control (RSRV, TBEN, TLBISYNC); clock control (SYSCLK, PLL_CFG[0–3], CLK_OUT); JTAG/COP test interface; factory test. Some signals are marked as not supported in the 740.]
463. or a 4-byte word to be aligned, it must be oriented on an address that is a multiple of 4. Table 8-4. Aligned Data Transfers [table body not recoverable from extraction; columns include transfer size (byte, half word, ...) and data bus byte lane(s)] Notes: These entries indicate the byte portions of the requested operand that are read or written during that bus transaction. These entries are not required and are ignored during read transactions and are driven with undefined data during all write transactions. The 750 supports misaligned memory operations, although their use may substantially degrade performance. Misaligned memory transfers address memory that is not aligned to the size of the data being transferred (such as a word read of an odd byte address). Although most of these operations hit in the primary cache (or generate burst memory operations if they miss), the 750 interface supports misaligned transfers within a word (32-bit aligned) boundary, as shown in Table 8-5. Note that the 4-byte transfer in Table 8-5 is only one example of misalignment. As long as the attempted transfer does not cross a word boundary, the 750 can transfer the data on the misaligned address (for example, a half-word read from an odd byte-aligned address). An attempt to address data that crosses a word boundary requires two bus transfers to access the data. Due to the performance degradations associated with misaligned memory operations, they are best avoide
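The word-boundary rule above can be expressed as a small calculation. This is a simplified sketch for operands of one to four bytes only: it counts bus transfers purely from the address and size, and it does not model cache hits or burst operations, which the text says handle most of these accesses. The function name is hypothetical.

```python
def bus_transfers(address: int, size: int) -> int:
    """Bus transfers needed for an operand of 1-4 bytes at a byte address.

    One transfer if the operand stays inside a 32-bit aligned word,
    two transfers if it crosses a word boundary.
    """
    if size not in (1, 2, 3, 4):
        raise ValueError("this sketch covers operands of 1 to 4 bytes")
    offset_in_word = address % 4
    return 1 if offset_in_word + size <= 4 else 2
```

A half-word read at an odd address such as 0x1001 stays inside one word and needs a single transfer, whereas a half-word read at 0x1003 straddles two words and needs two.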
464. or rightmost bits of a register, left-justifying or right-justifying an arbitrary field, and simple rotates and shifts. Integer rotate instructions rotate the contents of a register. The result of the rotation is either inserted into the target register under control of a mask (if a mask bit is 1, the associated bit of the rotated data is placed into the target register, and if the mask bit is 0, the associated bit in the target register is unchanged), or ANDed with a mask before being placed into the target register. The integer rotate instructions are summarized in Table 2-24. Table 2-24. Integer Rotate Instructions: Rotate Left Word Immediate then AND with Mask, rlwinm (rlwinm.) rA,rS,SH,MB,ME; Rotate Left Word then AND with Mask, rlwnm (rlwnm.) rA,rS,rB,MB,ME; Rotate Left Word Immediate then Mask Insert, rlwimi (rlwimi.) rA,rS,SH,MB,ME. The integer shift instructions perform left and right shifts. Immediate-form logical (unsigned) shift operations are obtained by specifying masks and shift values for certain rotate instructions. Simplified mnemonics (shown in Appendix F, "Simplified Mnemonics," in The Programming Environments Manual) are provided to make coding of such shifts simpler and easier to understand. Multiple-precision shifts can be programmed as shown in Appendix C, "Multiple-Precision Shifts," in The Programming Env
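The two rotate behaviors described above (AND with a mask, or insert under a mask) can be modeled in a few lines. This Python sketch follows the usual PowerPC convention in which MB and ME are MSB-first bit numbers and the mask wraps around when MB is greater than ME; the function names mirror the mnemonics but are otherwise illustrative, and CR0 updates for the dot forms are not modeled.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by shift (mod 32) bits."""
    shift &= 31
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def mask32(mb: int, me: int) -> int:
    """Ones from bit MB through bit ME (bit 0 is the MSB), wrapping if MB > ME."""
    from_mb = MASK32 >> mb
    through_me = (MASK32 << (31 - me)) & MASK32
    return (from_mb & through_me) if mb <= me else (from_mb | through_me)


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate left word immediate, then AND with mask."""
    return rotl32(rs, sh) & mask32(mb, me)


def rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate left word immediate, then insert under mask into ra."""
    m = mask32(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)
```

This also shows how an immediate-form logical shift falls out of a rotate: a left shift by n is rlwinm with SH = n, MB = 0, ME = 31 - n, and extracting the most significant byte is rlwinm with SH = 8, MB = 24, ME = 31.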
465. or will receive when executing a load operation (that is, of course, until it is changed again). With respect to the 750, caching-allowed (WIMG = x0xx) loads and caching-allowed, write-back (WIMG = 00xx) stores are performed when they have arbitrated to address the cache block. Note that in the event of a cache miss, these storage operations may place a memory request into the processor's memory queue, but such operations are considered an extension to the state of the cache with respect to snooping bus operations. Caching-inhibited (WIMG = x1xx) loads, caching-inhibited (WIMG = x1xx) stores, and write-through (WIMG = 1xxx) stores are performed when they have been successfully presented to the external 60x bus. 3.3.5.2 Sequential Consistency of Memory Accesses The PowerPC architecture requires that all memory operations executed by a single processor be sequentially consistent with respect to that processor. This means that all memory accesses appear to be executed in program order with respect to exceptions and data dependencies. The 750 achieves sequential consistency by operating a single pipeline to the cache/MMU. All memory accesses are presented to the MMU in exact program order, and therefore exceptions are determined in order. Loads are allowed to bypass stores once exception checking has been performed for the store, but data dependency checking is handled in the load/store unit so that a load will not bypass a store with an address match. Not
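The WIMG patterns above determine the point at which a load or store counts as performed. The following sketch encodes just that rule using the W and I bits; the function name, argument names, and return strings are hypothetical, and the M and G bits are omitted because the quoted patterns do not depend on them.

```python
def performed_at(kind: str, w: int, i: int) -> str:
    """Where a memory access is considered performed, per its W and I bits.

    kind: "load" or "store"; w: write-through bit; i: caching-inhibited bit.
    """
    if kind not in ("load", "store"):
        raise ValueError("kind must be 'load' or 'store'")
    if i:
        # Caching-inhibited loads and stores (WIMG = x1xx).
        return "presented to the 60x bus"
    if kind == "store" and w:
        # Write-through stores (WIMG = 1xxx).
        return "presented to the 60x bus"
    # Caching-allowed loads (x0xx) and write-back stores (00xx).
    return "arbitrated to the cache block"
```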
466. order access, a hardware table search operation begins if there is a TLB miss. If the access is out of order, the table search operation is postponed until the access is required, at which point the access is no longer out of order. When the matching PTE is found in memory, it is loaded into the TLB entry selected by the least-recently-used (LRU) replacement algorithm, and the translation process begins again, this time with a TLB hit. To uniquely identify a TLB entry as the required PTE, the PTE also contains four more bits of the page index, EA[10–13], in addition to the API bits of the PTE. Software cannot access the TLB arrays directly, except to invalidate an entry with the tlbie instruction. Each set of TLB entries has one associated LRU bit. The LRU bit for a set is updated any time either entry is used, even if the access is speculative. Invalid entries are always the first to be replaced. Although both MMUs can be accessed simultaneously (both sets of segment registers and TLBs can be accessed in the same clock), only one exception condition can be reported at a time. ITLB miss exceptions are reported when there are no more instructions to be dispatched or retired (the pipeline is empty), and DTLB miss conditions are reported when the load or store instruction is ready to be retired. Refer to Chapter 6, "Instruction Timing," for more detailed information about the
467. ore to a direct store segment causes a DSI exception rather than an alignment exception, as specified by the PowerPC architecture. The 750 also implements the data address breakpoint facility, which is defined as optional in the PowerPC architecture and is supported by the optional data address breakpoint register (DABR). Although the architecture does not strictly prescribe how this facility must be implemented, the 750 follows the recommendations provided by the architecture and described in Chapter 2, "Programming Model," and Chapter 6, "Exceptions," in The Programming Environments Manual.
4.5.4 ISI Exception (0x00400). An ISI exception occurs when no higher-priority exception exists and an attempt to fetch the next instruction fails. This exception is implemented as it is defined by the PowerPC architecture (OEA) and is taken for the following conditions:
• The effective address cannot be translated.
• The fetch access is to a no-execute segment (SR[N] = 1).
• The fetch access is to guarded storage and MSR[IR] = 1.
• The fetch access is to a segment for which SR[T] is set.
• The fetch access violates memory protection.
When an ISI exception is taken, instruction fetching resumes at offset 0x00400 from the physical base address indicated by MSR[IP].
4.5.5 External Interrupt Exception (0x00500). An external interrupt is signaled to the processor by the assertion of the external interrupt signal
468. ounters and the control registers are supervisor-level SPRs; however, in the 750, the contents of these registers can be read by user-level software using separate SPRs (UMMCR0 and UMMCR1). Control fields in MMCR0 and MMCR1 select the events to be counted, can enable a counter overflow to initiate a performance monitor exception, and specify the conditions under which counting is enabled. As with other PowerPC exceptions, the performance monitor interrupt follows the normal PowerPC exception model with a defined exception vector offset (0x00F00). Its priority is below the external interrupt and above the decrementer interrupt. 11.1 Performance Monitor Interrupt. The performance monitor provides the ability to generate a performance monitor interrupt triggered by a counter overflow condition in one of the performance monitor counter registers (PMC1-PMC4), shown in Figure 11-3. A counter is considered to have overflowed when its most-significant bit is set. A performance monitor interrupt may also be caused by the flipping from 0 to 1 of certain bits in the time base register, which provides a way to generate a time-reference-based interrupt. Although the interrupt signal condition may occur with MSR[EE] = 0, the actual exception cannot be taken until MSR[EE] = 1. When a performance monitor exception is taken, the action taken depends on the programmable events, as follows: To help track which part of the code was being executed whe
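Because excerpt 468 defines overflow as the most-significant bit of a 32-bit counter being set, a counter can be preloaded so that it overflows after a chosen number of events. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here touches real SPRs.

```python
PMC_MSB = 0x8000_0000  # most-significant bit of a 32-bit counter


def pmc_overflowed(value: int) -> bool:
    """A counter is considered overflowed when its most-significant bit is set."""
    return bool(value & PMC_MSB)


def preload_for(events: int) -> int:
    """Hypothetical helper: initial value that overflows after `events` counts."""
    if not 0 < events <= PMC_MSB:
        raise ValueError("events must be in 1..2**31")
    return PMC_MSB - events
```

A counter loaded with `preload_for(1000)` is not yet overflowed; after 1000 increments its most-significant bit becomes set.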
469. overable exception remains pending until the instruction completes. Further instruction completion is halted. The asynchronous, maskable, recoverable exception is taken when a recoverable state is reached.
• Instruction-related exceptions. These exceptions are further organized by the point in instruction processing at which they generate an exception.
  - Instruction fetch (ISI exceptions). Once this type of exception is detected, dispatching stops and the current instruction stream is allowed to drain out of the machine. If completing any of the instructions in this stream causes an exception, that exception is taken and the instruction fetch exception is discarded, but may be encountered again when instruction processing resumes. Otherwise, once all pending instructions have executed and a recoverable state is reached, the ISI exception is taken.
  - Instruction dispatch/execution (program, DSI, alignment, floating-point unavailable, system call, and instruction address breakpoint). This type of exception is determined during dispatch or execution of an instruction. The exception remains pending until all instructions before the exception-causing instruction in program order complete. The exception is then taken without completing the exception-causing instruction. If completing these previous instructions causes an exception, that exception takes priority over the pending instruction dispatch/execution exception, which is then discarded bu
470. ow through burst SRAM frequencies available may only support the slowest L2 bus frequencies. The 750 supports flow-through burst SRAM at L2 clock ratios of 2, 2.5, and 3. Figure 9-27 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
Figure 9-27. Burst Read-Write-Read L2 Cache Access (Flow-Through). Signals shown: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Note: The figure marks where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-28 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
Figure 9-28. Burst Read-Modify-Write L2 Cache Access (Flow-Through). Signals shown: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Note: The figure marks where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-29 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
471. Table A-10. Floating-Point Rounding and Conversion Instructions (name, then the fields in bits 0-31)
fcfidx 63 D 00000 B 846 Rc
fctidx 63 D 00000 B 814 Rc
fctidzx 63 D 00000 B 815 Rc
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
frspx 63 D 00000 B 12 Rc
Note: 1. 64-bit instruction
Table A-11. Floating-Point Compare Instructions (name, then the fields in bits 0-31)
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0
Table A-12. Floating-Point Status and Control Register Instructions (name, then the fields in bits 0-31)
mcrfs 63 00000 64 0
mffsx 63 D 00000 00000 583 Rc
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfx 63 0 FM 0 B 711 Rc
mtfsfix 63 crfD 00 00000 IMM 134 Rc
Table A-13. Integer Load Instructions (name, then the fields in bits 0-31)
lbz 34 D A d
lbzu 35 D A d
lbzx 31 D A B 87 0
ld (note 1) 58 D A ds 0
ldu 58 D A ds 1
ldux 31 D A B 53 0
lha 42 D A d
lhau 43 D A d
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhz 40 D A d
lhzu 41 D A d
l
472. p revisions 3.0 and later). L2DRO: L2 DLL rollover checkstop enable (for chip revisions 3.0 and later). Bit 31, L2IP: L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring.
9.1.3 L2 Cache Initialization. Following a power-on or hard reset, the L2 cache and the L2 DLL are disabled initially. Before enabling the L2 cache, the L2 DLL must first be configured through the L2CR register, and the DLL must be allowed 640 L2 clock periods to achieve phase lock. Before enabling the L2 cache, other configuration parameters must be set in the L2CR, and the L2 tags must be globally invalidated. The L2 cache should be initialized during system start-up. The sequence for initializing the L2 cache is as follows:
1. Power-on reset (automatically performed by the assertion of the HRESET signal).
2. Disable interrupts and dynamic power management (DPM).
3. Disable the L2 cache by clearing L2CR[L2E].
4. Set the L2CR[L2CLK] bits to the desired clock divider setting. Setting a nonzero value automatically enables the DLL. All other L2 cache configuration bits should be set to properly configure the L2 cache interface for the SRAM type, size, and interface timing required.
5. Wait for the L2 DLL to achieve phase lock. This can be timed by setting the decrementer for a time period equal to 640 L2 clocks, or by performing an L2 global invalidate.
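Step 5 above asks software to wait 640 L2 clock periods. As a rough illustration of the arithmetic, the sketch below converts that interval into processor clocks and into time. It assumes the L2 clock equals the processor clock divided by the L2CLK ratio; the function names and the example frequency are hypothetical, and converting further into decrementer ticks depends on the time base rate, which is not covered here.

```python
import math
from fractions import Fraction

DLL_LOCK_L2_CLOCKS = 640  # from the initialization text above


def dll_lock_core_clocks(l2_ratio: str) -> int:
    """Processor clocks spanning 640 L2 clocks, assuming L2 clock = core clock / ratio."""
    return math.ceil(DLL_LOCK_L2_CLOCKS * Fraction(l2_ratio))


def dll_lock_microseconds(core_mhz: float, l2_ratio: str) -> float:
    """Lock interval in microseconds for a given core frequency in MHz."""
    return dll_lock_core_clocks(l2_ratio) / core_mhz
```

With a ratio of 2.5, the wait spans 1600 processor clocks; at an assumed 400 MHz core clock that is 4 microseconds.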
473. pecially in a multiprocessing system. Memory hierarchy behavior may be monitored and studied in order to develop algorithms that schedule tasks (and perhaps partition them) and that structure and distribute data optimally.
• To improve processor architecture, the detailed behavior of the PowerPC 750's structure must be known and understood in many software environments. Some environments may not be easily characterized by a benchmark or trace.
• To help system developers bring up and debug their systems.
The performance monitor uses the following 750-specific special-purpose registers (SPRs):
• The performance monitor counter registers (PMC1-PMC4) are used to record the number of times a certain event has occurred. UPMC1-UPMC4 provide user-level read access to these registers.
• The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitor interrupt functions and select events to count. UMMCR0-UMMCR1 provide user-level read access to these registers.
• The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. USIA provides user-level read access to the SIA.
Four 32-bit counters in the 750 count occurrences of software-selectable events. Two control registers (MMCR0 and MMCR1) are used to control performance monitor operation. The c
474. ped. Table 3-6 summarizes the address and transfer attribute information presented on the bus by the 750 for various master or snoop-related transactions.
Table 3-6. Address/Transfer Attribute Summary (bus transaction: address, TT[0-4])
Instruction fetch operations:
Burst (caching-allowed): PA[0-28] || 0b000, 01110
Single-beat read (caching-inhibited or cache disabled): PA[0-28] || 0b000, 01010
Cache block fill (due to load or store miss): PA[0-28] || 0b000, A1110
(normal replacement)
Push (cache block push due to dcbf/dcbst): PA[0-26] || 0b00000, 00110
Snoop copyback: CA[0-26] || 0b00000, 00110
Data cache bypass operations:
Single-beat read (caching-inhibited or cache disabled)
Single-beat write (caching-inhibited, write-through, or cache disabled)
dcbz (addr-only): 01100
dcbi (if HID0[ABE] = 1, addr-only): PA[0-26] || 0b00000, 01100
dcbf (if HID0[ABE] = 1, addr-only): PA[0-26] || 0b00000, 00100
dcbst (if HID0[ABE] = 1, addr-only): PA[0-26] || 0b00000, 00000
sync (if HID0[ABE] = 1, addr-only): 0x0000_0000, 01000
eieio (if HID0[ABE] = 1, addr-only): 0x0000_0000, 10000
stwcx. (always single-beat write): PA[0-29] || 0b00, 10010
eciwx: 11100
Notes: PA = Physical address, CA = Cac
475. perations and single-beat (noncacheable or write-through) memory read and write operations. Additionally, there can be address-only operations, variants of the burst and single-beat operations (global memory operations that are snooped and atomic memory operations, for example), and address retry activity (for example, when a snooped read access hits a modified line in the cache). Since the 750 data cache tags are single-ported, simultaneous load or store and snoop accesses cause resource contention. Snoop accesses have the highest priority and are given first access to the tags, unless the snoop access coincides with a tag write, in which case the snoop is retried and must rearbitrate for access to the cache. Loads or stores that are deferred due to snoop accesses are performed on the clock cycle following the snoop. The 750 supports a three-state coherency protocol that supports the modified, exclusive, and invalid (MEI) cache states. The protocol is a subset of the MESI (modified/exclusive/shared/invalid) four-state protocol and operates coherently in systems that contain four-state caches. With the exception of the dcbz instruction (and the dcbi, dcbst, and dcbf instructions, if HID0[ABE] is enabled), the 750 does not broadcast cache control instructions. The cache control instructions are intended for the management of the local cache but not for other caches in the system. Cache lines in the 750 are lo
476. ping the data transactions to memory in order. Note also that all burst writes by the 750 and 603e are performed as non-global and hence do not normally enable snooping (even for address collision purposes). Snooping may still occur for reservation-cancelling purposes.
3.6.4 Snoop Response to 60x Bus Transactions. There are several bus transaction types defined for the 60x bus. The transactions in Table 3-5 correspond to the transfer type signals TT[0-4], which are described in Section 7.2.4.1, "Transfer Type (TT[0-4])." The 750 never retries a transaction in which GBL is not asserted, even if the tags are busy or there is a tag hit. Reservations are snooped regardless of the state of GBL.
Table 3-5. Response to Snooped Bus Transactions
Kill block (01100): The kill block operation is an address-only bus transaction initiated when a dcbz or dcbi instruction is executed.
• If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
• If the addressed cache block is in the modified (M) state, the 750 asserts ARTRY and initiates a push of the modified block out of the cache, and the cache block is placed in the invalid (I) state.
• If the address misses in the cache, no action is taken.
Any reservation associated with the address is canceled.
EIEIO (10000): No action is taken.
External control word (10100): No action is ta
477. places resulting data into the appropriate GPR or FPR rename register. The results are then stored into the correct GPR or FPR during the write-back stage. If a subsequent instruction needs the result as a source operand, it is made available simultaneously to the appropriate execution unit, which allows a data-dependent instruction to be decoded and dispatched without waiting to read the data from the register file. Branch instructions that update either the LR or CTR write back their results in a similar fashion. The following section describes this process in greater detail. 6.3.1 General Instruction Flow. As many as four instructions can be fetched into the instruction queue (IQ) in a single clock cycle. Instructions enter the IQ and are issued to the various execution units from the dispatch queue. The 750 tries to keep the IQ full at all times, unless instruction cache throttling is operating. The number of instructions requested in a clock cycle is determined by the number of vacant spaces in the IQ during the previous clock cycle. This is shown in the examples in this chapter. Although the instruction queue can accept as many as four new instructions in a single clock cycle, if only one IQ entry is vacant, only one instruction is fetched. Typically, instructions are fetched from the on-chip instruction cache, but they may also be fetched from the branch target instruction cache (BTIC). If the instruction request hits in the BTIC, it can usuall
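The fetch rule in excerpt 477 reduces to a single bound: the request size is the number of IQ entries that were vacant in the previous cycle, capped at four. A minimal sketch, with a hypothetical function name and ignoring instruction cache throttling and BTIC behavior:

```python
MAX_FETCH_PER_CLOCK = 4  # "as many as four instructions" per clock cycle


def instructions_requested(vacant_prev_cycle: int) -> int:
    """Instructions requested this cycle, given IQ vacancies in the previous cycle."""
    return max(0, min(MAX_FETCH_PER_CLOCK, vacant_prev_cycle))
```

With one vacant entry only one instruction is requested; with five vacant entries the request is still limited to four.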
478. port the ZZ function. This bit should not be set when the 750 is in nap mode and snooping is being performed through deassertion of QACK.
L2WT: L2 write-through. Setting L2WT selects write-through mode (rather than the default copy-back mode), so all writes to the L2 cache also write through to the 60x bus.
L2TS: L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of a hit. Setting L2TS also causes single-beat store operations that miss in the L2 cache to be discarded.
Bits 14-15, L2OH: L2 output hold. These bits configure the output hold time of the address, data, and control signals driven by the 750 to the L2 data RAMs. 00 = 0.5 ns; 01 = 1.0 ns; 10 = Reserved; 11 = Reserved.
L2SL: L2 DLL slow. Setting L2SL enables L2 data RAM clocking at frequencies less than 100 MHz.
L2DF: L2 differential clock. Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock.
Bit 18, L2BYP: L2 DLL bypass. L2BYP is intended for use when the PLL is being bypassed, and for engineering evaluation.
Reserved: These bits are implemented but not used; keep at 0 for future compatibility.
L2CS: L2 clock stop (for chi
479. priate TLB, page translation succeeds and the physical address bits are forwarded to the memory subsystem. If the required translation is not resident, the MMU performs a search of the page table. If the required PTE is found, a TLB entry is allocated and the page translation is attempted again. This time, the TLB is guaranteed to hit. When the translation is located, the access is qualified with the appropriate protection bits. If the access causes a protection violation, either an ISI or DSI exception is generated. If the PTE is not found by the table search operation, a page fault condition exists, and an ISI or DSI exception occurs so software can handle the page fault.
5.1.7 MMU Exceptions Summary. To complete any memory access, the effective address must be translated to a physical address. As specified by the architecture, an MMU exception condition occurs if this translation fails for one of the following reasons:
• Page fault: there is no valid entry in the page table for the page specified by the effective address (and segment descriptor), and there is no valid BAT translation.
• An address translation is found, but the access is not allowed by the memory protection mechanism.
The translation exception conditions defined by the OEA for 32-bit implementations cause either the ISI or the DSI exception to be taken, as shown in Table 5-3. The state saved by the processor for each of these exceptions contains information that identifies the
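The page translation flow in excerpt 479 (TLB lookup, table search on a miss, TLB reload, protection check, ISI or DSI on failure) can be traced with a toy model. Everything here is illustrative: the dictionaries stand in for the TLB and page table, the `access_ok` callback stands in for the protection bits, and segment and BAT translation are left out.

```python
def translate(page, tlb, page_table, is_fetch, access_ok=lambda pte: True):
    """Toy model of the translation flow; returns ('ok', rpn) or (exception, reason)."""
    fault = "ISI" if is_fetch else "DSI"  # instruction fetch vs. data access
    pte = tlb.get(page)
    if pte is None:
        # TLB miss: search the page table
        pte = page_table.get(page)
        if pte is None:
            return (fault, "page fault")
        # PTE found: allocate a TLB entry and retry, which now hits
        tlb[page] = pte
        pte = tlb[page]
    # Qualify the access with the protection information
    if not access_ok(pte):
        return (fault, "protection violation")
    return ("ok", pte["rpn"])
```

A first access to a mapped page reloads the TLB and succeeds; an access to an unmapped page reports a page fault as ISI for fetches or DSI for data accesses.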
480. Index: Write with Flush operation, 3-27; Write with Kill operation, 3-27; WT (write-through) signal, 7-13. X: XER register, 2-3.
Contents by chapter: PowerPC 740/PowerPC 750 Overview; Processor Programming Model; L1 Instruction and Data Cache Operation; Exceptions; Memory Management; Instruction Timing; Signal Descriptions; Bus Interface Operation; L2 Cache Interface Operation; Power and Thermal Management; Performance Monitor; PowerPC Instruction Set Listings; Instructions Not Implemented; Glossary of Terms and Abbreviations; Index.
481. provided by the PowerPC OEA for PowerPC processors.
• Chapter 6, "Instruction Timing," provides information about latencies, interlocks, special situations, and various conditions to help make programming more efficient. This chapter is of special interest to software engineers and system designers.
• Chapter 7, "Signal Descriptions," provides descriptions of individual signals of the 750.
• Chapter 8, "Bus Interface Operation," describes signal timings for various operations. It also provides information for interfacing to the 750.
• Chapter 9, "L2 Cache Interface Operation," describes the implementation and use of the 750 L2 cache and cache controller. Note that this feature is not supported on the 740.
• Chapter 10, "Power and Thermal Management," provides information about power saving and thermal management modes for the 750.
• Chapter 11, "Performance Monitor," describes the operation of the performance monitor diagnostic tool incorporated in the 750.
• Appendix A, "PowerPC Instruction Set Listings," lists all the PowerPC instructions while indicating those instructions which are not implemented by the 750; it also includes the instructions which are specific to the 750. Instructions are grouped according to mnemonic, opcode, function, and form. Also included is a quick reference table that contains general information, such as the architecture level, privilege level, and form, and in
482. ptional 64-bit bridge instruction. 4. 64-bit instruction.
Table A-27. Cache Management Instructions (name, then the fields in bits 0-31)
dcba (notes 1, 3) 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi (note 2) 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
icbi 31 00000 A B 982 0
Notes: 1. Optional instruction. 2. Supervisor-level instruction. 3. 32-bit instruction not implemented by the PowerPC 750.
Table A-28. Segment Register Manipulation Instructions (name, then the fields in bits 0-31)
mfsr (notes 1, 2) 31 D 0 SR 00000 595 0
mfsrin (notes 1, 2) 31 D 00000 B 659 0
mtsr (notes 1, 2) 31 S 0 SR 00000 210 0
mtsrin (notes 1, 2) 31 S 00000 B 242 0
Notes: 1. Supervisor-level instruction. 2. Optional 64-bit bridge instruction.
Table A-29. Lookaside Buffer Management Instructions (name, then the fields in bits 0-31)
slbia (notes 2, 3) 31 00000 00000 00000 498 0
tlbsync 00000 00000 00000
Notes: 1. Supervisor-level instruction. 2. Optional instruction. 3. 64-bit instruction. 4. 32-bit instruction not implemented by the PowerPC 750.
Table A-30. External Control Instructions. Name 0 5 6 7 8 9 10 11 12 13 14 15 1
483. quired. The DRTRY signal may be held asserted for multiple bus clock cycles. When DRTRY is negated, data must have been valid on the previous clock with TA asserted. Negation: Must occur during the bus clock cycle after a valid data beat. This may occur several cycles after DBB is negated, effectively extending the data bus tenure. Start-up: The DRTRY signal is sampled at the negation of HRESET; if DRTRY is asserted, no-DRTRY mode is selected. If DRTRY is negated at start-up, DRTRY is enabled.
7.2.8.3 Transfer Error Acknowledge (TEA), Input. Following are the state meaning and timing comments for the TEA signal.
State Meaning: Asserted: Indicates that a bus error occurred. Causes a machine check exception (and possibly causes the processor to enter checkstop state if the machine check enable bit is cleared, MSR[ME] = 0). For more information, see Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)." Assertion terminates the current transaction; that is, assertion of TA and DRTRY are ignored. The assertion of TEA causes the negation/high impedance of DBB in the next clock cycle. However, data entering the GPR or the cache are not invalidated. (Note that the term "exception" is also referred to as "interrupt" in the architecture specification.) Negated: Indicates that no bus error was detected.
Timing Comments: Assertion: May be asserted while DBB is asserted, and the cycle after TA during
484. r device can invalidate that line before the 750 is granted mastership of the bus. Once the 750 is granted the bus, it no longer needs to perform the copy-back operation; therefore, the 750 does not assert ABB and does not use the bus for the copy-back operation. Note that the 750 asserts BR for at least one clock cycle in these instances. System designers should note that it is possible to ignore the ABB signal and regenerate the state of ABB locally within each device by monitoring the TS and AACK input signals. The 750 allows this operation by using both the ABB input signal and a locally regenerated version of ABB to determine if a qualified bus grant state exists (both sources are internally ORed together). The ABB signal may only be ignored if ABB and TS are asserted simultaneously by all masters, or where arbitration (through assertion of BG) is properly managed in cases where the regenerated ABB may not properly track the ABB signal on the bus. If the 750's ABB signal is ignored by the system, it must be connected to a pull-up resistor to ensure proper operation. Additionally, the 750 will not qualify a bus grant during the cycle that TS is asserted on the bus by any master. Address bus arbitration without the use of the ABB signal requires that every assertion of TS be acknowledged by an assertion of AACK while the processor is not in sleep mode. 8.3.2 Address Transfer
485. r clock cycle. Complete (Write-Back). Figure 1-6. Pipeline Diagram. Note that Figure 1-6 does not show features such as reservation stations and rename buffers that reduce stalls and improve instruction throughput. The instruction pipeline in the 750 has four major pipeline stages, described as follows:
• The fetch pipeline stage primarily involves retrieving instructions from the memory system and determining the location of the next instruction fetch. The BPU decodes branches during the fetch stage and removes those that do not update CTR or LR from the instruction stream.
• The dispatch stage is responsible for decoding the instructions supplied by the instruction fetch stage and determining which instructions can be dispatched in the current cycle. If source operands for the instruction are available, they are read from the appropriate register file or rename register to the execute pipeline stage. If a source operand is not available, dispatch provides a tag that indicates which rename register will supply the operand when it becomes available. At the end of the dispatch stage, the dispatched instructions and their operands are latched by the appropriate execution unit.
• Instructions executed by the IUs, FPU, SRU, and LSU are dispatched from the bottom two positions in the instruction queue. In a single clock cycle, a maximum of two instructions can be dispatched to
486. r enters sleep mode after several processor clocks. At this point, the system logic may turn off the PLL by first configuring PLL_CFG[0-3] to PLL bypass mode, then disabling SYSCLK.
Dynamic power management enable. 0 = Dynamic power management is disabled. 1 = Functional units enter a low-power mode automatically if the unit is idle. This does not affect operational performance and is transparent to software or any external hardware.
Bit 15, NHR: Not hard reset (software use only). Helps software distinguish a hard reset from a soft reset. 0 = A hard reset occurred if software had previously set this bit. 1 = A hard reset has not occurred. If software sets this bit after a hard reset, when a reset occurs and this bit remains set, software can tell it was a soft reset.
ICE: Instruction cache enable. 0 = The instruction cache is neither accessed nor updated. All pages are accessed as if they were marked cache-inhibited (WIM = X1X). Potential cache accesses from the bus (snoop and cache operations) are ignored. In the disabled state for the L1 caches, the cache tag state bits are ignored and all accesses are propagated to the L2 cache or bus as single-beat transactions. For those transactions, however, CI reflects the original state determined by address translation regardless of cache disabled status. ICE is zero at power-up. 1 = The instruction cache is enabled.
Table 2-4. HID0 Bit Functi
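The NHR description relies on PowerPC bit numbering, in which bit 0 is the most-significant bit of the 32-bit register. The sketch below shows how a mask for HID0[15] can be derived and how the hard/soft reset distinction follows from it. The helper names are hypothetical, and the register value is passed in as a plain integer rather than read from the SPR.

```python
def ppc_bit(n: int, width: int = 32) -> int:
    """Mask for bit n in PowerPC numbering (bit 0 is the most-significant bit)."""
    return 1 << (width - 1 - n)


HID0_NHR = ppc_bit(15)  # NHR is listed as bit 15 of HID0


def mark_not_hard_reset(hid0: int) -> int:
    """Software sets NHR once after start-up so a later soft reset can be recognized."""
    return hid0 | HID0_NHR


def classify_reset(hid0: int) -> str:
    """'soft' if NHR survived the reset, otherwise 'hard'."""
    return "soft" if hid0 & HID0_NHR else "hard"
```

If the bit is found clear in the reset handler, a hard reset occurred (or software never set the bit); if it is still set, the reset was a soft reset.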
487. r information about supervisor-level cache, segment register manipulation, and translation lookaside buffer management instructions. 2.3.5.3.1 User-Level Cache Instructions (VEA). The instructions summarized in this section help user-level programs manage on-chip caches if they are implemented. See Chapter 3, "Instruction and Data Cache Operation," for more information about cache topics. The following sections describe how these operations are treated with respect to the 750's cache. As with other memory-related instructions, the effects of cache management instructions on memory are weakly ordered. If the programmer must ensure that cache or other instructions have been performed with respect to all other processors and system mechanisms, a sync instruction must be placed after those instructions. Note that the 750 interprets cache control instructions (icbi, dcbi, dcbf, dcbz, and dcbst) as if they pertain only to the local L1 and L2 cache. A dcbz (with M set) is always broadcast on the 60x bus. The dcbi, dcbf, and dcbst operations are broadcast if HID0[ABE] is set. The 750 never broadcasts an icbi. Of the broadcast cache operations, the 750 snoops only dcbz, regardless of the HID0[ABE] setting. Any bus activity caused by other cache instructions results directly from performing the operation on the 750 cache. All cache control instructions to T = 1 space are no-ops
488. r snoop monitoring
• Pipeline collision detection for data cache buffers
• Reservation address snooping for lwarx/stwcx. instructions
• One-level address pipelining
• Load-ahead-of-store capability
A conceptual block diagram of the bus interface is shown in Figure 8-1. The address register queues in the figure hold transaction requests that the bus interface may issue on the bus independently of the other requests. The bus interface may have up to two transactions operating on the bus at any given time through the use of address pipelining.
Figure 8-1. Bus Interface Address Buffers
8.1 Bus Interface Overview. The bus interface prioritizes requests for bus operations from the instruction and data caches, and performs bus operations in accordance with the protocol described in the PowerPC Microprocessor Family: The Bus Interface for 32-Bit Microprocessors. It includes address register queues, prioritization logic, and a bus control unit. The bus interface latches snoop addresses for snooping in the data cache and in the address register queues, and for reservations controlled by the Load Word and Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.) instructions, and maintains
489. r the 750. 1.11 Thermal Management. The 750's thermal assist unit (TAU) provides a way to control heat dissipation. This ability is particularly useful in portable computers, which, due to power consumption and size limitations, cannot use desktop cooling solutions such as fans. Therefore, better heat sink designs coupled with intelligent thermal management are of critical importance for high-performance portable systems. Primarily, the thermal management system monitors and regulates the system's operating temperature. For example, if the temperature is about to exceed a set limit, the system can be made to slow down or even suspend operations temporarily in order to lower the temperature. The thermal management facility also ensures that the processor's junction temperature does not exceed the operating specification. To avoid the inaccuracies that arise from measuring junction temperature with an external thermal sensor, the 750's on-chip thermal sensor and logic tightly couple the thermal management implementation. The TAU consists of a thermal sensor, digital-to-analog converter, comparator, control logic, and the dedicated SPRs described in Section 1.4, "PowerPC Registers and Programming Model." The TAU does the following:
• Compares the junction temperature against user-programmable thresholds
• Generates a thermal management interrupt if the temperatu
490. ranch prediction (implemented in the BHT), two-level branch prediction, and the implementation of nonblocking caches minimize the penalties associated with flow control operations on the 750. The timing for branch instruction execution is determined by many factors, including the following:
• Whether the branch is taken
• Whether instructions in the target stream, typically the first two instructions in the target stream, are in the branch target instruction cache (BTIC)
• Whether the target instruction stream is in the on-chip cache
• Whether the branch is predicted
• Whether the prediction is correct
6.4.1.1 Branch Folding and Removal of Fall-Through Branch Instructions. When a branch instruction is encountered by the fetcher, the BPU immediately begins to decode it and tries to resolve it. All branch instructions except those that update either the LR or CTR are removed from the instruction flow before they would take a position in the completion queue. Branch folding occurs either when a branch is taken or is predicted as taken (as is the case with unconditional branches). When the BPU folds the branch instruction out of the instruction stream, the target instruction stream that is fetched into the instruction queue overwrites the branch instruction. Figure 6-7 shows branch folding. Here, a br instruction is encountered in a series of add instructions. The branch is resolved as taken. What happens on the next clock cycle depends on whe
491. ransfer size signals are used with the address signals for misaligned transfers Note that the 750 does not generate all possible TSIZ 0 2 encodings For external control instructions eciwx and ecowx TSIZ 0 2 are used to output bits 29 31 of the external access register EAR which are used to form the resource ID TBSTIITSIZO TSIZ2 Timing Comments Assertion Negation The same as A 0 31 High Impedance The same as A 0 31 Table 7 3 Data Transfer Size Co eas Negated 010 001 010 11 Chapter 7 Signal Descriptions 7 11 Table 7 3 Data Transfer Size Continued Negated 4 bytes Negated 6 bytes Negated 7 bytes Note Not generated by 750 7 2 4 3 Transfer Burst TBST The transfer burst TBST signal is an input output signal on the 750 7 2 4 3 1 Transfer Burst TBST Output Following are the state meaning and timing comments for the TBST output signal 11 State Meaning Asserted Indicates that a burst transfer is in progress Negated Indicates that a burst transfer is not in progress For external control instructions eciwx and ecowx TBST is used to output bit 28 of the EAR which is used to form the resource ID TBSTIITSIZO TSIZ2 Timing Comments Assertion Negation The same as A 0 31 High Impedance The same as A 0 31 7 2 4 3 2 Transfer Burst TBST Input Following are the state meaning and timing comments for the TBST input signal State Meaning Asserte
492. ranteed that this instruction and all previous instructions can cause no exceptions. • Fall-through (branch fall-through): A not-taken branch. On the 750, fall-through branch instructions are removed from the instruction stream at dispatch. That is, these instructions are allowed to fall through the instruction queue via the dispatch mechanism without being passed to an execution unit or given a position in the completion queue. • Fetch: The process of bringing instructions from memory (such as a cache or system memory) into the instruction queue. • Folding (branch folding): The replacement with target instructions of a branch instruction and any instructions along the not-taken path when a branch is either taken or predicted as taken. • Finish: Finishing occurs in the last cycle of execution. In this cycle, the completion queue entry is updated to indicate that the instruction has finished executing. • Latency: The number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction. • Pipeline: In the context of instruction timing, the term pipeline refers to the interconnection of the stages. The events necessary to process an instruction are broken into several cycle-length tasks to allow work to be performed on several instructions simultaneously, analogous to an assembly line. As an instruction is processed, it passes
493. ration L2 Cache Interface Overview asistir nie HAB AAA Ee L2 Cache Control Register L2CR ti E2 Cache Itata L2 Cache Global Invalidation ooonoocccnoccccnoncnononanononacnnnnccnnnnnss L2 Cache Test Features and Methode L2CR Support for L2 Cache Testing oooocnnocccnoccccnonccionnnnnnnos E2 Cache Eeer E2 Clock EE L2 Cache SRAM Timing ExampleS AA Flow Through Burst SRAM ssssssessesssssesssresssesserssesesseeessees Pipelined Burst SRAM EE Late Write SRAM invscinasaradasorctindaretaio epou Rini Chapter 10 Power and Thermal Management Dynamic Power Management Programmable Power MOdeS aid dd Power Management Modes esos tirita Full Power Mode with DPM Disabled sssseennsessesseeessee Full Power Mode with DPM Enabled A Doze Mode iii lala Thermal Assist Unit Operation ooooooocnocccnnccccnoncnononanonnnccnnnnccnannnos TAU Single Threshold Mode sis TAU Dual Threshold Mode PowerPC 750 Junction Temperature Determination Power Saving Modes and TAU Operation ocooococcconcnnnnncnnncnnss Instruction Cache Throttling coacciones E Page Number XV Contents Paragraph Title Page Number Number Chapter 11 Performance Monitor 11 1 Performance Monitor tru 11 2 11 2 Special Purpose Registers Used by Performance Monttor 11 3 11 2 1 Performance Monitor RESIStCES ssscdesyscdeeessdevaganssadauenseccdindeeadierseecaseveciednedacens 11 3 11 2 1 1 Monitor Mode Control Register 0 OMMCROU 11 3
494. rator, a subunit for logical operations, and a subunit for performing rotates, shifts, and count-leading-zero operations. These subunits handle all one-cycle arithmetic instructions; only one subunit can execute an instruction at a time. The IU1 has a 32-bit integer multiplier/divider as well as the adder, shift, and logical units of the IU2. The multiplier supports early exit for operations that do not require full 32 x 32-bit multiplication. Each IU has a dedicated result bus (not shown in Figure 1-1) that connects to rename buffers. 1.2.2.4.2 Floating-Point Unit (FPU). The FPU, shown in Figure 1-1, is designed such that single-precision operations require only a single pass, with a latency of three cycles. As instructions are dispatched to the FPU's reservation station, source operand data can be accessed from the FPRs or from the FPR rename buffers. Results in turn are written to the rename buffers and are made available to subsequent instructions. Instructions pass through the reservation station in dispatch order. The FPU contains a single-precision multiply-add array and the floating-point status and control register (FPSCR). The multiply-add array allows the 750 to efficiently implement multiply and multiply-add operations. The FPU is pipelined so that one single- or double-precision instruction can be issued per clock cycle. Thirty-two 64-bit floating-point regis
495. rd defines conventions for 64- and 32-bit arithmetic. The standard requires that single-precision arithmetic be provided for single-precision operands. The standard permits double-precision arithmetic instructions to have either (or both) single-precision or double-precision operands, but states that single-precision arithmetic instructions should not accept double-precision operands. The PowerPC UISA follows these guidelines: • Double-precision arithmetic instructions may have single-precision operands but always produce double-precision results. • Single-precision arithmetic instructions require all operands to be single-precision and always produce single-precision results. For arithmetic instructions, conversion from double- to single-precision must be done explicitly by software, while conversion from single- to double-precision is done implicitly by the processor. All PowerPC implementations provide the equivalent of the following execution models to ensure that identical results are obtained. The definitions of the arithmetic instructions for infinities, denormalized numbers, and NaNs follow conventions described in the following sections. Although the double-precision format specifies an 11-bit exponent, exponent arithmetic uses two additional bit positions to avoid potential transient overflow conditions. An extra bit is required when denormalized double-precision numbers are prenormalized. A second bit is required to permit computation
496. rded to the processor before the rest of the cache line is filled. For all other burst operations, however, the cache line is transferred beginning with the eight-word-aligned data. 8.4.4 Data Transfer Termination. Four signals are used to terminate data bus transactions: TA, DRTRY (data retry), TEA (transfer error acknowledge), and ARTRY. The TA signal indicates normal termination of data transactions. It must always be asserted on the bus cycle coincident with the data that it is qualifying. It may be withheld by the slave for any number of clocks until valid data is ready to be supplied or accepted. DRTRY indicates invalid read data in the previous bus clock cycle. DRTRY extends the current data beat and does not terminate it. If it is asserted after the last (or only) data beat, the 750 negates DBB but still considers the data beat active and waits for another assertion of TA. DRTRY is ignored on write operations. TEA indicates a nonrecoverable bus error event. Upon receiving a final (or only) termination condition, the 750 always negates DBB for one cycle. If DRTRY is asserted by the memory system to extend the last (or only) data beat past the negation of DBB, the memory system should three-state the data bus on the clock after the final assertion of TA, even though it will negate DRTRY on that clock. This is to prevent a potential momentary data bus conflict if a write access begins on the f
497. re 5 10 show how the conceptual model for the primary and secondary page table search operations described in The Programming Environments Manual are realized in the 750 Figure 5 9 shows the case of a debz instruction that is executed with W 1 or I 1 and that the R bit may be updated in memory if required before the operation is performed or the alignment exception occurs The R bit may also be updated if memory protection is violated Chapter 5 Memory Management 5 31 Primary Page Table Search Generate PA Using Primary Hash Function PA lt Base PA of PTEG Fetch PTE from PTEG PA PA 8 Fetch Next PTE in PTEG Fetch PTE 64 Bits from PA Otherwise PTE VSID API H V Segment Descriptor VSID EA API 0 1 AS Secondary Page Table Search Hit Last PTE in PTEG PTE R 1 Perform Secondary Page Table Search PTE R 0 From Figure 5 10 PTE R 1 R_Flag 1 Write PTE into TLB Otherwise dcbz Instruction with W or 1 Check Memory Protection R Flag 1 Otherwise Violation Conditions PTE R 1 Update PTE R in Memory O Alignment Exception Access Permitted Access Prohibited A Operation h Otherwise with PTEJC 0 e P Otherwise R_Flag 1 TLB PTE C lt 1 R_Flag 1 PTE R lt 1 Update PTE R in Memory Page Table Search Complete PTE R lt 1 Update PTE C in Memor Up IC i Update PTE R in Also Up
498. re crosses the threshold • Enables the user to estimate the junction temperature by way of a software successive approximation routine. The TAU is controlled through the privileged mtspr/mfspr instructions to the three SPRs provided for configuring and controlling the sensor control logic, which function as follows: • THRM1 and THRM2 provide the ability to compare the junction temperature against two user-provided thresholds. Having dual thresholds gives the thermal management software finer control of the junction temperature. In single-threshold mode, the thermal sensor output is compared to only one threshold in either THRM1 or THRM2. • THRM3 is used to enable the TAU and to control the comparator output sample time. The thermal management logic manages the thermal management interrupt generation and time-multiplexed comparisons in the dual-threshold mode, as well as other control functions. Instruction cache throttling provides control of the 750's overall junction temperature by determining the interval at which instructions are fetched. This feature is accessed through the ICTC register. Chapter 10, "Power and Thermal Management," provides information about power-saving and thermal management modes for the 750. 1.12 Performance Monitor. The 750 incorporates a performance monitor facility that system designers can use to help bring up, debug, and optimize software performance. T
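The software successive approximation routine mentioned above can be sketched as a binary search over the threshold value. This is only an illustration under stated assumptions: the comparator is modeled as a hypothetical callback (on real hardware it would involve mtspr/mfspr accesses to a THRM register and a wait for a valid sample, none of which is modeled here), and the search range passed in `max_c` is the caller's choice, not a value taken from this excerpt.

```c
#include <stdint.h>

/* Hypothetical comparator callback: returns nonzero when the junction
 * temperature is at or above `threshold_c`. On real hardware this would
 * program a threshold register, wait for a valid sample, and read the
 * comparator result back; that sequence is an assumption and is not
 * modeled here. */
typedef int (*tau_compare_fn)(unsigned threshold_c, void *ctx);

/* Successive approximation (binary search) over [0, max_c].
 * Returns the largest threshold for which the comparator still reports
 * "temperature >= threshold", or 0 if it never does. */
unsigned tau_estimate_temp(tau_compare_fn compare, void *ctx, unsigned max_c)
{
    unsigned lo = 0, hi = max_c;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo + 1) / 2; /* round up so the loop terminates */
        if (compare(mid, ctx))
            lo = mid;      /* temperature is at least mid */
        else
            hi = mid - 1;  /* temperature is below mid */
    }
    return lo;
}

/* Stand-in for the sensor, used only to exercise the search. */
int fake_compare(unsigned threshold_c, void *ctx)
{
    unsigned actual = *(const unsigned *)ctx;
    return actual >= threshold_c;
}
```

With a 128-step range the search needs seven comparisons, which is why a routine of this shape is usually preferred to stepping the threshold one degree at a time.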
499. re done transparently and only a page fault causes a restart. 2.3.4.3.1 Self-Modifying Code. When a processor modifies a memory location that may be contained in the instruction cache, software must ensure that memory updates are visible to the instruction fetching mechanism. This can be achieved by the following instruction sequence:
    dcbst   |update memory
    sync    |wait for update
    icbi    |remove (invalidate) copy in instruction cache
    isync   |remove copy in own instruction buffer
These operations are required because the data cache is a write-back cache. Since instruction fetching bypasses the data cache, changes to items in the data cache may not be reflected in memory until the fetch operations complete. Special care must be taken to avoid coherency paradoxes in systems that implement unified secondary caches, and designers should carefully follow the guidelines for maintaining cache coherency that are provided in the VEA and discussed in Chapter 5, "Cache Model and Memory Coherency," in The Programming Environments Manual. Because the 750 does not broadcast the M bit for instruction fetches, external caches are subject to coherency paradoxes. 2.3.4.3.2 Integer Load and Store Address Generation. Integer load and store operations generate effective addresses using register indirect with immediate index mode, register indirect with index mode, or register indirect mode. See Section 2.3.2.3, "Effective Addre
500. ress onto the bus at that boundary for coherency consideration, or must operate as noncoherent data with respect to the 750. The 750 never generates a bus transaction with a transfer size of 5 bytes, 6 bytes, or 7 bytes. 8.3.2.2.3 Write-Through (WT) Signal. The 750 provides the WT signal to indicate a write-through operation as determined by the WIM bit settings during address translation by the MMU. The WT signal is also asserted for burst writes due to the execution of the dcbf and dcbst instructions and snoop push operations. The WT signal is deasserted for accesses caused by the execution of the ecowx instruction. During read operations, the 750 uses the WT signal to indicate whether the transaction is an instruction fetch (WT set to 1) or a data read operation (WT cleared to 0). 8.3.2.2.4 Cache Inhibit (CI) Signal. The 750 indicates the caching-inhibited status of a transaction (determined by the setting of the WIM bits by the MMU) through the use of the CI signal. The CI signal is asserted even if the L1 caches are disabled or locked. This signal is also asserted for bus transactions caused by the execution of eciwx and ecowx instructions, independent of the address translation. 8.3.2.3 Burst Ordering During Data Transfers. During burst data transfer operations, 32 bytes of data (one cache line) are transferred to or from the cache in order. Burst write transfers are alwa
501. retry is required. Negation: Occurs the second bus cycle after the assertion of AACK. Since this signal may be simultaneously driven by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for a minimum of one-half processor cycle (dependent on the clock mode), then it is driven negated for one bus cycle before returning to high impedance. This special method of negation may be disabled by setting precharge disable in HID0. 7.2.5.2.2 Address Retry (ARTRY): Input. Following are the state meaning and timing comments for the ARTRY input signal. State Meaning: Asserted: If the 750 is the address bus master, ARTRY indicates that the 750 must retry the preceding address tenure and immediately negate BR (if asserted). If the associated data tenure has already started, the 750 also aborts the data tenure immediately, even if the burst data has been received. If the 750 is not the address bus master, this input indicates that the 750 should immediately negate BR to allow an opportunity for a copy-back operation to main memory after a snooping bus master asserts ARTRY. Note that the subsequent address presented on the address bus may not be the same one associated with the assertion of the ARTRY signal. Negated/High Impedance: Indicates that the 750 does not need to retry the last address tenure. Timing Comments: Assertion: May occur as ea
502. rget stream can be fetched into the instruction queue on the next clock cycle. The BTIC can be disabled and invalidated through bits in HID0. • Dynamic branch prediction: The 512-entry branch history table (BHT) is implemented with two bits per entry for four degrees of prediction: not-taken, strongly not-taken, taken, strongly taken. Whether a branch instruction is taken or not-taken can change the strength of the next prediction. This dynamic branch prediction is not defined by the PowerPC architecture. To reduce aliasing, only predicted branches update the BHT entries. Dynamic branch prediction is enabled by setting HID0[BHT]; otherwise, static branch prediction is used. • Static branch prediction: Static branch prediction is defined by the PowerPC architecture and involves encoding the branch instructions. See Section 6.4.1.3.1, "Static Branch Prediction." Branch instructions that do not update the LR or CTR are removed from the instruction stream either by branch folding or removal of fall-through branch instructions, as described in Section 6.4.1.1, "Branch Folding and Removal of Fall-Through Branch Instructions." Branch instructions that update the LR or CTR are treated as if they require dispatch (even though they are not issued to an execution unit in the process). They are assigned a position in the completion queue to ensure that the CTR and LR are updated sequentially. All other instructions are issued from the IQ0 and IQ1
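The two-bits-per-entry scheme described above is commonly modeled as a table of saturating counters. The sketch below is a generic illustration, not the 750's documented logic: the counter encoding (0 to 3), the indexing by word-aligned address, and the function names are all assumptions introduced for the example; only the 512-entry size and the four prediction strengths come from the text.

```c
#include <stdint.h>

#define BHT_ENTRIES 512u /* from the text: 512-entry BHT, two bits per entry */

/* Counter encoding is an assumption: 0 = strongly not-taken,
 * 1 = not-taken, 2 = taken, 3 = strongly taken. */
typedef struct { uint8_t counter[BHT_ENTRIES]; } bht_t;

/* Index choice is illustrative: word-aligned address modulo table size. */
unsigned bht_index(uint32_t branch_addr)
{
    return (branch_addr >> 2) % BHT_ENTRIES;
}

/* Predict taken when the counter is in one of the two "taken" states. */
int bht_predict_taken(const bht_t *bht, uint32_t branch_addr)
{
    return bht->counter[bht_index(branch_addr)] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
void bht_update(bht_t *bht, uint32_t branch_addr, int taken)
{
    uint8_t *c = &bht->counter[bht_index(branch_addr)];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

Because the counter moves only one step per outcome, a single mispredicted iteration (such as a loop exit) weakens a strong prediction without flipping it.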
503. ridge gt XO tlbsync ojo AENA AAA EEE EE Ee EE EA AA EME AS E a AA TAS EEN E E EA AA EAN A A EA ESSE E AAA 1 Supervisor and user level instruction 2 Load store string or multiple instruction 3 32 bit instruction not implemented by the PowerPC 750 4 Instruction is optional for 64 bit implementations only A 54 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Appendix B Instructions Not Implemented B 1 Lists of Instructions This appendix provides a list of the 32 bit and 64 bit PowerPC instructions that are not implemented in the PowerPC 750 microprocessor Note that any attempt to execute instructions that are not implemented on the 750 will generate an illegal instruction exception Note that exceptions are referred to as interrupts in the architecture specification Table B 1 provides the 32 bit PowerPC instructions that are optional to the PowerPC architecture but not implemented by the 750 Table B 1 32 Bit Instructions Not Implemented deba Data Cache Block Allocate reen Floating Square Root Double Precision fsqrts Floating Square Root Single tibia TLB Invalidate All Table B 2 provides a list of 64 bit instructions that are not implemented by the 750 Table B 2 64 Bit Instructions Not Implemented ema 1 weg CCE eva CIC Floating Convert to Integer Double Word with Round toward Zero Load Double Word Load Double Word and Reserve Indexed fetid Floating
504. rithms that schedule tasks (and perhaps partition them) and that structure and distribute data optimally. • To help system developers bring up and debug their systems. The performance monitor uses the following SPRs: • The performance monitor counter registers (PMC1-PMC4) are used to record the number of times a certain event has occurred. UPMC1-UPMC4 provide user-level read access to these registers. • The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitor interrupt functions. UMMCR0-UMMCR1 provide user-level read access to these registers. • The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The USIA register provides user-level read access to the SIA. Table 4-13 lists register settings when a performance monitor interrupt exception is taken. Table 4-13. Performance Monitor Interrupt Exception: Register Settings. SRR0: Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present. SRR1: bit 0, loaded with equivalent MSR bits; bits 1-4, cleared; bits 5-9, loaded with equivalent MSR bits; bits 10-15, cleared; bits 16-31, loaded with equivalent MSR bits. MSR: LE set to value of ILE. As with other PowerPC exceptions, the performance monitor interrupt follows the normal PowerPC exception
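The SRR1 settings listed in Table 4-13 amount to masking the MSR. A minimal sketch follows, assuming the manual's bit numbering in which bit 0 is the most significant bit of a 32-bit register; the helper names `ppc_mask` and `srr1_from_msr` are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Mask for the PowerPC bit range [first, last], where bit 0 is the most
 * significant bit of a 32-bit register (the manual's numbering). */
uint32_t ppc_mask(unsigned first, unsigned last)
{
    uint32_t width = last - first + 1;
    uint32_t ones = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << (31 - last);
}

/* SRR1 image for the settings in Table 4-13: bits 0, 5-9, and 16-31 are
 * copied from the MSR; bits 1-4 and 10-15 are cleared. */
uint32_t srr1_from_msr(uint32_t msr)
{
    uint32_t keep = ppc_mask(0, 0) | ppc_mask(5, 9) | ppc_mask(16, 31);
    return msr & keep;
}
```

Writing the ranges through a helper keeps the code readable against the table, since the table's bit numbers run in the opposite direction from the usual C shift counts.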
505. rly as the second cycle following the assertion of TS, and must occur by the bus clock cycle immediately following the assertion of AACK if an address retry is required. Negation: Must occur two bus clock cycles after the assertion of AACK. 7.2.6 Data Bus Arbitration Signals. Like the address bus arbitration signals, data bus arbitration signals maintain an orderly process for determining data bus mastership. Note that there is no data bus arbitration signal equivalent to the address bus arbitration signal BR (bus request), because, except for address-only transactions, TS implies data bus requests. For a detailed description on how these signals interact, see Section 8.4.1, "Data Bus Arbitration." One special signal, DBWO, allows the 750 to be configured dynamically to write data out of order with respect to read data. For detailed information about using DBWO, see Section 8.10, "Using Data Bus Write Only." 7.2.6.1 Data Bus Grant (DBG): Input. The data bus grant (DBG) signal is an input-only signal on the 750. Following are the state meaning and timing comments for the DBG signal. State Meaning: Asserted: Indicates that the 750 may, with the proper qualification, assume mastership of the data bus. The 750 derives a qualified data bus grant when DBG is asserted and DBB, DRTRY, and ARTRY are negated; that is, the data bus is not busy (DBB is negated), there is no outstanding attempt to retry the current data tenure (DRTRY
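The qualification rule for the data bus grant can be written as a single boolean expression. The sketch below works on logical signal states only (true means asserted); the struct and function names are hypothetical, and pin polarity and bus timing are deliberately left out.

```c
#include <stdbool.h>

/* Logical signal states: true = asserted. Electrical polarity and
 * sampling time of the physical pins are not modeled. */
typedef struct {
    bool dbg;   /* data bus grant */
    bool dbb;   /* data bus busy */
    bool drtry; /* data retry */
    bool artry; /* address retry */
} data_bus_signals;

/* Qualified data bus grant: DBG asserted while DBB, DRTRY, and ARTRY
 * are all negated. */
bool qualified_data_bus_grant(data_bus_signals s)
{
    return s.dbg && !s.dbb && !s.drtry && !s.artry;
}
```

A bus model or testbench could evaluate this once per bus clock to decide whether the processor model may start a data tenure.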
506. rocessor environment, defines cache control instructions, and describes other aspects of virtual environments. Implementations that conform to the VEA also adhere to the UISA, but may not necessarily adhere to the OEA. • PowerPC operating environment architecture (OEA): Defines the memory management model, supervisor-level registers, synchronization requirements, and the exception model. Implementations that conform to the OEA also adhere to the UISA and the VEA. The PowerPC architecture allows a wide range of designs for such features as cache and system interface implementations. The 750 implementations support the three levels of the architecture described above. For more information about the PowerPC architecture, see PowerPC Microprocessor Family: The Programming Environments. Specific features of the 750 are listed in Section 1.2, "PowerPC 750 Microprocessor Features." 1.4 PowerPC Registers and Programming Model. The PowerPC architecture defines register-to-register operations for most computational instructions. Source operands for these instructions are accessed from the registers or are provided as immediate values embedded in the instruction opcode. The three-register instruction format allows specification of a target register distinct from the two source operands. Load and store instructions transfer data between registers and memory. PowerPC processors have two levels of privilege: supervisor mode of operation (typically used
507. rotection violations as defined in the PowerPC architecture. This instruction is treated as a load with respect to address translation and memory protection. If the address hits in the cache and the cache block is in the exclusive (E) state, no action is taken. If the address hits in the cache and the cache block is in the modified (M) state, the modified block is written back to memory and the cache block is placed in the exclusive (E) state. The execution of a dcbst instruction does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG bit settings of the block containing the effective address. The dcbst instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception. 3.4.2.4 Data Cache Block Flush (dcbf). The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. This instruction is treated as a load with respect to address translation and memory protection. If the address hits in the cache and the block is in the modified (M) state, the modified block is written back to memory and the cache block is placed in the invalid (I) state. If the address hits in the cache and the cache block is in the exclusive (E) state, the cache block is placed
508. rrive from the instruction cache in the next clock cycle. The BTIC reduces the number of missed opportunities to dispatch instructions and gives the processor a one-cycle head start on processing the target stream. The BPU contains an adder to compute branch target addresses and three user-control registers: the link register (LR), the count register (CTR), and the CR. The BPU calculates the return pointer for subroutine calls and saves it into the LR for certain types of branch instructions. The LR also contains the branch target address for the Branch Conditional to Link Register (bclrx) instruction. The CTR contains the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. Because the LR and CTR are SPRs, their contents can be copied to or from any GPR. Because the BPU uses dedicated registers rather than GPRs or FPRs, execution of branch instructions is largely independent from execution of integer and floating-point instructions. 1.2.2.3 Completion Unit. The completion unit operates closely with the instruction unit. Instructions are fetched and dispatched in program order. At the point of dispatch, the program order is maintained by assigning each dispatched instruction a successive entry in the six-entry completion queue. The completion unit tracks instructions from dispatch through execution and retires them in program order from the two bottom entries in the completion queue (CQ0 and CQ1).
509. rst read memory burst operations are the most common memory accesses, followed by burst-write memory operations and single-beat (noncacheable or write-through) memory read and write operations. The 750 also supports address-only operations, variants of the burst and single-beat operations (for example, atomic memory operations and global memory operations that are snooped), and address retry activity (for example, when a snooped read access hits a modified block in the cache). The broadcast of some address-only operations is controlled through HID0[ABE]. I/O accesses use the same protocol as memory accesses. Access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the 750 to be integrated into systems that implement various fairness and bus-parking procedures to avoid arbitration overhead. Typically, memory accesses are weakly ordered (sequences of operations, including load/store string and multiple instructions, do not necessarily complete in the order they begin), maximizing the efficiency of the bus without sacrificing data coherency. The 750 allows read operations to go ahead of store operations (except when a dependency exists, or in cases where a noncacheable access is performed), and provides support for a write operation to go ahead of a previously queued read data tenure (for example, letting a snoop push be env
510. ructions Not Implemented Glossary of Terms and Abbreviations Index mb GI Be gt i i gt esch Z a PowerPC 740 PowerPC 750 Overview NO Processor Programming Model 3 L1 Instruction and Data Cache Operation Eh Exceptions Wu Memory Management Instruction Timing N Signal Descriptions Bus Interface Operation L2 Cache Interface Operation h 0 Power and Thermal Management 11 Performance Monitor PowerPC Instruction Set Listings EW Instructions Not Implemented CIRO Glossary of Terms and Abbreviations IND Index Paragraph Number 1 1 1 2 1 2 1 1 2 2 1 2 2 1 1 2 2 2 1 2 2 3 1 2 2 4 1 2 2 4 1 1 2 2 4 2 1 2 2 4 3 1 2 2 4 4 1 2 3 1 2 4 1 2 3 1 2 6 1 2 7 1 2 8 1 2 9 1 3 1 4 1 5 1 5 1 1 5 2 1 6 1 6 1 1 6 2 1 7 1 7 1 1 7 2 1 8 1 8 1 1 8 2 Contents Contents Page Title Preface umber e EE xxvii Organizat E xxvii SU e in a eee Sige E ae eae R a xxviii C nvent oS sensei oee edd XXX Acronyms and Abbreviations 0 cccccssscecssececssccecssccecssccecssececssceeessceecssceeesacees XXX Terminology Conventions Hai XXX1V Chapter 1 PowerPC 740 PowerPC 750 Overview PowerPC 750 Microprocessor UVM 1 1 PowerPC 750 Microprocessor Features 1 4 Overview of the PowerPC 750 Microprocessor beatures 1 4 MS CRUG ENG OW paca its eas eect nse ae ca here eae ac ea eat a one haut Ode 1 7 Instruction Queue and Dispatch Unit oooooccno
511. ry is available via the world wide web at http://www.chips.ibm.com. • PowerPC Microprocessor Family: 60x Bus Interface for 32-Bit Microprocessors (G522-0291-00) provides a detailed functional description of the 60x bus interface, as implemented on the 601, 603, and 604 family of PowerPC microprocessors. This document is intended to help system and chipset developers by providing a centralized reference source to identify the bus interface presented by the 60x family of PowerPC microprocessors. • PowerPC Microprocessor Family: The Programmer's Reference Guide (MPRPPCPRG-01) is a concise reference that includes the register summary, memory control model, exception vectors, and the PowerPC instruction set. • PowerPC Microprocessor Family: The Programmer's Pocket Reference Guide (SA14-2093-00): This foldout card provides an overview of the PowerPC registers, instructions, and exceptions for 32-bit implementations. • Application notes: These short documents contain useful information about specific design issues useful to programmers and engineers working with PowerPC processors. • Documentation for support chips: These include the following: IBM27-82660 PowerPC to PCI Bridge and Memory Controller User's Manual (SC09-3026-01). Additional literature on PowerPC implementations is being released as new processors become available. For a current list of PowerPC documentation, refer to the web sites listed at the beginning o
512. s 2.3.4.3.10 Floating-Point Store Instructions. This section describes floating-point store instructions. There are three basic forms of the store instruction: single-precision, double-precision, and integer. The integer form is supported by the optional stfiwx instruction. Because the FPRs support only floating-point double-precision format for floating-point data, single-precision floating-point store instructions convert double-precision data to single-precision format before storing the operands. Table 2-38 summarizes the floating-point store instructions. Table 2-38. Floating-Point Store Instructions: Store Floating-Point Single; Store Floating-Point Single Indexed; Store Floating-Point Single with Update; Store Floating-Point Double; Store Floating-Point Double Indexed; Store Floating-Point Double with Update; Store Floating-Point Double with Update Indexed; Store Floating-Point as Integer Word Indexed; Store Floating-Point Single with Update Indexed. Note: The stfiwx instruction is optional to the PowerPC architecture. Some floating-point store instructions require conversions in the LSU. Table 2-39 shows conversions the LSU makes when executing a Store Floating-Point Single instruction. Table 2-39. Store Floating-Point Single Behavior: Double Normalized: If (exp ≤ 896) then Denormalize and Store, else Store. Note: The FPRs are not initialized by H
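The exponent test in Table 2-39 can be illustrated in software. This sketch assumes the condition reads "exp ≤ 896" (the comparison operator is an assumption here, since it did not survive in the extracted table), that `exp` means the 11-bit biased exponent of the double-precision value, and that the host uses IEEE 754 binary64 for `double`. The function names are hypothetical; the LSU performs this check in hardware.

```c
#include <stdint.h>
#include <string.h>

/* Extract the 11-bit biased exponent of an IEEE 754 double. */
unsigned double_biased_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits); /* well-defined way to view the bit pattern */
    return (unsigned)((bits >> 52) & 0x7FFu);
}

/* Returns 1 when a normalized double is too small to be stored as a
 * normalized single, so a single-precision store would have to
 * denormalize it. Biased exponent 897 is 2^-126, the smallest
 * normalized single-precision magnitude. */
int store_single_needs_denormalize(double x)
{
    unsigned e = double_biased_exponent(x);
    return e != 0 && e <= 896; /* e == 0 means zero or a double denormal */
}
```

The threshold falls out of the two exponent biases: 1023 for double precision and 127 for single precision, so a single-precision minimum exponent of -126 corresponds to 1023 - 126 = 897 in the double format.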
513. s Excluding SbPRoh 1 24 Architecture Defined SPRs Implemented oooocccnnncccnonccinoncccnoncnononcnononcnonnnaninnnos 1 25 Emplementation Specitic Registers iii Eeer 1 26 PowerPC 750 Microprocessor Exception Classifications oooconnonnnninocnnicnnnnnos 1 31 Exceptions and Conditions icc ssscadinacssdecsvessadsies josasaconsaeas landaeesseceedesaessaceeasteas vedas 1 31 Additional MSR EE 2 4 Additional SRRI E 2 6 Instruction Address Breakpoint Register Bit Setmnges 2 9 HIDO Bit Funct ege ee O EE anaia 2 9 HIDO BCLK and HIDO ECLK CLK_OUT Configuration 2 13 HIDI Bit Funcion id dis 2 13 AIR sedatacageacavanndenda nected cade seaataanddlavcans 2 14 MMERI tee 2 16 EM 2 17 PMC1 Events MMCRO 19 25 Select ENCOdIN8S ooooooccccoccccoonccononnnonancnnnnnos 2 17 PMC2 Events MMCRO 26 31 Select Encodmgs 2 18 PMC3 Events MMCR 1 0 4 Select Encodings cooococonoccccnoncccnoncnononcnonancninnnos 2 18 PMC4 Events MMCR1 5 9 Select Bncodunge 2 19 ICTC Bit Stings viii li a lia E O aas 2 21 THRMI THRMO Bit Sets 2 22 Valid NERT EE 2 23 THRM3 BT SUNS Ss cae Pa EE 2 24 LDR Bit SCUINGS enee 2 25 Floating Point Operand Data Type Behavior oooocccnocccioccnonononcnonanononcnoncnanncnnnnos 2 30 Floating Point Result Data Type Behavior oooooonnncccnonoccnoncccnoncnononcnononcnonnnaninnnos 2 31 Integer Arithmetic Instructions las dis 2 38 Integer Compare Instructions psp hated he A NEES 2 39 Integer Logical Instructions 0 dee e dE 2 4
514. s a decrementer exception condition occurs (for example, the decrementer register has completed decrementing) and MSR[EE] = 1. In the 750, the decrementer register is decremented at one fourth the bus clock rate. Register settings for this exception are described in Chapter 6, "Exceptions," in The Programming Environments Manual. When a decrementer exception is taken, instruction fetching resumes at offset 0x00900 from the physical base address indicated by MSR[IP].
4.5.10 System Call Exception (0x00C00)
A system call exception occurs when a System Call (sc) instruction is executed. In the 750, the system call exception is implemented as it is defined in the PowerPC architecture. Register settings for this exception are described in Chapter 6, "Exceptions," in The Programming Environments Manual. When a system call exception is taken, instruction fetching resumes at offset 0x00C00 from the physical base address indicated by MSR[IP].
4.5.11 Trace Exception (0x00D00)
The trace exception is taken if MSR[SE] = 1, or if MSR[BE] = 1 and the currently completing instruction is a branch. Each instruction considered during trace mode completes before a trace exception is taken. When a trace exception is taken, the values written to SRR1 are implementation-specific; those values for the 750 are shown in Table 4-12.
Table 4-12. Trace Exception: SRR1 Settings
SRR1 010 Set for a load instruction otherwise
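The vector offsets quoted above combine with the MSR[IP] base to give the physical fetch address. The sketch below is illustrative only: the two base prefixes (0x000n_nnnn for IP = 0, 0xFFFn_nnnn for IP = 1) come from the PowerPC OEA definition of MSR[IP], not from this passage, and the dictionary and function names are hypothetical.

```python
# Offsets taken from the section headings above.
VECTOR_OFFSETS = {
    "decrementer": 0x00900,
    "system_call": 0x00C00,
    "trace": 0x00D00,
}


def vector_address(exception: str, msr_ip: int) -> int:
    """Physical address at which instruction fetching resumes.

    msr_ip is the MSR[IP] bit: 0 selects base 0x0000_0000,
    1 selects base 0xFFF0_0000 (assumed from the PowerPC OEA).
    """
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + VECTOR_OFFSETS[exception]


if __name__ == "__main__":
    for name in VECTOR_OFFSETS:
        print(name, hex(vector_address(name, 0)), hex(vector_address(name, 1)))
```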
515. s those distinctions are shown clearly throughout this book. For ease in reference, the arrangement of topics in this book follows that of The Programming Environments Manual. Topics build upon one another, beginning with a description and complete summary of 750-specific registers and instructions and progressing to more specialized topics such as 750-specific details regarding the cache, exception, and memory management models. As such, chapters may include information from multiple levels of the architecture. (For example, the discussion of the cache model uses information from both the VEA and the OEA.)
The PowerPC Architecture: A Specification for a New Family of RISC Processors defines the architecture from the perspective of the three programming environments and remains the defining document for the PowerPC architecture. For information about ordering PowerPC documentation, see "Suggested Reading" on page xxviii.
The information in this book is subject to change without notice, as described in the disclaimers on the title page of this book. As with any technical documentation, it is the readers' responsibility to be sure they are using the most recent version of the documentation. To locate any published errata or updates for this document, refer to the web sites noted at the beginning of this section.
Audience
This manual is intended for system software
516. s 10-2
  sleep mode, 10-4
  software considerations, 10-5
PowerPC architecture
  instruction list, A-1, A-9, A-17
  operating environment architecture (OEA), 1-21
  operating environment architecture (OEA), xxvi
  user instruction set architecture (UISA), 1-21
  user instruction set architecture (UISA), xxv
  virtual environment architecture (VEA), 1-21
  virtual environment architecture (VEA), xxvi
Priorities, exception, 4-4
Process switching, 4-12
Processor control instructions, 2-55, 2-60, 2-65, A-27
Program exception, 4-20
Program order, definition, 6-2
Programmable power states
  doze mode, 10-2
  full-power mode with DPM enabled/disabled, 10-2
  nap mode, 10-3
  sleep mode, 10-4
Protection of memory areas
  no-execute protection, 5-14
  options available, 5-11
  protection violations, 5-16
PVR (processor version register), 2-5
Q
QACK (quiescent acknowledge) signal, 7-24
QREQ (quiescent request) signal, 7-23, 8-43
Qualified bus grant, 8-10
Qualified data bus grant, 8-24
R
Read operation, 3-27
Read-atomic operation, 3-27
Read-with-intent-to-modify operation, 3-27
Real address (RA), see Physical address generation
Real addressing mode (translation disabled)
  data accesses, 5-12, 5-20
  instruction accesses, 5-12, 5-20
  support for real addressing mode, 5-2
Referenced (R) bit maintenance recording, 5-12, 5-22, 5-31
Registers
  implementation-specific
    ICTC, 2-21, 10-11
    L2CR, 2-24, 9-5
    MMCR0, 2-14, 4-23, 11-3
    MMCR1, 2-16, 4-23, 11-5
    SIA, 2
517. s Interface Over vi Wns isis isis 8 2 xiii Paragraph Number 8 1 1 8 1 2 8 1 3 8 1 4 8 1 5 8 2 8 2 1 8 2 2 8 3 8 3 1 8 3 2 8 3 2 1 8 3 2 2 8 3 2 2 1 8 3 2 2 2 8 3 2 2 3 8 3 2 2 4 8 3 2 3 8 3 2 4 8 3 2 4 1 8 3 2 5 8 3 3 8 4 8 4 1 8 4 1 1 8 4 2 8 4 3 8 4 4 8 4 4 1 8 4 4 2 8 4 5 8 5 8 6 8 6 1 8 6 2 8 6 3 8 7 8 7 1 8 7 2 8 7 3 8 7 4 8 8 8 8 1 xiv Contents Page Tule Number Operation of the Instruction and Data L1 Caches oooonnccccnocccinoncconanccinnnccnnnnnos 8 3 Operation of the L2 Cache rinitis ii 8 6 Operation or the Bus Interface A li 8 6 Optional 32 Bit Data us Modest ia 8 7 ee ee 8 7 Memory ACCESS Entente ebe a 8 8 Arbitration Sigtials sisi snenie i es lesdgantsbsedetvandeaes deet de 8 10 Address Pipelining and Split Bus Transactions 2 0 0 ceccceeeseeeesseeeeeteeeeneeees 8 11 Address AS o AS 8 12 Addr ss EE 8 12 ER ENEE 8 14 Address Bus Parity cerillas 8 15 Address Transfer Attribute Signals ooooconnnocccnoncccnncccononccononncnnonacnnnnccnnnnnos 8 15 Transfer Type TT O 4 Signals 0 0 eee eeececsseeeceeeeecseeeeeseeeeneeenes 8 15 Transfer Size KT Uh 8 15 Write Through WT Stata ts id i 8 16 Cache inhibit CD Signal ee ee 8 16 Burst Ordering During Data Transfers 8 17 Effect of Alignment in Data Transfers 8 18 Effect of Alignment in Data Transfers 32 Bit Bush 8 19 Alignment of External Control InstructiONS ococnnoccnnonccconcccnnncccnnn
518. s are combined in the load/store unit (LSU) to form a double word and are sent out on the 60x bus as a single-beat operation. However, stores can be gathered only if the successive stores that meet the criteria are queued and pending. Store gathering takes place regardless of the address order of the stores. The store gathering feature is enabled by setting HID0[SGE]. Store gathering is done for both big- and little-endian modes.
Store gathering is not done for the following:
• Cacheable stores
• Stores to guarded cache-inhibited or write-through space
• Byte-reverse store
• stwcx. and ecowx accesses
• Floating-point stores
• Store operations attempted during a hardware table search
If store gathering is enabled and the stores do not fall under the above categories, an eieio or sync instruction must be used to prevent two stores from being gathered.
2.3.4.3.6 Integer Load and Store with Byte-Reverse Instructions
Table 2-34 describes integer load and store with byte-reverse instructions. When used in a PowerPC system operating with the default big-endian byte order, these instructions have the effect of loading and storing data in little-endian order. Likewise, when used in a PowerPC system operating with little-endian byte order, these instructions have the effect of loading and storing data in big-endian order. For more information about big-endian and little-endian by
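The effect of a byte-reverse load in the default big-endian mode can be illustrated with a small sketch. The helper names mirror the lwz and lwbrx mnemonics but are only Python stand-ins operating on a byte buffer; they are not a model of the LSU.

```python
import struct


def lwz(memory: bytes, ea: int) -> int:
    """Ordinary word load in big-endian mode: most significant byte first."""
    return struct.unpack_from(">I", memory, ea)[0]


def lwbrx(memory: bytes, ea: int) -> int:
    """Byte-reversed word load in big-endian mode: reads the word as little-endian."""
    return struct.unpack_from("<I", memory, ea)[0]


if __name__ == "__main__":
    mem = bytes([0x12, 0x34, 0x56, 0x78])
    print(hex(lwz(mem, 0)))    # 0x12345678
    print(hex(lwbrx(mem, 0)))  # 0x78563412
```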
519. s at the proper time. If the completion logic detects an instruction containing an exception status, all following instructions are cancelled, their execution results in rename registers are discarded, and the correct instruction stream is fetched. The complete stage ends when the instruction is retired. Two instructions can be retired per cycle. Instructions are retired only from the two lowest completion queue entries, CQ0 and CQ1.
The notation conventions used in the instruction timing examples are as follows:
Fetch: The fetch stage includes the time between when an instruction is requested and when it is brought into the instruction queue. This latency can vary widely, depending upon whether the instruction is in the BTIC, the on-chip cache, the L2 cache, or system memory (in which case latency can be affected by bus speed and traffic on the system bus, and address translation issues). Therefore, in the examples in this chapter, the fetch stage is usually idealized; that is, an instruction is usually shown to be in the fetch stage when it is a valid instruction in the instruction queue. The instruction queue has six entries, IQ0-IQ5.
In dispatch entry (IQ0/IQ1): Instructions can be dispatched from IQ0 and IQ1. Because dispatch is instantaneous, it is perhaps more useful to describe it as an event that marks the point in time between the last cycle in the fetch stage and the first cycle in the execute stage.
Execute
520. s can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1. A maximum of two instructions can be dispatched per clock cycle (although an additional branch instruction can be handled by the BPU). Only one instruction can be dispatched to each execution unit per clock cycle. There must be a vacancy in the specified execution unit. A rename register must be available for each destination operand specified by the instruction. For an instruction to dispatch, the appropriate execution unit must be available and there must be an open position in the completion queue. If no entry is available, the instruction remains in the IQ.
The execute stage consists of the time between dispatch to the execution unit (or reservation station) and the point at which the instruction vacates the execution unit. Most integer instructions have a one-cycle latency; results of these instructions can be used in the clock cycle after an instruction enters the execution unit. However, integer multiply and divide instructions take multiple clock cycles to complete. The IU1 can process all integer instructions; the IU2 can process all integer instructions except multiply and divide instructions. The LSU and FPU are pipelined, as shown in Figure 6-2.
The complete (complete/write-back) pipeline stage maintains the correct architectural machine state and commits it to the architectural register
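The dispatch conditions listed above can be sketched as a single-cycle check. This is a simplified illustration, not the 750's dispatch logic: it assumes in-order dispatch (IQ1 stalls if IQ0 cannot dispatch), treats rename registers as one shared pool rather than separate GPR and FPR pools, and ignores branches handled by the BPU. All names are hypothetical.

```python
def dispatch_cycle(iq, free_units, free_renames, cq_free, max_dispatch=2):
    """Return the names of the instructions dispatched this cycle.

    iq           -- list of (name, unit, n_dest) tuples in program order;
                    only the first two entries (IQ0, IQ1) are considered
    free_units   -- set of execution units with a vacancy this cycle
    free_renames -- number of free rename registers (single pool, simplified)
    cq_free      -- number of open completion queue entries
    """
    units = set(free_units)
    dispatched = []
    for name, unit, n_dest in iq[:max_dispatch]:
        if unit not in units or free_renames < n_dest or cq_free == 0:
            break  # assumed in-order dispatch: a stalled entry blocks the next
        units.remove(unit)       # one instruction per execution unit per cycle
        free_renames -= n_dest   # one rename register per destination operand
        cq_free -= 1             # each dispatched instruction takes a CQ entry
        dispatched.append(name)
    return dispatched


if __name__ == "__main__":
    iq = [("add", "IU1", 1), ("fadd", "FPU", 1), ("sub", "IU2", 1)]
    print(dispatch_cycle(iq, {"IU1", "IU2", "FPU"}, free_renames=6, cq_free=6))
```

With every resource free, the first two queue entries dispatch and the third waits, matching the two-per-cycle limit.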
521. s for the L2 address output signals.
State Meaning
  Asserted/Negated: Represents the address of the data to be transferred to the L2 cache. The L2 address bus is configured with bit 0 as the least-significant bit. Address bit 14 determines which cache tag set is selected.
Timing Comments
  Assertion/Negation: Driven valid by the 750 during read and write operations; driven with static data when the L2 cache memory is not being accessed.
7.2.9.9 L2 Data (L2DATA[0-63])
The data bus (L2DATA[0-63]) consists of 64 signals that are both input and output on the 750.
7.2.9.9.1 L2 Data (L2DATA[0-63]): Output
Following are the state meaning and timing comments for the L2 data output signals.
State Meaning
  Asserted/Negated: Represents the state of data during a data write transaction; data is always transferred as double words.
Timing Comments
  Assertion/Negation: Driven valid by the 750 during write operations; driven with static data when the L2 cache memory is not being accessed by a read operation.
  High Impedance: Occurs for at least one cycle when changing between read and write operations to the L2 cache memory.
7.2.9.9.2 L2 Data (L2DATA[0-63]): Input
Following are the state meaning and timing comments for the L2 data input signals.
State Meaning
  Asserted/Negated: Represents the state of data during a data read transaction; data is always transferred as double words.
Timing Comments
  Assertion N
522. s to have certain alignment. In addition, alignment may affect performance. For single-register memory access instructions, the best performance is obtained when memory operands are aligned.
Instructions are 32 bits (one word) long and must be word-aligned.
The 750 does not provide hardware support for floating-point memory that is not word-aligned. If a floating-point operand is not aligned, the 750 invokes an alignment exception, and it is left up to software to break up the offending storage access operation appropriately. In addition, some non-double-word-aligned memory accesses suffer performance degradation as compared to an aligned access of the same type.
In general, floating-point word accesses should always be word-aligned and floating-point double-word accesses should always be double-word-aligned. Frequent use of misaligned accesses is discouraged since they can degrade overall performance.
2.2.4 Floating-Point Operand
The 750 provides hardware support for all single- and double-precision floating-point operations for most value representations and all rounding modes. This architecture provides for hardware to implement a floating-point system as defined in ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic. Detailed information about the floating-point execution model can be found in Chapter 3, "Operand Conventions," in The Programming Environments Manual.
The 750 supports non-IEEE mode whenever
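The floating-point alignment rules stated in Section 2.2.3 above can be expressed as a small classifier. It is a sketch under the stated rules only: the function names and the outcome strings are hypothetical, and the "possible degradation" case reflects the statement that some non-double-word-aligned accesses are slower, not a guaranteed penalty.

```python
def is_aligned(ea: int, size_bytes: int) -> bool:
    """True if the effective address is a multiple of the operand size."""
    return ea % size_bytes == 0


def fp_access_outcome(ea: int, size_bytes: int) -> str:
    """Classify a floating-point memory access of 4 (word) or 8 (double word) bytes."""
    if size_bytes not in (4, 8):
        raise ValueError("floating-point operands are 4 or 8 bytes")
    if not is_aligned(ea, 4):
        return "alignment exception"      # no hardware support below word alignment
    if not is_aligned(ea, size_bytes):
        return "possible degradation"     # word-aligned double-word access
    return "aligned"


if __name__ == "__main__":
    for ea, size in ((0x1000, 8), (0x1004, 8), (0x1002, 4)):
        print(hex(ea), size, fp_access_outcome(ea, size))
```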
523. saction must eventually complete. Address retry causes the transaction to be restarted; TA wait states and DRTRY assertion for reads delay termination of individual data beats. Eventually, however, the system must either terminate the transaction or assert the TEA signal. For this reason, care must be taken to check for the end of physical memory and the location of certain system facilities to avoid memory accesses that result in the assertion of TEA.
Note that TEA generates a machine check exception depending on MSR[ME]. Clearing the machine check exception enable control bits leads to a true checkstop condition (instruction execution halted and processor clock stopped).
8.4.5 Memory Coherency: MEI Protocol
The 750 provides dedicated hardware to provide memory coherency by snooping bus transactions. The address retry capability enforces the three-state MEI cache-coherency protocol (see Figure 8-15).
The global (GBL) output signal indicates whether the current transaction must be snooped by other snooping devices on the bus. Address bus masters assert GBL to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). If GBL is not asserted for the transaction, that transaction is not snooped. When other devices detect the GBL input asserted, they must respond by snooping the broadcast address.
Normally G
524. scriptions of the reset signals are as follows:
7.2.9.6.1 Hard Reset (HRESET): Input
The hard reset (HRESET) signal must be used at power-on in conjunction with the TRST signal to properly reset the processor. Following are the state meaning and timing comments for the HRESET signal.
State Meaning
  Asserted: Initiates a complete hard reset operation when this input transitions from asserted to negated. Causes a reset exception as described in Section 4.5.1, "System Reset Exception (0x00100)." Output drivers are released to high impedance within five clocks after the assertion of HRESET.
  Negated: Indicates that normal operation should proceed. See Section 8.7.3, "Reset Inputs."
Timing Comments
  Assertion: May occur at any time and may be asserted asynchronously to the 750 input clock; must be held asserted for a minimum of 255 clock cycles after the PLL lock time has been met. Refer to the 750 hardware specifications for further timing comments.
  Negation: May occur any time after the minimum reset pulse width has been met.
This input has additional functionality in certain test modes.
7.2.9.6.2 Soft Reset (SRESET): Input
Following are the state meaning and timing comments for the SRESET signal.
State Meaning
  Asserted: Initiates processing for a reset exception as described in Section 4.5.1, "System Reset Exception (0x00100)."
  Negated: Indicates
525. seeoateacaassaeesachssasaseus ASA leegen 3 11 Cache Control EE 3 13 Cache Control Parameters ELE 3 13 Data Cache Flash Invalidation eege 3 13 Data Cache Enabling Disabling AAA 3 13 Data Cache BEE 3 14 Instruction Cache Flash Invaldaton cc ceeceeeeceecesececeeeceesteeeesteeeenaees 3 14 Instruction Cache Enabimng tusablmg A 3 14 Instruction Cache Locking tee Eegen 3 15 Cache Control Instict Ons ee TRER 3 15 Data Cache Block Touch debt and Data Cache Block Touch for Store debtst ooooonocccononncanannos 3 15 Data Cache Block Zero dchz 3 16 Data Cache Block Store dchst 3 16 Data Cache Block Flush debe ii ugeet Eege Baul 3 17 Data Cache Block Invalidate dch 3 17 Instruction Cache Block Invalidate Och 3 17 Cacho CIPELAL ca 3 18 Cache Block Replacement Castout Operatons 3 18 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number 3 5 2 3 5 3 3 5 4 3 5 5 3 5 5 1 3 6 3 6 1 3 6 2 3 6 3 3 6 4 3 6 5 3 7 4 1 4 2 4 3 4 3 1 4 3 2 4 3 3 4 3 4 4 4 4 5 4 5 1 4 5 1 1 4 5 1 2 4 5 2 4 5 2 1 4 5 2 2 4 5 3 4 5 4 4 5 5 4 5 6 4 5 7 4 5 8 4 5 9 4 5 10 4 5 11 4 5 12 4 5 13 4 5 14 4 5 15 Contents Contents Page mle Number Cache Flush pera eegene 3 21 Data Cache Block Fill Operations 00 0 0 ecceeecceeesccecseececeeececeeeeeceteeecsneeeeeaeees 3 21 Instruction Cache Block Fill Operations ccooococnoococnonccconnnncnoncconanacononccinnncnos
526. ser s Manual
4.5.1.2 Hard Reset
A hard reset is initiated by asserting HRESET. Hard reset is used primarily for power-on reset (POR) (in which case TRST must also be asserted), but it can also be used to restart a running processor. The HRESET signal must be asserted during power-up and must remain asserted for a period that allows the PLL to achieve lock and the internal logic to be reset. This period is specified in the hardware specifications. Table 4-8 shows the state of selected 750 signals during HRESET (while HRESET is held asserted) and from HRESET deassertion until the L2 interface is enabled. Unless noted, the 750 tri-states all other I/O drivers within five clocks of HRESET assertion. The 750 internal state after the hard reset interval is defined in Table 4-9. If HRESET is asserted for less than this amount of time, the results are not predictable. If HRESET is asserted during normal operation, all operations cease and the machine state is lost (see Section 7.2.9.6.1 for more information on hard reset).
Table 4-8. HRESET Signal States
Signal Name | During HRESET | HRESET Deassertion to L2 Enabled
L2ADDR | hi-z | 0
L2DATA | hi-z | 0
L2DP | hi-z | 0
LOWE i P [row garbled in source]
L2LCK_OUTA | 0 | 0
L2LCK_OUTB | 0 | 0
L2SYNC_OUT | 0 | 0
Ez o S [row garbled in source]
The hard reset exception is a nonrecoverable, nonmaskable, asynchronous exception. When HRESET is asserted or at power-on reset (POR), the 750 immediately branches to 0xFFF0_0100 without attempting to r
527. signal
State Meaning
  Asserted: Indicates that the address phase of a transaction is complete. The address bus will go to a high-impedance state on the next bus clock cycle. The 750 samples ARTRY on the bus clock cycle following the assertion of AACK.
  Negated: (During ABB) indicates that the address bus and the transfer attributes must remain driven.
Timing Comments
  Assertion: May occur as early as the bus clock cycle after TS is asserted; assertion can be delayed to allow adequate address access time for slow devices. For example, if an implementation supports slow snooping devices, an external arbiter can postpone the assertion of AACK.
  Negation: Must occur one bus clock cycle after the assertion of AACK.
7.2.5.2 Address Retry (ARTRY)
The address retry (ARTRY) signal is both an input and output signal on the 750.
7.2.5.2.1 Address Retry (ARTRY): Output
Following are the state meaning and timing comments for the ARTRY output signal.
State Meaning
  Asserted: Indicates that the 750 detects a condition in which a snooped address tenure must be retried. If the 750 needs to update memory as a result of the snoop that caused the retry, the 750 asserts BR the second cycle after AACK if ARTRY is asserted.
  High Impedance: Indicates that the 750 does not need the snooped address tenure to be retried.
Timing Comments
  Assertion: Asserted the third bus cycle following the assertion of TS if a
528. smallest component of the PowerPC architecture, defines additional user-level functionality that falls outside typical user-level software requirements. The VEA describes the memory model for an environment in which multiple processors or other devices can access external memory, and defines aspects of the cache model and cache control instructions from a user-level perspective. The resources defined by the VEA are particularly useful for optimizing memory accesses and for managing resources in an environment in which other processors and other devices can access external memory. Implementations that conform to the PowerPC VEA also conform to the PowerPC UISA, but may not necessarily adhere to the OEA.
• PowerPC operating environment architecture (OEA): The OEA defines supervisor-level resources typically required by an operating system. The OEA defines the PowerPC memory management model, supervisor-level registers, and the exception model. Implementations that conform to the PowerPC OEA also conform to the PowerPC UISA and VEA.
It is important to note that some resources are defined more generally at one level in the architecture and more specifically at another. For example, conditions that cause a floating-point exception are defined by the UISA, while the exception mechanism itself is defined by the OEA.
Because it is important to distinguish between the levels of the architecture in order to ensure compatibility across multiple platform
529. sor User s Manual
8.4.2 Data Bus Write Only
As a result of address pipelining, the 750 may have up to two data tenures queued to perform when it receives a qualified DBG. Generally, the data tenures should be performed in strict order (the same order) as their address tenures were performed. The 750, however, also supports a limited out-of-order capability with the data bus write only (DBWO) input. When recognized on the clock of a qualified DBG, DBWO may direct the 750 to perform the next pending data write tenure even if a pending read tenure would have normally been performed first. For more information on the operation of DBWO, refer to Section 8.10, "Using Data Bus Write Only."
If the 750 has any data tenures to perform, it always accepts data bus mastership to perform a data tenure when it recognizes a qualified DBG. If DBWO is asserted with a qualified DBG and no write tenure is queued to run, the 750 still takes mastership of the data bus to perform the next pending read data tenure.
Generally, DBWO should only be used to allow a copy-back operation (burst write) to occur before a pending read operation. If DBWO is used for single-beat write operations, it may negate the effect of the eieio instruction by allowing a write operation to precede a program-scheduled read operation.
8.4.3 Data Transfer
The data transfer signals include DH[0-31], DL[0-31], and DP[0-7]. For memory accesses, the DH and DL signals form a
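The DBWO selection rule in Section 8.4.2 can be sketched as a function that picks which queued data tenure runs on a qualified DBG. This is a behavioral illustration only; the queue representation and the function name are hypothetical, and bus timing is not modeled.

```python
def select_data_tenure(pending, dbwo_asserted):
    """Return the index of the data tenure to perform on a qualified DBG.

    pending       -- list of "read"/"write" strings in address-tenure order
                     (the text above allows up to two queued tenures)
    dbwo_asserted -- True if DBWO is recognized with the qualified DBG
    Returns None when nothing is queued.
    """
    if not pending:
        return None
    if dbwo_asserted and "write" in pending:
        return pending.index("write")  # next pending write may pass a read
    return 0                           # otherwise strict address-tenure order


if __name__ == "__main__":
    print(select_data_tenure(["read", "write"], dbwo_asserted=True))   # 1
    print(select_data_tenure(["read", "write"], dbwo_asserted=False))  # 0
    print(select_data_tenure(["read", "read"], dbwo_asserted=True))    # 0
```

The last case corresponds to DBWO asserted with no write queued: the next pending read tenure still runs.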
530. ss extends the latency of the fetch stage, so in this example, the fetch stage shown represents not only the time the instruction spends in the IQ, but the time required for the instruction to be loaded from system memory, beginning in clock cycle 2. During clock cycle 3, the target instruction for the b instruction is not in the BTIC, the instruction cache, or the L2 cache; therefore, a memory access must occur. During clock cycle 5, the address of the block of instructions is sent to the system bus. During clock cycle 7, two instructions (64 bits) are returned from memory on the first beat and are forwarded both to the cache and the instruction fetcher.
[Figure 6-6. Instruction Timing: Cache Miss. The timing diagram is not recoverable from the source. Legend: Fetch; In dispatch entry (IQ0/IQ1); Execute; Complete (In CQ); In retirement entry (CQ0/CQ1). Instructions shown include 2 add, 3 fadd, 4 b, 5 fsub, 6 fadd, 7 fadd, 8 add, 9 add, 10 add, 11 add, 12 fadd, 13 fadd, together with a completion queue panel. Figure note: Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency.]
6 3
531. ss Calculation," for information about calculating effective addresses. Note that in some implementations, operations that are not naturally aligned may suffer performance degradation. Refer to Section 4.5.6, "Alignment Exception (0x00600)," for additional information about load and store address alignment exceptions.
2.3.4.3.3 Register Indirect Integer Load Instructions
For integer load instructions, the byte, half word, word, or double word addressed by the EA (effective address) is loaded into rD. Many integer load instructions have an update form, in which rA is updated with the generated effective address. For these forms, if rA ≠ 0 and rA ≠ rD (otherwise invalid), the EA is placed into rA and the memory element (byte, half word, word, or double word) addressed by the EA is loaded into rD. Note that the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms.
Implementation Notes: The following notes describe the 750 implementation of integer load instructions:
• The PowerPC architecture cautions programmers that some implementations of the architecture may execute the load half algebraic (lha, lhax) instructions with greater latency than other types of load instructions. This is not the case for the 750; these instructions operate with the same latency as other load instructions.
• The PowerPC architecture cautions programmers that some implementations of the architecture may run the load s
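The update-form rule above (rA ≠ 0 and rA ≠ rD, EA written back to rA) can be illustrated with a minimal emulation of a load word with update. The register file and memory are plain Python objects, and the function names are hypothetical; raising an error for an invalid form is a choice of this sketch, since the architecture only classifies such encodings as invalid forms.

```python
MASK32 = 0xFFFFFFFF


def is_valid_update_form(ra: int, rd: int) -> bool:
    """Load-with-update is a valid form only if rA != 0 and rA != rD."""
    return ra != 0 and ra != rd


def lwzu(gpr, memory, rd, ra, d):
    """Emulate a load word with update: rD <- MEM(EA, 4), then rA <- EA.

    gpr    -- list of 32 register values
    memory -- bytes-like object, read in big-endian order
    d      -- signed displacement added to the contents of rA
    """
    if not is_valid_update_form(ra, rd):
        raise ValueError("invalid form: rA = 0 or rA = rD")
    ea = (gpr[ra] + d) & MASK32
    gpr[rd] = int.from_bytes(memory[ea:ea + 4], "big")
    gpr[ra] = ea


if __name__ == "__main__":
    regs = [0] * 32
    regs[4] = 8
    mem = bytes(range(16))
    lwzu(regs, mem, rd=3, ra=4, d=4)
    print(hex(regs[3]), regs[4])  # 0xc0d0e0f 12
```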
532. ss translation results in a hit, the changed bit in the matching TLB entry is checked. If it is already set, it is not updated. If the TLB changed bit is 0, the 750 initiates the table search operation to set the C bit in the corresponding PTE in the page table. The 750 then reloads the TLB (with the C bit set).
The changed bit (in both the TLB and the PTE in the page tables) is set only when a store operation is allowed by the page memory protection mechanism and the store is guaranteed to be in the execution path (unless an exception, other than those caused by the sc, rfi, or trap instructions, occurs). Furthermore, the following conditions may cause the C bit to be set:
• The execution of an stwcx. instruction is allowed by the memory protection mechanism, but a store operation is not performed.
• The execution of an stswx instruction is allowed by the memory protection mechanism, but a store operation is not performed because the specified length is zero.
• The store operation is not performed because an exception occurs before the store is performed.
Again, note that although the execution of the dcbt and dcbtst instructions may cause the R bit to be set, they never cause the C bit to be set.
5.4.1.3 Scenarios for Referenced and Changed Bit Recording
This section provides a summary of the model (defined by the OEA) that is used by PowerPC processors for maintaining the referenced and changed bits. In some scenarios, the bits are guarantee
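The changed-bit update on a store that hits in the TLB, as described at the start of this passage, can be sketched as follows. The dictionaries standing in for a TLB entry and a PTE, and the function name, are hypothetical; the table search itself is reduced to a direct update of the PTE.

```python
def record_changed_bit(tlb_entry: dict, pte: dict) -> bool:
    """Update C on a store that hits in the TLB and passes protection checks.

    Returns True if a table search was needed to set the C bit in the PTE.
    """
    if tlb_entry["C"] == 1:
        return False          # already set: nothing is updated
    pte["C"] = 1              # table search sets C in the page table entry
    tlb_entry["C"] = pte["C"] # TLB is reloaded with the C bit set
    return True


if __name__ == "__main__":
    tlb, pte = {"C": 0}, {"C": 0}
    print(record_changed_bit(tlb, pte), tlb, pte)  # first store: search needed
    print(record_changed_bit(tlb, pte), tlb, pte)  # later stores: no update
```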
533. ssist Unit
With the increasing power dissipation of high-performance processors and operating conditions that span a wider range of temperatures than desktop systems, thermal management becomes an essential part of system design to ensure reliable operation of portable systems. One key aspect of thermal management is ensuring that the junction temperature of the microprocessor does not exceed the operating specification. While the case temperature can be measured with an external thermal sensor, the thermal constant from the junction to the case can be large, and accuracy can be a problem. This may lead to lower overall system performance due to the necessary compensation to alleviate measurement deficiencies.
The 750 provides the system designer an efficient means of monitoring junction temperature through the incorporation of an on-chip thermal sensor and programmable control logic to enable a thermal management implementation tightly coupled to the processor for improved performance and reliability.
10.3.1 Thermal Assist Unit Overview
The on-chip thermal assist unit (TAU) is composed of a thermal sensor, a digital-to-analog converter (DAC), a comparator, control logic, and three dedicated SPRs. See Figure 10-1 for a block diagram of the TAU.
[Figure 10-1. Thermal Assist Unit. Block diagram not recoverable from the source. Labeled blocks: Thermal Sensor; Interrupt Control; THRM3; Thermal Interrupt Request (0x1700); Thermal Sensor Control Logic.]
534. ssue 64 tlbie instructions that each successively increment this field.
TLB Synchronize | tlbsync | On the 750, the only function tlbsync serves is to wait for the TLBISYNC signal to go inactive.
Implementation Note: The tlbia instruction is optional for an implementation if its effects can be achieved through some other mechanism. Therefore, it is not implemented on the 750. As described above, tlbie can be used to invalidate a particular index of the TLB based on EA[14-19]; a sequence of 64 tlbie instructions followed by a tlbsync instruction invalidates all the TLB structures (for EA[14-19] = 0, 1, 2, ..., 63). Attempting to execute tlbia causes an illegal instruction program exception.
The presence and exact semantics of the TLB management instructions are implementation-dependent. To minimize compatibility problems, system software should incorporate uses of these instructions into subroutines.
2.3.7 Recommended Simplified Mnemonics
To simplify assembly language coding, a set of alternative mnemonics is provided for some frequently used operations (such as no-op, load immediate, load address, move register, and complement register). Programs written to be portable across the various assemblers for the PowerPC architecture should not assume the existence of mnemonics not described in this document.
For a complete list of simplified mnemonics, see Appendix F, "Simplified Mnemonics," in The Program
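The 64-step invalidation sequence described in the implementation note can be sketched by generating the effective addresses to pass to tlbie. The sketch assumes PowerPC bit numbering for a 32-bit EA (bit 0 is the most significant bit), so EA[14-19] occupies bits 17 down to 12 counted from the least significant end; the function name is hypothetical, and the sketch only computes addresses and does not execute tlbie or tlbsync.

```python
EA_INDEX_SHIFT = 31 - 19   # EA bit 19 (PowerPC numbering) is 12 bits above the LSB
EA_INDEX_COUNT = 64        # EA[14-19] is a 6-bit field


def tlb_invalidate_all_addresses():
    """Return one effective address per TLB index, EA[14-19] = 0 through 63."""
    return [index << EA_INDEX_SHIFT for index in range(EA_INDEX_COUNT)]


if __name__ == "__main__":
    addresses = tlb_invalidate_all_addresses()
    # System software would issue tlbie for each address, then tlbsync.
    print(len(addresses), hex(addresses[1]), hex(addresses[-1]))
```

Each step advances the address by one 4-Kbyte page (0x1000), so the sequence spans 0x00000 through 0x3F000.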
535. ster (TB) is a 64-bit register that maintains the time of day and operates interval timers. The TB consists of two 32-bit fields: time base upper (TBU) and time base lower (TBL). (Level: supervisor, read/write.)
The XER contains the summary overflow bit, integer carry bit, overflow bit, and a field specifying the number of bytes to be transferred by a Load String Word Indexed (lswx) or Store String Word Indexed (stswx) instruction.
Table 1-3 describes the supervisor-level SPRs in the 750 that are not defined by the PowerPC architecture. Section 2.1.2, "PowerPC 750-Specific Registers," gives detailed descriptions of these registers, including bit descriptions.
Table 1-3. Implementation-Specific Registers
HID0 | Supervisor | The hardware implementation-dependent register 0 (HID0) provides checkstop enables and other functions.
HID1 | Supervisor | The hardware implementation-dependent register 1 (HID1) allows software to read the configuration of the PLL configuration signals.
IABR | Supervisor | The instruction address breakpoint register (IABR) supports instruction address breakpoint exceptions. It can hold an address to compare with instruction addresses in the IQ. An address match causes an instruction address breakpoint exception.
ICTC | Supervisor | The instruction cache-throttling control register (ICTC) has bits for controlling the interval at which instructions are fetched into the instruction buff
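Because the time base is exposed as two 32-bit halves, software reading a 64-bit value has to guard against TBL carrying into TBU between the two reads. The sketch below shows the usual upper/lower/upper read loop; this sequence is a common PowerPC programming practice rather than something stated in this passage, and the two reader callables are hypothetical stand-ins for the register reads.

```python
def read_time_base(read_tbu, read_tbl) -> int:
    """Return a consistent 64-bit time base value from two 32-bit reads.

    read_tbu, read_tbl -- callables returning the current TBU and TBL values
    """
    while True:
        upper = read_tbu()
        lower = read_tbl()
        if read_tbu() == upper:          # no carry into TBU between the reads
            return (upper << 32) | lower


if __name__ == "__main__":
    # Simulated registers: TBL wraps between the first TBU and TBL reads.
    tbu_values = iter([1, 2, 2, 2])
    tbl_values = iter([0xFFFFFFFF, 0x00000005])
    value = read_time_base(lambda: next(tbu_values), lambda: next(tbl_values))
    print(hex(value))
```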
536. stfdx 31 S A 0
stfiwx (2) 31 S A 0
stfsux 31 S A 0
stfsx 31 S A 0
[row(s) garbled in source: E g ne 3 g]
sthx 31 S A 0
stswi 31 S A 0
stswx 31 S A 0
stwbrx 31 S A 0
stwux 31 S A 0
stwx 31 0
sync 31 0
td 31 68 0
tlbia 29 6 31 370 0
[row(s) garbled in source]
tw 31 TO A B 4 0
xorx 31 S A B 316 Rc
Notes:
1. 64-bit instruction
2. Optional instruction
3. Supervisor-level instruction
4. Load/store string/multiple instruction
5. Optional 64-bit bridge instruction
6. 32-bit instruction not implemented by the PowerPC 750
Table A-37. XL-Form
Specific Instructions
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
bcctrx | 19 | BO | BI | 00000 | 528 | LK
bclrx | 19 | BO | BI | 00000 | 16 | LK
crand | 19 | crbD | crbA | crbB | 257 | 0
crandc | 19 | crbD | crbA | crbB | 129 | 0
creqv | 19 | crbD | crbA | crbB | 289 | 0
crnand | 19 | crbD | crbA | crbB | 225 | 0
crnor | 19 | crbD | crbA | crbB | 33 | 0
cror | 19 | crbD | crbA | crbB | 449 | 0
crorc | 19 | crbD | crbA | crbB | 417 | 0
crxor | 19 | crbD | crbA | crbB | 193 | 0
isync | 19 | 00000 | 00000 | 00000 | 150 | 0
mcrf | 19 | crfD 00 | crfS 00 | 00000 | 0 | 0
rfi (1, 2) | 19 | 00000 | 00000 | 00000 | 50 | 0
rfid (3) | 19 | 00000 | 00000 | 00000 | 18 | 0
Notes:
1. Supervisor-level instruction
2. Optional 64-bit bridge instruction
3. 64-bit instruction
Table A-38. XFX-Form
OPCD D spr XO 0
OPCD D 0 CRM 0 XO 0
OPCD S spr XO 0
OPCD D tbr XO 0
Sp
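The field layouts in these listings can be turned into instruction words mechanically. The sketch below packs an X-form instruction from the fields of a row such as "xorx 31 S A B 316 Rc", assuming the standard 32-bit PowerPC layout (primary opcode in bits 0-5, three 5-bit register fields, a 10-bit extended opcode in bits 21-30, and Rc in bit 31). The function name is hypothetical.

```python
def encode_x_form(opcd: int, s: int, a: int, b: int, xo: int, rc: int = 0) -> int:
    """Pack X-form fields into a 32-bit instruction word."""
    fields = (("opcd", opcd, 6), ("s", s, 5), ("a", a, 5),
              ("b", b, 5), ("xo", xo, 10), ("rc", rc, 1))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (opcd << 26) | (s << 21) | (a << 16) | (b << 11) | (xo << 1) | rc


if __name__ == "__main__":
    # xor r3,r4,r5: rS = 4, rA = 3, rB = 5, primary opcode 31, extended opcode 316
    print(hex(encode_x_form(31, 4, 3, 5, 316)))   # 0x7c832a78
    # xor. (record form) sets the Rc bit
    print(hex(encode_x_form(31, 4, 3, 5, 316, rc=1)))
```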
537. struction 7 is in the final FPU execute stage, and instructions 8-10 wait in the completion queue. Instructions 11 and 12 are dispatched to the IU2 and FPU, respectively. Note that at this point the completion queue is full. Two more instructions (15 and 16, which are shown only in the instruction queue) are fetched.
8. In cycle 8, instructions 7-11 are through executing. Instructions 7 and 8 complete, write back, and vacate the completion queue. Because the completion queue is full, instructions 13 and 14 cannot be dispatched and must remain in the instruction queue. Only the FPU is executing during this cycle (instruction 12). Additional instructions (instructions 16 and 17, shown only in the instruction queue) are fetched, filling the instruction queue.
9. In cycle 9, two more instructions (instructions 7 and 8) are retired from the completion queue, allowing instructions 13 and 14 to be dispatched, again filling the completion queue. No instructions are fetched on this cycle because the instruction queue was full on the previous clock cycle.
6.3.2.3 Cache Miss
Figure 6-6 shows an instruction fetch that misses both the on-chip cache and the L2 cache. A processor/bus clock ratio of 1:2 is used. The same instruction sequence is used as in Section 6.3.2.2, "Cache Hit"; however, in this example the branch target instruction is not in either the L1 or L2 cache. Because the target instruction is not in the L1 cache, it cannot be in the BTIC. A cache mi
538. sults of executing instructions in contexts where results are allowed to be boundedly undefined are constrained to ones that could have been achieved by executing an arbitrary sequence of defined instructions, in valid form, starting in the state the machine was in before attempting to execute the given instruction.
Branch folding: The replacement with target instructions of a branch instruction and any instructions along the not-taken path when a branch is either taken or predicted as taken.
Branch prediction: The process of guessing whether a branch will be taken. Such predictions can be correct or incorrect; the term "predicted" as it is used here does not imply that the prediction is correct (successful). The PowerPC architecture defines a means for static branch prediction as part of the instruction encoding.
Branch resolution: The determination of whether a branch is taken or not taken. A branch is said to be resolved when the processor can determine which instruction path to take. If the branch is resolved as predicted, the instructions following the predicted branch that may have been speculatively executed can complete (see completion). If the branch is not resolved as predicted, instructions on the mispredicted path, and any results of speculative execution, are purged from the pipeline and fetching continues from the nonpredicted path.
Burst: A multiple-beat data transfer whose total size is typically equal to a cache b
539. supervisor-level register that contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The SIA is shown in Figure 11-4.
Figure 11-4. Sampled Instruction Address Register (SIA). [Register diagram: a single field, Instruction Address, bits 0-31.]
If the performance monitor interrupt is triggered by a threshold event, the SIA contains the address of the exact instruction (called the sampled instruction) that caused the counter to overflow. If the performance monitor interrupt was caused by something besides a threshold event, the SIA contains the address of the last instruction completed during that cycle. SIA can be accessed with the mtspr and mfspr instructions using SPR 955.
11.2.1.8 User Sampled Instruction Address Register (USIA)
The contents of SIA are reflected to USIA, which can be read by user-level software. USIA can be accessed with the mfspr instruction using SPR 939.
11.3 Event Counting
Counting can be enabled if conditions in the processor state match a software-specified condition. Because a software task scheduler may switch a processor's execution among multiple processes, and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The performance monitor (PM) bit, MSR[29], is used for this purpose. System software may set this bit when a marke
540. sx 59 D fresx 2 59 D Appendix A PowerPC Instruction Set Listings A 3 Name frspx frsqrtex 2 fselx 2 fsqrtx 27 fsqrtsx 7 fsubx fsubsx icbi isync Ibz Ibzu Ibzux Ibzx Id Idarx A 4 Idu Idux Idx lfd lfdu Ifdux Hds lfs Heu Ifsux Hex Iha Ihau Ihaux Ihax Ihbrx Ihz Ihzu 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 63 D 00000 B 12 Re 63 D 00000 B 00000 26 Re 63 D A B C 23 Re 63 D B 59 D B 31 B 19 34 D A d 35 D A d 31 D A B 119 0 31 D A B 87 0 58 D A ds 0 31 D A B 84 0 31 D A B 21 0 50 D A d 51 D A d 31 D A B 631 0 48 D A d 49 D A d 31 D A B 567 0 31 D A B 535 0 42 D A d 43 D A d 31 D A B 343 0 31 D A B 790 0 40 D A d 41 D A d IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Ihzux 31 D A B 311 0 Ihzx 31 D A B 279 0 Imw 4 46 D A d Iwa 58 D A ds 2 lwarx 31 D A B 20 0 Iwaux 31 D A B 373 0 Iwax 31 D A B 341 0 Iwz 32 D A d Iwzu 33 D A d Iwzux 31 D A B 55 0 Iwzx 31 D A B 23 0 merf 19 0 0 mfcr 31 19 0 mffsx 63 583 Re mfmsr 31 83 0 mfspr 31 339 0 mtsr 36 31 595 0 mfsrin 96 31 659 0 mftb 31 371 0 mtcrf 31 144 0 mtfsb0x 63 70 Re
541. t specified by the architecture, and while this description applies to the 750, it does not necessarily apply to other PowerPC processors.
5.4.3.1 TLB Organization
Because the 750 has two MMUs (IMMU and DMMU) that operate in parallel, some of the MMU resources are shared, and some are actually duplicated (shadowed) in each MMU to maximize performance. For example, although the architecture defines a single set of segment registers for the MMU, the 750 maintains two identical sets of segment registers, one for the IMMU and one for the DMMU; when an instruction that updates the segment register executes, the 750 automatically updates both sets.
Each TLB contains 128 entries organized as a two-way set-associative array with 64 sets, as shown in Figure 5-7 for the DTLB (the ITLB organization is the same). When an address is being translated, a set of two TLB entries is indexed in parallel with the access to a segment register. If the address in one of the two TLB entries is valid and matches the 40-bit virtual page number, that TLB entry contains the translation. If no match is found, a TLB miss occurs.
Chapter 5. Memory Management, 5-25
Figure 5-7. Segment Register and DTLB Organization. [Block diagram; labels: EA[0-31], segment registers, EA[0-3], VSID, EA[4-13], compare, EA[14-19], select, line 1, line 0, hit, PA[0-19].]
Unless the access is the result of an out of
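The set-indexed lookup described for the TLB can be sketched in a few lines of Python. This is only an illustrative model, not the 750's hardware logic: the class and function names are hypothetical, the way to fill is chosen by the caller because the replacement policy is not described in this passage, and the field boundaries are read from the labels of Figure 5-7 using PowerPC bit numbering (bit 0 is the most significant bit).

```python
def split_effective_address(ea):
    """Split a 32-bit effective address into the fields labeled in Figure 5-7."""
    if not 0 <= ea <= 0xFFFFFFFF:
        raise ValueError("effective address must fit in 32 bits")
    return {
        "segment_register": (ea >> 28) & 0xF,  # EA[0-3] selects one of 16 SRs
        "compare_bits": (ea >> 18) & 0x3FF,    # EA[4-13], compared with the entry
        "tlb_set": (ea >> 12) & 0x3F,          # EA[14-19] selects one of 64 sets
        "byte_offset": ea & 0xFFF,             # EA[20-31], offset within the page
    }


class TwoWayTlb:
    """Toy model of a 128-entry TLB: 64 sets, two entries (ways) per set."""

    SETS = 64

    def __init__(self):
        # Each way holds None (invalid) or a (virtual page number, RPN) pair.
        self.sets = [[None, None] for _ in range(self.SETS)]

    @staticmethod
    def _vpn(vsid, ea):
        # 40-bit virtual page number: 24-bit VSID followed by EA[4-19].
        return ((vsid & 0xFFFFFF) << 16) | ((ea >> 12) & 0xFFFF)

    def insert(self, vsid, ea, rpn, way):
        # The caller picks the way; the replacement policy is left out here.
        index = split_effective_address(ea)["tlb_set"]
        self.sets[index][way] = (self._vpn(vsid, ea), rpn)

    def lookup(self, vsid, ea):
        """Return the physical page number on a hit, or None on a TLB miss."""
        index = split_effective_address(ea)["tlb_set"]
        vpn = self._vpn(vsid, ea)
        for entry in self.sets[index]:
            if entry is not None and entry[0] == vpn:
                return entry[1]
        return None
```

Both ways of the selected set are checked, which mirrors the "set of two TLB entries is indexed" step; a result of None corresponds to the TLB miss case.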
542. [End of a table that is only partly recoverable: the last two rows, both for operations "with intent to modify," read "Fetched cache block is stored in the cache," "No change," "No change," "None."]
For additional details about the specific bus operations performed by the 750, see Chapter 8, "Bus Interface Operation."
3.6.3 Snooping
The 750 maintains data cache coherency in hardware by coordinating activity between the data cache, the bus interface logic, the L2 cache, and the memory system. The 750 has a copy-back cache, which relies on bus snooping to maintain cache coherency with other caches in the system. For the 750, the coherency size of the bus is the size of a cache block, 32 bytes. This means that any bus transactions that cross an aligned 32-byte boundary must present a new address onto the bus at that boundary for proper snoop operation by the 750, or they must operate noncoherently with respect to the 750.
As bus operations are performed on the bus by other bus masters, the 750 bus snooping logic monitors the addresses and transfer attributes that are referenced. The 750 snoops the bus transactions during the cycle that TS is asserted for any of the following qualified snoop conditions:
• The global signal (GBL) is asserted, indicating that coherency enforcement is required.
• A reservation is currently active in the 750 as the result of an lwarx instruction, and the transfer type attributes (TT[0-4]) indicate a write or kill op
543. t be asserted for the qualified bus grant.
• ABB (address bus busy): Assertion by the 750 indicates that the 750 is the address bus master.
The following list describes the data arbitration signals:
• DBG (data bus grant): Indicates that the 750 may, with the proper qualification, assume mastership of the data bus. A qualified data bus grant occurs when DBG is asserted while DBB, DRTRY, and ARTRY are negated. The DBB signal is driven by the current bus master, DRTRY is only driven from the bus, and ARTRY is from the bus, but only for the address bus tenure associated with the current data bus tenure (that is, not from another address tenure).
• DBWO (data bus write only): Assertion indicates that the 750 may perform the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If DBWO is asserted, the 750 will assume data bus mastership for a pending data bus write operation; the 750 will take the data bus for a pending read operation if this input is asserted along with DBG and no write is pending. Care must be taken with DBWO to ensure the desired write is queued (for example, a cache-line snoop push-out operation).
• DBB (data bus busy): Assertion by the 750 indicates that the 750 is the data bus master. The 750 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
For more detailed information on th
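The qualification rule for DBG and the reordering effect of DBWO can be restated as a small Python sketch. It is a logical model only: each argument is True when the named signal is asserted (electrical polarity is ignored), and the function names and the list used for pending tenures are assumptions made for illustration.

```python
def qualified_data_bus_grant(dbg, dbb, drtry, artry):
    """Return True when DBG is asserted while DBB, DRTRY, and ARTRY are negated."""
    return dbg and not (dbb or drtry or artry)


def next_data_tenure(pending, dbg, dbb, drtry, artry, dbwo):
    """Return the index of the pending operation that takes the data bus.

    `pending` lists outstanding address tenures in order, each "read" or
    "write" (a hypothetical representation). None means no tenure starts.
    """
    if not pending or not qualified_data_bus_grant(dbg, dbb, drtry, artry):
        return None
    if dbwo and "write" in pending:
        # DBWO lets a queued write run ahead of reads pipelined before it.
        return pending.index("write")
    # Without DBWO, or with no write pending, the oldest tenure goes first.
    return 0
```

With a read queued ahead of a write, the model picks index 1 when DBWO is asserted and index 0 otherwise, which is the behavior the DBWO description calls for.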
544. t hold time
L2SL (L2 DLL slow): Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 100 MHz.
L2DF (L2 differential clock): Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, the B clock is driven as the logical complement of the A clock. This mode supports the differential clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs are used.
L2BYP (L2 DLL bypass): The DLL unit receives three input clocks:
• A square-wave clock from the PLL unit to phase adjust and export
• A non-square-wave clock for the internal phase reference
• A feedback clock (L2SYNC_IN) for the external phase reference
Asserting L2BYP causes clock 2 to be used as clocks 1 and 2. Clock 2 is the actual clock used by the registers of the L2 interface circuitry. L2BYP is intended for use when the PLL is being bypassed and for engineering evaluation. If the PLL is being bypassed, the DLL must be operated in divide-by-1 mode and SYSCLK must be fast enough for the DLL to support.
Reserved: These bits are implemented but not used; keep at 0 for future compatibility.
L2CS (L2 clock stop, for chip revisions 3.0 and later): Asserting this bit causes the L2 clocks to the SRAMs to
545. t in any other processor's cache. The data in this cache block is consistent with system memory.
Invalid (I): This state indicates that the address block does not contain valid data or that the addressed cache block is not resident in the cache.
The 750 provides dedicated hardware to provide memory coherency by snooping bus transactions. Figure 3-4 shows the MEI cache coherency protocol, as enforced by the 750. Figure 3-4 assumes that the WIM bits for the page or block are set to 001; that is, write-back, caching-not-inhibited, and memory coherency enforced.
Chapter 3. Instruction and Data Cache Operation, 3-7
Figure 3-4. MEI Cache Coherency Protocol, State Diagram (WIM = 001). [State diagram. Legend: SH = snoop hit; RH = read hit; RM = read miss; WH = write hit; WM = write miss; SH/CRW = snoop hit, cacheable read/write; SH/CIR = snoop hit, caching-inhibited read. Bus transactions: snoop push, cache block fill.]
Since data cannot be shared, the 750 signals all cache block fills as if they were write misses (read-with-intent-to-modify), which flushes the corresponding copies of the data in all caches external to the 750 prior to the cache block fill operation. Following the cache block load, the 750 is the exclusive owner of the data and may write to it without a bus broadcast transaction.
To maintain the three-state coherency, all global reads observed on the bus by the 750 are sno
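A table-driven state machine is one way to picture the three-state protocol. The Python sketch below uses the event names from the Figure 3-4 legend, but the individual edges are a reading of conventional MEI behavior and of the surrounding text, not a transcription of the figure, so they should be checked against the figure itself before being relied on.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    INVALID = "I"


M, E, I = State.MODIFIED, State.EXCLUSIVE, State.INVALID

# (current state, event) -> (next state, bus transaction or None).
# These edges are assumptions based on conventional MEI behavior.
TRANSITIONS = {
    (I, "RM"): (E, "cache block fill"),  # read miss: fill, then own exclusively
    (I, "WM"): (M, "cache block fill"),  # write miss: fill, then modify
    (E, "RH"): (E, None),
    (E, "WH"): (M, None),                # write hit needs no bus broadcast
    (E, "SH/CRW"): (I, None),            # snooped cacheable access invalidates
    (E, "SH/CIR"): (E, None),
    (M, "RH"): (M, None),
    (M, "WH"): (M, None),
    (M, "SH/CRW"): (I, "snoop push"),    # write modified data back, then drop it
    (M, "SH/CIR"): (E, "snoop push"),    # write back but keep the block
}


def step(state, event):
    """Apply one event and return (next state, bus transaction or None)."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no edge for {event!r} in state {state.name}") from None


def run(events, state=I):
    """Apply a sequence of events; return the final state and bus transactions."""
    actions = []
    for event in events:
        state, action = step(state, event)
        if action is not None:
            actions.append(action)
    return state, actions
```

Keeping the edges in a dictionary makes it easy to correct a single transition without touching the rest of the model.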
546. t in the reservation address register (see Chapter 3, "Instruction and Data Cache Operation," for more information). For information about timing, see Section 7.2.9.7.3, "Reservation (RSRV) Output."
Chapter 8. Bus Interface Operation, 8-43
8.8.2 TLBISYNC Input
The TLBISYNC input allows for the hardware synchronization of changes to MMU tables when the 750 and another DMA master share the same MMU translation tables in system memory. It is asserted by a DMA master when it is using shared addresses that could be changed in the MMU tables by the 750 during the DMA master's tenure. The TLBISYNC input, when asserted to the 750, prevents the 750 from completing any instructions past a tlbsync instruction. Generally, during the execution of an eciwx or ecowx instruction by the 750, the selected DMA device should assert the 750's TLBISYNC signal and maintain it asserted during its DMA tenure if it is using a shared translation address. Subsequent instructions by the 750 should include a sync and tlbsync instruction before any MMU table changes are performed. This prevents the 750 from making table changes disruptive to the other master during the DMA period.
8.9 IEEE 1149.1a-1993 Compliant Interface
The 750 boundary-scan interface is a fully compliant implementation of the IEEE 1149.1a-1993 standard. This section describes the 750's IEEE 1149.1a-1993 (JTAG) interface.
8.9.1 JTAG/COP Interface
The 750 has extensive on-chip test cap
547. t may be encountered again when instruction processing resumes.
Post-instruction execution (Trace): Trace exceptions are generated following execution and completion of an instruction while trace mode is enabled. If executing the instruction produces conditions for another type of exception, that exception is taken and the post-instruction exception is forgotten for that instruction.
Chapter 4. Exceptions, 4-5
Note that these exception classifications correspond to how exceptions are prioritized, as described in Table 4-3.
Table 4-3. PowerPC 750 Exception Priorities
Asynchronous Exceptions (Interrupts)
System reset: Power-on reset, assertion of HRESET and TRST (hard reset)
Machine check: Any enabled machine check condition (L1 address or data parity error, L2 data parity error, assertion of TEA or MCP)
[The remaining asynchronous rows are illegible in the source.]
Instruction Fetch Exceptions
ISI: Any ISI exception condition
Instruction Dispatch/Execution Exceptions
Instruction address breakpoint: Any instruction address breakpoint exception condition
Program: Occurrence of an illegal instruction, privileged instruction, or trap exception condition. Note that floating-point enabled program exceptions have lower priority.
System call: System Call (sc) instruction
Floating-point unavailable: Any floating-point unavailable exception condition
[Row label illegible]: A floating-point enabled exception condition (lowest-priority program exception)
[Row label illegible]: exception due to eciwx, ecowx with EAR[E] = 0 DSIS
548. t-word block of data to be copied to the write-back buffer. Within one cycle, the instruction cache provides up to four instructions to the instruction queue. The instruction cache can be invalidated entirely or on a cache-block basis. The instruction cache can be disabled and invalidated by clearing HID0[ICE] and setting HID0[ICFI]. The instruction cache can be locked by setting HID0[ILOCK]. The instruction cache supports only the valid/invalid states.
Chapter 1. PowerPC 740/PowerPC 750 Overview, 1-13
The 750 also implements a 64-entry (16-set, four-way set-associative) branch target instruction cache (BTIC). The BTIC is a cache of branch instructions that have been encountered in branch/loop code sequences. If the target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically, the BTIC contains the first two instructions in the target stream. The BTIC can be disabled and invalidated through software.
For more information and timing examples showing cache hit and cache miss latencies, see Section 6.3.2, "Instruction Fetch Timing."
1.2.5 L2 Cache Implementation (Not Supported in the PowerPC 740)
The L2 cache is a unified cache that receives memory requests from both the L1 instruction and data caches independently. The L2 cache is implemented with an on-chip, two-way set-associative tag memory, and with external, synchronous SRAMs for data storage
549. ta to be provided one cycle after the write operation is signaled on the address and control buses. In this way, write operations are queued on the address and data bus in the same way as read operations, allowing transitions between read and write operations to occur more efficiently.
Figure 9-33 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with late-write SRAM.
Figure 9-33. Burst Read-Write-Read L2 Cache Access (Late-Write SRAM). [Timing diagram; signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Note: WA is the last previous write that was queued in the late-write RAM.]
Figure 9-34 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with late-write SRAM.
Figure 9-34. Burst Read-Modify-Write L2 Cache Access (Late-Write SRAM). [Timing diagram; signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Note: WA is the last previous write that was queued in the late-write RAM.]
Chapter 9. L2 Cache Interface Operation, 9-13
Figure 9-35 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with late-write SRAM.
Timing diagram signals: SRAMClk, L2CE
550. tage. Also, as will be shown in cycle 4, there is a single-cycle stall that occurs when the FPU pipeline is full. Because there were three vacancies in the instruction queue in the previous clock cycle, instructions 8-11 are fetched in this clock cycle.
Instruction 1 completes in cycle 4, allowing instruction 2 to complete. Instructions 3 and 6 continue through the FPU pipeline. Although instruction 7 is in the IQ, it cannot be dispatched because the FPU is busy, and because instruction 7 cannot be dispatched, neither can instruction 8. The additional cycle stall allows the instruction queue to be completely filled. Because there was one opening in the instruction queue in clock cycle 3, one instruction is fetched (12) and the instruction queue is full.
In cycle 5, instruction 3 completes, allowing instruction 7 to be dispatched to the FPU, which in turn allows instruction 8 to be dispatched to the IU2. Instructions 9 and 10 drop to the dispatch positions in the instruction queue. No instructions are fetched in this clock cycle because there were no vacant IQ entries in clock cycle 4.
In cycle 6, instruction 6 completes, instruction 7 is in stage 2 of the FPU execute stage, and although instruction 8 has executed, it must wait for instruction 7 to complete. The two integer instructions, 9 and 10, are dispatched to the IU2 and IU1, respectively. Fetching resumes with instructions 13 and 14.
Chapter 6. Instruction Timing, 6-13
7. In cycle 7, in
551. te ordering, see "Byte Ordering," in Chapter 3, "Operand Conventions," in The Programming Environments Manual.
Table 2-34. Integer Load and Store with Byte-Reverse Instructions
Name | Mnemonic | Operand Syntax
Load Half Word Byte-Reverse Indexed | lhbrx | rD,rA,rB
Load Word Byte-Reverse Indexed | lwbrx | rD,rA,rB
Store Half Word Byte-Reverse Indexed | sthbrx | rS,rA,rB
Store Word Byte-Reverse Indexed | stwbrx | rS,rA,rB
2.3.4.3.7 Integer Load and Store Multiple Instructions
The load/store multiple instructions are used to move blocks of data to and from the GPRs. The load multiple and store multiple instructions may have operands that require memory accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be interrupted by a DSI exception associated with the address translation of the second page.
Implementation Notes: The following describes the 750 implementation of the load/store multiple instruction:
• For load/store string operations, the hardware does not combine register values to reduce the number of discrete accesses. However, if store gathering is enabled and the accesses fall under the criteria for store gathering, the stores may be combined to enhance performance. At a minimum, additional cache access cycles are required.
• The 750 supports misaligned, single-register load and store accesses in little-endian mode without causing an alignment exception. However, execution of misaligned load/store multiple/string operations causes an
552. teanteaeesces 4 19 DSI Exception UDI a iii 4 19 IST Exception OxQ0400 vasca EEGEN 4 19 External Interrupt Exception Ox00500 0 0 eee eeeceseeeneeeeeeceseeeeeeesneeenaeens 4 20 Alignment Exception COXDOGUO evoca added 4 20 Program Exception OxU0 FOO EE 4 20 Floating Point Unavailable Exception Ox00800 ocooococonocccconcccconcconnnacinnncnos 4 21 Decrementer Exception Ox00900 lt cssccsssscesscciscnccssanedcanscectadesseseders sesseaanabe 4 21 System Call Exception OSUOCO0 4 21 Trace Exception 0X00DOO EE 4 22 Floating Point Assist Exception OSUOPOU 4 22 Performance Monitor Interrupt 0X00F00 oooococnnoccccoocccnonnncnoncncnonaconanccinnncnos 4 22 Instruction Address Breakpoint Exception UxOT 2001 4 23 System Management Interrupt 0x01400 oooooonnoccccnocccconcnononnnononcconancnnnnncconno 4 25 ix Paragraph Number 4 5 16 5 1 5 1 1 5 1 2 5 1 3 5 1 4 5 1 5 5 1 6 5 1 6 1 5 1 6 2 5 1 7 5 1 8 5 2 32 5 4 5 4 1 5 4 1 1 5 4 1 2 5 4 1 3 5 4 2 5 4 3 5 4 3 1 5 4 3 2 5 4 4 5 4 5 5 4 6 5 4 7 6 1 6 2 6 3 6 3 1 6 3 2 6 3 2 1 6 3 2 2 6 3 2 3 Contents Page ae Number Thermal Management Interrupt Exception SOT 00 4 26 Chapter 5 Memory Management MMU ONE eene een Eege 5 2 Memory AE Ee 5 4 RRE Te RE EE 5 4 Address Translation Mechansms ak 5 9 Memory Protection Facilities ii c 5 cccsnscsesescassacds ascents cecdqntssustacsvanedcevasseeaanedens 5 11 Page History Information ts
553. ted with an on-chip, two-way set-associative tag memory, and with external, synchronous SRAMs for data storage. The external SRAMs are accessed through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache interface is not implemented in the PowerPC 740. For information about the L2 cache implementation, see Chapter 9, "L2 Cache Interface Operation."
The 750 has a 32-bit address bus and a 64-bit data bus. Multiple devices compete for system resources through a central external arbiter. The 750's three-state cache coherency protocol (MEI) supports the exclusive, modified, and invalid states, a compatible subset of the MESI (modified/exclusive/shared/invalid) four-state protocol, and it operates coherently in systems with four-state caches. The 750 supports single-beat and burst data transfers for memory accesses and memory-mapped I/O operations. The system interface is described in Chapter 7, "Signal Descriptions," and Chapter 8, "Bus Interface Operation."
The 750 has four software-controllable power-saving modes. Three static modes (doze, nap, and sleep) progressively reduce power dissipation. When functional units are idle, a dynamic power management mode causes those units to enter a low-power mode automatically without affecting operational performance, software execution, or external hardware. The 750 also provides a thermal assist unit (TAU) and a way to reduce the instruction fet
554. ter 10. Power and Thermal Management, 10-9
10.3.2.3 PowerPC 750 Junction Temperature Determination
While the 750's TAU does not implement an analog-to-digital converter to enable the direct determination of the junction temperature, system software can execute a simple successive approximation routine to find the junction temperature. The TAU configuration used to approximate the junction temperature is the same as that required for single-threshold mode, except that the threshold SPR selected has its TIE bit cleared to 0 to disable thermal management interrupt generation. Once the TAU is enabled, the successive approximation routine loads a threshold value into the active threshold SPR and then continuously polls the threshold SPR's TIV bit until it is set to 1, indicating a valid TIN bit. The successive approximation routine can then evaluate the TIN bit value and then increment or decrement the threshold value for another comparison. This process is continued until the junction temperature is determined.
10.3.2.4 Power Saving Modes and TAU Operation
The static power-saving modes provided by the 750 (the nap, doze, and sleep modes) allow the temperature of the processor to be lowered quickly, and can be invoked through the use of the TAU and associated thermal management interrupt. The TAU remains operational in the nap and doze modes, and in sleep mode as long as the SYSCLK signal input remains active. If the SYSCLK signal is made static when sleep mo
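The successive approximation routine described in Section 10.3.2.3 amounts to a binary search over threshold values. The Python sketch below shows the idea under stated assumptions: `read_tin` is a hypothetical callback standing in for "load the threshold, poll TIV until it is 1, then read TIN"; TIN is assumed to read as 1 when the junction temperature is above the threshold; and the 0 to 127 search range is an assumed threshold width, none of which is stated in this passage.

```python
def find_junction_temperature(read_tin, lowest=0, highest=127):
    """Binary-search the lowest threshold the junction temperature does not exceed.

    read_tin(threshold) is a hypothetical callback that returns True when the
    junction temperature is above `threshold` (assumed TIN polarity).
    """
    low, high = lowest, highest
    while low < high:
        mid = (low + high) // 2
        if read_tin(mid):
            # Temperature is above mid, so raise the threshold.
            low = mid + 1
        else:
            # Temperature is at or below mid, so lower the threshold.
            high = mid
    return low


def make_simulated_sensor(junction_temperature):
    """Build a stand-in for the TAU comparison, counting how often it is polled."""
    calls = []

    def read_tin(threshold):
        calls.append(threshold)
        return junction_temperature > threshold

    return read_tin, calls
```

Because each comparison halves the remaining range, a 128-value range needs at most seven comparisons, which matters when every comparison involves waiting for TIV.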
555. ters" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual. Because BAT upper and lower words are loaded separately, software must ensure that BAT translations are correct during the time that both BAT entries are being loaded. The 750 implements the G bit in the IBAT registers; however, attempting to execute code from an IBAT area with G = 1 causes an ISI exception. This complies with the revision of the architecture described in The Programming Environments Manual.
SDR1: The SDR1 register specifies the page table base address used in virtual-to-physical address translation. See "SDR1" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual.
Segment registers (SR): The PowerPC OEA defines sixteen 32-bit segment registers (SR0-SR15). Note that the SRs are implemented on 32-bit implementations only. The fields in the segment register are interpreted differently depending on the value of bit 0. See "Segment Registers" in Chapter 2, "PowerPC Register Set," of The Programming Environments Manual for more information. Note that the 750 implements separate memory management units (MMUs) for instruction and data. It associates the architecture-defined SRs with the data MMU (DMMU). It reflects the values of the SRs in separate, so-called shadow segment registers in the instruction MMU (IMMU).
Chapter 2. Programming Model, 2-5
Exception handling registers
556. ters. However, for compatibility with processors that do, those registers can be written to by boot code without causing an exception. SDA is SPR 959; USDA is SPR 943.
2.1.3 Instruction Cache Throttling Control Register (ICTC)
Reducing the rate of instruction fetching can control junction temperature without the complexity and overhead of dynamic clock control. System software can control instruction forwarding by writing a nonzero value to the ICTC register, a supervisor-level register shown in Figure 2-9. The overall junction temperature reduction comes from the dynamic power management of each functional unit when the 750 is idle in between instruction fetches. PLL (phase-locked loop) and DLL (delay-locked loop) configurations are unchanged.
Figure 2-9. Instruction Cache Throttling Control Register (ICTC). [Register diagram; the only legible label is "Reserved."]
Table 2-14 describes the bit fields for the ICTC register.
Table 2-14. ICTC Bit Settings
Reserved
FI: Instruction forwarding interval, expressed in processor clocks.
  0x00: 0 clock cycles
  0x01: 1 clock cycle
  ...
  0xFF: 255 clock cycles
E: Cache throttling enable
  0: Disable instruction cache throttling.
  1: Enable instruction cache throttling.
Instruction cache throttling is enabled by setting ICTC[E] and writing the instruction forwarding interval into ICTC[FI]. Enabling, disabling, and changing the instruction forwarding interval af
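Composing an ICTC value from the two fields in Table 2-14 can be sketched as follows. The bit positions are an assumption, since they did not survive in the table as extracted: this sketch places E in the least significant bit (bit 31 in PowerPC numbering) and the 8-bit FI field immediately above it (bits 23-30), and the function names are hypothetical.

```python
ICTC_E_MASK = 0x1        # assumed position: bit 31 (least significant bit)
ICTC_FI_SHIFT = 1        # assumed position: bits 23-30
ICTC_FI_MASK = 0xFF << ICTC_FI_SHIFT


def ictc_value(interval_clocks, enable=True):
    """Build an ICTC value from a forwarding interval (0-255 clocks) and E."""
    if not 0 <= interval_clocks <= 0xFF:
        raise ValueError("forwarding interval must fit in the 8-bit FI field")
    return (interval_clocks << ICTC_FI_SHIFT) | (ICTC_E_MASK if enable else 0)


def decode_ictc(value):
    """Split an ICTC value back into (forwarding interval, enabled)."""
    interval = (value & ICTC_FI_MASK) >> ICTC_FI_SHIFT
    return interval, bool(value & ICTC_E_MASK)
```

On real hardware the resulting value would be written with mtspr from supervisor-level code; that step is outside what a host-side sketch like this can show.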
557. ters are provided to support floating-point operations. Stalls due to contention for FPRs are minimized by automatic allocation of the six floating-point rename registers. The 750 writes the contents of the rename registers to the appropriate FPR when floating-point instructions are retired by the completion unit.
The 750 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. Note that "exception" is also referred to as "interrupt" in the architecture specification.
1.2.2.4.3 Load/Store Unit (LSU)
The LSU executes all load and store instructions and provides the data transfer interface between the GPRs, FPRs, and the cache/memory subsystem. The LSU calculates effective addresses, performs data alignment, and provides sequencing for load/store string and multiple instructions.
Load and store instructions are issued and translated in program order; however, some memory accesses can occur out of order. Synchronizing instructions can be used to enforce strict ordering. When there are no data dependencies and the guarded bit for the page or block is cleared, a maximum of one out-of-order cacheable load operation can execute per cycle, with a two-cycle total latency on a cache hit. Data returned from the cache is held in a rename register until the completion logic commits the value to a GPR or FPR. Stores cannot be executed
558. tes one stage, it can pass on to the next stage, leaving the previous stage available to the subsequent instruction. This improves overall instruction throughput.
A superscalar processor is one that issues multiple independent instructions into separate execution units, allowing instructions to execute in parallel. The 750 has six independent execution units: two for integer instructions, and one each for floating-point instructions, branch instructions, load/store instructions, and system register instructions. Having separate GPRs and FPRs allows integer, floating-point calculations, and load and store operations to occur simultaneously without interference. Additionally, rename buffers are provided to allow operations to post execution results for use by subsequent instructions without committing them to the architected FPRs and GPRs.
As shown in Figure 1-6, the common pipeline of the 750 has four stages through which all instructions must pass: fetch, decode/dispatch, execute, and complete/write-back. Some instructions occupy multiple stages simultaneously, and some individual execution units have additional stages. For example, the floating-point pipeline consists of three stages through which all floating-point instructions must pass.
Figure 1-6 labels: Maximum four-instruction fetch per clock cycle. Maximum three-instruction dispatch per clock cycle (includes one branch instruction). Dispatch. Execute stage. Maximum two-instruction completion pe
559. the L2 cache and are forwarded to the 60x bus.
9.1.2 L2 Cache Control Register (L2CR)
The L2 cache control register is used to configure and enable the L2 cache. The L2CR is a supervisor-level, read/write, implementation-specific register that is accessed as SPR 1017. The contents of the L2CR are cleared during power-on reset. Table 9-8 describes the L2CR bits. For additional information about the configuration of the L2CR, refer to Section 2.1.5, "L2 Cache Control Register (L2CR)."
Table 9-8. L2 Cache Control Register
L2SIZ (L2 size): Should be set according to the size of the L2 data RAMs used.
  00: Reserved
  01: 256 Kbyte
  10: 512 Kbyte
  11: 1 Mbyte
L2CLK (L2 clock ratio, core-to-L2 frequency divider):
  000: L2 clock and DLL disabled
  001: ÷1
  010: ÷1.5
  011: Reserved
  100: ÷2
  101: ÷2.5
  110: ÷3
  111: Reserved
L2RAM (L2 RAM type): Configures the L2 RAM interface for the type of synchronous SRAMs used.
  00: Flow-through (register-buffer) synchronous burst SRAM
  01: Reserved
  10: Pipelined (register-register) synchronous burst SRAM
  11: Pipelined (register-register) synchronous late-write SRAM
L2DO (L2 data only): Setting this bit disables the caching of instructions in the L2 cache.
L2I (L2 global invalidate): Setting L2I invalidates the L2 cache globally by clearing the L2 status bits.
L2CTL (L2 RAM control, ZZ enable): Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that sup
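The field encodings in Table 9-8 lend themselves to a small decoder. The Python sketch below is illustrative only: the encodings are taken from the table, but the bit positions are assumptions (size in bits 2-3, clock ratio in bits 4-6, RAM type in bits 7-8, counting from the most significant bit as bit 0), and the function names are hypothetical.

```python
L2_SIZE = {0b01: "256 Kbyte", 0b10: "512 Kbyte", 0b11: "1 Mbyte"}
L2_CLOCK_DIVIDER = {0b001: 1.0, 0b010: 1.5, 0b100: 2.0, 0b101: 2.5, 0b110: 3.0}
L2_RAM_TYPE = {
    0b00: "flow-through synchronous burst SRAM",
    0b10: "pipelined synchronous burst SRAM",
    0b11: "pipelined synchronous late-write SRAM",
}


def decode_l2cr(value):
    """Decode the size, clock-ratio, and RAM-type fields of an L2CR value."""
    size_code = (value >> 28) & 0b11    # assumed bits 2-3
    clock_code = (value >> 25) & 0b111  # assumed bits 4-6
    ram_code = (value >> 23) & 0b11     # assumed bits 7-8
    if clock_code == 0b000:
        divider = None                  # L2 clock and DLL disabled
    else:
        divider = L2_CLOCK_DIVIDER.get(clock_code, "reserved")
    return {
        "size": L2_SIZE.get(size_code, "reserved"),
        "divider": divider,
        "ram_type": L2_RAM_TYPE.get(ram_code, "reserved"),
    }


def l2_frequency_hz(core_hz, value):
    """Return the L2 clock frequency implied by the core clock and the divider."""
    divider = decode_l2cr(value)["divider"]
    if not isinstance(divider, float):
        raise ValueError("L2 clock is disabled or uses a reserved encoding")
    return core_hz / divider
```

For example, a 300 MHz core with the ÷1.5 encoding gives a 200 MHz L2 clock, which is the kind of arithmetic used when deciding whether a bit such as L2SL needs to be set.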
560. the entire TB register at once. • The time base counter is clocked at a frequency that is one-fourth that of the bus clock. Counting is enabled by assertion of the time base enable (TBE) input signal. 2.3.5.2 Memory Synchronization Instructions (VEA). Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, Instruction and Data Cache Operation, for more information about these instructions and about related aspects of memory synchronization. In addition to the sync instruction (specified by UISA), the VEA defines the Enforce In-Order Execution of I/O (eieio) and Instruction Synchronize (isync) instructions. The number of cycles required to complete an eieio instruction depends on system parameters and on the processor's state when the instruction is issued. As a result, frequent use of this instruction may degrade performance slightly. Table 2-51 describes the memory synchronization instructions defined by the VEA. Table 2-51. Memory Synchronization Instructions (VEA). Enforce In-Order Execution of I/O (eieio): The eieio instruction is dispatched to the LSU and executes after all previous cache-inhibited or write-through accesses are performed; all subsequent instructions that generate such accesses execute after eieio. If HID0[ABE] = 1 an
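Because the time base counts at one-fourth of the bus clock, converting tick counts to elapsed time is simple arithmetic. The sketch below illustrates it under stated assumptions: the function names are hypothetical, the bus frequency is supplied by the caller, and the wrap calculation assumes the lower time base register is 32 bits wide.

```python
def timebase_hz(bus_hz):
    """Time base tick rate: one-fourth of the bus clock frequency."""
    return bus_hz / 4


def ticks_to_seconds(ticks, bus_hz):
    """Convert a time base tick count to seconds for a given bus clock."""
    return ticks / timebase_hz(bus_hz)


def seconds_until_tbl_wraps(bus_hz):
    """Seconds for a 32-bit lower time base register to wrap (assumed width)."""
    return (1 << 32) / timebase_hz(bus_hz)
```

With a 100 MHz bus, the time base would tick at 25 MHz, so 25,000,000 ticks correspond to one second.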
561. the incorrect path are flushed from the processor, and processing begins from the correct path. The 750 allows a second branch instruction to be predicted; instructions from the second predicted instruction stream can be fetched but cannot be dispatched. Dynamic prediction is implemented using a 512-entry branch history table (BHT), a cache that provides two bits per entry that together indicate four levels of prediction for a branch instruction: not taken, strongly not taken, taken, strongly taken. When dynamic branch prediction is disabled, the BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore, when an unresolved conditional branch instruction is encountered, the 750 executes instructions from the predicted target stream, although the results are not committed to architected registers until the conditional branch is resolved. This execution can continue until a second unresolved branch instruction is encountered. When a branch is taken (or predicted as taken), the instructions from the untaken path must be flushed and the target instruction stream must be fetched into the IQ. The BTIC is a 64-entry cache that contains the most recently used branch target instructions, typically in pairs. When an instruction fetch hits in the BTIC, the instructions arrive in the instruction queue in the next clock cycle, a clock cycle sooner than they would arrive from the instruction cache. Additional instructions a
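A two-bit, four-level predictor of the kind described above is conventionally modeled as a saturating counter per table entry. The sketch below is that textbook model, not the 750's documented implementation: the class name, the initial counter value, the index function (low-order word-address bits), and the update rule are all assumptions made for illustration.

```python
# Counter values 0..3 mapped to the four prediction levels named in the text.
LEVELS = ("strongly not taken", "not taken", "taken", "strongly taken")


class BranchHistoryTable:
    def __init__(self, entries=512):
        self.entries = entries
        # Assumption: every entry starts at "not taken" (counter value 1).
        self.counters = [1] * entries

    def _index(self, branch_addr):
        # Assumption: index with low-order bits of the word address.
        return (branch_addr >> 2) % self.entries

    def level(self, branch_addr):
        """Return the current prediction level for this branch address."""
        return LEVELS[self.counters[self._index(branch_addr)]]

    def predict_taken(self, branch_addr):
        """Predict taken when the counter is in one of the two taken levels."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        """Move one level toward the resolved outcome, saturating at 0 and 3."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In this model a branch that has been taken twice in a row reaches "strongly taken", and a single not-taken outcome only demotes it to "taken", so one anomalous iteration does not flip the prediction.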
562. the processor would have attempted to execute next if no exception conditions were present. SRR1: bit 0, loaded with equivalent MSR bit; bits 1-4, cleared; bits 5-9, loaded with equivalent MSR bits; bits 10-15, cleared; bits 16-31, loaded with equivalent MSR bits. MSR: LE, set to value of ILE. Like the external interrupt, a system management interrupt is signaled to the 750 by the assertion of an input signal. The system management interrupt signal (SMI) is expected to remain asserted until the interrupt is taken. If SMI is negated early, recognition of the interrupt request is not guaranteed. After the 750 begins execution of the system management interrupt handler, the system can safely negate SMI. After the assertion of SMI is detected, the 750 stops dispatching instructions and waits for all pending instructions to complete. This allows any instructions in progress that need to take an exception to do so before the system management interrupt is taken. When a system management interrupt exception is taken, instruction fetching resumes at offset 0x01400 from the base address indicated by MSR[IP]. 4.5.16 Thermal Management Interrupt Exception (0x01700). A thermal management interrupt is generated when the junction temperature crosses a threshold programmed in either THRM1 or THRM2. The exception is enabled by the TIE bit of either THRM1 or THRM2 and can be masked by clearing MSR[EE]. Table 4-16 lists register settings when a thermal management int
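The SRR1 bit ranges and the vector offsets above can be turned into a short worked example. This is a sketch under stated assumptions: the helper names are hypothetical, bit 0 is taken to be the most significant bit of a 32-bit register (the PowerPC numbering convention), and the two base addresses selected by MSR[IP] (0x00000000 and 0xFFF00000) come from the PowerPC architecture rather than from this passage.

```python
def ppc_bits(first, last):
    """Mask for bits first..last (inclusive), where bit 0 is the MSB of 32."""
    mask = 0
    for bit in range(first, last + 1):
        mask |= 1 << (31 - bit)
    return mask


# SRR1 bits loaded from the MSR: bit 0, bits 5-9, and bits 16-31.
SRR1_FROM_MSR_MASK = ppc_bits(0, 0) | ppc_bits(5, 9) | ppc_bits(16, 31)


def srr1_after_exception(msr):
    """Copy the MSR bits listed in the table; all other SRR1 bits are cleared."""
    return msr & SRR1_FROM_MSR_MASK


def exception_vector(offset, msr_ip):
    """Vector address: offset from the base selected by MSR[IP] (assumed bases)."""
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + offset


SMI_OFFSET = 0x01400       # system management interrupt
THERMAL_OFFSET = 0x01700   # thermal management interrupt
```

Under these assumptions, a system management interrupt taken with MSR[IP] set would begin fetching at `exception_vector(SMI_OFFSET, 1)`.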
563. the touch load address for the cache. The interface allows one level of pipelining; that is, with certain restrictions discussed later, there can be two outstanding transactions at any given time. Accesses are prioritized, with load operations preceding store operations. Instructions are automatically fetched from the memory system into the instruction unit, where they are dispatched to the execution units at a peak rate of two instructions per clock. Conversely, load and store instructions explicitly specify the movement of operands to and from the integer and floating-point register files and the memory system. When the 750 encounters an instruction or data access, it calculates the logical address (effective address in the architecture specification) and uses the low-order address bits to check for a hit in the on-chip 32-Kbyte instruction and data caches. During cache lookup, the instruction and data memory management units (MMUs) use the higher-order address bits to calculate the virtual address, from which they calculate the physical address (real address in the architecture specification). The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction or data cache. If the access misses in the corresponding cache, the physical address is used to access the L2 cache tags if the L2 cache is enabled
564. their encodings. Table 11-5. PMC1 Events: MMCR0[19-25] Select Encodings. 000 0011: Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]): 00 = 15, 01 = 19, 10 = 23, 11 = 31. Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 11-6. Table 11-6. PMC2 Events: MMCR0[26-31] Select Encodings. 00 0000: Nothing. Register holds current value. 00 0001: Processor cycles. Count every cycle. 00 0010: Number of instructions that have completed. Indicates number of instructions that have completed. Does not include folded branches. 00 0011: Time base lower bit transitions. Counts transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]): 00 = 15, 01 = 19, 10 = 23, 11 = 31. 00 0100: Number of instructions dispatched. 0, 1, or 2 instructions per cycle. 00 0101: Number of L1 cache misses. Indicates the number of times an instruction fetch missed the L1 instruction cache. 00 0110: Number of ITLB misses. Indicates the number of times the needed instruction address translation was not in the ITLB. 00 0111: L2 I-misses. Counts the number of accesses which miss the L2 due to an I-side request. 00 100
565. ther the target instruction stream is in the BTIC, the instruction cache, or if it must be fetched from the L2 cache or from system memory. Figure 6-7 shows cases where there is a BTIC hit, and when there is a BTIC miss (and instruction cache hit). If there is a BTIC hit, on the next clock cycle the b instruction is replaced by the target instruction, and1, that was found in the BTIC; the second and instruction is also fetched from the BTIC. On the next clock cycle, the next four and instructions from the target stream are fetched from the instruction cache. If the target instruction is not in the BTIC, there is an idle cycle while the fetcher attempts to fetch the first four instructions from the instruction cache (on the next clock cycle). In the example in Figure 6-7, the first four target instructions are fetched on the next clock. If it misses in the caches, an L2 cache or memory access is required, the latency of which is dependent on several factors, such as processor/bus clock ratios. In most cases, new instructions arrive in the IQ before the execution units become idle. [Figure 6-7. Panels: Branch Folding (Taken Branch/BTIC Hit) and Branch Folding (Taken Branch/BTIC Miss), each showing instruction queue entries IQ0-IQ5 over clock cycles 0-2.]
566. timate Single: fres (fres.). Floating Reciprocal Square Root Estimate: frsqrte (frsqrte.). Note: The fsel instruction is optional in the PowerPC architecture. All single-precision arithmetic instructions are performed using a double-precision format. The floating-point architecture is a single-pass implementation for double-precision products. In most cases, a single-precision instruction using only single-precision operands, in double-precision format, has the same latency as its double-precision equivalent. 2.3.4.2.2 Floating-Point Multiply-Add Instructions. These instructions combine multiply and add operations without an intermediate rounding operation. The floating-point multiply-add instructions are summarized in Table 2-27. Table 2-27. Floating-Point Multiply-Add Instructions. 2.3.4.2.3 Floating-Point Rounding and Conversion Instructions. The Floating Round to Single-Precision (frsp) instruction is used to truncate a 64-bit double-precision number to a 32-bit single-precision floating-point number. The floating-point convert instructions convert a 64-bit double-precision floating-point number to a 32-bit signed integer number. Examples of uses of these instructions to perform various conversions can be found in Appendix D, Floating-Point Models, in The Programming Environments Manual. Table 2-28
567. ting address is a multiple of its size. The page table contains a number of page table entry groups (PTEGs). A PTEG contains eight page table entries (PTEs) of eight bytes each; therefore, each PTEG is 64 bytes long. PTEG addresses are entry points for table search operations. Setting MSR[IR] enables instruction address translations and MSR[DR] enables data address translations. If the bit is cleared, the respective effective address is the same as the physical address. 1.8.2 PowerPC 750 Microprocessor Memory Management Implementation. The 750 implements separate MMUs for instructions and data. It implements a copy of the segment registers in the instruction MMU; however, read and write accesses (mfsr and mtsr) are handled through the segment registers implemented as part of the data MMU. The 750 MMU is described in Section 1.2.3, Memory Management Units (MMUs). The R (referenced) bit is updated in the PTE in memory (if necessary) during a table search due to a TLB miss. Updates to the changed (C) bit are treated like TLB misses. A complete table search is performed and the entire TLB entry is rewritten to update the C bit. 1.9 Instruction Timing. The 750 is a pipelined superscalar processor. A pipelined processor is one in which instruction processing is divided into discrete stages, allowing work to be done on different instructions in each stage. For example, after an instruction comple
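The PTEG sizing stated above (eight PTEs of eight bytes each, 64 bytes per PTEG) can be checked with a few lines of arithmetic. The following is a minimal sketch with hypothetical helper names; the table base address and table size passed in are illustrative values, not figures from the manual.

```python
PTE_BYTES = 8                                # each page table entry is 8 bytes
PTES_PER_PTEG = 8                            # each PTEG holds eight PTEs
PTEG_BYTES = PTE_BYTES * PTES_PER_PTEG       # 64 bytes per PTEG


def pteg_count(table_bytes):
    """Number of PTEGs that fit in a page table of the given size."""
    if table_bytes % PTEG_BYTES != 0:
        raise ValueError("page table size must be a whole number of PTEGs")
    return table_bytes // PTEG_BYTES


def pteg_address(table_base, pteg_index):
    """Address of a PTEG, the entry point for a table search operation."""
    return table_base + pteg_index * PTEG_BYTES
```

For instance, a hypothetical 64-Kbyte table would hold `pteg_count(64 * 1024)` PTEGs, and consecutive PTEGs sit 0x40 bytes apart.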
568. tion. • IEEE floating-point enabled exceptions (a type of program exception) are ignored when both MSR[FE0] and MSR[FE1] are cleared. If either bit is set, all IEEE enabled floating-point exceptions are taken and cause a program exception. • Asynchronous, maskable exceptions (such as the external and decrementer interrupts) are enabled by setting MSR[EE]. When MSR[EE] = 0, recognition of these exception conditions is delayed. MSR[EE] is cleared automatically when an exception is taken to delay recognition of conditions causing those exceptions. • A machine check exception can occur only if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop state when a machine check exception condition occurs. Individual machine check exceptions can be enabled and disabled through bits in the HID0 register, which is described in Table 4-10. • System reset exceptions cannot be masked. 4.3.2 Steps for Exception Processing. After it is determined that the exception can be taken (by confirming that any instruction-caused exceptions occurring earlier in the instruction stream have been handled, and by confirming that the exception is enabled for the exception condition), the processor does the following: 1. SRR0 is loaded with an instruction address that depends on the type of exception. See the individual exception description for details about how this register is used for specific exceptions. 2.
569. tion. 6.4.1.2 Branch Instructions and Completion. As described in the previous section, instructions that do not update either the LR or CTR are removed from the instruction stream before they reach the completion queue, either by branch folding (in the case of taken branches) or by removing fall-through branch instructions at dispatch (in the case of non-taken branches). However, branch instructions that update the architected LR and CTR must do so in program order and therefore must perform write-back in the completion stage, like the instructions that update the FPRs and GPRs. Branch instructions that update the CTR or LR pass through the instruction queue like nonbranch instructions. At the point of dispatch, however, they are not sent to an execution unit, but rather are assigned a slot in the completion queue, as shown in Figure 6-9. [Figure 6-9. Branch Completion (LR/CTR write-back): instruction queue entries IQ0-IQ5 and completion queue entries CQ0-CQ5 over clock cycles 0-3.] In this example, the bc instruction is encoded to decrement the CTR. It is predicted as not taken in clock cycle 0. In clock cycle 2, bc and add3 are both dispatched. In clock cycle 3, the architected CTR is updated and the bc instruction is retired from the completion queue.
570. tion cache. 3.5.5 Data Cache Block Push Operation. When a cache block in the 750 is snooped and hit by another bus master and the data is modified, the cache block must be written to memory and made available to the snooping device. The cache block that is hit is said to be pushed out onto the 60x bus. The 750 supports two kinds of push operations: normal push operations and enveloped high-priority push operations, which are described in Section 3.5.5.1, Enveloped High-Priority Cache Block Push Operation. 3.5.5.1 Enveloped High-Priority Cache Block Push Operation. In cases where the 750 has completed the address tenure of a read operation and then detects a snoop hit to a modified cache block by another bus master, the 750 provides a high-priority push operation. If the address snooped is the same as the address of the data to be returned by the read operation, ARTRY is asserted one or more times until the data tenure of the read operation is completed. The cache block push transaction can be enveloped within the address and data tenures of a read operation. This feature prevents deadlocks in system organizations that support multiple memory-mapped buses. More specifically, the 750 internally detects the scenario where a load request is outstanding and the processor has pipelined a write operation on top of the load. Normally, when the data bus is granted to the 750, the resulting data
571. tion describes external interrupts, checkstop operations, and hard and soft reset inputs. 8.7.1 External Interrupts. The external interrupt input signals (INT, SMI, and MCP) of the 750 eventually force the processor to take the external interrupt vector or the system management interrupt vector if MSR[EE] is set, or the machine check interrupt if the MSR[ME] and the HID0[EMCP] bits are set. 8.7.2 Checkstops. A checkstop causes the processor to halt and assert the checkstop output pin, CKSTP_OUT. Once the 750 enters a checkstop state, only a hard reset can clear the processor from the checkstop state. The 750 has two checkstop input signals, CKSTP_IN (nonmaskable) and MCP (enabled when MSR[ME] is cleared and HID0[EMCP] is set), and a checkstop output (CKSTP_OUT) signal. If CKSTP_IN or MCP is asserted, the 750 halts operations by gating off all internal clocks. The 750 asserts CKSTP_OUT if CKSTP_IN is asserted. If CKSTP_OUT is asserted by the 750, it has entered the checkstop state, and processing has halted internally. The CKSTP_OUT signal can be asserted for various reasons, including receiving a TEA signal and detection of external parity errors. For more information about checkstop state, see Section 4.5.2.2, Checkstop State (MSR[ME] = 0). Following is the list of checkstop sources: • Machine check with MSR[ME] = 0. If MSR[ME] = 0 when a machine check interrupt occurs, then the checkstop state is entered. The machine check sour
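The two conditions given above for the MCP input (machine check interrupt when MSR[ME] and HID0[EMCP] are both set; checkstop when MSR[ME] is cleared and HID0[EMCP] is set) can be summarized as a small decision function. This is a sketch only: the function name and the returned strings are hypothetical, and treating MCP as ignored when HID0[EMCP] is cleared is an assumption inferred from EMCP being described as the enabling bit.

```python
def mcp_response(msr_me, hid0_emcp):
    """Model the processor's response to an asserted MCP input signal."""
    if not hid0_emcp:
        # Assumption: with HID0[EMCP] cleared, MCP assertion has no effect.
        return "ignored"
    if msr_me:
        # MSR[ME] and HID0[EMCP] both set: machine check interrupt is taken.
        return "machine check interrupt"
    # MSR[ME] cleared with HID0[EMCP] set: MCP acts as a checkstop input.
    return "checkstop"
```

CKSTP_IN is not modeled here because the text describes it as nonmaskable; it leads to checkstop regardless of these bits.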
572. tion on static branch prediction, see Conditional Branch Control in Chapter 4 of The Programming 1. The dcbt and dcbtst instructions are no-oped globally. Table 2-5 shows how HID0[BCLK], HID0[ECLK], and HRESET are used to configure CLK_OUT. See Section 7.2.11.2, Clock Out (CLK_OUT) Output, for more information. Table 2-5. HID0[BCLK] and HID0[ECLK] CLK_OUT Configuration (columns: HRESET, HID0[ECLK], HID0[BCLK], CLK_OUT). Note: For 750 chip revisions 3.0 and later, the ECLK/BCLK setting of 00 will not select the Hi-Z state. Instead, it will select a diagnostic monitor signal for the DLL unit of the L2 cache. HID0 can be accessed with mtspr and mfspr using SPR 1008. 2.1.2.3 Hardware Implementation-Dependent Register 1. The hardware implementation-dependent register 1 (HID1) reflects the state of the PLL_CFG[0-3] signals. The HID1 bits are shown in Figure 2-4. [Figure 2-4. Hardware Implementation-Dependent Register 1 (HID1): PC0, PC1, PC2, and PC3 in bits 0-3; bits 4-31 reserved.] The HID1 bits are described in Table 2-6. Table 2-6. HID1 Bit Functions. Bit 0, PC0: PLL configuration bit 0 (read-only). Bit 1, PC1: PLL configuration bit 1 (read-only). Bit 2, PC2: PLL configuration bit 2 (read-only). Bit 3, PC3: PLL configuration bit 3 (read-only). Bits 4-31: Reserved. Note: The clock configuration bits reflect the state of the PLL_CFG[0-3] signa
573. tion was correct, program flow continues along that path; otherwise, the processor flushes any instructions and their results from the mispredicted path, and program flow resumes along the correct path. Static branch prediction is used when HID0[BHT] is cleared. That is, the branch history table, which is used for dynamic branch prediction, is disabled. For information about static branch prediction, see Conditional Branch Control in Chapter 4, Addressing Modes and Instruction Set Summary, in The Programming Environments Manual. 6.4.1.3.2 Predicted Branch Timing Examples. Figure 6-10 shows cases where branch instructions are predicted. It shows how both taken and not-taken branches are handled and how the 750 handles both correct and incorrect predictions. The example shows the timing for the following instruction sequence: 0 add; 1 add; 2 bc; 3 mulhw; 4 bc T0; 5 fadd; 6 and; T0 add; T1 add; T2 add; T3 add; T4 and; T5 or. [Figure 6-10 legend: Fetch; In dispatch entry (IQ0/IQ1); Predict; Execute; Complete (in CQ); In retirement entry (CQ0/CQ1).]
574. tions; Floating-Point Compare Instructions; Floating-Point Status and Control Register Instructions; Integer Load Instructions; Integer Store Instructions; Integer Load and Store with Byte-Reverse Instructions; Integer Load and Store Multiple Instructions; Integer Load and Store String Instructions; Memory Synchronization Instructions; Floating-Point Load Instructions; Floating-Point Store Instructions; Floating-Point Move Instructions; Branch Instructions; Condition Register Logical Instructions; System Linkage Instructions; Trap Instructions; Processor Control Instructions; Cache Management Instructions; Segment Register Manipulation Instructions; Lookaside Buffer Management Instructions; External Control Instructions; I-Form; B-Form; SC-Form; D-Form; DS-Form; X-Form; XL-Form; XFX-Form; XFL-Form
575. to the processor. • In each exception handler: When enough state information has been saved that a machine check or system reset exception can reconstruct the previous state, set MSR[RI]. • In each exception handler: Clear MSR[RI], set SRR0 and SRR1 appropriately, and then execute rfi. Note that the RI bit being set indicates that, with respect to the processor, enough processor state data remains valid for the processor to continue, but it does not guarantee that the interrupted process can resume. 4.3.4 Returning from an Exception Handler. The Return from Interrupt (rfi) instruction performs context synchronization by allowing previously issued instructions to complete before returning to the interrupted process. In general, execution of the rfi instruction ensures the following: • All previous instructions have completed to a point where they can no longer cause an exception. If a previous instruction causes a direct-store interface error exception, the results must be determined before this instruction is executed. • Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued. • The rfi instruction copies SRR1 bits back into the MSR. • Instructions fetched after this instruction execute in the context established by this instruction. • Program execution resumes at the instruction indicated by SRR0. For a complete description of context synchronization, refer to Chapter
576. tore byte-reverse (lhbrx, lwbrx, sthbrx, stwbrx) instructions with greater latency than other types of load/store instructions. This is not the case for the 750. These instructions operate with the same latency as the other load/store instructions. • The PowerPC architecture describes some preferred instruction forms for load and store multiple instructions and integer move assist instructions that may perform better than other forms in some implementations. None of these preferred forms affect instruction performance on the 750. • The PowerPC architecture defines the lwarx and stwcx. as a way to update memory atomically. In the 750, reservations are made on behalf of aligned 32-byte sections of the memory address space. Executing lwarx and stwcx. to a page marked write-through does not cause a DSI exception if the W bit is set, but as with other memory accesses, DSI exceptions can result for other reasons, such as protection violations or page faults. • In general, because stwcx. always causes an external bus transaction, it has slightly worse performance characteristics than normal store operations. Table 2-32 summarizes the integer load instructions. Table 2-32. Integer Load Instructions.
577. [Figure 6-10. Branch Instruction Timing: instruction queue and completion queue contents per clock cycle. Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency.] During clock cycle 0, instructions 0 and 1 are dispatched to their respective execution units. Instruction 2 is a branch instruction that updates the CTR. It is predicted as not taken in clock cycle 0. Instruction 3 is a mulhw instruction on which instruction 4 depends. 1. In clock cycle 1, instructions 2 and 3 enter the dispatch entries in the IQ. Instruction 4 (a second bc instruction) and 5 are fetched. The second bc instruction is predicted as taken. It can be folded, but it cannot be resolved until instruction 3 writes back. 2. In clock cycle 2, instruction 4 has been folded and instruction 5 has been flushed from the IQ. The two target instructions, T0 and T1, are both in the BTIC, so they are fetched in this cycle. Note that even though the first bc instruction may not have resolved by this point (we can assume it has), the 750 allows fetching from a second predicted branch stream. However, these instructions could not be dispatched until the previous branch has resolved. 3. In clock cycle 3, target instructions T2-T5 are fetched as T0 and
578. tructions. This section describes the integer instructions. These consist of the following: • Integer arithmetic instructions • Integer compare instructions • Integer logical instructions • Integer rotate and shift instructions. Integer instructions use the content of the GPRs as source operands and place results into GPRs, into the integer exception register (XER), and into condition register (CR) fields. 2.3.4.1.1 Integer Arithmetic Instructions. Table 2-21 lists the integer arithmetic instructions for the PowerPC processors. Table 2-21. Integer Arithmetic Instructions: Subtract from Minus One Extended: subfme (subfme. subfmeo subfmeo.); Add to Zero Extended: addze (addze. addzeo addzeo.); Multiply Low: mullw (mullw. mullwo mullwo.); Multiply High Word: mulhw (mulhw.); Multiply High Word Unsigned: mulhwu (mulhwu.); Divide Word: divw (divw. divwo divwo.); Divide Word Unsigned: divwu (divwu. divwuo divwuo.). Although there is no Subtract Immediate instruction, its effect can be achieved by using an addi instruction with the immediate operand negated. Simplified mnemonics are provided that include this negation. The subf instructions subtract the second operand (rA) from the third operand (rB). Simplified mnemonics are provided in which the third operand is subtracted from the second operand. See App
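The operand order of subf and the negated-immediate trick for subtraction are easy to get backwards, so a small model may help. The sketch below is illustrative only: the Python functions are hypothetical stand-ins that operate on 32-bit register values, and they do not model condition register or XER updates or the special treatment of register 0 as an addi base.

```python
MASK32 = 0xFFFFFFFF


def subf(ra, rb):
    """subf rD,rA,rB: subtract the second operand (rA) from the third (rB)."""
    return (rb - ra) & MASK32


def addi(ra, simm):
    """addi rD,rA,SIMM with a 16-bit signed immediate, wrapped to 32 bits."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("immediate does not fit in 16 signed bits")
    return (ra + simm) & MASK32


def subtract_immediate(ra, value):
    """Effect of a subtract immediate: addi with the immediate negated."""
    return addi(ra, -value)
```

Here `subf(3, 10)` models rB minus rA with rA = 3 and rB = 10, and a negative result wraps around as a 32-bit two's-complement value.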
579. trx 19 BO Bl 00000 528 LK belrx 19 BO Bl 00000 16 LK Appendix A PowerPC Instruction Set Listings A 25 Table A 23 Condition Register Logical Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 crand crandc creqv crnand crnor cror 449 crorc 19 croD crbA crbB 417 crxor 19 croD crbB 193 mcrf 0000000000 Table A 24 System Linkage Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 rfi 12 19 00000 50 0 rfid 13 19 00000 00000 00000 18 0 sc 17 00000 00000 000000000000000 1 0 Notes Supervisor level instruction 2 Optional 64 bit bridge instruction 3 64 bit instruction Table A 25 Trap Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 td tdi tw 31 TO twi 03 TO Note 1 64 bit instruction A 26 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Table A 26 Processor Control Instructions Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 mcrxr 512 0 mfcr 19 0 mfmsr 83 0 mfspr 2 339 0 mftb 371 0 mtcrf 144 0 mtmsr 13 146 0 mtmsrd 14 178 0 mtspr 467 0 Notes f Supervisor level instruction 2 Supervisor and user level instruction S O
580. ual. Table i. Acronyms and Abbreviated Terms (continued). XATC: Extended address transfer code. XER: Register used for indicating conditions such as carries and overflows for integer operations. Terminology Conventions. Table ii describes terminology conventions used in this manual and the equivalent terminology used in the PowerPC architecture specification. Table ii. Terminology Conventions. Table iii describes instruction field notation used in this manual. Table iii. Instruction Field Conventions (The Architecture Specification / Equivalent to): BA, BB, BT / crbA, crbB, crbD (respectively); BF, BFA / crfD, crfS (respectively); RA, RB, RT, RS / rA, rB, rD, rS (respectively). Chapter 1. PowerPC 740/PowerPC 750 Overview. This chapter provides an overview of the PowerPC 750 microprocessor features, including a block diagram showing the major functional components. It provides information about how the 750 i
581. uct could create a situation where personal injury or death may occur. Should customer purchase or use the products for any such unintended or unauthorized application, customer shall indemnify and hold IBM and its respective officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that IBM was negligent regarding the design or manufacture of the part. IBM and IBM logo are registered trademarks, and IBM Microelectronics is a trademark of International Business Machines Corp. The PowerPC name, PowerPC logotype, PowerPC 740, and PowerPC 750 are trademarks of International Business Machines Corp. International Business Machines Corp. is an Equal Opportunity/Affirmative Action Employer. International Business Machines Corporation, IBM Microelectronics Division, 1580 Route 52, Bldg. 504, Hopewell Junction, NY 12533-6531. WWW Addresses: http://www.chips.ibm.com, http://www.ibm.com. PowerPC 740/PowerPC 750 Overview; Processor Programming Model; L1 Instruction and Data Cache Operation; Exceptions; Memory Management; Instruction Timing; Signal Descriptions; Bus Interface Operation; L2 Cache Interface Operation; Power and Thermal Management; Performance Monitor; PowerPC Instruction Set Listings; Inst
582. uction cache invalidate capability, as described in Section 3.4.1.4, Instruction Cache Flash Invalidation. 3.3.1 Memory/Cache Access Attributes (WIMG Bits). Some memory characteristics can be set on either a block or page basis by using the WIMG bits in the BAT registers or page table entry (PTE), respectively. The WIMG attributes control the following functionality: • Write-through (W bit) • Caching-inhibited (I bit) • Memory coherency (M bit) • Guarded memory (G bit). These bits allow both uniprocessor and multiprocessor system designs to exploit numerous system-level performance optimizations. The WIMG attributes are programmed by the operating system for each page and block. The W and I attributes control how the processor performing an access uses its own cache. The M attribute ensures that coherency is maintained for all copies of the addressed memory location. The G attribute prevents out-of-order loading and prefetching from the addressed memory location. The WIMG attributes occupy four bits in the BAT registers for block address translation and in the PTEs for page address translation. The WIMG bits are programmed as follows: • The operating system uses the mtspr instruction to program the WIMG bits in the BAT registers for block address translation. The IBAT register pairs do not have a G bit, and all accesses that use the IBAT register pairs are considered not guarded
583. uctions eii cvs caapausedacny ica innata deg 6 31 Condition Register Logical Instructions ooooncccnnncccnoncccnnncnononcnononcnononcccnnncnonnnos 6 32 A O ica i eh ate haces 6 33 Floating Point Instructions ett eeben ege Eeer eege e 6 34 Load and Store Instructions isis re 6 36 Transfer Type Encodings for PowerPC 750 Bus Master ooococccoccconccnnoncnoncnonnnnnos 7 9 PowerPC 750 Snoop Hit Response siccivesccdiveriesassoedacss decades deedansasseaeesansdenvadec 7 10 Data KEE 7 11 D ta B s Lane Assignments EE 7 17 EI IR EE E EE 7 18 IEFE Interface Pin Descriptions it 7 28 Transfer Size Signal Encodings lt sss cccscicesszncciiavaveesasedyactaannscteasesvedeansconaenntenedeses 8 16 meet Ordering aa 8 17 Burst Orden e392 Bit B S diia 8 17 Aligned Data 1 Tan E 8 18 Misaligned Data Transfers Four Byte Examples oooconnnccccnoncccnonccononcnonancnnnn 8 19 Aligned Data Transfers 32 Bit Bus Mode 8 20 Misaligned 32 Bit Data Bus Transfer Four Byte Examples ooooonnncccnnnnccccn 8 21 L2 Cache Control Register miis ta ing 9 5 PowerPC 750 Microprocessor Programmable Power Modes A 10 2 THRM1 and THRM2 Bit Field Senge 10 7 THRM3 Eeer heen Noa tan shee dass 10 7 Valid THRM1 and THRM2 Bit Settings 0 eee eeseeceeeeeceeeeecseeeeenteeeenaes 10 9 ICTC Bit Field Settings iii edd Edel 10 11 Performance Monitor ee ees 11 3 AHC ee 11 4 INTACT BUC SE TEE eeh EE ee 11 6 EE SN EA 11 6 xxiii Paragraph Number Table 11 5 Table 11 6 Ta
584. ull power (with DPM): functioning units: requested logic by demand; activation method: by instruction dispatch. Doze: functioning units: bus snooping, data cache as needed, decrementer timer; activation method: controlled by software; full-power wake-up method: external asynchronous exceptions, decrementer interrupt, performance monitor interrupt, thermal management interrupt, hard or soft reset. Nap: functioning units: bus snooping (enabled by deassertion of QACK), decrementer timer; activation method: controlled by hardware and software; full-power wake-up method: external asynchronous exceptions, decrementer interrupt, hard or soft reset. Sleep: functioning units: none; activation method: controlled by hardware and software; full-power wake-up method: external asynchronous exceptions, hard or soft reset. Note: Exceptions are referred to as interrupts in the architecture specification. 10.2.1 Power Management Modes The following sections describe the characteristics of the 750's power management modes, the requirements for entering and exiting the various modes, and the system capabilities provided by the 750 while the power management modes are active. 10.2.1.1 Full-Power Mode with DPM Disabled Full-power mode with DPM disabled is selected when the DPM enable bit (bit 11) in HID0 is cleared. It is the default state following power-up and HRESET, and all functional units operate at full processor speed at all times. 10.2.1.2 Full-Power Mode with DPM Enabled Full-power mode with DPM enabled (HID0[DPM] = 1) provides on-chip power management without affecting the functionality or performance of the 750. Required functional units are operating at full processor speed. Funct
585. undary, the 750 will generate an alignment exception. 8.3.3 Address Transfer Termination The address tenure of a bus operation is terminated when completed with the assertion of AACK, or retried with the assertion of ARTRY. The 750 does not terminate the address transfer until the AACK (address acknowledge) input is asserted; therefore, the system can extend the address transfer phase by delaying the assertion of AACK to the 750. The assertion of AACK can be as early as the bus clock cycle following TS (see Figure 8-8), which allows a minimum address tenure of two bus cycles. As shown in Figure 8-8, these signals are asserted for one bus clock cycle, three-stated for half of the next bus clock cycle, driven high till the following bus cycle, and finally three-stated. Note that AACK must be asserted for only one bus clock cycle. The address transfer can be terminated with the requirement to retry if ARTRY is asserted anytime during the address tenure and through the cycle following AACK. The assertion causes the entire transaction (address and data tenure) to be rerun. As a snooping device, the 750 asserts ARTRY for a snooped transaction that hits modified data in the data cache that must be written back to memory, or if the snooped transaction could not be serviced. As a bus master, the 750 responds to an assertion of ARTRY by aborting the bus transaction and re-requesting the bus. Note that aft
586. undary between protection domains. Protection domain. A protection domain is a segment, a virtual page, a BAT area, or a range of unmapped effective addresses. It is defined only when the appropriate relocate bit in the MSR (IR or DR) is 1. Q Quiesce. To come to rest. The processor is said to quiesce when an exception is taken or a sync instruction is executed. The instruction stream is stopped at the decode stage and executing instructions are allowed to complete to create a controlled context for instructions that may be affected by out-of-order, parallel execution. See Context synchronization. Quiet NaN. A type of NaN that can propagate through most arithmetic operations without signaling exceptions. A quiet NaN is used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when invalid. See Signaling NaN. rA. The rA instruction field is used to specify a GPR to be used as a source or destination. rB. The rB instruction field is used to specify a GPR to be used as a source. rD. The rD instruction field is used to specify a GPR to be used as a destination. rS. The rS instruction field is used to specify a GPR to be used as a source. Real address mode. An MMU mode when no address translation is performed and the effective address specified is the same as the physical address. The processor's MMU is operatin
587. unit and the system interface, which accesses external memory. The TLBs store page address translations for recent memory accesses. For each access, an effective address is presented for page and block translation simultaneously. If a translation is found in both the TLB and the BAT array, the block address translation in the BAT array is used. Usually the translation is in a TLB and the physical address is readily available to the on-chip cache. When a page address translation is not in a TLB, hardware searches for one in the page table following the model defined by the PowerPC architecture. Instruction and data TLBs provide address translation in parallel with the on-chip cache access, incurring no additional time penalty in the event of a TLB hit. The 750's TLBs are 128-entry, two-way set-associative caches that contain instruction and data address translations. The 750 automatically generates a TLB search on a TLB miss. 1.2.4 On-Chip Instruction and Data Caches The 750 implements separate instruction and data caches. Each cache is 32 Kbytes and eight-way set-associative. As defined by the PowerPC architecture, they are physically indexed. Each cache block contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits EA[27-31] are zeros); thus, a cache block never crosses a page boundary. An entire cache block can be updated by a four
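The cache geometry stated above (32 Kbytes, eight ways, eight-word blocks) fixes the number of sets. The sketch below works through that arithmetic and splits an address into tag, set index, and block offset. The split assumes the conventional scheme in which the low-order address bits select the byte within a block and the next bits select the set; the function name `split_address` is hypothetical.

```python
CACHE_BYTES = 32 * 1024      # 32-Kbyte cache, per the text
WAYS = 8                     # eight-way set-associative
BLOCK_BYTES = 8 * 4          # eight 4-byte words per cache block

# Sets = capacity / (ways * block size) = 32768 / (8 * 32) = 128
SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, block_offset) for a 32-bit address.

    Assumption: conventional low-order indexing, not a statement about
    the 750's internal wiring.
    """
    block_offset = addr % BLOCK_BYTES
    set_index = (addr // BLOCK_BYTES) % SETS
    tag = addr // (BLOCK_BYTES * SETS)
    return tag, set_index, block_offset
```

Because a block is 32 bytes and starts on a 32-byte boundary, and 4096 is a multiple of 32, a block can never straddle a 4-Kbyte page, which matches the statement in the text.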
588. ush data into the L2 cache. When L2CR[TS] is set and the L1 data cache is enabled, an instruction loop containing a dcbf instruction can be used to store any address or data pattern to the L2 cache. Additionally, 60x bus broadcasting is inhibited when a dcbz instruction is executed. This allows the use of a dcbz instruction to clear an L1 cache block, followed by a dcbf instruction to push the cache block into the L2 cache and invalidate the L1 cache block. When the L2 cache is enabled, cacheable single-beat read operations are allowed to hit in the L2 cache, and cacheable write operations are allowed to modify the contents of the L2 cache when a hit occurs. Cacheable single-beat reads and writes occur when address translation is disabled (invoking the use of the default WIMG bits, 0b0011), or when address translation is enabled and accesses are marked as cacheable through the page table entries or the BATs, and the L1 data cache is disabled or locked. When the L2 cache has been initialized and the L1 cache has been disabled or locked, load or store instructions then bypass the L1 cache and hit in the L2 cache directly. When L2CR[TS] is set, cacheable single-beat writes are inhibited from accessing the 60x bus interface after an L2 cache miss. During L2 cache testing, the performance monitor can be used to count L2 cache hits and misses, thereby providing a numerical signature for test routines and a way to verify proper L2 cache operation. 9.1.5.2 L2
589. using HID0 bits, as described in Table 4-10. Table 4-10. HID0 Machine Check Enable Bits. EMCP: Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused by assertion of MCP, similar to how MSR[EE] can mask external interrupts. 0: Masks MCP; asserting MCP does not generate a machine check exception or a checkstop. 1: Asserting MCP causes a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1. DBP: Enable/disable 60x bus address and data parity generation. 0: Parity generation is enabled. If address or data parity is not used by the system and the respective parity checking is disabled (HID0[EBA] or HID0[EBD] = 0), input receivers for those signals are disabled, do not require pull-up resistors, and therefore should be left unconnected. If all parity generation is disabled, all parity checking should also be disabled and parity signals need not be connected. EBA: Enable/disable 60x bus address parity checking. 0: Prevents address parity checking. 1: Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1. EBA and EBD allow the processor to operate with memory subsystems that do not generate parity. EBD: Enable 60x bus data parity checking. 0: Parity checking is disabled. 1: Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1. EBA and EBD allow the processor to operate with memory subsyst
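The EMCP entry above amounts to a two-input decision: the enable bit gates the MCP signal, and MSR[ME] selects between a checkstop and a machine check exception. A minimal sketch of that decision follows; the function name and the returned strings are illustrative, not taken from the manual.

```python
def mcp_response(emcp: int, msr_me: int) -> str:
    """Outcome of an MCP assertion for given HID0[EMCP] and MSR[ME] values.

    Follows the table text: EMCP = 0 masks MCP; with EMCP = 1, ME = 0
    leads to a checkstop and ME = 1 to a machine check exception.
    """
    if emcp not in (0, 1) or msr_me not in (0, 1):
        raise ValueError("EMCP and ME are single bits")
    if emcp == 0:
        return "masked"
    return "machine check exception" if msr_me == 1 else "checkstop"
```

The same ME-based split applies to address and data parity errors when the corresponding checking bit is set, so the helper could be reused for those cases with a different enable bit.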
590. vely long recovery time from ZZ negation that many SRAM vendors require may only allow use of this function for deep sleep operation. L2 write-through. Setting L2WT selects write-through mode (rather than the default write-back mode), so all writes to the L2 cache also write through to the 60x bus. For these writes, the L2 cache entry is always marked as clean (valid unmodified) rather than dirty (valid modified). This bit must never be asserted after the L2 cache has been enabled, as previously modified lines can get remarked as clean during normal operation. L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of hit. This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily initialize the L2 cache with any address and data information. This bit also keeps dcbz instructions from being broadcast on the 60x bus and single-beat cacheable store misses in the L2 from being written to the 60x bus. requirements, for which late-write SRAMs usually differ from flow-through or burst SRAMs. 00: 0.5 ns; 01: 1.0 ns; 1x: Reserved. 14-15 L2 output hold. These bits configure output hold time for address, data, and control signals driven by the 750 to the L2 data RAMs. They should generally be set according to the SRAM's inpu
591. visor Level Cache Management Instruction ooooocnnoconocanonacnonnnannnonn nono ncnnos 2 66 Segment Register Manipulation Instructions ooonnnccnoncnnncnnncnnonncnonncnnnnonn nono ncnnos 2 67 Translation Lookaside Buffer Management Instructnon 2 67 EEN Ee 3 7 PLRU Bit Update Rules noni indicas 3 20 PERU Replacement Block Selec iii 2 pint geed iia 3 20 Bus Operations Caused by Cache Control Instructions WIM 001 3 24 Response to Snooped Bus Transactions 0 ceescecesnceeseececeeececeeeeeeseeeeesteeeeaees 3 27 Address Transfer Attribute Summary oocooccccnnccconncccnnnanononcnononanononanonnncnnnnnncnnnnnos 3 29 MEI State Transitions davis ida ias 3 31 PowerPC 750 Microprocessor Exception Classifications oooonocccnncnnocnnoonnnonccnnnos 4 2 Exceptionsand CAOS E 4 3 PowerPG 750 Exception Priorities eher ta 4 6 MSR EE 4 8 IEEE Floating Point Exception Mode Bits cee eee eeseeesceeeeeereeesseeeneenseeeeneees 4 10 MSR Setting Due to Exception EE 4 12 System Reset Exception Register Settings oooocnoncnoncnonnnoncncnnncnoncnnnnconanonnncnnos 4 13 HRESE EStenal States uni ta 4 15 Settings Caused by Hard eet nee 4 16 HIDO Machine Check Enable Bits oi tii di ii 4 17 Machine Check Exception Register Settmngs 4 18 Trace Exception SRR1 Settings 2202821 ih bi 4 22 IBM PowerPC 740 PowerPC 750 RISC Microprocessor User s Manual Paragraph Number Table 4 13 Table 4 14 Table 4 15 Table 4 16 Table 5 1 Table 5 2 Tab
592. w of instructions can stall when a longer-latency instruction reaches the last position in the completion queue. Subsequent instructions cannot be completed and retired until that longer-latency instruction completes and retires. Examples of this are shown in Section 6.3.2.2, "Cache Hit," and Section 6.3.2.3, "Cache Miss." The 750 can execute instructions out of order, but in-order completion by the completion unit ensures a precise exception mechanism. Program-related exceptions are signaled when the instruction causing the exception reaches the last position in the completion queue. Prior instructions are allowed to complete before the exception is taken. 6.3.3.1 Rename Register Operation To avoid contention for a given register file location in the course of out-of-order execution, the 750 provides rename registers for holding instruction results before the completion commits them to the architected register. There are six GPR rename registers, six FPR rename registers, and one each for the CR, LR, and CTR. When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename register (or registers) for the results of that instruction. If an instruction is dispatched to a reservation station associated with an execution unit due to a data dependency, the dispatcher also provides a tag to the execution unit identifying the rename register that
593. with index addressing mode. Floating-point loads and stores are not supported for direct-store accesses. The use of floating-point loads and stores for direct-store access results in an alignment exception. There are two forms of the floating-point load instruction: single-precision and double-precision operand formats. Because the FPRs support only the floating-point double-precision format, single-precision floating-point load instructions convert single-precision data to double-precision format before loading an operand into an FPR. Implementation Notes: The 750 treats exceptions as follows: The FPU can be run in two different modes: ignore exceptions mode (MSR[FE0] = MSR[FE1] = 0) and precise mode (any other settings for MSR[FE0, FE1]). For the 750, ignore exceptions mode allows floating-point instructions to complete earlier and thus may provide better performance than precise mode. The floating-point load and store indexed instructions (lfsx, lfsux, lfdx, lfdux, stfsx, stfsux, stfdx, stfdux) are invalid when the Rc bit is one. In the 750, executing one of these invalid instruction forms causes CR0 to be set to an undefined value. The PowerPC architecture defines a load with update instruction with rA = 0 as an invalid form. Table 2-37 summarizes the floating-point load instructions. Table 2-37. Floating-Point Load Instructions (table body not recoverable from the source)
594. wn in Figure 8-11. Figure 8-11. Normal Single-Beat Write Termination Normal termination of a burst transfer occurs when TA is asserted for four bus clock cycles, as shown in Figure 8-12. The bus clock cycles in which TA is asserted need not be consecutive, thus allowing pacing of the data transfer beats. For read bursts to terminate successfully, TEA and DRTRY must remain negated during the transfer. For write bursts, TEA must remain negated for a successful transfer. DRTRY is ignored during data writes. Figure 8-12. Normal Burst Transaction For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal that the data presented with TA is invalid and that the processor must wait for the negation of DRTRY before forwarding data to the processor (see Figure 8-13). Thus, a data beat can be speculatively terminated with TA and then one bus clock cycle later confirmed with the negation of DRTRY. The DRTRY signal is valid only for read transactions. TA must be asserted on the bus clock cycle before the first bus clock cycle of the assertion of DRTRY; otherwise the results are undefined. The DRTRY signal extends data bus mastership such that other processors cannot use the data bus until DRTRY is negated. Therefore, in the example in Figure 8-13, DBB cannot be asserted until bus clock cycle 6. This is true for both read and write operations, even though DRTRY does not
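The pacing rule for bursts (four TA assertions, not necessarily in consecutive cycles) can be pictured with a small counter. This is a simplified sketch under stated assumptions: it models only TA, takes TEA and DRTRY to stay negated throughout, and the function name and the per-cycle list representation are hypothetical.

```python
from typing import Optional, Sequence


def burst_complete_cycle(ta_by_cycle: Sequence[int], beats: int = 4) -> Optional[int]:
    """Return the 1-based bus clock cycle in which the final beat is accepted.

    ta_by_cycle holds 1 for cycles where TA is asserted and 0 otherwise.
    Returns None if fewer than `beats` assertions are seen.
    Assumption: TEA and DRTRY stay negated for the whole transfer.
    """
    accepted = 0
    for cycle, ta in enumerate(ta_by_cycle, start=1):
        if ta:
            accepted += 1
            if accepted == beats:
                return cycle
    return None
```

A system that inserts wait states between beats simply leaves TA negated in those cycles; the burst still ends on the fourth assertion.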
595. y access or an I/O access. 7.2.4.1 Transfer Type (TT[0-4]) The transfer type (TT[0-4]) signals consist of five input/output signals on the 750. For a complete description of TT[0-4] signals and for transfer type encodings, see Table 7-1. 7.2.4.1.1 Transfer Type (TT[0-4]): Output Following are the state meaning and timing comments for the TT[0-4] output signals on the 750. State Meaning: Asserted/Negated: Indicates the type of transfer in progress. Timing Comments: Assertion/Negation/High Impedance: The same as A[0-31]. 7.2.4.1.2 Transfer Type (TT[0-4]): Input Following are the state meaning and timing comments for the TT[0-4] input signals on the 750. State Meaning: Asserted/Negated: Indicates the type of transfer in progress (see Table 7-2). Timing Comments: Assertion/Negation: The same as A[0-31]. Table 7-1 describes the transfer encodings for a 750 bus master. Table 7-1. Transfer Type Encodings for PowerPC 750 Bus Master (table body not recoverable from the source; legible entries include address-only sync, address-only dcbz or dcbi, clean block, kill block, single-beat write, single-beat read, external control word write, and external control word read) lw
596. y paradoxes can be encountered within a single-processor system: Load or store to a caching-inhibited page (WIMG = x1xx) and a cache hit occurs. The 750 ignores any hits to a cache block in a memory space marked caching-inhibited (WIMG = x1xx). The access is performed on the external bus as if there were no hit. The data in the cache is not pushed, and the cache block is not invalidated. Store to a page marked write-through (WIMG = 1xxx) and a cache hit occurs to a modified cache block. The 750 ignores the modified bit in the cache tag. The cache block is updated during the write-through operation, but the block remains in the modified state (M). Note that when WIM bits are changed in the page tables or BAT registers, it is critical that the cache contents reflect the new WIM bit settings. For example, if a block or page that had allowed caching becomes caching-inhibited, software should ensure that the appropriate cache blocks are flushed to memory and invalidated. 3.3.4 Coherency Precautions in Multiprocessor Systems The 750's three-state coherency protocol permits no data sharing between the 750 and other caches. All burst reads initiated by the 750 are performed as read with intent to modify. Burst snoops are interpreted as read with intent to modify or read with no intent to cache. This effectively places all caches in the system into a three-state coherency scheme. Four-state caches may share data amongst themselves but not with the
597. y present the first two instructions of the new instruction stream in the next clock cycle, giving enough time for the next pair of instructions to be fetched from the instruction cache with no idle cycles. If instructions are not in the BTIC or the on-chip instruction cache, they are fetched from the L2 cache or from system memory. The 750's instruction cache throttling feature, managed through the instruction cache throttling control (ICTC) register, can lower the processor's overall junction temperature by slowing the instruction fetch rate. See Chapter 10, "Power and Thermal Management." Branch instructions are identified by the fetcher and forwarded to the BPU directly, bypassing the dispatch queue. If the branch is unconditional or if the specified conditions are already known, the branch can be resolved immediately. That is, the branch direction is known and instruction fetching can continue from the correct location. Otherwise, the branch direction must be predicted. The 750 offers several resources to aid in quick resolution of branch instructions and for improving the accuracy of branch predictions. These include the following: Branch target instruction cache. The 64-entry, four-way-associative branch target instruction cache (BTIC) holds branch target instructions, so when a branch is encountered in a repeated loop, usually the first two instructions in the ta
598. y resident in physical memory. On PowerPC processors, a page fault exception condition occurs when a matching, valid page table entry (PTE[V] = 1) cannot be located. Page table. A table in memory is comprised of page table entries, or PTEs. It is further organized into eight PTEs per PTEG (page table entry group). The number of PTEGs in the page table depends on the size of the page table (as specified in the SDR1 register). Page table entry (PTE). Data structures containing information used to translate effective address to physical address on a 4-Kbyte page basis. A PTE consists of 8 bytes of information in a 32-bit processor and 16 bytes of information in a 64-bit processor. Physical memory. The actual memory that can be accessed through the system's memory bus. Pipelining. A technique that breaks operations, such as instruction processing or bus transactions, into smaller distinct stages or tenures (respectively) so that a subsequent operation can begin before the previous one has completed. Precise exceptions. A category of exception for which the pipeline can be stopped so instructions that preceded the faulting instruction can complete, and subsequent instructions can be flushed and redispatched after exception handling has completed. See Imprecise exceptions. Primary opcode. The most-significant 6 bits (bits 0-5) of the instruction encoding that identifies the type of instruction. See Secondary opcode. Protection boundary. A bo
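The page table figures in the glossary entries above (eight PTEs per PTEG, 8-byte PTEs on a 32-bit processor, 16-byte PTEs on a 64-bit processor) determine the PTEG size and, for a given table size, the PTEG count. The following sketch does that arithmetic; the helper names are hypothetical and the 64-Kbyte table size in the usage note is only an illustrative input.

```python
PTES_PER_PTEG = 8  # per the glossary entry


def pteg_bytes(pte_bytes: int) -> int:
    """Size of one PTEG given the size of one PTE (8 or 16 bytes)."""
    return PTES_PER_PTEG * pte_bytes


def pteg_count(table_bytes: int, pte_bytes: int = 8) -> int:
    """Number of PTEGs that fit in a page table of the given size."""
    group = pteg_bytes(pte_bytes)
    if table_bytes % group:
        raise ValueError("table size must be a whole number of PTEGs")
    return table_bytes // group
```

With 8-byte PTEs a PTEG occupies 64 bytes, so an illustrative 64-Kbyte table would hold `pteg_count(64 * 1024)` groups, that is 1024.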
599. y the GPRs and FPRs. Results are written back at completion time. Results in the write-back buffer cannot be flushed. If an exception occurs, these buffers must write back before the exception is taken. 6.2 Instruction Timing Overview The 750 design minimizes average instruction execution latency, the number of clock cycles it takes to fetch, decode, dispatch, and execute instructions and make the results available for a subsequent instruction. Some instructions, such as loads and stores, access memory and require additional clock cycles between the execute phase and the write-back phase. These latencies vary depending on whether the access is to cacheable or noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache access generates a write-back to memory, whether the access causes a snoop hit from another device that generates additional activity, and other conditions that affect memory accesses. The 750 implements many features to improve throughput, such as pipelining, superscalar instruction issue, branch folding, removal of fall-through branches, two-level speculative branch handling, and multiple execution units that operate independently and in parallel. As an instruction passes from stage to stage in a pipelined system, the following instruction can follow through the stages as the former instruction vacates them, allowing several instructions to be processed simultaneously. While it may take several cycles for an instru
600. ycles separated by a colon. Table 6-8. Load and Store Instructions (table body not recoverable from the source) Notes: 1. For cache ops, the first number indicates the latency in finishing a single instruction; the second indicates the throughput for back-to-back cache ops. Throughput may be larger than the initial latency, as more cycles may be needed to complete the instruction to the cache, which stays busy keeping subsequent cache ops from executing. 2. The throughput number of 6 cycles for dcbz assumes it is to nonglobal (M = 0) address space. For global address space, throughput is at least 11 cycles. 3. Load/store multiple/string instruction cycles are represented as a fixed number of cycles plus a variable number of cycles, where n is the number of words accessed by the instruction. Chapter 7 Signal Descriptions This chapter describes the PowerPC 750 microprocessor's external signals. It contains a concise description of individual signals, showing behavior when the signal is asserted and negate
601. ype signals, the 750 supports the transfer attribute signals TBST, TSIZ[0-2], WT, CI, and GBL. The TBST and TSIZ[0-2] signals indicate the data transfer size for the bus transaction. The WT signal reflects the write-through status (the complement of the W bit) for the transaction as determined by the MMU address translation during write operations. WT is asserted for burst writes due to dcbf (flush) and dcbst (clean) instructions, and for snoop pushes; WT is negated for ecowx transactions. Since the write-through status is not meaningful for reads, the 750 uses the WT signal during read transactions to indicate that the transaction is an instruction fetch (WT negated), or not an instruction fetch (WT asserted). The CI signal reflects the caching-inhibited/allowed status (the complement of the I bit) of the transaction as determined by the MMU address translation, even if the L1 caches are disabled or locked. CI is always asserted for eciwx/ecowx bus transactions, independent of the address translation. The GBL signal reflects the memory coherency requirements (the complement of the M bit) of the transaction as determined by the MMU address translation. Castout and snoop copy-back operations (TT[0-4] = 00110) are generally marked as nonglobal (GBL negated) and are not snooped (except for reservation monitoring). Other masters, however, may perform DMA write operations with this encoding but marked global (GBL asserted), and thus must be snoo
602. ys performed zero double word first, but since burst reads are performed critical double word first, a burst read transfer may not start with the first double word of the cache line, and the cache line fill may wrap around the end of the cache line. Table 8-2 describes the data bus burst ordering. Table 8-2. Burst Ordering (table body largely lost in extraction; the legible row gives the first data beat as DW1 and DW2 for starting addresses A[27-28] = 01 and 10). Note: A[29-31] are always 0b000 for burst transfers by the 750. Table 8-3 describes the burst ordering when the 750 is configured with a 32-bit bus. Table 8-3. Burst Ordering, 32-Bit Bus (only two columns are legible in the source). Starting address A[27-28] = 00, data beats 1 through 8: DW0-U, DW0-L, DW1-U, DW1-L, DW2-U, DW2-L, DW3-U, DW3-L. Starting address A[27-28] = 01, data beats 1 through 8: DW1-U, DW1-L, DW2-U, DW2-L, DW3-U, DW3-L, DW0-U, DW0-L. Notes: A[29-31] are always 0b000 for burst transfers by the 750. U and L represent the upper and lower word of the double word, respectively. 8.3.2.4 Effect of Alignment in Data Transfers Table 8-4 lists the aligned transfers that can occur on the 750 bus. These are transfers in which the data is aligned to an address that is an integral multiple of the size of the data. For example, Table 8-4 shows that 1-byte data is always aligned; however, f
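Critical-double-word-first ordering with wraparound can be sketched as modular arithmetic on the double-word index taken from A[27-28]. The sketch below assumes a simple sequential wrap (start, then +1, +2, +3 modulo 4), which agrees with the legible table entries for starting addresses 00 and 01 but is an assumption for the rest; the function names are hypothetical.

```python
def burst_order(start_dw: int) -> list[int]:
    """Double-word order for a four-beat burst read on a 64-bit bus.

    start_dw is the value of A[27-28] (0 to 3).
    Assumption: sequential wraparound within the four-double-word line.
    """
    if not 0 <= start_dw <= 3:
        raise ValueError("A[27-28] selects one of four double words")
    return [(start_dw + beat) % 4 for beat in range(4)]


def burst_order_32bit(start_dw: int) -> list[str]:
    """Eight-beat order on a 32-bit bus: upper word, then lower word."""
    return [f"DW{dw}-{half}" for dw in burst_order(start_dw) for half in ("U", "L")]
```

For a starting address of 01, the seventh and eighth beats wrap back to the upper and lower words of double word 0, as in the table.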
603. ystem reset (Table 4-6 body not recoverable from the source) Table 4-6. MSR Setting Due to Exception (Continued) Note: 0 means the bit is cleared; ILE means the bit is copied from MSR[ILE]; a dash means the bit is not altered. Reserved bits are read as if written as 0. The setting of the exception prefix bit (IP) determines how exceptions are vectored. If the bit is cleared, exceptions are vectored to the physical address 0x000n_nnnn (where nnnnn is the vector offset); if IP is set, exceptions are vectored to physical address 0xFFFn_nnnn. Table 4-2 shows the exception vector offset of the first instruction of the exception handler routine for each exception type. 4.5.1 System Reset Exception (0x00100) The 750 implements the system reset exception as defined in the PowerPC architecture (OEA). The system reset exception is a nonmaskable, asynchronous exception signaled to the processor through the assertion of system-defined signals. In the 750, the exception is signaled by the assertion of either the soft reset (SRESET) or hard reset (HRESET) inputs, described more fully in Chapter 7. The 750 implements HID0[NHR], which helps software distinguish a hard reset from a soft reset
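The vectoring rule above combines a 20-bit vector offset with a prefix chosen by the IP bit. A minimal sketch of that address calculation follows; the function name is hypothetical, and the offset 0x00100 used in the usage note is the system reset offset quoted in the text.

```python
def exception_vector(offset: int, ip: int) -> int:
    """Physical address of an exception handler.

    offset is the 20-bit vector offset (the n_nnnn digits);
    ip is the exception prefix bit: 0 selects 0x000n_nnnn, 1 selects 0xFFFn_nnnn.
    """
    if not 0 <= offset <= 0xFFFFF:
        raise ValueError("vector offset is at most 20 bits")
    if ip not in (0, 1):
        raise ValueError("IP is a single bit")
    prefix = 0xFFF00000 if ip else 0x00000000
    return prefix | offset
```

For the system reset exception, `exception_vector(0x00100, 0)` gives 0x00000100 and `exception_vector(0x00100, 1)` gives 0xFFF00100.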
