Home

e300 Power Architecture™ Core Family Reference Manual Supports

1. Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D A d OPCD D A SIMM OPCD S A d OPCD S A UIMM OPCD cfD O L A SIMM OPCD cD O L A UIMM OPCD TO A SIMM Specific Instruction addi 14 D A SIMM addic 12 D A SIMM addic 13 D A SIMM addis 15 D A SIMM andi 28 S A UIMM andis 29 S A UIMM cmpi 11 cfiD JOIL A SIMM cmpli 10 CD O L A UIMM Ibz 34 D A d Ibzu 35 D A d lfd 50 D A d Ifdu 51 D A d Ifs 48 D A d lfsu 49 D A d Iha 42 D A d Ihau 43 D A d Ihz 40 D A d Ihzu 41 D A d Imw 46 D A d Iwz 32 D A d e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Instruction Set Listings Table A 33 D Form continued Iwzu 33 D A d mulli 7 D A SIMM ori 24 S A UIMM oris 25 S A UIMM stb 38 S A d stbu 39 S A d stfd 54 S A d stfdu 55 S A d stfs 52 S A d stfsu 53 S A d sth 44 S A d sthu 45 S A d stmw 47 S A d stw 36 S A d stwu 37 S A d Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 subfic 08 D A SIMM tdi 02 TO A SIMM twi 03 TO A SIMM xori 26 S A UIMM xoris 27 S A UIMM 1 Load and store string or multiple instruction 2 64 bit instruction Table A 34 DS Form Name 0 5 6 7 8 9 10 11 1
2. Term Meaning CMP2 IABR2 compare type COP Common on chip processor CQ Completion queue CR Condition register CSRRO Critical interrupt save restore register 0 CSRR1 Critical interrupt save restore register 1 CTR Count register DABR Data address breakpoint register DABR2 Data address breakpoint register 2 DAR Data address register DBAT Data BAT DBCR Data address control register DCE Data cache enable DCFI Data cache flash invalidate DCMP Data TLB compare DEC Decrementer register DLOCK Data cache lock DMISS Data TLB miss address DMMU Data memory management unit DPM Dynamic power management enable DR Data address translation enable DSISR Register used for determining the source of a DSI interrupt DTLB Data translation lookaside buffer DWLCK Data cache way lock EA Effective address EAR External access register ECC Error checking and correction EE External interrupt enable FEO Floating point exception model 0 FE1 Floating point exception model 1 FIFO First in first out FP Floating point available FPR Floating point register e300 Power Architecture Core Family Reference Manual Rev 3 xxviii Freescale Semiconductor Table i Acronyms and Abbreviated Terms continued Term Meaning FPSCR Floating point status and control register FPU Floating point
3. Mnemonic Instruction entlzd Count Leading Zeros Double Word divd Divide Double Word divdu Divide Double Word Unsigned extsw Extend Sign Word fcfid Floating Convert From Integer Double Word fctid Floating Convert to Integer Double Word fctidz Floating Convert to Integer Double Word with Round toward Zero Id Load Double Word Idarx Load Double Word and Reserve Indexed Idu Load Double Word with Update Idux Load Double Word with Update Indexed Idx Load Double Word Indexed lwa Load Word Algebraic e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Instructions Not Implemented Table B 2 64 Bit Instructions Not Implemented by the e300 core continued Mnemonic Instruction lwaux Load Word Algebraic with Update Indexed lwax Load Word Algebraic Indexed mulld Multiply Low Double Word mulhd Multiply High Double Word mulhdu Multiply High Double Word Unsigned ridcl Rotate Left Double Word then Clear Left rider Rotate Left Double Word then Clear Right ridic Rotate Left Double Word Immediate then Clear ridicl Rotate Left Double Word Immediate then Clear Left rldicr Rotate Left Double Word Immediate then Clear Right rldimi Rotate Left Double Word Immediate then Mask Insert slbia SLB Invalidate All slbie SLB Invalidate Entry sld Shift Left Double Word srad Shift Right Algebraic Double
4. MSR Bit Interrupt Type POW TGPR ILE EE PR FP ME FEO SE BE FE1 CE IP IR DR RI LE System reset 0 0 0 0 Oo 0 0 0 0 1 0 04 0 JILE Machine check 0 0 0O0 0 0 0 0 0 0 0 0Oj 0 0 JILE DSI 0 0 0 0 0 0 0 0 o 0 07 0 ILE ISI 0 0 0 0 0 0 0 0 o 0 0 0 JILE External 0 0 0 0 0 0 0 0 o 0 0 0 JILE Alignment 0 0 0 0 0 0 0 0 o 0 0 0 ILE Program 0 0 0 0 0o 0 0 0 0 0 0 0 JILE Floating point 0 0 0 0 0o 0 0 0 0 0 0 0 JILE unavailable Decrementer 0 0 0 0 0 0 0 0 0 0O 0 0 JILE Critical Interrupt 0 0 0 0 DI 0 0 0 0 D 0 01 0 ILE System call 0 0 0 0 0 0 0 0 0 DD 0 JILE Trace 0 0 0o0 0 0 0 0 0 o 0 0 0 JILE ITLB miss 0 1 0 0 0 0 0 0 o 0 0 0 JILE DTLB miss on load 0 1 0O 0 0 0 0 0 o 0 0 0 JILE DTLB miss on store 0 1 0 o 0 0 0 0 0 ll D 0 0 JILE Instruction address 0 0 0 0 0 0 0 0 0 0O 0 0 JILE breakpoint e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Interrupts and Exceptions Table 5 10 MSR Setting Due to Interrupt continued
5. e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Interrupts and Exceptions mtmsr ei load MSR setting ILE and LE bits isync wait for all instructions to complete End Big Endian mode True Little Endian enabled modify the 8 Big Endian instructions into valid True Little Endian instructions True Little Endian Mode mtspr SRR1 rl load the Machine State with LE enabled xor r0 r0 xr0 initialize register oris r0 r0 0x0001 set Starting address at b 0001 0000 mtspr SRRO r0 load the next instruction address whatever instructions the supervisor OS wants rfi return from HRESET_ interrupt routine End HRESET_ handler in True Little Endian Mode See Section 3 1 2 Endian Modes and Byte Ordering for more information on the endian modes of the e300 core 5 5 2 Machine Check Interrupt 0x00200 The e300 core conditionally initiates a machine check interrupt after detecting the assertion of the tea or mcp signals on the coherent system bus CSB assuming the machine check is enabled with MSR ME 1 The assertion of one of these signals indicates that a bus error occurred and the system terminates the current transaction One clock cycle after the signal is asserted the data bus signals go to the high impedance state however data entering the GPR or the cache is not invalidated Note tha
6. MSR Bit Interrupt Type POW TGPR ILE EE PR FP ME FEO SE BE FE1 CE IP IR DR RI LE System management 0 0 0 0 DI 0 0 0 0 0 0 0 JILE interrupt Note 0 Bit is cleared 1 Bit is set ILE Bit is copied from the ILE bit in the MSR Bitis not altered Reserved bits are read as if written as 0 1 e300 core only 5 5 1 Reset Interrupts 0x00100 The system reset interrupt is a nonmaskable asynchronous interrupt signaled to the e300 core either through the assertion of the reset signals sreset or hreset The assertion of the soft reset signal sreset causes the system reset interrupt to be taken and the physical base address of the handler is determined by the MSR IP bit The assertion of the hard reset signal reset causes the system reset interrupt to be taken Note that there are some byte ordering precautions necessary when coming out of reset in big endian mode and switching to little endian mode The following sections describe the differences between a hard and soft reset and the byte ordering implications for reset interrupt handling 5 5 1 1 Hard Reset and Power On Reset As described in Section 5 1 2 Summary of Front End Interrupt Handling the hard reset interrupt is a nonrecoverable nonmaskable asynchronous interrupt When hreset is asserted or at power on reset POR the e300 core immediately branches to the address determined by the state of t
7. Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 andx 31 S A B 28 Rc andcx 31 S A B 60 Rc andi 28 S A UIMM andis 29 S A UIMM entizdx 1 31 S A 00000 58 Rc cntlzwx 31 S A 00000 26 Rc eqvx 31 S A B 284 Rc extsbx 31 S A 00000 954 Rc extshx 31 S A 00000 922 Rc extswx 31 S A 00000 986 Rc nandx 31 S A 476 Rc norx 31 S A B 124 Rc orx 31 S A B 444 Rc orcx 31 S A B 412 Rc ori 24 S A UIMM oris 25 S A UIMM xorx 31 S A B 316 Rc xori 26 S A UIMM xoris 27 S A UIMM 1 64 bit instruction Table A 6 Integer Rotate Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ridelx 30 S A B mb 8 Rc rldcrx 30 A B me 9 Rc rldicx 30 S A sh mb 2 sh Re ridiclx 30 S A sh mb o sh Rc rldicrx 30 S A sh me 1 sh Re e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor ridimix rlwimix rlwinmx rlwnmx Table A 6 Integer Rotate Instructions continued Instruction Set Listings 1 64 bit instruction Name sldx slwx sradx sradix srawx srawix srdx srwx 30 S A sh mb 3 sh Rc 22 S A SH MB ME Rc 20 S A SH MB ME Rc 21 S A SH MB ME Rc Table A 7 Integer Shift Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 31 S A B 27 Rc 31
8. RE DBAT3U DBAT3L Select DBAT4U DBAT4L ee a DBAT7U PA0 PA19 SPR976 SPR977 PA0 PA19 D Cache SPRS Hit Miss SPR978 SPR979 SPR982 PA0 PA31 Figure 6 3 e300 Core DMMU Block Diagram e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Memory Management 6 1 3 Address Translation Mechanisms Processors that implement the PowerPC architecture support the following four types of address translation e Page address translation translates the page frame address for a 4 Kbyte page size e Block address translation translates the block number for blocks that range in size from 128 Kbytes to 256 Mbytes e Direct store interface address translation used to generate direct store interface accesses on the external bus not implemented in the e300 core e Real addressing mode translation when address translation is disabled the physical address is identical to the effective address Figure 6 4 shows the three implemented address translation mechanisms provided by the MMUs The segment descriptors shown in the figure control the page address translation mechanism When an access uses page address translation the appropriate segment descriptor is required In 32 bit implementations one of the 16 on chip segment registers which contain segment descriptors is selected by the 4 highest order effective address bits A control bit in the corre
9. e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Register Model Table 2 5 e300 HIDO Field Descriptions continued Bits Name Function DOZE Doze mode enable Operates in conjunction with MSR POW 0 Doze mode disabled 1 Doze mode enabled Doze mode is invoked by setting MSR POW while this bit is set In doze mode the PLL time base and snooping remain active NAP Nap mode enable Operates in conjunction with MSR POW The greq signal is asserted to indicate that the processor is ready to enter nap mode If the system logic determines that the processor may enter nap mode the quiesce acknowledge signal gack is asserted to notify the processor 0 Nap mode disabled 1 Nap mode enabled Nap mode is invoked by setting MSR POW while this bit is set In nap mode the PLL and time base remain active 10 SLEEP Sleep mode enable Operates in conjunction with MSR POW O Sleep mode disabled 1 Sleep mode enabled Sleep mode is invoked by setting MSR POW while this bit is set greq is asserted to indicate that the processor is ready to enter sleep mode If the system logic determines that the processor may enter sleep mode the quiesce acknowledge signal gack is asserted back to the processor Once gack assertion is detected the processor enters sleep mode after several processor clocks At this point the system logic may turn off the PLL by first c
10. 3 2 4 3 10 Floating Point Store Instructions There are three basic forms of the store instruction single precision double precision and integer The integer form is supported by the optional stfiwx instruction Because the FPRs support only double precision format for floating point data the FPU converts double precision data to single precision format before storing the operands The conversion steps are described in Floating Point Store Instructions in Appendix D Floating Point Models in the Programming Environments Manual Implementation Note The PowerPC architecture defines store with update instructions with rA 0 as an invalid form however the core treats this case as valid Table 3 20 lists the floating point store instructions Table 3 20 Floating Point Store Instructions Name Mnemonic Operand Syntax Store Floating Point as Integer Word Indexed stfiwx frS rA rB Store Floating Point Double stfd frS d rA Store Floating Point Double Indexed stfdx frS rA rB Store Floating Point Double with Update stfdu frS d rA Store Floating Point Double with Update Indexed stfdux frS rA rB Store Floating Point Single stfs frS d rA Store Floating Point Single Indexed stfsx frS rA rB Store Floating Point Single with Update stfsu frS d rA Store Floating Point Single with Update Indexed stfsux frS rA rB 3 2 4 4 Branch and Flow Control Instructions Branch instructions are e
11. Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D A B OE XO Rc OPCD D A B 0 XO Re OPCD D A 00000 OE XO Re Specific Instructions addx 31 D A B OE 266 Rc addcx 31 D A B OE 10 Rc addex 31 D A B OE 138 Rc addmex 31 D A 00000 OE 234 Rc addzex 31 D A 00000 OE 202 Rc divdx 31 D A B OE 489 Rc divdux 31 D A B OE 457 Rc divwx 31 D A B OE 491 Rc divwux 31 D A B OE 459 Rc mulhdx 31 D A B 0 73 Re mulhdux 31 D A B 0 9 Rc mulhwx 31 D A B 0 75 Re mulhwux 31 D A B 0 11 Re mulldx 31 D A B OE 233 Rc mullwx 31 D A B OE 235 Rc negx 31 D A 00000 OE 104 Rc subfx 31 D A B OE 40 Rc e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Instruction Set Listings Table A 40 XO Form continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 subfcx 31 D A B OE 8 Rc subfex 31 D A B OE 136 Rc subfmex 31 D A 00000 OE 232 Rc subfzex 31 D A 00000 OE 200 Rc 1 64 bit instruction Table A 41 A Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D A B 00000 XO Re OPCD D A B C XO Rc OPCD D A 00000 C XO Rc OPCD D 00000 B 00000 XO Re Specific Instructions faddx 63 D A B 00000 21 Re faddsx 59 D A B 00000 21 Re fdivx 63 D A B 00000 1
12. A page address translation access occurs when MSR DR is set and there is not a match in the BAT Note the following points e The following is true for all loads and stores except strings multiples note that these four cases do not cause an alignment interrupt in the e300c2 core Byte operands never cause an alignment interrupt Half word operands can cause an alignment interrupt if the EA ends in OxFFF Word operands can cause an alignment interrupt if the EA ends in 0xFFD FFF Double word operands cause an alignment interrupt if the EA ends in OxFF9 FFF e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Interrupts and Exceptions e The debz instruction causes an alignment interrupt if the access is to a page or block with the W write through or I cache inhibit bit set in the TLB or BAT respectively A misaligned memory access that does not cause an alignment interrupt do not perform as well as an aligned access of the same type The resulting performance degradation due to misaligned accesses depends on how well each individual access behaves with respect to the memory hierarchy At a minimum additional cache access cycles are required that can delay other processor resources from using the cache More dramatically for an access to a noncacheable page each discrete access involves individual processor bus operations that reduce the effective bandwidth of that bus
13. 3 2 4 1 4 Name Mnemonic Operand Syntax AND and and rA rS rB AND Immediate andi rA rS UIMM AND Immediate Shifted andis rA rS UIMM AND with Complement andc andc rA rS rB Count Leading Zeros Word cntlzw entlzw rA rS Equivalent eqv eqv rA rS rB Extend Sign Byte extsb extsb rA rS Extend Sign Half Word extsh extsh rA rS NAND nand nand rA rS rB NOR nor nor rA rS rB OR or or rA rS rB OR Immediate ori rA rS UIMM OR Immediate Shifted oris rA rS UIMM OR with Complement orc orc rA rS rB XOR xor xor rA rS rB XOR Immediate xori rA rS UIMM XOR Immediate Shifted xoris rA rS UIMM Integer Rotate and Shift Instructions Rotation operations are performed on data from a GPR and the result or a portion of the result is returned to a GPR See Appendix F Simplified Mnemonics in the Programming Environments Manual for a complete list of simplified mnemonics that allows simpler coding of often used functions such as clearing the left most or right most bits of a register left justifying or right justifying an arbitrary field and simple rotates and shifts e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Instruction Set Model Integer rotate instructions rotate the contents of a register The result of the rotation is either inserted into the target register under control of a mask if a mask bit is
14. Bits Name Function 29 30 Reserved should be cleared 31 NOOPTI No op the data cache touch instructions 0 The debt and debtst instructions are enabled 1 The debt and debtst instructions are no oped internal to the e300 core Table 2 6 shows how HIDO SBCLK HIDO ECLK and hreset are used to configure clk_out Table 2 6 HIDO SBCLK and HIDO ECLK elk out Configuration hreset HIDO ECLK HIDO SBCLK clk_out Asserted x x Core Negated 0 0 Core Negated 0 1 Core clock frequency 2 Negated 1 0 Core Negated 1 1 Bus HIDO can be accessed with mtspr and mfspr using SPR1008 2 2 2 Hardware Implementation Register 1 HID1 The e300 implementation of HID1 is shown in Figure 2 6 SPR 1009 Access Supervisor read write 0 1 2 3 4 5 6 7 31 R W PCO PC1 PC2 PC3 PC4 PC5 PC6 Reset pll_cfg 0 6 0x000_0000 Figure 2 6 HID1 Register Table 2 7 shows the bit definitions for HID1 Table 2 7 HID1 Bit Settings Bits Name Description 0 PCO PLL configuration bit 0 read only PC1 PLL configuration bit 1 read only PC3 PLL configuration bit 3 read only PC4 PLL configuration bit 4 read only oo AJ wo N PC2 PLL configuration bit 2 read only 3 a PC5 PLL configuration bit 5 read only e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Se
15. Latenc Latency Primary Extended y in Cycles in Mnemonic Unit in Cycles in Opcode Opcode e300c1 e300c2 e3000c3 twi 03 Km Integer 2 2 mulli 07 Integer 2 3 2 subfic 08 Integer 1 1 cmpli 10 Integer amp SRU 1 1 cmpi 11 Integer amp SRU 1 1 addic 12 Integer 1 1 addic 13 Integer 1 1 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Instruction Timing Table 7 4 Integer Instructions continued r Latency 3 Latency 3 Mnemonic GE SE Unit in Cycles in in Se m pcode pcode e300c1 e300c2 e3000c3 addi 14 Integer amp SRU 1 1 addis 15 Integer amp SRU 1 1 rlwimi 20 Integer 1 1 rlwinm 21 Ke Integer 1 1 rlwnm 23 Integer 1 1 ori 24 Deg Integer 1 1 oris 25 Gees Integer 1 1 xori 26 Integer 1 1 xoris 27 Integer 1 1 andi 28 _ Integer 1 1 andis 29 Sa Integer 1 1 cmp 31 000 Integer amp SRU 14 14 tw 31 004 Integer 2 2 subfc o 31 008 Integer 1 1 addc o 31 010 Integer 1 1 mulhwu 31 011 Integer 2 3 4 5 6 2 slw 31 024 Integer 1 1 entizw 31 026 Integer 1 1 and 31 028 Integer 1 1 cmpl 31 032 Integer amp SRU 14 1 subf 31 040 Integer 1 1 andc 31 060 Integer 1 1 mulhw 31 075 Integer 2 3 4 5 2 neg o 31 104 Integer 1 1 nor 31 124 Integer 1 1 subfe o 31 136 Integer
16. 3 Maskable asynchronous interrupts for example external interrupt and decrementer interrupts are delayed until higher priority interrupts are taken System reset and machine check interrupts may occur at any time and are not delayed even if an interrupt is being handled As a result state information for the interrupted interrupt may be lost therefore these interrupts are typically nonrecoverable All other interrupts have lower priority than system reset and machine check interrupts and the interrupt may not be taken immediately when it is recognized e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Interrupts and Exceptions 5 1 1 Interrupt Priorities The interrupts are listed in Table 5 3 in order of highest to lowest priority Table 5 3 Interrupt Priorities Interrupt Ga rupt u Category Priority Interrup Cause Asynchronous 0 System reset hreset Machine check tea Mcp ape or dpe 2 System reset sreset 3 Critical interrupt cint See Section 5 2 1 2 CSRRO and CSRR1 Bit Settings for more information 4 System management smi interrupt External interrupt int Performance monitor pm_event_in interrupt 7 Decrementer interrupt Decrementer passed through 0x0000_0000 Instruction fetch 0 ITLB miss Instruction TLB miss 1 Instruction access Instruction access interrupt Instruction 0 IABR I
17. 3 2 4 3 1 3 17 4 1 1 4 1 4 2 4 3 4 3 4 4 4 5 2 8 4 20 4 6 7 4 23 In Figure 2 2 e300c1 Processor Version Register add PVR value of the e300c2 core in the PVR register diagram In Figure 2 3 Machine State Register change reset value from All zeros to 0000_0040 or 0000_0000 or 0001_0041 or 0001_0001 to reflect the values during different reset states In Table 2 3 MSR Bit Settings add statements that bits FP FEO and FE1 are read only in e300c2 In Table 2 4 e300 HIDO Field Descriptions add description to bit 25 to show the decrementer auto reload DECAREN bit found in e300c2 only In Table 2 4 e300 HIDO Field Descriptions add phrase in HIDO ICE and HIDO ILOCK descriptions to note that burst transactions can be generated even if the instruction cache is off or locked In Table 2 4 e300 HIDO Field Descriptions before the bit settings add the following The greq signal is asserted to indicate that the processor is ready to enter nap mode If the system logic determines that the processor may enter nap mode the quiesce acknowledge signal gack is asserted to notify the processor Add description to bit 11 in Table 2 7 e300 HID2 Field Descriptions to show the enable weighted LRU ELRW bit found in e300c2 only Add description to bit 12 in Table 2 7 e300 HID2 Field Descriptions to show the no kill for snoop NOKS bit found in e300c2 Ad
18. Register settings for this interrupt are described in Chapter 6 Interrupts in the Programming Environments Manual When an ISI interrupt is taken instruction execution for the handler begins at offset 0x00400 from the physical base address indicated by MSR IP 5 5 5 External Interrupt 0x00500 An external interrupt is signaled to the e300 core by the assertion of the int signal as described in Section 8 3 1 External Interrupts The interrupt may not be recognized if a higher priority interrupt occurs simultaneously or if the MSR EE bit is cleared when int is asserted After the int is recognized the e300 core generates a recoverable halt to instruction completion The e300 core allows the next instruction in program order to complete including handling any interrupts that instruction may generate However the e300 core blocks subsequent instructions from completing and allows any outstanding stores to occur to system memory If any other interrupts are encountered in this process they are taken first and the external interrupt is delayed until a recoverable halt is achieved At this time the e300 core saves the state information and takes the external interrupt as defined by the PowerPC architecture The register settings for the external interrupt are shown in Table 5 16 Table 5 16 External Interrupt Register Settings Register Setting SRRO Set to the effective address of the instruction that the processo
19. SRR1 0 9 Cleared 10 Instruction cache parity error caused interrupt 11 Data cache parity error caused interrupt 12 mcp Machine check signal caused interrupt 13 tea Transfer error acknowledge signal caused interrupt 14 dpe Data parity error condition and signal assertion caused interrupt 15 ape Address parity error condition and signal assertion caused interrupt 16 29Loaded from MSR 16 29 30 O for instruction cache parity error data cache parity error tea dpe ape loaded from MSR 30 for mcp H mcp and tea are asserted simultaneously then SRR1 30 is cleared and the interrupt is not recoverable 31 Loaded from MSR 31 MSR POW 0 FP 0 FE1 0 RI 0 TGPR 0 ME CE 0 LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 Note When a machine check interrupt is taken the interrupt should set MSR ME as soon as it is practical to handle another tea assertion Otherwise subsequent fea assertions cause the processor to automatically enter the checkstop state 5 5 2 2 Checkstop State MSR ME 0 When the e300 core enters the checkstop state it asserts the checkstop output signal ckstp_out The following events cause the e300 core to enter the checkstop state e Machine check interrupt occurs with MSR ME cleared e External checkstop input ckstp_in is asserted When a processor is in the checkstop state instruction processing is suspended and generally cannot
20. When the effective address for a data store or cache operation cannot be translated by the DTLB a data TLB miss on store interrupt is generated The data TLB miss on store interrupt is also taken when the changed bit C 0 for a DTLB entry needs to be updated for a store operation Register settings for the instruction and data TLB miss interrupts are described in Table 5 21 If a data TLB miss interrupt handler fails to find the desired PTE then a page fault must be synthesized The handler must restore the machine state and clear MSR TGPR before invoking the DSI interrupt 0x00300 Software table search operations are discussed in Chapter 6 Memory Management When a data TLB miss on store interrupt is taken instruction execution for the handler begins at offset 0x01200 from the physical base address indicated by MSR IP 5 5 17 Instruction Address Breakpoint Interrupt 0x01300 The instruction address breakpoint is controlled by the ABR special purpose register Bits 0 29 of IABR holds an effective address to which each instruction s address is compared The interrupt is enabled by setting bit 30 in the IABR The interrupt is taken when an instruction breakpoint address matches on the next instruction to complete The instruction tagged with the match is not completed before the instruction address breakpoint interrupt is taken The breakpoint action can be trapped to interrupt vector 0x01300 default e300 Power Archit
21. Decrement the counter and branch if CTR 0 In the above example both the divw and beqlr instructions are fetched at the same time this assumes a 64 bit data bus the preloading code does not work for a 32 bit data bus due to their placement on a double word boundary The divide instruction was chosen because it takes many cycles to execute During execution of the divide the processor starts fetching instructions speculatively at the target destination of the branch instruction The speculation occurs because the branch is statically predicted as taken This speculative fetching causes the cache block that is pointed to by the link register LR to be loaded into the cache Because the divw instruction always produces a non zero result the beqlr is not taken and execution of all speculatively fetched instructions is canceled However the instructions remain valid in the cache If the destination instruction stream contains an unconditional branch to another memory location it is possible to also prefetch the destination of the unconditional branch instruction This does not cause a problem if the destination of the unconditional branch is also inside the area of memory that needs to be preloaded But if the destination of the unconditional branch is not in the area of memory to be loaded then care must be taken to ensure that the branch destination is to an area of memory that is caching inhibited Otherwise unintentional instructions ma
22. For arithmetic instructions conversions from double to single precision must be done explicitly by software while conversions from single to double precision are done implicitly All PowerPC implementations provide the equivalent of the following execution models to ensure that identical results are obtained The definition of the arithmetic instructions for infinities denormalized numbers and NaNs follow conventions described in the following sections Although the double precision format specifies an 11 bit exponent exponent arithmetic uses two additional bit positions to avoid potential transient overflow conditions An extra bit is required when denormalized double precision numbers are prenormalized A second bit is required to permit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Instruction Set Model computation of the adjusted exponent value in the following examples when the corresponding exception enable bit is one e Underflow during multiplication using a denormalized factor e Overflow during division using a denormalized divisor 3 1 5 Effect of Operand Placement on Performance The VEA states that the placement location and alignment of operands in memory affect the relative performance of memory accesses The best performance is guaranteed if memory operands are aligned on natural boundaries To obtain the best performance from the core the programmer should assume the pe
23. JTAG and TAP trst JTAG test reset This input causes asynchronous initialization of the internal JTAG test access port controller tck JTAG test clock Driven by a free running clock signal Input signals to the test access port are sampled on the rising edge of tck TAP output signal changes occur on the falling edge of tck The test logic allows TCK to be stopped asynchronously with respect to all other core complex clocks e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Core Interface Operation Table 8 1 Summary of Selected Internal Signals continued Signal HO Comments or Meaning when Asserted ims JTAG test mode select Decoded by the internal JTAG TAP controller to determine the primary operation of the test support circuitry tdi JTAG test data input The value present on the rising edge of tck is loaded into the selected JTAG test instruction or data register tdo O JTAG test data output The contents of the selected internal instruction or data register are shifted out onto this signal on the falling edge of tck Test Interface tImsel O TLM selected tImsel provides feedback to the external TAP linking module logic tap_en TAP enable tap_en is used by the TAP linking module TLM logic external to the core complex core_disable On assertion core output signals are negated or forced to a high impedance state The core
24. e The counter s overflow condition is enabled PMLCan CE is set e The counter indicates an overflow PMCn OV is set If PMGCO PMIE is set an enabled condition or event triggers the signaling of a performance monitor exception If PMGCO FCECE is set an enabled condition or event also triggers all performance monitor counters to freeze 5 5 14 Instruction TLB Miss Interrupt 0x01000 When the effective address for an instruction load store or cache operation cannot be translated by the ITLB an instruction TLB miss interrupt is generated Register settings for the instruction and data TLB miss interrupts are described in Table 5 21 If the instruction TLB miss interrupt fails to find the desired PTE then a page fault must be synthesized The handler must restore the machine state and clear MSR TGPR before invoking the ISI interrupt 0x00400 Software table search operations are discussed in Chapter 6 Memory Management When an instruction TLB miss interrupt is taken instruction execution for the handler begins at offset 0x01000 from the physical base address indicated by MSR IP 5 5 15 Data TLB Miss on Load Interrupt 0x01100 When the effective address for a data load or cache operation cannot be translated by the DTLB a data TLB miss on load interrupt is generated Register settings for the instruction and data TLB miss interrupts are described in Table 5 21 If a data TLB miss interrupt fails to find the desi
25. eieio eqvx extsbx extshx extswx fabsx faddx faddsx fefidx fempo fempu fetidx fctidzx fctiwx fctiwzx fdivx fdivsx fmaddx fmaddsx fmrx fmsubx fmsubsx fmulx Lj e ej ej ej e lt lt Hp Say lt Se lt lt lt OS e OS e gt Pl D X Ei Ei Pl D X X X X X X X S P X X X X X X X X e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor fmulsx fnabsx fnegx fnmaddx fnmaddsx fnmsubx fnmsubsx fresx frspx frsqrtex 2 fselx fsqrtx gt fsqrtsx gt fsubx fsubsx icbi icbt isync Ibz Ibzu Ibzux Ibzx Id Idarx Idu Idux Idx 1 Ifd Ifdu Ifdux lfdx Ifs Heu Ifsux lfsx Table A 45 PowerPC Instruction Set Legend continued Instruction Set Listings UISA VEA OEA Supervisor Level Optional 64 Bit Form Ni V X V X V A V A V A V A V V A V X V V A V V A V V A y V A V A V A V X V X V XL Ni V D 4 4 V V DS V V X V V DS V V X V y X V D V D V X V X V D V D V X V X e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 37 Instruction Set Listings Table A 45 PowerPC Instruction Set Lege
26. mfspr Ee SEEL get srrl andi 273 Eine OXLLEL clean srrl addis r2 r2 0x4000 or in srrl lt xl gt 1 to flag pte not found ISEL mtctr F restore counter mtspr SrL T2 set srrl mfmsr r0 get msr xoris CD 0 0x8000 flip the msr lt tgpr gt bit mtcrf 0x80 r3 restore CRO mtmsr ro flip back to the native gprs b vec400 go to instr access interrupt Data TLB miss flow Entry Vec 1100 rr gt address of instruction that caused data tlb miss srrl gt 0 3 cr0O 4 lru way bit 5 1 if store 16 31 saved MSR dMiss dcmp hashl hash2 ro ral r2 3 is is is is msr lt tgpr gt gt 1 ea that missed the compare value for the va that missed pointer to first hash pteg pointer to second hash pteg gt Register usage saved counter junk pointer to pteg current compare value csect tlbmiss PR Org vec0 0x1100 tlbDataMiss mfspr r2 hashl get first pointer addi SEL 0 8 load 8 for counter mfctr ro save counter mfspr r3 dCmp get first compare value addi e EAr mg pre dec the pointer dm0 mtctr l load counter dm1 lwzu rip Se get next pte cmp C04 Ee E see if found pte bdnzft 0 dml dec count br if cmp ne and if count not zero bne dataSecHash if not found set up second hash or exit 1 rl 4 r2 load tlb entry lower word mtctr ro restore counter mfspr r0 dMiss get the miss address for the tlbld mfspr Esp Srel get the saved cr0 bits mtcrf 0x80 r3 restore CRO mtspr rpa rl set the pte o
27. 0 Instruction s EA matches IABR CEA OR instruction s EA matches IABR2 CEA 1 Instruction s EA matches I ABR CEA AND instruction s EA matches IABR2 CEA 15 DNS Do not signal Disable jabr and iabr2 output signals 0 Allow signal to toggle on a match 1 Do not toggle signal on match 2 2 16 Data Address Breakpoint Register DABR and DABR2 The optional data address breakpoint facility on the e300 core is controlled by optional SPRs DABR and DABR2 The data address breakpoint facility provides a means to detect data accesses to a designated double word address The breakpoint address is compared to the effective address of all data accesses it does not apply to instruction fetches DABR and DABR2 the two data address breakpoint registers shown in Figure 2 21 can both cause the data address breakpoint interrupt SPR 1013 DABR Access Supervisor read write 317 DABR2 o 28 29 30 31 R CEA BT WBE RBE W Reset All zeros Figure 2 21 DABR and DABR2 Registers When an enabled data breakpoint condition matches with the address of a data access a DSI interrupt occurs When a DSI interrupt is taken to indicate a data breakpoint condition DAR is set to the data address that causes the breakpoint and DSISR 9 is set The address of the instruction associated with the breakpoint condition is stored in SRRO e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semic
28. 5 31 10 3 10 4 tracing facilities 10 1 Translation lookaside buffers TLBs see Memory management unit MMU Trap instructions 3 24 program interrupt 5 4 5 28 U UISA instructions see Instructions user instruction set architecture UISA UPMC0 3 user performance monitor counter registers 11 7 UPMGCO user global control register 0 11 4 e300 Power Architecture Core Family Reference Manual Rev 3 VEA instructions see Instructions virtual environment Freescale Semiconductor Index 9 e300 Power Architecture Core Family Reference Manual Rev 3 Index 10 Freescale Semiconductor
29. 8 Words Block e Figure 4 4 e300c2 and e300c3 Instruction Cache Organization Each block consists of 32 bytes of data 32 parity bits an address tag and a valid bit Each cache block contains eight contiguous words from memory that are loaded from an eight word boundary that is bits A27 A31 are not part of the cache block address thus a cache block never crosses a page boundary Misaligned accesses across a page boundary can incur a performance penalty Address bits A20 A26 provide an index to select a set Bits A27 A31 select a byte within a block The tags consists of bits PAO PA19 Address translation occurs in parallel such that higher order bits the tag bits in the cache are physical The replacement algorithm is a PLRU algorithm that is the pseudo least recently used block is filled with new instructions on a cache miss The instruction cache is only written as a result of a block fill operation on a cache miss A hardware invalidation capability is provided to support cache maintenance The instruction fetcher accesses the instruction cache frequently in order to sustain the high throughput provided by the six entry instruction queue 4 4 Memory and Cache Coherency The primary objective of a coherent memory system is to provide the same image of memory to all devices using the system Coherency allows synchronization and cooperative use of shared resources Otherwise multiple copies of a memory location some contai
30. A31 are untranslated and therefore identical for both effective and physical addresses After translating the address the MMUs pass the resulting 32 bit physical address to the memory subsystem e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Memory Management In addition to the higher order address bits the MMUs automatically keep an indicator of whether each access was generated as an instruction or data access and a supervisor user indicator that reflects the state of the PR bit of the MSR when the effective address was generated In addition for data accesses there is an indicator of whether the access is for a load or a store operation This information is then used by the MMwUs to appropriately direct the address translation and to enforce the protection hierarchy programmed by the operating system Section 5 2 Interrupt Processing describes the MSR which controls some of the critical functionality of the MMUs The figures show how the A20 A26 address bits index into the on chip instruction and data caches to select a cache set The remaining physical address bits are then compared with the tag fields comprised of bits PAO PA19 of the four selected cache blocks to determine if a cache hit has occurred In the case of a cache miss the instruction or data access is then forwarded to the bus interface unit which then initiates a Coherent System Bus CSB access to the memory subsy
31. IBCR SPR 309 DBCR SPR 310 Performance Monitor PMR 400 PMR 16 19 PMR 144 147 1 These registers are e300 core implementation specific not defined by the PowerPC architecture e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Register Model Figure 2 1 e300 Programming Model Registers The e300 core user level registers are described as follows User level registers UISA The user level registers can be accessed by all software with either user or supervisor privileges The user level register set includes the following General purpose registers GPRs The GPR file consists of thirty two 32 bit GPRs designated as GPRO GPR31 This register file serves as the data source or destination for all integer instructions and provides data for generating addresses Floating point registers FPRs The FPR file consists of thirty two 64 bit FPRs designated as FPRO FPR31 which serves as the data source or destination for all floating point instructions These registers can contain data objects of either single or double precision floating point format Before the stfd instruction is used to store the contents of an FPR to memory the FPR must have been initialized after reset explicitly loaded with any value by using a floating point load instruction Implementation Note The e300c2 core does not support any floating point registers or instructions Condition register CR The CR co
32. PMLCa3 e The performance monitor interrupt is assigned to interrupt vector OxOFOO Software communication with the performance monitor is achieved through PMRs rather than SPRs The DMR are used for enabling conditions that can trigger the performance monitor interrupt 1 2 PowerPC Architecture Implementation The PowerPC architecture consists of the following layers and adherence to the PowerPC architecture can be measured in terms of which of the following levels of the architecture is implemented e User instruction set architecture UISA Defines the base user level instruction set user level registers data types floating point interrupt model memory models for a uniprocessor environment and programming model for a uniprocessor environment e Virtual environment architecture VEA Describes the memory model for a multiprocessor environment defines cache control instructions and describes other aspects of virtual environments Implementations that conform to the VEA also adhere to the UISA but may not necessarily adhere to the OEA e Operating environment architecture OEA Defines the memory management model supervisor level registers synchronization requirements and interrupt model Implementations that conform to the OEA also adhere to the UISA and VEA The PowerPC architecture allows a wide range of designs for such features as cache and core interface implementations 1 3 Implementation Specific Information Th
33. Partial function e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Register Model Table 2 4 MSR Bit Settings continued Bits Name Description 13 POW Power management enable implementation specific 0 Disables programmable power modes normal operation mode 1 Enables programmable power modes nap doze or sleep mode This bit controls the programmable power modes only it has no effect on dynamic power management DPM MSR POW may be altered with an mtmsr instruction only Also when altering the POW bit software may alter only this bit in the MSR and no others The mtmsr instruction must be followed by a context synchronizing instruction See Chapter 9 Power Management for more information 14 TGPR Temporary GPR remapping implementation specific 0 Normal operation 1 TGPR mode GPRO GPR3 are remapped to TGPRO TGPR3 for use by TLB miss routines The contents of GPRO GPR3 remain unchanged while MSR TGPR 1 Attempts to use GPR4 GPR311 with MSR TGPR 1 yield undefined results Temporarily replaces T PRO TGPR3 with GPRO GPR3 for use by TLB miss routines The TGPR bit is set when either an instruction TLB miss data read miss or data write miss interrupt is taken The TGPR bit is cleared by an rfi instruction 15 ILE Interrupt little endian mode When an interrupt occurs this bit is copied into MSR LE to select the endian mode for the co
34. SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FE1 0 RI 0 TGPR 0 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 If trace and breakpoint conditions occur simultaneously the breakpoint conditions receive higher priority When a trace interrupt is taken instruction execution for the handler begins at offset 0OxOODO0 from the base address indicated by MSR IP 5 5 12 1 Single Step Instruction Trace Mode The single step instruction trace mode is enabled by setting MSR SE Encountering the single step breakpoint causes the following action trap to address vector 0x00D00 The single step trace action traps after an instruction execution and completion 5 5 12 2 Branch Trace Mode The branch trace mode is enabled by setting MSR BE Encountering the branch trace breakpoint causes the following action trap to interrupt vector OxOODO0 The branch trace action is to trap after the completion of any branch instruction whenever MSR BE is set e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Interrupts and Exceptions 5 5 13 Performance Monitor Interrupt 0x00F00 The performance monitor interrupt is triggered by an enabled condition or event The only performance monitor enabled condition or event defined for the e300c3 is the following A PMCn overflow condition occurs when both of the following are true
35. See Chapter 4 Instruction and Data Cache Operation for additional information about these instructions and about related aspects of memory synchronization The sync instruction delays execution of subsequent instructions until previous instructions have completed to the point that they can no longer cause an interrupt and until all previous memory accesses are performed globally the sync operation is not broadcast onto the e300 core Coherent System Bus CSB interface Additionally all load and store cache bus activities initiated by prior instructions are completed Touch load operations icbt deht and debtst are required to complete at least through address translation but are not required to complete on the bus The functions performed by the sync instruction normally take a significant amount of time to complete as a result frequent use of this instruction may adversely affect performance In addition the number of cycles required to complete a sync instruction depends on system parameters and on the processor s state when the instruction is issued The proper paired use of the lwarx and stwex instructions allows programmers to emulate common semaphore operations such as test and set compare and swap exchange memory and fetch and add Examples of these operations can be found in Appendix E Synchronization Programming Examples in the Programming Environments Manual Typically the lwarx instruction should be paired with an
36. See Table 7 6 for load and store instruction execution timing 7 4 5 System Register Unit Execution Timing Most SRU instructions access or modify nonrenamed registers or directly access renamed registers They generally execute in a serial manner Results from these instructions are not available or forwarded for use by subsequent instructions until the instruction completes and is retired The SRU can also execute the integer instructions addi addis add addo cmpi cmp cmpli and cmpl without serialization and in parallel with another integer instruction Refer to Section 7 3 3 2 Instruction Serialization for additional information on serializing instructions and Table 7 2 Table 7 3 and Table 7 4 for SRU instruction execution timing 7 5 Memory Performance Considerations Due to the e300 core instruction throughput of three instructions per clock cycle lack of data bandwidth can become a performance bottleneck For the core to approach its potential performance levels it must be able to read and write data quickly and efficiently If there are many processors in a system environment one processor may experience long memory latencies while another bus master for example a direct memory access controller is using the external bus To alleviate this possible contention the e300 core provides three memory update modes copy back write through and cache inhibit Each page of memory is specified to be in one of these modes If a
37. Supported One Not included Not included 0x8083 set associative instruction and data caches e300c2 16 Kbyte 4 way Not supported Two Included Not included 0x8084 set associative instruction and data caches e300c3 16 Kbyte 4 way Supported Two Included Included 0x8085 set associative instruction and data caches Impact Some cores The e300c2 Two integer The enhanced A performance The core version implement does not execution units multipliers are monitor provides number in the smaller L1 implement provide higher faster and the ability to processor instruction and hardware throughput of provide a count version register data caches support for integer maximum predefined PVR is listed in floating point operations two cycle events and this table and operations latency for processor the revision level multiply clocks for each core instructions associated with starts at 0x0010 particular and changes for operations such each revision of as cache the core misses mispredicted branches or the number of cycles an execution unit stalls e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Chapter 2 Register Model This chapter describes the PowerPC register model and its specific implementation on the e300 core First it outlines the register organization as defined by the three levels of the PowerPC architecture user instruction set architecture UISA virtual environ
38. Table Search Generate PA Using Primary Hash Function PA lt Base PA of PTEG Fetch PTE from PTEG PA lt PA 8 Fetch Next PTE in PTEG Fetch PTE 64 Bits from PA Otherwise PTE VSID API H V Segment Descriptor VSID EA API 0 1 Otherwise Secondary Page PTE R 1 PTE R 0 Table Search Hit Last PTE in PTEG i from Figure 6 10 PTE R lt 1 R_Flag lt 1 PT q Write PTE into TLB EA AEE oN dcbz Instruction th Otherwise withWorl 1 TONNS Check Memory Protection R_Flag 1 Violation Conditions gt Byte Writeto Update PTE R in Access Permitted Memory ya Access Prohibited Bea ez Otherwise Store Operation with eee Perform Operation to Otherwise Otherwise Memory or Take ee E Alignment Interrupt J TLB PTE C lt 1 R_Flag 1 PTE R lt 1 PTE C lt 1 PTE R lt 1 Update PTE R Update PTE C Update PTE R in Memory in Memory in Memory L O Q Page Table Page Table Memory Protection Search Complete Search Complete Violation ee J Optional Figure 6 9 Primary Page Table Search Conceptual Flow e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Memory Management Secondary Page Table Search Generate PA using Secondary Hash Function PA lt Base PA of PTEG Fetch PTE from PTEG PA lt PA 8 Fetch PTE 64 Bits Fetch Next PTE in PTEG from PA O
39. The icbt instruction allocates blocks into the instruction cache regardless of WIMG settings or whether the instruction cache is enabled so that the instruction cache can be easily locked and preloaded before turning on In addition in case of a cache hit the icbt instruction will re write to the same hitting line so that an existing locked and protected page in the instruction cache can be easily re loaded with a different program without going back to an initial start up state e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 45 Instruction and Data Cache Operation e300 Power Architecture Core Family Reference Manual Rev 3 46 Freescale Semiconductor Chapter 5 Interrupts and Exceptions The PowerPC interrupt mechanism allows the processor to change to supervisor state as a result of external signals errors or unusual conditions arising in the execution of instructions When interrupts occur information about the state of the processor is saved to certain registers and the processor begins execution at an address interrupt vector predetermined for each interrupt Processing of interrupts occurs in supervisor mode Although multiple exception conditions can map to a single interrupt vector a more specific condition may be determined by examining a register associated with the interrupt or exception that causes it for example the DSISR or FPSCR Additionally certain exception conditio
40. data TLB miss on store 5 5 5 34 decrementer interrupt 5 4 5 30 DSI data storage interrupt 5 3 5 23 10 3 10 4 external input interrupt int 5 4 5 12 5 25 floating point unavailable 5 4 5 29 instruction address breakpoint 5 5 5 34 10 1 10 3 10 4 instruction TLB miss 5 5 5 33 ISI instruction storage interrupt 5 4 5 24 machine check 5 3 5 13 5 21 5 22 performance monitor interrupt 1 14 5 33 11 1 11 9 program interrupt 5 4 5 28 system call 5 4 5 31 system management interrupt smi 5 5 5 12 5 36 system reset 5 3 5 18 trace interrupt 5 5 5 31 10 3 10 4 latencies hard reset and machine check 5 17 soft reset 5 17 5 18 5 19 5 20 prefix for interrupt vector offsets 5 13 5 15 priorities 5 5 5 7 process switching guidelines 5 16 processing of interrupts 5 8 5 16 front end actions and state 5 7 recoverable interrupt indication MSR RI 5 14 5 15 5 16 registers 5 8 5 14 critical save restore 0 1 CSRRO 1 5 8 5 10 5 15 5 16 5 17 data address register DAR 5 23 10 2 10 3 DSI status register DSISR 5 1 10 2 10 3 floating point status and control FPSCR 5 1 5 29 FPSCR 5 1 5 29 MSR 5 17 save restore 0 1 SRRO 1 5 8 5 9 5 10 5 14 5 15 5 16 5 17 returning from an interrupt handler 5 15 returning from critical interrupt 5 16 Interrupt little endian mode 5 12 IQ instruction queue 7 9 ISI instruction storage interrupt 1 28 5 4 5 24 J JTAG te
41. data for that address The data at this address in external memory is not valid Most significant bit msb The highest order bit in an address registers data element or instruction encoding Most significant byte MSB The highest order byte in an address registers data element or instruction encoding N NaN An abbreviation for not a number a symbolic entity encoded in floating point format There are two types of NaNs signaling NaNs and quiet NaNs No op No operation A single cycle operation that does not affect registers or generate bus activity Normalization A process by which a floating point value is manipulated such that it can be represented in the format for the appropriate precision single or double precision For a floating point value to be representable in the single or double precision format the leading implied bit must be a 1 O OEA operating environment architecture The level of the architecture that describes PowerPC memory management model supervisor level registers synchronization requirements and the interrupt model It also defines the time base feature from a supervisor level perspective Implementations that conform to the PowerPC OEA also conform to the PowerPC UISA and VEA Optional A feature such as an instruction a register or an interrupt that is defined by the PowerPC architecture but not required to be implemented Out of order An aspect of an operation that allows i
42. enters a sleep mode and instruction fetching and dispatching are disabled 8 1 2 1 PLL Configuration p _cfg 0 6 Input The PLL is configured by pll_cfg 0 6 For a given bus frequency the PLL configuration signals set the internal CPU frequency of operation Table 8 2 shows the PLL configuration options Table 8 2 Core PLL Configuration erie Pay PLL_CFG 6 eege VCO divider xXx 0000 D bypass off n a XX 1111 D off n a 00 0001 0 1x 2 01 0001 0 1x 4 10 0001 0 1x 8 11 0001 0 1x 8 00 0001 1 1 5x 2 01 0001 1 1 5x 4 10 0001 1 1 5x 8 11 0001 1 1 5x 8 00 0010 0 2x 2 01 0010 0 2x 4 10 0010 0 2x 8 11 0010 0 2x 8 00 0010 1 2 5X 2 01 0010 1 2 5X 4 e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Table 8 2 Core PLL Configuration continued Core Interface Operation Bere dE PLL_CFG 6 Beste VCO divider 10 0010 1 25x 8 11 0010 1 2 5x 8 00 0011 0 3x 2 o 0011 0 3x 4 10 0011 0 3x 8 11 0011 0 3x 8 00 0011 1 3 5x 2 01 0011 1 3 5x 4 10 0011 1 DS 8 11 0011 1 3 5x 8 00 0100 0 4x 2 01 0100 0 4x 4 10 0100 0 4x 8 11 0100 0 4x 8 00 0100 1 4 5x 2 01 0100 1 4 5x 4 10 0100 1 4 5x 8 11 0100 1 4 5x 8 00 0101 0 5x 2 o 0101 0 5x 4 10 0101 0 5x 8 11 0101 0 5x 8 00 0101 1 5 5x 2 o 0101 1 5 5x 4 10 010
43. or programming environments as follows e User instruction set architecture UISA The UISA defines the architecture level to which user level software should conform The UISA defines the base user level instruction set user level registers data types memory conventions and the memory and programming models seen by application programmers e Virtual environment architecture VEA The VEA which is the smallest component of the PowerPC architecture defines additional user level functionality that falls outside typical user level software requirements The VEA describes the memory model for an environment in which multiple processors or other devices can access external memory and defines aspects of the cache model and cache control instructions from a user level perspective The resources defined by the VEA are particularly useful for optimizing memory accesses and for managing resources in an environment in which other processors and devices can access external memory e Operating environment architecture OQEA The OEA defines supervisor level resources typically required by an operating system The OEA defines the memory management model supervisor level registers and interrupt model Implementations that conform to the OEA also conform to the UISA and VEA e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xxiii Note that some resources are defined more generally at one level in the archi
44. physical memory On PowerPC processors a page fault interrupt condition occurs when a matching valid page table entry PTE V 1 cannot be located Page table A table in memory is comprised of page table entries or PTEs It is further organized into eight PTEs per PTEG page table entry group The number of PTEGs in the page table depends on the size of the page table as specified in the SDR1 register Page table entry PTE Data structures containing information used to translate effective address to physical address on a 4 Kbyte page basis A PTE consists of 8 bytes of information in a 32 bit processor and 16 bytes of information in a 64 bit processor Park The act of allowing a bus master to maintain bus mastership without having to arbitrate Physical memory The actual memory that can be accessed through the system s memory bus Pipelining A technique that breaks operations such as instruction processing or bus transactions into smaller distinct stages or tenures respectively so that a subsequent operation can begin before the previous one has completed Precise interrupts A category of interrupt for which the pipeline can be stopped so instructions that preceded the faulting instruction can complete and subsequent instructions can be flushed and redispatched after interrupt handling has completed See Imprecise interrupts Primary opcode The most significant 6 bits bits 0 5 of the instruction encoding that ident
45. the memory access is completed by referencing the location in main memory bypassing the on chip cache During the access the addressed location is not loaded into the cache nor is the location allocated in the cache It is considered a programming error if a copy of the target location of an access to caching inhibited memory is resident in the cache Software must ensure that the location has not been previously loaded into the cache or if it has that it has been flushed from the cache The PowerPC architecture permits data accesses from more than one instruction to be combined for caching inhibited operations except when the accesses are separated by a sync instruction or by an eieio instruction when the page or block is also designated as guarded This combined access capability is not implemented on the e300 core The eieio instruction is treated as a no op by the e300 core The caching inhibited I bit in the core also controls whether load and store operations are strongly or weakly ordered If an I O device requires load and store accesses to occur in program order then the I bit for the page must be set Refer to Section 4 4 3 2 Sequential Consistency of Memory Accesses for more information 4 4 1 3 Memory Coherency Attribute M When an access requires coherency the processor performing the access must inform the coherency mechanisms throughout the system that the access requires memory coherency The M attribute determines
46. the snoop is retried and must re arbitrate before the lookup is possible Occasionally cache snoops cannot be serviced and must be retried These retries occur if the cache is busy with a burst read or write when the snoop operation takes place It is possible for a snoop to hit a modified cache block that is already in the process of being written to the copy back buffer for replacement purposes If this happens the core retries the snoop and raises the priority of the cast out operation to allow it to occur on the CSB before the cache block fill 4 10 Applications Information Cache Locking This section describes the entire cache locking and cache way locking features of the e300 core 4 10 1 Cache Locking Terminology Cache locking refers to the ability to prevent some or all of a processor s instruction or data cache from being overwritten Cache locking can be set for either an entire cache or for individual ways within the cache as follows e Entire cache locking When an entire cache is locked data for read hits within the cache are supplied to the requesting unit in the same manner as hits from an unlocked cache Similarly writes e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Instruction and Data Cache Operation that hit in the data cache are written to the cache in the same way as write hits to an unlocked cache However any access that misses in the cache is treated as a c
47. twi 03 TO A SIMM 1 64 bit instruction Table A 26 Processor Control Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 merxr 31 crfS 00 00000 00000 512 0 mfcr 31 D 00000 00000 19 0 mfmsr 1 31 D 00000 00000 83 0 mfspr 2 31 D spr 339 0 mftb 31 D tpr 371 0 mtcrf 31 S 0 CRM 0 144 0 mtmsr 1 31 S 00000 00000 146 0 mtspr 2 31 D spr 467 0 1 Supervisor level instruction 2 Supervisor and user level instruction Table A 27 Cache Management Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 dcbf 31 00000 A B 86 0 debi 31 00000 A B 470 0 dcbst 31 00000 A B 54 0 debt 31 00000 A B 278 0 debtst 31 00000 A B 246 0 dcbz 31 00000 A B 1014 0 icbi 31 00000 A B 982 0 icbt 31 00000 A B 22 0 i Supervisor level instruction 2 e300 core implementation specific instruction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Instruction Set Listings Name mfsr mfsrin mtsr mtsrin Table A 28 Segment Register Manipulation Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 31 D 0 SR 00000 595 0 31 D 00000 B 659 0 31 S 0 SR 00000 210 0 31 S 00000 B 242 0 i Supervisor level instruction Name slbia
48. vector for floating point assist interrupts Performance OOFOO Caused when pm_event_in is asserted monitor Instruction 01000 Caused when the effective address for an instruction fetch cannot be translated by the translation miss ITLB Data load 01100 Caused when the effective address for a data load operation cannot be translated by the translation miss DTLB Data store 01200 Caused when the effective address for a data store operation cannot be translated by the translation miss DTLB or when a DTLB hit occurs and the change bit in the PTE must be set due to a data store operation Instruction 01300 Occurs when the address bits 0 29 in the IABR matches the next instruction to complete address in the completion unit and IABR 30 is set Note that the e300 core also implements breakpoint IABR2 which functions identically to IABR System 01400 Caused when MSR EE 1 and the smi input signal is asserted management interrupt Reserved 01500 02FFF 1 3 5 Memory Management The following sections describe the memory management features of the PowerPC architecture and the e300 core implementation respectively 1 3 5 1 PowerPC Memory Management The primary functions of the MMU are to translate logical effective addresses to physical addresses for memory accesses and to provide access protection on blocks and pages of memory The core generates two types of accesses that require address translation instruction accesses
49. 0 CRM 0 0010010000 0 mtmsr 011111 S 00000 00000 0010010010 0 stdx 011111 S A B 0010010101 0 stwex 011111 S A B 0010010110 1 stwx 011111 S A B 0010010111 0 stdux 011111 S A B 0010110101 0 stwux 011111 S A B 0010110111 0 subfzex 011111 D A 00000 OE 0011001000 Re addzex 011111 D A 00000 OE 0011001010 Re mtsr 011111 S 0 SR 00000 0011010010 0 stdex 011111 S A B 0011010110 1 stbx 011111 S A B 0011010111 0 subfmex 011111 D A 00000 OE 0011101000 Re mulld 011111 D A B OE 0011101001 Re addmex 011111 D A 00000 OE 0011101010 Re mullwx 011111 D A OE 0011101011 Re mtsrin 011111 S 00000 B 0011110010 0 dcbtst 011111 00000 A B 0011110110 0 stbux 011111 S A B 0011110111 0 addx 011111 D A B OE 0100001010 Re debt 011111 00000 A B 0100010110 Inzx 011111 D A B 0100010111 eqvx 011111 S A B 0100011100 Rc tlbie 3 011111 00000 00000 B 0100110010 0 eciwx 011111 D A B 0100110110 0 Ihzux 011111 D A B 0100110111 0 xorx 011111 S A B 0100111100 Rc mfsprf 011111 D spr 0101010011 0 was li 011111 D A B 0101010101 0 Ihax 011111 D A B 0101010111 0 tlbia gt 011111 00000 00000 00000 0101110010 0 e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Instruction Set Listings Table A 2 Complete Instruction List Sorted by Opcode continued
50. 00800 A program interrupt is caused by one of the following exceptions which correspond to bit settings in SRR1 and arise during execution of an instruction Floating point enabled exception A floating point enabled exception condition is generated when the following condition is met MSR FEO MSR FE1 A FRSCR FEX is 1 e FPSCR FEX is set by the execution of a floating point instruction that causes an enabled exception or by the execution of one of the move to FPSCR instructions that results in both an exception condition bit and its corresponding enable bit being set in the FPSCR e Illegal instruction An illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields including PowerPC instructions not implemented in the core or when execution of an optional instruction not provided in the core is attempted these do not include those optional instructions that are treated as no ops e Privileged instruction A privileged instruction type program exception is generated when the execution of a privileged instruction is attempted and the MSR register user privilege bit MSR PR is set In the e300 core this exception is generated for mtspr or mfspr with an invalid SPR field if SPR O 1 and MSR PR 1 This may not be true for all cores that implement the PowerPC architecture e Trap A trap type program e
51. 1 Table 5 1 Interrupt Classifications Synchronous Asynchronous Precise Imprecise Interrupt Type Asynchronous nonmaskable Imprecise Machine check System reset Asynchronous maskable Precise External interrupt Decrementer System management interrupt Critical interrupt Synchronous Precise Instruction caused interrupts Table 5 1 defines interrupt categories that are handled uniquely by the e300 core Note that Table 5 1 includes no synchronous imprecise interrupts While the PowerPC architecture supports imprecise handling of floating point exceptions the e300 core implements them as precise Although the PowerPC architecture specifies that the recognition of the machine check interrupt is nonmaskable on the e300 core the stimuli that cause this interrupt are maskable For example the machine check interrupt is caused by the assertion of tea ape dpe or mcp However mcp ape and dpe can be disabled by bits 0 2 and 3 respectively in HIDO Therefore the machine check caused by asserting fea is the only truly nonmaskable machine check interrupt The e300 core interrupts and conditions that cause them are listed in Table 5 2 Note that the e300c2 core does not support floating point operations Table 5 2 Interrupts and Exception Conditions Vector Offset Interrupt Type hex Exception Conditions Reserved 00000 System reset 00100 A system reset is caused by the assertion of
52. 1 7 3 branch folding 7 1 7 18 branch instruction timing 7 18 7 21 latency 7 28 branch prediction static 7 1 7 19 branch resolution definition 7 1 resource requirements 7 26 overview 1 9 Breakpoints address matching options 10 4 branch trace enabling 5 13 5 32 10 1 10 3 10 4 data address breakpoints DSI data storage interrupt 10 3 10 4 match conditions 10 2 instruction address breakpoints ABR interrupt 10 1 signaling 1 33 single stepping 10 1 10 3 single step trace enabling 5 13 5 32 10 3 10 4 using breakpoints 10 3 Bus interface unit BIU 1 12 4 2 Bus snooping 9 2 Byte ordering 3 1 byte ordering considerations 5 20 interrupt LE mode MSR ILE bit 5 12 LE or BE mode MSR LE bit 5 14 Byte reverse instructions A 20 C Cache inhibited accesses I bit see Memory cache access attributes WIMG bits Caches bus interface buffers 4 26 4 27 address pipelining 4 26 load ahead of store capability 4 26 pipeline collision detection 4 26 reservation address snooping for lwarx stwex 4 26 cache control and CSB operations 4 29 8 7 broadcasting operations 4 17 cache control instructions 4 17 4 21 enabling disabling caches data cache 4 14 instruction cache 4 16 invalidating caches e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Index 1 data cache 4 15 instruction cache 4 16 parameters in HIDn 4 14 4 17 parity error reporting
53. 1 and 2 locked 0b100 Ways 0 1 2 and 3 locked 0b101 Ways 0 1 2 3 and 4 locked 0b110 Ways 0 1 2 3 4 and 5 locked 0b111 Ways 0 1 2 3 4 5 and 6 locked e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 43 Instruction and Data Cache Operation Table 4 16 shows the HID2 I WLCK 0 2 settings for the e300c2 and e300c3 Table 4 21 e300c2 Core IWLCK 0 2 Encodings IWLCK 0 2 Ways Locked 0b000 No ways locked 0b001 Way 0 locked 0b010 Ways 0 and 1 locked 0b011 Ways 0 1 and 2 locked 0b100 Reserved Ways 0 1 and 2 locked 0b101 Reserved Ways 0 1 and 2 locked 0b110 Reserved Ways 0 1 and 2 locked 0b111 Reserved Ways 0 1 and 2 locked Note that on the e300c2 and e300c3 values greater than 0b011 are reserved but default to the maximum number of ways locked Ways 0 1 and 2 The following assembly code locks way 0 of the core instruction cache Lock way 0 of the instruction cache This corresponds to setting iwlck 0 2 to 0b001 bits 16 18 mfspr rl HID2 lis r2 OXxFFFF ori r2 r2 Ox1FFF and ely Elp 2 ori rl rl 0x2000 sync isync mtspr HID2 rl isync 4 10 3 2 7 Invalidating the Instruction Cache Even if Locked There are two methods for invalidating the instruction cache In the first way invalidate the entire cache by setting and then immediately clearing the instruction cache flash invalidate bit HIDO ICFI bit
54. 1 the associated bit of the rotated data is placed into the target register and if the mask bit is 0 the associated bit in the target register is unchanged or ANDed with a mask before being placed into the target register The integer rotate instructions are listed in Table 3 6 Table 3 6 Integer Rotate Instructions Name Mnemonic Operand Syntax Rotate Left Word Immediate then AND with Mask rlwinm rlwinm rA rS SH MB ME Rotate Left Word Immediate then Mask Insert rlwimi riwimi rA rS SH MB ME Rotate Left Word then AND with Mask rlwnm rlwnm rA rS rB MB ME The integer shift instructions perform left and right shifts Immediate form logical unsigned shift operations are obtained by specifying masks and shift values for certain rotate instructions Simplified mnemonics are provided making coding of such shifts simpler and easier to understand Multiple precision shifts can be programmed as shown in Appendix C Multiple Precision Shifts in the Programming Environments Manual The integer shift instructions are listed in Table 3 7 Table 3 7 Integer Shift Instructions Name Mnemonic Operand Syntax Shift Left Word slw sIw rA rS rB Shift Right Algebraic Word sraw sraw rA rS rB Shift Right Algebraic Word Immediate srawi srawi rA rS SH Shift Right Word srw srw rA rS rB 3 2 4 2 Floating Point Instructions This section describes the floating poin
55. 1 1 adde o 31 138 Integer 1 1 subfze o 31 200 Integer 1 1 addze o 31 202 Integer 1 1 subfme o 31 232 Integer 1 1 addmef o 31 234 Integer 1 1 mull o 31 235 Integer 2 3 4 5 2 add o 31 266 Integer amp SRU 1 1 e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Table 7 4 Integer Instructions continued Instruction Timing R Latency 3 Latency Mnemonic SC Ee Unit in Cycles in in Ee e pcode pcode e300c1 e300c2 e3000c3 eqv 31 284 Integer 1 d xor 31 316 Integer 1 1 orc 31 412 Integer 1 1 orl 31 444 Integer 1 d divwu o 31 459 Integer 20 20 nand 31 476 Integer 1 1 divw o 31 491 Integer 20 20 srw 31 536 Integer 1 1 sraw 31 792 Integer 1 1 srawi 31 824 Integer 1 1 extsh 31 922 Integer 1 1 extsb 31 954 Integer 1 1 Note indicates that the cycle time immediately forwards their CR results to the BPU for fast branch resolution 1 The SRU can only execute the add and add o instructions Table 7 5 provides the latencies for the floating point instructions Table 7 5 Floating Point Instructions mnemonic Gima Extended unt VE fdivs 59 018 FPU 18 fsubs 59 020 FPU 1 1 14 fadds 59 021 FPU 1 1 14 fres 59 024 FPU 18 fmuls 59 025 FPU 1 1 14 fmsubs 59 028 FPU 1 1 14 fmadds 59 029 FP
56. 10 11 CMP2 DABR2 breakpoint compare type 00 Match if data s EA equals DABR2 CEA 01 Reserved 10 Match if data s EA less than DABR2 CEA 11 Match if data s EA greater than or equal to DABR2 CEA 12 13 Reserved 14 SIG_TYPE Combinational signal type 0 Data access EA matches DABR CEA OR EA matches DABR2 CEA 1 Data access EA matches DABR CEA AND EA matches DABR2 CEA 15 DNS Do not signal Disable dabr and dabr2 output signals 0 Allow signal to toggle on a match 1 Do not toggle signal on match 16 31 Reserved e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Register Model 2 2 18 Performance Monitor Registers The performance monitor provides a set of PMRs for defining enabling and counting conditions that trigger the performance interrupt It also the performance monitor interrupt vector The supervisor level performance monitor registers are accessed with mtpmr and mfpmr Attempting to read or write supervisor level registers in user mode causes a privilege exception The user level performance monitor registers are read only and are accessed with the mfpmr instruction Attempting to write these user level registers in either supervisor or user mode causes an illegal instruction exception Chapter 11 Performance Monitor for a detailed description of performance monitor registers e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Chap
57. 10111 IBAT7L 2 Supervisor 568 10001 11000 DBAT4U 2 Supervisor 569 10001 11001 DBATAL 2 Supervisor 570 10001 11010 DBATS5U 2 Supervisor 571 10001 11011 DBATSL 2 Supervisor 572 10001 11100 DBAT6U 2 Supervisor 573 10001 11101 DBAT6L Supervisor 574 10001 11110 DBAT7U 2 Supervisor 575 10001 11111 DBATZL 2 Supervisor 976 11110 10000 DMISS Supervisor 977 11110 10001 DCMP Supervisor 978 11110 10010 HASH1 Supervisor 979 11110 10011 HASH2 Supervisor 980 11110 10100 IMISS Supervisor 981 11110 10101 ICMP Supervisor 982 11110 10110 RPA Supervisor 1008 11111 10000 HIDO Supervisor 1009 11111 10001 HID1 Supervisor 1010 11111 10010 IABR Supervisor 1011 11111 10011 HID2 Supervisor 1013 11111 10101 DABR 2 Supervisor 1018 11111 11010 IABR2 2 Supervisor 1 Note that the order of the two 5 bit halves of the SPR number is reversed compared with actual instruction coding For mtspr and mfspr instructions the SPR number coded in assembly language does not appear directly as a 10 bit binary number in the instruction The number coded is split into two 5 bit halves that are reversed in the instruction with the high order 5 bits appearing in bits 16 20 of the instruction and the low order 5 bits in bits 11 15 2 These registers are implementation specific for e300 core only e300 Power Architecture Core Family Reference Manual Rev 3 Instruction Set Model Freescale Semiconductor 31 Instruction Set Model Implementation note Th
58. 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 rlwimix 20 S A SH MB ME Re rlwinmx 21 S A SH MB ME Re rlwnmx 23 S A B MB ME Rc sc 17 00000 00000 00000000000000 1 0 srawx 31 A B 792 Re srawix 31 S A SH 824 Rc T ee ee ee eee srwx 31 S A B 536 Rc stb 38 S A d stbu 39 S A d stbux 31 S A B 247 0 stbx 31 S A B 215 0 stfd 54 S A d stfdu 55 S A d stfdux 31 S A 759 stfdx 31 S A 727 stfiwx 31 S A B 983 stfs 52 S A d stfsu 53 S A d stfsux 31 S A 695 stfsx 31 S A 663 sth 44 S A d sthbrx 31 S A B 918 0 e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Instruction Set Listings Table A 1 Complete Instruction List Sorted by Mnemonic continued Name o 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 sthu 45 S A d sthux 31 S A 439 0 sthx 31 S A 407 0 stmw 4 47 S A d stswi 4 31 S A NB 725 0 stswx 4 31 S A B 661 0 stw 36 S A d stwbrx 31 S A 662 0 stwex 31 S A 150 1 stwu 37 S A d stwux 31 S A B 183 0 stwx 31 S A B 151 0 subfx 31 D A B OE 40 Re subfcx 31 D A B OE 8 Re subfex 31 D A B OE 136 Re subfic 08 D A SIMM subfmex 31 D A 00000 OE 232 Re subfzex 31 D A 00000 OE 200 Re sync 31 00000 00000 00000 598 td 31 TO A B 68 0
59. 2 1 7 9 7 4 2 7 21 7 7 7 25 8 2 8 7 8 1 1 8 2 8 1 2 8 2 Revision History Revise Figure 4 5 PLRU Replacement Algorithm with note that BO 0 leg is always taken on e300c2 Add row for e300c2 cache organization in Table 4 9 Cache Organization Add comment in example code that the number of blocks for e300c2 is 0x200 Added comment that the example uses e300c1 value of 0x400 Separate Table 4 15 e300 Core DWLCK 0 2 Encodings into two tables e300c1 Core DWLCK 0 2 Encodings and e300c2 Core DWLCK 0 2 Encodings Separate Table 4 18 e300 Core IWLCK 0 2 Encodings into two tables e300c1 Core IWLCK 0 2 Encodings and e300c2 Core IWLCK 0 2 Encodings In Table 5 2 Interrupts and Exception Conditions change the decrementer interrupt description to say that it is triggered when DEC 0 changes from 0 to 1 not when DEC 31 changes from 0 to 1 In Figure 5 6 Machine State Register MSR change reset value from All zeros to Q0000_0040 or 0000_0000 or 0001_0041 or 0001_0001 to reflect the values during different reset states In Table 5 8 MSR Bit Settings add statements that bits FP FEO and FE1 are read only in e300c2 Because the e300 core supports the hit under cancel capability replace the last sentence of the section with the following two sentences The instruction fetch cancel extension allows a new instruction fetch to be is
60. 20 Even when a cache is locked toggling the ICFI bit invalidates all of the instruction cache The following assembly code invalidates the entire instruction cache Set and then clear the HIDO ICFI bit bit 20 mfspr rl HIDO mr v2 EL ori rl x1 0x0800 sync isync mtspr HIDO rl mtspr HIDO r2 isync e300 Power Architecture Core Family Reference Manual Rev 3 44 Freescale Semiconductor Instruction and Data Cache Operation In the second method the instruction cache block invalidate icbi instruction can be used to invalidate individual cache blocks The icbi instruction invalidates blocks in an entirely locked instruction cache The icbi instruction also may invalidate way locked blocks within the instruction cache 4 10 3 2 8 Instruction Cache Way Protection The instruction cache way protect extension supplements the existing cache lock capability of the instruction cache For normal operation from one to all ways eight ways for 32Kbyte caches four ways for 16Kbyte caches of the instruction cache may be locked using HID2 IWLCK and HIDO ILOCK The locking mechanism normally allows a portion or all of the instruction cache to be used as a very fast always resident program memory Specifically pages of memory up to a full 4Kbytes per page per way may be locked in the instruction cache and used as fast always resident program memory while allowing the remaining unlocked portion of the instruction cache to cont
61. 23 Lookaside Buffer Management Instructons eee eesceseeeneeceeeeseeesseecsaecsaeenseeeeaeessaeenes A 24 e300 Power Architecture Core Family Reference Manual Rev 3 XX Freescale Semiconductor Table Number A 30 A 31 A 28 A 32 A 33 A 34 A 35 A 36 A 37 A 38 A 39 A 40 A 41 A 42 A 43 A 44 A 45 B 1 B 2 B 3 Tables Page Title Number N Gen EE A 24 EE A 24 Segment Register Manipulation Instructions ceeececesceeeeeececeeeeeceeeeeeseeeeseeeenaeeeeeeeeaas A 24 BC MORI EE A 25 Eed A 25 DS e EE A 26 DET OFM EE TEE A 27 POA PORN WEE E E A 31 KEX F IR Ee A 31 EE A 32 Pas OUI EE EE A 32 RE EE A 32 PPM EE A 33 Ee e EE A 34 IND SE OT E A 34 RRE EE A 34 PowerPC Instruction Set Legend DEE A 35 32 Bit Instructions Not Implemented by the e300 core eee ceescceesseceesteceeseceesaeeeenaeeeaees B 1 64 Bit Instructions Not Implemented by the e300 core eee eeeseceesseceesteeeesteceenaeeeenaeeesaees B 1 64 Bit SPR Encoding Not Implemented by the e300 core ssssssssssssesssssessseessessseesseesseesssesse B 2 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Xxi Tables Table Page Number Title Number e300 Power Architecture Core Family Reference Manual Rev 3 xxii Freescale Semiconductor About This Book The primary objective of this reference manual is to describe the functionality of the microprocessors in the e300 core family which are
62. 4 10 1100 0 12x 8 11 1100 0 12x 8 00 1101 0 16x 2 01 1101 0 16x 4 10 1101 0 16x 8 11 1101 0 16x 8 00 1110 0 20x 2 01 1110 0 20x 4 10 1110 0 20x 8 11 1110 0 20x 8 XX all other modes D off n a 8 2 Overview of Core Interface Accesses The e300 core contains an internal coherent system bus CSB that interfaces the core to the peripheral logic This internal bus is very similar in function to the external 60x bus interface on the MPC603e processor The CSB system logic decodes e300 core initiated transactions and directs all accesses to the appropriate on chip interface The core interface prioritizes requests for bus operations from the instruction and data caches and performs bus operations following the coherent system bus CSB protocol The core interface includes address register queues prioritization logic and the bus control unit The core interface latches snoop addresses for snooping in the data cache and address register queues and reservations controlled by the Load Word and Reserve Indexed Iwarx and Store Word Conditional e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Core Interface Operation Indexed stwex instructions it also maintains the touch load address for the data cache The interface allows one level of pipelining that is with certain restrictions described in subsequent sections there can be as many as two outstanding transactions at any give
63. 4 14 WIMG bits see Memory cache access attributes WIMG bits 4 14 cache locking applications information 4 32 data cache locking guidelines 4 35 4 38 overview 4 15 entire cache locking 4 32 instruction cache locking guidelines 4 40 4 44 overview 4 16 way protection 4 17 MSR bits and cache locking 4 34 register summary 4 33 terminology 4 32 way locking 1 25 4 33 cache management instructions 3 32 A 23 cache miss effects 7 14 coherency 4 5 4 14 9 4 3 state MEI coherency model 4 9 4 10 4 state MESI coherency model 4 9 4 11 in single processor systems 4 12 load and store operations 4 12 CSB coherent system bus 4 26 4 27 instruction cache arbitration 7 10 instruction cache hit timing 7 11 loads and stores caching inhibited loads stores 4 13 operations 1 12 4 21 4 25 organization 4 3 4 4 4 33 overview and features 4 1 4 2 parity checking 4 25 performance cache inhibited pages 7 25 memory considerations 7 24 performed loads and stores definition 4 13 PLRU replacement algorithm 4 4 snooping 4 29 4 32 way locking 1 25 Change C bit 6 10 6 20 checking 6 36 maintenance recording 6 10 6 20 6 22 Checkstop signal 8 9 Checkstop state 5 22 Clock multiplier 1 14 Clock signals pll_cfgn 8 4 Coherent system bus CSB 4 26 4 27 overview 8 1 parity checking 8 9 signals described 8 1 8 4 Compare instructions A 16 Completion queue CQ 7 1 Glossary 3 Completion unit ov
64. 4 5 1 9 Instruction Cache Flash Invalidate HIDO ICF 0 ee eeeeeeeeeeeeeeeeeeee 4 16 4 5 1 10 Instruction Cache Way Protect HID2 ICWP A 4 17 4 5 1 11 Cache Operation Broadcasting HIDO ABE eeeeeeeeeeeeeeeceeeeeeeeeeecsteeeenaeees 4 17 4 5 2 Cache Control Imstructi ons wivsisisavssccgissceecdivcesacsasdozcesascacaataspanndceshedeaabedunraatessneeuennweneas 4 17 4 5 2 1 Data Cache Block Touch debt Instrucpon 4 18 4 5 2 2 Data Cache Block Touch For Store debtst Instruction cc ccceseesceeeeees 4 18 4 5 2 3 Data Cache Block Clear To Zero debz Instruction cccececccccccceceesesseseceeeeeees 4 18 4 5 2 4 Data Cache Block Store debst Instructpon 4 19 4 5 2 5 Data Cache Block Flush debf Instrueton 4 19 4 5 2 6 Data Cache Block Invalidate debi Instruction c cc ccccccccesssececeesseeeceestseeees 4 19 4 5 2 7 Instruction Cache Block Touch icbt Instruction c cc cececeessssecececeeeeeenensees 4 20 4 5 2 8 Instruction Cache Block Invalidate icbi Instruction 2 0 0 0 eeceeceeeesseeeeeesseeeees 4 20 4 6 RETTEN ame 4 21 4 6 1 Data Cache Fill Operations et wte aissgveistcsceateasescaciaseavaatasdecadsassavcsatheosetersevaceuatecnens 4 21 4 6 2 Instruction Cache Fill Operations va ciccssssiecessicsiersesaeateatiaessvevsataatedseaenuacdesd dEr 4 21 4 6 3 Instruction Fetch Cancel Extension esesssessssseesseesseessesssereseeessesseesseessseessseessrese 4 21 4 6 4 Data Cache C
65. 4 byte words The UISA provides for byte half word and word operand loads and stores between memory and a set of 32 GPRs It also provides for word and double word operand loads and stores between memory and a set of 32 FPRs Floating point instructions and registers are not supported on the e300c2 core Arithmetic and logical instructions do not read or modify memory To use the contents of a memory location in a computation and then modify the same or another memory location the memory contents must be loaded into a register modified and then written to the target location using load and store instructions The description of each instruction includes the mnemonic and a formatted list of operands To simplify assembly language programming a set of simplified mnemonics extended mnemonics in the architecture specification and symbols is provided for some of the frequently used instructions see Appendix F Simplified Mnemonics in the Programming Environments Manual for a complete list of simplified mnemonic examples 3 2 1 Classes of Instructions The e300 core instructions belong to one of the following three classes s Defined e Illegal e Reserved Note that although the definitions of these terms are consistent among the processors of this family the assignment of these classifications is not For example an instruction that is specific to 64 bit implementations is considered defined for 64 bit implementations but illegal for
66. 5 13 exceptions floating point unavailable interrupt 5 4 5 29 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Index 3 FP interrupt mode 0 5 13 5 14 FP interrupt mode 1 5 13 IEEE exceptions program interrupt 5 29 execution model 3 3 floating point execution unit FPU 7 3 execution timing 7 23 latency FP instructions 7 31 FP arithmetic instructions A 17 FP compare instructions A 18 FP load instructions A 21 FP move instructions A 22 FP multiply add instructions A 18 FP registers FPRv 1 18 FP rounding conversion instructions A 18 FP status and control reg FPSCR 5 1 FP store instructions A 21 FPRn floating point registers 0 31 1 18 FPSCR instructions A 18 instructions 3 13 3 16 arithmetic 3 14 compare 3 15 load 3 21 move 3 16 multiply add 3 14 rounding and conversion 3 15 status and control 3 15 store 3 22 load store address generation 3 21 FPRn floating point registers 0 31 1 18 2 3 FPSCR floating point status and control reg 1 18 2 3 5 1 FPSCR floating point status and control register bit settings 2 4 FPSCR instructions A 18 Full power mode 9 2 with DPM disabled 9 3 with DPM enabled 9 3 Fully static 9 1 G G2 overview 1 15 Global accesses signaling and snooping 4 29 4 32 GPRn general purpose registers 0 31 1 18 2 3 H Hashing functions primary PTEG 6 28 secondary PTEG 6 29 see also Memory management un
67. 6 5 1 Page Table Search Operation Conceptual Flow The table search process for a processor of this family varies slightly for 64 and 32 bit implementations The main differences are the address ranges and PTE formats specified See the Programming Environments Manual for the PTE format An outline of the page table search process performed by a 32 bit implementation is as follows 1 The 32 bit physical address of the primary PTEG is generated as described in Chapter 7 Memory Management in the Programming Environments Manual for 32 bit implementations e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Memory Management 2 The first PTE PTEO in the primary PTEG is read from memory PTE reads should occur with an implied WIM memory cache mode control bit setting of 0b001 Therefore they are considered cacheable and burst in from memory and placed in the cache Effective Address Generated 1 See Figure 6 6 Otherwise O Instruction Fetch with N Bit Page Address Set in Segment Descriptor Translation No Execute Generate 52 Bit Virtual Address From Segment Descriptor Compare Virtual Address with TLB Entries TLB Hit Case O dcbz Instruction withWorl 1 Kiran Alignment Interrupt Check Page Memory Protection Violation Conditions See the Programming Environments Manual See the Programming Environments Access Permitted Access Prohibited Manual Page Memor
68. 63 D 00000 B 846 Re fetidx 63 D 00000 B 814 Re fetidzx 63 D 00000 B 815 Re fctiwx 63 D 00000 B 14 Rc fctiwzx 63 D 00000 B 15 Rc frspx 63 D 00000 B 12 Rc 1 64 bit instruction Table A 11 Floating Point Compare Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 fempo 63 crfD 00 A B 32 0 fempu 63 crfD 00 A B 0 0 Table A 12 Floating Point Status and Control Register Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 merfs 63 crfD 00 crfS 00 00000 64 0 mffsx 63 D 00000 00000 583 Re mtfsb0x 63 crbD 00000 00000 70 Re mtfsb1x 63 crbD 00000 00000 38 Re mtfsfx 31 0 0 B 711 Re mtfsfix 63 crfD 00 00000 IMM 134 Re e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Instruction Set Listings Table A 13 Integer Load Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Ibz 34 D A d Ibzu 35 D A d Ibzux 31 D A B 119 Ibzx 31 D A 87 0 B Iha 42 D A d Ihau 43 D A d Ihaux 31 D A 375 Ihax 31 D A 343 0 Ihz 40 D A d Ihzu 41 D A d Ihzux 31 D A 311 0 Ihzx 31 D A 279 B 0 lwz 32 D A d lwzu 33 D A d Iwzux 31 D A 55 0 lwzx 31 D A 23 0 1 64
69. A B 438 0 eieio 31 00000 00000 00000 854 0 eqvx 31 S A B 284 Rc extsbx 31 S A 00000 954 Rc extshx 31 S A 00000 922 Rc extswx 31 S A 00000 986 Rc fabsx 63 D 00000 264 Rc faddx 63 D A B 00000 21 Rc faddsx 59 D A B 00000 21 Rc fcfidx 63 D 00000 B 846 Rc e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Name fcmpo fempu fctidx fetidzx fctiwx fctiwzx fdivx fdivsx fmaddx fmaddsx fmrx fmsubx fmsubsx fmulx fmulsx fnabsx fnegx fnmaddx fnmaddsx fnmsubx fnmsubsx fresx frspx frsqrtex fsel fsqrtx fsqrtsx fsubx fsubsx icbi icbt isync Ibz Ibzu Ibzux Ibzx Instruction Set Listings Table A 1 Complete Instruction List Sorted by Mnemonic continued 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 63 crfiD 00 A 32 0 63 crfD 00 A B 0 0 63 D 00000 B 814 Re 63 D 00000 B 815 Re 63 D 00000 B 14 Re 63 D 00000 B 15 Rc 63 D A B 00000 18 Rc 59 D A B 00000 18 Rc 63 D A B C 29 Rc 59 D A B C 29 Rc 63 D 00000 B 72 Rc 63 D A B C 28 Rc 59 D A B C 28 Rc 63 D A 00000 C 25 Rc 59 D A 00000 C 25 Rc 63 D 00000 B 136 Rc 63 D 00000 B 40 Rc 63 D A B C 31 Rc 59 D A B C 31 Rc 63 D A B C 30 Rc 59 D A B C 30 Rc 59 D 00000 B 00000 24 Rc 63 D 00000 B 12 Rc 63 D 00000 B 00000 26 Rc 63 D A B C 23 Rc 63 D 00000 B 00
70. Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Name eieio isync Idarx lwarx stdex stwex sync 1 64 bit instruction Name lfd Ifdu Ifdux Ifdx Ifs lfsu lfsux lfsx Name stfd stfdu stfdux stfdx stfiwx stfs stfsu stfsux stfsx Instruction Set Listings Table A 18 Memory Synchronization Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 31 00000 00000 00000 854 0 19 00000 00000 00000 150 0 31 D A B 84 0 31 D A B 20 0 31 S A B 214 1 31 S A B 150 1 31 00000 00000 00000 598 0 Table A 19 Floating Point Load Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 50 D A d 51 D A d 31 D A 631 0 31 D A 599 0 48 D A d 49 D A d 31 D A B 567 0 31 D A 535 0 Table A 20 Floating Point Store Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 54 S A d 55 S A d 31 S A 759 0 31 S A 727 0 31 S A 983 0 52 S A d 53 S A d 31 S A 695 0 31 S A 663 0 1 Optional in the PowerPC architecture e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 21 Instruction Set Listings Table A 21 Floating Point Move Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 1
71. B C 11111 Re fnmaddx 111 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Instruction Set Listings Table A 2 Complete Instruction List Sorted by Opcode continued Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Tempo 111111 crfD 00 A B 0000100000 0 mtfsbix 111111 crbD 00000 00000 0000100110 Re fnegx 111111 D 00000 B 0000101000 Re merfs 111111 crfD 00 crfS 00 00000 0001000000 0 mtfsb0x 111111 crbD 00000 00000 0001000110 Re fmrx 111111 D 00000 B 0001001000 Re mtfsfix 111111 crfD 00 00000 IMM 0 0010000110 Re fnabsxv 111111 D 00000 B 0010001000 Re fabsx 111111 D 00000 B 0100001000 Re mffsx 111111 D 00000 00000 1001000111 Re mtfsfx 111111 0 FM 0 B 1011000111 Re fctidx 111111 D 00000 B 1100101110 Re fctidzx 111111 D 00000 B 1100101111 Re fcfidx 111111 D 00000 B 1101001110 Re 1 64 bit instruction 2 Supervisor level instruction S Optional in the PowerPC architecture Supervisor and user level instruction 5 Load and store string or multiple instruction 6 e300 core implementation specific instruction e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Instruction Set Listings A 3 Instructions Grouped by Functional Categories Table A 3 through Table A 29 list the PowerPC instructio
72. C 1 Changes From Revision 0 to Revision 1 sg c ic2 eisend aed eo kaiee haeieed eee C 4 Glossary Index e300 Power Architecture Core Family Reference Manual Rev 3 xiv Freescale Semiconductor Figures Figure Page Number Title Number 1 1 300c1 Core Block Diasr any EE 1 2 1 2 SIUC Core Block RE ocs cesses ee nace oe A iced E asds ce panede goa eandees 1 3 1 3 300c3 C r Block PM AS ATM ig ceases tiesto echo stale ess ced ale ws ec adh Dale dap tele ve ae ma ead 1 4 1 4 e300 Programming Model Registers o cs c 0 icc sess eeesdieccavadomdieeeleaeneiaaasie 1 18 1 5 e300c1 Data Cache Creomzafton eege Der ee d e be puadcaensedeasteverseaeenae 1 24 1 6 e300c2 and e300c3 Data Cache Organization eesecesceesscceesteceesecesseceesaeceeaeceeaaeeaas 1 24 1 7 Core nterfa EE 1 32 2 1 e300 Programming Model Regeiaterg AANEREN 2 3 2 2 Floating Point Status and Control Register OPDSCHR AA 2 3 2 3 300 Processor Version IK Eeer 2 6 2 4 M chine State RESISTE geesde GE dee 2 7 2 5 HIDO Register rcar ninian a E EA E E E A E ARS 2 12 2 6 HI E E 2 15 2 7 MUI IRE CIS ee Seege eege 2 16 2 8 DMISS and EE 2 18 2 9 PEMP and O MEE EE ee 2 19 2 10 HASHI and HASH2 RES ister einna ANEREN edel 2 19 2 11 Required Physical Address Register RPA Aen 2 20 2 12 Upper BAT RESister EE 2 21 2 13 L wer AYRE GIS CE geet 2 21 2 14 Critical Interrupt Save Restore Register 0 CSRRO ceeceeeesceceseeeceseeeceeeeceeeeecse
73. Cache Hit A cache miss extends the latency of the fetch stage so in this example the fetch stage represents not only the time the instruction spends in the IQ but also the time required for the instruction to be loaded from system memory beginning in clock cycle 3 During clock cycle 2 the target instruction for the br instruction is not in the instruction cache therefore a memory access must occur During clock cycle 5 the address of the block of instructions is sent to the system bus During clock cycle 9 two instructions 64 bits are returned from memory on the first beat and are forwarded both to the cache and instruction fetcher e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Instruction Timing 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Fetch in IQ In Dispatch Entry IQ0 IQ1 Wm Eecute SE Complete In CQ Es In Retirement Entry CQ0 CQ1 Instruction Queue Completion Queue 3 z 0 2
74. Family Reference Manual Rev 3 30 Freescale Semiconductor Overview e The dispatch pipeline stage is responsible for decoding the instructions supplied by the instruction fetch stage and determining which of the instructions are eligible to be dispatched in the current cycle In addition the source operands of the instructions are read from the appropriate register file and dispatched with the instruction to the execute pipeline stage At the end of the dispatch pipeline stage the dispatched instructions and their operands are latched by the appropriate execution unit e In the execute pipeline stage each execution unit with an instruction executes the selected instruction perhaps over multiple cycles writes the instruction s result into the appropriate rename register and notifies the completion stage when the execution has finished In the case of an internal interrupt the execution unit reports the interrupt to the completion write back pipeline stage and discontinues instruction execution until the interrupt is handled The interrupt is not signaled until that instruction is the next to be completed Execution of most floating point instructions is pipelined within the FPU allowing up to three instructions to execute in the FPU concurrently The FPU pipeline stages are multiply add and round convert The LSU has two pipeline stages the first stage for effective address calculation and MMU translation and the second for accessing
75. Finally note that when the e300 core is in page address translation mode there is no special handling for accesses that fall into BAT regions 5 5 6 2 Load Store Multiple Alignment Exceptions Most alignment interrupts store the address as computed by the instruction in the DAR However when the operand of an lmw stmw Iwarx or stwex instruction is not word aligned that address value 4 is stored into the DAR 5 5 7 Program Interrupt 0x00700 The e300 core implements the program interrupt as it is defined by the PowerPC architecture OEA A program interrupt occurs when no higher priority interrupt exists and one or more of the exception conditions defined in the OEA occur When a program interrupt is taken instruction execution for the handler begins at offset 0x00700 from the physical base address indicated by MSR IP The exception conditions are as follows e Floating point enabled exception These exceptions correspond to IEEE defined exception conditions such as overflows and divide by zeros that may occur during the execution of a floating point arithmetic instruction As a group these exceptions are enabled by the FEO and FE1 bits in the MSR Individual conditions are enabled by specific bits in the FPSCR For general information about this interrupt see the Programming Environments Manual For more information about how these exceptions are implemented in the e300 core see Section 5 5 7 1 IEEE Floating Point Exception
76. HIDO and HID1 registers provide the means for enabling core checkstops and features and allow software to read the configuration of the PLL configuration signals The HID2 register enables the true little endian mode cache way locking and the additional BAT registers Instruction Set and Addressing Modes The following sections describe the PowerPC instruction set and addressing modes in general 1 3 2 1 PowerPC Instruction Set and Addressing Modes All PowerPC instructions are encoded as single word 32 bit opcodes Instruction formats are consistent among all instruction types permitting efficient decoding to occur in parallel with operand accesses This fixed instruction length and consistent format simplifies instruction pipelining The PowerPC instructions are divided into the following categories Integer instructions These include computational and logical instructions Integer arithmetic instructions Integer compare instructions Integer logical instructions Integer rotate and shift instructions Floating point instructions These include floating point computational instructions as well as instructions that affect the FPSCR Floating point arithmetic instructions Floating point multiply add instructions Floating point rounding and conversion instructions Floating point compare instructions Floating point status and control instructions Load store instructions These include inte
77. Instruction List Sorted by Opcode continued Name 0 fsqrtsx 111011 fmsubsx 111011 fmaddsx 111011 fnmsubsx 111011 fnmaddsx 111011 std 111 stdu 111 fempu 111 frspx 111 fctiwx 111 fctiwzx 111 fdivx 111 fsubx 111 faddx 111 icbt 111 fsqrtx 111 fselx 111 fmulx 111 frsqrtex 111 fmsubx 111 fmaddx 111 fnmsubx 111 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 au 110011 A d stfs 110100 d stfsu 110101 S A d stfd 110110 S A d stfdu 110111 S A d d 111010 D A ds 00 ldu 111010 D A ds 01 wall 111010 D A ds 10 fdivsx 111011 D A B 00000 10010 Rc fsubsx 111011 D A B 00000 10100 Rc faddsx 111011 D A B 00000 10101 Re D 00000 B 00000 10110 Re fresx 111011 D 00000 B 00000 11000 Rc fmulsx 111011 D A 00000 C 11001 Ge D A B C 11100 Re D A B C 11101 Re D A B C 11110 Re D A B C 11111 Re 10 S A ds 00 10 S A ds 01 11 cD 00 A B 0000000000 0 11 D 00000 B 0000001100 Rc 11 D 00000 B 0000001110 11 D 00000 B 0000001111 Rc 11 D A B 00000 10010 Rc 11 D A B 00000 10100 Rc 11 D A B 00000 10101 Rc 11 00000 A B 0000010110 0 11 D 00000 B 00000 10110 Ge 11 D A B C 10111 Re 11 D A 00000 C 11001 Rc 11 D 00000 B 00000 11010 Ge 11 D A B C 11100 Be 11 D A B C 11101 Re 11 D A B C 11110 Re 11 D A
78. LK 1 Fetching is stopped and the second branch waits for the first branch to be completed Note a bl instruction does not have to wait for a branch LK 1 to complete e A be based on CR waiting for resolution due to a CR dependency followed by a be based on CR Fetching is stopped and the second branch waits for the first CR dependency to be resolved 7 4 1 2 Static Branch Prediction Static branch prediction allows software for example compilers to give a hint to the machine hardware about the direction the branch is likely to take When a branch instruction encounters a data dependency the BPU waits for the required condition code to become available Rather than stalling instruction dispatch until the source operand is ready the core predicts the likely path and instructions are fetched and executed along that path When the branch operand becomes available the branch is evaluated If the prediction is correct program flow continues along that path uninterrupted otherwise the processor backs up and program flow resumes along the correct path If the target address of the branch link or count register is modified by an instruction that appears before the branch instruction the BPU waits until the target address is available The core executes through one level of prediction The processor may not predict a branch if a prior branch instruction is still unresolved The number of instructions that can be executed after br
79. Level Cache Management Instruction ss ssssssesssesesseeesseesseesseeesee 3 32 Segment Register Manipulation Instructions s sessssesessesessseressetesseessresseessee 3 32 Translation Lookaside Buffer Management Instructions sssssssseseseeessseesses 3 33 Recommended Simplified Mnemontcs 3 33 Implementation Specific Instructions cee ccescecesneeceeeeeceeeeeceeeeeceeeeeceeeeeceeeeeeseeeees 3 34 Chapter 4 Instruction and Data Cache Operation ees e ATOI EE 4 Instruction and Data Cache Features s cissuisaleisisecesasssceasilebonsecaiedeensaaveraceesdezcuensnceatade 4 EE Ee eene eege eege egenen 4 2 Data Cache Organization EE 4 3 Instruction Cache Organization ege eelere tie 4 4 Memory and Cache hee ee Ee 4 5 Memory Cache Access Attributes WIMG Bue 4 6 Write Through Attribute W scocedscvessraas encaasaenaueatekedeusyandedeasvaeesactdunsonaesustdaanteeuaeuets 4 6 Caching Inhibited Attribute OI 4 7 Memory Coherency Gett biute M csccsssrsesienaneysusceahenereatinenstdageon ey addevdaepenad een cteaseee 4 7 Guarded Attribute G EE 4 7 W I and M Bit Combinations ccc cccccccccceceseseseseseeeseseseseseseseseseseseseseseseeens 4 8 CONES CH SUP DOKL peices cede esis sys EE EEE E E cess ccrecsmcaneues 4 9 MEI Coherency Protocol sicis cicccseaseanaavedeeddaseetasscecesusead a R aR 4 9 MEI State Transitions EE 4 9 MESI Coherency Protocols rsisi anei R a i a iaia 4 10 MESI State Transitions c
80. MESI mode The resulting cache line is marked as exclusive if operating in three state MEI mode If operating in four state MESI mode it is marked as exclusive or shared depending on the shared indication on the CSB The debt instruction is treated as a no op if any of the following apply e It hits in the data cache e The target address is mapped to caching inhibited even if the data cache is disabled e Touch load operations are disabled by HIDO NOPTI e The target address is mapped as guarded e The address translation does not hit in the TLB or BAT mechanism e The address translation does not have load access permission 4 5 2 2 Data Cache Block Touch For Store dcbtst Instruction The debtst instruction like the data cache block touch instruction debt allows software to prefetch a cache block in anticipation of a store operation RWITM The debtst instruction is treated similarly to the debt instruction except that the transaction type RWITM is always used if operating in 4 state MESI mode and the resulting cache line is always marked as exclusive not modified When operating in MEI mode the debtst has exactly the same behavior as debt 4 5 2 3 Data Cache Block Clear To Zero dcbz Instruction If the block containing the byte addressed by the EA is in the data cache all bytes are cleared If memory coherency is required WIMG nn In the debz instruction broadcasts on the bus on a cache miss or hit to a shared block I
81. Machine Status Save Restore Register 1 SRR1 oo eee eee ceeseceececeeeeeceeceeceeeeeceteeeceteeeenaeees 5 8 Critical Interrupt Save Restore Register 0 CSRRO ceeeeeeeeceeseeceeeeceeeeecseeeeceeeeeeneeeeesaes 5 10 Critical Interrupt Save Restore Register 1 CSRR1 oo eee eeecceceeeeeceseeeeeeeeecseeeseeeeesteeeesaes 5 11 PRC RECISUET Eeer 5 11 Machine State Register MSR aa a e A E a NES 5 12 MMU Conceptual Block Diagram 32 Bit Implementations eeeeeeseeeeeeeeseeeesersrrrrrsreseee 6 5 300 Core IMMU Block Dia Sra issin eriein enei e a A E R AES 6 6 e300 Core DMMU Block Diagramiesssiis ccisss isattvssnsetdaseseessaavcccacsaneaveasale ENEE EEGENEN 6 7 Address Translation tege eine i EA E ee E E ee ee 6 9 General Flow of Address Translation Real Addressing Mode and Block 6 11 General Flow of Page and Direct Store Interface Address Translation seeeeeeeeereeeeecee 6 13 Segment Register and TLB Organization cccceesscecssccecssececseeceeseeceesaeeeesaeceeaeceenaeeesaees 6 24 Page Address Translation Flow for 32 Bit Implementations TLB Hit eee 6 26 Primary Page Table Search Conceptual low 6 28 Secondary Page Table Search Flow Conceptual low 6 29 DMISS and IMISS Kette oleae dee BAe Bae ee he 6 32 IDC MP RE ME 6 32 HASH1 and HASH RESISters oeren 6 33 Required Physical Address Register RPA cecccessseeeseeceeseeeeeneeeesaeeeeaeceeueeeeaeeeeaaeeeas 6 33 Flow for Example Software Table
82. Manual Rev 3 12 Freescale Semiconductor Overview than after the data tenure completes thus allowing for greater bus utilization in systems that support it This is sometimes referred to as 1 1 2 level pipelining Typically memory accesses are weakly ordered meaning that sequences of operations including load store string and multiple instructions do not necessarily complete in the order they begin This weak ordering maximizes the efficiency of the bus without sacrificing coherency of the data The core allows read operations to precede store operations except when a dependency exists or in cases where a noncacheable access is performed and provides support for a write operation to proceed a previously queued read data tenure for example allowing a snoop push to be enveloped by the address and data tenures of a read operation Because the processor can dynamically optimize run time ordering of load store traffic overall performance is improved 1 1 7 System Support Functions The e300 core implements several support functions that include power management time base decrementer registers for system timing tasks a JTAG based on IEEE 1149 1 interface hardware debug and a phase locked loop PLL clock multiplier These system support functions are described in the following sections 1 1 7 1 Power Management The e300 core provides four power modes selectable by setting the appropriate control bits in the machine sta
83. Move to Performance Monitor Register mtpmr PMRN rS e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Performance Monitor The following instruction has been added to support performance monitor operations mfpmr mfpmr Move From Performance Monitor Register Integer Unit mfpmr rD PMRN 011111 rD PMRNS5 9 PMRNo4 0o 1 010011100 0 5 6 10 11 15 16 20 21 31 GPR rD lt PMREG PMRN PMRN denotes a performance monitor register as listed in Table 11 1 and Table 11 2 The contents of the designated performance monitor register are placed into GPR rD When MSR PR 1 specifying a performance monitor register that is not implemented and is not privileged PMRN 5 0 results in an illegal instruction exception type program interrupt When MSR PR 1 specifying a performance monitor register that is privileged PMRN 5 1 results in a privileged instruction execution type program interrupt When MSR PR 0 specifying an unimplemented performance monitor register is boundedly undefined Other registers altered None e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Performance Monitor mtpmr mtpmr Move To Performance Monitor Register Integer Unit mtpmr PMRN rS 01 1 1 1 1 rS PMRNS 9 PMRNO 4 O 1 1 10 01 1 10 0 0 5 6 10 11 15 16 20 21 31 PMREG PMRN lt GPR rS PMRN denotes a perfor
84. Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 mftb 011111 D tbr 0101110011 Iwaux 011111 D A B 0101110101 baus 011111 D A B 0101110111 sthx 011111 S A B 0110010111 orex 011111 S A B 0110011100 Rc sradix 011111 S A sh 1100111011 sh Rc slbie 3 011111 00000 00000 B 0110110010 0 ecowx 011111 S A B 0110110110 0 sthux 011111 S A B 0110110111 0 orx 011111 S A B 0110111100 Rc divdux 011111 D A B OE 0111001001 Rc divwux 011111 D A B OE 0111001011 Rc mtspr 4 011111 S spr 0111010011 0 debi 011111 00000 A B 0111010110 0 nandx 011111 S A B 0111011100 Rc divdx 011111 D A B OE 0111101001 Re divwx 011111 D A B OE 0111101011 Re slbia 1 331 011111 00000 00000 00000 0111110010 0 merxr 011111 ef 00 00000 00000 1000000000 0 Iswx 011111 D A B 1000010101 0 Iwbrx 011111 D A B 1000010110 0 lfsx 011111 D A B 1000010111 0 srwx 011111 S A B 1000011000 Rc srdax 011111 S A B 1000011011 Rc tlbbsync 3 011111 00000 00000 00000 1000110110 0 Ifsux 011111 D A B 1000110111 0 mfsr 011111 D 0 SR 00000 1001010011 0 ew Pl 011111 D A NB 1001010101 0 sync 011111 00000 00000 00000 1001010110 0 Ifdx 011111 D A B 1001010111 0 Ifdux 011111 D A B 1001110111 0 mfsrin 011111 D 00000 B 1010010011 0 stswx 011111 S A B 1010010101 0 stwbrx 011111 S A B 1010010110 0 stfsx 011111 S A B 1010010111 0 stfsux 011111 S A B 1010110111 0 e300 Power Architecture Core Family Reference Manual
85. Name Mnemonic Operand Syntax Floating Absolute Value fabs fabs frD frB Floating Move Register fmr fmr frD frB Floating Negate fneg fneg frD frB Floating Negative Absolute Value fnabs fnabs frD frB 3 2 4 3 Load and Store Instructions Load and store instructions are issued and translated in program order however the accesses can occur out of order Synchronizing instructions are provided to enforce strict ordering This section describes the load and store instructions of the e300 core which consist of the following e Integer load instructions e Integer store instructions e Integer load and store with byte reverse instructions e Integer load and store multiple instructions e Integer load and store string instructions e Floating point load instructions e Floating point store instructions e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Instruction Set Model 3 2 4 3 1 Self Modifying Code When a processor modifies a memory location that may be contained in the instruction cache software must ensure that memory updates are visible to the instruction fetching mechanism This can be achieved by the following instruction sequence dcbst Jupdate memory sync wait for update icbi remove invalidate copy in instruction cache isync remove copy in own instruction buffer These operations are required because the data cache is a write back cache Since instruct
86. Needed FPR rename registers available Completion queue is not full Instruction is dispatch serialized and completion buffer is empty A dispatch serialized instruction is not currently being executed e Requirements for dispatching from IQ are as follows 7 6 1 3 Instruction in IQO must dispatch Instruction dispatched by QO is not dispatch serialized Needed execution unit is available after dispatch from IQO Needed GPR rename registers are available after dispatch from IQO Needed FPR rename register is available after dispatch from IQO Completion queue is not full after dispatch from IQO Instruction dispatched from IQ1 is not dispatch serialized Completion Unit Resource Requirements The following is a list of resources required to avoid stalls in the completion unit note that the two completion buffers are described as CQO and CQ1 where CQO is the entry at the end of the completion queue e Requirements for completing an instruction from CQO are as follows Instruction in CQO must be finished Instruction in CQO must not follow an unresolved predicted branch Instruction in CQO must not cause an interrupt e Requirements for completing an instruction from CQ are as follows Instruction in CQO must complete in same cycle Instruction in CQ1 must be finished Instruction in CQ1 must not follow an unresolved predicted branch Instruction in CQ1 must not cause an interrupt Instruction in CQ1 must be an integer or load i
87. PLL configuration signals to seven PC5 PC6 New HID2 bits The e300 core has new HID2 bits defined to support instruction fetch bursting IFEB MESI coherency protocol MESI instruction fetch cancels IFEC data cache queue sharing EBQS pipelining extension EBPX additional cache way locking IWLCK and DWLCKk and instruction cache way protection ICWP New PVR register value The processor version register values differ See Table 1 4 for e300 PVR values New IBCR and DBCR bits The e300 core has new IBCR IABRSTAT IABR2STAT and DBCR DABR1 STAT DABR2STAT fields to provide instruction and data address breakpoint status 16 Kbyte four way set associative instruction and data caches Some e300 cores may have different cache sizes than the G2_LE L1 cache parity The e300 core supports parity for both instruction and data caches the G2_LE does not support cache parity MEI or MESI coherency protocols MEI protocol only The e300 supports two coherency protocols MEI and MESI the G2_LE only supports the MEI protocol Instruction cancel extension The e300 instruction cancel mechanism improves utilization of instruction cache by supporting hits under cancels and misses under cancels the G2_LE requires the cancel to complete before new instruction fetches can begin e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconduc
88. PowerPC microprocessors built on Power Architecture technology The e300 designs are based on the MPC603e microprocessor This reference manual describes the e300c1 e300c2 and e300c3 configurations Unless otherwise noted the information presented here applies to all the e300 cores The e300c1 has similar functionality to the G2_LE core and any differences in functionality are summarized in Table 1 3 The e300c2 e300c3 have similar functionality to the e300c1 core and any differences regarding functionality are summarized in Section 1 5 Differences Between e300 Cores This book is intended as a companion to the Programming Environments Manual for 32 Bit Implementations of the PowerPC Architecture referred to as the Programming Environments Manual which describes the features common to PowerPC processors and cores and indicates those features that are optional or that may be implemented differently in the design of each processor and core NOTE About the Companion Programming Environments Manual The PowerPC architecture definition is flexible to support a broad range of processors Note that the Programming Environments Manual describes only architecture features for 32 bit implementations Contact your sales representative or visit the website on the inside cover of this manual for a copy of the Programming Environments Manual This reference manual and the Programming Environments Manual distinguish between the three levels
89. Program Interrupts e Illegal instruction An illegal instruction program interrupt is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields including PowerPC instructions not implemented in the e300 core These do not include those optional instructions treated as no ops e Privileged instruction A privileged instruction type program interrupt is generated when the execution of a privileged instruction is attempted and the MSR register user privilege bit MSR PR is set In the e300 core this interrupt is generated for mtspr or mfspr with an invalid SPR field if SPR O 1 and MSR PR 1 This may not be true for all processors that implement the PowerPC architecture e Trap A trap type program interrupt is generated when any of the conditions specified in a trap instruction is met e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Interrupts and Exceptions 5 5 7 1 IEEE Floating Point Exception Program Interrupts Floating point exceptions are signaled by condition bits set in the floating point status and control register FPSCR They can cause the system floating point enabled interrupt handler to be invoked The e300 core handles all floating point exceptions precisely The e300 core implements the FPSCR as it is defined by the PowerPC architecture for more information about the FPSCR see the Program
90. Reference Manual Rev 3 Freescale Semiconductor Glossary 5 Interrupt A condition encountered by the processor that requires special supervisor level processing Interrupt handler A software routine that executes when an interrupt is taken Normally the interrupt handler corrects the condition that caused the interrupt or performs some other meaningful task that may include aborting the program that caused the interrupt The address for each interrupt handler is identified by an interrupt vector offset defined by the architecture and a prefix selected via the MSR Key bits A set of key bits referred to as Ks and Kp in each segment register and each BAT register The key bits determine whether supervisor or user programs can access a page within that segment or block Kill An operation that causes a cache block to be invalidated without writing any modified data to memory Latency The number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction L2 cache See Secondary cache Least significant bit Isb The bit of least value in an address register field data element or instruction encoding Least significant byte LSB The byte of least value in an address register data element or instruction encoding Little endian A byte ordering method in memory where the address n of a word corresponds to the least significant byte In an addressed memo
91. Rev 3 Freescale Semiconductor 11 Instruction Set Listings Table A 2 Complete Instruction List Sorted by Opcode continued Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 stswi 011111 S A NB 1011010101 0 stfdx 011111 B 1011010111 0 stfdux 011111 S A B 1011110111 Ihbrx 011111 D A B 1100010110 srawx 011111 S A B 1100011000 Rc sradx 011111 S A B 1100011010 Rc srawix 011111 S A SH 1100111000 Rc eieio 011111 00000 00000 00000 1101010110 0 sthbrx 011111 S A B 1110010110 0 extshx 011111 S A 00000 1110011010 Rc extsbx 011111 S A 00000 1110111010 Rc tbid 011111 00000 00000 B 1111010010 0 icbi 011111 00000 A B 1111010110 0 stfiwx 011111 S A B 1111010111 0 extsw 011111 S A 00000 1111011010 Rc up Zi 011111 00000 00000 B 1111110010 0 debz 011111 00000 A B 1111110110 0 wz 100000 D A d Iwzu 100001 D A d Ibz 100010 D A d bau 100011 D A d stw 100100 S A d stwu 100101 S A d stb 100110 S A d stbu 100111 S A d Ihz 101000 D A d bau 101001 D A d bal 101010 D A d Ihau 101011 D A d sth 101100 S A d sthu 101101 S A d Imw 101110 D A d stmw Pl 101111 S A d Wel 110000 D A d Weu 110001 D A d fd 110010 D A d e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Instruction Set Listings Table A 2 Complete
92. S A B 24 Rc 31 S A B 794 Rc 31 S A sh 413 sh Re 31 S A B 792 Rc 31 S A SH 824 Rc 31 S A 539 Rc 31 S A B 536 Rc 1 64 bit instruction Name faddx faddsx fdivx fdivsx fmulx fmulsx fresx frsqrtex fsubx fsubsx fselx fsqrtx fsqrtsx Table A 8 Floating Point Arithmetic Instructions 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 63 D A B 00000 21 Re 59 D A B 00000 21 Re 63 D A B 00000 18 Rc 59 D A B 00000 18 Rc 63 D A 00000 C 25 Rc 59 D A 00000 C 25 Rc 59 D 00000 B 00000 24 Rc 63 D 00000 B 00000 26 Rc 63 D A B 00000 20 Rc 59 D A B 00000 20 Rc 63 D A B C 23 Rc 63 D 00000 B 00000 22 Re 59 D 00000 B 00000 22 Re i Optional in the PowerPC architecture e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Instruction Set Listings Table A 9 Floating Point Multiply Add Instructions Name 0 56 7 8 9 1011 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 fmaddx 63 D A B C 29 Rc fmaddsx 59 B C 29 Rc fmsubx 63 D A B C 28 Rc fmsubsx 59 D A B C 28 Rc fnmaddx 63 D A B C 31 Rc fnmaddsx 59 D A B C 31 Rc fnmsubx 63 D A B C 30 Re fnmsubsx 59 D A B C 30 Re Table A 10 Floating Point Rounding and Conversion Instructions Name 0 56 7 8 9 1011 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 fefidx
93. Search Operation eescceseecesneeceseeeceeeeecseeeeeneeeenaes 6 35 Check and Set Rand C Bit EE 6 36 Page Bault Setup Eltere artic aaa das lage as Gaia EE 6 37 Setup for Protection Violation Exceptions eceeeecesecsseceseeeseeceaeceeeeseeeeaeecsaeenaeeeaeessaeee 6 38 Pipelined Execution Unit derer Eed eu 7 3 Instruction Flow Diagram for the e 200 7 4 Instruction Flow Diagram for the e300c2 agteereseegeigetei ete oder desssveviandaten ENEE thes seneccesteess 7 5 Instruction Flow Diagram for the e300c3 ss cchist sc ctadaseceenas saaceven sap ceanaeaeescieaseatesieeauweseaaneae 7 7 e300 Core Processor Pipeline Stages yc sc cecet teninin a ea ani e tenes a dete 7 8 Instruction RO AG aCe TAN ie eae wha Noe laa he eau onesie iaai ate 7 12 Instruction Timing Cache Miss sissuccssajedetssosiaisvensdcntasecesaanas sien ciuegedanasagbadevenndendaseaeeaeadegeces 7 15 Branch Instruction TIMIN Eeer ege EE 7 21 Instruction Timing lInteger Execution in the e300C1 Core ee eeeeceseceesseeeeeteceesteeeeneeeenes 7 22 Instruction Timing Integer Execution in the e300c2 and 300 3 oe eee eeseeeeeeeeeeeenees 7 23 Core Interface STOTAIS EE 8 2 Performance Monitor Global Control Register 0 PMGC0 User Performance Monitor Global Control Register 0 UPMGCO0 e ce eeeeeeeeeeeeeteees 11 3 Local Control A Registers PMLCa0 PMLCa3 User Local Control A Registers OUDPMI Ca UPMIL Cat 11 4 Performance Monitor Counter Re
94. The data cache block invalidate debi instruction can be used to invalidate individual cache blocks 4 10 3 2 Instruction Cache Locking Procedures This section describes the procedures for performing instruction cache locking on the e300 core e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 39 Instruction and Data Cache Operation 4 10 3 2 1 To lock the instruction cache the instruction cache enable bit HIDO ICE bit 16 must be set Enabling the Instruction Cache Enable the data cache This corresponds to setting DCE bit in HIDO bit 17 mfspr rl HIDO ori rl rl 0x8000 sync mtspr HIDO 21 isync 4 10 3 2 2 Address Translation for Instruction Cache Locking Two distinct memory areas must be set up to enable cache locking e The first area is where the code that performs the locking resides and is executed e The second area is where the instructions to be locked reside Both areas of memory must be in locations that are translated by the memory management unit MMU This translation can be performed either with the page table or the block address translation BAT registers For the purposes of the cache locking example in this document two areas of memory are defined using the BAT registers The first area is a 1 Mbyte area in the upper region of memory that contains the code performing the cache locking This area of memory must be caching inhibited for instruction ca
95. These include instructions not defined by the PowerPC architecture In addition for 32 bit implementations instructions that are defined only for 64 bit implementations are considered to be illegal instructions For 64 bit implementations instructions that are defined only for 32 bit implementations are considered to be illegal instructions Implementation A particular processor that conforms to the PowerPC architecture but may differ from other architecture compliant implementations for example in design feature set and implementation of optional features The PowerPC architecture has many different implementations Imprecise interrupt A type of synchronous interrupt that is allowed not to adhere to the precise interrupt model see Precise interrupt The PowerPC architecture allows only floating point exceptions to be handled imprecisely Instruction queue A holding place for instructions fetched from the current instruction stream Integer unit The functional unit in the e300 responsible for executing all integer instructions In order An aspect of an operation that adheres to a sequential model An operation is said to be performed in order if at the time that it is performed it is known to be required by the sequential execution model See Out of order Instruction latency The total number of clock cycles necessary to execute an instruction and make ready the results of that instruction e300 Power Architecture Core Family
96. This cache block holds valid data that is identical to the data at this address in system memory No other cache has this data e Shared Only available if HID2 MESISTATE register bit is set The address block is valid in the cache and in at least one other cache This block is always consistent with system memory That is the shared state is shared unmodified there is no shared modified state e Invalid This cache block does not hold valid data Cache coherency is enforced by on chip bus snooping logic Because the e300 core data cache tags are single ported a simultaneous load store and snoop access represents a resource contention The snoop access is given first access to the tags The load or store then occurs on the clock following the snoop Parity is now integrated into both instruction and data cache memory A machine check interrupt is now taken upon the detection of an instruction or data cache parity error Parity is checked whenever valid data is returned from the instruction or data cache for a cache hit or whenever valid data is read out of the cache for a castout or snoop push operation 1 3 3 3 Instruction and Data Cache Way Locking The e300 core implements instruction and data cache way locking which guarantees that certain memory accesses will hit in the cache This provides deterministic access times for those accesses See Chapter 4 Instruction and Data Cache Operation for more information 1 3 4 Interrupt Mo
97. a programming error Table 5 17 Alignment Interrupt Register Settings Register Setting SRRO Set to the effective address of the instruction that caused the interrupt SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FEI 0 RI 0 TGPR 0 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Interrupts and Exceptions Table 5 17 Alignment Interrupt Register Settings continued Register Setting DSISR 0 11 Cleared 12 13Cleared Note that these bits can be set by several 64 bit PowerPC instructions that are not supported in the e300 core 14 Cleared 15 16For instructions that use register indirect with index addressing set to bits 29 30 of the instruction For instructions that use register indirect with immediate index addressing cleared 17 For instructions that use register indirect with index addressing set to bit 25 of the instruction For instructions that use register indirect with immediate index addressing set to bit 5 of the instruction 18 21 For instructions that use register indirect with index addressing set to bits 21 24 of the instruction For instructions that use register indirect with immediate index addressing set to bits 1 4 of the instruction 22 26Set to bits 6 10 identifying either the source or destination of the instruct
98. a separate interrupt and interrupt vector for power management the system management interrupt smi The e300 core also contains a decrementer timer that allows it to enter the nap or doze mode for a predetermined period and then return to full power operation through the decrementer interrupt e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Power Management The core provides four power modes selectable by setting the appropriate control bits in the MSR and HIDO The four power modes are described briefly as follows Full power This is the default power state of the core The core is fully powered and the internal functional units are operating at the full processor clock speed If the dynamic power management mode is enabled functional units that are idle will automatically enter a low power state without affecting performance software execution or external hardware Doze All the functional units of the core are disabled except for the time base decrementer registers and the bus snooping logic When the processor is in doze mode an external asynchronous interrupt system management interrupt decrementer interrupt hard or soft reset or machine check input mcp brings the core into the full power state The core in doze mode maintains the phase locked loop PLL in a full power state and locked to the system external clock input sysclk so a transition to the full power state takes only a
99. address IABR CEA OR IABR2 CEA Table 10 4 describes the instruction address breakpoint register settings when an address can match one or the other possible addresses an OR condition This requires both IABR registers to be programmed Table 10 4 Two Address OR Matching Register Field Name Condition Register Field Name Condition IABR CEA IABR2 CEA IABR BE 1 IABR2 BE 1 IBCR CNT 0 IBCR SIG_TYPE OR IBCR CMP1 IBCR CMP2 With two address OR matching a match occurs when the instruction s effective address IABR CEA OR the instruction s effective address IABR2 CEA 3 IABR CEA lt instruction s effective address lt IABR2 CEA Table 10 5 describes the instruction address breakpoint register settings for an address matching inside an address range condition e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Debug Features Table 10 5 Address Matching for Inside Address Range Register Field Name Condition Register Field Name Condition IABR CEA IABR2 CEA IABR BE 1 IABR2 BE 1 IBCR CNT 0 IBCR SIG_TYPE AND IBCR CMP1 gt IBCR CMP2 lt With address matching for an inside address range a match occurs when IABR CEA lt instruction s effective address lt IABR2 CEA 4 Instruction s effective address lt IABR CEA OR instruction s
100. and Exceptions Note that the OEA specifies an additional case that may cause a DSI interrupt when an effective address for a load store or cache operation cannot be translated by the TLBs On the e300 core this condition causes a TLB miss interrupt instead These scenarios are common among all processors that implement the PowerPC architecture Finally the e300 core causes a DSI interrupt when either DABR or DABR2 is enabled and the address of an access matches with the value in the CEA field and the breakpoint is enabled for the type of access read or write in DABR DABR2 See Chapter 10 Debug Features for more information DSI interrupts can be generated by load store instructions and cache control instructions debi dcebz dcbst and debf The e300 core supports the crossing of page boundaries However if the second page has a translation error or protection violation associated with it the e300 core takes the DSI interrupt in the middle of the instruction In this case the instruction is re executed If an stwex instruction has an effective address for which a normal store operation would cause a DSI interrupt the e300 core takes the DSI interrupt without checking for the reservation If the XER indicates that the byte count for an Iswi or stswi instruction is zero a DSI interrupt does not occur regardless of the effective address The condition that caused the interrupt is defined in the DSISR These conditions also
101. and data accesses to memory generated by load and store instructions The PowerPC MMU and interrupt model support demand paged virtual memory Virtual memory management permits execution of programs larger than the size of physical memory demand paged implies that individual pages are loaded into physical memory from system memory only when they are first accessed by an executing program The hashed page table is a variable sized data structure that defines the mapping between virtual page numbers and physical page numbers The page table size is a power of two and its starting address is a multiple of its size The page table contains a number of page table entry groups PTEGs A PTEG contains eight page table entries PTEs of 8 bytes each therefore each PTEG is 64 bytes long PTEG addresses are entry points for table search operations e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Overview Address translations are enabled by setting bits in the MSR MSR IR enables instruction address translations and MSR DR enables data address translations 1 3 5 2 Implementation Specific Memory Management The instruction and data memory management units in the e300 core provide 4 Gbytes of logical address space accessible to supervisor and user programs with a 4 Kbyte page size and 256 Mbyte segment size Block sizes range from 128 Kbytes to 256 Mbytes and are software selectable In addition th
102. and data block address translation IBAT and DBAT arrays each containing eight pairs of BATs an increase from four pairs of each type of BATs in the G2 core This increase provides more flexibility in protecting accesses and providing translation on a segment block or page basis for memory accesses and I O accesses Effective addresses are compared simultaneously with all eight entries in the BAT array during block translation In accordance with the PowerPC architecture if an effective address hits in both the TLB and BAT array the BAT translation takes priority As part of the coherent system bus CSB the e300 core has a 64 bit data bus and a 32 bit address bus During normal operation the e300 core provides a three state modified exclusive and invalid coherency protocol which is a compatible subset of a four state modified exclusive shared invalid MESI protocol However the e300 data cache contains a programmable MESI extension that supports the shared cache coherency state similar to other PowerPC processors Both protocols operate coherently in systems that contain four state caches The core also supports single beat and burst data transfers for memory accesses and supports memory mapped I O operations The true little endian mode is another enhanced capability of the e300 core Unlike the PowerPC little endian mode which manipulates only the address bits no longer supported on the e300 the true little endian mode actually ope
103. and fres execute with latencies of 18 to 33 cycles The fdivs fdiv fres mtfsb0 mtfsb1 mtfsfi mffs and mtfsf instructions block the floating point unit pipeline until they complete execution and thereby inhibit the dispatch of additional floating point instructions Except for the merfs instruction all floating point instructions immediately e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Instruction Timing forward their CR results to the BPU for fast branch resolution without waiting for the instruction to be retired by the completion unit and the CR updated See Table 7 5 for floating point instruction execution timing 7 4 4 Load Store Unit Execution Timing The LSU executes all floating point and integer loads and stores It also executes other instructions that address memory The execution of most load and store instructions is pipelined The LSU has two pipeline stages the first is for effective address calculation and MMU translation and the second is for accessing the physically addressed memory Load and store instructions have a two cycle latency and one cycle throughput Floating point loads or stores are not supported on the e300c2 core If operands are misaligned additional latency may be required either for an alignment interrupt to be taken or for additional bus accesses Load instructions that miss in the cache prevent subsequent cache accesses during the cache line refill
104. and hashed page tables in the generation of 32 bit physical addresses These processors also have a BAT mechanism for mapping large blocks of memory Block sizes range from 128 Kbytes to 256 Mbytes and are software programmable Table 6 1 summarizes all e300 core MMU features including the architectural features of PowerPC MMUs defined by the OEA for 32 bit processors and the implementation specific features provided by the core Table 6 1 MMU Features Summary Architecturally Defined e300 Core Specific Feature Feature Category 932 Address ranges Architecturally defined bytes of effective address 252 bytes of virtual address 232 bytes of physical address Page size Architecturally defined 4 Kbytes Segment size Architecturally defined 256 Mbytes Block address translation Architecturally defined Range of 128 Kbytes 256 Mbytes sizes Implemented with IBAT and DBAT registers in BAT array Memory protection Architecturally defined Segments selectable as no execute Pages selectable as user supervisor and read only Blocks selectable as user supervisor and read only Page history Architecturally defined Reference and change bits defined and maintained Page address translation Architecturally defined Translations stored as PTEs in hashed page tables in memory Page table size determined by mask in SDR1 register TLBs Architecturally defined Instructions fo
105. be restarted without resetting the processor The contents of all latches are frozen within two cycles upon e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Interrupts and Exceptions entering the checkstop state so that the state of the processor can be analyzed as an aid in problem determination Note that not all processors that implement the PowerPC architecture provide the same level of error checking The reasons a processor can enter checkstop state are implementation dependent 5 5 3 DSI Interrupt 0x00300 A DSI interrupt occurs when no higher priority interrupt exists and a data memory access cannot be performed The condition that caused the DSI interrupt can be determined by reading the DSISR register a supervisor level SPR SPR18 that can be read by using the mfspr instruction Bit settings are provided in Table 5 15 Table 5 15 also indicates the memory element that is saved to the DAR Table 5 15 DSI Interrupt Register Settings Register Setting Description SRRO Set to the effective address of the instruction that caused the interrupt SRR1 0 15 Cleared 16 31 Loaded with MSR 16 31 MSR POW 0 FP 0 FE1 0 RI 0 TGPR 0 ME CE LE Set to value of ILE LE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 DSISR 0 Cleared 1 Set by the data TLB miss interrupt the translation of an attempted access is not found in the primary hash table entry group HTEG or i
106. because valid cache data always agrees with memory Stores to memory that are in write through mode may cause a decrease in performance Each time a store is performed to memory in write through mode the bus is potentially busy for the extra clock cycles required to update memory therefore load operations that miss the on chip cache must wait while the external store operation completes 7 5 3 Cache Inhibited Accesses Data for a page marked cache inhibited cannot be stored in the on chip cache Areas of the memory map can be cache inhibited by the operating system If a cache inhibited access hits in the on chip cache the corresponding cache line is invalidated If the line is marked modified it is copied back to memory before being invalidated In summary the copy back mode allows both load and store operations to use the on chip cache The write through mode allows load operations to use the on chip cache but store operations cause a memory access and a cache update if the data is already in the cache Lastly the cache inhibited mode causes memory access for both loads and stores e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Instruction Timing 7 6 instruction Scheduling Guidelines The performance of the core can be improved by avoiding resource conflicts and promoting parallel utilization of execution units through efficient instruction scheduling Instruction scheduling on the co
107. bit encodings shown in Table 4 5 Table 4 5 e300c1 PLRU Replacement Way Selection BO If the PLRU bits are eet beren tor 0 0 o wo 0 0 Be 1 w1 0 B1 1 0 w2 0 1 ES 1 w3 1 0 o w4 5 B5 7 Ge 1 B2 1 o w6 1 1 BS 1 w7 For e300c2 and e300c3 the BO bit is always 0 so there are effectively only three PLRU bits B1 B3 and B4 for each set in the cache A way is selected for replacement according to the PLRU bit encodings shown in Table 4 6 Table 4 6 e300c2 PLRU Replacement Way Selection If the PLRU bits are Then the way selected for replacement is B1 0 0 w0 0 BS 1 wi 1 0 w2 1 Si 1 w3 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Instruction and Data Cache Operation The PLRU algorithm is shown graphically in Figure 4 7 replace 1st hitting way w0 or w1 or w2 replace 1st valid way wO or w1 or w2 PLRU bits BO B6 after way lock override a BO 1 B1 0 Bi 1 B2 0 B2 1 B3 1 B4 1 B5 1 B6 1 boob Figure 4 7 PLRU Replacement Algorithm Note BO 0 always taken on e300c2 During power up or hard reset all the valid bits of the ways are cleared and the PLRU bits are cleared to point to way 0 of each set This is also the state of the data or instruction cache after setting their respective flash inv
108. bit instruction Table A 14 Integer Store Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 stb 38 S A d stbu 39 S A d stbux 31 S A B 247 0 stbx 31 S A B 215 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Instruction Set Listings Table A 14 Integer Store Instructions continued sth 44 S A d sthu 45 S A d sthux 31 S A 439 0 sthx 31 S A 407 0 stw 36 S A d stwu 37 S A d stwux 31 S A 183 0 stwx 31 S A 151 0 1 64 bit instruction Table A 15 Integer Load and Store with Byte Reverse Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Ihbrx 31 D A B 790 0 Iwbrx 31 D A B 534 0 sthbrx 31 S A B 918 0 stwbrx 31 S A B 662 0 Table A 16 Integer Load and Store Multiple Instructions Name o 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Imw 1 46 D A d stmw 47 S A d 1 Load and store string or multiple instruction Table A 17 Integer Load and Store String Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Iswi 1 31 D A NB 597 0 Iswx 31 D A B 533 0 stswi 31 S A NB 725 0 stswx 31 S A B 661 0 1 Load and store string or multiple instruction e300 Power
109. cache parity error tea dpe ape copied from MSR 30 for mcp If mcp and tea are asserted simultaneously then SRR1 30 is cleared and the interrupt is not recoverable 31 MSR 31 Copy of MSR 31 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Interrupts and Exceptions The e300 core loads SRR1 with specific bits for handling the three TLB miss interrupts as shown in Table 5 6 Table 5 6 SRR1 Bit Settings for Software Table Search Operations Bits Name Description 0 3 CRFO Copy of condition register field 0 CRO 4 Reserved 5 9 MSR 5 9 Copy of MSR bits 5 9 10 11 Reserved 12 KEY TLB miss protection key 13 I D Instruction data TLB miss 0 DTLB miss 1 ITLB miss 14 WAY Bit 14 indicates which TLB associativity set should be replaced O Seto 1 Seti 15 S L Store load protection instruction 0 Load miss 1 Store miss 16 31 MSR 16 31 Copy of MSR bits 16 31 Note that in some implementations every instruction fetch when MSR IR 1 and every instruction execution requiring address translation when MSR DR 1 may modify SRR1 5 2 1 2 CSRRO and CSRR1 Bit Settings The e300 core also implements the CSRRO and CSRR1 to save state for critical interrupt interrupts only Note that the values saved in CSRRO are the same as those saved in SRRO for all other interrupts and the values saved in CSRR1 are the same as those saved
110. combinational matching and watchpoint are used in the core for production testing Transfer attribute signals These signals provide information about the type of transfer such as the transfer size and whether the transaction is bursted write through or cache inhibited NOTE A bar over a signal name indicates that the signal is active low for example artry_in address retry and ts_in transfer start Active low signals are referred to as asserted active when they are low and negated when they are high Signals that are not active low such as ap_in 0 3 address bus parity signals and tt_in 0 4 transfer type signals are referred to as asserted when they are high and negated when they are low Debug Features Some new debug features are specific to the e300 core Accesses to the debug facilities are available only in supervisor mode by using the mtspr and mfspr instructions The e300 provides the following additional feature in the JTAG debug interface Inclusion of breakpoint status and control pins stopped and ext_halt 1 3 8 1 Breakpoint Signaling The breakpoint signaling provided on the e300 core allows observability of breakpoint matches external to the core The iabr iabr2 dabr and dabr2 breakpoint signals are asserted for at least one bus clock cycle when the respective breakpoint occurs The status of the run state of the e300 core is indicated by the stopped pin An asynchronous external breakpoint can be a
111. contamination and pipeline effects by allowing a higher event count than is possible with a single counter Chaining two counters together effectively adds 32 bits to a counter register where the first counter s overflow event acts like a carry out feeding the second counter By defining the event of interest to be another PMC s overflow generation the chained counter increments each time the first counter rolls over to zero Multiple counters may be chained together Because the entire chained value cannot be read in a single instruction an overflow may occur between counter reads producing an inaccurate value A sequence like the following is necessary to read the complete chained value when it spans multiple counters and the counters are not frozen The example shown is for a two counter case loop mfpmr Rx pmctrl load from upper counter mfpmr Ry pmctro0 load from lower counter mfpmr Rz pmctrl load from upper counter cmp cr0 0 Rz Rx sS if old new be 4 2 loop loop if carry occurred between reads The comparison and loop are necessary to ensure that a consistent set of values has been obtained The above sequence is not necessary if the counters are frozen 11 5 2 Event Selection Event selection is specified through the PMLCan registers described in Section 11 2 3 Local Control A Registers PMLCa0 PMLCa3 The event select fields in PMLCan EVENT are described in Table 11 9 which lists
112. controls all performance monitor counters PMGCO PMR400 Access PMGCO Supervisor only UPMGCO PMR384 UPMGCO Supervisor user read only 0 1 2 3 18 19 20 21 22 23 24 31 R FA PMIE FCECE Wi C R FA PMIE FCECE TBSEL TBEE Wi C Reset All zeros Figure 11 1 Performance Monitor Global Control Register 0 PMGCO User Performance Monitor Global Control Register 0 UPMGCO PMGCO is cleared by a hard reset Reading this register does not change its contents Table 11 3 describes PMGCO fields Table 11 3 PMGCO Field Descriptions Bits Name Description 0 FAC Freeze all counters When FAC is set by hardware or software PMLCx FC maintains its current value until it is changed by software 0 The PMCs are incremented if permitted by other PM control bits 1 The PMCs are not incremented 1 DMIE Performance monitor interrupt enable 0 Performance monitor interrupts are disabled 1 Performance monitor interrupts are enabled and occur when an enabled condition or event occurs at which time PMGCO PMIE is cleared Software can clear PMIE to prevent performance monitor interrupts Performance monitor interrupts are caused by time base events or PMCx overflow 2 FCECE Freeze counters on enabled condition or event 0 The PMCs can be incremented if permitted by other PM control bits 1 The PMCs can be incremented if permitted by other PM control bits only unti
113. could create a situation where personal injury or death may occur Should Buyer purchase or use Freescale Semiconductor products for any such unintended or unauthorized application Buyer shall indemnify and hold Freescale Semiconductor and its officers employees subsidiaries affiliates and distributors harmless against all claims costs damages and expenses and reasonable attorney fees arising out of directly or indirectly any claim of personal injury or death associated with such unintended or unauthorized use even if such claim alleges that Freescale Semiconductor was negligent regarding the design or manufacture of the part Freescale and the Freescale logo are trademarks of Freescale Semiconductor Inc The Power Architecture and Power org word marks and the Power and Power org logos and related marks are trademarks and service marks licensed by Power org The PowerPC name is a trademark of IBM Corp and is used under license All other product or service names are the property of their respective owners Freescale Semiconductor Inc 2006 All rights reserved 2 freescale semiconductor e Te fe Ee Contents Paragraph Number Title About This Book e EE Ree E e Re General Informati n E Related Uoeumemtatog ageet euer ANE ee EENEG Conventions tg SE eebe Ee eet Acronyms and Abbreviations ccccssesseeseccesseesoneeccesseecessenconteccensees Chapter 1 Overview 1 1 Eege iscsi esta tected acn
114. data in the cache e The complete write back pipeline stage maintains the correct architectural machine state and transfers the contents of the rename registers to the GPRs and FPRs as instructions are retired If the completion logic detects an instruction causing an interrupt all subsequent instructions are canceled their execution results in rename registers are discarded and instructions are fetched from the correct instruction stream A superscalar processor core issues multiple independent instructions into multiple pipelines allowing instructions to execute in parallel The e300c1 core has independent execution units for integer instructions floating point instructions branch instructions load store instructions and system register instructions The e300c2 does not include a floating point unit The e300c2 and e300c3 provide two IUs which improves the throughput of integer instructions The e300c2 and e300c3 provide two integer units for greater integer instruction throughput along with enhanced multipliers in each IU that reduce the multiply instruction latency to a maximum of two cycles The IU and the FPU each have dedicated register files for maintaining operands GPRs and FPRs respectively allowing integer and floating point calculations to occur simultaneously without interference The e300c2 does not include floating point registers The core provides support for single cycle store and it provides an adder comparator in the s
115. either sreset or hreset Machine check 00200 A machine check is caused by the assertion of the tea signal during a data bus transaction assertion of mcp an address or data parity error or an instruction or data cache parity error Note that when a machine check occurs the e300 has SRR1 register values that are different from the G2 G2_LE cores because the e300 core supports cache parity See Table 5 14 for more information DSI 00300 The cause of a DSI interrupt can be determined by the bit settings in the DSISR listed as follows 1 Setif the translation of an attempted access is not found in the primary hash table entry group HTEG or in the rehashed secondary HTEG or in the range of a DBAT register otherwise cleared 4 Setif a memory access is not permitted by the page or DBAT protection mechanism otherwise cleared Set for a store operation and cleared for a load operation 9 Setif a data address breakpoint interrupt occurs when the data 0 28 in the DABR or DABR2 matches the next data access load or store instruction to complete in the completion unit The different breakpoints are enabled as follows e Write breakpoints enabled when DABR 30 is set e Read breakpoints enabled when DABR 31 is set O e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Interrupts and Exceptions Table 5 2 Interrupts and Exception Conditions continued Interrupt Type Vector O
116. encodings for the selectable events to be monitored Table 11 9 establishes a correlation between each counter events to be traced and the pattern required for the desired selection The Spec Nonspec column indicates whether the event count includes any occurrences due to processing that was not architecturally required by the PowerPC sequential execution model speculative processing e Speculative counts include speculative instructions that were later flushed e Nonspeculative counts do not include speculative operations which are flushed Table 11 8 describes how event types are indicated in Table 11 9 Table 11 8 Event Types Event Type Label Description Reference Ref Shared across counters PMCO PMC3 Applicable to most microprocessors Common Com Shared across counters PMCO PMC3 Counter specific C O 3 Counted only on one or more specific counters The notation indicates the counter to which an event is assigned For example an event assigned to counter PMC2 is shown as C2 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Performance Monitor Table 11 9 describes performance monitor events Pipeline events in Table 11 9 are defined in Instruction timing Table 11 9 Performance Monitor Event Selection Number Event Speci Count Description Nonspec
117. in the MSR for critical interrupts CSRRO and CSRR1 have unique SPR numbers as described in Chapter 2 Register Model Figure 5 3 shows the format of CSRRO SPR 58 Access Supervisor read write 0 29 30 31 R CSRRO W Reset All zeros Figure 5 3 Critical Interrupt Save Restore Register 0 CSRRO When a critical interrupt occurs CSRRO is set to point to an instruction such that all prior instructions have completed execution and no subsequent instruction has begun execution When an rfci instruction is executed the contents of CSRRO are copied to the next instruction address the 32 bit address of the next instruction to be executed Figure 5 4 shows the format of CSRR1 e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Interrupts and Exceptions SPR 59 Access Supervisor read write 2 31 R CSRR1 W Reset All zeros Figure 5 4 Critical Interrupt Save Restore Register 1 CSRR1 When an interrupt occurs CSRR1 0 31 are loaded with the values of MSR 0 31 which are placed in corresponding CSRR1 bit positions When rfci executes MSR 0 31 are loaded from CSRR1 0 31 5 2 1 3 SPRGO SPRG7 The e300 core provides eight SPRG SPRG4 SPRG7 registers for general operating system use such as performing a fast state save or for supporting multiprocessor implementations SPRGO SPRG7 have unique SPR numbers as described in Chapter 2 Register Model The formats of SP
118. including the following e Whether the branch requires prediction e Whether the branch is predicted as taken or not taken e Whether the branch is taken e Whether the target instruction stream is in the on chip cache e Whether the prediction is correct 7 4 1 1 Branch Folding When a branch instruction is encountered by the fetcher the BPU immediately tries to pull that instruction out of the instruction stream and resolve it When the BPU removes the branch instruction from the stream the subsequent instruction is shifted down to take the place of the removed branch instruction This technique is called branch folding Often it eliminates the penalties of flow control instructions because instruction execution proceeds as though the branch were never there If the folded branch instruction changes program flow the branch is said to be taken the BPU immediately requests the instructions at the new target from the on chip cache In most cases the new instructions arrive in the IQ before any bubbles are introduced into the execution units If the folded branch does not change program flow the branch is not taken the branch instruction is already removed and execution continues as if there were never a branch in the original sequence When a conditional branch cannot be resolved due to a CR data dependency the branch is executed by means of static branch prediction and instruction fetching proceeds down the predicted path If the predicti
119. increment For example if PMC2 is selected to count PMC2 overflow events PMC2 does not increment e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Performance Monitor e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Appendix A Instruction Set Listings This appendix lists the e300 core microprocessor s instruction set as well as the additional PowerPC instructions not implemented in the e300 core Instructions are sorted by mnemonic opcode function and form Also included is a quick reference table that contains general information such as the architecture level privilege level and form and indicates if the instruction is 64 bit and optional Note that split fields representing the concatenation of sequences from left to right are shown in lowercase For more information refer to Chapter 8 Instruction Set in the Programming Environments Manual The following key applies to the tables in this appendix Key Reserved Bits Instruction Not Implemented in the e300 core A 1 Instructions Sorted by Mnemonic Table A 1 lists the instructions implemented in the PowerPC architecture in alphabetical order by mnemonic Table A 1 Complete Instruction List Sorted by Mnemonic Name o 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 addx 31 D A B OE
120. initiated when a dcbz instruction is executed When snooped by the core the addressed cache block is invalidated if in the exclusive or shared state or flushed to memory and invalidated if in the modified state and any associated reservation is canceled Write with kill MEI MESI In a write with kill operation the core snoops the cache for a copy of the addressed block If one is found an additional snoop action is initiated internally and the cache block is forced to the invalid state killing modified data that may have been in the block Any reservation associated with the block is also canceled There is an exception to this rule if there is any pending single beat write atomic write through stwex in BIU the core treats the write with kill as if it were a write with flush bus transaction Read Read atomic MEI The read operation is used by most single beat read and burst read operations on the bus All burst reads observed on the bus are snooped as if they were writes RWITM causing the addressed cache block to be flushed A read on the bus with the gbi signal asserted causes the following responses e f the addressed block in the cache is in the invalid state the core takes no action e If the addressed block in the cache is in the exclusive state the block is invalidated e If the addressed block in the cache is in the modified state the block is flushed to memory and the block is invalidated e Ifthe snooped tran
121. instructions are not aligned e The instruction is Iswi Iswx stswi stswx and the core is in little endian mode Note that PowerPC little endian mode is not supported on the e300 core e The operand of debz is in memory that is write through required or caching inhibited Program 00700 Caused by one of the following exception conditions which correspond to bit settings in SRR1 and arise during execution of an instruction Floating point enabled exception A floating point enabled exception condition is generated when the following condition is met MSR FEO MSR FE1 and FPSCR FEX is 1 e FPSCRI FEX is set by the execution of a floating point instruction that causes an enabled exception or by the execution of one of the Move to FPSCR instructions that results in both an exception condition bit and its corresponding enable bit being set in the FPSCR Illegal instruction An illegal instruction program interrupt is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields including PowerPC instructions not implemented in the core or when execution of an optional instruction not provided in the core is attempted these do not include those optional instructions that are treated as no ops e Privileged instruction A privileged instruction program interrupt is generated when the execution of a privileged instruction is attempted and the MSR r
122. likely to take Store Queue Holds store operations that have not been committed to memory resulting from completed or retired instructions Superscalar A superscalar processor is one that can dispatch multiple instructions concurrently from a conventional linear instruction stream In a superscalar implementation multiple instructions can be in the same stage at the same time Supervisor mode The privileged operation state of a processor In supervisor mode software typically the operating system can access all control registers and can access the supervisor memory space among other privileged operations Synchronization A process to ensure that operations occur strictly in order See Context synchronization and Execution synchronization e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 11 Synchronous interrupt An interrupt that is generated by the execution of a particular instruction or instruction sequence There are two types of synchronous interrupts precise and imprecise System memory The physical memory available to a processor Tenure The period of bus mastership For the e300 there can be separate address bus tenures and data bus tenures A tenure consists of three phases arbitration transfer and termination TLB translation lookaside buffer A cache that holds recently used page table entries Throughput The measure of the number of instructions that are
123. locking the instruction cache 1 1 2 level bus pipelining 1 level bus pipelining For the e300 a new transaction can complete an address tenure when the previous transaction has been granted the data bus for the G2_LE a new transaction must wait until the previous data tenure has completed before completing its address tenure PowerPC little endian not supported PowerPC little endian supported PowerPC little endian will not be supported in the e300 core although true little endian will be fully supported Data retry mode removed Data retry mode available drtry and drtrymode will no longer be supported on the e300 and future versions External control instructions removed External control instructions available The eciwx and ecowx instruction pair will not be supported on the e300 core These are optional instructions in the PowerPC architecture Reduced pin mode removed Reduced pin mode available Reduced pinout mode and the signal redpinmode will not be supported in the e300 core e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Overview 1 5 Table 1 4 describes the differences between the e300 cores Differences Between e300 Cores Table 1 4 Differences Between e300 Cores Cache Sizes Floating Point Integer Units enhanced Performance PVR Multipliers Monitor e300c1 32Kbyte 8 way
124. lwzx rD rA rB Load Word and Zero with Update lwzu rD d rA Load Word and Zero with Update Indexed lwzux rD rA rB 3 2 4 3 4 Integer Store Instructions For integer store instructions the contents of rS are stored into the byte half word word or double word in memory addressed by the effective address Many store instructions have an update form in which rA is updated with the EA For these forms the following rules apply e IfrA 0 the EA is placed into rA e IfrS rA the contents of rS are copied to the target memory element then the generated EA is placed into rA rS The core defines store with update instructions with rA 0 and integer store instructions with the CR update option enabled Rc field bit 31 in the instruction encoding 1 to be invalid forms Table 3 15 provides a list of the integer store instructions for the core Table 3 15 Integer Store Instructions Name Mnemonic Operand Syntax Store Byte stb rS d rA Store Byte Indexed stbx rS rA rB Store Byte with Update stbu rS d rA Store Byte with Update Indexed stbux rS rA rB Store Half Word sth rS d rA Store Half Word Indexed sthx rS rA rB Store Half Word with Update sthu rS d rA Store Half Word with Update Indexed sthux rS rA rB Store Word stw rS d rA Store Word Indexed Lens Ing O e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Instruction Set
125. manual refer to the world wide web at http www freescale com A list of major differences between the G2 core the G2_LE core and the e300 core configurations are provided in Section 1 4 Differences Between Cores Furthermore the minor differences between cores are documented by footnotes throughout this book Audience This manual is intended to be used as a reference for many semiconductor products targeting a range of markets including automotive communication consumer networking and computer peripherals It is intended for system software and hardware developers and applications programmers who want to develop products using the cores It is assumed that the reader understands operating systems core system design and details of the PowerPC architecture Organization Following is a summary and a brief description of the major sections of this manual e Chapter 1 Overview is useful for readers who want a general understanding of e300 features and functions and the differences between the e300 and G2 cores It generally describes the register set instruction set and addressing modes cache model interrupt model memory management model instruction timing system support interface and debug features for the e300 core This chapter indicates which features are defined by the PowerPC architecture and which ones are e300 specific e Chapter 2 Register Model provides a brief synopsis of the registers implemente
126. match type conditions for IABR and IABR2 Note that IABR and IABR2 must be enabled before the effects of IBCR are realized SPR 309 Access Supervisor read write 0 5 6 7 8 9 10 11 12 13 14 15 R IABR IABR2 SIG_ w gt STAT STAT SE SS gt Type PNS Reset All zeros Figure 2 20 IBCR Register Table 2 14 describes the IBCR fields Table 2 14 Instruction Address Breakpoint Control Registers IBCR Bits Name Description 0 5 Reserved 6 IABRSTAT IABR status 0 Match on IABR has not occurred 1 Match on IABR has occurred e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Register Model Table 2 14 Instruction Address Breakpoint Control Registers IBCR continued Bits Name Description 7 IABR2STAT IABR2 status O Match on IABR2 has not occurred 1 Match on IABR2 has occurred 8 9 CMP IABR breakpoint compare type 00 Match if instruction s EA equals IABR CEA 01 Reserved 10 Match if instruction s EA is less than IABR CEA 11 Match if instruction s EA is greater than or equal to IABR CEA 10 11 CMP2 IABR2 breakpoint compare type 00 Match if instruction s EA equals IABR2 CEA 01 Reserved 10 Match if instruction s EA less than IABR2 CEA 11 Match if instruction s EA greater than or equal to IABR2 CEA 12 Reserved 13 Reserved 14 SIG_TYPE Combinational signal type
127. maximum effective address the memory operand is considered to wrap around from the maximum effective address through effective address 0 as described in the following paragraphs Effective address computations for both data and instruction accesses use 32 bit unsigned binary arithmetic A carry from bit 0 is ignored Load and store operations have three categories of effective address generation e Register indirect with immediate index mode e Register indirect with index mode e Register indirect mode Section 3 2 4 3 2 Integer Load and Store Address Generation describes effective address generation for load and store operations Branch instructions have three categories of effective address generation e Immediate e Link register indirect e Count register indirect Section 3 2 4 4 1 Branch Instruction Address Calculation describes branch instruction effective address generation 3 2 2 4 Synchronization The synchronization described in this section refers to the state of the core performing the synchronization 3 2 2 4 1 Context Synchronization The System Call sc and Return from Interrupt rfi instructions perform context synchronization by allowing previously issued instructions to complete before performing a change in context Execution of one of these instructions ensures the following e No higher priority interrupt exists sc e All previous instructions have completed to a point where they can no longer cau
128. may be bytes half words words or double words or for the load store multiple and load store string instructions a sequence of bytes or words The address of a memory operand is the address of its first byte that is of its lowest numbered byte Operand length is implicit for each instruction The PowerPC architecture supports both big and little endian byte ordering The default byte and bit ordering is big endian See Section 3 1 2 Byte Ordering in the Programming Environments Manual for more information about big and little endian byte ordering The operand of a single register memory access instruction has a natural alignment boundary equal to the operand length In other words the natural address of an operand is an integral multiple of the operand length A memory operand is said to be aligned if it is aligned at its natural boundary otherwise it is e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Instruction Set Model misaligned For a detailed discussion about memory operands see Chapter 3 Operand Conventions in the Programming Environments Manual 3 2 2 3 Effective Address Calculation An effective address EA is the 32 bit sum computed by the processor core when executing a memory access or branch instruction or when fetching the next sequential instruction For a memory access instruction if the sum of the effective address and the operand length exceeds the
129. mechanism or for accesses that correspond to direct store interface T 1 segments Furthermore R and C bits are maintained only for accesses made while address translation is enabled MSR IR 1 or MSR DR 1 In the e300 core the reference and change bits are updated as follows e For TLB hits the C bit is updated according to Table 6 7 e For TLB misses when a table search operation is in progress to locate a PTE the R and C bits are updated set if required to reflect the status of the page based on this access Table 6 7 shows that the status of the C bit in the TLB entry in the case of a TLB hit is what causes the processor to update the C bit in the PTE the R bit is assumed to be set in the page tables if there is a TLB hit Therefore when software clears the R and C bits in the page tables in memory it must invalidate the TLB entries associated with the pages whose reference and change bits were cleared Table 6 7 Table Search Operations to Update History Bits TLB Hit Case R and C Bits in z TLB Entry Processor Action 00 Combination does not occur 01 Combination does not occur 10 Read No special action Write Table search operation required to update C Causes a data TLB miss on store interrupt 11 No special action for read or write The core causes the R bit to be set for the execution of the debt or dcbtst instruction to that page by causing a TLB miss interrupt to load the TLB entry
130. mtfsf mtfsb0 and mtfsb1 instructions can alter FX explicitly This is a sticky bit 1 FEX Floating point enabled exception summary Signals the occurrence of any enabled exception conditions It is the logical OR of all the floating point exception bits masked by their respective enable bits FEX VX amp VE OX amp OE UX amp UE ZX amp ZE XX amp SEI The merfs mifsf mtfsfi mtfsb0 and mtfsb1 instructions cannot alter FPSCR FEX explicitly This is not a sticky bit 2 VX Floating point invalid operation exception summary This bit signals the occurrence of any invalid operation exception It is the logical OR of all of the invalid operation exception bits The merfs mtfsf mtfsfi mtfsb0 and mtfsb1 instructions cannot alter FPSCR VX explicitly This is not a sticky bit 3 OX Floating point overflow exception This is a sticky bit 4 UX Floating point underflow exception This is a sticky bit 5 ZX Floating point zero divide exception This is a sticky bit 6 XX Floating point inexact exception This is a sticky bit XX is the sticky version of FPSCR FI A given instruction sets XX as follows e If the instruction affects FPSCR FI the new value of FRSCR XX is obtained by logically ORing the old value of FPSCR XX with the new value of FPSCRIFI e If the instruction does not affect FPSCR FI the value of FRSCR XX is unchanged 7 VXSNAN Floating point invalid operation exception for SNaN This is a sticky bit
131. of the TLB entry selected for replacement by the LRU algorithm The software can change this value effectively overriding the replacement algorithm The SRRI KEY bit is used by the table search software to determine if there is a protection violation associated with the access useful on data write misses for determining if the C bit should be updated in the table Table 6 10 summarizes the SRR1 bits updated whenever one of the three TLB miss interrupts occurs Table 6 10 Implementation Specific SRR1 Bits Bits Name Function 0 3 CRFO Condition register field 0 bits 12 KEY Key for TLB miss either Ks or Kp from segment register depending on whether the access is a user or supervisor access 13 DI Set if instruction TLB miss 14 WAY Next TLB set to be replaced set per LRU 15 GIL Set if data TLB miss was for a load instruction The key bit saved in SRR1 is derived as follows Select KEY from segment register If MSR PR 0 KEY Ks If MSR PR 1 KEY Kp The rest of this section describes the format of the implementation specific SPRs used by the TLB miss interrupt handlers These registers can be accessed by supervisor level instructions only Because DMISS IMISS DCMP ICMP HASH1 HASH2 and RPA are used to access the translation tables for software table search operations they should only be accessed when address translation is disabled MSR IR 0 and MSR DR 0 Note
132. of this bit is to mask out machine check interrupts caused by assertion of mcp similar to how MSR EE can mask external interrupts 0 Masks mcp Asserting mcp does not generate a machine check interrupt or a checkstop 1 Asserting mcp causes checkstop if MSR ME 0 or a machine check interrupt if ME 1 1 ECPE Enable cache parity errors 0 Disables instruction and data cache parity error reporting 1 Allows a detected cache parity error to cause a machine check interrupt if MSR ME 1 ora checkstop if MSR ME 0 2 EBA Enable ap_in 0 3 and ape for address parity checking 0 Disables address parity checking during a snoop operation 1 Allows an address parity error during snoop operations to cause a checkstop if MSR ME 0 ora machine check interrupt if MSR ME 1 3 EBD Enable dpe for data parity checking 0 Disables data parity checking 1 Allows a data parity error during reads to cause a checkstop if MSR ME 0 or a machine check interrupt if MSR ME 1 4 SBCLK _ clk_out output enable Used in conjunction with HIDO ECLK and hreset to configure ck out 5 Reserved should be cleared 6 ECLK clk_out output enable Used in conjunction with HIDO SBCLK and the hreset signal to configure ck out 7 PAR Disable precharge of artry_out 0 Precharge of artry_out enabled 1 Alters bus protocol slightly by preventing the processor from driving artry_out to high negated state If this is done the integrated device must restore the signals to the high state
133. one device uses data stored in a page marked as copy back snooping must be enabled to allow copy back operations and cache invalidations of modified data The e300 core implements snooping hardware to prevent other devices from accessing invalid data When bus snooping is enabled depending on the device integration the processor can monitor the transactions of the other devices For example if another device accesses a memory location and its memory coherent M bit is set and the core on chip cache has a modified value for that address the processor preempts the bus transaction and updates memory with the cache data If the cache contents associated with the snooped address are unmodified the core invalidates the cache block The other device can then attempt an access to the updated address See Chapter 4 Instruction and Data Cache Operation Copy back mode provides complete cache memory coherency as well as maximizing available external bus bandwidth 7 5 2 Write Through Mode Store operations to memory in write through mode always update memory as well as the on chip cache on cache hits Write through mode is used when the data in the cache must always agree with external memory for example video memory when shared global data may be used frequently or when allocation of a cache line on a cache miss is undesirable Automatic copy back of cached data is not performed if that data is from a memory page marked as write through mode
134. operation 0 Modified PowerPC little endian mode not supported in the e300 core 1 True little endian mode when MSR LE 1 Changing the value of this bit during normal operation is not recommended IFEB Instruction fetch burst extension This bit enables the instruction fetch burst extension 0 Instruction fetch burst extension disabled 1 Instruction fetch burst extension enabled Reserved should be cleared e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Register Model Table 2 8 e300 HID2 Field Descriptions continued Bits Name Description 7 MESISTATE MESI state enable This bit enables the four state MESI cache coherency protocol 0 MESI disabled The data cache uses a three state MEI coherency protocol 1 MESI enabled The data cache uses a four state MESI protocol 8 IFEC Instruction fetch cancel extension This bit enables the instruction fetch cancel extension 0 Instruction fetch cancel extension disabled 1 Instruction fetch cancel extension enabled 9 EBQS Enable BIU queue sharing This bit enables data cache queue sharing 0 Data cache queue sharing disabled 1 Data cache queue sharing enabled 10 EBPX Enable BIU pipeline extension This bit enables the bus interface unit pipeline extension 0 BIU pipeline extension disabled 1 level pipeline 1 BIU pipeline extension enabled 1 1 2 level pipeline 11 12 kg Reser
135. prior to page table search operations The LSU calculates effective addresses for data loads and stores performs data alignment to and from cache memory and provides the sequencing for load and store string and multiple word instructions The instruction unit calculates effective addresses for instruction fetching After an EA is generated its higher order bits are translated by the appropriate MMU into physical address bits The lower order EA bits are the same on the physical address which are directed to the on chip cache and formed the index into a four way set associative tag array After translating the address the MMU passes the higher order physical address bits to the cache and the cache lookup completes For caching inhibited accesses or accesses that miss in the cache the untranslated lower order address bits are concatenated with the translated higher order address bits the resulting 32 bit physical address is then used by the memory unit and the core interface to access external memory The MMU also directs the address translation and enforces the protection hierarchy programmed by the operating system in relation to the supervisor user privilege level of the access and in relation to whether the access is a load or store For instruction fetches the IMMU looks for the address in the ITLB and in the IBAT array If an address hits both the IBAT array translation is used Data accesses cause a lookup in the DTLB and DBAT array In
136. processed per clock cycle Transaction A complete exchange between two bus devices A transaction is typically comprised of an address tenure and one or more data tenures which may overlap or occur separately from the address tenure A transaction may be minimally comprised of an address tenure only Transfer termination Signal that refers to both signals that acknowledge the transfer of individual beats of both single beat transfer and individual beats of a burst transfer and to signals that mark the end of the tenure UISA user instruction set architecture The level of the architecture to which user level software should conform The UISA defines the base user level instruction set user level registers data types floating point memory conventions and interrupt model as seen by user programs and the memory and programming models Underflow A condition that occurs during arithmetic operations when the result cannot be represented accurately in the destination register For example underflow can happen if two floating point fractions are multiplied and the result requires a smaller exponent and or mantissa than the single precision format can provide In other words the result is too small to be represented accurately User mode The operating state of a processor used typically by application software In user mode software can access only certain control registers and can access only user memory space No privileged operations
137. reset caused by the assertion of sreset This interrupt takes priority over any other pending interrupts except nonrecoverable interrupts listed above This interrupt is taken immediately when a recoverable state is reached Asynchronous maskable recoverable system management interrupt critical interrupt external interrupt decrementer interrupt Before handling this type of interrupt the next instruction in program order must complete or except If this action causes another type of interrupt that interrupt is taken and the asynchronous maskable recoverable interrupt remains pending Once an instruction can complete without causing an interrupt further instruction completion is halted while the interrupt not taken remains pending The interrupt is taken when a recoverable state is reached Instruction fetch ITLB ISI When this type of interrupt is detected dispatch is halted and the current instruction stream is allowed to drain If completing any instructions in this stream causes an interrupt that interrupt is taken and the instruction fetch interrupt is forgotten Otherwise as soon as the machine is empty and a recoverable state is reached the instruction fetch interrupt is taken Instruction dispatch execution program DSI alignment emulation trap system call DTLB miss on load or store LABR This type of interrupt is determined at dispatch or execution of an instruction The interrupt remains pending until all instructi
138. state See Section 2 2 14 Instruction Address Breakpoint Registers ABR and IABR2 for bit descriptions e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Debug Features 10 1 2 Instructional Address Control Register IBCR IBCR is a supervisor level SPR It controls the compare and match type conditions for ABR and IABR2 Note that IABR and IABR2 must be enabled before the effects of IBCR are realized The e300 includes additional bits in IBCR 6 7 that contain the status of whether an instruction address breakpoint has matched See Section 2 2 15 Instruction Address Breakpoint Control Register IBCR for bit descriptions 10 1 3 Data Address Breakpoint Registers DABR DABR2 The DABR and DABR2 registers are used to cause a breakpoint interrupt if the specified address is encountered for a data access DABR CEA and DABR2 CEA hold an effective address to which each address of a data access is compared The breakpoint translation bit of DABR is also compared with MSR DR to check for a complete match A match occurs when MSR DR DABR BT The data address write and data address read interrupts are enabled by setting DABR WBE RBE and DABR2 WBE RBE The data access tagged with the match does not complete before the breakpoint interrupt is taken The DSI interrupt 0x00300 occurs when there is a data address breakpoint match The DSI interrupt is taken before the load or store i
139. subsequent icbt A sync instruction must follow an icbt instruction to ensure all cache load operations are completed An isync instruction must also follow an icbt if the newly touched and loaded instructions is expected to be fetched immediately after the icbt is executed 4 5 2 8 Instruction Cache Block Invalidate icbi Instruction The icbi instruction unconditionally invalidates all ways of the target cache set that is icbi invalidates by tag index only No address comparison is performed to check for a cache hit The icbi instruction does not broadcast to the bus If the instruction cache way protect bit HID2 ICWP is set only the non locked ways of the instruction cache set are invalidated by the icbi instruction An icbi instruction should always be followed by a syne and an isync instruction This ensures that the effects of the icbi are seen by the instruction fetches following the icbi itself For self modifying code the following sequence should be used to synchronize the instruction stream 1 dcbst push new code from the data cache out to memory sync wait for the dcbst to complete icbi invalidate the old instruction cache entry in this processor sync wait for the icbi to complete its bus operation Do PS isync re sync this processor s instruction fetch The second sync instruction ensures completion of all prior icbi instructions Note that the second syne instruction is not shown in Section 5 1 5 2 I
140. sync rfi rfci or mtmsr instructions MSR SE can be set by using mtmsr or by setting the SRRO bit corresponding to MSR SE before returning from an interrupt If MSR SE is set by restoring SRRO to the MSR on the return from an interrupt single stepping is enabled and one instruction is executed followed by a trace interrupt A typical software debugging procedure is to set an instruction address breakpoint at the instruction address to be single stepped When the IABR interrupt is taken the interrupt routine disables the instruction address breakpoint and sets SRRO to set the MSR SE on the rfi The trace interrupt is then taken upon the completion of the first instruction after return from the IABR interrupt For any interrupt the value of MSR is saved in SRRO The value of MSR SE is automatically cleared within the interrupt handler disabling single stepping while the trace interrupt handler is executed In this typical case the trace interrupt handler can then examine the results of the execution of the instruction in question The trace interrupt handler can then clear the appropriate bit in SRRO to disable single stepping the bit in SRRO that will cause MSR SE 0 on the rfi if no more single stepping is needed Single stepping skips isync sync rfi rfci and branch instructions because these instructions do not enter the instruction pipeline The branch trace feature described in Section 10 2 2 Branch Tracing may be used t
141. tagged with the match cannot complete before the breakpoint interrupt is taken The address of e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Register Model the instruction which matches the breakpoint condition is stored in SRRO The tagged instruction is completed and retired on return from the interrupt rfi or rfci The results are then committed to the destination registers and address Note that if IABR IABR2 values are set to any interrupt vector an unrecoverable processor state occurs SPR 1010 IABR Access User read write 1018 IABR2 0 29 30 31 R CEA BE WwW Reset All zeros Figure 2 19 IABR and IABR2 Registers The bits in the IABR and IABR2 are defined in Table 2 13 For more information about the instruction breakpoint interrupt see Section 5 5 17 Instruction Address Breakpoint Interrupt 0x01300 Table 2 13 Instruction Address Breakpoint Register IABR and IABR2 Bit Settings Bits Name Description 0 29 CEA Compare effective address Word address to be compared 30 BE Breakpoint enable IABR or IABR2 enabled Setting this bit enables the IABR interrupt 31 Reserved 2 2 15 Instruction Address Breakpoint Control Register IBCR The IBCR shown in Figure 2 20 is a supervisor level register with SPR309 on the e300 core which is accessible only by using an mtspr or mfspr instruction The IBCR controls the compare and
142. tdi 1 02 TO A SIMM tlbia 2 3 31 00000 00000 00000 370 0 tlbie 23 31 00000 00000 B 306 0 tlbid 2 31 00000 00000 B 978 0 tibli 2 31 00000 00000 B 1010 0 tlbsync 3 31 00000 00000 00000 566 0 tw 31 TO A B 4 0 twi 03 TO A SIMM xorx 31 S A B 316 Rc xori 26 S A UIMM xoris 27 S A UIMM 1 64 bit instruction 2 Supervisor level instruction S Optional in the PowerPC architecture 4 Load and store string or multiple instruction S Supervisor and user level instruction 6 Implementation specific instruction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Instruction Set Listings A 2 Instructions Sorted by Opcode Table A 2 lists the instructions defined in the PowerPC architecture in numeric order by opcode Name tdi twi mulli subfic cmpli cmpi addic addic addi addis bcx sc bx mcrf belrx crnor rfi crandc isync crxor crnand crand creqv crorc cror bectrx rlwimix rlwinmx rlwnmx ori oris xori xoris Table A 2 Complete Instruction List Sorted by Opcode 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 000010 TO A SIMM 000011 TO SIMM 000111 D A SIMM 001000 D A SIMM 001010 cfD O L A UIMM 001011 cfD O L A SIMM 001100 D A SIMM 001101 D A SIMM 001110 D A SIMM 001111 D A SIMM 010000 BO BI BD AA LK 010001 00000 00000
143. the context synchronizing instruction if the breakpoint was disabled by the mtspr instruction See Synchronization Requirements for Special Registers and TLBs in Chapter 2 Register Set in the Programming Environments Manual for more information on this requirement Table 5 23 Breakpoint Action for Multiple Modes Enabled for the Same Address IABR IE MSR BE MSR SE First Action Next Action Comments 1 1 0 Instruction address Trace branch Enabling both modes is useful only if both trace and breakpoint address breakpoint interrupts are needed 1 0 1 Instruction address Trace Enabling both modes is useful only if different breakpoint single step breakpoint actions are required 0 1 1 Trace branch None The action for branch trace and single step trace is the same Enabling both trace modes is redundant except for hard stop on branches 1 1 1 Instruction address Trace Enabling all modes is redundant This entry is for breakpoint clarification only Section 2 2 14 Instruction Address Breakpoint Registers IABR and IABR2 and Chapter 10 Debug Features provide more information about the instruction breakpoint facility e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Interrupts and Exceptions 5 5 18 System Management Interrupt 0x01400 The system management interrupt behaves like an external interrupt except for the si
144. the e300c2 does not support floating point operations Remove references to a moded 32 or 64 bit data bus because that is a feature that will be phased out of e300 cores The default now is a 64 bit data bus Chapter 1 Add Section 1 5 Differences Between e300 Cores Add notes that the e300c2 improves integer instruction throughput and significantly improves multiply instructions 1 0 1 1 Change wording to explain references to e300 e300c1 and e300c2 1 1 1 1 Add sentence stating that the e300c2 significantly improves multiply instructions 1 1 1 1 Add sentence stating that the e300c2 eliminates the FPU 1 1 1 3 Add Figure 1 2 e300c2 Core Block Diagram 1 3 3 2 1 22 Add paragraph for e300c2 implementation and replace Figure 1 3 Data Cache Organization with two figures e300c1 Data Cache Organization and e300c2 Data Cache Organization 1 3 4 2 1 26 In Table 1 2 add the following sentence to the description of exception conditions for Floating point unavailable In the e300c2 core any attempt to execute a floating point instruction results in a floating point unavailable exception 1 4 7 1 55 Remove paragraph describing bus arbitration scheme 2 1 2 4 Add PVR value of the e300c2 core in the bullet describing the PVR e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Revision History 2 1 2 5 2 1 2 5 2 1 2 6 2 7 2 2 1 2 11 2 2 3 2 15
145. the kind of access performed on the bus global or local This attribute is provided to allow improved performance in systems where hardware enforced coherency is relatively slow and where software is able to enforce the required coherency When M 0 the processor does not enforce data coherency When M 1 the processor enforces data coherency and the corresponding access is considered to be a global access When the M attribute is set and the access is performed the global signal is asserted to indicate that the access is global Snooping devices affected by the access must then respond to this global access if their data is modified by signaling retry and by updating the memory location For instruction accesses HIDO IFEM control how accesses are performed global or local on the bus If translation is enabled setting IFEM causes the core to reflect the M bit state on the bus during instruction fetches 4 4 1 4 Guarded Attribute G When the guarded bit is set the memory area block or page is designated as guarded This setting can be used to protect certain memory areas from read accesses made by the processor that are not dictated e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Instruction and Data Cache Operation directly by the program If there are areas of memory that are not fully populated in other words there are holes in the memory map within this area this setting can
146. throughput In cycle 7 instruction 6 completes instruction 7 is in the second FPU execute stage and although instruction 8 has executed it must wait for instruction 7 to complete Instruction 9 dispatches to the IU Instructions 10 and 11 move down in the IQ Fetching resumes with instructions 13 and 14 In cycle 8 instruction 7 is in the third FPU execute stage Instructions 8 and 9 have executed and they remain in the CQ until instruction 7 completes Instruction 10 is dispatched to the IU In cycle 9 instruction 7 completes allowing instruction 8 to complete Because the CQ is full instructions 12 and 13 cannot be dispatched e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Instruction Timing 10 In cycle 10 instructions 9 and 10 complete Instruction 11 has executed but cannot exit the CQ from CQ2 Instructions 12 and 13 are dispatched to the FPU and IU respectively Instruction 14 drops into IQO 11 In cycle 11 instruction 11 completes and instruction 12 is in the second FPU execute stage Instruction 13 has executed but must remain in the CQ until instruction 12 completes Instruction 14 enters the first FPU execute stage 7 3 2 3 Cache Miss Figure 7 7 shows an instruction fetch that misses the on chip cache and shows how that fetch affects the instruction dispatch Note that a processor bus clock ratio of 1 2 is used The same instruction sequence is used as in Section 7 3 2 2
147. transactions can be caused by cache write through accesses caching inhibited accesses I bit of the WIMG bits for the page is set accesses that miss when the cache is locked HIDO DLOCK is set accesses when the cache is disabled HIDO DCE bit is cleared and misaligned accesses 4 9 2 Burst Transactions Burst transactions on the core always transfer eight words of data at a time and are aligned to a double word boundary Burst transactions have an assumed address order For caching allowed read operations or caching allowed non write through write operations that miss the cache the core presents the double word aligned address associated with the load or store instruction that initiated the transaction As shown in Figure 4 9 this quad word contains the address of the load or store that missed the cache This minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the block is filled For all other burst operations however the entire block is transferred in order eight word aligned Critical double word first fetching on a cache miss applies to both the data and instruction cache e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Instruction and Data Cache Operation e300 Core Cache Address Bits 27 28 00 01 10 11 A B C D If the address requested is in double word A the address placed on the bus is that of double wo
148. translated by the ITLB Data load 01100 A data load translation miss interrupt is caused when the effective address for a data load translation miss operation cannot be translated by the DTLB Data store 01200 A data store translation miss interrupt is caused when the effective address for a data store translation miss operation cannot be translated by the DTLB or where a DTLB hit occurs and the change bit in the PTE must be set due to a data store operation Instruction 01300 An instruction address breakpoint interrupt occurs when the address bits 0 29 in the address IABR matches the next instruction to complete in the completion unit and ABR 30 is set breakpoint Note that the e300 core also implements IABR2 which functions identically to IABR System 01400 A system management interrupt is caused when MSR EE 1 and the smi input signal is management asserted interrupt Reserved 01500 02FFF Interrupts are roughly prioritized by class as follows 1 Nonmaskable asynchronous interrupts have priority over all other interrupts system reset and machine check interrupts although the machine check exception conditions can be disabled so the condition causes the processor to go directly into the checkstop state These interrupts cannot be delayed and do not wait for the completion of any precise interrupt handling 2 Synchronous precise interrupts are caused by instructions and are taken in strict program order
149. two kinds of interrupts in the e300 core those caused directly by the execution of an instruction and those caused by an asynchronous event Either may cause components of the system software to be invoked Interrupts can be caused directly by the execution of an instruction as follows e An attempt to execute an illegal instruction causes the illegal instruction program interrupt handler to be invoked An attempt by a user level program to execute the supervisor level instructions listed below causes the privileged instruction program interrupt handler to be invoked The core provides the following supervisor level instructions icbt dcbi mfmsr mfspr mfsr mfsrin mtmsr mtspr mtsr mtsrin rfi tlbie tlbsync tlbld and tlbli Note that the privilege level of the mfspr and mtspr instructions depends on the SPR encoding e An attempt to access memory that is not available page fault causes the ISI interrupt handler to be invoked e An attempt to access memory with an effective address alignment that is invalid for the instruction causes the alignment interrupt handler to be invoked e The execution of an se instruction invokes the system call interrupt handler that permits a program to request the system to perform a service e The execution of a trap instruction invokes the program interrupt trap handler e The execution of a floating point instruction when floating point instructions are disabled or unavailable invokes the flo
150. typically used by the operating system and user mode of operation used by the application software The programming models incorporate 32 GPRs 32 FPRs special purpose registers SPRs and several miscellaneous registers Each core also has its own unique set of hardware implementation HID registers Having access to privileged instructions registers and other resources allows the operating system to control the application environment providing virtual memory and protecting operating system and critical machine resources Instructions that control the state of the e300 core the address translation mechanism and supervisor registers can be executed only when the core is operating in supervisor mode Figure 1 4 shows all the core registers available at the user and supervisor level The numbers to the right of the SPRs indicate the number that is used in the syntax of the instruction operands for the move to from SPR instructions e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor USER MODEL General Purpose Registers 32 Bit Floating Point Registers 64 Bit XER SPR 1 Link Register LR SPR 8 Count Register CTR SPR 9 Time Base Facility For Reading TBL SPR268 TBU SPR269 Performance Monitor read only UPMGCO PMR384 UPMCs PMR 0 3 UPMLCas PMR 128 131 SPRGs Critical Interrupt Registers CSRRO SPR58 CSRR11 SPR59 SUPERVISOR MODEL Configu
151. unit GPR General purpose register HBE High BAT enable HIDO Hardware implementation register 0 HID1 Hardware implementation register 1 HID2 Hardware implementation register 2 l Cache inhibited IABR Instruction address breakpoint register 1 IABR2 Instruction address breakpoint register 2 IBAT Instruction BAT IBCR Instruction breakpoint control register ICE Instruction cache enable ICFI Instruction cache flash invalidate ICMP Instruction TLB compare IEE External interrupt enable IEEE Institute of Electrical and Electronics Engineers IFEM Instruction fetch enable M bit ILE Interrupt little endian mode ILOCK Instruction cache lock IMISS Instruction TLB miss address IMMU Instruction memory management unit IP interrupt prefix IQ Instruction queue IR Instruction address translation enable ITLB Instruction translation lookaside buffer IU Integer unit IWLCK Instruction cache way lock L2 Secondary cache LE Little endian mode enable LET True little endian mode bit LIFO Last in first out LR Link register e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Xxix Table i Acronyms and Abbreviated Terms continued Term Meaning LRU Least recently used Isb Least significant bit LSB Least significant byte LSU Load store unit M Memory coherent MBAR System memory base address ME Machin
152. up to 24 processor clock cycles to convert between the internal double precision format and the external single precision format 3 2 4 2 1 Floating Point Arithmetic Instructions The floating point arithmetic instructions are listed in Table 3 8 Table 3 8 Floating Point Arithmetic Instructions Name Mnemonic Operand Syntax Floating Add Double Precision fadd fadd frD frA frB Floating Add Single fadds fadds _ frD frA frB Floating Divide Double Precision fdiv fdiv frD frA frB Floating Divide Single fdivs fdivs frD frA frB Floating Multiply Double Precision fmul fmul frD frA frC Floating Multiply Single fmuls fmuls _ frD frA frC Floating Reciprocal Estimate Single fres fres frD frB Floating Reciprocal Square Root Estimate frsqrte frsqrte frD frB Floating Select fsel fsel frD frA frC frB Floating Subtract Double Precision fsub fsub frD frA frB Floating Subtract Single fsubs fsubs frD frA frB 3 2 4 2 2 Floating Point Multiply Add Instructions These instructions combine multiply and add operations without an intermediate rounding operation The fractional part of the intermediate product is 106 bits wide and all 106 bits take part in the add subtract portion of the instruction The floating point multiply add instructions are listed in Table 3 9 Table 3 9 Floating Point Multiply Add Instructions Name Mnemonic Operan
153. upper 7 bits of the HTABORG field from SDR1 7 25 Hashed page address Address bits 7 25 of the PTEG to be searched 26 31 Reserved e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Register Model 2 2 7 Required Physical Address Register RPA During a page table search operation the software must load the RPA shown in Figure 2 11 with the second word of the correct PTE When the tlbld or tlbli instruction is executed the RPA and DMISS or IMISS register are merged and loaded into the selected TLB entry The referenced R bit is ignored when the write occurs no location exists in the TLB entry for this bit The RPA register is read and write accessible to the software SPR 982 Access Supervisor read write 0 19 20 22 23 24 25 28 29 30 31 R RPN RIC WIMG PP W Reset All zeros Figure 2 11 Required Physical Address Register RPA Table 2 11 describes the bit settings of the RPA register Table 2 11 RPA Bit Settings Bits Name Description 0 19 RPN Physical page number from PTE 20 22 Reserved 23 R Referenced bit from PTE 24 C Changed bit from PTE 25 28 WIMG Memory cache access attribute bits 29 Reserved 30 31 PP Page protection bits from PTE 2 2 8 BAT Registers BAT4 BAT7 The MMU has four additional IBAT and four additional DBAT array entries
154. use the data address register DAR as shown in Table 5 15 When a DSI interrupt is taken instruction execution for the handler begins at offset 0x00300 from the physical base address indicated by MSR IP The architecture permits certain instructions to be partially executed when they cause a DSI interrupt These are as follows e Load string instructions some registers in the range of registers to be loaded may have been loaded e Store multiple or store string instructions some bytes of memory in the range addressed may have been updated In these cases the number of registers and amount of memory altered are instruction and boundary dependent However memory protection is not violated For update forms the update register rA is not altered 5 5 4 ISI Interrupt 0x00400 The ISI interrupt is implemented as it is defined by the PowerPC architecture An ISI interrupt occurs when no higher priority interrupt exists and an attempt to fetch the next instruction fails for any of the following reasons e Ifan instruction TLB miss fails to find the desired PTE then a page fault is synthesized The ITLB miss handler branches to the ISI interrupt to retrieve the translation from a storage device e An attempt is made to fetch an instruction from no execute memory e The fetch access violates memory protection e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Interrupts and Exceptions
155. virtual page numbers and physical page numbers The page table size is a power of two and its starting address is a multiple of its size Also as specified by the PowerPC architecture the page table contains a number of PTEGs A PTEG contains 8 PTEs of 8 bytes each therefore each PTEG is 64 bytes long PTEG addresses are entry points for table search operations 1 3 6 Instruction Timing The e300 core is a pipelined superscalar processor core Because instruction processing is reduced into a series of stages an instruction does not require all of the resources of an execution unit at the same time For example after an instruction completes the decode stage it can pass on to the next stage while the subsequent instruction can advance into the decode stage This improves the throughput of the instruction flow For example it may take three cycles for a single floating point instruction to execute but if there are no stalls in the floating point pipeline a series of floating point instructions can have a throughput of one instruction per cycle The core instruction pipeline has four major pipeline stages described as follows e The fetch pipeline stage primarily involves retrieving instructions from the memory system and determining the location of the next instruction fetch Additionally if possible the BPU decodes branches during the fetch stage and folds out branch instructions before the dispatch stage e300 Power Architecture Core
156. 0 IR 0 PR 0 BE 0 DR 0 When a system management interrupt is taken instruction execution for the handler begins at offset 0x01400 from the physical base address indicated by MSR IP The e300 core recognizes the interrupt condition smi asserted only if the MSR EE bit is set otherwise the interrupt condition is ignored To guarantee that the external interrupt is taken the smi signal must be held active until the e300 core takes the interrupt If the smi signal is negated before the interrupt is taken the e300 core is not guaranteed to take a system management interrupt The interrupt handler must send a command to the device that asserted smi acknowledging the interrupt and instructing the device to negate smi e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Chapter 6 Memory Management This chapter describes the e300 core implementation of the memory management unit MMU specifications provided by the PowerPC operating environment architecture OEA The MMU implementation of the e300 core is the same as that of previous PowerPC microprocessors However the e300 core implements four additional IBAT entries and four additional DBAT entries The primary function of the MMU in a processor of this family is the translation of logical effective addresses to physical addresses referred to as real addresses in the architecture specification for memory accesses and I O accesses I
157. 0 core master pipeline and two of the core execution units the FPU and LSU are also multiple stage pipelines The e300 core contains the following execution units that operate independently and in parallel e Branch processing unit BPU e 32 bit integer unit IU executes all integer instructions dual integer units are supported on the 300c2 e300c3 e 64 bit floating point unit FPU not supported on the e300c2 core e Load store unit LSU e System register unit SRU e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Instruction Timing The core can retire two instructions on every clock cycle In general the core processes instructions in four stages fetch decode dispatch execute and complete as shown in Figure 7 2 for the e300c1 Figure 7 3 for the e300c2 core and Figure 7 4 for the e300c3 Note that the example of a pipelined execution unit in Figure 7 1 is similar to the three stage FPU pipeline in Figure 7 2 Fetch Maximum Two Instruction Fetch per Clock Cycle Branch Processing Unit Instruction Queue E EC OH 4 gt 1Q3 gt I ES OH on in Program Order Maximum Two Instruction Dispatch per Clock Cycle Dispatch eS oe eS a Ss SS ee fae SS Se ee Se Gi Completion Buffer Assignment l E Reservation _ 4 S T 7 r Stations L A L 4 L L l 8 SRU 2 Entry WW 1 Store Queue Finis
158. 00 Power Architecture Core Family Reference Manual Rev 3 Glossary 14 Freescale Semiconductor Index A Accesses core interface overview 8 7 Address calculation branch instruction 3 23 effective address 3 8 Address generation floating point load store 3 21 integer load store 3 17 physical address generation MMU 6 1 see also Memory management unit MMU 6 1 Address translation see Memory management unit MMU Addressing memory 3 7 memory operand 3 1 modes 3 7 Alignment interrupt 5 25 6 15 see also Interrupt handling 5 4 misaligned accesses 3 2 overview 1 28 Architecture PowerPC xxiii 1 15 Assembly language programs simplified mnemonics 3 33 Asynchronous interrupts maskable 5 2 5 5 nonmaskable 5 2 5 5 Atomic memory references using lwarx stwex 4 14 Automatic power reduction mode 9 1 Base decrementer registers 9 2 Big endian mode 5 14 Block address translation BAT 1 5 6 19 BAT registers and cache locking implications 4 35 4 40 block address translation flow 6 11 see also Memory Management Unit MMU 1 5 selection of block address translation 6 8 Block diagram e300 core 1 2 Boundedly undefined definition 3 5 Branch instructions address calculation 3 23 branch and flow control instructions 3 22 condition register logical A 22 summary A 22 system linkage A 22 tracing on branch instructions 5 13 5 32 10 3 10 4 trap A 23 Branch processing unit BPU 1
159. 000 22 Rc 59 D 00000 B 00000 22 Rc 63 D A B 00000 20 Rc 59 D A B 00000 20 Rc 31 00000 A B 982 0 31 00000 A B 22 0 19 00000 00000 00000 150 0 34 D A d 35 D A d 31 D A 119 0 31 D A 87 0 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Listings Table A 1 Complete Instruction List Sorted by Mnemonic continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 EN 58 D A ds 0 Idarx 31 D A B 84 0 Idu 58 D A ds 1 Idux 31 D A B 53 Idx 31 D A B 21 Ifd 50 D A d Ifdu 51 D A d Ifdux 31 D A 631 Ifdx 31 D A 599 Ifs 48 D A d Heu 49 D A d Ifsux 31 D A 567 0 Ifsx 31 D A 535 0 Iha 42 D A d Ihau 43 D A d Ihaux 31 D A 375 0 Ihax 31 D A 343 0 Ihbrx 31 D A 790 0 Ihz 40 D A d Ihzu 41 D A d Ihzux 31 D A 311 0 Ihzx 31 D A 279 0 Imw4 46 D A d Iswi 4 31 D A NB 597 0 Iswx 4 31 D A B 533 0 Iwa 58 D A ds 2 lwarx 31 D A B 20 0 Iwaux 31 D A B 373 0 Iwax 31 D A B 341 0 Iwbrx 31 D A B 534 0 lwz 32 D A d Iwzu 33 D A d lwzux 31 D A B 55 0 lwzx 31 D A B 23 0 mert 19 cD 00 crfS 00 00000 0 0 merfs 63 cD 00 cfs 00 00000 64 0 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Listings Table A 1 Comple
160. 000000000000000 1 0 010010 LI AAILK 010011 crfD 00 crfS 00 00000 0000000000 0 010011 BO BI 00000 0000010000 LK 010011 crbD crbA crbB 0000100001 0 010011 00000 00000 00000 0000110010 0 010011 crbD crbA crbB 0010000001 0 010011 00000 00000 00000 0010010110 0 010011 crbD crbA crbB 0011000001 0 010011 crbD crbA crbB 0011100001 0 010011 crbD crbA crbB 0100000001 0 010011 crbD crbA crbB 0100100001 0 010011 crbD crbA crbB 0110100001 0 010011 crbD crbA crbB 0111000001 0 010011 BO BI 00000 1000010000 LK 010100 S A SH MB ME Rc 010101 S A SH MB ME Rc 010111 S A B MB ME Rc 011000 S A UIMM 011001 S A UIMM 011010 S A UIMM 011011 S A UIMM e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Listings Table A 2 Complete Instruction List Sorted by Opcode continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 andi 011100 S A UIMM andis 011101 A UIMM ridiclx 011110 S A sh mb 000 ebe ridicrx 011110 S A sh me 001 sh Rc rdicx 011110 S A sh mb 010 ebe rdimix 011110 S A sh mb 011 shiRe ridcix 011110 S A B mb 01000 De ridcrx 011110 S A B me 01001 Re cmp 011111 e O L A B 0000000000 0 tw 011111 TO A B 0000000100 0 subfcex 011111 D A B OE 0000001000 Re mulhdux 011111 D A B 0 0000001001 Re a
161. 001 that is e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Instruction and Data Cache Operation write back caching not inhibited and memory coherency required The MEI protocol is the default cache coherency protocol for the e300 core SH CRW SH CRW RM WH SH CLN Modified RH SH SBR Bus Transactions SH Snoop Hit Snoop Push RH Read Hit RM Read Miss Cache Line Fill WHe Write Hit WM Write Miss SH CRW Snoop Hit Caching Allowed Read Write SH SBR Snoop Hit Single Beat Read SH CLN Snoop hit Clean broadcast Figure 4 5 MEI Cache Coherency Protocol State Diagram WIM 001 4 4 2 2 MESI Coherency Protocol Before enabling MESI mode the data cache must first be flushed and a flash invalidate performed to empty the cache of any valid data When operating in the four state protocol a store hit to a shared line is treated as a cache miss and the line is re fetched from memory with intent to modify status on the bus The newly re fetched line replaces the same way entry in the cache Also any clean operation in the data cache debst or any snoop clean type puts the cache line in the shared state In MEI mode a clean operation puts the line in the exclusive state When MESI mode is enabled other nuances of core operation are affected The transaction type signals on the bus reflect the MESI intention rather than the MEI intention In particular a l
162. 0011 cause data accesses to be considered cacheable I 0 and thus load and store accesses are weakly ordered This is the case even if the data cache is disabled in the HIDO register as it is out of hard reset If I O devices require load and store accesses to occur in strict program order strongly ordered translation must be enabled so that the corresponding I bit can be set Also for instruction accesses the default memory access mode bits WIMG are 0b0001 That is instruction accesses are considered cacheable I 0 and the memory is guarded Again instruction cache accesses are considered cacheable even if the instruction cache is disabled in the HIDO register as it is out of hard reset The W and M bits have no effect on the instruction cache For information on the synchronization requirements for changes to MSR IR and MSR DR refer to Synchronization Requirements for Special Registers and for Lookaside Buffers in Chapter 2 Register Set in the Programming Environments Manual 6 3 Block Address Translation The block address translation BAT mechanism in the OEA provides a way to map ranges of effective addresses larger than a single page into contiguous areas of physical memory Such areas can be used for data that is not subject to normal virtual memory handling paging such as a memory mapped display buffer or an extremely large array of numerical data The software model for block address translation in t
163. 00c3 only System software can set PMM when a marked process is running to enable statistics to be gathered only during the execution of the marked process MSR PR and MSR PMM together define a state that the processor supervisor or user and the process marked or unmarked may be in at any time If this state matches an individual state specified in the PMLCan the state for which monitoring is enabled counting is enabled 30 RI Recoverable interrupt for system reset and machine check interrupts 0 Interrupt is not recoverable 1 Interrupt is recoverable 31 LE Little endian mode enable 0 The processor runs in big endian mode 1 The processor runs in little endian mode See Section 3 1 2 Endian Modes and Byte Ordering for a description of the core operating in true little endian mode 1 All reserved bits should be set to zero for future compatibility NOTE The core defines MSR 13 as the power management enable POW bit and MSR 14 as the temporary GPR remapping TGPR bit The e300 allocates MSR 24 is used to enable the critical interrupt and rfci the return from critical interrupt instruction MSR 31 is used in conjunction with HID2 LET to indicate the endian mode of operation of the e300 core These bits are described in Table 2 4 Memory management registers Block address translation BAT registers The device also supports 16 block address translation registers BATs through the use of 2 inde
164. 1 gt 3 slbie 12 3 tlbia 19 tlbie 15 tlbld 14 tlbli 1 4 tlbsyne 13 A OO N A 4 Table A 30 through Table A 44 list the PowerPC instructions grouped by form Name bx Name bcx Table A 29 Lookaside Buffer Management Instructions 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 31 00000 00000 00000 498 0 31 00000 00000 B 434 0 31 00000 00000 00000 370 0 31 00000 00000 B 306 0 31 00000 00000 B 978 0 31 00000 00000 B 1010 0 31 00000 00000 00000 566 0 Supervisor level instruction 64 bit instruction Optional in the PowerPC architecture e300 core implementation specific instruction Instructions Sorted by Form Table A 30 l Form 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD LI AA LK Specific Instruction 18 LI AA LK Table A 31 B Form 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD BO BI BD AA LK Specific Instruction 16 BO BI BD AA LK e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Instruction Set Listings Table A 32 SC Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD 00000 00000 000000000000000 1 0 Specific Instruction sc 17 00000 00000 000000000000000 1 0 Table A 33 D Form
165. 1 1 5 5x 8 11 0101 1 5 5x 8 00 0110 0 6x 2 o 0110 0 6x 4 10 0110 0 6x 8 11 0110 0 6x 8 00 0110 1 6 5x 2 o 0110 1 6 5x 4 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Core Interface Operation Table 8 2 Core PLL Configuration continued Bere dE PLL_CFG 6 Beste VCO divider 10 0110 1 6 5x 8 11 0110 1 6 5x 8 00 0111 0 7x 2 01 0111 0 7x 4 10 0111 0 7x 8 11 0111 0 7x 8 00 0111 1 7 5x 2 01 0111 1 7 5x 4 10 0111 1 7 5x 8 11 0111 1 7 5x 8 00 1000 0 8x 2 o 1000 0 8x 4 10 1000 0 8x 8 11 1000 0 8x 8 00 1000 1 8 5x 2 01 1000 1 8 5x 4 10 1000 1 DS 8 11 1000 1 DS 8 00 1001 0 9x 2 01 1001 0 9x 4 10 1001 0 9x 8 11 1001 0 9x 8 00 1001 1 9 5x 2 01 1001 1 9 5x 4 10 1001 1 9 5x 8 11 1001 1 9 5x 8 00 1010 0 10x 2 o 1010 0 10x 4 10 1010 0 10x 8 11 1010 0 10x 8 00 1010 1 10 5x 2 01 1010 1 10 5x 4 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Core Interface Operation Table 8 2 Core PLL Configuration continued Zar dE PLL_CFG 6 Beste VCO divider 10 1010 1 10 5x 8 11 1010 1 10 5x 8 00 1011 0 11x 2 01 1011 0 11x 4 10 1011 0 11x 8 11 1011 0 11x 8 00 1011 1 11 5x 2 01 1011 1 11 5x 4 10 1011 1 11 5x 8 11 1011 1 11 5x 8 00 1100 0 12x 2 01 1100 0 12x
166. 1 7 PLL phase locked loop 9 2 PLL configuration HIDn 2 15 8 4 PMC0 3 performance monitor counter registers 11 5 PMGCO global control register 0 11 3 PMLCa0 PMLCa3 performance monitor local control registers A 0 3 11 4 Power management default power state 9 2 doze mode 9 2 9 3 enabling with MSR POW bit 5 12 9 5 example code 9 6 full power mode 9 2 with DPM disabled 9 3 with DPM enabled 9 3 modes 1 13 9 3 overview 9 1 software considerations 9 5 PowerPC architecture instruction list A 1 A 8 A 15 levels of implementation 1 15 operating environment architecture OEA xxiii overview 1 15 user instruction set architecture UISA xxiii 2 1 virtual environment architecture VEA xxiii Privilege level privileged state see Supervisor mode problem state see User mode Privilege level supervisor or user 5 12 Program interrupt 1 28 5 4 5 28 Program order definition 7 2 Protection of memory areas no execute protection N bit 6 12 options available 6 9 protection violations 6 14 PTEGs PTE groups 6 25 PTEs page table entries 6 25 PVR processor version register 2 6 Q Quiescent state 9 4 quiesce acknowledge signal gack 9 4 quiesce request signal greq 9 4 R Real addresses RA see Memory management unit MMU Reference R bit 6 10 6 21 checking 6 36 maintenance recording 6 10 6 20 6 22 6 29 Registers block address translation registers BATs and cache lockin
167. 10 A 11 A 12 A 13 A 14 A 15 A 16 A 17 A 18 A 19 A 20 A 21 A 22 A 23 A 24 A 25 A 26 A 27 A 29 Tables Page Title Number Single Address Matching Bit Senge 10 5 Two Address OR Matching EE 10 5 Address Matching for Inside Address Range 2 232 c 0cs i occas hese deangoess tech eed deeiedetaion 10 6 Address Matching for Outside Address Range cssscsscsseccetsceecetseccenseeseneescenteneensers 10 6 Performance Monitor Registers Supervisor Level 11 2 Performance Monitor Registers User Level Read Only 11 2 PMGECO Field IDES el ele EE 11 3 PMLCa0 PMLCa3 Field Descriptions ceeeeceeececssccecsecceceecececeececeeceeeecseeeecseeeeseeeeees 11 5 PMCO PMG 3 Field Descriptions eege Seed 11 6 Performance Monitor APU Instructions eet 2 ccecs cee ninnsuiaeninadateeanmienes 11 7 Processor States and PMLCa0 PMLCa3 Bit Senge 11 10 Event KEE 11 11 Performance Monitor Event Selection ies c2 ci vasstdvsiusteatiseaiten saasyedeavedeand waavanaeedats aortas tess 11 12 Complete Instruction List Sorted by Mnemonic eee eesceseeeeeeeneeceeeeseeesaeeceaeeneeseeeeeneees A 1 Complete Instruction List Sorted by Opcode eesessssseessesesesressessesressersresressersresrerrenseesesees A 8 Integer Arithmetic InstructOns sessio ea a ee eee A 15 Integer Compare InstrUcti ns nsei sii ena a e i cent n a T nites A 16 Integer Logical Instructions eene ENEE EENS A 16 ILE Per e EE TEE A 16 Integer Shift Instr ucuom
168. 10000 Performance monitor local control a0 PMLCa0 145 00100 10001 Performance monitor local control a1 PMLCa1 146 00100 10010 Performance monitor local control a2 PMLCa2 147 00100 10011 Performance monitor local control a3 PMLCa3 400 01100 10000 Performance monitor global control 0 PMGCO The user level performance monitor registers in Table 11 2 are read only and are accessed with the mfpmr instruction Attempting to write these user level registers in either supervisor or user mode causes an illegal instruction exception Table 11 2 Performance Monitor Registers User Level Read Only Number PMR 0 4 PMR 5 9 Name Abbreviation 0 00000 00000 Performance monitor counter 0 UPMCO 1 00000 00001 Performance monitor counter 1 UPMC1 2 00000 00010 Performance monitor counter 2 UPMC2 3 00000 00011 Performance monitor counter 3 UPMC3 128 00100 00000 Performance monitor local control a0 UPMLCa0d 129 00100 00001 Performance monitor local control a1 UPMLCa1 130 00100 00010 Performance monitor local control a2 UPMLCa2 131 00100 00011 Performance monitor local control a3 UPMLCa3 384 01100 00000 Performance monitor global control 0 UPMGCO e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Performance Monitor 11 2 1 Global Control Register 0 PMGCO The performance monitor global control register PMGCO shown in Figure 11 1
169. 10110 SPRG6 2 Supervisor 279 01000 10111 SPRG7 2 Supervisor 284 01000 11100 TBL Supervisor 285 01000 11101 TBU Supervisor 286 01000 11110 SVR Supervisor 287 01000 11111 PVR Supervisor 309 01001 10101 IBCR 2 Supervisor 310 01001 10110 DBCR Supervisor 311 01001 10111 MBAR 2 Supervisor 317 01001 11101 DABR2 2 Supervisor 528 10000 10000 IBATOU Supervisor 529 10000 10001 IBATOL Supervisor 530 10000 10010 IBAT1U Supervisor 531 10000 10011 IBAT1L Supervisor 532 10000 10100 IBAT2U Supervisor 533 10000 10101 IBAT2L Supervisor 534 10000 10110 IBAT3U Supervisor 535 10000 10111 IBAT3L Supervisor 536 10000 11000 DBATOU Supervisor 537 10000 11001 DBATOL Supervisor 538 10000 11010 DBAT1U Supervisor 539 10000 11011 DBATIL Supervisor 540 10000 11100 DBAT2U Supervisor 541 10000 11101 DBAT2L Supervisor 542 10000 11110 DBAT3U Supervisor 543 10000 11111 DBAT3L Supervisor 560 10001 10000 IBAT4U 2 Supervisor e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Table 3 32 Implementation Specific SPR Encodings mfspr continued SPR Register Name Access Decimal spr 5 9 spr 0 4 561 10001 10001 IBAT4L 2 Supervisor 562 10001 10010 IBAT5U 2 Supervisor 563 10001 10011 IBAT5L 2 Supervisor 564 10001 10100 IBAT6U 2 Supervisor 565 10001 10101 IBAT6L 2 Supervisor 566 10001 10110 IBAT7U 2 Supervisor 567 10001
170. 2 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D A ds XO OPCD S A ds XO Specific Instructions ld 58 D A ds 0 Idu 58 D A ds 1 Iwa 58 D A ds 2 std 62 S A ds 0 stdu 62 S A ds 1 1 64 bit instruction e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Name andx andcx cmp cmpl entizdx cntlzwx dcbf Table A 35 X Form Instruction Set Listings 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D A B XO 0 OPCD D A NB XO 0 OPCD D 00000 B XO 0 OPCD D 00000 00000 XO 0 OPCD D 0 SR 00000 XO 0 OPCD S A B XO Rc OPCD S A B XO 1 OPCD S A B XO 0 OPCD S A NB XO 0 OPCD S A 00000 XO Rc OPCD S 00000 B XO 0 OPCD S 00000 00000 XO 0 OPCD S 0 SR 00000 XO 0 OPCD S A SH XO Rc OPCD cfD O L A B XO 0 OPCD crfD 00 A B XO 0 OPCD crfD 00 crfS 00 00000 XO 0 OPCD crfD 00 00000 00000 XO 0 OPCD crfD 00 00000 IMM 0 XO Re OPCD TO A B XO 0 OPCD D 00000 B XO Re OPCD D 00000 00000 XO Re OPCD crbD 00000 00000 XO Re OPCD 00000 A B XO 0 OPCD 00000 00000 B XO 0 OPCD 00000 00000 00000 XO 0 Specific Instructions 31 A B 28 Re 31 A B 60 Re 31 cD O L A B 0 0 31 cfiD Dit A B 32 0 31 S A 00000 58 Rc 31 S A 00000 26 Rc 31 00000 A B 86 0 e300 Power Architectu
171. 25 26 27 28 29 30 31 OPCD A B mb XO Re OPCD A B me XO Re Specific Instructions ridelx 30 S A mb 8 Rc rldcrx 30 S A me 9 Rc 1 64 bit instruction e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor Instruction Set Listings Ab Instruction Set Legend Table A 45 provides general information on the PowerPC instruction set such as the architectural level privilege level and form Table A 45 PowerPC Instruction Set Legend UISA VEA OEA Supervisor Level Optional 64 Bit Form addx V XO addcx V XO addex V XO addi V addic V addic V addis V addmex V xO addzex V XO andx V X andcx V X andi V D andis V D bx V l bcx V B bectrx V XL belrx V XL cmp V X cmpi V D cmpl V X cmpli V D cntizdx d d D cntlzwx d xX crand V XL crandc V XL creqv V XL crnand V XL crnor V XL cror V XL crorc V XL crxor V XL e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Instruction Set Listings Table A 45 PowerPC Instruction Set Legend continued dcbf UISA VEA OEA Supervisor Level Optional 64 Bit Form debi dcbst dcbt dcbtst dcbz lt lt lt divdx divdux divwx divwux lt lt ay eciwx 3 ecowx 3
172. 266 Rc addcx 31 D A B OE 10 Rc addex 31 D A B OE 138 Rc addi 14 D A SIMM addic 12 D A SIMM addic 13 D A SIMM addis 15 D A SIMM addmex 31 D A 00000 JOE 234 Rc addzex 31 D A 00000 JOE 202 Rc andx 31 S A B 28 Rc andcx 31 S A B 60 Rc andi 28 S A UIMM andis 29 S A UIMM bx 18 LI AAJLK bcx 16 BO BI BD AA LK bectrx 19 BO BI 00000 528 LK e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Instruction Set Listings Table A 1 Complete Instruction List Sorted by Mnemonic continued Name o 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 bclrx 19 BO BI 00000 16 LK cmp 31 crfD O L A B 0 0 cmpi 11 crfD O L A SIMM cmpl 31 crfD O L A B 32 0 cmpli 10 CD O L A UIMM entizdx 31 S A 00000 58 Rc cntlzwx 31 S A 00000 26 Rc crand 19 crbD crbA crbB 257 0 crandc 19 crbD crbA crbB 129 0 creqv 19 crbD crbA crbB 289 0 crnand 19 crbD crbA crbB 225 0 crnor 19 crbD crbA crbB 33 0 cror 19 crbD crbA crbB 449 0 crorc 19 crbD crbA crbB 417 0 crxor 19 crbD crbA crbB 193 0 dcbf 31 00000 A 86 0 debi 31 00000 B 470 0 dcbst 31 00000 A B 54 0 debt 31 00000 A B 278 0 debtst 31 00000 A B 246 0 dcbz 31 00000 A B 1014 0 divdx 31 D A B OE 489 Re divdux 31 D A B OE 457 Rc divwx 31 D A B OE 491 Rc divwux 31 D A B OE 459 Rc eciwx 3 31 D A B 310 0 ecowx 31 S
173. 3 3 8 ie 19 m 143 1a ala 7 7 e lhota cl olol ilai IPs 6 6 dE I mar olee Figure 7 7 Instruction Timing Cache Miss e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Instruction Timing 7 3 3 Instruction Dispatch and Completion Considerations Several factors affect the ability of the core to dispatch instructions at a peak rate of two per cycle the availability of the execution unit destination rename registers and completion queue as well as the handling of completion serialized instructions Several of these limiting factors are illustrated in the previous instruction timing examples To reduce dispatch unit stalls due to instruction data dependencies the core provides a single entry reservation station for the FPU SRU and each IU and a two entry reservation station for the LSU If a data dependency keeps an instruction from starting execution that instruction is dispatched to the reservation station associated with its execution unit and the rename registers are assigned thereby freeing the positions in the instruction queue so instructions can be dispatched to other execution units Execution begins during the same clock cycle that the rename buffer is updated with the data the instruction is dependent on If both instructions in IQO and IQ1 require the same execution unit the instruction in IQ1 cannot be dispatched until the first instruction proceeds through the pipel
174. 3 7 Memory Operands i 25 c09 4 aes ee EREE ee a es Be A 3 7 Effective Address Calculation EE 3 8 KEE le ET 3 8 COMEXES VNCHONIZATION EE 3 8 Execution Synchronization EE 3 9 Instruction Related Tute gedeebeeiend e eege need end eegk 3 9 Instruction Set Ehre deeg 3 10 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor v Paragraph Number 3 2 4 3 2 4 1 3 2 4 1 1 3 2 4 1 2 3 2 4 1 3 3 2 4 1 4 i ea 4 2 3 2 4 2 1 3242 2 3 2 4 2 3 3 2 4 2 4 3 2 4 2 5 3 2 4 2 6 3 2 4 3 3 2 4 3 1 3 2 4 3 2 3 2 4 3 3 3 2 4 3 4 3 2 4 3 5 3 2 4 3 6 3 2 4 3 7 3 2 4 3 8 3 2 4 3 9 3 2 4 3 10 3 2 4 4 3 2 4 4 1 3 2 4 4 2 3 2 4 4 3 3 2 4 5 3 2 4 6 3 2 4 6 1 3 2 4 7 3 2 5 3 2 5 1 3 2 5 2 3 2 5 3 3 2 6 3 2 6 1 3 2 6 2 3 2 6 2 1 3 2 6 2 2 Page Title Number PowerPC UISA Instructions esssssssssessesssssecssessesssessessssssessessssnessiessrsscsseersesdessrenssset 3 10 Tnt ger Oste eege cuevaede a teia s ieis 3 10 Integer Arithmetic Instructions ssseeeeesesessesesressessrssressessrssresseesreseenseesresereses 3 10 Integer Compare Instructions EE 3 11 tree Logical Otter enee 3 12 Integer Rotate and Shift Instructions 0 ceeceeeeeceeneeceeececeeneeceeeeeceeeeeenteeeesaes 3 12 Flo tings Point ISH UCH ONS eege EE eg hd 3 13 Floating Point Arithmetic Instructions 2 0 0 0 eeesceceseceeeeececeeeeceeeeeceeeeecneeeesaes 3 14 Floating Point Multiply Add Instructions 20 0 0 eeceeescece
175. 3 SC H g 3 3 sl r ri 1 2 T D vs GA A k CO 0 During clock cycle 0 instructions 0 and 1 are dispatched in the beginning of clock cycle 1 1 Inclock cycle 1 instructions 2 and 3 are fetched in the IQ Instruction 2 is a branch instruction that updates the CTR and instruction 3 is a mulhw instruction on which instruction 4 depends Instruction 0 enters the IU Instruction 1 has a single cycle stall 2 In clock cycle 2 instructions 4 a second be instruction and 5 are fetched The second be instruction is predicted as taken It can be folded but it cannot be resolved until instruction 3 writes back Instruction 0 completes at the end of this cycle Instruction 1 is dispatched to the IU Instruction 2 takes entry in the CQ 3 Inclock cycle 3 target instruction TO and T1 are fetched Instructions and 2 complete instruction 4 has been folded and instruction 5 has been flushed from the IQ Instruction 3 is assigned to CQ2 4 Inclock cycle 4 target instructions T2 and T3 are fetched IU instructions TO and T1 have multiple stalls as one execution possible in a clock cycle Instruction 3 is assigned to CQO 5 Inclock cycle 5 instruction 3 on which the second branch instruction depended writes back and the branch prediction is proven incorrect Even though TO is in CQO where it could be written back it is not because the prediction was incorrect All target instructions are flushed from their positions in the pipeline at the end of th
176. 32 bit implementations such as the e300 core The class is determined by examining the primary opcode and the extended opcode if any If either is not that of a defined instruction or of a reserved instruction the instruction is illegal In future versions of the PowerPC architecture instruction codings that are now illegal may become assigned to instructions in the architecture or may be reserved by being assigned to processor specific instructions 3 2 1 1 Definition of Boundedly Undefined If instructions are encoded with incorrectly set bits in reserved fields the results on execution can be said to be boundedly undefined If a user level program executes the incorrectly coded instruction the resulting undefined results are bounded in that a spurious change from user to supervisor state is not allowed and the level of privilege exercised by the program in relation to memory access and other system resources e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Instruction Set Model cannot be exceeded Boundedly undefined results for a given instruction may vary between implementations and between execution attempts in the same implementation 3 2 1 2 Defined Instruction Class Defined instructions are guaranteed to be supported in all PowerPC implementations except as stated in the instruction descriptions in Chapter 8 Instruction Set in the Programming Environments Manual The e300
177. 4 CE Critical interrupt enable e300 implementation specific O Critical interrupts disabled 1 Critical interrupts enabled critical interrupt and rfci instruction enabled The critical interrupt is an asynchronous implementation specific interrupt The critical interrupt vector offset is OxOOAO0 The rfci instruction is implemented to return from these interrupt handlers Also CSRRO and CSRR1 are used to save and restore the processor state for critical interrupts e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Register Model Table 2 4 MSR Bit Settings continued Bits Name Description 25 IP Interrupt prefix The setting of this bit specifies whether an interrupt vector offset is prepended with Fs or Os In the following description nnnnn is the offset of the interrupt See Table 5 2 0 Interrupts are vectored to the physical address 0x000n_nnnn 1 Interrupts are vectored to the physical address OxFFFn_nnnn 26 ID Instruction address translation 0 Instruction address translation is disabled 1 Instruction address translation is enabled See Chapter 6 Memory Management 27 DR _ Data address translation 0 Data address translation is disabled 1 Data address translation is enabled See Chapter 6 Memory Management 28 29 Reserved Full function Bit 29 reserved on e300c1 and e300c2 only 29 PMM Performance monitor mark bit e3
178. 5 4 5 28 trap instr exceptions program interrupt 5 4 5 28 Execution synchronization 3 9 Execution timing cache related latency cache inhibited pages 7 25 i cache arbitration 7 10 i cache hit 7 11 definitions branch folding 7 1 branch prediction 7 1 branch resolution 7 1 completion 7 1 finish 7 1 latency 7 1 retirement 7 2 stall 7 2 throughput 7 2 examples cache hit case 7 12 cache miss case 7 15 7 22 7 23 C 3 execution units 7 3 branch processing unit BPU 7 18 7 21 branch folding 7 1 7 18 branch prediction 7 1 7 19 branch resolution 7 1 7 26 floating point unit FPU 7 23 7 31 integer unit IU 7 3 7 21 7 29 load store unit LSU 7 24 7 32 system register unit SRU 7 24 7 28 7 29 instruction flow 7 9 instruction latency summary 7 28 instruction pipeline stages 7 1 7 2 7 4 7 5 7 8 completion 7 1 7 2 completion considerations 7 16 resource requirements 7 27 dispatch dispatch considerations 7 16 resource requirements 7 27 finish 7 1 write back 7 2 instruction queue IQ 7 9 instruction scheduling guidelines 7 26 memory performance considerations 7 24 memory coherency required M bit 7 25 rename registers 7 2 7 16 reservation stations 7 2 External input int interrupt 5 4 5 25 enabling with MSR EE 5 12 External system logic 9 2 Finish cycle definition 7 1 see also Execution timing Floating point model enabling FP available MSR FP
179. 5 locked 0b111 Ways 0 1 2 3 4 5 and 6 locked e300 Power Architecture Core Family Reference Manual Rev 3 38 Freescale Semiconductor Instruction and Data Cache Operation Table 4 16 shows the HID2 DWLCK 0 2 settings for the e300c2 and e300c3 core embedded processor Table 4 17 e300c2 and e300c3 Core DWLCK 0 2 Encodings DWLCK 0 2 Ways Locked 0b000 No ways locked 0b001 Way 0 locked 0b010 Ways 0 and 1 locked 0b011 Ways 0 1 and 2 locked 0b100 Reserved Ways 0 1 and 2 locked 0b101 Reserved Ways 0 1 and 2 locked 0b110 Reserved Ways 0 1 and 2 locked 0b111 Reserved Ways 0 1 and 2 locked Note that on the e300c2 and e300c3 values greater than 0b011 are reserved but default to the maximum number of ways locked Ways 0 1 and 2 The following assembly code locks way 0 of the e300 core data cache Lock way 0 of the data cache This corresponds to setting dwlck 0 2 0b001 bits 24 26 mfspr rl HID2Z lis r2 OxFFFF ori r2 r2 OxFFIF and Fly Eh 22 ori rl xl 0x0020 sync mtspr HID2 rl isync 4 10 3 1 8 Invalidating the Data Cache Even if Locked There are two methods for invalidating the instruction or data cache e Invalidate the entire cache by setting and then immediately clearing the data cache flash invalidate bit HIDO DCFI bit 21 Even when a cache is locked toggling the DCFI bit invalidates all of the data cache e
180. 7 Ifa match is not found step 6 is repeated for each of the other seven PTEs in the secondary PTEG 8 Ifa match is found the PTE is written into the on chip TLB and the R bit is updated in the PTE in memory if necessary If there is no memory protection violation the C bit is also updated in memory and the table search is complete 9 Ifno match is found in the eight PTEs of the secondary PTEG the search fails and a page fault interrupt condition occurs either an ISI or DSI interrupt Note that the software routines that implement this algorithm must synthesize this condition by appropriately setting the SRR1 or DSISR and branching to the ISI or DSI handler routine Reads from memory for table search operations should be performed as global but not exclusive cacheable operations and can be loaded into the on chip cache Figure 6 9 and Figure 6 10 provide conceptual flow diagrams of primary and secondary page table search operations as described in the OEA for 32 bit processors Recall that the architecture allows implementations to perform the page table search operations automatically in hardware or with software assistance may be required as is the case with the e300 core Also the elements in the figure that apply to TLBs are shown as optional because TLBs are not required by the architecture e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Memory Management Primary Page
181. 7 2 1 30 2 1 2 12 2 20 Chapter 2 2 1 1 2 5 2 1 1 2 6 2 1 1 2 8 Changes Add preface Remove references to softstop Add indexing Add Appendix C Revision History Change references of e300v1 to e300c1 to denote e300 configuration 1 Change references of e300 to e300c1 where an implementation specific feature was explained Change term exception to interrupt where applicable Remove caveats that debi should not be used on e300 Change references to JTAG COP to JTAG debug Add references to Instruction Cache Block Touch icbt instruction Change wording to describe references to e300 and e300c1 Change LRU to PLRU when describing cache replacement policy Change LRU to PLRU when describing cache replacement policy Remove bullet describing the EAR register and the eciwx ecowx instructions no longer supported Include other address breakpoint registers as SPRs in the e300 Change Table 1 2 to show that ISI exceptions are not caused when an instruction fetch cannot be performed when the fetch accesses a direct store segment because direct stores are no longer supported Change Table 1 2 to show that Imw and stmw do not cause an alignment exception when in true little endian mode Change Table 1 2 to show that alignment exceptions are not caused when crossing into a direct store segment because direct stores are no longer supported Change signals description to reflect a simpler introducti
182. 8 Re fdivsx 59 D A B 00000 18 Re fmaddx 63 D A B C 29 Rc fmaddsx 59 D A B C 29 Rc fmsubx 63 D A B C 28 Rc fmsubsx 59 D A B C 28 Rc fmulx 63 D A 00000 C 25 Rc fmulsx 59 D A 00000 C 25 Rc fnmaddx 63 D A B C 31 Rc fnmaddsx 59 D A B C 31 Rc fnmsubx 63 D A B C 30 Re fnmsubsx 59 D A B C 30 Rc fresx 59 D 00000 B 00000 24 Rc frsqrtex 63 D 00000 B 00000 26 Rc fselx 63 D A B C 23 Rc fsqrtx 63 D 00000 B 00000 22 Rc fsqrtsx 59 D 00000 B 00000 22 Rc fsubx 63 D A B 00000 20 Rc fsubsx 59 D A B 00000 20 Rc e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Instruction Set Listings l Optional in the PowerPC architecture Table A 42 M Form Name 6 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD A SH MB ME Rc OPCD S A B MB ME Rc Specific Instructions rlwimix 20 S A SH MB ME Rc rlwinmx 21 A SH MB ME Re rlwnmx 23 S A B MB ME Rc Table A 43 MD Form Name 6 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD S A sh mb XO ebe OPCD A sh me XO ebe Specific Instructions ridicx 30 S A sh mb 2 sh Re ridiclx 30 S A sh mb D sh Re rldicrx 30 S A sh me 1 sh Re ridimix 30 S A sh mb 3 sh Re 1 64 bit instruction Table A 44 MDS Form Name 6 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
183. 8 VXISI Floating point invalid operation exception for This is a sticky bit 9 VXIDI Floating point invalid operation exception for This is a sticky bit 10 VXZDZ_ Floating point invalid operation exception for O 0 This is a sticky bit 11 VXIMZ_ Floating point invalid operation exception for 0 This is a sticky bit 12 VXVC_ Floating point invalid operation exception for invalid compare This is a sticky bit 13 FR Floating point fraction rounded The last arithmetic rounding or conversion instruction incremented the fraction This bit is not sticky 14 Fl Floating point fraction inexact The last arithmetic rounding or conversion instruction either produced an inexact result during rounding or caused a disabled overflow exception This is not a sticky bit For more information regarding the relationship between FPSCR FI and FPSCR XX see the description of the FPSCR XX bit e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Register Model Table 2 1 FPSCR Bit Settings continued Bits Name Description 15 19 FPRF Floating point result flags For arithmetic rounding and conversion instructions FPRF is based on the result placed into the target register except that if any portion of the result is undefined the value placed here is undefined 15 Floating point result class descriptor C Arithmetic rounding and conv
184. 9 20 21 22 23 24 25 26 27 28 29 30 31 fabsx 63 D 00000 B 264 Re fmrx 63 D 00000 B 72 Re fnabsx 63 D 00000 B 136 Re fnegx 63 D 00000 B 40 Re Table A 22 Branch Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 bx 18 LI AA LK bcx 16 BO BI BD AA LK bectrx 19 BO Bl 00000 528 LK belrx 19 BO BI 00000 16 LK Table A 23 Condition Register Logical Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 crand 19 crbD crbA crbB 257 0 crandc 19 crbD crbA crbB 129 0 creqv 19 crbD crbA crbB 289 0 crnand 19 crbD crbA crbB 225 0 crnor 19 crbD crbA crbB 33 0 cror 19 crbD crbA crbB 449 0 crorc 19 crbD crbA crbB 417 0 crxor 19 crbD crbA crbB 193 0 merf 19 crfD 00 crfS 00 00000 0000000000 0 Table A 24 System Linkage Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 rfi 1 19 00000 00000 00000 50 0 sc 17 00000 00000 000000000000000 1 0 d Supervisor level instruction e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Instruction Set Listings Table A 25 Trap Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 td 31 TO A B 68 0 tdi 1 03 TO A SIMM tw 31 TO A B 4 0
185. A lt rB TLB entry created from DCMP and RPA DTLB entry selected by EA 15 19 and SRR1 WAY lt created TLB entry The EA is the contents of rB The tlbld instruction loads the contents of the data PTE compare DCMP and required physical address RPA registers into the first word of the selected data TLB entry The specific DTLB entry to be loaded is selected by the EA and SRR1 WAY bit The tlbld instruction should only be executed when address translation is disabled MSR IR 0 and MSR DR 0 Note that it is possible to execute the tlbld instruction when address translation is enabled however extreme caution should be used in doing so If data address translation is set MSR DR 1 tlbld must be preceded by a sync instruction and succeeded by a context synchronizing instruction Also note that care should be taken to avoid modification of the instruction TLB entries that translate current instruction prefetch addresses This is a supervisor level instruction it is also an e300 core specific instruction and not part of the PowerPC instruction set Other registers altered e None e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Instruction Set Model tibli tlbli Load Instruction TLB Entry Integer Unit tlbld rB 0 5 6 10 11 15 16 20 21 31 31 00000 00000 B 1010 0 EA lt rB TLB entry created from ICMP and RPA ITLB entry selected by E
186. A 15 19 and SRR1 WAY lt created TLB entry The EA is the contents of rB The tlbli instruction loads the contents of the instruction PTE compare ICMP and required physical address RPA registers into the first word of the selected instruction TLB entry The specific ITLB entry to be loaded is selected by the EA and SRR1 WAY bit The tlbli instruction should only be executed when address translation is disabled MSR IR 0 and MSR DR 0 Note that it is possible to execute the tlbld instruction when address translation is enabled however extreme caution should be used in doing so If instruction address translation is set MSR IR 1 tlbli must be followed by a context synchronizing instruction such as isync or rfi Also note that care should be taken to avoid modification of the instruction TLB entries that translate current instruction prefetch addresses This is a supervisor level instruction it is also an e300 core specific instruction and not part of the PowerPC instruction set Other registers altered e None e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 37 Instruction Set Model rfci rfci Return from Critical Interrupt 0 5 10 11 15 16 20 21 30 31 19 00 000 0 0000 0000 0 51 0 MSR 16 27 30 31 lt CSRR1 16 27 30 31 NIA iea CS RRO 0 29 Ob00 Bits CSRR1 16 27 30 31 are placed into the corresponding bits of the MSR
187. ABR2 values must not be set to match within the instruction address breakpoint interrupt handler and the DABR or DABR2 values must not be set to match within the DSI interrupt handler Setting a breakpoint within the instruction address breakpoint interrupt or DSI handler may result in an unrecoverable and indeterminate processor core state e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Debug Features If an IABR match and DABR match occur on the same instruction the instruction address breakpoint interrupt is taken before the DSI interrupt If an IABR match occurs on a branch instruction the instruction address breakpoint interrupt is set to the effective address of the branch instruction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Debug Features e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Chapter 11 Performance Monitor This chapter describes the performance monitor as implemented in the e300c3 core The programming model is similar to that defined by the EIS some features are defined by the e300c3 implementation in particular the events that can be counted 11 1 Overview The performance monitor provides the ability to count predefined events and processor clocks associated with particular operations for example cache misses mispredicted branches or the number of cycles an execu
188. ASH2 If the PTE is also not found in the eight entries of the secondary page table a page fault condition exists and a page fault interrupt must be synthesized Thus the appropriate bits must be set in SRR1 or DSISR and the TLB miss handler must branch to either the ISI or DSI interrupt handler which handles the page fault condition The following section provides a flow diagram outlining some example software that can be used to handle the three TLB miss interrupts and sample assembly language that implements that flow 6 5 2 2 1 Flow for Example Interrupt Handlers Figure 6 15 shows the flow for the example TLB miss interrupt handlers The flow shown is common for the three interrupt handlers except that the IMISS and ICMP registers are used for the instruction TLB miss interrupt while the DMISS and DCMP registers are used for the two data TLB miss interrupts Also for the cases of store instructions that cause either a TLB miss or require a table search operation to update the C bit the flow shows that the C bit is set in both the TLB entry and PTE in memory Finally in the case of a page fault no PTE found in the table search operation the setup for the ISI or DSI interrupt is slightly different for these two cases e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor Memory Management TLB Miss Interrupt Save Old Counter and CRO Bits Set Counter cnt lt 8 Load Primary PTEG Pointe
189. B 792 Rc srawix 31 S A SH 824 Rc srdx 31 S A B 539 Rc srwx 31 S A B 536 Rc stbux 31 S A B 247 0 stbx 31 S A B 215 0 stdex 31 S A B 214 1 stdux 31 S A B 181 0 stdx 31 S A B 149 0 stfdux 31 S A B 759 0 stfdx 31 S A B 727 0 stfiwx 4 31 S A B 983 0 stfsux 31 S A B 695 0 stfsx 31 S A B 663 0 sthbrx 31 S A B 918 0 sthux 31 S A B 439 0 sthx 31 S A B 407 0 stswi 7 31 S A NB 725 0 stswx 3 31 S A B 661 0 stwbrx 31 S A B 662 0 stwcx 31 S A B 150 1 stwux 31 S A B 183 0 stwx 31 S A B 151 0 sync 31 00000 00000 00000 598 0 td 31 TO A B 68 0 tlbia 24 31 00000 00000 00000 370 0 tlbie 2 4 31 00000 00000 B 306 0 tlbld 2 5 31 00000 00000 B 978 0 tlbli gt 5 31 00000 00000 B 1010 0 tlbsync 4 31 00000 00000 00000 566 0 tw 31 TO A B 4 0 xorx 31 S A B 316 Rc 1 64 bit instruction 2 Supervisor and user level instruction 3 Load and store string or multiple instruction 4 Optional in the PowerPC architecture e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Instruction Set Listings 5 e300 core implementation specific instruction Table A 36 XL Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD BO BI 00000 XO LK OPCD crbD crbA crbB XO 0 OPCD crfD 00 crfS 00 00000 XO 0 OPCD 00000 00000 00000 XO 0 Spec
190. B entry requires updating See Section 6 4 1 3 Scenarios for Reference and Change Bit Recording for more details e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Memory Management 6 1 6 General Flow of MMU Address Translation The following sections describe the general flow used by processors that implement the PowerPC architecture to translate effective addresses to virtual and then physical addresses 6 1 6 1 Real Addressing Mode and Block Address Translation Selection When an instruction or data access is generated and the corresponding instruction or data translation is disabled MSR IR 0 or MSR DR 0 real addressing mode translation is used physical address equals effective address and the access continues to the memory subsystem as described in Section 6 2 Real Addressing Mode Figure 6 5 shows the flow used by the MMUs in determining whether to select real addressing mode block address translation or to use the segment descriptor to select page address translation Note that if the BAT array search results in a hit the access is qualified with the appropriate protection bits If the access violates the protection mechanism an interrupt ISI or DSI interrupt is generated Effective Address Generated l Access D Access Instruction Wetten Dat Data Translation Disabled ege lon A ata Translation Disabled MSR IR 0 Translation Enabled Translation Enabl
191. BAT7U o 14 15 18 19 29 30 31 R BEPI BL Vs Vp W Reset All zeros Figure 2 12 Upper BAT Register SPR 529 IBATOL 537 DBATOL Access Supervisor read write 531 IBAT1L 539 DBAT1L 533 IBAT2L 541 DBAT2L 535 IBAT3L 543 DBAT3L 561 IBAT4L 569 DBAT4L 563 IBAT5L 571 DBAT5L 565 IBAT6L 573 DBAT6L 567 IBAT7L 575 DBAT7UI 0 14 15 24 25 28 29 30 31 R BRPN WIMG PP W Reset All zeros 1 W and G bits are not defined for IBAT registers Attempting to write to these bits causes boundedly undefined results Figure 2 13 Lower BAT Register The BAT registers contain the effective to physical address mappings for blocks of memory This mapping includes the effective address bits that are compared with the effective address of the access the memory cache access mode bits WIMG and the protection bits for the block The size of the block and the starting address of the block are defined by the physical block number BRPN and block size mask BL fields The sixteen new BAT registers are enabled by HID2 HBE However regardless of the setting of this bit the BAT registers are accessible by the mfspr and mtspr instructions and are only accessible to supervisor level programs See Section 2 2 3 Hardware Implementation Register 2 HID2 for more information on the HBE bit 2 2 9 Critical Interrupt Save Restore Register 0 CSRRO CSRRO is used to save the return address of critical interrupts I
192. C ID This required field is used to identify the SoC device 16 19 PROC Process revision field This optional field is used to indicate different process revisions of the SoC 20 23 MFG _ Manufacturing revision This optional field identifies uniquely different manufacturing revisions of the SoC 24 27 MJREV Major SoC design revision indicator This is a required field 28 31 MNREV Minor SoC design revision indicator This is a required field 1 The SoC value is an optional field assigned by the SoC design integrator 2 2 13 System Memory Base Address MBAR The core implements a new memory base address register MBAR to support the system level memory map The MBAR can be accessed with mtspr or mfspr using SPR311 in supervisor mode The present memory base address for the system memory map is stored in this register It is important to ensure that the present value of the base offset is current in the system memory 2 2 14 Instruction Address Breakpoint Registers ADR and IABR2 The IABR shown in Figure 2 19 controls the instruction address breakpoint interrupt In the core an additional address breakpoint register IABR2 is implemented ABR CEA holds an effective address to which each instruction s address is compared The interrupt is enabled by setting IABR BE The interrupt is taken when there is an instruction address breakpoint match on the next instruction to complete The instruction
193. CMP and RPA registers into the ITLB specific tlbid Load Data TLB Entry implementation Loads the contents of the DCMP and RPA registers into the DTLB specific 1 These instructions are defined by the PowerPC architecture but are optional Table 6 6 summarizes the registers that the operating system uses to program the e300 core MMUs These registers are accessible to supervisor level software only These registers are described in Chapter 2 Register Set in the Programming Environments Manual For e300 core specific registers see Chapter 2 Register Model of this book Table 6 6 MMU Registers Register Description Segment registers SRO SR15 The sixteen 32 bit segment registers are present only in 32 bit implementations of the PowerPC architecture The fields in the segment register are interpreted differently depending on the value of bit 0 The segment registers are accessed by the mtsr mtsrin mfsr and mfsrin instructions BAT registers e300 IBATOU IBAT7U IBATOL IBAT7L DBATOU DBAT7U and DBATOL DBAT7L The e300 core has 32 BAT registers organized as 8 pairs of instruction BAT registers IBATOU IBAT7U paired with IBATOL IBAT7L and 8 pairs of data BAT registers DBATOU DBAT7U paired with DBATOL DBAT7L The BAT registers are defined as 32 bit registers in 32 bit implementations These are special purpose registers that are accessed by the mtspr and mfsp
194. DO ILOCK will lock all ways in all e300 cores 19 ICWP Instruction cache way protection Used to protect locked ways in the instruction cache from being invalidated 0 Instruction cache way protection disabled 1 Instruction cache way protection enabled 20 23 Reserved should be cleared e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Register Model Table 2 8 e300 HID2 Field Descriptions continued Bits Name Description 24 26 DWLCK 0 2 Data cache way lock Useful for locking blocks of data into the data cache for time critical applications where deterministic behavior is required 000 no ways locked 001 way 0 locked 010 way 0 through way 1 locked 011 way 0 through way 2 locked 100 way 0 through way 3 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 101 way 0 through way 4 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 110 way 0 through way 5 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 111 way 0 through way 6 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 Setting HIDO DLOCK will lock all ways in all e300cores 27 31 Reserved should be cleared 2 2 4 Data and Instruction TLB Miss Address Registers DMISS and IMISS DMISS and IMISS shown in Figure 2 8 are loaded automatically on a data or instruction TLB miss DMISS and IMISS contain the effective address of the a
195. Dy Hard E E 5 19 5 13 Soft Reset Interrupt Register Settings c scssvssadiscseccatssscosdesasssceceastecdesdesdaasanusececvansccedaseteeaye 5 20 5 14 Machine Check Interrupt Register Settings sssessssssesssesesseeessressresseesseeesseesseesseesseeessees 5 22 5 15 DSI Interrupt Register Settings see 5 23 5 16 External Interr pt Ree ister Settings ee Eed 5 25 5 17 Alignment Interrupt Register Settings eessesesssseseesessresressessrerrtsstsstesrersteseesrrssreseeseresreses 5 26 5 18 KEE 5 27 5 19 Critical Interrupt Register Settings esesseesssseesseesseesseeesseessstessressersseessseeesseesresseeesseeessets 5 31 5 20 Trace Interrupt Register Sete s issceacds coasaetivsdcasdeny acassasien sss sedaaasedactecandendy neceetendepece 5 32 5 21 Instruction and Data TLB Miss Interrupts Register Setting 5 34 5 22 Instruction Address Breakpoint Interrupt Reegister Settings 0 0 ee eeeceeeeereeeeteeeeeteeeeeees 5 35 5 23 Breakpoint Action for Multiple Modes Enabled for the Same Address 5 35 5 24 System Management Interrupt Register Settngs 0 ee eeeeseeeseceeeeeeeceaeeeeeeeeeeeeaeessaeens 5 36 6 1 MMU Features Summary NEESS ee REESEN EAE EE Seed 6 2 6 2 Access Protection Options for Pages geet eueeeeete geen eodege EEGEN E ent 6 9 6 3 Translation Exception Comditions viscc issjcccssoaiaccediscceessdcedeawasdeaces onsdec sa desdsananusede svsnddeessaeneanes 6 14 6 4 Other MMU Exception CondipOns cniautis ana
196. FCU FCM1 FCM0 CE EVENT Reset All zeros Figure 11 2 Local Control A Registers PMLCa0 PMLCa3 User Local Control A Registers UPMLCa0 UPMLCa3 e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Performance Monitor PMLCa registers are cleared by a hard reset Table 11 4 describes PMLCa fields Table 11 4 PMLCa0 PMLCa3 Field Descriptions Bits Name Description 0 FC Freeze counter 0 The PMC can be incremented if enabled by other performance monitor control fields 1 The PMC cannot be incremented 1 FCS Freeze counter in supervisor state 0 The PMC can be incremented if enabled by other performance monitor control fields 1 The PMC cannot be incremented if MSR PR is cleared 2 FCU Freeze counter in user state 0 The PMC can be incremented if enabled by other performance monitor control fields 1 The PMC cannot be incremented if MSR PR is set 3 FCM1 Freeze counter while mark is set 0 The PMC can be incremented if enabled by other performance monitor control fields 1 The PMC cannot be incremented if MSR PMM is set 4 FCMO Freeze counter while mark is cleared 0 The PMC can be incremented if enabled by other performance monitor control fields 1 The PMC cannot be incremented if MSR PMM is cleared 537 CE Condition enable 0 Overflow conditions for PMCn cannot occur PMCn cannot cause interrupts or freeze counters 1 Overflow condition
197. IDO EMCP is cleared Change statement describing a DSI interrupt caused by a translation error or protection violation Instead of stating that the DAR points to the byte address of the offending page now states that entire instruction should be re executed This is due to a previously documented errata Remove load multiple instructions as instructions that can be partially executed on a DSI interrupt e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Revision History 5 5 4 5 25 Remove statement that an ISI interrupt occurs when an attempt is made to fetch an instruction from guarded memory when MSR IR 1 because no ISI occurs with guarded bit on 5 5 6 5 26 Remove Imw and stmw from list of instructions mentioned that cause an alignment exception when in little endian mode 5 0 5 1 Add reference to CSRRO CSRR1 as save restore registers for critical interrupts 5 1 5 4 Remove references to direct store segments from Table 5 2 5 2 1 1 5 10 Change Table 5 4 on SRR1 bit settings on an MCE to match the settings on Table 5 14 5 5 3 5 23 Remove mention of direct stores in description of bit 0 of DSISR in Table 5 14 Chapter 6 Change terminology to Reference and Change bits from Referenced and Changed bits 10 1 10 1 Add mention of address breakpoint interrupt and DSI interrupt when breakpoint conditions are met 10 1 6 10 3 Remove Software Debug section as it was beyond the scope of this doc
198. IMG bits as described in Section 4 4 1 Memory Cache Access Attributes WIMG Bits The following sections describe the HIDO HID2 register controls and the cache control instructions 4 5 1 Cache Control Parameters in HIDO and HID2 The L1 caches are controlled by programming specific bits in the HIDO and HID2 special purpose registers This section describes the HID register cache control bits 4 5 1 1 Cache Parity Error Reporting HIDO ECPE Both instruction and data caches support parity generation checking and reporting Cache parity generation is always on and stored in the cache but the reporting of cache parity errors is enabled or disabled by using the enable cache parity error bit HIDO ECPE When cache parity errors are enabled instruction and data cache parity errors are reported by a machine check interrupt 4 5 1 2 Data Cache Enable HIDO DCE The data cache may be disabled by using the data cache enable bit HIDO DCE DCE is cleared on power up disabling the data cache When the data cache is in the disabled state HIDO DCE 0 the cache tag state bits are ignored and all accesses are propagated to the CSB as single beat transactions e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Instruction and Data Cache Operation The changing of the DCE bit must be preceded by a sync instruction to prevent the cache from being enabled or disabled in the middle of a data acces
199. If the new MSR value does not enable any pending interrupts then the next instruction is fetched under control of the new MSR value from the address CSRRO 0 29 Il 0b00 If the new MSR value enables one or more pending interrupts the interrupt associated with the highest priority pending interrupt is generated in this case the value placed into CSRRO by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred Note that an implementation may define additional MSR bits and in this case may also cause them to be saved to CSRR1 from MSR on an interrupt and restored to MSR from CSRRI on an rfci This is a supervisor level context synchronizing instruction This instruction is defined only for 32 bit implementations Other registers altered e MSR e300 Power Architecture Core Family Reference Manual Rev 3 38 Freescale Semiconductor Instruction Set Model The following instruction has been added to support performance monitor operations mfpmr mfpmr Move From Performance Monitor Register Integer Unit mfpmr rD PMRN O 1 1 1 1 1 rD PMRNS 9 PMRNo 4 o 1 0 10 01 1 Io o 5 6 10 11 15 16 20 21 31 GPR rD lt PMREG PMRN PMRN denotes a performance monitor register as listed in Table 11 1 and Table 11 2 The contents of the designated performance monitor register are placed into GPR rD When MSR PR 1 specifying a
200. Implementation Notes The following describes how the e300 core handles alignment and misaligned accesses e The core provides hardware support for some misaligned memory accesses However misaligned accesses suffer a performance degradation compared to aligned accesses of the same type e300 Power Architecture Core Family Reference Manual Rev 3 2 Freescale Semiconductor Instruction Set Model e The core does not provide hardware support for floating point load store operations that are not word aligned In such a case the core invokes an alignment interrupt and the interrupt handler must break up the misaligned access For this reason floating point single and double word accesses should always be word aligned Note that a floating point double word access on a word aligned boundary requires an extra cycle to complete The e300c2 core does not support any floating point operations Any half word word double word and string reference access that crosses an alignment boundary must be broken into multiple discrete accesses For string accesses the hardware makes no attempt to get aligned to reduce the number of accesses Multiple word accesses are architecturally required to be aligned The resulting performance degradation depends on how well each individual access behaves with respect to the memory hierarchy At a minimum additional cache access cycles are required More dramatically each discrete access to a noncacheable page in
201. Locking csccssccssssscssssscsesssssssseessacssensncesnacescasesennes 4 40 4 19 MSR Bits for Disabling Interrupts c casivsecesisscyeateassaccesavecesastavseaceascecedenspeecdaseasnetessaeasnasdeveaye 4 4 4 20 es00c Gore TWECKIO 2 Sem O Cin gs 255 05 i ones uae iusto EE aed aden 4 43 4 21 e300c2 Core IWLCK 0 2 BMC OMIM GS vg c cass anus coaacaseqaceaessecedsaveecetedesasdeaase coda nedonateoaedagses 4 44 5 1 Interrupt Classiticauions vie 2 sateen Eed ee 5 3 5 2 Interrupts and Exception COnditiOns yc scciereceracavedgesesdaladeas eelere DAN e eier 5 3 5 3 Int rr pt Priorities buede Eege ional dag oad E i sees dada snndea Eddie 5 6 5 4 SRR1 Bit Settings for Machine Check Interrupts 0 eee ceeecceeeseceeeneceesteeeenaeceenaeeeesaeeeeaeeees 5 9 5 5 SRR1 Bit Settings for Program Interrupts 0000 00 eeeececsseceseececeecceceecceceeececseeecseeeesteeeenaeees 5 9 5 6 SRR1 Bit Settings for Software Table Search Operations 0 ceceeeseeeeseeceeeceeeeeeeeeteeeenaes 5 10 5 7 Conventional Uses of SPRGO SPRGF A esdeekigiage dgt ege Eege eege 5 11 e300 Power Architecture Core Family Reference Manual Rev 3 xviii Freescale Semiconductor Tables Table Page Number Title Number 5 8 MSR Bit Setting S serie asror eien an sa n E EEA E E A N E A E R 5 12 5 9 IEEE Floating Point Exception Mode Butz 5 14 5 10 MSR Setting Dueto e EE 5 17 5 11 Hard Reset MSR Value and Interrupt Vector ccc dnt geed 5 18 5 12 Settimes Caused
202. Management Table 6 3 Translation Exception Conditions continued Exception Condition Description Interrupt No execute protection violation Attempt to fetch instruction when SR N 1 ISI interrupt SRR1 3 1 Instruction fetch from direct store Attempt to fetch instruction when SR T 1 ISI interrupt segment SRR1 3 1 Data access to direct store segment Attempt to perform load or store including DSI interrupt including floating point accesses floating point load or store when SR T 1 DSISR 5 1 Instruction fetch from guarded memory Attempt to fetch instruction when MSR IR 1 and ISI interrupt with MSR IR 1 either matching xBAT G 1 or no matching BAT SRR1 3 1 entry and PTE G 1 1 The e300 core hardware does not vector to these interrupts automatically It is assumed that the software that performs the table search operation vectors to these interrupts and sets the appropriate bits when a page fault condition occurs 2 The table search software can also vector to these exception conditions 3 Floating point not supported on the e300c2 In addition to the translation exceptions there are other MMU related conditions some of them defined as implementation specific and therefore not required by the architecture that can cause an interrupt to occur in the core These exception conditions map to the processor interrupt as shown in Table 6 4 For example the core also defines three ex
203. Model Table 3 15 Integer Store Instructions continued Name Mnemonic Operand Syntax Store Word with Update stwu rS d rA Store Word with Update Indexed stwux rS rA rB 3 2 4 3 5 Integer Load and Store with Byte Reverse Instructions Table 3 16 describes integer load and store with byte reverse instructions When used in a system operating with the default big endian byte order these instructions have the effect of loading and storing data in little endian order Likewise when used in a system operating with little endian byte order these instructions have the effect of loading and storing data in big endian order When operating with true little endian byte order these instructions have the effect of loading and storing data in true little endian order For more information about big and little endian byte ordering see Section 3 1 2 Byte Ordering in the Programming Environments Manual For more information about true little endian operation see Section 3 1 2 Endian Modes and Byte Ordering The e300 core supports the true little endian mode In true little endian mode the core treats the memory and I O subsystems as little endian memory In this case instruction and data bytes are reserved as follows e The byte reversing for instruction accesses occurs before the instruction is decoded e The byte reversing occurs for data accesses when the data item is being moved to or from the GPR Therefore byte rever
204. O but cannot be dispatched ahead of one in IQO Instruction state and all information required for completion is kept in the five entry FIFO completion queue A completion queue entry is allocated for each instruction when it is dispatched to an execute unit if no entry is available the dispatch unit stalls A maximum of two instructions per cycle may be completed and retired from the completion queue and the flow of instructions can stall when a longer latency instruction reaches the last position in the completion queue Store instructions and instructions executed by the FPU and SRU except for of integer add and compare instructions can only be retired from the last position in the completion queue Subsequent instructions cannot be completed and retired until that longer latency instruction completes and retires Examples of this are shown in Section 7 3 2 2 Cache Hit and Section 7 3 2 3 Cache Miss The rate of instruction completion is also affected by the ability to write instruction results from the rename registers to the architected registers The core can perform two write back operations from the rename registers to the GPRs each clock cycle but can perform only one write back per cycle to the CR FPR LR and CTR 7 3 2 Instruction Fetch Timing Instruction fetch latency depends on the fetch hits of the on chip instruction cache If no hit occurs a memory transaction is required in which case fetch latency is affecte
205. O accesses are assumed to be memory mapped In addition the MMU provides access protection on a segment block or page basis This chapter describes the specific hardware used to implement the MMU model of the OEA in the core Refer to Chapter 7 Memory Management in the Programming Environments Manual for a complete description of the conceptual model Two general types of accesses generated by processors that implement the PowerPC architecture require address translation instruction accesses and data accesses to memory generated by load and store instructions Generally the address translation mechanism is defined in terms of segment descriptors and page tables defined by the PowerPC architecture for locating the effective to physical address mapping for instruction and data accesses The segment information translates the effective address to an interim virtual address and the page table information translates the virtual address to a physical address The segment descriptors used to generate the interim virtual addresses are stored as on chip segment registers on 32 bit implementations such as the e300 core In addition two translation lookaside buffers TLBs are implemented on the core to keep recently used page address translations on chip Although the OEA describes one MMU conceptually the core hardware maintains separate TLBs and table search resources for instruction and data accesses that can be accessed independently and s
206. Otherwise PTE VSID API H V Segment Descriptor VSID EA API 1 1 Otherwise Secondary Page Table Search Hit Last PTE in PTEG See Figure 6 9 ne Fault Instruction Access Data Access Set SRR1 1 1 Set DSISR 1 1 ISI Interrupt Figure 6 10 Secondary Page Table Search Flow Conceptual Flow DSI Interrupt Figure 6 9 shows the case of a debz instruction that is executed with W 1 or I 1 and that the R bit may be updated in memory if required before the operation is performed or the alignment interrupt occurs The R bit may also be updated by a memory protection violation 6 5 2 Implementation Specific Table Search Operation The e300 core has a set of implementation specific registers interrupts and instructions that facilitate very efficient software searching of the page tables in memory This section describes those resources and provides three example code sequences that can be used in an e300 core system for an efficient search of the translation tables in software These three code sequences can be used as handlers for the three interrupts requiring access to the PTEs in the page tables in memory instruction TLB miss data TLB miss on load and data TLB miss on store interrupts 6 5 2 1 Resources for Table Search Operations In addition to setting up the translation page tables in memory the system software must assist the processor in loading PTEs into the on chip TLBs When a required
207. P General Events Ref 0 Nothing Nonspec Register counter holds current value Ref 1 Processor cycles Nonspec Every processor cycle Ref 2 Instructions completed Nonspec Completed instructions 0 1 or 2 per cycle Com 4 Instructions fetched Spec Fetched instructions 0 1 2 3 or 4 per cycle instructions written to the IQ Com 6 PM_EVENT transitions Spec 0 to 1 transitions on the pm_event input Com 7 PM_EVENT cycles Spec Processor bus cycles that occur when the pm event input is asserted Instruction Types Completed Com 8 Branch instructions completed Nonspec Completed branch instructions Com 9 Load completed Nonspec Completed load I load update 1 load load multiple 1 32 debt L1 CT 0 and debtst L1 CT 0 Com 10 Store completed Nonspec Completed store st store update 1 store store multiple 1 32 icbi dcbf dcbst dcbt CT 1 dcbtst CT 1 dcbz icbt CT 1 Branch Prediction and Execution Events Com 12 Branches finished Spec Includes all branch instructions includes folded branches Com 13 Taken branches finished Spec Includes all taken branch instructions includes folded branches Com 15 Branches mispredicted for any Spec Counts branch instructions mispredicted due to direction target for reason example if the CTR contents change or IAB prediction Does not count instructions that the branch predictor incorrectly predicted to be branches Pipeline Stalls Com 18 Cycles decode stal
208. PI W Reset All zeros Figure 2 9 DCMP and ICMP Registers Table 2 9 describes the bit settings for the DCMP and ICMP registers Table 2 9 DCMP and ICMP Bit Settings Bits Name Description 0 V Valid bit Set by the processor on a TLB miss interrupt 1 24 VSID Virtual segment ID Copied from VSID field of corresponding segment register 25 H Hash function identifier Cleared by the processor on a TLB miss interrupt 26 31 API Abbreviated page index Copied from API of effective address 2 2 6 Primary and Secondary Hash Address Registers HASH1 and HASH2 HASH1 and HASH2 shown in Figure 2 10 contain the physical addresses of the primary and secondary PTEGs respectively for the access that caused the TLB miss interrupt For convenience the device automatically constructs the full physical address by routing SDR1 bits 0 6 into HASH1 and HASH2 and clearing the lower 6 address bits These read only registers are constructed from the DMISS or IMISS contents the register choice is determined by which miss most recently occurred SPR 978 HASH1 Access Supervisor read only SPR 979 HASH2 o 6 7 25 26 31 R HTABORG Hashed Page Address W Reset All zeros Figure 2 10 HASH1 and HASH2 Registers Table 2 10 describes the bit settings of the HASH1 and HASH2 registers Table 2 10 HASH1 and HASH2 Bit Settings Bits Name Description 0 6 HTABORG Copy of the
209. PRG4 SPRG7 are not supported in the G2 core SPR 272 SPRGO 276 SPRG4 Access Supervisor read write 273 SPRG1 277 SPRG5 274 SPRG2 278 SPRG6 275 SPRG3 279 SPRGQ 0 31 R SPRGn W Reset All zeros Figure 2 17 SPRGn Register For information on conventional uses for SPRG4 SPRG7 refer to Section 5 2 1 3 SPRG0 SPRG7 e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Register Model 2 2 12 System Version Register SVR The system version register SVR is a 32 bit read only register that identifies the specific version model and revision level of the system on a chip SoC Supervisor mode write access is reserved for future use Figure 2 18 shows an implementation of the SVR although it should be noted that this register is determined by the SoC SPR 286 Access Supervisor read only 0 31 R SVR Ww Rest 1 0 00 0 000 0 0 0000 0 0000 00000 0000 0 000 Figure 2 18 SVR Register The SVR can be accessed with mfspr using SPR286 The bits in SVR are defined in Table 2 12 Note that all bits within this register are guaranteed to be configured by the SOC and unused bits are cleared to zero Also SVR4 SVR15 are control fields for this register Table 2 12 System Version Register SVR Bit Settings Bits Name Description 0 3 CID Company or manufacturer ID These bits are required Bit O must set to 1 4 15 SID So
210. Programming Model Registers The following sections describe the e300 core implementation specific features as they apply to registers 1 3 1 1 UISA Registers UISA registers are user level registers that include the following 1 3 1 1 1 General Purpose Registers GPRs The PowerPC architecture defines 32 user level GPRs that are 32 bits wide in 32 bit cores The GPRs serve as the data source or destination for all integer instructions 1 3 1 1 2 Floating Point Registers FPRs The PowerPC architecture also defines 32 user level 64 bit FPRs The FPRs serve as the data source or destination for floating point instructions These registers can contain data objects of either single or double precision floating point formats PPR are not included in the e300c2 core 1 3 1 1 3 Condition Register CR The CR is a 32 bit user level register that provides a mechanism for testing and branching It consists of eight 4 bit fields that reflect the results of certain operations such as move integer and floating point comparisons arithmetic and logical operations 1 3 1 1 4 Floating Point Status and Control Register FPSCR The user level FPSCR contains all floating point exception signal bits exception summary bits exception enable bits and rounding control bits needed for compliance with the IEEE 754 standard FPSCRs are not included in the e300c2 core 1 3 1 1 5 User Level SPRs The PowerPC architecture defines numerous special purp
211. R RI as follows e In the machine check and system reset interrupts If SRR1 RI is cleared the interrupt is not recoverable If it is set the interrupt is recoverable with respect to the processor e Ineach interrupt handler When enough state information has been saved that a machine check or system reset interrupt can reconstruct the previous state set MSR RI e In each interrupt handler Clear MSR RI set the SRRO and SRR1 or CSRRO and CSRR1 registers appropriately and then execute rfi or rfci e Note that the RI bit being set indicates that with respect to the processor enough processor state data is valid for the processor to continue but it does not guarantee that the interrupted process can resume 5 2 5 Returning from an Interrupt with rfi The Return From Interrupt rfi instruction performs context synchronization by allowing previously issued instructions to complete before returning to the interrupted process In general execution of the rfi instruction ensures the following e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Interrupts and Exceptions e All previous instructions have completed to a point where they can no longer cause an interrupt e Previous instructions complete execution in the context privilege protection and address translation under which they were issued e The rfi instruction copies SRR1 bits back into the MSR s The instructions followin
212. RG4 SPRG7 are shown in Figure 5 5 SPR 272 SPRGO 276 SPRG4 Access Supervisor read write 273 SPRG1 277 SPRG5 274 SPRG2 278 SPRG6 275 SPRG3 279 SPRGQ 0 31 R SPRGn W Reset All zeros Figure 5 5 SPRGn Register Table 5 7 describes conventional uses of SPRG4 SPRG7 for the e300 core Table 5 7 Conventional Uses of SPRGO SPRG7 Register Descriptions SPRGO SPRGO may be used by the operating system as needed SPRG1 SPRG1 may be used by the operating system as needed SPRG2 SPRG2may be used by the operating system as needed SPRG3_ SPRG3 may be used by the operating system as needed SPRG4_ Software may load a unique physical address in this register to identify an area of memory reserved for use by the first level interrupt handler This area must be unique for each processor in the system SPRG5 SPRG5 may be used as a scratch register by the first level interrupt handler to save the content of a GPR That GPR then can be loaded from SPRG4 and used as a base register to save other GPRs to memory SPRG6 SPRG6 may be used by the operating system as needed SPRG7_ SPRG7 may be used by the operating system as needed e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Interrupts and Exceptions 5 2 1 4 MSR Bit Settings The MSR is shown in Figure 5 6 When an interrupt occurs MSR bits as described in Ta
213. Register belrx instruction and can optionally be used to hold the logical address referred to as the effective address in the architecture specification of the instruction that follows a branch and link instruction typically used for linking to subroutines Count register CTR The 32 bit CTR can be used to hold a loop count that can be decremented during execution of appropriately coded branch instructions It can also provide the branch target address for the Branch Conditional to Count Register bectrx instruction User level registers VEA The VEA introduces the time base facility TB for reading The TB is a 64 bit register pair whose contents are incremented once every four core input clock cycles The TB consists of two 32 bit registers time base upper TBU and time base lower TBL Note that the time base registers are read only in user state The core supervisor level registers are described as follows Supervisor level registers QEA The OEA defines the registers an operating system uses for memory management configuration and interrupt handling The PowerPC architecture defines the following supervisor level registers Configuration registers Processor version register PVR This read only register identifies the version model and revision level of this processor core The contents of the PVR can be copied to a GPR by the mfspr instruction Read access to the PVR is supervisor level only write acces
214. SH1 Access Supervisor read only SPR 979 HASH2 o 6 7 25 26 31 R HTABORG Hashed Page Address W Reset All zeros Figure 6 13 HASH1 and HASH2 Registers Table 6 12 describes the bit settings of the HASH1 and HASH2 registers Table 6 12 HASH1 and HASH2 Bit Settings Bits Name Description 0 6 HTABORG 0 6 Copy of the upper 7 bits of the HTABORG field from SDR1 7 25 Hashed page address Address bits 7 25 of the PTEG to be searched 26 31 Reserved 6 5 2 1 4 Required Physical Address Register RPA The RPA is shown in Figure 6 14 During a page table search operation the software must load the RPA with the second word of the correct PTE When the tlbld or tlbli instruction is executed data from IMISS and ICMP or DMISS and DCMP and the RPA registers is merged and loaded into the selected TLB entry The TLB entry is selected by the effective address of the access loaded by the table search software from the DMISS or IMISS register and SRRI WAY SPR 982 Access Supervisor read write 0 19 20 22 23 24 25 o 29 30 31 R RPN RIC WIMG PP W Reset All zeros Figure 6 14 Required Physical Address Register RPA Table 6 13 describes the bit settings of the RPA register Table 6 13 RPA Bit Settings Bits Name Description 0 19 RPN Physical page number from PTE 20 22 Reserved 23 R Reference bit from PTE 24 C Change bit
215. SISR settings 6 1 8 MMU Instructions and Register Summary The MMU instructions and registers provide the operating system with the ability to set up the block address translation areas and the page tables in memory Note that because the implementation of TLBs is optional the instructions that refer to these structures are also optional However because these structures serve as caches of the page table the architecture specifies a software protocol for maintaining coherency between these caches and the tables in memory whenever changes are made to the tables in memory When the tables in memory are changed the operating system purges these caches of the corresponding entries allowing the translation caching mechanism to refetch from the tables when the corresponding entries are required Note that the e300 core implements all TLB related instructions except tlbia which is treated as an illegal instruction The core also uses some implementation specific instructions to load two on chip TLBs Because the MMU specification for these processors is so flexible it is recommended that the software that uses these instructions and registers be encapsulated into subroutines to minimize the impact of migrating across the family of implementations Table 6 5 summarizes e300 core instructions that specifically control the MMU For more detailed information about the instructions refer to Chapter 3 Instruction Set Model in this book and Chapt
216. TLB entry is not found in the e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Memory Management appropriate TLB the processor vectors to one of the three TLB miss interrupt handlers so that the software can perform a table search operation and load the TLB When this occurs the processor automatically saves information about the access and the executing context Table 6 9 provides a summary of the implementation specific interrupts registers and instructions that can be used by the TLB miss interrupt handler software in e300 core systems Refer to Chapter 5 Interrupts and Exceptions for more information about interrupt processing Table 6 9 Implementation Specific Resources for Table Search Operations Resource Name Description Interrupts Instruction TLB miss interrupt No matching entry found in ITLB vector offset 0x1000 Data TLB miss on load interrupt No matching entry found in DTLB for a load data access vector offset 0x1 100 Data TLB miss on store No matching entry found in DTLB for a store data access or matching DLTB entry interrupt also caused when has C 0 and access is a store change bit must be updated vector offset 0x1200 Registers IMISS and DMISS When a TLB miss interrupt occurs IMISS or DMISS contains the 32 bit effective address of the instruction or data access that caused the miss interrupt ICMP and DCMP ICMP and DCMP contain the
217. U 1 1 14 fnmsubs 59 030 FPU 1 1 14 fnmaddsj 59 031 FPU 1 1 14 fempu 63 000 FPU 1 1 14 frsp 63 012 FPU 1 1 14 fctiw 63 014 FPU 1 1 14 fctiwz 63 015 FPU 1 1 14 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Instruction Timing Table 7 5 Floating Point Instructions continued E EE Primary Extended Unit Latency Opcode Opcode in Cycles fdiv 63 018 FPU 334 fsub 63 020 FPU 1 1 14 fadd 63 021 FPU 1 1 14 fsel 63 023 FPU 1 1 14 fmull 63 025 FPU 2 1 1 frsqrtel 63 026 FPU 1 1 1 fmsubf 63 028 FPU 2 1 14 fmadd 63 029 FPU 2 1 14 fnmsub 63 030 FPU 2 1 14 fnmadd 63 031 FPU 2 1 14 fempo 63 032 FPU 1 1 14 mtfsb1 63 038 FPU 1 1 18 amp 4 fneg 63 040 FPU 1 1 14 merfs 63 064 FPU 1 1 1 amp mtfsbO 63 070 FPU 1 1 1 amp 4 fmr 63 072 FPU 1 1 14 mtfsfi 63 134 FPU 1 1 18 amp 4 fnabs 63 136 FPU 1 1 14 fabs 63 264 FPU 1 1 14 mffs 63 583 FPU 1 1 18 amp mtfsf 63 711 FPU 1 1 18 amp 4 Note Cycle times marked with amp require a variable number of cycles due to completion serialization Cycle times marked with immediately forward their CR results to the BPU for fast branch resolution Cycle times marked with a specify the number of clock cycles in each pipeline stage Instructions with a single entry in the cycles column are no
218. Word sradi Shift Right Algebraic Double Word Immediate srd Shift Right Double Word std Store Double Word stdex Store Double Word Conditional Indexed stdu Store Double Word with Update stdux Store Double Word Indexed with Update stdx Store Double Word Indexed td Trap Double Word tdi Trap Double Word Immediate Table B 3 provides the 64 bit SPR encoding that is not implemented by the e300 core Table B 3 64 Bit SPR Encoding Not Implemented by the e300 core SPR F Register Name Access Decimal spr 5 9 spr 0 4 280 01000 11000 ASR Supervisor e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Appendix C Revision History This appendix provides a list of the major differences between the e300 PowerPC Core Reference Manual Revision 0 and the e300 Power Architecture Core Family Reference Manual Revision 3 C 1 Changes from Revision 2 to Revision 3 Major changes from the e300 PowerPC Core Reference Manual Revision 2 to the e300 Power Architecture Core Family Reference Manual Revision 3 are as follows Section Page Changes Throughout Provide information pertaining to the e300c3 configuration C 2 Changes from Revision 1 to Revision 2 Major changes from the e300 PowerPC Core Reference Manual Revision 1 to the e300 Power Architecture Core Family Reference Manual Revision 2 are as follows Section Page Changes Book Add caveats that
219. a cache miss causes a cache block to be replaced Changed bit One of two page history bits found in each page table entry PTE The processor sets the changed bit if any store is performed into the page See also Page access history bits and Referenced bit Clean An operation that causes a cache block to be written to memory if modified and then left in a valid unmodified state in the cache Clear To cause a bit or bit field to register a value of zero See also Set Completion Completion occurs when an instruction has finished executing written back any results and is removed from the completion queue CQ When an instruction completes it is guaranteed that this instruction and all previous instructions can cause no interrupts Context synchronization An operation that ensures that all instructions in execution complete past the point where they can produce an interrupt that all instructions in execution complete in the context in which they began execution and that all subsequent instructions are fetched and executed in the new context Context synchronization may result from executing specific instructions such as isyne or rfi or when certain events occur such as an interrupt Copy back operation A cache operation in which a cache line is copied back to memory to enforce cache coherency Copy back operations consist of snoop push out operations and cache cast out operations Denormalized number A nonzero floating poin
220. a prioritized list of the R and C bit settings for all scenarios The entries in the table are prioritized from top to bottom such that a matching scenario occurring closer to the top of the table takes precedence over a matching scenario closer to the bottom of the table For example if an stwex instruction causes a protection violation and there is no reservation the C bit is not altered as shown for the protection violation case Note that in the table load operations include those generated by load instructions by the eciwx instruction and by the cache management instructions that are treated as a load with respect to address translation Similarly store operations include those operations generated by store instructions by the ecowx instruction and by the cache management instructions that are treated as a store with respect to address translation Note that the e300 core does not support the eciwx or ecowx instructions which are optional in the PowerPC architecture In the columns for the e300 core the combination of the core itself and the software used to search the page tables described in Section 6 5 2 Implementation Specific Table Search Operation is assumed Table 6 8 Model for Guaranteed R and C Bit Settings R Bit Set C Bit Set Priority Scenario OEA G2Core OEA G2 Core 1 No execute protection violation No No No No 2 Page protection violation Maybe Yes No No 3 Out of order instruction fetch or load
221. able 6 2 shows the eight protection options supported by the MMUs for pages Table 6 2 Access Protection Options for Pages User Read Supervisor Read F e User Supervisor EES Write Write I Fetch Data I Fetch Data Supervisor only y N N Supervisor only no execute N Ni e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Memory Management Table 6 2 Access Protection Options for Pages continued User Read Supervisor Read e User Supervisor Option Write ite I Fetch Data I Fetch Data Supervisor write only d y Ni N N Supervisor write only no execute H Ni N Both user supervisor V V V V V y Both user supervisor no execute V V V y Both read only V y y N Both read only no execute y SE Note V access permitted protection violation The operating system programs whether instructions can be fetched from an area of memory by appropriately using the no execute option provided in the segment descriptor Each of the remaining options is enforced based on a combination of information in the segment descriptor and the page table entry Thus the supervisor only option allows only read and write operations generated while the processor is operating in supervisor mode corresponding to MSR PR 0 to access the page User accesses that map into a supervisor onl
222. ache Locking 4 10 3 Performing Cache Locking This section outlines the basic procedures for locking the data and instruction caches and provides some example code for locking the caches The procedures for the data cache are described first followed by the corresponding sections for locking the instruction cache The basic procedures for cache locking are as follows e Enabling the cache e Enabling address translation for example code e Disabling interrupts e Loading the cache e Locking the cache entire cache locking or cache way locking In addition this section describes how to invalidate the data and instruction caches even when they are locked e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor Instruction and Data Cache Operation 4 10 3 1 This section describes the procedures for performing data cache locking on the e300 core Data Cache Locking Procedures 4 10 3 1 1 Enabling the Data Cache To lock the data cache the data cache enable bit HIDO DCE bit 17 must be set The following assembly code enables the data cache Enable the data cache This corresponds to setting DCE bit in HIDO bit 17 mfspr rl HIDO ori rl rl 0x4000 sync mtspr HIDO rl isync 4 10 3 1 2 Address Translation for Data Cache Locking Two distinct memory areas must be set up to enable cache locking e The first area is where the code that performs the lockin
223. ache management instructions that provide a means by which the application programmer can affect the cache contents 1 3 3 2 Implementation Specific Cache Organization The e300c1 provide independent 32 Kbyte eight way set associative instruction and data caches The e300c2 and e300c3 provides 16 Kbyte four way set associative instruction and data caches The caches are physically addressed and the data cache can operate in either write back or write through mode as specified by the PowerPC architecture The data cache is configured as 128 sets of 8 blocks each on the e300c1 The data cache is configured as 128 sets of 4 blocks each on the e300c2 and e300c3 Each block consists of 32 bytes 2 state bits and an address tag The two state bits implement the three state MEI modified exclusive invalid protocol Each block contains eight 32 bit words Note that the PowerPC architecture defines the term block as the cacheable unit For the core the block size is equivalent to a cache line A block diagram of the data cache organization is shown in Figure 1 5 The instruction cache is configured as 128 sets of 8 blocks each on the e300c1 The instruction cache is configured as 128 sets of 4 blocks each on the e300c2 and e300c3 Each block consists of 32 bytes an address tag and a valid bit The instruction cache may not be written to except through a block fill operation In the e300 core the instruction cache is blocked only until the crit
224. aching inhibited access Cache entries that are invalid at the time of locking remain invalid and inaccessible until the cache is unlocked When the cache has been unlocked all entries including invalid entries are available Entire cache locking is inefficient if the number of instructions or the size of data to be locked is small compared to the cache size Way locking Locking only a portion of the cache is accomplished by locking ways within the cache Locking always begins with the first way way 0 and is sequential that is locking ways 0 1 and 2 is possible but it is not possible to lock only way 0 and way 2 When using way locking at least one way must be left unlocked The maximum number of lockable ways is seven on the e300c1 core way 0 way 6 and three on the e300c2 and e300c3 way 0 way 2 Unlike entire cache locking invalid entries in a locked way are accessible and available for data replacement As hits to the cache fill invalid entries within a locked way the entries become valid and locked This behavior differs from entire cache locking in which invalid entries cannot be allocated Unlocked ways of the cache behave normally Table 4 10 summarizes the e300 core cache organization Table 4 10 Cache Organization Instruction Cache Size Data Cache Size Associativity Block Size Way Size e300c1 32 Kbytes 32 Kbytes 8 way 8 words 4 Kbytes e300c2 16 Kbytes 16 Kbytes 4 way 8 words 4 Kbytes e300c3 16 Kbyte
225. ages of exception processing Exception A condition that if enabled generates an interrupt Recognition Interrupt recognition occurs when the exception that can cause an interrupt is Taken identified by the processor An interrupt is said to be taken when control of instruction execution is passed to the interrupt handler that is the context is saved and the instruction at the appropriate vector offset is fetched and the interrupt handler routing is executed in supervisor mode Handling Interrupt handling is performed by the software linked to the appropriate vector 5 1 offset Interrupt handling is performed at the supervisor level Interrupt Classes The PowerPC architecture supports four types of interrupts Synchronous precise These are caused by instructions All instruction caused interrupts are handled precisely that is the machine state at the time the interrupt occurs is known and can be completely restored This means that excluding the trap and system call interrupts the address of the faulting instruction is provided to the interrupt handler and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the interrupt is taken Once the interrupt is processed execution resumes at the address of the faulting instruction or at an alternate address provided by the interrupt handler When an interrupt is taken due to a trap or system call instruction execut
226. al reset of the cache There is no broadcast of a flash invalidate operation and any modified data in the cache is lost Flash invalidation of the data cache is accomplished by setting DCFI and subsequently clearing DCFI in two consecutive mtspr HIDO instructions e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Instruction and Data Cache Operation The data cache is automatically invalidated when the core is powered up and during a hard reset However a soft reset does not automatically invalidate the data cache Software must set and clear HIDO DCFI to invalidate the entire data cache after a soft reset 4 5 1 6 Instruction Cache Enable HIDO ICE The instruction cache may be disabled through the use of the instruction cache enable bit HIDO ICE ICE is cleared at power on reset disabling the instruction cache To prevent the cache from being enabled or disabled in the middle of an instruction fetch an isync instruction should be issued before changing the value of ICE When the instruction cache is in the disabled state the cache tag state bits are ignored and all accesses are propagated to the bus as single beat transactions when HID2 IFEB is cleared when HID2 IFEB is set all accesses are propagated as burst transactions Note that disabling the instruction cache does not affect the translation logic translation for instruction accesses is controlled by MSR IR 4 5 1 7 Instruction Ca
227. al type SMI System management interrupt SOC System on a chip SPR Special purpose register SR Segment register SRRO Machine status save restore register 0 SRR1 Machine status save restore register 1 SRU System register unit SVR System version register T Translation control bit TAP Test access port TB Time base facility TBL Time base lower register TBU Time base upper register TGPR Temporary GPR remapping TLB Translation lookaside buffer TTL Transistor to transistor logic UIMM Unsigned immediate value UISA User instruction set architecture UTLB Unified translation lookaside buffer UUT Unit under test VEA Virtual environment architecture VPN Virtual page number Ww Write through e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor XXxi Table i Acronyms and Abbreviated Terms continued Term Meaning WAR Write after read WAW Write after write WIMG Write through caching inhibited memory coherency enforced guarded bits XATC Extended address transfer code XER Register used for indicating conditions such as carries and overflows for integer operations Terminology Conventions Table ii describes terminology conventions used in this manual Table ii Terminology Conventions The Architecture Specification This Manual Data storage interrupt DSI DSI interrupt Extended mnemonics Simplified mnemonics Fixed point unit FXU Integer uni
228. ale Semiconductor performance monitor uses 1 14 11 1 registers 10 1 10 3 see also Breakpoints single stepping 10 1 10 3 tracing on branch instructions 5 13 5 32 10 3 10 4 using breakpoints 10 3 DEC decrementer register 2 10 Decrementer 1 28 exception 9 2 interrupt 5 4 5 30 9 1 timer 9 1 Direct address translation translation disabled data accesses 6 11 6 19 instruction accesses 6 11 6 19 see also Memory management unit MMU 6 19 DMISS data TLB miss address reg 2 18 6 31 6 34 Doze mode 9 2 9 3 DSI data storage interrupt 1 27 5 3 5 23 10 3 10 4 DSISR DSI status register 2 10 5 1 10 2 10 3 DTLB 1 5 Dynamic power management 9 1 Dynamic power management modes 9 2 E e300 core differences between cores 1 34 e300 specific instructions 3 34 e300 specific registers 2 10 2 27 Effective address EA calculation 3 8 translation see Memory management unit MMU eieio 4 14 Endian modes 3 1 byte ordering interrupt LE mode MSR ILE bit 5 12 Endian modes and byte ordering little endian mode MSR LE bit 5 14 Event counting see Performance monitor APU Exceptions enabling and disabling interrupts and exceptions 5 14 overview 1 25 see also Interrupt handling types more granular than interrupts floating point enabled exceptions program interrupt 5 4 5 28 illegal instr exception program interrupt 5 4 5 28 privileged instr exception program interrupt
229. alidate bits HIDO DCFI or HIDO ICFI e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Instruction and Data Cache Operation Each time a cache block is accessed it is tagged as the most recently used way of the set For every hit in the cache or when a new block is reloaded the PLRU bits for the set are updated using the rules specified in Table 4 7 Table 4 7 PLRU Bit Update Rules If the current access is to Bo B1 Then the PLRU bits in the set are changed to B2 B3 B4 B5 B6 w0 wi w2 w3 O w4 x w5 w6 0 Xx w7 o ojojo 0 Xx x Does not change 1 Note that the e300c2 only has 4 ways so BO is always 0 Note that for the e300c1 only three PLRU bits are updated for any given access and for the e300c2 and e300c3 only two PLRU bits are updated for any given access In the case of way locking the PLRU value read from the cache is first modified before using it to ensure that a locked way is not selected Because way locking involves locking an incrementing range of ways starting with way 0 way 0 or way 0 1 or way 0 2 etc the appropriate bits of the PLRU value are simply overridden to in an incrementing fashion away from the locked ways to prevent that range of ways from being selected In the binary tree figure this can be visually described as forcing the bi
230. an invalid physical address a machine check condition may result when an attempt is made to write that cache block back to memory The cache block could be written back as a result of the execution of an instruction that causes a cache miss and the invalid addressed cache block is the target for replacement or a Data Cache Block Store debst instruction Table 3 28 lists the cache instructions that are accessible to user level programs 3 2 6 PowerPC OEA Instructions The OEA includes the structure of the memory management model supervisor level registers and interrupt model 3 2 6 1 System Linkage Instructions This section describes the system linkage instructions see Table 3 29 The se instruction is a user level instruction that permits a user program to call on the system to perform a service and causes the processor to take an interrupt The Return from Interrupt rfi instruction is a supervisor level instruction that is useful for returning from an interrupt handler The Return from Critical Interrupt rfci instruction is a supervisor level instruction that is implemented in the core The rfci instruction is useful for returning from a critical interrupt handler This instruction is described in Section 3 2 8 Implementation Specific Instructions Table 3 29 System Linkage Instructions Name Mnemonic Operand Syntax Return from Interrupt rfi System Call sc 3 2 6 2 Processor Control Instruc
231. anch prediction is limited by the fact that instructions in the predicted stream cannot update the register files or memory until the branch is resolved That is instructions may be dispatched and executed but cannot reach the write back stage in the completion unit instead it stalls in the completion queue When CQ is full no more instructions can be dispatched In the case of a misprediction the core is able to redirect the machine state rather effortlessly because the programing model has not been updated When a branch is found to be mispredicted all instructions that were dispatched subsequent to the predicted branch instruction are simply flushed from the completion queue and their results flushed from the rename registers No architected register state needs to be restored because no architected register state was modified by the instructions following the unresolved predicted branch e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Instruction Timing 7 4 1 2 1 Predicted Branch Timing Examples Figure 7 8 shows how both taken and non taken branches are handled and how the e300 core handles both correct and incorrect predictions The example shows the timing for the following instruction sequence note that the first be instruction is correctly taken whereas the second be is incorrectly predicted 0 add add be mulhw be TO fadd and add add add add and or Nu FPF WN r
232. are and performance transparent 9 3 1 3 Doze Mode Doze mode disables most functional units but maintains cache coherency by enabling the bus interface unit and snooping A snoop hit causes the core to enable the data cache copy the data back to memory disable the cache and fully return to the doze mode Doze mode is characterized by the following features e Most functional units disabled e Bus snooping and time base decrementer still enabled e PLL running and locked to internal sysclk To enter the doze mode the following conditions must occur e Set doze bit HIDO 8 1 MSR POW is set e e300core enters doze mode after several processor clocks To return to full power mode the following conditions must occur e Assert internal int smi or mcp signals or decrementer interrupts e Hard reset or soft reset e Transition to full power state occurs only after a few processor cycles e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Power Management 9 3 1 4 Nap Mode The nap mode disables the core except for the processor PLL and time base decrementer The time base can be used to restore the core to a full on state after a specified period Because bus snooping is disabled for nap and sleep mode a hardware handshake using the quiesce request qreq and quiesce acknowledge qack signals are required to maintain data coherency The core asserts the greq signal to indicate that it is read
233. as a cache line e A four state modified exclusive shared invalid MESI or a three state modified exclusive invalid MEI coherency protocol for the data cache e Two status bits for each data cache block to indicate the coherency state as follows Modified M Exclusive E Shared S Invalid 1 e A single status bit for each instruction cache block that allows encoding for the following two possible states Invalid Valid e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Instruction and Data Cache Operation e Cache locking Entire cache locking for each cache by setting the appropriate bits in the HIDO Way locking support for instruction and data caches using controls in HID2 e Cache invalidation by setting the appropriate bits in HIDO e A pseudo least recently used PLRU replacement algorithm within each set e An instruction cancel mechanism that improves use of instruction cache by supporting hits under cancels and misses under cancels e Data cache queue sharing that makes cast outs and snoop pushes more efficient e New icbt instruction supports initializing instruction cache e An instruction fetch burst feature that allows all instruction fetches from caching inhibited space to be performed on the bus as burst transactions e Parity support for both instruction and data caches 4 1 2 Overview The core supports a fully coherent 4 Gbyte physical me
234. at forwards the required data at completion When the source data reaches the rename register execution can begin Instruction results are transferred from rename registers to architected registers when an instruction is retired from the CQ after any associated interrupts are handled and any predicted branch conditions preceding it in the CQ are resolved If a branch prediction is incorrect the instructions following the branch are flushed from the CQ and any results of those instructions are flushed from the rename registers e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Instruction Timing 7 3 3 2 Instruction Serialization Although the core can dispatch and complete two instructions per cycle serializing instructions can be used to limit dispatch and completion to one instruction per cycle Serialization falls into three categories completion dispatch and refetch serialization which are described as follows e Completion serialized instructions are held in the execution unit until all prior instructions in the completion unit have been retired Completion serialization is used for instructions that access or modify a resource for which no rename register exists Results from these instructions are not available or forwarded for subsequent instructions until the serializing instruction is retired Instructions that are completion serialized are as follows Instructions with the interrup
235. ata cache cast out store address buffers associated data line buffer located in cache Data cache single beat write address buffers associated data line buffer located in cache Data cache snoop copy back address buffer associated data line buffer located in cache Reservation address buffer for snoop monitoring e Pipeline collision detection for data cache buffers e Reservation address snooping for lwarx stwex instructions e One level or one and a half level address pipelining e Load ahead of store capability e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Instruction and Data Cache Operation Figure 4 8 is a conceptual block diagram of the bus interface The address register queues hold transaction requests that the bus interface may issue on the CSB independently of the other requests The bus interface may have up to two transactions operating at any given time through the use of address pipelining Instruction Cache BIU l Cache D Cache Control LD Addr LD Addr Control Address Address Data Coherent System Bus Figure 4 8 Bus Interface Address Buffers 4 9 Caches and CSB Transactions The core transfers data to and from the data cache in single beat transactions of two words or in four beat transactions of eight words which fill a cache block 4 9 1 Single Beat Transactions Single beat bus transactions can transfer from 1 to 8 bytes to or from the core Single beat
236. atched from IQO and IQ1 Because dispatch is instantaneous it is perhaps more useful to describe it as an event that marks the point in time between the last cycle in the fetch stage and the first cycle in the execute stage Execute The operations specified by an instruction are being performed by the appropriate execution unit The black stripe is a reminder that the instruction occupies an entry in the CQ described in Figure 7 5 ES Complete The instruction is in the CQ In the final stage the results of the executed instruction are written back and the instruction is retired The CQ has five entries CQO CQ4 In retirement entry Completed instructions can be retired from CQO and CO Like dispatch retirement is an event that in this case occurs at the end of the final cycle of the complete stage Figure 7 5 shows the stages of the core execution units IU SRU Instructions In Dispatch Fetch Entry Execute Complete Retire Loo m LSU Instructions 1 Execute In Dispatch EA Fetch Entry Calculation Cache Align Complete Retire LO m i FPU Instructions Execute In Dispatch Round Fetch Entry Multiply Add Normalize Complete Retire OOo SSS BPU Instructions Fetch In Dispatch In Completion Fetch Predict Entry Queue Complete Retire Loo o T m 1 Several integer instructions such as multiply and divide instructions require multiple cycles in the execute stage S Only those branch instructions that upda
237. ating point unavailable interrupt handler e The execution of an instruction that causes a floating point exception while exceptions are enabled in the MSR invokes the program interrupt handler Floating point instructions are not supported on the e300c2 core Interrupts caused by asynchronous events are described in Chapter 5 Interrupts and Exceptions e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Instruction Set Model 3 2 3 Instruction Set Overview This section provides a brief overview of the PowerPC instructions implemented in the core and highlights any special information with respect to how the e300 core implements a particular instruction Note that the categories used in this section correspond to those used in Chapter 4 Addressing Modes and Instruction Set Summary in the Programming Environments Manual These categorizations are somewhat arbitrary and are provided for the convenience of the programmer and do not necessarily reflect the PowerPC architecture specification Note that some of the instructions have the following optional features e CR Update The dot suffix on the mnemonic enables the update of the CR e Overflow option The o suffix indicates that the overflow bit in the XER is enabled 3 2 4 PowerPC UISA Instructions The UISA includes the base user level instruction set excluding a few user level cache control synchronization and time base instru
238. ation a page fault condition exists and the TLB miss interrupt handlers synthesize either an ISI or DSI interrupt to handle the page fault e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Memory Management Address Translation with Segment Descriptor Use EAO EA3 to Select 1 of 16 On Chip Segment Registers Check T Bit in Segment Descriptor Page Address pes Direct Store Segment Address T 1 Otherwise Generate 52 Bit Virtual Address from Segment Descriptor Compare Virtual Address with TLB Entries G a gi eg es est e al Ow xN D xN TLB TLB i Miss Hit x See Figure 6 8 sy Perform Page Table See Figure 6 9 Search Operation PTE Not Found PTE Found Load TLB Entry L za Access Faulted DSI ISI Interrupt I Fetch with N Bit Set in Segment Descriptor No Execute N bw lt bw Access Access Permitted Protected O Translate Address Continue Access to Memory Subsystem Access Faulted Optional to the PowerPC architecture Implemented in the e300 In the case of instruction accesses causes ISI interrupt Figure 6 6 General Flow of Page and Direct Store Interface Address Translation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Memory Management 6 1 7 MMU Interrupts Summary In order to complete any memory access the effective addr
239. ba never cause a match If debz or deba causes a match some or all of the target memory locations may have been updated When a match occurs a DSI interrupt is generated Refer to Section 5 5 3 DSI Interrupt 0x00300 more information on the data address breakpoint facility e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor 2 2 17 Data Address Breakpoint Control Register DBCR The DBCR is a supervisor level register with SPR310 on the e300 core which is accessible only by using mtspr and mfspr The DBCR controls the compare and match type conditions for DABR and DABR2 Figure 2 22 shows the format of the DBCR Register Model SPR 310 Access Supervisor read write 0 5 6 7 8 9 10 11 12 13 14 15 R DABR DABR2 SIG_ Ww a STAT STAT ae EME TPE PNS Reset All zeros Figure 2 22 DBCR Register Table 2 16 provides the description of DBCR bit settings Table 2 16 Data Address Breakpoint Control Registers DBCR Bits Name Description 0 5 Reserved 6 DABRSTAT DABR status 0 Match on DABR has not occurred 1 Match on DABR has occurred 7 DABR2STAT DABR2 status 0 Match on DABR2 has not occurred 1 Match on DABR2 has occurred 8 9 CMP DABR breakpoint compare type 00 Match if data s EA equals DABR CEA 01 Reserved 10 Match if data s EA less than DABR CEA 11 Match if data s EA greater than or equal to DABR CEA
240. ber 1 3 1 1 2 Floating Point Registers FPRS AA 1 18 1 3 1 1 3 Condition Resister CRecce sieer et i Einen 1 18 1 3 1 1 4 Floating Point Status and Control Register FPSCR A 1 18 1 3 1 1 5 ISTIC VEL Een deed 1 18 1 3 1 2 AA WA E 1 19 1 3 1 3 OEA Keeser egen ed gege E E A R eeng 1 19 1 3 1 3 1 Machine State Register MSR sessssssesssessesssesessseessessresseressesssresseessresseesseee 1 19 1 3 1 3 2 Segment Registers SRS vi cessscis inresa in a a ia a 1 19 1 3 1 3 3 S pervisor Level E 1 19 1 3 2 Instruction Set and Addressing Modes csscccsssecssncecesececsseceessccecssecesssceeesseeeees 1 21 1 3 2 1 PowerPC Instruction Set and Addressing Modes 1 21 1 3 2 2 Implementation Specific Instruction Sei 1 22 1 3 3 Cache Implementation cenieni sensie an iEn a aE REE EE ANEA 1 23 1 3 3 1 PowerkC Cache e EE 1 23 1 3 3 2 Implementation Specific Cache Orgamnzatpon 1 23 1 3 3 3 Instruction and Data Cache Wav Lockmg 1 25 1 3 4 Interrupt M del dn Ennen anaa ie ege 1 25 1 3 4 1 PowerPC Interrupt Model ccci ccccscssccsscteasscseasdsvececeatnccebtasecveatisavanctacnsccdeaveassaceesstens 1 25 1 3 4 2 Implementation Specific Interrupt Model 1 27 1 3 5 Memory Management ceense neiii e e E aa ai 1 29 1 3 5 1 PowerPC Memory Manacsenient cv ck wusgies eta Ga eer 1 29 1 3 5 2 Implementation Specific Memory Management 1 30 1 3 6 Koster TE ee 1 30 1 3 7 one et 1 31 1 3 7 1 Memory ACCESSES s scccidssicccesanvadsansssbanbiessevetaasrpsacav
241. bit invalidates the entire instruction cache See Section 4 10 3 2 7 Invalidating the Instruction Cache Even if Locked 21 DCFI Data cache flash invalidate Setting and then clearing this bit invalidates the entire data cache See Section 4 10 3 1 4 Invalidating the Data Cache Table 4 12 HID2 Bits Used to Perform Cache Way locking Bits Name Description 16 18 IWLCK Instruction cache way lock These bits are used to lock individual ways in the instruction cache See Section 4 10 3 2 6 Way Locking the Instruction Cache 24 26 DWLCK Data cache way lock These bits are used to lock individual ways in the data cache See Section 4 10 3 1 7 Way Locking the Data Cache Table 4 13 MSR Bits Used to Perform Cache Locking Bits Name Description 16 EE External interrupt enable This bit must be cleared during instruction and data cache loading See Section 4 10 3 1 3 Disabling Interrupts for Data Cache Locking 19 ME Machine check enable This bit must be cleared during instruction and data cache loading See Section 4 10 3 1 3 Disabling Interrupts for Data Cache Locking 26 ID Instruction address translation This bit must be set to enable instruction address translation by the MMU See Section 4 10 3 1 2 Address Translation for Data Cache Locking 27 DR Data address translation This bit must be set to enable data address translation by the MMU See Section 4 10 3 1 2 Address Translation for Data C
242. ble 5 8 are altered as determined by the interrupt Access Supervisor read write o 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 W POW TGPR ILE EE PR FP ME FE0 SE BE FE1 CE IP IR DR RILE Reset 0000_0040 or 0000_0000 or 0001_0041 or 0001_0001 Figure 5 6 Machine State Register MSR Table 5 8 shows the bit definitions for the MSR Full function reserved bits are saved in SRR1 when an interrupt occurs partial function reserved bits are not saved Table 5 8 MSR Bit Settings Bits Name Description 0 Reserved Full function 1 4 Reserved Partial function 5 9 _ Reserved Full function 10 12 _ Reserved Partial function 13 POW Power management enable implementation specific 0 Disables programmable power modes normal operation mode 1 Enables programmable power modes nap doze or sleep mode This bit controls the programmable power modes only it has no effect on dynamic power management DPM MSR POW may be altered with an mtmsr instruction only Also when altering the POW bit software may alter only this bit in the MSR and no others The mtmsr instruction must be followed by a context synchronizing instruction See Chapter 9 Power Management for more information 14 TGPR Temporary GPR remapping implementation specific 0 Normal operation 1 TGPR mode GPRO GPR3 are remapped to TGPRO TGPR3 for use by TLB miss routines The conten
243. cache from being locked during a data access A locked data cache supplies data normally on a cache hit but cache misses are treated as caching inhibited accesses On a miss the transaction to the CSB is single beat however ci still reflects the state of the I bit in the MMU for that page regardless of whether the cache is locked or disabled A snoop hit to a locked data cache performs as if the cache were not locked Any cache block invalidated by a snoop hit remains invalid until the cache is unlocked The e300 core also provides data cache way locking in addition to entire data cache locking as described in Section 4 5 1 4 Data Cache Way lock HID2 DWLCK 4 5 1 4 Data Cache Way lock HID2 DWLCK Locking only a portion of the data cache is accomplished by locking ways within the cache using HID2 DWLCK Locking always begins with the first way way 0 and is sequential That is it is valid to lock ways 0 1 and 2 but it is not possible to lock just way 0 and way 2 When using way locking without DLOCK at least one way is always left unlocked The maximum number of lockable ways is seven on e300c1 and three ways on e300c2 and e300c3 4 5 1 5 Data Cache Flash Invalidate HIDO DCFI The data cache flash invalidate bit HIDO DCFI is used to invalidate the entire data cache in a single operation Note that using DCFI invalidates the cache in a single cycle on e300c1 however on e300c2 and e300c3 DCFI is a 128 cycle sequenti
244. can be performed Also referred to as problem state VEA virtual environment architecture The level of the architecture that describes the memory model for an environment in which multiple devices can access memory defines aspects of the cache model defines cache control instructions and defines the time base facility from a user level perspective Implementations that conform to the PowerPC VEA also adhere to the UISA but may not necessarily adhere to the OEA e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 12 Freescale Semiconductor Virtual address An intermediate address used in the translation of an effective address to a physical address Virtual memory The address space created using the memory management facilities of the processor Program access to virtual memory is possible only when it coincides with physical memory W Way A location in the cache that holds a cache block its tags and status bits Word A 32 bit data element Write back A cache memory update policy in which processor write cycles are directly written only to the cache External memory is updated only indirectly for example when a modified cache block is cast out to make room for newer data Write through A cache memory update policy in which all processor write cycles are written to both the cache and memory e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 13 e3
245. ccess that caused the TLB miss interrupt The contents are used by the core when calculating the values of HASH1 and HASH2 and by the tlbld and tlbli instructions when loading a new TLB entry Note that the core always loads DMISS with a big endian address even when MSR LE is set These registers are both read and write accessible However caution should be used when writing to these registers SPR DMISS 976 Access Supervisor read write SPR IMISS 980 0 31 R Effective Address W Reset All zeros Figure 2 8 DMISS and IMISS Registers 2 2 5 Data and Instruction TLB Compare Registers DCMP and ICMP DCMP and ICMP shown in Figure 2 9 contain the first word in the required PTE The contents are constructed automatically from the contents of the segment registers and the effective address DMISS or IMISS when a TLB miss interrupt occurs Each PTE read from the tables during the table search process should be compared with this value to determine if the PTE is a match Upon execution of a tlbld or tlbli instruction the upper 25 bits of the DCMP or ICMP register and 11 bits of the effective address are loaded into the first word of the selected TLB entry These registers are read and write to the software e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Register Model SPR 977 DCMP Access Supervisor read write SPR 981 ICMP o 1 24 25 26 31 R V VSID H A
246. ceeseesseceseeeeeeeeeeeneeens 7 16 Rename Register Operations secdesini i Be 7 16 Instru Senalizatiois eenegen tege 7 17 Execution Unit Considerations 212 2geceeeeide ce dieieceiiies eed il eh lA eeeatine eee 7 17 ek Tea Unit Kur 7 17 Branch Processing Unit Execution Tee erdege Eeer Eege 7 18 Le EE 7 18 Static Branch Prediction 2 aie tele eee ail Na a he ee ea 7 19 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xi Paragraph Number 7 4 1 2 1 7 4 2 TAS 7 4 4 7 4 5 7 5 7 5 1 7 5 2 7 5 3 7 6 7 6 1 7 6 1 1 7 6 1 2 7 6 1 3 7 7 8 1 8 1 1 8 1 2 8 1 2 1 8 2 8 3 8 3 1 8 3 2 8 3 3 8 3 4 8 4 8 4 1 9 1 9 2 9 3 9 3 1 9 3 1 1 9 3 1 2 Page Title Number Predicted Branch Timing Examples cecccecesecessceecesececseeeeeseeeenneeeenaeeeaees 7 20 Integer Unit Execution KE 7 21 Floating Point Unit Execution Timing eeeseseesssseessreesesresstsererresressesnresresseseresrersesee 7 23 Load Store Unit Execution Timing eege eu deele 7 24 System Register Unit Execution Timing ageet dee CAE 7 24 Memory Performance Considerations 0 ccccessceesseeceseeceeeeeceeeeeceeeeeceneeeseeecneeeesaes 7 24 Copy Back MGS ee Eed eege 7 25 Write Through Mod s sonsir tensien arein ea E E SE A EEES 7 25 Cache Inbibited ACCESSES eet 7 25 Instruction Scheduling Guidelines cs testes c ceosatecaeies cen sceweds SES 7 26 Branch Dispatch and Completion Unit Resource Reoutre
247. ception conditions to support software table searching The only exception conditions that occur for data accesses when MSR DR 0 are the conditions that cause the alignment interrupt For more detailed information about the conditions that cause the alignment interrupt in particular for string multiple instructions see Section 5 5 6 Alignment Interrupt 0x00600 Table 6 4 Other MMU Exception Conditions Exception Condition Description Interrupt TLB miss for an instruction fetch No matching entry found in ITLB Instruction TLB miss interrupt SRR1 13 1 MSR 14 1 TLB miss for a data load access No matching entry found in DTLB for data load access Data TLB miss on load interrupt SRR1 13 0 SRR1 15 1 MSR 14 1 TLB miss for a data store or store andC 0 No matching entry found in DTLB for data store access or matching DLTB entry has C 0 and the access is a store Data TLB miss on store interrupt or store andC 0 SRR1 13 0 SRR1 15 0 MSR 14 1 dcbz with W 1 orl 1 dcbz instruction to write through or cache inhibited segment or block Alignment interrupt not required by architecture for this condition dcbz when the data cache is locked lwarx or stwex instruction to direct store segment The debz instruction takes an alignment interrupt if the data cache is locked HIDO bits 18 and 19 when it is executed Reservation instruction or external
248. ch with DPM Doze e Bus snooping Controlled by SW External asynchronous interrupts e Data cache as needed Decrementer interrupt e Decrementer timer Reset Nap Decrementer timer Controlled by hardware and External asynchronous interrupts software Decrementer interrupt Reset Sleep None Controlled by hardware and External asynchronous interrupts software Reset e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Power Management 9 3 1 Power Management Modes The following sections describe the characteristics of the e300 core power management modes the requirements for entering and exiting the various modes and the system capabilities provided by the core while the power management modes are active 9 3 1 1 Full Power Mode with DPM Disabled Full power mode with DPM disabled is selected when the DPM enable bit in HIDO DPM is cleared The following characteristics apply e Default state following power up and hreset e All functional units are operating at full processor speed at all times 9 3 1 2 Full Power Mode with DPM Enabled Full power mode with DPM enabled HIDO DPM 1 provides on chip power management without affecting the functionality or performance of the core as follows e Required functional units are operating at full processor speed e Functional units are clocked only when needed e No software or hardware intervention required after mode is set e Software hardw
249. changeably or shared for cache replacements and snoop pushes This allows the data cache to support two outstanding cache replacements or two outstanding snoop push operations on the bus at any given time This supports greater bus throughput between the data cache and memory and less stall latency during data cache miss operations due to unavailable bus queues However when queue sharing is enabled the snoop queue may not be available because there may be 2 pending cast outs In this case there may be a window of opportunity push from an address other than the snooped address Queue sharing is enabled by setting HID2 EBQS 4 6 7 Cache Block Replacement Selection The instruction and data caches both use a three step selection process to determine which cache way will be used for the instruction or data First the core determines if there is a hit to a valid way If there is a hit then that way is selected Next the core checks to see if there are any invalid ways in the set and chooses the lowest order invalid way as the recipient Last when there is not a hit but all eight ways in the set are valid and not locked the PLRU algorithm is used to select the replacement target e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Instruction and Data Cache Operation For e300c1 there are seven PLRU bits B 0 6 for each set in the cache A way is selected for replacement according to the PLRU
250. che locking The second area is a 256 Mbyte block of memory that contains the instructions to lock not all of the 256 Mbytes of memory is locked in the cache this area is set up as an example Both memory areas use identity translation the logical memory address equals the physical memory address Table 4 18 summarizes the BAT settings used in this example Table 4 18 Example BAT Settings for Cache Locking Area Base Address Memory Size WIMG Bits BATU Setting BATL Setting First OxFFFO_0000 1 Mbyte 0b0100 OxFFFO_001F OxFFFO_0022 Second 0x0000_0000 256 Mbytes 0b0000 0x0000_1FFF 0x0000_0002 1 OxFFFO_0022 defines a caching inhibited memory area used for instruction cache locking and corresponds to a WIMG of 060100 Caching inhibited memory is not a requirement for data cache locking A setting of OxFFFO_0002 with a corresponding WIMG of 0b0000 marks the memory area as caching allowed The block address translation upper BATU and block address translation lower BATL settings in Table 4 18 can be used for both instruction block address translation IBAT and data block address translation DBAT registers After the BAT registers have been set up the MMU must be enabled The following assembly code enables both instruction and data memory address translation Enable instruction and data memory address translation This corresponds to setting IR and DR in the MSR bits 26 amp 27 mfmsr
251. che Lock HIDO ILOCK The entire contents of the instruction cache may be locked through the use of HIDO ILOCK A locked instruction cache supplies instructions normally on a cache hit but cache misses are treated as caching inhibited accesses On a miss the transaction to the CSB is single beat however ci still reflects the state of the I bit in the MMU for that page regardless of whether the cache is locked or disabled The setting of the ILOCK bit must be preceded by an isync instruction to prevent the instruction cache from being locked during an instruction access The core also provides instruction cache way locking in addition to entire instruction cache locking as described in Section 4 10 Applications Information Cache Locking 4 5 1 8 Instruction Cache Way Lock HID2 IWLCK Locking only a portion of the instruction cache is accomplished by locking ways within the cache using HID2 IWLCK Locking always begins with the first way way 0 and is sequential That is it is valid to lock ways 0 1 and 2 but it is not possible to lock just way 0 and way 2 Way locking without ILOCK always leaves at least one way unlocked The maximum number of lockable ways is seven on e300c1 and three ways on e300c2 and e300c3 4 5 1 9 Instruction Cache Flash Invalidate HIDO ICFI The instruction cache flash invalidate bit HIDO ICFI is used to invalidate the entire instruction cache in a single operation Note that using ICFI inval
252. cification refers to user and supervisor level as problem state and privileged state respectively The general purpose registers GPRs and floating point registers FPRs are accessed through instruction operands Floating point registers are not supported by the e300c2 core Access to registers can be explicit that is through the use of specific instructions for that purpose such as the mtspr and mfspr instructions or implicit as part of the execution or side effect of an instruction Some registers are accessed both explicitly and implicitly Figure 2 1 describes the registers in the e300 core Note that the implementation specific registers for the e300 core are shown in Figure 2 1 The number to the right of the register name indicates the number that is used in the syntax of the instruction operands to access the register for example the number used to access the XER is SPR1 For more information on the PowerPC register set refer to Chapter 2 Register Set in the Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Register Model USER MODEL General Purpose Registers 32 Bit Floating Point Registers 64 Bit XER SPR 1 Link Register LR SPR 8 Count Register CTR SPR 9 Time Base Facility For Reading TBL SPR268 TBU SPR269 Performance Monitor read only UPMGCO PMR384 UPMCs PMR 0 3 UPMLCas SUPERVISOR MODEL C
253. ck time 100 usec e System logic asserts one of the sleep recovery signals for example int or smi 9 3 2 Power Management Software Considerations Because the e300 core is a dual issue processor core with out of order execution capability care must be taken in how the power management modes are entered Furthermore nap and sleep modes require all outstanding bus operations to be completed before the power management mode is entered Section 9 4 Example Code Sequence for Entering Processor Sleep Mode provides an example software sequence for putting the e300 core into sleep mode Normally during system configuration time one of the power management modes is selected by setting the appropriate HIDO mode bit Later the power management mode is invoked by setting MSR POW To ensure a clean transition into and out of the power management mode set MSR EE external interrupt enable and execute the following code sequence sync mtmsr POW 1 isync loop b loop e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Power Management 9 4 Example Code Sequence for Entering Processor Sleep Mode The following is a sample code sequence for entering e300 core sleep mode KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KK set up e300 core HIDO power management bits Hh ee I A I ee AA I eA I eK KK KK pbrocessor HID and external interrupt ini
254. control instruction when SR T 1 Alignment interrupt DSI interrupt DSISR 5 1 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Memory Management Table 6 4 Other MMU Exception Conditions continued Exception Condition Description Interrupt Floating point load or store to FP memory access when SR T 1 See data access to direct store direct store segment segment in Table 6 3 Load or store that results in a Does not occur in G2 core Does not apply direct store error Imw stmw Iswi Iswx stswi or Imw stmw Iswi Iswx stswi or stswx Alignment interrupt stswx instruction attempted in instruction attempted while MSR LE 1 little endian mode Operand misalignment Translation enabled and operand is misaligned Alignment interrupt some of these as described in Chapter 5 Interrupts and cases are implementation specific Exceptions 1 Floating point not supported on the e300c2 Note that some exception conditions depend on whether the memory area is set up as write through W 1 or cache inhibited I 1 These bits are described fully in Memory Cache Access Attributes in Chapter 5 Cache Model and Memory Coherency in the Programming Environments Manual Refer to Chapter 5 Interrupts and Exceptions and to Chapter 6 Interrupts in the Programming Environments Manual for a complete description of the SRR1 and D
255. core provides hardware support for all instructions defined for 32 bit implementations A processor of this family invokes the illegal instruction error handler part of the program interrupt when the unimplemented PowerPC instructions are encountered so they can be emulated in software as required A defined instruction can have invalid forms as described in the following section 3 2 1 3 Illegal Instruction Class Illegal instructions are grouped into the following categories e Instructions not defined in the PowerPC architecture These opcodes are available for future extensions of the PowerPC architecture that is future versions of the PowerPC architecture may define any of these instructions to perform new functions The following primary opcodes are defined as illegal but may be used in future extensions to the architecture 1 4 5 6 9 22 56 57 60 61 e Instructions defined in the PowerPC architecture but not implemented in a specific PowerPC implementation For example instructions that can be executed on 64 bit processors are considered illegal by 32 bit processor cores The following primary opcodes are defined for 64 bit implementations only and are illegal on the core 2 30 58 62 e All unused extended opcodes are illegal The unused extended opcodes can be determined from information in Appendix A 2 Instructions Sorted by Opcode and Section 3 2 1 4 Reserved Instruction Class Notice that exten
256. ction is effectively a normal store class instruction In MEI mode the execution of a deht debi or dcbst instruction causes an address only broadcast if the HIDO ABE bit is set Also in MEI mode the debz instruction is the only cache operation that is snooped by the core In MESI mode the execution of the dcbf debi and debst instructions are broadcast if the HIDO ABE bit is set or if the addressed cache block is marked memory coherency required Also in MESI mode the core snoops the debst dcbz debf and debi instructions The ability of the core to optionally perform address only broadcasts when executing the debi dcbf and debst instructions allows for managing coherency of external caches if they are present e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Instruction and Data Cache Operation 4 5 2 1 Data Cache Block Touch dcbt Instruction This instruction provides a method for improving performance through the use of software initiated prefetch hints The core performs the fetch when the address hits in the TLB or BAT registers and when it is a permitted load access from the addressed page The operation is treated similarly to a byte load operation with respect to coherency The debt instruction which misses in the data cache is treated as a burst read operation on the bus with a transaction type of RWITM if operating in three state MEI mode or READ if operating in four state
257. ctions icbt tlbld tibli and rfci e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor icbt Instruction Set Model icbt Instruction Cache Block Touch Integer Unit icbt rA rB b 111111 b 0000010110 31 00000 A 22 0 0 5 6 10 11 15 16 20 21 30 31 Reserved EA is the sum rAl0 rB Other registers altered es None The instruction is a hint that the performance will be improved if the block containing the instruction addressed by the EA is fetched into the instruction cache because the program will execute from the addressed location Executing an icbt instruction does not cause any interrupt to be invoked If the EA specifies a location outside of the main memory the instruction is treated as a no op Also the icbt instruction is treated as a no op if touch load operation is disabled by the HIDO NOPT configuration bit The icbt instruction is effective regardless of WIMG settings instruction or data cache enable status or the instruction cache lock status This is a supervisor level context synchronizing instruction It is e300 specific and not part of the PowerPC instruction set e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Instruction Set Model tibid tibid Load Data TLB Entry Integer Unit tlbld rB 0 5 6 10 11 15 16 20 21 30 31 31 00000 00000 B 978 0 E
258. ctions user level registers programming model data types and addressing modes This section discusses the instructions defined in the UISA 3 2 4 1 This section describes the integer instructions These consist of the following Integer Instructions e Integer arithmetic instructions e Integer compare instructions e Integer logical instructions e Integer rotate and shift instructions Integer instructions use the content of the GPRs as source operands and place results into GPRs into the XER and into condition register CR fields 3 2 4 1 1 Table 3 3 lists the integer arithmetic instructions for the core Integer Arithmetic Instructions Table 3 3 Integer Arithmetic Instructions Name Mnemonic Operand Syntax Add add add addo addo rD rA rB Add Carrying addc addc addco addco rD rA rB Add Extended adde adde addeo addeo rD rA rB Add Immediate addi rD rA SIMM Add Immediate Carrying addic rD rA SIMM Add Immediate Carrying and Record addic rD rA SIMM Add Immediate Shifted addis rD rA SIMM Add to Minus One Extended addme addme addmeo addmeo rD rA e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Table 3 3 Integer Arithmetic Instructions continued Instruction Set Model Name Mnemonic Operand Syntax Add to Zero Extended addze addze addzeo addzeo rD rA Divide Word divw d
259. ctions allow movement of data from memory to registers or from registers to memory without concern for alignment These instructions can be used for a short move between arbitrary memory locations or to initiate a long move between misaligned memory fields When the core is operating with little endian byte order execution of a load or store string instruction causes the system alignment error handler to be invoked see Section 3 1 2 Byte Ordering in the Programming Environments Manual for more information Table 3 18 lists the integer load and store string instructions Table 3 18 Integer Load and Store String Instructions Name Mnemonic Operand Syntax Load String Word Immediate Iswi rD rA NB Load String Word Indexed Iswx rD rA rB Store String Word Immediate stswi rS rA NB Store String Word Indexed stswx rS rA rB Load string and store string instructions may involve operands that are not word aligned As described in Alignment Interrupt 0x00600 in Chapter 6 Interrupts in the Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Instruction Set Model a misaligned string operation suffers a performance penalty compared to a word aligned operation of the same type When a string operation crosses a 4 Kbyte boundary the instruction may be interrupted by a DSI interrupt associated with the address translation of th
260. cture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Core Interface Operation 8 3 2 Checkstops The e300 core has two checkstop input signals ckstp_in non maskable and mcp enabled when MSR ME is cleared and HIDO EMCP is set and a checkstop output ckstp_out If ckstp_in or mcp is asserted the core halts operations by gating off all internal clocks The core asserts ckstp_out if ckstp_in is asserted If ckstp_out is asserted by the core it has entered the checkstop state and processing has halted internally The ckstp_out signal can be asserted for various reasons including receiving a fea signal and detection of external parity errors For more information about the checkstop state see Section 5 5 2 2 Checkstop State MSR ME 0 8 3 3 Reset Inputs The e300 core has two reset inputs described as follows e hreset hard reset hreset is used for power on reset sequences or for situations in which the core must go through the entire cold start sequence of internal hardware initializations e sreset soft reset The soft reset input provides warm reset capability This input can be used to avoid forcing the core to complete the cold start sequence When either reset input is negated the processor attempts to fetch code from the system reset interrupt vector The vector is located at offset 0x100 from the interrupt prefix all zeros or ones depending on the setting of the interr
261. d Syntax Floating Multiply Add Double Precision fmadd fmadd frD frA frC frB Floating Multiply Add Single fmadds fmadds frD frA frC frB Floating Multiply Subtract Double Precision fmsub fmsub frD frA frC frB Floating Multiply Subtract Single fmsubs fmsubs frD frA frC frB Floating Negative Multiply Add Double Precision fnmadd fnmadd frD frA frC frB e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Model Table 3 9 Floating Point Multiply Add Instructions continued Name Mnemonic Operand Syntax Floating Negative Multiply Add Single fnmadds fnmadds frD frA frC frB Floating Negative Multiply Subtract Double Precision fnmsub fnmsub frD frA frC frB Floating Negative Multiply Subtract Single fnmsubs fnmsubs _ frD frA frC frB Implementation note Single precision multiply type instructions operate faster than their double precision equivalents See Chapter 7 Instruction Timing for more information 3 2 4 2 3 Floating Point Rounding and Conversion Instructions The Floating Round to Single Precision frsp instruction is used to truncate a 64 bit double precision number to a 32 bit single precision floating point number The floating point conversion instructions convert a 64 bit double precision floating point number to a 32 bit signed integer number The PowerPC architecture defines bits 0 31 of floating point reg
262. d bus control logic The BIU also captures snoop addresses for data cache address queue and memory reservation wars and stwex instruction operations The instruction cache is not snooped therefore instruction cache coherency must be maintained by software The LSU provides the data transfer interface between the data cache and the GPRs and FPRs It provides all logic required to calculate effective addresses handle data alignment to and from the data cache and provides sequencing for load and store string and multiple operations As shown in Figure 1 1 the caches provide a 64 bit interface to the instruction fetcher and LSU Write operations to the data cache can be performed on a byte half word word or double word basis e300 Power Architecture Core Family Reference Manual Rev 3 2 Freescale Semiconductor Instruction and Data Cache Operation 4 2 Data Cache Organization The e300c1 data cache is configured as 128 sets of eight blocks per set The organization of the e300c1 data cache is shown in Figure 4 1 128 Sets Block 0 Block 1 Block 2 Address Tag 2 Block 3 Address Tag 3 Block 4 Address Tag 4 Words 0 7 Block 5 Address Tag 5 Words 0 7 Block 6 Address Tag 6 Words 0 7 Block 7 Address Tag 7 Words 0 7 me 8 Words Block al Figure 4 1 e300c1 Data Cache Organization The e300c2 and e300c3 data cache is configured as 128 sets of f
263. d by bus traffic bus clock speed and memory translation These conditions are discussed in the following sections 7 3 2 1 Cache Arbitration When the fetcher requests instructions from the cache two things may happen If the instruction cache is idle and the requested instructions are present they are provided on the next clock cycle The instruction fetch cancel extension allows a new instruction fetch to be issued to the cache or to the bus if a cancelled instruction fetch is pending or active on the bus This is also called hit under cancel capability e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Instruction Timing 7 3 2 2 Cache Hit An instruction fetch that hits the instruction cache takes only one clock cycle after the request for as many as two instructions to enter the IQ Note that the cache is not blocked to internal accesses until a cache reload completes hits under misses The critical double word is written simultaneously to the cache and forwarded to the requesting unit minimizing stalls due to load delays Figure 7 6 shows a simple example of instruction fetching that hits in the on chip cache This example uses a series of integer add and and double precision floating point add instructions to show how the number of instructions to be fetched is determined how program order is maintained by the IQ and CQ how instructions are dispatched and retired in pairs maximum and
264. d differences between e300c1 and e300c2 in the instruction cache way locking feature to HID2 16 18 in Table 2 7 e300 HID2 Field Descriptions Add differences between e300c1 and e300c2 in the data cache way locking feature to HID2 24 26 in Table 2 7 e300 HID2 Field Descriptions In the last sentence of the section add the caveat that when HIDO IFEM is set the core broadcasts the M bit Change second and third bullets to include e300c2 implementation 16 Kbyte four way Add paragraph for e300c2 implementation and replace Figure 4 1 Data Cache Organization with two figures e300c1 Data Cache Organization and e300c2 Data Cache Organization Add paragraph for e300c2 implementation and replace Figure 4 2 Instruction Cache Organization with two figures e300c1 Instruction Cache Organization and e300c2 Instruction Cache Organization Remove statement that icbi broadcasts to the bus because icbi does not broadcast to the bus in the e300 core Modify the parenthetical description of icbi to say invalidate the old instruction cache entry in this processor Add paragraph and Table 4 5 e300c2 PLRU Replacement Way Selection to show e300c2 PLRU implementation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 4 6 7 4 24 4 10 1 4 33 4 10 3 1 4 4 36 4 10 3 1 7 4 38 4 10 3 2 6 4 43 5 1 5 4 5 2 1 4 5 12 5 2 1 4 5 13 7 3
265. d in the e300 core including architecture defined and implementation specific registers e300 Power Architecture Core Family Reference Manual Rev 3 xxiv Freescale Semiconductor Chapter 3 Instruction Set Model provides a brief description of the operand conventions an overview of addressing modes and a list of the instructions implemented by the e300 core Note that instructions are organized by functions Chapter 4 Instruction and Data Cache Operation provides a discussion of the cache and memory model as implemented on the e300 core Chapter 5 Interrupts and Exceptions describes the interrupt model defined in the PowerPC OEA and the specific interrupt model implemented on the e300 core Chapter 6 Memory Management describes the e300 core s implementation of the memory management unit specifications provided by the OEA Chapter 7 Instruction Timing provides information about latencies interlocks special situations and various conditions to help make programming more efficient This chapter is of special interest to software engineers and system designers Chapter 8 Core Interface Operation provides an overview of individual signals of the e300 core It also provides a description of the coherent system bus CSB bus Chapter 9 Power Management provides information about the power saving modes for the e300 core Chapter 10 Debug Features provides information ab
266. d not attempt to emulate a misaligned Iwarx or stwex instruction because there is no correct way to define the address associated with the reservation In general the lwarx and stwex instructions should be used only in system programs which can be invoked by application programs as needed e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Instruction Set Model At most one reservation exists simultaneously on any processor The address associated with the reservation can be changed by a subsequent Iwarx instruction The conditional store is performed based on the existence of a reservation established by the preceding Iwarx regardless of whether the address generated by the lwarx matches that generated by the stwex instruction A reservation held by the processor is cleared by one of the following e Executing an stwex instruction to any address e Attempt by some other device to modify a location in the reservation granularity 32 bytes The Iwarx and stwex instructions to write through memory do not cause a DSI interrupt Table 3 25 lists the UISA memory synchronization instructions for the e300 core Table 3 25 Memory Synchronization Instructions UISA Name Mnemonic Operand Syntax Load Word and Reserve Indexed lwarx rD rA rB Store Word Conditional Indexed stwex rS rA rB Synchronize sync 3 2 5 PowerPC VEA Instructions The VEA describes the semant
267. d tbe ect Tama 6 3 6 1 2 MMU EE 6 3 6 1 3 Address Translation lettre GE Aere Eed 6 8 6 1 4 Memory Protection Paci li tie Sys eeeggrdeee iten edel sieasnacadeseladdoesdyhavsteasicanteed sos be ee den 6 9 6 1 5 Page History Information oss sciscccsisceds sesvasastaydaatssasieass seaeaaea ei eao aiiai iat 6 10 6 1 6 General Flow of MMU Address Translation eee eesceeeeeseeceseceseeeeeeesseesnaeenseeees 6 11 6 1 6 1 Real Addressing Mode and Block Address Translation Selection eee 6 11 6 1 6 2 Page Address Translation EE 6 12 6 1 7 MMU Interrupts Sumninivar yas D eg ee ee Eege E 6 14 6 1 8 MMU Instructions and Register Summary 6 16 6 2 R al Addressing MOde 24 5 lt haceansesvaundesasnal esac EE et EA Geet 6 19 6 3 Block Address Trans lation cc25 biede Eeer 6 19 6 4 Memory Sesment Model eegener getrei edel lee autos Aetna die 6 19 6 4 1 Page History Kette 6 20 e300 Power Architecture Core Family Reference Manual Rev 3 D Freescale Semiconductor Paragraph Number 6 4 1 1 6 4 1 2 6 4 1 3 6 4 2 6 4 3 6 4 3 1 6 4 3 2 6 4 4 6 5 6 5 1 6 5 2 6 5 2 1 6 5 2 1 1 6 5 2 1 2 6 5 2 1 3 6 5 2 1 4 6 5 2 2 6 5 2 2 1 6 5 2 2 2 6 5 3 6 5 4 7 1 7 2 71 3 Tal 71 3 2 7 3 2 1 7 3 2 2 7 3 2 3 7 3 3 el 7 3 3 2 7 3 3 3 7 4 7 4 1 7 4 1 1 7 4 1 2 Page Title Number Reference EE 6 21 Change BAL TE 6 21 Scenarios for Reference and Change Bit Recording eeeeeeeeseeeeeeeeeteeeeeeees 6 22 Page Memory Pr
268. d using MEI the e300 core signals all cache block fills as if they were write misses read with intent to modify or RWITM flushing the corresponding copies of the data in all caches external to the core prior to the core cache block fill operation Following the cache block load the core is the exclusive owner of the data and may write to it without a bus broadcast transaction To maintain this coherency all global reads observed on the bus by the e300 core are snooped as if they are writes causing the core to write a modified cache block back to memory and invalidate the cache block or simply invalidate the cache block if it is unmodified The exception to this rule occurs when a snooped transaction is a single beat read implying caching inhibited in which case the core does not invalidate the snooped cache block If the cache block is modified the block is written back to memory and the cache block is marked exclusive If the cache block is marked exclusive when snooped no bus action is taken and the cache block remains in the exclusive state This treatment of caching inhibited reads decreases the possibility of data thrashing by allowing noncaching devices to read data without invalidating the entry from the core data cache 4 4 2 1 1 MEI State Transitions Figure 4 5 shows the state transitions when the core is configured for MEI coherency protocol HID2 MESI 0 Figure 4 5 assumes that the WIM bits for the page or block are set to
269. ddex 011111 D A B OE 0000001010 Re mulhwux 011111 D A B 0 0000001011 Re mfcr 011111 D 00000 00000 0000010011 wan 011111 D A B 0000010100 Idx 011111 D A B 0000010101 0 wax 011111 D A B 0000010111 0 slwx 011111 S A B 0000011000 Rc cntizwx 011111 S A 00000 0000011010 Rc sldx 011111 S A B 0000011011 Rc andx 011111 S A B 0000011100 Rc cmpl 011111 ef O L A B 0000100000 0 subfx 011111 A B OE 0000101000 Re Idux 011111 A B 0000110101 0 debst 011111 00000 A B 0000110110 Iwzux 011111 A B 0000110111 cntizdax 011111 S A 00000 0000111010 Rc andex 011111 S A B 0000111100 Rc fae 011111 TO A B 0001000100 0 mulhdx 011111 D A B 0 0001001001 Re mulhwx 011111 D A B 0 0001001011 Re mfmsr 011111 D 00000 00000 0001010011 0 Idarx 011111 D A B 0001010100 0 deht 011111 00000 A B 0001010110 0 Ibzx 011111 D A B 0001010111 0 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Instruction Set Listings Table A 2 Complete Instruction List Sorted by Opcode continued Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 negx 011111 D A 00000 OE 0001101000 Re Ibzux 011111 D B 0001110111 0 notre 011111 S A B 0001111100 Rc subfex 011111 D A B OE 0010001000 Rc addex 011111 D A B OE 0010001010 Rc mert 011111 S
270. ded opcodes for instructions that are defined only for 64 bit implementations are illegal in 32 bit implementations and vice versa The following primary opcodes have unused extended opcodes 17 19 31 59 63 primary opcodes 30 and 62 are illegal for all 32 bit implementations but as 64 bit opcodes they have some unused extended opcodes e An instruction consisting entirely of zeros is guaranteed to be an illegal instruction This increases the probability that an attempt to execute data or uninitialized memory invokes the system illegal instruction error handler a program interrupt Note that if only the primary opcode consists of all zeros the instruction is considered a reserved instruction This is further described in Section 3 2 1 4 Reserved Instruction Class e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Instruction Set Model An attempt to execute an illegal instruction invokes the illegal instruction error handler a program interrupt but has no other effect Section 5 5 7 Program Interrupt 0x00700 describes illegal and invalid instruction interrupts Except for an instruction consisting entirely of binary zeros illegal instructions are available for further additions to the PowerPC architecture 3 2 1 4 Reserved Instruction Class Reserved instructions are allocated to specific implementation dependent purposes not defined by the PowerPC architecture An attempt t
271. del This section describes the PowerPC interrupt model and the e300 core implementation specifically 1 3 4 1 PowerPC Interrupt Model The PowerPC interrupt mechanism allows the core to change to supervisor state as a result of external signals errors or unusual conditions arising in the execution of instructions The conditions that can cause interrupts are called exceptions When interrupts occur information about the state of the core is saved to certain registers and the core begins execution at an address interrupt vector predetermined for each interrupt type Interrupts are processed in supervisor mode Some interrupts such as program interrupts can be triggered by a broad range of exception conditions Other interrupts such as the decrementer interrupt have only a single exception condition Exceptions and e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Overview the interrupts they cause are described in Chapter 5 Interrupts and Exceptions Although multiple exception conditions can map to a single interrupt vector a more specific condition may be determined by examining a register associated with the interrupt for example the DSISR and the FPSCR Additionally some exception conditions can be explicitly enabled or disabled by software The PowerPC architecture requires that interrupts be handled in program order therefore although a particular implementation may recogni
272. ding on whether the access is to cacheable or noncacheable memory whether it hits in the L1 cache whether the cache access generates a write back to memory whether the access causes a snoop hit from another device that generates additional activity and other conditions that affect memory accesses The core implements many features to improve throughput such as pipelining superscalar instruction dispatch branch folding removal of fall through branches two level speculative branch handling and multiple execution units that operate independently and in parallel As an instruction of load store and floating point units passes from stage to stage in a pipelined system the following instruction can follow through the stages as the former instruction vacates them allowing several instructions to be processed simultaneously While it may take several cycles for an instruction to pass through all the stages when the pipeline has been filled one instruction can complete its work on every clock cycle Figure 7 1 represents a generic pipelined execution unit Stage 1 Stage 2 Stage 3 Clock 0 Instruction A Clock 1 Instruction B Instruction A Clock 2 Instruction C Instruction B Instruction A Clock 3 Instruction D Instruction C Instruction B Figure 7 1 Pipelined Execution Unit LI Il The entire path that instructions take through the fetch decode dispatch execute complete and write back stages is considered the e30
273. e The dispatcher monitors the availability of all execution units and suspends instruction dispatch if the required execution unit is unavailable An execution unit may not be available if it can accept and execute only one instruction per cycle or if an execution unit s pipeline becomes full which may occur if instruction execution takes more clock cycles than the number of pipeline stages in the unit and additional instructions are dispatched to that unit to fill the remaining pipeline stages 7 4 Execution Unit Timings The following sections describe instruction timing considerations for each execution unit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Instruction Timing 7 4 1 Branch Processing Unit Execution Timing Flow control operations conditional branches unconditional branches and traps are typically expensive to execute in most machines because they disrupt normal flow in the instruction stream When a change in program flow occurs the IQ must be reloaded with the target instruction stream During this time the execution units will be idle However previously dispatched instructions will continue to execute while the new instruction stream makes its way into the IQ Performance features such as branch folding and static branch prediction help minimize penalties associated with flow control operations The timing for branch instruction execution is determined by many factors
274. e Cast Out Operations The core uses a PLRU replacement algorithm to determine which of the possible cache locations should be used for a cache update on a cache miss Adding a new block to the cache causes any modified data e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor EN Instruction and Data Cache Operation associated with the replacement element to be written back or cast out to system memory to maintain memory coherence Refer to Section 4 6 7 Cache Block Replacement Selection for more information on how the replacement block is selected 4 6 5 Cache Block Push Operation When a cache block in the core is snooped and hit by another bus master and the data is modified the cache block must be written to memory and made available to the snooping device The cache block that is hit is pushed out onto the CSB If the snooped transaction is a write with kill transaction that hits a modified cache block no push will be performed There is no snooping of the instruction cache 4 6 6 Data Cache Queue Sharing Extension In previous G2 cores two write queues are available for data cache burst write operations one reserved for cache replacements only and one for snoop pushes only There is also a third write queue always available for non burst write operations The e300 features a data cache queue sharing extension that allows the two burst write queues in the bus unit to be used inter
275. e Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Listings Table A 45 PowerPC Instruction Set Legend continued UISA VEA OEA Supervisor Level Optional 64 Bit Form y a Ee Ku sradix srawx srawix srdx srwx stb stbu stbux x x UO UO XxX X X Xx stbx std E We x lt stdex stdu stdux stdx stfd stfdu E a ENEE A E n stfdux stfidx stfiwx 3 stfs stfsu stfsux stfsx sth sthbrx sthu sthux sthx 4 stmw stswi 4 4 stswx stw stwbrx stwcx L e ej ej e 2 2 S S e Ss 2 Ss ejej e 2 lt lt a ss Ss Ss aes es Ss es elj el eS SO O X X 0 X X 0 X X o X O X X o UO X X X o UO X x stwu e300 Power Architecture Core Family Reference Manual Rev 3 40 Freescale Semiconductor Table A 45 PowerPC Instruction Set Legend continued Instruction Set Listings UISA VEA OEA Supervisor Level Optional 64 Bit Form stwux V xX stwx V X subfx V xO subfcx V xO subfex V xO subfic V D subfmex V xO subfzex V XO sync V X td d V X tdi 1 d d D tlbia 2 3 V d V D tlbie gt 3 V V V X tlbld 2 V D tlbli 2 6 V X tlbsync 2 3 y V X tw V X twi V D xorx V X
276. e and other information required for completion is kept in a five entry FIFO completion queue A single completion queue entry is allocated for each instruction once it enters the execution unit from the dispatch unit An available completion queue entry is a required resource for dispatch if no completion entry is available dispatch stalls A maximum of two instructions per cycle are completed in order from the queue e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Overview 1 1 5 Memory Subsystem Support The core provides separate instruction and data caches and MMUs The core also provides an efficient processor bus interface to facilitate access to main memory and other bus subsystems The memory subsystem support functions are described in the following sections 1 1 5 1 Memory Management Units MMUs The core MMUs support up to 4 Petabytes 252 of virtual memory and 4 Gigabytes 232 of physical memory referred to as real memory in the architecture specification for instruction and data The MMUs also control access privileges for these spaces on block and page granularities Referenced and changed status is maintained by the processor for each page to assist implementation of a demand paged virtual memory system Note that software assistance is required for the device to maintain reference and changed status A key bit is implemented to provide information about memory protection violations
277. e applicable to most microprocessor microarchitectures and be of general value They are identified in Figure 2 1 The performance monitor global control register PMGC0 controls the counting of performance monitor events It takes priority over all other performance monitor control registers UPMGCO provides user level read access to PMGCO The performance monitor local control registers PMLCa0 PMLCa3 control each individual performance monitor counter Each counter has a corresponding PMLCa register UPMLCa0 UPMLCa3 provide user level read access to PMLCa0 PMLCa3 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Register Model 2 2 1 Hardware Implementation Register 0 HIDO Figure 2 5 shows the e300 implementation of HIDO HIDO can be accessed with mtspr and mfspr using SPR1008 SPR 1008 Access Supervisor read write 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 2 EMCP ECPE EBA EBD SBCLK ECLK PAR DOZE NAP SLEEP DPM NHR Reset All zeros 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ICE DCE ILOCK DLOCK ICFI DCFI IFEM FBIOB ABE _ NOOPTI Reset All zeros Figure 2 5 HIDO Register Table 2 5 shows the bit definitions for HIDO Table 2 5 e300 HIDO Field Descriptions Bits Name Function 0 EMCP Enable mcp The purpose
278. e check enable MEI Modified exclusive invalid MESI Modified exclusive shared invalid cache coherency protocol MFG Manufacturing revision tag MJREV Major processor design revision indicator MMU Memory management unit MNREV Minor processor design revision indicator MQ MQ register msb Most significant bit MSB Most significant byte MSR Machine state register NaN Not a number NI Non IEEE mode bit No op No operation NOOPTI No op the data cache touch instructions OEA Operating environment architecture PID Processor identification tag PIR Processor identification register PLL Phase locked loop POR Power on reset POW Power management enable POWER Performance optimized with enhanced RISC architecture PR Privilege level PROC Processor revision tag PT Processor ID type tag PTE Page table entry PTEG Page table entry group e300 Power Architecture Core Family Reference Manual Rev 3 XXX Freescale Semiconductor Table i Acronyms and Abbreviated Terms continued Term Meaning PVR Processor version register RAW Read after write RI Recoverable interrupt RID Resource ID RISC Reduced instruction set computing RTL Register transfer language RWITM Read with intent to modify SDR1 Register that specifies the page table base address for virtual to physical address translation SE Single step trace enable SIG_TYPE Combinational sign
279. e core ignores the extended opcode differences between mftb and mfspr by ignoring TB 25 and treating both instructions identically 3 2 6 2 3 Move To From Performance Monitor Register Instructions The APU defines instructions for reading and writing the PMRs as shown in Table 3 33 Table 3 33 Performance Monitor APU Instructions Name Mnemonic Syntax Move from Performance Monitor Register mfpmr rD PMRN Move to Performance Monitor Register mtpmr PMRN rS 3 2 6 3 Memory Control Instructions OEA This section describes memory control instructions which include the following types e Cache management instructions e Segment register manipulation instructions e Translation lookaside buffer management instructions 3 2 6 3 1 Supervisor Level Cache Management Instruction The supervisor level cache management instruction in the PowerPC architecture debi is used to invalidate individual cache blocks The icbt instruction another supervisor level instruction performs a bus read operation from the bus and allocates into the instruction cache This instruction is new to the e300 core and supplements the instruction cache locking mechanisms and the new lock protect feature The user level debf instruction described in Section 3 2 5 3 Memory Control Instructions VEA and Section 4 5 2 Cache Control Instructions should be used when the program needs to invalidate cache blocks Note that the debf instruction ca
280. e core uses an interim 52 bit virtual address and hashed page tables for generating 32 bit physical addresses The MMwUs in the e300 core rely on the interrupt processing mechanism for the implementation of the paged virtual memory environment and for enforcing protection of designated memory areas Instruction and data TLBs provide address translation in parallel with the on chip cache access incurring no additional time penalty in the event of a TLB hit A TLB is a cache of the most recently used page table entries Software is responsible for maintaining the consistency of the TLB with memory The core TLBs are 64 entry two way set associative caches that contain instruction and data address translations The core provides hardware assist for software table search operations through the hashed page table on TLB misses Supervisor software can invalidate TLB entries selectively For instructions and data that correspond to block address translation the e300 core provides independent eight entry BAT arrays These entries define blocks that can vary from 128 Kbytes to 256 Mbytes The BAT arrays are maintained by system software HID2 HBE is added to the e300 for enabling or disabling the four additional pairs of BAT registers However regardless of the setting of HID2 HBE these BATs are accessible by mfspr and mtspr As specified by the PowerPC architecture the hashed page table is a variable sized data structure that defines the mapping between
281. e eee ens eed eed 1 1 1 IP ALU De ss ee 1 1 2 Instruction UNI E 1 1 2 1 Instruction Queue and Dispatch Unit sseeeeeseeeseseeesserssresseseresresresse Ld Branch Processing Unit REI ebe ietetodeggesgieteeiesgeo Edge dete geg 1 1 3 Independent Execution Units s 2 cusieta einen Ate 1 1 3 1 PGS Sr Unit D E 1 1 3 2 Floating Point Unit PPU 1 1 3 3 L ad Store Unit ESU EE 1 1 3 4 System Register Unit SRU 2 een ee eae ead Sees 1 1 4 Completion Uni seen eege Se ee ee ee Ses 1 1 5 Memory Subsystem Support 1 1 5 1 Memory Management Units MM 1 1 5 2 alten Etgen EE E 1 1 6 Bus Interface Unit OBIUY 1 1 7 System SUpport El Ee 1 1 7 1 Power Maneet een 1 1 7 2 Time Base Decrementet seseseeseeeeeeseeseeseesresrtesetsresresseeseesersseeseeseresee 1 1 7 3 JTAG Test and RR E EE 1 1 7 4 Clock Mil ti pliers iscesvaniccdesscqeesaseedestisjsattngnavalarsepsaecascnceevaseseaumaavevaieeanes ae Core Performance Monitor A cdateds avehe irene eee eseecitees 1 2 PowerPC Architecture Implementation sssnssessseseseeeessessessseeeseeesssresseesse 1 3 Implementation Specific Intormapon 1 3 1 Register Modeleren itane A Percssdeeahareenhaetdtvstal nh inattdelelcenttusaletaat Macs 1 3 1 1 LKE 1 3 1 1 1 General Purpose Registers GPRs sesssssssssesssesesssssessseessresseesse e300 Power Architecture Core Family Reference Manual Rev 3 Page Number Freescale Semiconductor Paragraph Page Number Title Num
282. e following assembly code disables all asynchronous interrupts Clear the following bits from the MSR EE 16 ME 19 FEO 20 FEL 23 E 24 mfmsr rad lis Y2 OxFFFF ori r2 r2 Ox667F and Eih Ely E2 sync mtmsr EI isync 4 10 3 1 4 Invalidating the Data Cache If a non empty data cache has modified data and the data cannot be discarded the data cache must be flushed before it can be invalidated Data cache flushing is accomplished by filling the data cache with known data and performing a flash invalidate or a series of deht instructions that force a flush and invalidation of the data cache block The following code sequence shows how to flush the data cache r6 contains a block aligned address in memory with which to fill the data cache For this example address 0x0 is used li r6 0x0 CTR number of data blocks to load e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Instruction and Data Cache Operation e300c1 number of blocks 32K 32 Bytes block 2 15 2 5 2 9 0x400 e300c2 and e300c3 number of blocks 16K 32 Bytes block 2 14 2 5 2 8 0x200 li rl 0x400 Use e300cl value for this example mtctr el Save the total number of blocks in cache to r8 mr ER El Load th ntire cache with known data loop lwz ey 0486 addi fro 6 32 Find the next block bdnz loop Decrement the counter a
283. e given processor appear to have completed before the FPSCR instruction is initiated and e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Instruction Set Model that no subsequent floating point instructions appear to be initiated by the given processor until the FPSCR instruction has completed The FPSCR instructions are listed in Table 3 12 Table 3 12 Floating Point Status and Control Register Instructions Name Mnemonic Operand Syntax Move from FPSCR mffs mffs frD Move to Condition Register from FPSCR merfs crfD crfS Move to FPSCR Bit 0 mtfsb0 mtfsb0 crbD Move to FPSCR Bit 1 mtfsb1 mtfsb1 crbD Move to FPSCR Field Immediate mtfsfi mtfsfi crfD IMM Move to FPSCR Fields mifsf mtfsf FM frB Implementation Note The architecture notes that in some implementations the Move to FPSCR Fields mtfsfx instruction may perform more slowly when only a portion of the fields are updated as opposed to all of the fields This is not the case in the e300 core 3 2 4 2 6 Floating Point Move Instructions Floating point move instructions copy data from one floating point register to another The floating point move instructions do not modify the FPSCR The CR update option in these instructions controls the placing of result status into CR1 Floating point move instructions are listed in Table 3 13 Table 3 13 Floating Point Move Instructions
284. e previous example The two integer units in the e300c2 e300c3 along e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Instruction Timing with the faster multipliers result in higher throughput of integer instructions when compared to the e300c1 Fetch in IQ In Dispatch Entry lQ0 IQ1 Execute P Complete In CQ In Retirement Entry CQ0 CQ1 Instru tion Queue Completion Queue N Och Cu Figure 7 10 Instruction Timing Integer Execution in the e300c2 and e300c3 7 4 3 Floating Point Unit Execution Timing Floating point instructions are not supported on the e300c2 core This section is for comparison purposes only with the e300 core The FPU on the e300 core executes all floating point computational instructions The LSU performs integer floating point loads and stores Execution of most floating point instructions is pipelined within the FPU allowing up to three instructions to be executing in the FPU concurrently On the e300 core the FPU is pipelined so a single precision multiply add instruction can be issued and completed every clock cycle While most floating point instructions execute with three or four cycle latency and one or two cycle throughput three instructions fdivs fdiv
285. e processor runs in big endian mode 1 The processor runs in little endian mode For the e300 core see Section 3 1 2 Endian Modes and Byte Ordering for a definition of the core operating in true little endian mode The IEEE floating point exception mode bits FEO and FE1 together define whether floating point exceptions are handled precisely imprecisely or if they are taken at all The possible settings and default conditions for the e300 core are shown in Table 5 9 For further details see Chapter 6 Interrupts in the Programming Environments Manual Table 5 9 IEEE Floating Point Exception Mode Bits FEO FE1 Mode 0 0 Floating point exceptions disabled 0 1 Floating point imprecise nonrecoverable 1 0 Floating point imprecise recoverable 1 1 1 Floating point precise mode 1 Not implemented in the e300 core MSR bits are guaranteed to be written to SRR1 when the first instruction of the interrupt handler is encountered 5 2 2 Enabling and Disabling Exceptions and Interrupts When an exception condition exists that may cause an interrupt to be generated it must be determined whether the interrupt is enabled for that condition as follows e IEEE floating point enabled exceptions which can generate a program interrupt are ignored when both MSR FEO and MSR FE 1 are cleared If either of these bits are set all IEEE enabled floating point exceptions are taken and cause a program in
286. e second page In this case the core performs some or all memory references from the first page and none from the second before taking the interrupt On return from the DSI interrupt the load or store string instruction will re execute from the beginning For more information refer to DSI Interrupt 0x00300 in Chapter 6 Interrupts in the Programming Environments Manual Implementation Note If rA is in the range of registers to be loaded for a Load String Word Immediate Iswi instruction or if either rA or rB is in the range of registers to be loaded for a Load String Word Indexed Iswx instruction the PowerPC architecture defines the instruction to be of an invalid form In addition the Iswx and stswx instructions that specify a string length of zero are defined to be invalid by the PowerPC architecture However none of these cases hold true for the e300 core the core treats these cases as valid forms 3 2 4 3 8 Floating Point Load and Store Address Generation Floating point load and store operations generate effective addresses using the register indirect with immediate index addressing mode and register indirect with index addressing mode details are described below Floating point loads and stores are not supported for direct store accesses The use of the floating point load and store operations for direct store accesses results in a DSI interrupt Implementation Note The e300c2 core does not support floating point op
287. e table search software routine to determine if a PTE contains the address translation for the ICMP and DCMP instruction or data access The contents of ICMP and DCMP are automatically derived by the core when a TLB miss interrupt occurs These registers are implementation specific Required physical address The system software loads a TLB entry by loading the second word of the matching PTE entry register RPA into the RPA register and then executing the tlbli or tlbld instruction for loading the ITLB or DTLB respectively This register is implementation specific Note that the core contains other features that do not specifically control the MMU but are implemented to increase performance and flexibility These are e Complete set of shadow segment registers for the instruction MMU These registers are invisible to the programming model as described in Section 6 4 3 TLB Description e Temporary GPRO GPR3 These registers are available as r0 r3 when MSR TGPR is set The core automatically sets MSR TGPR whenever one of the three TLB miss interrupts occurs allowing these interrupt handlers to have four registers that are used as scratchpad space without having to save or restore this part of the machine state that existed when the interrupt occurred Note that MSR TGPR is restored to the value in SRR1 when the rfi instruction is executed Refer to Section 6 5 2 Implementation Specific Table Search Operation fo
288. e300 Power Architecture Core Family Reference Manual Supports e300c1 e300c2 e 300c3 e300coreRM Rev 3 12 2006 7 freescale How to Reach Us Home Page www freescale com Web Support http Awww freescale com support USA Europe or Locations Not Listed Freescale Semiconductor Inc Technical Information Center EL516 2100 East Elliot Road Tempe Arizona 85284 1 800 521 6274 or 1 480 768 2130 www freescale com support Europe Middle East and Africa Freescale Halbleiter Deutschland GmbH Technical Information Center Schatzbogen 7 81829 Muenchen Germany 44 1296 380 456 English 46 8 52200080 English 49 89 92103 559 German 33 1 69 35 48 48 French www freescale com support Japan Freescale Semiconductor Japan Ltd Headquarters ARCO Tower 15F 1 8 1 Shimo Meguro Meguro ku Tokyo 153 0064 Japan 0120 191014 or 81 3 5437 9125 support japan freescale com Asia Pacific Freescale Semiconductor Hong Kong Ltd Technical Information Center 2 Dai King Street Tai Po Industrial Estate Tai Po N T Hong Kong 800 2666 8080 support asia freescale com For Literature Requests Only Freescale Semiconductor Literature Distribution Center P O Box 5405 Denver Colorado 80217 1 800 441 2447 or 1 303 675 2140 Fax 1 303 675 2150 LDCForFreescaleSemiconductor hibbertgroup com Document Number e300coreRM Rev 3 12 2006 Information in this document is provided so
289. earch operations 5 10 stwex 8 8 Superscalar definition 7 2 Supervisor mode 5 12 Supervisor level SPRs 1 19 e300 Power Architecture Core Family Reference Manual Rev 3 Index 8 Freescale Semiconductor SVR system version register 2 11 2 23 UPMLCa0 UPMLCa3 user performance monitor local Synchronization control registers A 0 3 11 5 context synchronization 3 8 User mode 5 1 5 12 execution of rfci 5 16 User level SPRs 1 18 execution of rfi 5 15 execution synchronization 3 9 v memory instructions UISA 3 25 VEA 3 27 memory synchronization instructions A 21 requirements for setting breakpoints 10 6 requirements for special registers and TLBs 3 32 X System call se 1 28 XER 32 bit 2 5 system call interrupt 5 4 5 31 System linkage instructions 3 28 A 22 System management interrupt smi 1 29 5 5 5 12 5 36 9 1 System register unit SRU 7 3 execution timing 7 24 latency CR logical instructions 7 29 system register instructions 7 28 architecture VEA Virtual page number 6 27 T Table search operations algorithm 6 25 software routines 6 29 6 34 6 44 software table search operations SRRI1 bit settings 5 10 table search flow primary and secondary 6 27 TBL TBU time base facility for reading 1 19 2 6 for writing 2 10 time base register 9 2 time base decrementer 1 14 time of day maintenance 9 4 TGPRn temporary general purpose regs 0 3 6 30 Trace interrupt 5 5
290. eatures Table 10 1 Other Debug and Support Register Bits Register Bits Name Description MSR 17 PR Privilege level Breakpoint registers can only be accessed when this bit is cleared PR 0 corresponds to supervisor mode 21 SE Single step trace enable 0 The processor executes instructions normally 1 The processor generates a trace interrupt upon the successful completion of the next instruction 22 BE Branch trace enable 0 The processor executes branch instructions normally 1 The processor generates a trace interrupt upon the successful completion of a branch instruction HIDO 0 31 _ See Table 2 8 for details DAR 0 31 _ Data address register DAR is loaded with the effective address of a data breakpoint condition that matches DSISR 9 DABR Set if a DABR interrupt occurs 10 1 6 Interrupt Vectors for Debugging Table 10 2 lists the interrupt vectors that are associated with debug and breakpoint events Breakpoint events do not change other interrupt vectors and conditions Table 10 2 Debug Interrupts and Conditions Interrupt Type Vector Offset Exception Condition Data access DSI 00300 A data access breakpoint interrupt occurs when a match condition exists for the effective address of the data access in either DABR or DABR2 for the next read or write data access and the corresponding WBE or RBE DABR enable bits are set for a read or write access respect
291. eatures through JTAG boundary scan capability Features specific to the e300 core not present on the G2 processors follow Enhancements to the register set The e300 core has one more HIDO bit than the G2 The enable cache parity checking ECPE bit HIDO 1 gives the e300 core the ability to enable the taking of a machine check interrupt based on the detection of a cache parity error Enhancements to cache implementation 32 Kbyte eight way set associative instruction and data cache on the e300c1 16 Kbyte four way set associative instruction and data cache on the e300c2 and e300c3 Full parity checking is performed on both instruction and data cache memory arrays Lockable L1 instruction and data caches entire cache or on a per way basis up to 7 of 8 ways on the e300c1 and 3 of 4 ways on the e300c2 and e300c3 New icht instruction supports initialization of instruction cache Data cache supports four state MESI coherency protocol The instruction cache is blocked only until the critical load completes hit under reloads allowed Instruction cancel mechanism improves utilization of instruction cache by supporting hits under cancels and misses under cancels The critical double word is simultaneously written to the cache and forwarded to the requesting unit thus minimizing stalls due to load delays Data cache queue sharing makes cast outs and snoop pushes more efficient Provides for an o
292. ecrementer interrupt requests are received before the first can be reported only one interrupt is reported The occurrence of a decrementer interrupt cancels the request Register settings for this interrupt are described in Chapter 6 Interrupts in the Programming Environments Manual When a decrementer interrupt is taken instruction execution for the handler begins at offset 0x00900 from the physical base address indicated by MSR IP 5 5 10 Critical Interrupt 0x00A00 A critical interrupt is signaled to the e300 core by the assertion of the int signal The interrupt may not be recognized if a higher priority interrupt occurs simultaneously or if the MSR CE bit is cleared when cint is asserted The following events occur when the e300 recognizes the assertion of cint e Multi cycle instructions not in the completion stage are terminated e Outstanding load or store instructions that have not been completed are terminated e Any outstanding page table search activity is terminated e The effective address for resuming program execution is saved into CSRRO e The contents of MSR are saved into CSRR1 e The MSR register is loaded with all zeros except the IP ILE and ME bits which remain unchanged e Interrupt processing starts at offset value 0xOOA00 from the physical base address indicated by MSR IP Some types of instructions for example load multiple string and floating point instructions cause additional interrupt recognition lat
293. ects the cache even if the cache is disabled 4 5 2 6 Data Cache Block Invalidate dcbi Instruction The effective address is computed translated and checked for protection violations as defined in the PowerPC architecture This instruction is treated as a store to the addressed byte with respect to address translation and protection The debi instruction should be used with caution on the e300 core If the address hits in the cache the cache block is invalidated regardless of the state of the cache block Even if the cache block is modified it is not pushed to main memory and the associated data is lost Because this instruction may effectively destroy modified data it is privileged that is debi is available to programs at the supervisor privilege level MSR PR 0 A BAT or TLB protection violation for a debi translation generates a DSI interrupt The function of this instruction is independent of the WIMG bit settings of the block or PTE containing the effective address However the execution of debi broadcasts an address only kill transaction on the CSB if HIDO ABE is set or if operating in four state MESI mode and the target address is marked memory coherency required Execution of a debi instruction affects the cache even if the cache is disabled e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Instruction and Data Cache Operation 4 5 2 7 Instruction Cache Block Touch icbt Ins
294. ecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor Interrupts and Exceptions Note that the e300 core has a second instruction address breakpoint register ABR2 that functions identically to IABR and allows for two instruction breakpoints to be enabled The bit settings for when an instruction address breakpoint interrupt is taken are shown in Table 5 22 Table 5 22 Instruction Address Breakpoint Interrupt Register Settings Register Setting Description SRRO Set to the address of the next instruction to be executed in the program for which the TLB miss interrupt was generated SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FEI 0 RI 0 TGPR 0 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 The default breakpoint action is to trap before the execution of the matching instruction Table 5 23 shows the priority of actions taken when more than one mode is enabled for the same instruction If trace and breakpoint conditions occur simultaneously the breakpoint conditions receive higher priority The e300 core requires that an mtspr instruction that updates the IABR be followed by a context synchronizing instruction If the mtspr instruction enables the instruction address breakpoint interrupt the context synchronizing instruction cannot generate a breakpoint response The e300 core also cannot block a breakpoint response on
295. ed MSR DR 0 MSR IR 1 A MSR DR 1 Perform Real Addressing Mode Translation Perform Real Addressing Mode Translation Compare Address with Instruction or Data BAT Array As Appropriate BAT Array BAT Array see the Programming Miss Hit Environments Manual Perform Address Translation with Segment Descriptor Access Access Protected Permitted see Figure 6 6 Access Faulted Translate Address Continue Access to Memory Subsystem Figure 6 5 General Flow of Address Translation Real Addressing Mode and Block e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Memory Management 6 1 6 2 Page Address Translation Selection If address translation is enabled real addressing mode not selected and the effective address information does not match with a BAT array entry then the segment descriptor must be located Once the segment descriptor is located the T bit in the segment descriptor selects whether the translation is to a page or to a direct store interface segment as shown in Figure 6 6 Note that the e300 core does not implement the direct store interface and accesses to these segments cause a DSI interrupt In addition Figure 6 6 also shows the way the no execute protection is enforced if the N bit in the segment descriptor is set and the access is an instruction fetch the access is faulted as described in Chapter 7 Memory Manag
296. ed for 64 bit implementations and unimplemented optional instructions such as fsqrt eciwx and ecowx as illegal and takes a program interrupt when one of these instructions is encountered Likewise if a supervisor level instruction is encountered when the processor is in user level mode a privileged instruction type program interrupt is taken 5 5 8 Floating Point Unavailable Interrupt 0x00800 A floating point unavailable interrupt occurs when no higher priority interrupt exists an attempt is made to execute a floating point instruction including floating point load store and move instructions and the floating point available bit in the MSR is disabled MSR FP 0 Register settings for this interrupt are described in Chapter 6 Interrupts in the Programming Environments Manual When a floating point unavailable interrupt is taken instruction execution for the handler begins at offset 0x00800 from the physical base address indicated by MSR IP e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Interrupts and Exceptions 5 5 9 Decrementer Interrupt 0x00900 The e300 core implements the decrementer interrupt as it is defined in the PowerPC architecture A decrementer interrupt request is made when the decrementer counts down through zero The request is held until there are no higher priority interrupts and MSR EE 1 At this point the decrementer interrupt is taken If multiple d
297. eeeeeeeee 11 2 5 Performance Monitor Counter Registers DMCO DMCH 11 2 6 User Performance Monitor Counter Registers UPMCO UPMC3 11 2 7 Performance Monitor Instructions cessccessseceesteceseeceeneceesaeeeeaeeeeas 11 3 Performance Monitor Interrupt ssc tccssd eege 11 4 Pyent ETH ne oe hee eG Aa Reid ee las 11 4 1 Processor Context Configurability cee ceeceeesccecssececeseeceseeeceneeeeneeeenes 11 5 Performance Monitor Application Examples c ccessscecseceeeeseeeeeteeeesaes 11 5 1 Chaining Counters icisescaccissaveisssavavkeginnavecaasbassdeaiusenesansaecatasdvendeaseaceaveseeeees 11 5 2 Eyent STC CU EEN e300 Power Architecture Core Family Reference Manual Rev 3 Page Number Freescale Semiconductor xiii Paragraph Number A l A 2 A AA A3 Cl C 2 C 3 Page Title Number Appendix A Instruction Set Listings Instructions Sorted by Mnemonic y hsccssssgsenseansiedsiscched beds deasstenyhaybiwasicesacedetns tase cnceateoanenttavs A 1 Instructions Sorted Dy RE A 8 Instructions Grouped by Functional Categories 0 ceccceeeeccecsseceeeececeneeeceneeeeseeeeaees A 15 Instructions Sorted DY FOM ees A 24 Instruction Set Legend wvcicssccissscicasscassied nonnina sp E Ei E DES ENEE A 35 Appendix B Instructions Not Implemented Appendix C Revision History Changes from Revision tee eene Ee Ee C 1 Changes from Revision 1 to Revision 5 2 tencsichd ee cccssd ac pansoradsthareeneeeiaeascee
298. eeeesteeeenaes 2 22 2 15 Critical Interrupt Save Restore Register 1 CSRRI eee eeecceceeeeecseeeeceeeeceeeeeceeeeeeeteeeenaes 2 22 2 16 Critical Interrupt Save Restore Register 0 CSRRO ceeeeeeeseceececsseeeceeeeceeeeecneeeeesteeeesaes 2 22 2 17 SPRG REGISUED EE 2 22 2 18 EE 2 23 2 19 APR Gi TAB RZ REESE eege dee 2 24 2 20 IBCR EE 2 24 2 21 DABR and DABR2 Registers uo caden eda aep a a Tarii REESE 2 25 2 22 DBCR Registef ecne n r ar ia Ee AE RE R 2 27 4 1 e300c1 Data Cache RE nr EE 4 3 4 2 e300c2 and e300c3 Data Cache Organization sssssesseseesseesssesseesseeessressstesseessesseeesseeeseesse 4 3 4 3 e300c1 Instruction Cache Organization iegenegieg eege enee 4 4 4 4 e300c2 and e300c3 Instruction Cache Organ zapon 4 5 4 5 MEI Cache Coherency Protocol State Diagram WIM 001 4 10 4 6 MESI Cache Coherency Protocol State Diagram WIM 0011 4 11 4 7 PLRU Replacement Al gorithms 3 oscsciassiceisiisserecntvnvcedasedveccaavcsades sotsauaea ESA ENEE 4 24 4 8 Bus Interface Address EE 4 27 4 9 Double Word Address Ordeng Copcal Double Word Fret 4 28 5 1 Machine Status Save Restore Register O SRRO A 5 8 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor XV Figure Number 5 2 5 3 5 4 5 5 5 6 6 1 6 2 6 3 6 4 6 5 6 6 6 7 6 8 6 9 6 10 6 11 6 12 6 13 6 14 6 15 6 16 6 17 6 18 7 1 7 2 7 3 7 4 7 5 7 6 7 7 7 8 7 9 7 10 11 1 Figures Page Title Number
299. eenees 3 23 Leg E at e 3 23 Condition Register Logical Instructions cece ceeeeeeseeeeeceeeeeeseecsaeeneeeeeeeenees 3 23 oe IS CHOU So ees 3 24 Processor Control Tnstruettong 0ezggbegeEdeEdEd NEEN deed SEENEN Se 3 24 Move To From Condition Register Instructions ecececesececeeeeeeeeeeeeneeeeeaes 3 24 Memory Synchronization Instructions UISA 000 eee eeeeeeeeeeeeneecneecnseeeeeeeeneeens 3 25 PWT VEA TEE eer 3 26 Processor Control Stretton E 3 26 Memory Synchronization Instrucpons NEA cc eeeseeeeeeeceeceeceeeeeeteeeeeteeeenaeees 3 27 Memory Control Instructons NEA eeeeceessecesceceeeeeceenceeeeeeceeeeeceeeeessteeeenaeees 3 27 PowerPC OEA Toten added Eddie eet 3 28 System Linkage Juergen EE 3 28 Processor Control Instructons OEA EE 3 28 Move To From Machine State Register Instructnons 3 29 Move To From Special Purpose Register Instructtong 3 29 e300 Power Architecture Core Family Reference Manual Rev 3 vi Freescale Semiconductor Paragraph Number 3 2 6 2 3 3 2 6 3 3 2 6 3 1 3 2 6 3 2 3 2 6 3 3 3 2 7 3 2 8 4 1 4 1 1 4 1 2 4 2 4 3 4 4 4 4 1 4 4 1 1 4 4 1 2 4 4 1 3 4 4 1 4 4 4 1 5 4 4 2 4 4 2 1 4 4 2 1 1 4 4 2 2 4 4 2 2 1 4 4 2 3 4 4 2 4 4 4 3 4 4 3 1 4 4 3 2 4 4 3 3 4 4 3 4 4 5 4 5 1 4 5 1 1 4 5 1 2 4 5 1 3 4 5 1 4 Page Title Number Move To From Performance Monitor Register Instructions eeeeeeeeeeeeee 3 32 Memory Control Instructons OEA 3 32 Supervisor
300. eference Manual Rev 3 6 Freescale Semiconductor Power Management Hf He ee I ee A I eA I I set msr pow bit to go into sleep mode sync mfmsr r5 addis r3 r0 ori 3y Ey or ro 3 mtmsr r5 isync addis r20 r0 ori 20 2 stay_here addic r20 bgt cro 0x0004 0x0000 B5 0x0000 0 0x0002 r20 I stay_here restore corrupted registers lwz mtcrf sync rfi r23 0x05e4 r0 0 r23 0x05e8 r0 r22 0x05ec r0 srr E22 r21 0x05f0 r0 srrQ p21 r20 0x05f4 r0 E GE r23 0 0x05fc EO get MSR turn on POW bit turn on ME bit 19 subtract 1 from r20 and set cc loop if positive FE KK KK KI ee ee KK ee KK Kk to get out of sleep mode do a Soft Reset FE KK KK I I I I ee KK KK orig 0x00000100 force big endian mode stw stw mfmsr ori ori ori xori ori mtmsr ori isync ori K E E E T E Se r Ts Se 0 0x05f8 r0 0 0x05fc r0 0 0 r0 xr0 0 r0 0x0001 0 r0 r0 0 r0 0x0001 0 r0 xr0 0 0 r0 xr0 0 r0 xr0 Reset handler in low memory need nop every second inst force big endian force big endian save off additional registers to be corrupted stw stw stw r20 0x05f4 r0 E21 0x050 rO r22 0x05ec r0 bit bit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Power Management stw r23 0x05e8 r0 mfcr E23 stw r23 0x05e4 r0 xor SG EU resto
301. effective address gt IABR2 CEA Table 10 6 describes the instruction address breakpoint register settings for an address matching outside an address range condition Table 10 6 Address Matching for Outside Address Range Signal Condition Signal Condition IABR CEA IABR2 CEA IABR BE 1 IABR2 BE 1 IBCR CNT 0 IBCR SIG_TYPE OR IBCR CMP1 lt IBCR CMP2 gt With address matching for an outside address range a match occurs when the instruction s effective address lt IABR CEA OR the instruction s effective address lt IABR2 CEA For more details see Section 2 2 15 Instruction Address Breakpoint Control Register IBCR and Section 2 2 17 Data Address Breakpoint Control Register DBCR 10 3 Synchronization Requirements and Other Precautions An isync instruction must follow the execution of the mtspr to the breakpoint related registers HIDO IABR IABR2 DABR DABR2 IBCR and DBCR or mtmsr for MSR to ensure that the breakpoint condition is set IBCR and DBCR should be set before a corresponding breakpoint is enabled The breakpoint enable bits should be cleared before changing bits in the IBCR and DCBR For more details see Section 5 5 17 Instruction Address Breakpoint Interrupt 0x01300 An unrecoverable state occurs at any time if one of the register values of IABR IABR2 DABR and DABR2 are set to point to an interrupt vector The IABR or I
302. egister user privilege bit MSR PR is set In the e300 core this interrupt is generated for mtspr or mfspr with an invalid SPR field if SPR 0 1 and MSR PR 1 This may not be true for all cores that implement the PowerPC architecture e Trap A trap type program interrupt is generated when any of the conditions specified in a trap instruction are met Floating point unavailable 00800 Caused by an attempt to execute a floating point instruction including floating point load store and move instructions when the floating point available bit MSR FP is cleared In the e300c2 core any attempt to execute a floating point instruction results in a floating point unavailable exception Decrementer 00900 Occurs when DEC 31 changes from 0 to 1 This interrupt is enabled with MSR EE Critical interrupt 00A00 Taken when cint is asserted and MSR CE 1 Reserved OOBOO OOBFF System call 00C00 Occurs when a System Call sc instruction is executed Trace 00D00 Taken when MSR SE 1 or when the currently completing instruction is a branch and MSR BE 1 e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Overview Table 1 2 Exceptions and Interrupts continued Interrupt Type Ve tor Offset Exception Conditions hex Reserved 00E00 The e300 core does not generate an interrupt to this vector Other devices may use this
303. ement in the Programming Environments Manual Note that the figure shows the flow for these cases as described by the OEA and therefore the TLB references are shown as optional Since the core implements TLBs these branches are valid and described in more detail throughout this chapter If the T bit in the corresponding segment descriptor is zero page address translation is selected The information in the segment descriptor is then used to generate the 52 bit virtual address The virtual address is then used to identify the page address translation information stored as page table entries PTEs in a page table in memory For increased performance the core has two TLBs to store recently used PTEs on chip If an access hits in the appropriate TLB the page translation occurs and the physical address bits are forwarded to the memory subsystem If the required PTE is not resident the MMU requires a search of the page table In this case the core traps to one of three interrupt handlers for the system software to perform the page table search If the PTE is successfully matched a new TLB entry is created and the page translation is once again attempted This time the TLB is guaranteed to hit Once the PTE is located the access is qualified with the appropriate protection bits If the access is a protection violation not allowed an interrupt instruction access or data access is generated If the PTE is not found by the table search oper
304. ement the PowerPC architecture automatically prefetch instructions into the instruction cache This feature can be used to preload explicit instructions into the cache even when it is known that their execution will be canceled Although the execution of the instructions is canceled the instructions remain valid in the instruction cache Because instructions are intentionally executed speculatively care must be taken to ensure that all I O memory is marked guarded Otherwise speculative loads and stores to I O space could potentially cause data loss See the Programming Environments Manual for a full discussion of guarded memory The code that prefetches must be in caching inhibited memory as in the following example Assuming interrupts are disabled cache has been flushed the MMU is on and we are executing in a caching inhibited e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 41 Instruction and Data Cache Operation location in memory LR and r6 Starting address of code to lock CTR Number of cache blocks to lock r2 non zero numerator and denominator loop must begin on an 8 byte boundary to ensure that the divw and begqlr are fetched on the same cycle Se SE SE SE RE orig OxFFFO4000 loop divw ie 24 ES LONG divide w non zero result beqlrt Cause the prefetch to happen addi LG Ee 32 Find next block to prefetch mtlr r6 set the next block bdnz loop
305. ency Timing critical applications must consider these instruction execution latencies in calculating worst case interrupt recognition latency Upon returning from a critical interrupt routine the core restarts any terminated or uncompleted instructions including terminated load multiple or load string instructions Note that these restarted load instructions may cause side effects on peripheral devices that have auto decrementer or status bit changes caused by the subsequent load accesses e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Interrupts and Exceptions The register settings for the critical interrupt are shown in Table 5 16 Table 5 19 Critical Interrupt Register Settings Register Setting CSRRO _ Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present CSRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FE1 0 RI 0 TGPR 0 ME CE 0 LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 The e300 core only recognizes the interrupt condition cint asserted if the MSR CE bit is set it ignores the interrupt condition if the MSR CE bit is cleared To guarantee that the critical interrupt is taken the cint signal must be held asserted until the e300 core takes the interrupt If the cint signal is negated before the interrupt is ta
306. ent for more information about memory management for the core 1 1 5 2 Cache Units The e300c1 provide independent 32 Kbyte eight way set associative instruction and data caches The e300c2 and e300c3 provide 16 Kbyte four way set associative instruction and data caches The cache block is 32 bytes long The caches adhere to a write back policy but the e300 core allows control of cacheability write policy and memory coherency at the page and block levels The caches use a pseudo LRU replacement policy As shown in Figure 1 1 Figure 1 2 and Figure 1 3 the caches provide a 64 bit interface to the instruction fetch unit and LSU The surrounding logic selects organizes and forwards the requested information to the requesting unit Write operations to the cache can be performed on a byte basis and a complete read modify write operation to the cache can occur in each cycle The load store and instruction fetch units provide the caches with the address of the data or instruction to be fetched In the case of a cache hit the cache returns two words to the requesting unit Because the data cache tags are single ported simultaneous load store and snoop accesses cause resource contention Snoop accesses have the highest priority and are given first access to the tags unless the snoop access coincides with a tag write in this case the snoop is retried and must rearbitrate for cache access Loads or stores deferred due to snoop accesses are
307. ented once every four core input clock cycles 2 2 Implementation Specific Registers This section describes the implementation specific registers of the e300 core The core defines the following registers used for software table search operations DMISS IMISS DCMP ICMP HASH1 HASH2 and RPA These registers should be accessed only when address translation is disabled MSR IR and MSR DR are both zero For a complete discussion refer to Section 6 5 2 Implementation Specific Table Search Operation e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Register Model The implementation specific registers also include HIDO HID1 HID2 and IABR SPRs All of these registers can be accessed by supervisor level instructions only using the SPR numbers shown in Figure 2 1 In addition the e300 core defines the following implementation specific registers Eight additional BATs BAT4 IBAT7 and DBAT4 DBAT7 providing better performance in protecting accesses on a segment block or page basis along with memory accesses and I O accesses See Figure 2 1 for a list of the SPR numbers for the BAT arrays Two critical interrupt registers CSRRO CSRR1 which are implementation specific The CSRRO and CSRRI1 registers support the critical interrupt function which have the same bit assignments as SRRO and SRR1 respectively The effective address for resuming program execution is saved into CSRRO a
308. equest and response tenures Split transaction bus A bus that allows address and data transactions from different processors to occur independently Stage The term stage is used in two different senses depending on whether the pipeline is being discussed as a physical entity or a sequence of events In the latter case a stage is an element in the pipeline during which certain actions are performed such as decoding the instruction performing an arithmetic operation or writing back the results Typically the latency of a stage is one processor clock cycle Some events such as dispatch write back and completion happen instantaneously and may be thought to occur at the end of a stage An instruction can spend multiple cycles in one stage An integer multiply for example takes multiple cycles in the execute stage When this occurs subsequent instructions may stall An instruction may also occupy more than one stage simultaneously especially in the sense that a stage can be seen as a physical resource for example when instructions are dispatched they are assigned a place in the CQ at the same time they are passed to the execute stage They can be said to occupy both the complete and execute stages in the same clock cycle Stall An occurrence when an instruction cannot proceed to the next stage Static branch prediction Mechanism by which software for example compilers can hint to the machine hardware about the direction a branch is
309. er 8 Instruction Set in the Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Memory Management Table 6 5 Instruction Summary MMU Control Instruction Description misr SR rS Move to Segment Register SR SR lt rS mtsrin rS rB Move to Segment Register Indirect SR rB 0 3 lt rS mfsr rD SR Move from Segment Register rD lt SR SR mfsrin rD rB Move from Segment Register Indirect rD lt SR rB 0 3 tlbie rB TLB Invalidate Entry For effective address specified by rB TLB V lt 0 The tlbie instruction invalidates both TLB entries indexed by the EA and operates on both the instruction and data TLBs simultaneously invalidating four TLB entries The index corresponds to bits 15 19 of the EA Software must ensure that instruction fetches or memory references to the virtual pages specified by the tlbie instruction have been completed prior to executing the tlbie instruction tlbsync TLB Synchronize Synchronizes the execution of all other tlbie instructions in the system In the e300 core when the tibisync signal is negated instruction execution may continue or resume after the completion of a tlbsync instruction When the t bisync signal is asserted instruction execution stops after the completion of a tlbsync instruction tlbli Load Instruction TLB Entry implementation Loads the contents of the I
310. er 7 Memory Management of the Programming Environments Manual 6 4 2 Page Memory Protection The e300 core implements page memory protection as it is defined in Chapter 7 Memory Management in the Programming Environments Manual 6 4 3 TLB Description This section describes the hardware resources provided in the e300 core to facilitate the page address translation process Note that the hardware implementation of the MMU is not specified by the architecture and while this description applies to the e300 core it does not necessarily apply to other processors of this family 6 4 3 1 TLB Organization Because the e300 core has two MMUs IMMU and DMMU that operate in parallel some of the MMU resources are shared and some are actually duplicated shadowed in each MMU to maximize performance Figure 6 7 shows the relationships between these resources within both the IMMU and DMMU and how the various portions of the effective address are used in the address translation process While both MMUs can be accessed simultaneously both sets of segment registers and TLBs can be accessed in the same clock when there is an exception condition only one interrupt is reported at a time ITLB miss interrupts are reported when there are no more instructions to be dispatched or retired the pipeline is empty Refer to Chapter 7 Instruction Timing for more detailed information about the internal pipelines and the reporting of interrupt
311. erations 3 2 4 3 9 Floating Point Load Instructions Separate floating point load instructions are used for single precision and double precision operands Because FPRs support only double precision format the FPU converts single precision data to double precision format before loading the operands into the target FPR This conversion is described fully in Floating Point Load Instructions in Appendix D Floating Point Models in the Programming Environments Manual Implementation Noute The PowerPC architecture defines load with update instructions with rA 0 as an invalid form however the core treats this case as a valid form Table 3 19 provides a list of the floating point load instructions Table 3 19 Floating Point Load Instructions Name Mnemonic Operand Syntax Load Floating Point Double Ifd frD d rA Load Floating Point Double Indexed Ifdx frD rA rB Load Floating Point Double with Update lfdu frD d rA Load Floating Point Double with Update Indexed Ifdux frD rA rB Load Floating Point Single Ifs frD d rA Load Floating Point Single Indexed lfsx frD rA rB e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 21 Instruction Set Model Table 3 19 Floating Point Load Instructions continued Name Mnemonic Operand Syntax Load Floating Point Single with Update Heu frD d rA Load Floating Point Single with Update Indexed Ifsux frD rA rB
312. erforms as if the cache were not locked A cache block invalidated by a snoop remains invalid until the cache is unlocked To prevent locking during a cache access a sync instruction must precede the setting of DLOCK 20 ICFI Instruction cache flash invalidate 0 The instruction cache is not invalidated The bit is cleared when the invalidation operation begins usually the next cycle after the write operation to the register The instruction cache must be enabled for the invalidation to occur 1 An invalidate operation is issued that marks the state of each instruction cache block as invalid Cache access is blocked during this time Setting ICFI clears all the valid bits of the blocks and the PLRU bits to point to way LO of each set For the e300 core the proper use of the ICFI and DCFI bits is to set and clear them with two consecutive mtspr operations 21 DCFI Data cache flash invalidate O The data cache is not invalidated The bit is cleared when the invalidation operation begins usually the next cycle after the write operation to the register The data cache must be enabled for the invalidation to occur 1 An invalidate operation is issued that marks the state of each data cache block as invalid without writing back modified cache blocks to memory Cache access is blocked during this time Bus accesses to the cache are signaled as a miss during invalidate all operations Setting DCFI clears all the valid bits of the blocks and the PLRU bi
313. ersion instructions may set this bit with the FPCC bits to indicate the class of the result Bits 16 19 comprise the floating point condition code FPCC Floating point compare instructions always set one of the FPCC bits to one and the other three FPCC bits to zero Arithmetic rounding and conversion instructions may set the FPCC bits with the C bit to indicate the class of the result Note that in this case the high order three bits of the FPCC retain their relational significance indicating that the value is less than greater than or equal to zero 16 Floating point less than or negative FL or lt 17 Floating point greater than or positive FG or gt 18 Floating point equal or zero FE or 19 Floating point unordered or NaN FU or Note that these are not sticky bits 20 Se Reserved should be cleared 21 VXSOFT Floating point invalid operation exception for software request This is a sticky bit This bit can be altered only by the merfs mtfsfi mtfsf mtfsb0 or mtfsb1 instructions 22 VXSQRT Floating point invalid operation exception for invalid square root This is a sticky bit 23 VXCVI_ Floating point invalid operation exception for invalid integer convert This is a sticky bit 24 VE Floating point invalid operation exception enable 25 OE IEEE floating point overflow exception enable 26 UE IEEE floating point underflow exception enable 27 ZE IEEE floating point zero divide exception enable 28 XE Floating p
314. erview 1 10 Condition register CR 1 18 Context synchronization 3 8 Conventions xxvi xxxii Copy back mode 7 25 Core interface accesses overview 8 7 checkstops 8 9 core quiesce control signals 8 9 external interrupts 8 8 IEEE 1149 1 compliant interface 8 9 reset inputs 8 9 signal groupings 8 1 8 2 signal summary 8 2 CR condition register logical instructions 3 23 overview 1 18 Critical input cint interrupt 1 28 5 4 enabling with MSR CE 5 13 5 30 CSRRaz critical interrupt save restore regs 0 1 2 10 2 11 2 21 2 22 5 8 5 10 5 15 5 16 5 17 CTR count register 2 6 D DABR DABR2 data address breakpoint regs 2 11 2 25 10 2 DAR data address register 2 10 5 23 10 2 10 3 Data access errors see DSI data storage interrupt Data cache see Caches Data errors see Machine check interrupt Data TLB miss on load interrupt 1 29 5 5 5 33 Data TLB miss on store interrupt 1 29 5 5 5 34 DBAT see Block address translation BAT DBATnU L data block address translation regs 0 7 upper lower 2 9 2 11 2 20 2 21 DBCR data address breakpoint control reg 2 11 2 27 10 2 debf 4 19 debi 4 19 dcbst 4 19 debt 4 18 debtst 4 18 dcbz 4 18 DCMP data TLB compare register 2 18 6 32 6 34 Debug facilities 1 33 debugging software 10 3 interrupt vectors for debugging 10 3 other debug resources 10 2 e300 Power Architecture Core Family Reference Manual Rev 3 Index 2 Freesc
315. ess must be translated to a physical address In the e300 core an MMU interrupt condition occurs if this translation fails for one of the following reasons e Page fault There is no valid page table entry to identify the page specified by the effective address and segment descriptor and there is no valid BAT translation e An address translation is found but the access is not allowed by the memory protection mechanism Additionally because the core relies on software to perform table search operations the processor also takes an interrupt when one of the following occurs e There is a miss in the corresponding instruction or data TLB e The page table requires an update to the change C bit The state saved by the processor for each of these interrupts contains information that identifies the address of the failing instruction Refer to Chapter 5 Interrupts and Exceptions for a more detailed description of interrupt processing Because a page fault condition PTE not found in the page tables in memory is detected by the software that performs the table search operation and not the core hardware it does not cause a core interrupt in the strictest sense in that interrupt processing as described in Chapter 5 Interrupts and Exceptions does not occur However in order to maintain architectural compatibility with software written for other devices that implement the PowerPC architecture the software that detects this co
316. eversing for data occurs when the data item is being moved to or from the GPR Therefore the byte reversal in little endian mode for load or store accesses occurs between memory or the data cache and the register files for the e300 core 3 1 3 Alignment and Misaligned Accesses The operand of a single register memory access instruction has a natural alignment boundary equal to the operand length In other words the natural address of an operand is an integral multiple of the operand length A memory operand is said to be aligned if it is aligned at its natural boundary otherwise it is misaligned For a detailed discussion about memory operands see Chapter 3 Operand Conventions in the Programming Environments Manual Operands for single register memory access instructions have the characteristics shown in Table 3 2 Although not permitted as memory operands quad words are shown because quad word alignment is desirable for certain memory operands Table 3 2 Memory Operands Operand Length sar Byte 8 bits XXXX Half word 2 bytes Xxx0 Word 4 bytes xx00 Double word 8 bytes x000 Quad word 16 bytes 0000 Note An xin an address bit position indicates that the bit can be 0 or 1 regardless of the state of other address bits The concept of alignment is also applied more generally to data in memory For example a 12 byte data item is said to be word aligned if its address is a multiple of four
317. execute complete deallocate stage at any one time 7 6 1 Branch Dispatch and Completion Unit Resource Requirements This section describes the specific resources required to avoid stalls during branch resolution instruction dispatching and instruction completion 7 6 1 1 Branch Resolution Resource Requirements The following is a list of branch instructions and the resources required to avoid stalling the fetch unit in the course of branch resolution e The belr instruction requires LR availability e The bectr instruction requires CTR availability e Branch and link instructions require shadow LR availability e The branch conditional on counter decrement and CR condition requires CTR availability or the CR condition must be false and the core cannot be executing instructions following an unresolved predicted branch when the branch is encountered by the BPU e The branch conditional on CR condition cannot be executed following an unresolved predicted branch instruction e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor 7 6 1 2 Instruction Timing Dispatch Unit Resource Requirements The following is a list of resources required to avoid stalls in the dispatch unit Note that the two dispatch buffers IQO and IQ1 are at the bottom of the instruction queue e Requirements for dispatching from QO are as follows Needed execution unit available Needed GPR rename registers available
318. f PMLCan CE and PMGCO PMIE are set an exception is signaled when PMCn reaches overflow Interrupts are masked by clearing MSR EE An exception may be signaled while MSR EE is cleared but the interrupt is not taken until it is set and only if the overflow condition is still present and the configuration has not been changed in the meantime to disable the exception e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Performance Monitor However if MSR EE remains clear until after the counter leaves the overflow state msb becomes 0 or if MSR EE remains clear until after PMLCan CE or PMGCO PMIE cleared the exception is not signaled The following sequence is recommended for setting counter values and configurations 1 Set PMGCO FAC to freeze the counters 2 Using mtpmr instructions initialize counters and configure control registers 3 Release the counters by clearing PMGCO FAC with a final mtpmr 11 2 6 User Performance Monitor Counter Registers UPMCO UPMC3 The contents of PMCO PMC3 are reflected to UPMCO UPMC3 which can be read by user level software with the mfpmr instruction using PMR numbers in Table 11 2 11 2 7 Performance Monitor Instructions The APU defines instructions for reading and writing the PMRs as shown in Table 11 6 Table 11 6 Performance Monitor APU Instructions Name Mnemonic Syntax Move from Performance Monitor Register mfpmr rD PMRN
319. fer to the Programming Environments Manual Note that the PowerPC architecture allows individual processors to determine whether an interrupt is required to handle various alignment conditions Similar to DSI interrupts alignment interrupts use the SRRO and SRR1 to save the machine state and the DSISR to determine the source of the interrupt The e300 core initiates an alignment interrupt when it detects any of the following conditions e The operand of a floating point load or store operation is not word aligned e The operand of an Imw stmw Iwarx or stwex instruction is not word aligned e A multiple or string access is attempted with the MSR LE bit set e The operand of a dcbz instruction is in a page that is write through or caching inhibited e The instruction is Iswi Iswx stswi stswx and the core is in little endian mode This applies to both PowerPC little endian in previous cores and true little endian mode for the e300 core Note that the e300 core does not support PowerPC modified little endian mode Note that the e300 core does not generate an alignment interrupt for a misaligned little endian access MSR LE 1 The register settings for alignment interrupts are shown in Table 5 16 The architecture does not support the use of a misaligned EA by Iwarx or stwex instructions If one of these instructions specifies a misaligned EA the interrupt handler should not emulate the instruction but should treat the occurrence as
320. few processor clock cycles Nap tThe nap mode further reduces power consumption by disabling bus snooping leaving only the time base register and the PLL in a powered state The core returns to the full power state upon receipt of an external asynchronous interrupt system management interrupt decrementer interrupt hard or soft reset or machine check input mcp signal A return to full power state from a nap state takes only a few processor clock cycles Sleep Sleep mode reduces power consumption to a minimum by disabling all internal functional units then external system logic may disable the PLL and sysclk Returning the core to the full power state requires the enabling of the PLL and sysclk followed by the assertion of an external asynchronous interrupt system management interrupt hard or soft reset or mcp signal after the time required to relock the PLL Note that the core cannot switch from one power management mode to another without first returning to full on mode The nap and sleep modes disable bus snooping therefore a hardware handshake using greq and qack is provided to ensure coherency before the core enters these power management modes Table 9 1 summarizes the four power states for the core Table 9 1 e300 Core Programmable Power Modes PM Mode Functioning Units Activation Method Full Power Wake Up Method Full power All units active Full power Requested logic by demand By instruction dispat
321. ffset hex Exception Conditions ISI 00400 An ISI interrupt is caused when an instruction fetch cannot be performed for any of the following reasons The effective logical address cannot be translated That is there is a page fault for this portion of the translation so an ISI interrupt must be taken to load the PTE and possibly the page into memory e The fetch access violates memory protection indicated by SRR1 4 set If the key bits Ks and Kp in the segment register and the PP bits in the PTE are set to prohibit read access instructions cannot be fetched from this location External interrupt 00500 An external interrupt is caused when MSR EE 1 and the int signal is asserted Alignment 00600 An alignment interrupt is caused when the core cannot perform a memory access for any of the reasons described below The operand of a floating point load or store instruction is not word aligned e The operands of Imw stmw wars and stwex instructions are not aligned e The operand of a load store load multiple store multiple load string or store string instruction crosses a protection boundary The instruction is Iswi Iswx stswi stswx and the core is in little endian mode Note that PowerPC little endian mode is not supported on the e300 core e The operand of dcebz is in memory that is write through required or caching inhibited Program Floating point unavailable 00700
322. for e300 related devices Table 2 3 Assigned PVR Values Register Model Device Name Version No Revision No MPC603r PID7 0x0007 0x1201 G2 core original 0x0081 0x0011 G2 core 0x8081 0x1010 G2_LE core general purpose 0x8082 0x1010 G2_LE core general purpose 0x8082 0x2010 MPC603e PID6 0x0006 0x0101 MPC603e PID7v 0x0007 0x0100 0x0201 e300c1 0x8083 0x0010 e300c2 0x8084 0x0010 e300c3 0x8085 0x0010 Space for future versions Machine state register MSR The MSR defines the state of the processor The MSR can be modified by the Move to Machine State Register mtmsr System Call sc and Return from Interrupt rfi and Return from Critical Interrupt rfci instructions It can be read by the Move from Machine State Register mfmsr instruction Figure 2 4 shows the machine state register MSR Access Supervisor read write 0 liz 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 r POW TGPR ILE EE PR FP ME FEO SE BE FE1 CE IP IR DR RILE Reset 0000_0040 or 0000_0000 or 0001_0041 or 0001_0001 Table 2 4 shows the MSR bit settings Table 2 4 MSR Bit Settings Figure 2 4 Machine State Register Bits Name Description o Reserved Full function 1 4 Reserved Partial function 5 9 Reserved Full function 10 12 _ Reserved
323. formation see Section 3 2 4 6 Processor Control Instructions Section 3 2 5 1 Processor Control Instructions and Section 3 2 6 2 Processor Control Instructions OEA e Memory synchronization instructions These are used for synchronizing memory accesses See Section 3 2 4 7 Memory Synchronization Instructions UISA and Section 3 2 5 2 Memory Synchronization Instructions VEA e Memory control instructions These provide control of caches TLBs and segment registers For more information see Section 3 2 5 3 Memory Control Instructions VEA and Section 3 2 6 3 Memory Control Instructions OEA e System linkage instructions These include the System Call sc and Return from Interrupt rfi instructions See Section 3 2 6 1 System Linkage Instructions e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Instruction Set Model Note that this grouping of instructions does not necessarily indicate the execution unit that processes a particular instruction or group of instructions This information which is useful in taking full advantage of the core superscalar parallel instruction execution is provided in Chapter 8 Instruction Set of the Programming Environments Manual Integer instructions operate on word operands Floating point instructions operate on single and double precision floating point operands PowerPC instructions are
324. from PTE 25 28 WIMG Memory cache access attribute bits 29 Reserved 30 31 PP Page protection bits from PTE e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Memory Management 6 5 2 2 Software Table Search Operation When a TLB miss occurs the instruction or data MMU loads IMISS or DMISS with the effective address of the access The processor completes all instructions ahead of the instruction that caused the interrupt status information is saved in SRR1 and one of the three TLB miss interrupts is taken In addition the processor loads ICMP or DCMP with the value to be compared with the first word of PTEs in the tables in memory The software should then access the first PTE at the address pointed to by HASH 1 The first word of the PTE should be loaded and compared to the contents of DCMP or ICMP If there is a match the required PTE has been found and the second word of the PTE is loaded from memory into RPA Then the tlbli or tlbld instruction is executed which loads the contents of ICMP or DCMP and RPA into the selected TLB entry The TLB entry is selected by the effective address of the access and SRRI WAY If the comparison does not match the PTEG address is incremented to point to the next PTE in the table and the above sequence is repeated If none of the eight PTEs in the primary PTEG matches the sequence is then repeated using the secondary PTEG at the address contained in H
325. fy cache E None Don t care Modify cache and mark modified s RWITM No response Load data modify it and mark as modified s RWITM Retry signaled Retry the RWITM RWITM No response Load data modify it and mark as modified RWITM Retry signaled Retry the RWITM 1 MESI mode only The RWITM transactions involve selecting a replacement class and casting out modified data that may have resided in that replacement class 4 4 2 4 The following situations concerning coherency can be encountered within a single processor system Coherency in Single Processor Systems e Load or store to a caching inhibited page WIM Obx1x and a cache hit occurs Caching is inhibited for this page I 1 Load or store operations to a caching inhibited page that hit in the cache cause boundedly undefined results e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Instruction and Data Cache Operation e Store to a page marked write through WIM 0b10x and a cache read hit to a modified cache block This page is marked as write through W 1 The core pushes the modified cache block to memory and the block remains marked modified M Note that when WIM bits are changed it is critical that the cache contents reflect the new WIM bit settings For example if a block or page that had allowed caching becomes caching inhibited software should ensure that the appropriate cache blocks are flu
326. g 4 35 4 40 branch registers count register CTR 2 6 link register LR 2 6 breakpoint registers 2 11 2 23 2 27 data address breakpoint DABR DABR2 2 11 2 25 data address breakpoint control DBCR 2 11 2 27 instruction address breakpoint control IBCR 2 11 2 24 instruction address breakpoint regs IABR IABR2 2 11 2 23 2 24 cache locking registers HIDn 4 33 configuration registers 2 6 2 9 2 12 2 16 hardware implementation regs HIDn 2 12 2 15 2 16 PLL configuration 2 15 machine state register MSR 2 7 2 11 processor version register PVR 2 6 system memory base address MBAR 2 11 2 23 system version register SVR 2 11 2 23 debug data address breakpoint registers DABR DABR2 10 2 data address control register DBCR 10 2 instruction address breakpoint registers JABR IABR2 10 1 instruction address control register IBCR 10 2 decrementer register DEC 2 10 e300 specific registers 2 10 2 27 floating point registers floating point regs 0 3 1 FPRn 2 3 floating point status and control reg FPSCR 2 3 general purpose registers GPRn 2 3 interrupt handling registers 5 8 5 14 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Index 7 critical interrupt save restore regs CSRRv 2 10 2 11 2 22 5 8 5 10 5 15 5 16 5 17 data address register DAR 2 10 5 23 10 2 10 3 DSI status register DSISR 2 10 5 1 10 2 10 3 floating p
327. g point unavailable interrupt occurs 1 The processor can execute floating point instructions and can take floating point enabled type program interrupts 19 ME Machine check enable 0 Machine check interrupts are disabled 1 Machine check interrupts are enabled 20 FEO Floating point interrupt mode 0 see Table 5 9 this bit is read only on the e300c2 core 21 SE Single step trace enable 0 The processor executes instructions normally 1 The processor generates a trace interrupt on the successful completion of the next instruction 22 BE Branch trace enable 0 The processor executes branch instructions normally 1 The processor generates a trace interrupt upon the successful completion of a branch instruction 23 24 FE1 CE Floating point interrupt mode 1 see Table 5 9 this bit is read only on the e300c2 core Critical interrupt enable O Critical interrupts disabled 1 Critical interrupts enabled critical interrupt and rfci instruction enabled The critical interrupt is an asynchronous implementation specific interrupt The critical interrupt vector offset is OxOOA00 The Return From Critical Interrupt rfci instruction is implemented to return from these interrupts Also CSRRO and CSRR1 are used to save and restore the processor state for critical interrupts 25 Interrupt prefix Specifies whether an interrupt vector offset is prepended with Fs or Os In the following description n
328. g resides and is executed e The second area is where the data to be locked resides Both areas of memory must be in locations that are translated by the memory management unit MMU This translation can be performed either with the page table or the block address translation BAT registers For the purposes of the cache locking example in this document the two areas of memory are defined using the BAT registers The first area is a 1 Mbyte area in the upper region of memory that contains the code performing the cache locking The second area is a 256 Mbyte block of memory not all of the 256 Mbytes of memory is locked in the cache this area is set up as an example that contains the data to lock Both memory areas use identity translation the logical memory address equals the physical memory address Table 4 14 summarizes the BAT settings used in this example Table 4 14 Example BAT Settings for Cache Locking Area Base Address Memory Size WIMG Bits BATU Setting BATL Setting First OxFFFO_0000 1 Mbyte 0b0100 OxFFFO_001F OxFFFO_0002 Second 0x0000_0000 256 Mbyte 0b0000 0x0000_1FFF 0x0000_0002 1 Caching inhibited memory is not a requirement for data cache locking A setting of OxFFFO_0002 with a corresponding WIMG of 0b0000 marks the memory area as caching allowed The block address translation upper BATU and block address translation lower BATL settings in Table 4 14 can be used for both instruction bl
329. g software execution and then recoding algorithms for more efficiency For example memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms e Characterize processors in environments not easily characterized by benchmarking e Help system developers bring up and debug their systems The performance monitor uses the following resources e The performance monitor mark bit in the MSR MSR PMM This bit controls which programs are monitored e The move to from performance monitor registers PMR instructions mtpmr and mfpmr e The external input pm_event_in e PMRs e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Overview The performance monitor counter registers PMCO PMC3 are 32 bit counters used to count software selectable events Each counter counts up to 128 events UPMCO UPMC3 provide user level read access to these registers They are identified in Table 1 4 The performance monitor global control register PMGC0 controls the counting of performance monitor events It takes priority over all other performance monitor control registers UPMGCO provides user level read access to PMGCO The performance monitor local control registers PMLCa0 PMLCa3 control each individual performance monitor counter Each counter has a corresponding PMLCa register UPMLCa0 UPMLCa3 provide user level read access to PMLCa0
330. g tees sched EE 5 8 Interrupt Processing RESIS LETS deeg eege Aere 5 8 SRRO and SO RR il SOROS test Eeer ed een 5 8 CSRRO and CSRRYT Bit Settings deeg aes 5 10 SPRGO SPRG 7 EEN 5 11 MSR Bit Setting EE 5 12 Enabling and Disabling Exceptions and Interrupts 5 14 Steps for Interrupt Processing seni Meee te A Rise a E i iaia 5 15 EAR WEE 5 15 Returning from an Interrupt with Wily eege teg 5 15 Returning from an Interrupt with gf 5 16 Process SWIC ge sisonu jeceay sapeaaiavedsacs sastadgs dabedeaga R E GEN at 5 16 Interr pt TEE EE Ee 5 16 Iiterrupt KEE 5 17 Reset Intemupts OxO0100 22 0 0 ei oie eh Ba EE a 5 18 Hard Reset and NA 5 18 Soft RESC EE 5 19 Byte Ordering Considerations net eege Eeer 5 20 Machine Cheek Interrupt Ox002 00 s 3 5 ces cascceassiveseses sonnets deet Ee 5 21 Machine Check Interrupt Enabled MSR ME IN 5 22 Checkstop State MSRIME EE 5 22 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor ix Paragraph Page Number Title Number 5 5 3 DSI Interrupt 0X003 AA E 5 23 5 5 4 EST Interrupt Ox 00400 n nieret aaiae aeti i enat 5 24 5 5 5 External Interrupt 0x00500 eege ergeet leede Eege 5 25 5 5 6 Alignment Interrupt 0x00600 i202 pee depth nda neld ee de 5 25 5 5 6 1 Integer Alignment Oe 5 27 5 5 6 2 Load Store Multiple Alignment Exceptions c cccesseecesceceeeeceeeeeeeeeeeestneeenaeees 5 28 5 5 7 Program Interrupt 0x00700 sessssesesseessenssessssess
331. g the number of bytes to be transferred by a Load String Word Indexed Iswx or Store String Word Indexed stswx instruction 1 3 1 2 VEA Registers The VEA introduces the time base facility TB for reading The TB is a 64 bit register pair whose contents are incremented once every four core input clock cycles The TB consists of two 32 bit registers time base upper TBU and time base lower TBL Note that the time base registers are read only in user state 1 3 1 3 OEA Registers OEA registers are supervisor level registers that include the following 1 3 1 3 1 Machine State Register MSR The MSR is a supervisor level register that defines the state of the core The contents of this register are saved when an interrupt is taken and restored when the interrupt handling completes A critical interrupt interrupt is taken in the e300 core when the cint signal is asserted and MSR CE is set The e300 core implements the MSR as a 32 bit register 1 3 1 3 2 Segment Registers SRs For memory management 32 bit processors implement sixteen 32 bit SRs To speed access the core implements the SRs as two arrays a main array for data memory accesses and a shadow array for instruction memory accesses Loading a segment entry with the Move to Segment Register mtsr instruction loads both arrays 1 3 1 3 3 Supervisor Level SPRs The e300 core like the G2_LE core has additional supervisor level SPRs which are shown in Figure 1 4 Two cri
332. g this instruction execute in the context established by this instruction For a complete description of context synchronization refer to Chapter 6 Interrupts in the Programming Environments Manual 5 2 6 Returning from an Interrupt with rfci The Return From Critical Interrupt rfci is a e300 core only supervisor level instruction that performs context synchronization by allowing previously issued instructions to complete before returning to the interrupted process The rfci instruction performs the same functions as rfi except that it uses CSRRO and CSRRI to restore the processor state Thus execution of the rfci instruction ensures the following e CSRRI 0 5 9 16 31 are placed into the corresponding bits of the MSR If the new MSR value does not enable any pending interrupts the next instruction is fetched from the address defined by CSRRO 0 29 II 0b00 s Ifthe new MSR value enables one or more pending interrupts the interrupt associated with the highest priority pending interrupt is generated In this case the interrupt processing mechanism places in SRRO the address of the instruction which would have executed next had the interrupt not occurred 5 3 Process Switching The operating system should execute one of the following when processes are switched e The sync instruction which orders the effects of instruction execution All instructions previously initiated appear to have completed before the sync instruction comp
333. ger and floating point load and store instructions Integer load and store instructions Integer load and store multiple instructions Floating point load and store Primitives used to construct atomic memory operations lwarx and stwex instructions Flow control instructions These include branching instructions condition register logical instructions trap instructions and other instructions that affect the instruction flow Branch and trap instructions Condition register logical instructions Processor control instructions These instructions are used for synchronizing memory accesses and management of caches TLBs and the segment registers Move to from SPR instructions Move to from MSR e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor EN Overview Move to from PMR Synchronize Instruction synchronize e Memory control instructions These instructions provide control of caches TLBs and segment registers Supervisor level cache management instructions Translation lookaside buffer management instructions Note that there are additional implementation specific instructions User level cache instructions Segment register manipulation instructions e The e300 core implements the following instructions which are defined as optional by the PowerPC architecture Floating Select fsel Floating Reciprocal Estimate S
334. gisters PMCO PMC3 User Performance Monitor Counter Registers UPMCO UDMCH cc eeeeeeeesreeeeteees 11 6 e300 Power Architecture Core Family Reference Manual Rev 3 xvi Freescale Semiconductor 2 6 2 8 2 9 2 10 2 11 2 12 2 13 2 14 2 15 2 16 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3 8 3 9 3 10 3 11 3 13 3 14 3 15 3 16 3 17 3 18 3 19 3 20 3 21 3 22 Tables Page Title Number Interrupt Classifications sesser erines aE a eria AEE S TERREA n EE ai 1 27 Exceptions and MILER EE 1 27 Differences Between e300 and G2_LE Cores 5 cad sscceaswsscondganntes aeactaee ieacusunane 1 34 Differences Between C500 C Ores ere 1 36 PSG RoBi EE 2 4 Architectural PVR Field Descriptions icieissesiaccvessscatsscevedcadegecedysussacaaasesaccsansscnensgsbateadegedenes 2 6 Assiened PVR ET 2 7 MSR Bit Settings EE 2 7 E300 HIDO Field Des Crip eons ares e sa eke alee oa ose sadn ee 2 12 HIDO SBCLK and HIDO ECLK elt our Configuration eeceeseeeeseeeeeececeeeeeeeteeeenaes 2 15 UD TB EE Salat teenie ed 2 15 300 HID Field DeSCri Te 2 16 DCMP and ICMP Bit Sete os ccisacicccvsndcceaseasadeedanidccevdessanta NEE secsaavelendacesvasedsteanuvees 2 19 HASH and HASH Bit Settings vc ccstaiecwcnsiss saichas on uctded dueatantcesg sued sucscceenasedacasunse ides eceeesaecedaees 2 19 RPA Bit Setting lt 5 Sic ee ee 2 20 System Version Register SVR Bit Settings eege eege ee E 2 23 Instruction Address Breakpoint Register ABR a
335. gnal asserted and the vector taken A system management interrupt is signaled to the e300 core by the assertion of the smi signal The interrupt may not be recognized if a higher priority interrupt occurs simultaneously or if MSR EE is cleared when smi is asserted Note that smi takes priority over int if they are recognized simultaneously After the smi is detected and provided that MSR EE is set the e300 core generates a recoverable halt to instruction completion The e300 core requires the next instruction in program order to complete or except block completion of any following instructions and allow the completed store queue to drain see Section 7 1 Terminology and Conventions for the definition If any higher priority interrupts are encountered in this process they are taken first and the system management interrupt is delayed until a recoverable halt is achieved At this time the e300 core saves state information and takes the system management interrupt The register settings for the external interrupt are shown in Table 5 24 Table 5 24 System Management Interrupt Register Settings Register Setting Description SRRO Set to the effective address of the instruction that the processor would have attempted to complete next if no interrupt conditions were present SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FE1 0 RI 0 TGPR 0 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE
336. gy Throughout this document the terms e300 core core and processor are used interchangeably The terms e300c1 e300c2 and e300c3 are used when describing an implementation specific feature or when a difference exists between different configurations The term e300 is used when describing a feature that pertains to the family of e300 processors 1 1 Overview This section describes the details of the e300 core provides a block diagram showing the major functional units see Section 3 1 2 Endian Modes and Byte Ordering and briefly describes how these units interact All differences between the e300 and previous PowerPC implementations derived from the MPC603e processor are noted See Section 1 5 Differences Between e300 Cores for a differences description of the e300 core configurations The e300 core is a low power implementation of this microprocessor family of reduced instruction set computing RISC microprocessors The core implements the 32 bit portion of the PowerPC architecture which defines 32 bit effective addresses integer data types of 8 16 and 32 bits and floating point data types of 32 and 64 bits The core is a superscalar processor that can issue and retire as many as three instructions per clock cycle Instructions can execute out of program order for increased performance however the core makes completion appear sequential The e300 core integrates independent executio
337. h Og ee ee Completion Queue Dies ee ee in Program Order Maximum Two Instruction Completion per Clock Cycle Complete Retire Figure 7 2 Instruction Flow Diagram for the e300c1 e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Instruction Timing Fetch Maximum Two Instruction Fetch per Clock Cycle Branch Processing Unit Instruction Queue 105 gt EE ESCH KOSCH KIES on in Program Order Maximum Two Instruction Dispatch per Clock Cycle Dispatch ee ee ee E Se Completion Buffer Assignment 4 a T q Reservation 4 Stations S a l L l l l SRU 2 Entry esi Store Queue Finish LE Completion Queue Diptera in Program Order Maximum Two lInstruction Completion per Clock Cycle Complete Retire Figure 7 3 Instruction Flow Diagram for the e300c2 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Instruction Timing Figure 7 4 shows a block diagram of the e300c3 Note that the e300c3 support floating point operations and includes two integer units 64 Bit Two Instructions Branch Processing Unit Instruction Queue System Register Load Store Floating Unit Point Unit Completion Unit D MMU Completes up to SRs two instructions DBAT per clock Array DTLB Power Time Base Dissipation Cou
338. h Signals that are not active low such as then time base enable and ckstp checkstop interrupt are referred to as asserted when they are high and negated when they are low Signal Groupings A subset of the selected internal e300 coherent system bus CSB core signals is grouped as follows Interrupts resets These signals include the external interrupt signal checkstop signals performance monitor interrupt on e300c3 and both soft reset and hard reset signals They are used to interrupt and under various conditions to reset the core JTAG debug interface signals The JTAG IEEE 1149 1 compliant interface and debug unit provides a serial interface to the system for performing monitoring and boundary tests Core status and control These signals include the memory reservation signal machine quiesce control signals time base enable signal and the tlbisync signal Clock control These signals provide for system clock input and frequency control Test interface signals Signals such as address matching combinational matching and watchpoint are used in the core for production testing Transfer attribute signals These signals provide information about the type of transfer such as the transfer size and whether the transaction is bursted write through or cache inhibited e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Core Interface Operation 8 1 1 Functional Groupi
339. h processor state are shown in Table 11 7 Table 11 7 Processor States and PMLCa0 PMLCa3 Bit Settings Processor State FCS FCU FCM1 FCMO Marked 0 0 0 1 Not marked 0 0 1 0 Supervisor 0 1 0 0 User 1 0 0 0 Marked and supervisor 0 1 0 1 Marked and user 1 0 0 1 Not marked and supervisor 0 1 1 0 Not mark and user 1 0 1 0 All 0 0 0 0 None X X 1 1 None 1 1 X X Two unconditional counting modes may be specified e Counting is unconditionally enabled regardless of the states of MSR PMM and MSR PR This can be accomplished by clearing PMLCan FCS PMLCan FCU PMLCan FCM 1 and PMLCan FCMO for each counter control e Counting is unconditionally disabled regardless of the states of MSR PMM and MSR PR This can be accomplished by setting PMGCO FAC or by setting PMLCan FC for each counter control Alternatively this can be accomplished by setting PMLCan FCM1 and PMLCan FCMO for each counter control or by setting PMLCan FCS and PMLCan FCU for each counter control e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Performance Monitor 11 5 Performance Monitor Application Examples The following sections provide examples of how to use the performance monitor facility 11 5 1 Chaining Counters The counter chaining feature can be used to decrease the processing pollution caused by performance monitor interrupts things like cache
340. has finished executing e Folding branch folding The replacement of a branch instruction with target instructions and any instructions along the not taken path when a branch is either taken or predicted as taken e Latency The number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction e Pipeline In the context of instruction timing the term pipeline refers to the interconnection of the stages The events necessary to process an instruction are broken into several cycle length tasks to e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Instruction Timing allow work to be performed on several instructions simultaneously analogous to an assembly line As an instruction is processed it passes from one stage to the next When it does the stage becomes available for the next instruction Although an individual instruction may take many cycles to complete the number of cycles is called instruction latency pipelining makes it possible to overlap the processing so that the throughput number of instructions completed per cycle is greater than if pipelining were not implemented e Program order The order of instructions in an executing program More specifically this term is used to refer to the original order in which program instructions are fetched into the instruction queue from the cache e Rename register Tem
341. have finished execution but have not completed Reservation The processor establishes a reservation on a cache block of memory space when it executes an Iwarx instruction to read a memory semaphore into a GPR Reservation station A buffer between the dispatch and execute stages that allows instructions to be dispatched even though the results of instructions on which the dispatched instruction may depend are not available Retirement Removal of the completed instruction from the CQ RISC reduced instruction set computing An architecture characterized by fixed length instructions with nonoverlapping functionality and by a separate set of load and store instructions that perform memory accesses Scan interface The e300 test interface Secondary cache A cache memory that is typically larger and has a longer access time than the primary cache A secondary cache may be shared by multiple devices Also referred to as L2 or level 2 cache Set v To write a nonzero value to a bit or bit field the opposite of clear The term set may also be used to generally describe the updating of a bit or bit field Set n A subdivision of a cache Cacheable data can be stored in a given location in one of the sets typically corresponding to its lower order address bits Because several memory locations can map to the same location cached data is typically placed in the set whose cache block corresponding to that address was used least rece
342. he cast out buffer cause retry to be signaled e Snoop attempt during the tag allocation period from debz instruction or load or store operations During the execution of a debz instruction or during a load or store operation that requires a cache line cast out the cache tags are inaccessible during the first and last cycle of the operation The period of any associated cast out due to reallocation overlaps the tag allocate period e Snoop attempt during the cycle when a debf debst or debi instruction is updating the tag If the EA of a debf or debst instruction hits in the cache the tag will be changed to its new state During that clock the tag is not accessible and snoop transactions during that cycle cause retry to be signaled e Snoop hits in the data load buffer for a burst read while the data is still being transferred from memory that is data is in transit e Other internal resource collisions While the e300 core provides the hardware required to monitor bus traffic for coherency the core data cache tags are single ported and a simultaneous load or store and snoop access represent a resource conflict In general the snoop access has highest priority and is given first access to the tags A pending load or store access will then occur on the clock following the snoop However the snoop is not given priority into the tags when the snoop coincides with a tag write for example validation after a cache block load In these situations
343. he e300 core is described in Chapter 7 Memory Management in the Programming Environments Manual for 32 bit implementations However note that for improved performance the e300 core contains twice as many BAT registers as previous PowerPC cores as shown in Figure 6 2 and Figure 6 3 Implementation Note The BAT registers are not initialized by the hardware after the power up or reset sequence Consequently all valid bits in both instruction and data BAT areas must be explicitly cleared before setting any BAT area for the first time and before enabling translation Also note that software must avoid overlapping blocks while updating a BAT area or areas Even if translation is disabled multiple BAT area hits with the valid bits set can corrupt the remaining portion any bits except the valid bits of the BAT registers Thus multiple BAT hits with valid bits set are considered a programming error whether translation is enabled or disabled and can lead to unpredictable results if translation is enabled or if translation is disabled when translation is eventually enabled For the case of unused BATs if translation is to be enabled it is sufficient precaution to simply clear the valid bits of the unused BAT entries 6 4 Memory Segment Model The core adheres to the memory segment model as defined in Chapter 7 Memory Management in the Programming Environments Manual for 32 bit implementations Memory in the OEA is divided int
344. he msrip signal as described in Table 5 11 without attempting to reach a recoverable state Table 5 11 Hard Reset MSR Value and Interrupt Vector f Fetch Instructions from Handler marp MSR 0 31 at System Reset Vector Asserted 0x0000_0040 MSR IP 1 OxFFFO_0100 Negated 0x0000_0000 MSR IP 0 0x0000_0100 A hard reset has the highest priority of any interrupt and is always nonrecoverable Table 5 12 shows the state of the machine just before it fetches the first instruction of the system reset handler after a hard reset e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Interrupts and Exceptions Table 5 12 Settings Caused by Hard Reset Register Setting Register Setting GPRs Unknown PVR See Table 2 2 FPRs Unknown HIDO 0000_0000 FPSCR 00000000 HID1 0000_0000 CR All Os HID2 0000_0000 or 0800_0000 SRs Unknown DMISS and IMISS_ All Os MSR 0000_0040 or 0000_0000 or DCMP and ICMP All Os 0001_0041 or 0001_0001 XER 0000_0000 RPA All Os TBU 0000_0000 IABR All Os TBL 0000_0000 DSISR 0000_0000 LR 0000_0000 DAR 0000_0000 CTR 0000_0000 DEC FFFF_FFFF SDR1 0000_0000 HASH1 0000_0000 SRRO and CSRRO 0000_0000 HASH2 0000_0000 SRR1 and CSRR1 0000_0000 TLBs Unknown SPRGs 0000_0000 Cache All cache blocks invalidated Tag directory All Os However LRU bits are BATs Unknown initialized so each side of the cache has a uniq
345. how the FPU pipeline functions Note that floating point instructions are not supported on the e300c2 The following instruction sequence is examined add fadd add fadd br 6 fsub fadd fadd add add and VD DJ Ch in P GA H ra CO ee ka and fadd add fadd ken Fe J Ch LN E D FA e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Instruction Timing 9 10 14 riicht L L pa k ei ols E H Fetch in IQ In Dispatch Entry IQ0 IQ1 a Execute Complete In CQ __ In Retirement Entry CQ0 CQ1 12 fadd 13 add 14 fadd Instruction Queue 11 14 10 12 13 14 ee EE 9 9 11 12 13 aI T IL 1 3 5 7 8 8 10 11 12 13 0 2 4 6 7 7 9 11 12 14 Completion Queue 11 13 Ia del I Troa 10 10 CIE EIN 2 3 3 8 8 9 9 11 13 14 14 1 1 2 2 6 7 7 8 8 10 12 13 13 0 0 1 1 3 6 6 7 7 9 11 12 12 14 Figure 7 6 Instruction Timing Cache Hit e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Instruction Timing The instruction timing for this example is de
346. ical load completes The e300 core supports instruction fetching from other instruction cache lines following the forwarding of the critical first double word of a cache line load operation Successive instruction fetches from the cache line being loaded are forwarded and accesses to other instruction cache lines can proceed during the cache line e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Overview load operation The instruction cache is not snooped and cache coherency must be maintained by software A fast hardware invalidation capability is provided to support cache maintenance The organization of the instruction cache for the e300c lis very similar to the data cache shown in Figure 1 5 128 Sets Block 0 Block 1 Address Tag 1 Block 2 Address Tag 2 Block 3 Address Tag 3 Block 4 Address Tag 4 Words 0 7 Block 5 Address Tag 5 Words 0 7 Block 6 Address Tag 6 Words 0 7 Block 7 Address Tag 7 Words 0 7 Figure 1 5 e300c1 Data Cache Organization The e300c2 and e300c3 data cache is configured as 128 sets of four blocks per set The organization of the data cache is shown in Figure 1 6 Block 1 Words 0 7 Block 2 Address Tag 2 Words 0 7 Block 3 Address Tag 3 Words 0 7 e 8 Words Block e Figure 1 6 e300c2 and e300c3 Data Cache Organizatio
347. ics of the memory model that can be assumed by software processes and includes descriptions of the cache model cache control instructions address aliasing and other related issues 3 2 5 1 Processor Control Instructions The VEA defines the Move from Time Base mftb instruction for reading the contents of the time base register The mftb is a user level instruction as shown in Table 3 26 Table 3 26 Move From Time Base Instruction Name Mnemonic Operand Syntax Move from Time Base mftb rD TBR Simplified mnemonics are provided for the mftb instruction so it can be coded with the TBR name as part of the mnemonic rather than requiring it to be coded as an operand The mftb instruction serves as both a basic and simplified mnemonic Assemblers recognize an mftb mnemonic with two operands as the basic form and an mftb mnemonic with one operand as the simplified form Simplified mnemonics are also provided for Move from Time Base Upper mftbu a variant of the mftb instruction rather than of mfspr The core ignores the extended opcode differences between mftb and mfspr by ignoring bit 25 of both instructions and treating them identically Refer to Appendix F Simplified Mnemonics in the Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Instruction Set Model 3 2 5 2 Memory Synchronization Instructions VEA Memory synchronization ins
348. idates the cache in a single cycle on e300c however on e300c2 and e300c3 ICFI is a 128 cycle sequential reset of the cache Flash invalidation of the instruction cache is accomplished by setting ICFI and subsequently clearing ICFI in two consecutive mtspr HIDO instructions The instruction cache is automatically invalidated when the core is powered up and during a hard reset However a soft reset does not automatically invalidate the instruction cache Software must set and clear HIDO ICFI to invalidate the entire instruction cache after a soft reset e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Instruction and Data Cache Operation 4 5 1 10 Instruction Cache Way Protect HID2 ICWP Typically instruction cache management routines rely on icbi instructions or flash invalidate HIDO ICFI operations to clear out part or all of the instruction cache for program related operations such as for process changes or MMU page deallocation Often these cache management operations clear out both the locked and unlocked portions of the instruction cache icbi invalidates all ways of a cache set unconditionally and HIDO ICFI invalidates the entire instruction cache unconditionally The e300 core has a new instruction cache way protection feature By setting HID2 ICWP 1 any locked ways in the instruction cache are protected from cache block icbi or flash HIDO ICFI invalidation This allows an
349. if found pte bdnzf 0 ceql dec count br if cmp ne and if count not zero bne cEq0SecHash if not found set up second hash or exit T rr Alre2 load tlb entry lower word andi r3 r1 0x80 check the C bit beq cEqOChkProt if C 0 go check protection modes ceq2 mtctr ro restore counter mfspr r0 dMiss get the miss address for the tlbld mfspr er seri get the saved cr0 bits mtcrf 0x80 r3 restore CRO mtspr rpa rl set the pte tlbld ro load the dtlb rfi return to executing program e300 Power Architecture Core Family Reference Manual Rev 3 42 Freescale Semiconductor Register usage rO is saved counter rl is junk r2 is pointer to pteg r3 is current compare value cEqO0SecHash andi rl r3 0x0040 bne doDSI mfspr r2 hash2 ori r3 r3 0x0040 addi ST fra lt 8 addi t2 s 58 D ceq0 entry found and PTE c bit 0 Register usage check protection before setting PT Memory Management S if we have done second hash if so go to DSI interrupt get the second pointer change the compare value load 8 for counter pre dec for update on load try second hash E c bit rO is saved counter rl is PTE entry r2 is pointer to pteg r3 is trashed cEqOChkProt rlwinm r3 r1 30 0 1 test PP bge chkO if PP 00 or PP 01 goto chk0 andi ES tate test PP 0 Dech chk2 return if PP 0 0 b doDSIp else DSIp chko mfspr ESS L get old msr andi
350. ific Instructions bectrx 19 BO BI 00000 528 LK belrx 19 BO Bl 00000 16 LK crand 19 crbD crbA crbB 257 0 crandc 19 crbD crbA crbB 129 0 creqv 19 crbD crbA crbB 289 0 crnand 19 crbD crbA crbB 225 0 crnor 19 crbD crbA crbB 33 0 cror 19 crbD crbA crbB 449 0 crorc 19 crbD crbA crbB 417 0 crxor 19 crbD crbA crbB 193 0 isync 19 00000 00000 00000 150 0 mert 19 crfD 00 crfS 00 00000 0 0 rfi 1 19 00000 00000 00000 50 0 i Supervisor level instruction Table A 37 XFX Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD D spr XO 0 OPCD D 0 CRM 0 XO 0 OPCD S spr XO 0 OPCD D tbr XO 0 Specific Instructions mfspr 31 D spr 339 0 mftb 31 D tbr 371 0 mtcrf 31 S 0 CRM 0 144 0 mtspr 31 D spr 467 0 1 Supervisor and user level instruction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Instruction Set Listings Table A 38 XFL Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD 0 FM 0 B XO Re Specific Instructions mtfsfx 63 0 FM 0 B 711 Rc Table A 39 XS Form Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 OPCD S A sh XO eh He Specific Instructions sradix 31 S A sh 413 eh Be 1 64 bit instruction Table A 40 XO Form
351. ific interrupts 2 SRRI1 1 4 10 15 are loaded with information specific to the interrupt type 3 SRR1 5 9 16 31 are loaded with a copy of the corresponding bits of the MSR The MSR is set as described in Table 5 8 The new values take effect beginning with the fetching of the first instruction of the interrupt handler routine located at the interrupt vector address Note that MSR IR and MSR DR are cleared for all interrupt types therefore address translation is disabled for both instruction fetches and data accesses beginning with the first instruction of the interrupt handler routine 5 Instruction fetch and execution resumes using the new MSR value at a location specific to the interrupt type The location is determined by adding the interrupt s vector see Table 5 2 to the base address determined by MSR IP If IP is cleared interrupts are vectored to the physical address 0x000n_nnnn If IP is set interrupts are vectored to the physical address OxFFFn_nnnn For a machine check interrupt that occurs when MSR ME 0 machine check interrupts are disabled the processor enters the checkstop state the machine stops executing instructions See Section 5 5 2 Machine Check Interrupt 0x00200 Note that the same steps occur when a critical interrupt occurs and is enabled for the e300 core except that CSRRO is set instead of SRRO and CSRR1 is set instead of SRR1 5 2 4 Setting MSR RI The operating system should handle MS
352. ifies the type of instruction Program order The order of instructions in an executing program More specifically this term is used to refer to the original order in which program instructions are fetched into the instruction queue from the cache e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 8 Freescale Semiconductor Protection boundary A boundary between protection domains Protection domain A protection domain is a segment a virtual page a BAT area or a range of unmapped effective addresses It is defined only when the appropriate relocate bit in the MSR IR or DR is 1 Quiesce To come to rest The processor is said to quiesce when an interrupt is taken or a sync instruction is executed The instruction stream is stopped at the decode stage and executing instructions are allowed to complete to create a controlled context for instructions that may be affected by out of order parallel execution See Context synchronization Quiet NaN A type of NaN that can propagate through most arithmetic operations without signaling interrupts A quiet NaN is used to represent the results of certain invalid operations such as invalid arithmetic operations on infinities or on NaNs when invalid See Signaling NaN rA The rA instruction field is used to specify a GPR to be used as a source or destination rB The rB instruction field is used to specify a GPR to be used as a source rD The rD instruction field i
353. ime base transition events are disabled 1 Exceptions from time base transition events are enabled A time base transition is signalled to the performance monitor if the TB bit specified in PMGCO TBSEL changes from 0 to 1 Time base transition events can be used to freeze counters PMGCO FCECE or signal an exception PMGCO PMIE Changing PMGCO TBSEL while PMGCO TBEE is enabled may cause a false 0 to 1 transition that signals the specified action freeze exception to occur immediately Although the interrupt signal condition may occur with MSR EE 0 the interrupt cannot be taken until MSR EE 1 12 31 Reserved should be cleared 11 2 2 User Global Control Register 0 UPMGCO The contents of PMGCO0 are reflected to UPMGCO which can be read by user level software UPMGCO can be read with the mfpmr instruction using PMR384 11 2 3 Local Control A Registers PMLCa0 PMLCa3 The local control A registers PMLCa0 PMLCa3 function as event selectors and give local control for the corresponding performance monitor counters PMLCa works with the corresponding PMLCb register PMLCa registers are shown in Figure 11 2 PMLCa0 PMR144 UPMLCa0 PMR128 Access PMLCa0 PMLCa3 Supervisor only PMLCa1 PMR145 UPMLCa1 PMR129 UPMLCa0 UPMLCas3 Supervisor user read only PMLCa2 PMR146 UPMLCa2 PMR130 PMLCa3 PMR147 UPMLCa3 PMR131 0o 1 2 3 4 5 6 8 9 15 16 31 R W FC FCS
354. imultaneously Therefore the core is described as having two MMUs one for instruction accesses MMU and one for data accesses DMMU The block address translation BAT mechanism is a software controlled array that stores the available block address translations on chip BAT array entries are implemented as pairs of BAT registers that are accessible as supervisor level special purpose registers SPRs There are separate instruction and data BAT mechanisms and in the e300 core they reside in the instruction and data MMUs respectively The MMUs together with the interrupt processing mechanism provide the necessary support for the operating system to implement a paged virtual memory environment and for enforcing protection of designated memory areas Interrupt processing is described in Chapter 5 Interrupts and Exceptions Section 5 2 Interrupt Processing describes the MSR which controls some of the critical functionality of the MMUs e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Memory Management 6 1 MMU Features The e300 core completely implements all features required by the memory management specification of the OEA for 32 bit implementations Thus it provides 4 Gbytes of effective address space accessible to supervisor and user programs with a 4 Kbyte page size and 256 Mbyte segment size In addition the MMwUs of 32 bit processors use an interim virtual address 52 bits
355. in 2 mtfsb0x mtfsb1x mtfsfix mtmsr 2 mtsr 2 mtsrin 2 nandx norx orx orcx slbia t 24 slbie 2 4 sidx slwx sradx Table A 35 X Form continued 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Instruction Set Listings wo bi 31 D A B 375 0 31 D A B 343 0 31 D A B 790 0 31 D A B 311 0 31 D A B 279 0 31 D A NB 597 0 31 D A B 533 0 31 D A B 20 0 31 D A B 373 0 31 D A B 341 0 31 D A B 534 0 31 D A B 55 0 31 D A B 23 0 63 crfD 00 crfS 00 00000 64 0 31 crfD 00 00000 00000 512 0 31 D 00000 00000 19 0 63 D 00000 00000 583 Re 31 D 00000 00000 83 0 31 D 0 SR 00000 595 0 31 D 00000 B 659 0 63 crbD 00000 00000 70 Re 63 crfD 00000 00000 38 Re 63 crbD 00 00000 IMM 134 Re 31 S 00000 00000 146 0 31 S 0 SR 00000 210 0 31 S 00000 B 242 0 31 S A B 476 Rc 31 S A B 124 Rc 31 S A B 444 Rc 31 S A B 412 Rc 31 00000 00000 00000 498 0 31 00000 00000 B 434 0 31 S A B 27 Re 31 S A B 24 Rc 31 S A B 794 Re e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Instruction Set Listings Table A 35 X Form continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 srawx 31 S A
356. in the case of a TLB miss However neither of these instructions causes the C bit to be set e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Memory Management As defined by the PowerPC architecture the reference and change bits are updated as if address translation were disabled real addressing mode translation Additionally these updates should be performed with single beat read and byte write transactions on the bus 6 4 1 1 Reference Bit The reference R bit of a page is located in the PTE in the page table Every time a page is referenced with a read or write access and the R bit is zero the R bit is then set in the page table The OEA specifies that the reference bit may be set immediately or the setting may be delayed until the memory access is determined to be successful Because the reference to a page is what causes a PTE to be loaded into the TLB the reference bit in all core TLB entries is effectively always set The processor never automatically clears the reference bit The reference bit is only a hint to the operating system about the activity of a page At times the reference bit may be set by software although the access was not logically required by the program or even if the access was prevented by memory protection Examples of this in these systems include the following e Fetching of instructions not subsequently executed e Accesses generated by an Iswx or stswx in
357. ine and provides the subsequent instruction with a vacancy in the requested execution unit The completion unit maintains program order after instructions are dispatched guaranteeing in order completion and a precise interrupt model Completing an instruction committing execution results to the architected destination registers In order completion ensures the correct architectural state when the core must recover from a mispredicted branch or an interrupt The core can execute instructions out of order but in order completion by the completion unit ensures a precise interrupt mechanism Program related interrupts are signaled when the instruction causing the interrupt reaches the last position in the completion queue Prior instructions are allowed to complete before the interrupt is taken 7 3 3 1 Rename Register Operation To avoid contention for a given register file location the core provides rename registers for holding instruction results before the completion commits them to the architected register There are five GPR rename registers four FPR rename registers and one each for the CR LR and CTR When an instruction dispatches to its execution unit any required rename registers are allocated for the results of that instruction If an instruction is dispatched to the reservation station associated with an execution unit due to a data dependency the dispatcher also provides a tag to the execution unit identifying the rename register th
358. ing Note that the e300 core does not support the modified PowerPC little endian mode present in previous PowerPC cores but supports true little endian mode True little endian mode is supported in the e300 core to minimize the impact on software porting from true little endian systems The true little endian mode applies for all instruction fetches and data load and store operations to and from memory The e300 powers up in one of two endian modes big endian mode or true little endian mode selected by the tle signal at the negation of hreset The endian mode should be set at the negation of hreset and should remain unchanged by software for the duration of the system operation Bit 4 of HID2 HID2 LET is used in conjunction with MSR LE to indicate the endian mode of operation of the e300 core as shown in Table 3 1 Note that the e300 core no longer supports modified PowerPC little endian mode as in previous PowerPC cores e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Instruction Set Model Table 3 1 Endian Mode Indication MSR LE HID2 LET Endian Mode 0 x Big endian 1 1 True little endian When the e300 core is in true little endian mode memory and I O subsystems are treated as true little endian The following occurs when operating in true little endian mode e The byte reversing for instruction occurs before the instruction is decoded e The byte r
359. ingle Precision fres Floating Reciprocal Square Root Estimate frsqrte Store Floating Point as Integer Word stfiwx Note that this grouping of instructions does not indicate the execution unit that executes a particular instruction or group of instructions Integer instructions operate on byte half word and word operands Floating point instructions operate on single precision one word and double precision one double word floating point operands The PowerPC architecture uses instructions that are 4 bytes long and word aligned It provides for byte half word and word operand loads and stores between memory and a set of 32 GPRs It also provides for word and double word operand loads and stores between memory and a set of 32 FPRs Computational instructions do not modify memory To use a memory operand in a computation and then modify the same or another memory location the memory contents must be loaded into a register modified and then written back to the target location with distinct instructions The core follows the program flow when it is in the normal execution state However the flow of instructions can be interrupted directly by the execution of an instruction or by an asynchronous event Either kind of interrupt may cause one of several components of the system software to be invoked 1 3 2 2 Implementation Specific Instruction Set The e300 core instruction set is defined as follows e The core provides hardware s
360. interrupt and the address to return to when a Return from Interrupt rfi instruction is executed The machine status save restore register 1 SRR1 is used to save machine status on interrupts and to restore machine status when an rfi instruction is executed The SPRGO SPRG7 registers are provided for operating system use They reduce the latency that may be incurred in the saving of registers to memory while in a handler Note that the e300 implements four more SPRGs than the G2 SPRGO SPRG3 The time base register TB is a 64 bit register that maintains the time of day and operates interval timers It consists of two 32 bit fields time base upper TBU and time base lower TBL The processor version register PVR is a read only register that identifies the version model and revision level of the processor See Table 1 3 for the version and revision level of the PVR for the e300 processor core Block address translation BAT arrays The PowerPC architecture defines 16 BAT registers The e300 core includes a total of eight pairs of DBAT and eight pairs of IBAT registers See Figure 1 4 for a list of the SPR numbers for the BAT arrays The following supervisor level SPRs are implementation specific not defined in the PowerPC architecture DMISS and IMISS are read only registers that are loaded automatically on an instruction or data TLB miss HASH 1 and HASH2 contain the physical addresses of the primary and secondary page table ent
361. interrupts have other characteristics such as whether they are maskable the distinctions shown in Table 1 1 define categories of interrupts that the core handles uniquely Note that Table 1 1 includes no synchronous imprecise instructions While the PowerPC architecture supports imprecise handling of floating point exceptions the core implements floating point exception modes as precise The e300 core interrupts and exception conditions that cause them are listed in Table 1 2 Table 1 2 Exceptions and Interrupts Vector Offset Interrupt Type Exception Conditions hex Reserved 00000 System reset 00100 Caused by the assertion of either sreset or hreset Machine check 00200 Caused by the assertion of the tea signal during a data bus transaction assertion of mcp an address or data parity error or an instruction or data cache parity error Note that the e300 has SRR1 register values that are different from the G2 G2_LE cores when a machine check occurs See Table 5 14 for more information DSI 00300 Determined by the bit settings in the DSISR listed as follows 1 Setifthe translation of an attempted access is not found in the primary hash table entry group HTEG or in the rehashed secondary HTEG or in the range of a DBAT register otherwise cleared 4 Set if a memory access is not permitted by the page or DBAT protection mechanism otherwise cleared 6 Set for a store operation and cleared for a load operati
362. inue to operate as normal cache for the remaining program Normal instruction cache management however typically relies on the icbi instruction and the flash invalidate mechanism HIDO ICFI to routinely clear out part or all of the instruction cache for program related operations such as process changes and MMU page deallocation These cache management operations clear out both the locked and unlocked portions of the cache icbi invalidates all ways of a cache set unconditionally and HIDO ICFI clears the entire instruction cache unconditionally In the e300 core the new instruction cache way protect extension prevents the locked portion of the instruction cache from being invalidated by the icbi instruction or by the flash invalidate mechanism HIDO ICFI This allows the locked portion of the instruction cache to have the same persistence as main memory while still allowing the remaining unlocked portion of the instruction cache to be managed by the program This way protect extension is enabled by setting HID2 IC WP In addition to the way protect extension the instruction cache block touch icbt instruction is added to support easy start up initialization or reloading of the instruction cache This specifically supports the way lock and way protect mechanism by allowing the locked portion of the cache to be easily initialized in a predictable fashion The natural prefetch mechanism of super scalar processors otherwise precludes this
363. ion Undefined for dcbz 27 31 Set to bits 11 15 of the instruction rA Set to either bits 11 15 of the instruction or to any register number not in the range of registers loaded by a valid form instruction for Imw Iswi and Iswx instructions Otherwise undefined DAR Set to the EA of the data access as computed by the instruction causing the alignment interrupt When the operand of an Imw stmw war or stwex instruction is not word aligned that address value 4 is stored into the DAR 5 5 6 1 Integer Alignment Exceptions The e300 core is optimized for load and store operations that are aligned on natural boundaries Operations that are not naturally aligned may suffer performance degradation depending on the type of operation the boundaries crossed and the mode that the processor is in during execution More specifically these operations may either cause an alignment interrupt or they may cause the processor to break the memory access into multiple smaller accesses with respect to the cache and the memory subsystem The e300 core can initiate an alignment interrupt for the access shown in Table 5 18 In this case the appropriate range check is performed before the instruction begins execution As a result if an alignment interrupt is taken it is guaranteed that no portion of the instruction has been executed Table 5 18 Access Types MSR DR SR T Access Type 1 0 Page address translation access
364. ion TLB Miss Handler DSISR 6 lt SRR1 15 Instruction Access to Guarded Memory Data Access to Protected Memory C 0 SAR SARI AND aere SARI SAR AND tere DSISR 4 lt 1 SRR1 31 1 Little Endian Mode Otherwise dtemp lt dtemp XOR 0x07 Branch to ISI Interrupt Handler DAR lt dtemp Restore CRO Bits MSR TGPR lt 0 Branch to DSI Interrupt Handler Figure 6 18 Setup for Protection Violation Exceptions 6 5 2 2 2 Code for Example Interrupt Handlers This section provides assembly language examples that implement the flow diagrams described above Note that although these routines fit into a few cache lines they are supplied only as functional examples they could be further optimized for faster performance TLB software load for e300 core New Instructions tlbld write the dtlb with the pte in rpa reg e300 Power Architecture Core Family Reference Manual Rev 3 38 Freescale Semiconductor tlbli New SPRs dmiss imiss hashl hash2 icmp dcmp rpa gpr r0 r3 are shadowed there are three flows tlbDataMiss tlbCeq0d tlbInstrMiss tlb miss tlb miss tlb miss Memory Management write the itlb with the pte in rpa reg address address address returns returns returns on data of dstream miss of istream miss primary hash PTEG address secondary hash PTEG the primary istream the primary dstream the second w
365. ion as part of the instruction encoding e Branch resolution The determination of whether a branch is taken or not taken A branch is said to be resolved when the processor can determine which instruction path to take If the branch is resolved as predicted the instructions following the predicted branch that may have been speculatively executed can complete see completion If the branch is not resolved as predicted instructions on the mispredicted path and any results of speculative execution are purged from the pipeline and fetching continues from the nonpredicted path e Completion Completion occurs when an instruction has finished executing written back any results and is removed from the completion queue CQ When an instruction completes it is guaranteed that this instruction and all previous instructions can cause no interrupts e Fall through branch fall through A not taken branch On the e300 core fall through branch instructions are removed from the instruction stream at dispatch That is these instructions are allowed to fall through the instruction queue through the dispatch mechanism without either being passed to an execution unit and or given a position in the CQ e Fetch tThe process of bringing instructions from memory such as a cache or system memory into the instruction queue e Finish Finishing occurs in the last cycle of execution In this cycle the CQ entry is updated to indicate that the instruction
366. ion fetching bypasses the data cache changes to items in the data cache may not be reflected in memory until the fetch operations complete Special care must be taken to avoid coherency paradoxes in systems that implement unified secondary caches and designers should carefully follow the guidelines for maintaining cache coherency that are provided in the VEA and discussed in Chapter 5 Cache Model and Memory Coherency in the Programming Environments Manual Because the core does not broadcast the M bit for instruction fetches except when HIDO IFEM is set external caches are subject to coherency paradoxes 3 2 4 3 2 Integer Load and Store Address Generation Integer load and store operations generate effective addresses using register indirect with immediate index mode register indirect with index mode or register indirect mode See Section 3 2 2 3 Effective Address Calculation Note that the core is optimized for load and store operations that are aligned on natural boundaries and operations that are not naturally aligned may suffer performance degradation Refer to Section 5 5 6 1 Integer Alignment Exceptions 3 2 4 3 3 Register Indirect Integer Load Instructions For integer load instructions the byte half word word or double word addressed by the EA is loaded into rD Many integer load instructions have an update form in which rA is updated with the generated effective address For these forms the EA is placed i
367. ion for IBCR and DBCR An address match can be signaled after an OR function of the two compared addresses match or the AND of the two addresses match depending on the setting of IBCR and DBCR This feature along with e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Debug Features matching on greater than and less than allows a breakpoint to be set inside or outside a range of two addresses The instruction address breakpoints and data address breakpoints always operate independently of each other For more details see Section 2 2 15 Instruction Address Breakpoint Control Register IBCR and Section 2 2 17 Data Address Breakpoint Control Register DBCR The address matching for the instruction address breakpoint register has the following four possible conditions for the specific register 1 Instruction s effective address IABR CEA Table 10 3 describes the instruction address breakpoint register for a single address matching condition In this case only one ABR is used Table 10 3 Single Address Matching Bit Settings Register Field Name Condition Register Field Name Condition IABR CEA IABR2 CEA IABR BE 1 IABR2 BE 0 IBCR CNT 0 IBCR SIG_TYPE OR IBCR CMP1 IBCR CMP2 With single address matching settings a match occurs when the instruction s effective address IABR CEA Instruction s effective
368. ion operations not included in the e300c2 e Independent execution units and two register files Branch processing unit BPU featuring static branch prediction Two 32 bit integer units IU in the e300c2 and e300c3 One 32 bit integer units IU in the e300c1 FPU based on the IEEE 754 standard for both single and double precision operations Load store unit LSU for data transfer between data cache and general purpose registers GPRs and floating point registers FPRs System register unit SRU that executes condition register CR special purpose register SPR and integer add compare instructions Add compare instructions are also executed in the IUs Thirty two 32 bit GPRs for integer operands Thirty two 64 bit FPRs for single or double precision operands e High instruction and data throughput Zero cycle branch capability branch folding Programmable static branch prediction on unresolved conditional branches Two integer units with enhanced multipliers in the e300c2 and e300c3 for increased integer instruction throughput and a maximum two cycle latency for multiply instructions Instruction fetch unit capable of fetching two instructions per clock from the instruction cache A six entry instruction queue IQ that provides lookahead capability Independent pipelines with feed forwarding that reduces data dependencies in hardware 32 Kbyte data cache and 32 Kbyte instruction cache with parity eight way set associati
369. ion resumes at an address provided by the handler Synchronous imprecise The PowerPC architecture defines two imprecise floating point exception modes recoverable and nonrecoverable Even though the e300 core provides a means to enable the imprecise modes it implements these modes identically to the precise mode that is all enabled floating point exceptions are always precise on the e300 core These are not implemented on the e300c2 core as it does not support floating point instructions Asynchronous maskable The external interrupt int system management interrupt smi decrementer interrupt and critical interrupt cint are maskable asynchronous interrupts When these interrupts occur their handling is postponed until the next instruction completes execution and until any interrupts associated with that instruction complete execution If there are no instructions in the execution units the interrupt is taken immediately upon determination of the correct restart address for loading SRRO Asynchronous nonmaskable There are two nonmaskable asynchronous interrupts system reset and the machine check interrupt These interrupts may not be recoverable or may provide a limited degree of recoverability All interrupts report recoverability through the MSR RI bit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Interrupts and Exceptions The e300 core interrupt classes are shown in Table 5
370. ion specific 3 34 integer 3 1 3 22 A 15 A 20 load and store 3 16 e300 Power Architecture Core Family Reference Manual Rev 3 Index 4 Freescale Semiconductor byte reverse instructions A 20 integer multiple instructions A 20 string instructions A 20 memory control A 23 memory synchronization A 21 operating environment architecture OEA 3 29 3 32 performance monitor 11 7 PowerPC instructions complete lists form format A 24 function A 15 legend A 35 mnemonic A 1 opcode A 8 processor control A 23 rfci 5 16 rfi 5 15 sc 5 4 5 31 segment register manipulation A 24 system linkage A 22 TLB management instructions A 24 trap instructions A 23 user instruction set architecture UISA 3 10 3 24 virtual environment architecture VEA 3 27 int signal 5 4 5 12 5 25 Integer unit IU 7 3 execution timing 7 21 latency integer instructions 7 29 Interrupt handling 5 2 classes of interrupts 5 2 synchronous exceptions imprecise 5 2 precise 5 2 5 5 core interface operations 8 1 8 4 8 8 8 9 enabling and disabling interrupts and exceptions 5 14 enabling critical interrupts MSR CE bit 5 13 enabling external interrupts MSR EE bit 5 12 9 5 instruction related interrupts 3 9 interrupt modes FP interrupt mode 0 5 13 5 14 FP interrupt mode 1 5 13 interrupt types 5 17 5 36 alignment 5 4 5 25 critical input interrupt cint 5 4 5 13 5 30 data TLB miss on load 5 5 5 33
371. is clock cycle as there are many results in the rename registers e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Instruction Timing Lo ada Fetch in IQ Tadd In Dispatch Entry IQ0 IQ1 I L 2 be J EE mus Execute Snu I em 4b In Retirement Entry CQ0 CQ1 5 fadd I l TO add T1 add T2 add T3 add T4 add T5 or Sg a lt l I l Instruction Queue T5 o ri T3 T4 3 5 Ti T2 T3 1 2 4 TO Ti T2 6 0 1 3 5 TO T1 5 Completion Queue emt 2 3 1 2 TO 6 0 0 1 3 3 5 Figure 7 8 Branch Instruction Timing After one clock cycle required to refetch the original instruction stream instruction 5 the same instruction that was fetched in clock cycle 2 is brought back into the IQ from the instruction cache along with one other 7 4 2 Integer Unit Execution Timing The integer unit executes all integer and bit field computational instructions Many of these instructions execute in a single clock cycle The integer unit has one execute stage so when a multiple cycle integer instr
372. is section describes the PowerPC architecture in general and specific details about the implementation of the e300 core as a low power 32 bit member of this PowerPC core family The main topics addressed are as follows e Section 1 3 1 Register Model describes the registers for the operating environment architecture common among e300 cores that implement the PowerPC architecture and describes the programming model It also describes the additional registers that are unique to the core e Section 1 3 2 Instruction Set and Addressing Modes describes the PowerPC instruction set and addressing modes for the OEA and defines and describes the instructions implemented in the core e Section 1 3 3 Cache Implementation describes the cache model that is defined generally for cores that implement the PowerPC architecture by the VEA It also provides specific details about the e300 core cache implementation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Overview e Section 1 3 4 Interrupt Model describes the interrupt model of the OEA and the differences in the core interrupt model e Section 1 3 5 Memory Management describes generally the conventions for memory management among these cores This section also describes the core implementation of the 32 bit PowerPC memory management specification e Section 1 3 6 Instruction Timing provides a general description
373. ister frD as undefined when executing the Floating Convert to Integer Word fctiw and Floating Convert to Integer Word with Round Toward Zero fetiwz instructions Examples of uses of these instructions to perform various conversions can be found in Appendix D Floating Point Models in the Programming Environments Manual The floating point rounding instructions are shown in Table 3 10 Table 3 10 Floating Point Rounding and Conversion Instructions Name Mnemonic Operand Syntax Floating Convert to Integer Word fctiw fctiw frD frB Floating Convert to Integer Word with Round Toward Zero fetiwz fctiwz frD frB Floating Round to Single Precision frsp frsp frD frB 3 2 4 2 4 Floating Point Compare Instructions Floating point compare instructions compare the contents of two floating point registers The comparison ignores the sign of zero that is 0 0 The floating point compare instructions are listed in Table 3 11 Table 3 11 Floating Point Compare Instructions Name Mnemonic Operand Syntax Floating Compare Ordered Tempo crfD frA frB Floating Compare Unordered fcmpu crfD frA frB 3 2 4 2 5 Floating Point Status and Control Register Instructions Every FPSCR instruction appears to synchronize the effects of all floating point instructions executed by a given processor Executing an FPSCR instruction ensures that all floating point instructions previously initiated by th
374. it MMU HAS Hu hash address regs primary secondary 2 19 6 32 HIDn hardware implementation registers 0 2 2 12 2 15 2 16 cache control parameters 4 14 4 17 HIDO register doze bit 9 3 DPM enable bit 9 1 9 3 HIDO register nap bit 9 4 HID1 PLL configuration 8 4 I O accesses 8 8 IABR IABR2 instruction address breakpoint regs 2 11 2 23 2 24 10 1 IBATnU L instruction block address translation regs 0 7 upper lower 1 5 2 9 2 11 2 20 2 21 IBCR instruction address breakpoint control reg 2 11 2 24 10 2 icbi 4 20 icbt 4 20 ICMP instruction TLB compare register 2 18 6 32 6 34 Illegal instructions 5 29 IMISS instruction TLB miss address reg 2 18 6 31 6 34 Instruction access errors see ISI instruction storage interrupt Instruction address breakpoint interrupt 5 5 5 34 10 1 10 3 10 4 Instruction cache see Caches Instruction latencies 7 28 see also Execution timing Instruction set model overview 3 10 summary 3 4 Instruction timing overview 1 30 1 31 see also Execution timing Instruction TLB miss interrupt 5 5 5 33 Instructions branch instructions A 22 cache control instructions 4 17 4 21 CSB operations for cache instructions 4 29 cache management instructions A 23 classes defined 3 5 illegal 3 5 reserved 3 5 condition register logical A 22 e300 instructions not implemented B 1 floating point A 17 A 22 illegal program interrupt 5 29 implementat
375. itecture supports four types of interrupts e Synchronous precise These are caused by instructions All instruction caused interrupts are handled precisely that is the machine state at the time the interrupt occurs is known and can be completely restored This means that excluding the trap and system call interrupts the address of the faulting instruction is provided to the interrupt handler and neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the interrupt is taken Once the interrupt is processed execution resumes at the address of the faulting instruction or at an alternate address provided by the interrupt handler When an interrupt is taken due to a trap or system call instruction execution resumes at an address provided by the handler e Synchronous imprecise The PowerPC architecture defines two imprecise floating point exception modes recoverable and nonrecoverable Even though the core provides a means to enable the imprecise modes it implements these modes identically to the precise mode that is all enabled floating point exceptions are always precise on the core e Asynchronous maskable The external system management interrupt SMI and decrementer interrupts are maskable asynchronous interrupts When these interrupts occur their handling is postponed until the next instruction and any of its associated interrupts complete execution If there are no instructi
376. ith DPM Enabled AAA 9 3 e300 Power Architecture Core Family Reference Manual Rev 3 xii Freescale Semiconductor Paragraph Number Title 9 3 1 3 Doze Modenese in asi a n eine ee 9 3 1 4 leger Sept Een EE i 9 3 1 5 ee E 9 3 2 Power Management Software Consideratons 9 4 Example Code Sequence for Entering Processor Sleep Mode Chapter 10 Debug Features 10 1 Breakpoint Resources eenegen 10 1 1 Instruction Address Breakpoint Registers QABR IARBR7 ee 10 1 2 Instructional Address Control Register OBCR 10 1 3 Data Address Breakpoint Registers DABR DABRI eee 10 1 4 Data Address Control Register OODBCR cc eeeceeseceeeseeeeeeeeesteeeeneeeeaes 10 1 5 Other Debug RESOULCES 3426 sssica es sesedarasussaassaved cas sassad nina si 10 1 6 Interrupt Vectors Tor Debugging 23 hee cncsicp te sseea ddd cgtiehcnneds 10 2 Using Breakpoint Facilities 20 225 Soke Ee alae eae 10 2 1 ebe 10 2 2 Branch ET ACTIN ca ods eege ere ee Ee 10 2 3 Breakpoint Address Matching Optons 10 3 Synchronization Requirements and Other Precautions 0 0 00 eeeeeeeeeereees Chapter 11 Performance Monitor 11 1 MV EE VIG W gebeten eenegen 11 2 Performance Monitor Register 11 2 1 Global Control Register O PMGCO0 0 eee eeseceseceseeeeseceseeneeeeeaeeesaeenes 11 2 2 User Global Control Register O UPMOCU cee eeeeeseceseeeeeeeeeenneeees 11 2 3 Local Control A Registers DM Ca DM Ca i 11 2 4 User Local Control A Registers UPMLCa0 UPMLCa3 ee e
377. ively When a data breakpoint event occurs DSISR 9 is set identifying the DSI as having been caused by a data breakpoint event In this case the DAR contains the address of the data access that matched Trace 00D00 A trace interrupt is taken when MSR SE 1 or when the currently completing instruction is a branch instruction and MSR BE 1 Instruction address breakpoint 01300 An instruction address breakpoint interrupt occurs when a match condition exists for the effective address of the instruction access in either IABR or IABR2 for the next instruction to complete in the completion unit and the corresponding IABR BE enable bit is set 10 2 Using Breakpoint Facilities Breakpoints single stepping branch tracing address and combinational matching are debugging facilities provided by the breakpoint registers DABR DABR2 IABR and IABR2 and the MSR 10 2 1 Single Stepping Single stepping can be a very useful tool in software debugging This debug feature executes one instruction before it takes a trace interrupt In the trace interrupt handler the results of executing that instruction can be examined e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Debug Features When MSR SE single step trace enable is set the processor generates a trace interrupt 0x00D00 upon the successful completion of the next instruction A trace interrupt is not taken for an isync
378. ivw divwo divwo rD rA rB Divide Word Unsigned divwu divwu divwuo divwuo rD rA rB Multiply High Word mulhw mulhw rD rA rB Multiply High Word Unsigned mulhwu mulhwu rD rA rB Multiply Low mullw mullw mullwo mullwo rD rA rB Multiply Low Immediate mulli rD rA SIMM Negate neg neg nego nego rD rA Subtract From subf subf subfo subfo rD rA rB Subtract From Carrying subfc subfc subfco subfco rD rA rB Subtract From Extended subfe subfe subfeo subfeo rD rA rB Subtract From Immediate Carrying subfic rD rA SIMM Subtract From Minus One Extended subfme subfme subfmeo subfmeo rD rA Subtract From Zero Extended subfze subfze subfzeo subfzeo rD rA Although there is no Subtract Immediate instruction its effect can be achieved by using an addi instruction with the immediate operand negated Simplified mnemonics are provided that include this negation The subf instructions subtract the second operand rA from the third operand rB Simplified mnemonics are provided in which the third operand is subtracted from the second operand See Appendix F Simplified Mnemonics in the Programming Environments Manual for examples 3 2 4 1 2 The integer compare instructions algebraically or logically compare the contents of rA with either the UIMM operand the SIMM operand or the contents of rB The comparison is signed for the empi and cmp instructions and unsigned for the cmpli and cmpl instructions Table 3 4 li
379. ked for instruction fetches that hit in the cache For the data cache parity is checked for loads that hit in the cache as well as for any cache line writes to memory replacement copy backs debf dcbst pushes and snoop copy backs Note that data and instruction cache parity generation and checking are always on the ECPE parameter simply enables reporting of parity errors The state of SRR1 reflects the additional machine check conditions of instruction cache parity error bit 10 or data cache parity error bit 11 Cache parity errors are logged as non recoverable Refer to Section 5 5 2 Machine Check Interrupt 0x00200 for more information 4 8 Bus Interface The bus interface buffers receive requests from the instruction and data caches and execute the requests per the CSB protocol They include address register queues prioritization logic and bus control logic The bus interface also captures snoop addresses for snooping in the cache and address register queues snoops for reservations and holds the touch load address for the cache All data storage for the address register buffers load and store data buffers is located in the cache logic The data buffers are considered temporary storage for the cache and not part of the bus interface The general functions and features of the bus interface are as follows e Address register buffers that include Instruction cache load address buffer Data cache load address buffer D
380. ken the e300 core is not guaranteed to take a critical interrupt The interrupt must send a command to the device that asserted cint acknowledging the interrupt and instructing the device to negate cint before the handler re enables recognition of critical interrupts The additional SPRG4 7 registers on the e300 core can reduce overall latency for critical interrupts as fewer GPRs need to be saved upon entering a critical interrupt routine The e300 core also implements the rfci instruction for specifically returning from critical interrupt routines and restoring the processor state from CSRRO and CSRRI1 5 5 11 System Call Interrupt 0x00C00 The e300 core implements the system call interrupt as it is defined by the PowerPC architecture A system call interrupt request is made when a system call se instruction is completed If no higher priority interrupt exists the system call interrupt is taken with SRRO being set to the EA of the instruction following the sc instruction Register settings for this interrupt are described in Chapter 6 Interrupts in the Programming Environments Manual When a system call interrupt is taken instruction execution for the handler begins at offset OxOOCOO from the physical base address indicated by MSR IP 5 5 12 Trace Interrupt 0x00D00 The trace interrupt is taken under one of the following conditions e When MSR SE is set a single step instruction trace interrupt is taken when no higher prio
381. king 4 34 4 36 4 41 FP interrupt modes 0 1 5 14 MSR CE critical interrupt enable bit 5 13 MSR EE 9 5 MSR POW power management enable bit 9 5 settings due to interrupts 5 17 N Nap mode 9 2 9 4 O OEA instructions see Instructions operating environment architecture OEA Operands conventions 3 1 operand placement and performance 3 4 Optional instructions A 35 P Page address translation page address translation flow 6 25 page size 6 19 see also Memory management unit MMU selection of page address translation 6 8 6 12 table search operation 6 25 TLB organization 6 24 Page history status R and C bit recording 6 10 6 20 6 23 Page tables resources for table search operations 6 29 software table search operation 6 29 6 34 e300 Power Architecture Core Family Reference Manual Rev 3 Index 6 Freescale Semiconductor table search for PTE 6 25 Parity checking 8 9 Parity error reporting 4 14 Performance characterizing through performance monitor event counting 1 14 11 1 performance considerations memory 7 24 performance transparent functionality 9 3 see also Execution timing Performance monitor 1 14 Performance monitor APU event counting 11 10 chaining counters 11 11 event types 11 11 11 13 processor context marking 11 10 unconditional counting 11 10 examples of uses 11 11 instructions 11 7 interrupt triggered by events 1 14 5 33 11 1 11 9 registers PMRs 1
382. l Rev 3 Instruction Timing Freescale Semiconductor 33 Instruction Timing Table 7 6 Load and Store Instructions continued Kettel Primary Extended Unit Latency Opcode Opcode in Cycles stfdx 31 727 LSU 2 1 stfdux 31 759 LSU 2 1 Ihbrx 31 790 LSU 2 1 sthbrx 31 918 LSU 2 1 tlbid 31 978 LSU 2 amp icbi 31 982 LSU 3 amp stfiwx 31 983 LSU 2 1 tlbli 31 1010 LSU 3 amp dcbz 31 1014 LSU 10 amp Iwz 32 LSU 2 1 Iwzu 33 LSU 2 1 Ibz 34 LSU 2 1 Ibzu 35 LSU 2 1 stw 36 LSU 2 1 stwu 37 _ LSU 2 1 stb 38 LSU 2 1 stbu 39 LSU 2 1 Ihz 40 LSU 2 1 Ihzu 41 LSU 2 1 Iha 42 LSU 2 1 Ihau 43 LSU 2 1 sth 44 LSU 2 1 sthu 45 LSU 2 1 Imw 46 LSU 2 n amp stmw 47 LSU 1 n amp Ifs 48 LSU 2 1 Heu 49 LSU 2 1 Ifd 50 LSU 2 1 Ifdu 51 LSU 2 1 stfs 52 LSU 2 1 stfsu 53 LSU 2 1 stfd 54 LSU 2 1 e300 Power Architecture Core Family Reference Manual Rev 3 34 Freescale Semiconductor Table 7 6 Load and Store Instructions continued Knemoni Primary Extended Unit Latency Opcode Opcode in Cycles stfdu 55 E LSU 2 1 Note Cycle times marked with amp require a variable number of cycles due to serialization Cycle times marked with a specify hit and miss times for cache management instructions that require conditio
383. l Business Machines Inc e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor XXV For updates to the specification see http www 1 ibm com support docview wss uid pub 1sa14208300 PowerPC Microprocessor Common Hardware Reference Platform A System Architecture by Apple Computer Inc International Business Machines Inc and Motorola Inc Computer Architecture A Quantitative Approach Second Edition John L Hennessy and David A Patterson Computer Organization and Design The Hardware Software Interface Second Edition David A Patterson and John L Hennessy Inside Macintosh PowerPC System Software Addison Wesley Publishing Company One Jacob Way Reading MA 01867 Tel 800 282 2732 U S A 800 637 0029 Canada 716 871 6555 International Related Documentation Freescale documentation is available from the sources listed on the back cover of this manual the document order numbers are included in parentheses for ease in ordering Programming Environments Manual for 32 Bit Implementations of the PowerPC Architecture MPCFPE32B Describes resources defined by the PowerPC architecture User s and reference manuals These books provide details about individual implementations and are intended for use with the Programming Environments Manual Addenda errata to user s or reference manuals Because some processors have follow on devices an addendum is provided that de
384. l an enabled condition or event occurs When an enabled condition or event occurs PMGCO FAC is set It is up to software to clear FAC An enabled condition or event is defined as one of the following e When the msb 1 in PMCx and PMLCax CE 1 e When the time base bit specified by TBSEL 1 and TBEE 1 3 17 Reserved should be cleared e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Performance Monitor Table 11 3 PMGCO Field Descriptions continued Bits Name Description 19 20 TBSEL Time base selector Selects the time base bit that can cause a time base transition event the event occurs when the selected bit changes from 0 to 1 00 TB 63 TBL 31 01 TB 55 TBL 23 10 TB 51 TBL 19 11 TB 47 TBL 15 Time base transition events can be used to periodically collect information about processor activity In multiprocessor systems in which TB registers are synchronized across processors these events can be used to correlate performance monitor data obtained by the several processors For this use software must specify the same TBSEL value for all processors in the system Time base frequency is implementation dependent so software should invoke a system service program to obtain the frequency before choosing a TBSEL value 20 21 Reserved should be cleared 23 TBEE Time base transition event exception enable 0 Exceptions from t
385. l the completion logic commits the value to a GPR or FPR Stores cannot be executed in a predicted manner and are held in the store queue until the completion logic signals that the store operation is to be completed to memory The core executes store instructions with a maximum throughput of one per cycle and with a three cycle total latency The time required to perform the actual load or store depends on whether the operation involves the cache system memory or an I O device 1 1 3 4 System Register Unit SRU The SRU executes various system level instructions including condition register logical operations and move to from special purpose register instructions It also executes integer add compare instructions In order to maintain system state most instructions executed by the SRU are completion serialized that is the instruction is held for execution in the SRU until all prior instructions issued have completed Results from completion serialized instructions executed by the SRU are not available or forwarded for subsequent instructions until they complete 1 1 4 Completion Unit The completion unit tracks instructions in program order from dispatch through execution and then completes Completing an instruction commits the core to any architectural register changes caused by that instruction In order completion ensures the correct architectural state when the core must recover from a mispredicted branch or an interrupt Instruction stat
386. lable in MESI protocol mode The instruction cache always operates in a reduced two state mode using the states valid V or invalid I Cache blocks in the instruction cache are never modified The four MESI states are defined in Table 4 2 Table 4 2 MEI MESI State Definitions MESI State Definition Modified M The addressed cache block is valid only in the cache The cache block is modified with respect to system memory that is the modified data in the cache block has not been written back to memory Exclusive E The addressed block is in this cache only The data in this cache block is consistent with system memory Shared S The addressed cache block is valid in this cache and the data in the block is consistent with system memory Unlike the exclusive state however the block may also simultaneously be in the shared state in another cache in the system The data in the block may be read at any time by this processor however before it may be written with any newer data by this processor it must first be removed de allocated from all other caches in the system Note This state is only available when HID2 MESI is set Invalid I This state indicates that the addressed cache block is not resident in the cache 4 4 2 1 MEI Coherency Protocol The default cache coherency protocol is a coherent subset of the standard MESI four state cache protocol that omits the shared state Since data cannot be share
387. le on the current clock cycle if only one IQ entry was vacant on the previous cycle only one instruction is fetched Typically instructions are fetched from the on chip instruction cache If the instruction request hits in the on chip instruction cache it can usually present the first two instructions of the new instruction stream in the next clock cycle giving enough time for the next pair of instructions to be fetched from the cache with no idle cycles Instructions not in the instruction cache are fetched from system memory e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Instruction Timing Branch instructions that do not update the LR or CTR are removed from the instruction stream either by branch folding or removal of fall through branch instructions as described in Section 7 4 1 1 Branch Folding Branch instructions that update the LR or CTR are treated as if they require dispatch even through they are not dispatched to an execution unit in the process They are assigned a position in the CQ to ensure that the CTR and LR are updated sequentially All other instructions are dispatched from IQO and IQ1 The dispatch rate depends on the availability of resources such as the execution units rename registers and CQ entries and on the serializing behavior of some instructions Instructions are dispatched in program order an instruction in IQ1 can be dispatched at the same time as one in IQ
388. led otherwise the e300 core attempts to enter an internal checkstop Note that the resulting machine check interrupt has priority over any interrupts caused by the instruction that generated the bus operation Machine check interrupts are only enabled when MSR ME 1 this is described in Section 5 5 2 1 Machine Check Interrupt Enabled MSR ME 1 If MSR ME 0 and a machine check occurs the e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 21 Interrupts and Exceptions processor enters the checkstop state this is described in Section 5 5 2 2 Checkstop State MSR ME 0 5 5 2 1 Machine Check Interrupt Enabled MSR ME 1 When a machine check interrupt is taken registers are updated as shown in Table 5 14 When a machine check interrupt is taken instruction execution for the handler begins at offset 0x00200 from the physical base address indicated by MSR IP In order to return to the main program the interrupt handler should do the following 1 SRRO and SRR1 should be given the values to be used by the rfi instruction 2 Execute rfi Table 5 14 Machine Check Interrupt Register Settings Register Setting Description SRRO Set to the address of the next instruction that would have been completed in the interrupted instruction stream Neither this instruction nor any others beyond it will have been completed All preceding instructions will have been completed
389. led Spec Cycles the IQ is not empty but 0 instructions decoded Com 19 Cycles issue stalled Spec Cycles the issue buffer is not empty but 0 instructions issued Com 31 Cache inhibited accesses Spec Cache inhibited accesses translated translated Com 61 Number of instruction fetches that Spec Counts fetches that write at least one instruction to the IQ With hit instruction fetched com 4 can used to compute instructions per fetch Instruction MMU and Data MMU Events Com 62 MMU inside miss Spec Counts instruction TLB miss exceptions BIU Interface Usage Com 67 BIU master requests Spec Master transaction starts assertions of ts Com 68 BIU master instruction side Spec Master instruction side assertions of ts requests Com 69 BIU master data side requests Spec Master data side assertions of ts e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Performance Monitor Table 11 9 Performance Monitor Event Selection continued Number Event ue Count Description Com 71 BIU master retries Spec Transactions initiated by this processor that were retried on the BIU interface The core is master and another device retries the core transaction Snoop Com 74 Snoop pushes N A Snoop pushes from all data side resources Counts snoop ARTRYs and WOPs Chaining Events Com 82 PMCO overflow N A PMCO 32 transitions from 1 t
390. lely to enable system and software implementers to use Freescale Semiconductor products There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document Freescale Semiconductor reserves the right to make changes without further notice to any products herein Freescale Semiconductor makes no warranty representation or guarantee regarding the suitability of its products for any particular purpose nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit and specifically disclaims any and all liability including without limitation consequential or incidental damages Typical parameters which may be provided in Freescale Semiconductor data sheets and or specifications can and do vary in different applications and actual performance may vary over time All operating parameters including Typicals must be validated for each customer application by customer s technical experts Freescale Semiconductor does not convey any license under its patent rights nor the rights of others Freescale Semiconductor products are not designed intended or authorized for use as components in systems intended for surgical implant into the body or other applications intended to support or sustain life or for any other application in which the failure of the Freescale Semiconductor product
391. letes and no subsequent instructions appear to be initiated until the syne instruction completes For an example showing the use of a sync instruction see Chapter 2 Register Set of the Programming Environments Manual e The isync instruction which waits for all previous instructions to complete and then discards any fetched instructions causing subsequent instructions to be fetched or refetched from memory and to execute in the context privilege translation protection etc established by the previous instructions e The stwex instruction to clear any outstanding reservations which ensures that an lwarx instruction in the old process is not paired with an stwex instruction in the new process The operating system should set the MSR RI bit as described in Section 5 2 4 Setting MSR RI 5 4 Interrupt Latencies Latencies for taking various interrupts depend on the state of the machine when the exception conditions occur This latency may be as short as one cycle in which case an interrupt is signaled in the cycle following the appearance of the exception condition The latencies are as follows e300 Power Architecture Core Family Reference Manual Rev 3 16 Freescale Semiconductor Interrupts and Exceptions e Hard reset and machine check In most cases a hard reset or machine check interrupt will have a single cycle latency A two to three cycle delay may occur only when a predicted instruction is next to c
392. lizations and inhibits subsequent instruction dispatching as required For a more detailed overview of instruction dispatch see Section 1 3 6 Instruction Timing 1 1 2 2 Branch Processing Unit BPU The BPU receives branch instructions from the fetch unit and performs CR lookahead operations on conditional branches to resolve them early achieving the effect of a zero cycle branch in many cases The BPU uses a bit in the instruction encoding to predict the direction of the conditional branch Therefore when an unresolved conditional branch instruction is encountered the core fetches instructions from the predicted target stream until the conditional branch is resolved The BPU contains an adder to compute branch target addresses and three user control registers the link register LR the count register CTR and the conditional register CR The BPU calculates the return pointer for sub routine calls and saves it into the LR for certain types of branch instructions The LR also contains the branch target address for the Branch Conditional to Link Register belrx instruction The CTR contains the branch target address for the Branch Conditional to Count Register bectrx instruction The contents of the LR and CTR can be copied to or from any GPR Because the BPU uses dedicated registers rather than GPRs or FPRs execution of branch instructions is largely independent from execution of integer and floating point instructions 1 1 3 I
393. ly Reference Manual Rev 3 40 Freescale Semiconductor Chapter 4 Instruction and Data Cache Operation This chapter describes the organization of the cache cache coherency protocols cache control instructions and various cache operations It describes the interaction between the caches the load store unit the instruction unit and the memory subsystem It also describes the cache way locking features provided in the core Note that in this chapter the term multiprocessor is used in the context of maintaining cache coherency These multiprocessor devices could be actual processors and other devices that can access system memory maintain their own caches and function as bus masters requiring cache coherency 4 1 Introduction The core provides two independent L1 instruction and data caches to allow the registers and execution units rapid access to instructions and data 4 1 1 Instruction and Data Cache Features The cache implementation has the following characteristics e Harvard architecture separate instruction and data caches e 32 Kbyte instruction and data caches on e300c1 16 Kbyte caches on e300c2 and e300c3 e Eight way set associativity on e300c1 and four way set associative on e300c2 and e300c3 e Physically addressed cache directories The physical real address tag is stored in the cache directory e 32 byte cache blocks A cache block is the block of memory that a coherency state describes also referred to
394. ly improves integer instruction throughput The complete complete write back pipeline stage maintains the correct architectural machine state and commits it to the architectural registers at the proper time If the completion logic detects an instruction containing an interrupt status all following instructions are canceled their execution results in rename registers are discarded and the correct instruction stream is fetched The complete stage ends when the instruction is retired Two instructions can be retired per cycle Instructions are retired only from the two lowest CQ entries CQO and CO e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Timing The notation conventions used in the instruction timing examples are as follows Fetch The fetch stage includes the time between when an instruction is requested and when it is brought into the instruction queue This latency can vary greatly depending on whether the instruction is in the on chip cache or system memory in which case latency can be affected by bus speed and traffic on the system bus and address translation dispatches Therefore in the examples in this chapter the fetch stage is usually idealized that is an instruction is usually shown to be in the fetch stage when it is a valid instruction in the instruction queue The instruction queue has six entries IQO IQ5 In dispatch entry IQ0 IQ1 Instructions can be disp
395. ly set during or after the reset routine has completed a subsequent soft reset causes the system reset interrupt handler to be entered in true little endian mode potentially resulting in illegal instruction execution if the beginning of the handler is written assuming big endian code Note that the reverse occurs for true little endian mode The following assembly language code highlights register settings necessary when in big endian mode coming out of hard reset and subsequently changing the processor state to true little endian mode and setting the MSR ILE MSR LE and HID2 LET bits The first eight instructions of the system reset interrupt handler is written in big endian format in order to facilitate the mode switch The rest of the reset handler is written in true little endian format for the remaining supervisor or OS code This reset code assumes that caching is not enabled out of reset Due to the complexities involved with keeping the memory system coherent it is strongly recommended not to change endinaness at any other time once it is determined at hard reset orig OxFFFO 0100 default IP vector Begin HRESET_ handler with Big Endian Mode xor see E initialize register xor pees is EAR initialize register oris r2 xr2 0x0800 set bit in r2 for HID2 4 LET mtspr HID2 r2 load HID2 setting LET bit oris r1 rl1 0x0001 set bit in rl for MSR 15 ILE ori r1 r1 0x0001 set bit in rl for MSR 31 LE
396. management instructions is implementation dependent system software should incorporate uses of the instructions into subroutines to maximize compatibility with programs written for other processors For more information on the PowerPC instruction set refer to Chapter 4 Addressing Modes and Instruction Set Summary and Chapter 8 Instruction Set in the Programming Environments Manual 3 2 7 Recommended Simplified Mnemonics To simplify assembly language programs a set of simplified mnemonics is provided for some of the most frequently used operations such as no op load immediate load address move register and complement register PowerPC compliant assemblers provide the simplified mnemonics listed in Recommended e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Instruction Set Model Simplified Mnemonics in Appendix F Simplified Mnemonics in the Programming Environments Manual and listed with some of the instruction descriptions in this chapter Programs written to be portable across the various assemblers for the PowerPC architecture should not assume the existence of mnemonics not described in this document For a complete list of simplified mnemonics see Appendix F Simplified Mnemonics in the Programming Environments Manual 3 2 8 Implementation Specific Instructions This section provides a detailed look at the e300 core implementation specific instru
397. mance monitor register as listed in Table 11 1 and Table 11 2 The contents of GPR rS are placed into the designated performance monitor register When MSR PR 1 specifying a performance monitor register that is not implemented and is not privileged PMRN 5 0 results in an illegal instruction exception type program interrupt When MSR PR 1 specifying a performance monitor register that is privileged PMRN 5 1 results in a privileged instruction execution type program interrupt When MSR PR 0 specifying an unimplemented performance monitor register is boundedly undefined Other registers altered none 11 3 Performance Monitor Interrupt The performance monitor interrupt is triggered by an enabled condition or event The only performance monitor enabled condition or event defined for the e300c3 is the following e A PMCn overflow condition occurs when both of the following are true The counter s overflow condition is enabled PMLCan CE is set The counter indicates an overflow PMCn OV is set If PMGCO PMIE is set an enabled condition or event triggers the signaling of a performance monitor exception If PMGCO FCECE is set an enabled condition or event also triggers all performance monitor counters to freeze Although the performance monitor exception condition could occur with MSR EE cleared the interrupt cannot be taken until MSR EE is set If PMCn overflows and would signal an exception PMLCan CE a
398. me base and decrementer are disabled while the core is in sleep mode the time base contents must be updated from an external time base following sleep mode if accurate time of day maintenance is required Before entering sleep mode the core asserts greq to indicate that it is ready to disable bus snooping When the system has ensured that snooping is no longer necessary the system logic allows the core to enter sleep mode by asserting gack for the duration of the sleep mode period e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Power Management Sleep mode is characterized by the following features e All functional units disabled including bus snooping and time base e All nonessential input receivers disabled e Internal clock regenerators disabled e PLL and sysclk can be disabled To enter sleep mode the following conditions must occur e Set sleep bit HIDO 10 1 MSR POW is set e 300 core asserts greq e System logic asserts gack e e300 core enters sleep mode after several processor clocks To return to full power mode when sysclk and PLL are not disabled the following conditions must occur e Assert int smi or mcp internal signals e Hard reset or soft reset To return to full power mode after PLL and sysclk are disabled in sleep mode the following conditions must occur e Enable sysclk e Reconfigure PLL into desired processor clock mode e System logic waits for PLL startup and relo
399. ment architecture VEA and operating environment architecture OEA Secondly this chapter describes the core implementation specific registers Full descriptions of the basic register set defined by the PowerPC architecture are provided in Chapter 2 Register Set in the Programming Environments Manual The PowerPC architecture defines register to register operations for all computational instructions Source data for these instructions is accessed from the on chip registers or is provided as an immediate value embedded in the opcode The three register instruction format allows specification of a target register distinct from the two source registers thus preserving the original data for use by other instructions and reducing the number of instructions required for certain operations Data is transferred between memory and registers with explicit load and store instructions only Note that there may be registers common to other processors of this family that are not implemented in the e300 core When the core detects special purpose register SPR encodings other than those defined in this document it either takes an interrupt or it treats the instruction as a no op Conversely some SPRs in the e300 core may not be implemented in other processors or may not be implemented in the same way 2 1 PowerPC Register Set The UISA registers shown in Figure 2 1 can be accessed by either user or supervisor level instructions the architecture spe
400. mentg ee eeeeeeee 7 26 Branch Resolution Resource Requirements cceesceceseeceeceeceeeeeeeeeeecsteeeesaeees 7 26 Dispatch Unit Resource Reogurements 7 27 Completion Unit Resource Requirements cccessceceeseeceeeeceeeeeceeeeecsteeeeeaeees 7 27 Instruction Latency ZONEN des 7 28 Chapter 8 Core Interface Operation Signal Groupings eege EES EE es 8 1 Punct mal Groupings tren erte eet 8 2 Signal HENSE 8 2 PLL Configuration pll_cfg 0 6 Input eee eee cseeceseeeseeesseecnaeeeseesseeeenees 8 4 Overview of Core Intertace Accesses Sage ie i ee ee 8 7 Interrupt Checkstop and Reset Signals ceeccecssccecececssececeseeecseceecseeeecsseeeesteeeenaeees 8 8 FES Ee ER 8 8 Cheeketbomg esst zersbuue re genee tadveudadugatenseazacaannasuadesy soaedessnecaunsbeyaunaavesneeesenonaates 8 9 MS SEU WITLI EE 8 9 Core Quiesce Control Stomalss EE 8 9 IEEE 1149 1 Compliant Int rface eeneg Eed 8 9 IEEE 1149 1 Interface K eeeetteEt eigene end saasdennadeganaasescdaeed seat eet Goeres 8 9 Chapter 9 Power Management RE 9 1 Dynamic Power Management 5 i sscscascsassdensesenndcadasaedaaeadagededesuascdescesdantaausbaaceatbebenssdagedenes 9 1 Programmable Power MOd g eier SC enee EE EAR 9 1 Power Management Modes ijsccccsssisassiceadeasyasaddesscceesensavaatisousndeassnccauatedeaceesececedessensates 9 3 Full Power Mode with DPM Disabled c ccsssssessrsesescsesssssesssncesnstesenencssneeceses 9 3 Full Power Mode w
401. miconductor 15 Register Model Table 2 7 HID1 Bit Settings continued Bits Name Description 6 PC6 PLL configuration bit 6 read only 7 31 Reserved should be cleared Note The clock configuration bits reflect the state of the pll_cfg 0 6 signals HID1 can be accessed with mfspr using SPR1009 2 2 3 Hardware Implementation Register 2 HID2 The core implements an additional hardware implementation dependent HID2 register shown in Figure 2 7 which enables cache way locking the HID2 also enables true little endian mode and the new additional BAT registers It is a supervisor only read write implementation specific special purpose register SPR which is accessed as SPR1011 decimal The HID2 bits are shown in Table 2 12 SPR 1011 Access Supervisor read write o 3 4 5 6 7 8 9 10 11 12 13 14 15 a LET IFEB MESI IFEC EBQS EBPX HBE Reset All zeros 16 18 19 20 23 24 26 27 31 Reset All zeros Figure 2 7 HID2 Register Table 2 8 describes the HID2 fields Table 2 8 e300 HID2 Field Descriptions Name Description Reserved should be cleared LET True little endian This bit enables true little endian mode operation for instruction and data accesses This bit is set to reflect the state of the tle signal at the negation of hreset This bit is used in conjunction with MSR LE to determine the endian mode of
402. ming Environments Manual Floating point operations that change exception sticky bits in the FPSCR may suffer a performance penalty When an exception is disabled in the FPSCR and MSR FE 0 updates to the FPSCR exception sticky bits are serialized at the completion stage This serialization may result in a one or two cycle execution delay The penalty is incurred only when the exception bit is changed and not on subsequent operations with the same exception See Chapter 7 Instruction Timing for a full description of completion serialization When an exception is enabled in the FPSCR the instruction traps to the emulation trap interrupt vector without updating the FPSCR or the target FPR The emulation trap interrupt handler is required to complete the instruction The emulation trap interrupt handler is invoked regardless of the FE setting in the MSR The two IEEE floating point imprecise modes defined by the PowerPC architecture when MSR FEO MSR FE1 are treated as precise that is MSR FEO MSR FE1 1 This is regardless of the setting of MSR NI For the highest and most predictable floating point performance all exceptions should be disabled in the FPSCR and MSR For more information about the program exception see the Programming Environments Manual 5 5 7 2 Illegal Reserved and Unimplemented Instructions Program Interrupts In accordance with the PowerPC architecture the e300 core considers all instructions defin
403. mory address space Bus snooping is used to ensure the coherency of global memory with respect to the data cache On a cache miss cache blocks are loaded in four beats of 64 bits each when the core is configured for a 64 bit data bus when the core is configured for a 32 bit bus cache block loads are performed as eight beats of 32 bits each Regardless of the bus size the burst load is performed as a critical double word first operation For data cache loads the data cache is blocked to internal accesses until the load completes The critical double word is written to the cache and forwarded to the requesting unit minimizing stalls due to load delays Note that the cache line being filled cannot be accessed internally until the fill completes The instruction cache allows sequential fetching during a cache block load the instruction cache is blocked only until the cache line load completes Both caches are tightly coupled to the core bus interface unit BIU to allow efficient access to the system memory controller and other bus masters The core load store unit LSU is also directly coupled to the data cache to allow the efficient movement of data to and from the general purpose and floating point registers The core bus interface unit receives requests for bus operations from the instruction and data caches and executes the operations according to the coherent system bus CSB protocol The BIU provides transaction queuing prioritization an
404. most cases the translation is ina TLB and the physical address bits are available to the on chip cache The e300 core implements four more IBAT and four more DBAT entries than the G2 When the EA misses in the TLBs the core provides hardware assistance for software to perform a search of the translation tables in memory The hardware assist consists of the following features e Automatic storage of the missed effective address in IMISS and DMISS e Automatic generation of the primary and secondary hashed real addresses of the page table entry group PTEG which are readable from the HASH and HASH2 register locations The HASH data is generated from the contents of the IMISS or DMISS register The register that is selected depends on the miss instruction or data that was last acknowledged e Automatic generation of the first word of the page table entry PTE of the tables being searched e Areal page address RPA register that matches the format of the lower word of the PTE e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Overview e TLB access instructions tlbli and tlbld that are used to load an address translation into the instruction or data TLBs e Shadow registers for GPRO GPR3 that allow miss code to execute without corrupting the state of any of the existing GPRs Shadow registers are used only for servicing a TLB miss See Section 1 3 5 2 Implementation Specific Memory Managem
405. mplete 3 Instructions after the mispredicted branch are purged 4 Dispatching resumes from the correct path After an execution unit executes an instruction it places resulting data into the appropriate GPR or FPR rename register The results are then stored into the correct GPR or FPR during the write back stage If a subsequent instruction needs the result as a source operand it is made available simultaneously to the appropriate execution unit which allows a data dependent instruction to be decoded and dispatched without waiting to read the data from the register file Branch instructions that update either the LR or CTR write back their results in a similar fashion The following section describes this process in greater detail 7 3 1 General Instruction Flow As many as two instructions can be fetched into the instruction queue IQ in a single clock cycle Instructions enter the IQ and are dispatched to the various execution units from the dispatch queue The IQ is a six entry queue which together with the CQ is the backbone of the master pipeline for the microprocessor The core tries to keep the IQ full at all times The number of instructions requested in a clock cycle is determined by the number of vacant spaces in the IQ during the previous clock cycle This is shown in the examples in this chapter Although the IQ can accept as many as two new instructions in a single clock cycle and even if there are more than two spaces availab
406. mplete and execute stages in the same clock cycle e Stall An occurrence when an instruction cannot proceed to the next stage e Store queue Holds store operations that have not been committed to memory resulting from completed or retired instructions e Superscalar A superscalar processor is one that can dispatch multiple instructions concurrently from a conventional linear instruction stream In a superscalar implementation multiple instructions can be in the same stage at the same time e Throughput A measure of the number of instructions that are processed per cycle For example a series of double precision floating point multiply instructions has a throughput of one instruction per clock cycle e Write back Write back in the context of instruction handling occurs when a result is written from the rename registers into the architectural registers typically the GPRs and FPRs or the store queue e300 Power Architecture Core Family Reference Manual Rev 3 2 Freescale Semiconductor Instruction Timing 7 2 Instruction Timing Overview The e300 core design minimizes average instruction execution latency the number of clock cycles it takes to fetch decode dispatch and execute instructions and make the results available for a subsequent instruction Some instructions such as loads and stores access memory and require additional clock cycles between the execute phase and the write back phase These latencies vary depen
407. msr 31 083 SRU 1 mtmsr 31 146 SRU 2 mtsr 31 210 SRU 2 mtsrin 31 242 SRU 2 mfspr not I DBATs 31 339 SRU 1 mfspr DBATs 31 339 SRU 3 amp mfspr IBATs 31 339 SRU 3 amp mtspr not IBATs 31 467 SRU 2 XER amp mtspr IBATs 31 467 SRU 2 amp mfsr 31 595 SRU 3 amp sync 31 598 SRU 1 amp mfsrin 31 659 SRU 3 amp eieio 31 854 SRU 1 mftb 31 371 SRU 1 e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Table 7 2 System Register Instructions continued Instruction Timing Mnemoni Primary Extended Unit Latency Opcode Opcode in Cycles mttb 31 467 SRU 1 Note Cycle times marked with amp require a variable number of cycles due to serialization Table 7 3 provides the latencies for the condition register logical instructions Table 7 3 Condition Register Logical Instructions Mnemonic Primary Extended Unit Latency Opcode Opcode in Cycles merf 19 000 SRU 1 crnor 19 033 SRU 1 crandc 19 129 SRU 1 crxor 19 193 SRU 1 crnand 19 225 SRU 1 crand 19 257 SRU 1 creqv 19 289 SRU 1 crore 19 417 SRU 1 cror 19 449 SRU 1 mfcr 31 019 SRU 1 mtcerf 31 144 SRU 1 merxr 31 512 SRU 1 amp Note Cycle times marked with amp require a variable number of cycles due to serialization Table 7 4 provides the latencies for the integer instructions Table 7 4 Integer Instructions
408. n OV Overflow 0 Counter has not reached an overflow state 1 Counter has reached an overflow state 1 31 Counter Value Indicates the number of occurrences of the specified event The minimum counter value is 0x0000_0000 4 294 967 295 OxFFFF_FFFF is the maximum A counter can increment by 0 1 or 2 up to the maximum value and then wraps to the minimum value A counter enters overflow state when the high order bit is set by entering the overflow state at the halfway point between the minimum and maximum values A performance monitor interrupt handler can easily identify overflowed counters even if the interrupt is masked for many cycles during which the counters may continue incrementing A high order bit is set normally only when the counter increments from a value below 2 147 483 648 Ox8000_0000 to a value greater than or equal to 2 147 483 648 Ox8000_0000 NOTE Initializing PMCs to overflowed values is strongly discouraged If an overflowed value is loaded into a PMCx that held a non overflowed value and PMGCO PMIE PMLCan CE and MSR EE are set an interrupt is generated before any events are counted The response to an overflow depends on the configuration as follows e IfPMLCan CE is clear no special actions occur on overflow the counter continues incrementing and no exception is signaled e If PMLCan CE and PMGCO FCECE are set all counters are frozen when PMCn overflows e I
409. n Each cache block contains eight contiguous words from memory that are loaded from an 8 word boundary that is bits A 27 31 of the effective addresses are zero thus a cache block never crosses a page boundary Misaligned accesses across a page boundary can incur a performance penalty The e300 core cache blocks are loaded in four beats of 64 bits each on the 64 bit data bus The burst load is performed as critical double word first The data cache is blocked to internal accesses until the load e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Overview completes the instruction cache allows sequential fetching during a cache block load In the core the critical double word is simultaneously written to the cache and forwarded to the requesting unit thus minimizing stalls due to load delays To ensure coherency among caches in a multiprocessor or multiple caching device implementation the core implements the MEI protocol during normal operation of the data cache The new data cache MESI extension supports the additional fourth cache coherency shared state for the data cache To support this feature the shared signal shd has been added to the bus interface The following four states indicate the state of the cache block e Modified The cache block is modified with respect to system memory that is data for this address is valid only in the cache and not in system memory e Exclusive
410. n The number coded is split into two 5 bit halves that are reversed in the instruction encoding with the high order 5 bits appearing in bits 16 20 of the instruction encoding and the low order 5 bits in bits 11 15 If the SPR field contains any value other than one of the values shown in Table 3 32 either the program interrupt handler is invoked or the results are boundedly undefined Table 3 32 Implementation Specific SPR Encodings mfspr SPR Register Name Access Decimal spr 5 9 spr 0 4 1 00000 00001 XER User 8 00000 01000 LR User 9 00000 01001 CTR User 18 00000 10010 DSISR Supervisor 19 00000 10011 DAR Supervisor 22 00000 10110 DEC Supervisor 25 00000 11001 SDR1 Supervisor 26 00000 11010 SRRO Supervisor 27 00000 11011 SRR1 Supervisor 58 00001 11010 CSRRO Supervisor 59 00001 11011 CSRR1 2 Supervisor e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Instruction Set Model Table 3 32 Implementation Specific SPR Encodings mfspr continued SPR Register Name Access Decimal spr 5 9 spr 0 4 272 01000 10000 SPRGO Supervisor 273 01000 10001 SPRG1 Supervisor 274 01000 10010 SPRG2 Supervisor 275 01000 10011 SPRG3 Supervisor 276 01000 10100 SPRG4 2 Supervisor 277 01000 10101 SPRGS5 Supervisor 278 01000
411. n be issued e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Overview back to back The 32 FPRs are provided to support floating point operations Stalls due to contention for FPRs are minimized by the automatic allocation of rename registers The core writes the contents of the rename registers to the appropriate FPR when floating point instructions are retired by the completion unit The e300c2 does not include an FPU and does not support floating point operations The e300c1 and e300c3 core support all floating point data types based on the IEEE 754 standard normalized denormalized NaN zero and infinity in hardware eliminating the latency incurred by software interrupt routines 1 1 3 3 Load Store Unit LSU The LSU executes all load and store instructions and provides the data transfer interface between the GPRs FPRs and the cache memory subsystem The LSU calculates effective addresses performs data alignment and provides sequencing for load store string and multiple instructions Load and store instructions are issued and executed in program order however the memory accesses can occur out of order Synchronizing instructions are provided to enforce strict ordering Cacheable loads when free of data bus dependencies can execute out of order with a maximum throughput of one per cycle and with a two cycle total latency Data returned from the cache is held in a rename register unti
412. n the rehashed secondary HTEG or in the range of a DBAT register otherwise cleared 2 3 Cleared 4 Set if a memory access is not permitted by the page or BAT protection mechanism otherwise cleared 5 Cleared 6 Set for a store operation and cleared for a load operation 9 Set if a data address breakpoint interrupt occurs when the data bit 29 in the DABR or DABR2 matches the next data access load or store instruction to complete in the completion unit The different breakpoints are enabled as follows e Write breakpoints enabled when DABR 30 is set e Read breakpoints enabled when DABR 31 is set 7 31 Cleared DAR Set to the effective address of a memory element as described in the following list e A byte in the first word accessed in the page that caused the DSI interrupt for a byte half word or word memory access e A byte in the first word accessed in the BAT area that caused the DSI interrupt for a byte half word or word access to a BAT area e A byte in the block that caused the interrupt for icbi dcbz dcbst dcbf or debi instructions e The EA that causes a data breakpoint DSI interrupts can occur for any of the following reasons e The instruction is not supported for the type of memory addressed e The attempted access violates the memory protection defined by SR Ks Kp PTE PP or DBATn PP e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Interrupts
413. n time The e300 core also offers an optional pipeline extension to one and a half level pipelining which means that a new transaction can complete an address tenure when the previous transaction has been granted to the data bus for the G2_LE core a new transaction must wait until the previous data tenure has completed before completing its address tenure Accesses are prioritized with load operations preceding store operations Instructions are automatically fetched from the memory system into the instruction unit where they are dispatched to the execution units or forwarded to the branch processing unit at a peak rate of three instructions per clock Conversely load and store instructions explicitly specify the movement of operands to and from the general purpose and floating point registers GPRs and FPRs and the memory system Note that the e300c2 core does not support floating point registers When the e300 core encounters an instruction or data access it calculates the logical address effective address and uses the low order address bits to check for a hit in the on chip instruction or data caches During cache lookup the instruction and data memory management units MMUs use the higher order address bits to calculate the virtual address allowing them to calculate the physical address real address The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred If the access misses in
414. n units including an integer unit IU a floating point unit FPU a branch processing unit BPU a load store unit LSU and a system register unit SRU The e300c2 and e300c3 integrate an additional integer unit for a total of two IUs Note that the e300c2 does not include an FPU The ability to execute instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency and throughput for e300 core based systems Most integer instructions execute in one clock cycle The additional Us along with enhanced multipliers in the e300c2 and e300c3 improve multiply instructions to a maximum two cycle latency a significant improvement from previous processors In the e300c1 and e300c3 core the FPU is pipelined so a single precision multiply add instruction can be issued and completed every clock cycle The e300c1 and e300c3 core provides hardware support for all single and double precision floating point operations for most value representations and all rounding modes e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Overview Figure 1 1 shows a block diagram of the e300c1 core R Sequential Fetcher 64 Bit Instruction Queue Ld Ld Ld Ld LT System Register Unit Dispatch Unit J Integer GPR GP Rename Unit Registers Completion Unit Power Time Base Dissipation Counter Control Decrementer 32 Kbyte JTAG Debug Clock D Cache I
415. naffected by instructions in the execution pipeline Instructions issued beyond a predicted branch cannot complete execution until the branch is resolved preserving the programming model of sequential execution If any of these are branch instructions they are decoded but not issued Instructions to be executed by the FPU IU LSU and SRU are issued and allowed to progress up to the register write back stage Write back is allowed when a correctly predicted branch is resolved and execution continues along the predicted path If branch prediction is incorrect the instruction unit flushes all predicted path instructions and instructions are issued from the correct path 1 1 2 1 Instruction Queue and Dispatch Unit The instruction queue IQ shown in Figure 1 1 Figure 1 2 and Figure 1 3 holds as many as six instructions and loads up to two instructions from the instruction unit during a single cycle The instruction fetch unit continuously loads as many instructions as space in the IQ allows Instructions are dispatched to e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Overview their respective execution units from the dispatch unit at a maximum rate of two instructions per cycle Dispatching is facilitated to the Us FPU LSU and SRU by the provision of a reservation station at each unit The dispatch unit performs source and destination register dependency checking determines dispatch seria
416. nal bus activity Cycle times marked with a specify cycles of total latency and throughput Load and store multiple and string instruction cycles are shown as a fixed number of cycles plus a variable number of cycles where n is the number of words accessed by the instruction e300 Power Architecture Core Family Reference Manual Rev 3 Instruction Timing Freescale Semiconductor 35 Instruction Timing e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Chapter 8 Core Interface Operation This chapter provides a general description of the coherent system bus CSB which is the interface between the core and the integrating device Because most of the behavior of the CSB is not directly programmable or even visible to the user this chapter does not attempt to describe all aspects of the CSB or even the most important CSB signals Instead it describes mostly those aspects of the CSB that are configurable or that provide status information through the programming interface It provides a glossary of those signals that are referenced in other chapters to offer a clearer understanding of how the core is integrated as part of a larger device 8 1 NOTE A bar over a signal name indicates that the signal is active low for example hreset hardware reset and int external interrupt Active low signals are referred to as asserted active when they are low and negated when they are hig
417. nch Conditional to Count Register bectr bectrl BO BI Branch Conditional to Link Register belr bell BO BI 3 2 4 4 3 Condition Register Logical Instructions Condition register logical instructions shown in Table 3 22 and the Move Condition Register Field merf instruction are also defined as flow control instructions although they are executed by the system register unit SRU Most instructions executed by the SRU are completion serialized to maintain system state that is the instruction is held for execution in the SRU until all prior instructions issued have completed e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Instruction Set Model Table 3 22 Condition Register Logical Instructions Name Mnemonic Operand Syntax Condition Register AND crand crbD crbA crbB Condition Register AND with Complement crandc crbD crbA crbB Condition Register Equivalent creqv crbD crbA crbB Condition Register NAND crnand crbD crbA crbB Condition Register NOR crnor crbD crbA crbB Condition Register OR cror crbD crbA crbB Condition Register OR with Complement crorc crbD crbA crbB Condition Register XOR crxor crbD crbA crbB Move Condition Register Field merf crfD crfS Note that if the LR update option is enabled for any of these instructions these forms of the instructions are invalid in the e300 core 3 2 4 5 Trap Instructions The trap instructions
418. nctions primary secondary PTEG 6 28 6 29 instruction accesses IMMU 6 1 instructions 6 16 interrupts 6 14 memory protection 6 9 overview 1 5 1 11 1 29 page address translation 6 8 6 11 6 25 page history status 6 10 6 20 6 23 page table search operation 6 25 software table searches 6 29 6 34 6 35 physical address generation 6 1 registers 6 16 6 29 6 33 data TLB compare register DCMP 2 18 6 32 6 34 data TLB miss address reg DMISS 2 18 6 31 6 34 instruction TLB compare register ICMP 2 18 6 32 6 34 instruction TLB miss address reg IMISS 2 18 6 31 6 34 primary secondary hash addr HASH1 HASH2 6 32 temporary GPRs enabling with MSR TGPR 5 12 segmented memory model 6 19 TLBs translation lookaside buffers 5 5 5 33 5 34 description 6 23 TLB management instructions 3 33 A 24 TLB invalidate bie instruction 6 25 6 44 A 24 TLB miss interrupts data TLB miss on load 5 5 5 33 data TLB miss on store 5 5 5 34 instruction TLB miss 5 5 5 33 Memory model access ordering 4 14 addressing 3 7 guarded bit G bit 4 7 memory coherency required M bit 4 7 gbl internal signal 8 2 timing considerations 7 25 performance considerations 7 24 W I and M bit combinations 4 8 write through mode W bit 6 16 timing considerations 7 25 write through stores 4 6 wt internal signal 8 2 Mnemonics simplified recommended 3 33 MSR machine state register 1 19 2 7 2 11 5 12 cache loc
419. nd branch if CTR 0 Now flush the cache with dcbf instructions li r6 0x0 Address of first block mtctr r8 Number of blocks loop2 deht CU r6 addi Ee Lo 32 Find the next block bdnz loop2 Decrement the counter and branch if CTR 0 If the content of the data cache does not need to be flushed to memory the cache can be directly invalidated The entire data cache is invalidated through the data cache flash invalidate bit HIDO DCFI bit 21 Setting HIDO DCFI and then immediately clearing it causes the entire data cache to be invalidated The following assembly code invalidates the entire data cache does not flush modified entries Set and then clear the HIDO DCFI bit bit 21 mfspr rl HIDO mr Se ET ori rl rl 0x0400 sync isync mtspr HIDO rl mtspr HIDO r2 isync 4 10 3 1 5 Loading the Data Cache This section explains loading data into the data cache The data cache can be loaded in several ways The example in this document loads the data from memory The following assembly code loads the data cache Assuming interrupts are turned off cache has been flushed MMU on and loading from contiguous caching allowed memory r6 Starting address of code to lock r20 Temporary register for loading into CTR Number of cache blocks to lock loop lwz r20 O r6 Load data into d cache addi ro fr 32 Find next block to load bdnz loop CTR CTR 1 branch if CTR 0 e300 Power A
420. nd continued UISA VEA OEA Supervisor Level Optional 64 Bit Form a el Iha Ihau Ihaux Ihax Ihbrx Ihz Ihzu Ihzux Ihzx 4 Imw Iswi 4 4 x XxX O KX XxX o UO X X X Iswx 1 ean g We lwa lwarx 1 lwaux Iwax lwbrx lwz lwzu lwzux x x UO UO XxX X X Xx lwzx x lt Z mcrf mcrfs merxr mfcr L ay 2 e ej 2 2 2 lt Sp Se SB lt 2 ej e e ej OS 2 e 2 2 mffsx 2 X XJ Kx X Xx mfmsr mfspr gt d XFX mfsr 2 2 il ay ala lt lt lt a lt a mfsrin mftb y XFX XFX mtcrf mtfsb0x mtfsb1x pon lt lt x lt mtfsfx XFL e300 Power Architecture Core Family Reference Manual Rev 3 38 Freescale Semiconductor mtfsfix mtmsr 2 mtspr gt mtsr 2 mtsrin 2 Table A 45 PowerPC Instruction Set Legend continued Instruction Set Listings UISA VEA OEA Supervisor Level Optional 64 Bit Form d D V V xX d V V XFX V V x d V x mulhwx mulhwux mulli mullwx nandx negx norx orx orcx ori oris rlwimix rlwinmx rlwnmx sc SE XO a XO lt lt lt lt lt lt lt a a e300 Power Architecture Cor
421. nd IABR2 Bit Settings 2 24 Instruction Address Breakpoint Control Registers IBCR ceesceeeseeeereeceeeeeceeeeeenteeeesaes 2 24 Data Address Breakpoint Registers DABR and DABR2 Bit Settings 2 26 Data Address Breakpoint Control Registers ODBCHR 2 27 BAC ati Mode ee EE 3 2 Memory Operands 3 cise Bede eee ei eG SSG tlt RI a 3 2 Integer Arithmetic InstuctOns 25 424 o oer Be wd Sa ao ee BA eg 3 10 Integer Compare ee ee 3 11 Integer Logical IMstru ction eewer SEENEN NEEN 3 12 Integer ROS MSU COM session n e EEGEN 3 13 Integer Shift Instr etn eege Eeer as ye ated eds acca needa oe 3 13 Floating Point Arithmetic Instructions egene ed eer 3 14 Floating Point Multiply Add Instructions c cc eeccessseeesseceeseeceeseeeesaeceesaeceeaeeeeaeeeeaeeneas 3 14 Floating Point Rounding and Conversion Instructpons 3 15 Floating Point Compare INstructOns EE 3 15 Floating Point Move Instructions eeeesesssecssecsseeesseesseecsseesseeesaeecsaeceseeeeecsaecsseesseeesaees 3 16 MMS SET LG AIMS EENEG 3 17 Integer Store DEENEN Eeer 3 18 Integer Load and Store with Byte Reverse Instructions 00 0 0 ceeceeseeseceeeeeeeecsneeeseeeseeeenees 3 19 Integer Load and Store Multiple Instructions 0 0 0 cee eee esseeeeeeneeeseeceseeeeeeeeseecsaeenseesseeennees 3 20 Integer Load and Store String Instructions 0 eee ee cess eeseeeeceseeceeeceseceseeseseecaecsseesseeeenees 3 20 Floating Point Load Instructions c c scissvesoctssccessctuess
422. nd PMGCO PMIE are set while interrupts are disabled MSR EE is clear and freezing of the counters is not enabled PMGCO FCECE is clear PMCn can wrap around to all zeros again without the performance monitor interrupt being taken e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Performance Monitor 11 4 Event Counting This section describes configurability and specific unconditional counting modes 11 4 1 Processor Context Configurability Counting can be enabled if conditions in the processor state match a software specified condition Because a software task scheduler may switch a processor s execution among multiple processes and because statistics on only a particular process may be of interest a facility is provided to mark a process The performance monitor mark bit MSR PMM is used for this purpose System software may set this bit when a marked process is running This enables statistics to be gathered only during the execution of the marked process The states of MSR PR PMM together define a state that the processor supervisor or user and the process marked or unmarked may be in at any time If this state matches an individual state specified by the PMLCan FCS FCU FCM1 FCMO fields the state for which monitoring is enabled counting is enabled for PMCn The processor states and the settings of the FCS FCU FCM1 and FCMO fields in PMLCan necessary to enable monitoring of eac
423. nd the content of the MSR is saved into CSRR1 An additional rfci instruction is implemented for supporting the return from a critical interrupt selecting the CSRRO and CSRR1 registers Four additional interrupt handling SPRG registers which are provided for operating system use A new system version register SVR See Section 2 2 12 System Version Register SVR for bit definitions System memory base address MBAR is a new implementation specific register for the G2_LE core It supports a system level memory map See Section 2 2 13 System Memory Base Address MBAR for more information One new instruction address breakpoint control register IBCR two new data address breakpoint registers DABR DABR2 and one new data address breakpoint control register DBCR are implemented in the e300 processor core All of these new registers as well as the IABR2 instruction address breakpoint register 2 are implementation specific and are described in the Section 2 2 14 Instruction Address Breakpoint Registers IABR and IABR2 and Section 2 2 16 Data Address Breakpoint Register DABR and DABR2 Performance monitor registers are available in the e300c3 The performance monitor counter registers PMCO PMC3 are 32 bit counters used to count software selectable events Each counter counts up to 128 events UPMCO UPMC 3 provide user level read access to these registers Reference events are those that should b
424. ndependent Execution Units The PowerPC architecture s support for independent execution units allows implementation of processors with out of order instruction execution For example because branch instructions do not depend on GPRs or FPRs branches can often be resolved early eliminating stalls caused by taken branches The four other execution units and the completion unit are described in the following sections 1 1 3 1 Integer Unit IU The IU executes all integer instructions The IU executes one integer instruction at a time performing computations with its arithmetic logic unit ALU multiplier divider and XER register Most integer instructions are single cycle instructions The 32 GPRs hold integer operands Stalls due to contention for GPRs are minimized by the automatic allocation of rename registers The core writes the contents of the rename registers to the appropriate GPR when integer instructions are retired by the completion unit The e300c2 and e300c3 provide two integer units for greater integer instruction throughput along with enhanced multipliers in each IU for faster multiply instruction execution 1 1 3 2 Floating Point Unit FPU The FPU contains a single precision multiply add array and the floating point status and control register FPSCR The multiply add array allows the core to efficiently implement multiply and multiply add operations The FPU is pipelined so that single and double precision instructions ca
425. nding bit positions of SRR1 The e300 core loads SRR1 with specific bits for handling machine check interrupts as shown in Table 5 4 Table 5 4 SRR1 Bit Settings for Machine Check Interrupts Bits Name Description 0 9 Cleared 10 ICPE Instruction cache parity error 11 DCPE Data cache parity error 12 MCP Machine check 13 TEA TEA error 14 DPE Data parity error 15 APE Address parity error 16 29 MSR 16 29 Copy of MSR bits16 29 30 MSR 30 Cleared for instruction cache parity error data cache parity error tea dpe ape copied from MSR 30 for mcp If mcp and tea are asserted simultaneously then SRR1 30 is cleared and the interrupt is not recoverable 31 MSR 31 Copy of MSR 31 The e300 core loads SRR1 with specific bits for handling program interrupts as shown in Table 5 5 Table 5 5 SRR1 Bit Settings for Program Interrupts Bits Name Description 0 10 Cleared 11 Floating point enabled program exception Otherwise cleared 12 Illegal instruction program exception Otherwise cleared 13 Privileged instruction program exception Otherwise cleared 14 Trap program exception Otherwise cleared 15 Set if SRRO contains the address of a subsequent instruction Cleared if SRRO contains the address of the instruction causing the exception condition 16 29 MSR 16 29 Copy of MSR bits16 29 30 MSR 30 Cleared for instruction cache parity error data
426. ndition should synthesize an interrupt by setting the appropriate bits in the DSISR or SRR1 and branching to the ISI or DSI handler Refer to Section 6 5 2 Implementation Specific Table Search Operation for more information and examples of this interrupt software The remainder of this chapter assumes that the table search software emulates this interrupt and refers to this condition as an interrupt The translation exception conditions defined by the OEA for 32 bit implementations cause either the ISI or the DSI interrupt to be taken as shown in Table 6 3 Table 6 3 Translation Exception Conditions Exception Condition Description Interrupt Page fault no PTE found No matching PTE found in page tables and no access ISI interrupt matching BAT array entry SRR1 1 1 D access DSI interrupt DSISR 1 1 Block protection violation Conditions described for block in Block Memory access ISI interrupt Protection in Chapter 7 Memory Management in SRR1 4 1 the Programming Environments Manual D access DSI interrupt DSISR 4 1 Page protection violation Conditions described for page in Page Memory access ISI interrupt Protection in Chapter 7 Memory Management in SRR1 4 1 the Programming Environments Manual 7 D access DSI interrupt DSISR 4 1 e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Memory
427. neeceeececeeeeeceeeeeenteeeesaes 3 14 Floating Point Rounding and Conversion Instructions eeeesseeeseeeeeeeeeee 3 15 Floating Point Compare Instructions 00 0 0 ce seeeseeesceeseeceeceseeeeseecsaeceseenseeesnees 3 15 Floating Point Status and Control Register Instructions 0 0 0 0 eeeeeeeeeeeeeeeee 3 15 Floating Point Move Instructions 0 ceecceessceceenceceeeeeceeeeeceeneeceeneeceeeeeeneeeesaes 3 16 Load and Store Instructions steet geed 3 16 SMod e 3 17 Integer Load and Store Address Generaton 3 17 Register Indirect Integer Load Instructions sssssssssseseseeesssessesssereseeessseesseese 3 17 Integer Store INStrulCllOns siccssisasvadsassaczesasesesastavcvaaedassaveasiseeva dca shaccdve Seceeutaawerees 3 18 Integer Load and Store with Byte Reverse Instructions cesses eeeeeeereeeneee 3 19 Integer Load and Store Multiple Instructions 00 eee eeeceeeseceesteeeeeeeeenteeeenaees 3 19 Integer Load and Store String Instructions 0 eee cess esseceeeeeeeeseecnseeeseeeenees 3 20 Floating Point Load and Store Address Generaton 3 21 Floating Point Load Instructions ii cscnsesensnesanscecnseesssnevensstesanceg saenaedseegeeanseenees 3 21 Floating Point Store Instructions ceeeecesececesececeeececeeceeceeececeeeeeeeeeeeeneeeees 3 22 Branch and Flow Control Instructons ee eeseessecesecsseeeseeeeecsaeceseesaeecsaeenseeees 3 22 Branch Instruction Address Calculaton cc ceeceeccessesseceseeeeeecsaecnseeseee
428. ngs Figure 8 1 shows the e300 core signals groups in greater detail Note that the pm_event_in signal is only found on the e300c3 Master Address Bus Clock Control Test Interface JTAG 8 1 2 Selected Core Interface Signals e300 Core 8 gt gbi cint lt ic _int lt lt lt wt am gt tea DCH IL sreset lt _ hreset lt ____ ckstp lt ___ gt pm_event_in pll_cfg 0 6 lt lt clk out gt tck gt tdi lt x tdo tms irst timsel tap_en core_disable tben ken greq m gt gack m tlbisync ken Figure 8 1 Core Interface Signals Signal Summary Interrupts Reset Core Status and Control Table 8 1 provides alphabetically ordered e300 core signals with related cross references that are relevant to the user It details the signal name signal grouping number of signals and whether the signal is an input or an output Table 8 1 Summary of Selected Internal Signals Signal HO Comments or Meaning when Asserted Bus Signals Master Address Bus Cache inhibit Normally reflected from the bit of the WIMG bits regardless of whether the cache is enabled _ For burst writes and address only transactions ci is always negated gbi O Global Normally reflected from the M bit of the WIMG bits asserted indicates transaction is enabled for snooping by other masters For burst writes always nega
429. ning stale values could exist in a system resulting in errors when the stale values are used This section describes the coherency mechanisms of the PowerPC architecture and the cache coherency protocol that the data cache supports Note that unless specifically noted the discussion of coherency in this section applies to the data cache only The instruction cache is not snooped Instruction cache coherency must be maintained by software e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Instruction and Data Cache Operation 4 4 1 Memory Cache Access Attributes WIMG Bits Some memory characteristics can be set on either a block or page basis by using the WIMG bits in the BAT registers or page table entry PTE respectively The WIMG attributes control the following functionality e Write through W bit e Caching inhibited I bit e Memory coherency M bit e Guarded memory G bit The WIMG attributes are programmed by the operating system for each page and block The W and I attributes control how the processor performing an access uses its own cache The M attribute ensures that coherency is maintained for all copies of the addressed memory location The G attribute prevents speculative referred to as out of order in the architecture specification loading and prefetching from the addressed memory location These bits allow both uniprocessor and multiprocessor system designs to exploit n
430. nly RWITM Read with Intent to Modify snoop w push RO bus request performs a copy back if the block is in the modified state snoop w flush RWITM bus request performs a copy back if the block is in the modified state snoop whkill broadcasts an invalidate on the CSB no copy back if the block is in the modified state Figure 4 6 MESI Cache Coherency Protocol State Diagram WIM 001 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Instruction and Data Cache Operation 4 4 2 3 Table 4 4 provides a summary of memory coherency actions performed by the e300 core on load operations Caching inhibited cases are not considered in this table Load and Store Coherency Summary Table 4 3 Memory Coherency Actions on Load Operations Cache State Tranenchen External Response Action Generated M None Don t care Read from cache E None Don t care Read from cache s None Don t care Read from cache READ No response Load data and mark exclusive READ Share signaled Load data and mark shared READ Retry signaled Retry read operation 1 MESI mode only Table 4 4 provides an overview of memory coherency actions on store operations This table does not include caching inhibited or write through cases Table 4 4 Memory Coherency Actions on Store Operations Cache State Diener External Response Action M None Don t care Modi
431. nnnn is the offset of the interrupt See Table 5 2 O Interrupts are vectored to the physical address 0x000n_nnnn 1 Interrupts are vectored to the physical address OxFFFn_nnnn 26 Instruction address translation 0 Instruction address translation is disabled 1 Instruction address translation is enabled See Chapter 6 Memory Management 27 DR Data address translation 0 Data address translation is disabled 1 Data address translation is enabled See Chapter 6 Memory Management 28 29 Reserved Full function Bit 29 reserved on e300c1 and e300c2 only 29 PMM Performance monitor mark bit e300c3 only System software can set PMM when a marked process is running to enable statistics to be gathered only during the execution of the marked process MSR PR and MSR PMM together define a state that the processor supervisor or user and the process marked or unmarked may be in at any time If this state matches an individual state specified in the PMLCan the state for which monitoring is enabled counting is enabled e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Interrupts and Exceptions Table 5 8 MSR Bit Settings continued Bits Name Description 30 RI Recoverable interrupt for system reset and machine check interrupts 0 Interrupt is not recoverable 1 Interrupt is recoverable 31 LE Little endian mode enable 0 Th
432. ns and interrupts can be explicitly enabled or disabled by software The PowerPC architecture requires that interrupts be handled in program order therefore although a particular implementation may recognize interrupt conditions out of order they are handled strictly in order with respect to the instruction stream When an instruction caused interrupt is recognized any unexecuted instructions that appear earlier in the instruction stream including any that have not yet entered the execute state are required to complete before the interrupt is taken Any interrupts caused by those instructions are handled first Likewise interrupts that are asynchronous and precise are recognized when they occur but are not handled until the instruction currently in the completion stage successfully completes execution or generates an interrupt and the completed store queue is emptied see Section 7 1 Terminology and Conventions for the definition An instruction is said to have completed when the results of that instruction s execution have been committed to the registers defined by the architecture for example the GPRs or FPRs rather than rename buffers If a single instruction encounters multiple exception conditions those exceptions are taken and handled sequentially Likewise interrupts that are asynchronous are recognized when they occur but are not handled until the next instruction to complete in program order successfully completes Througho
433. ns can be performed to the cache on the clock cycle immediately following a snoop access if the snoop misses Snoop hits may block the data cache for two or more cycles depending on whether a copy back to main memory is required The replacement algorithm is a PLRU algorithm that is the pseudo least recently used block is filled with new data on a cache miss 4 3 Instruction Cache Organization The e300c1 instruction cache also consists of 128 sets of eight blocks per set The e300c1 instruction cache organization is shown in Figure 4 3 128 Sets Block 0 Block 1 Address Tag 1 Words 0 7 Block 2 Address Tag 2 Words 0 7 Block 3 Address Tag 3 Words 0 7 Block 4 Address Tag 4 Words 0 7 Block 5 Address Tag 5 Words 0 7 Block 6 Address Tag 6 Words 0 7 Block 7 Address Tag 7 Words 0 7 Le 8 Words Block _________ Figure 4 3 e300c1 Instruction Cache Organization e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Instruction and Data Cache Operation The e300c2 and e300c3 instruction cache is configured as 128 sets of four blocks per set The organization of the e300c2 and e300c3 instruction cache is shown in Figure 4 4 128 Sets Block 0 Block 1 Words 0 7 Block 2 Address Tag 2 Words 0 7 Block 3 Address Tag 3 Words 0 7 lt lt
434. ns grouped by function Table A 3 Integer Arithmetic Instructions Name 0 56 7 8 9 101112 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 addx 31 D A B OE 266 Re addcx 31 A B OE 10 Re addex 31 D A B OE 138 Rc addi 14 D A SIMM addic 12 D A SIMM addic 13 D A SIMM addis 15 D A SIMM addmex 31 D A 00000 OE 234 Rc addzex 31 D A 00000 OE 202 Rc divdx 31 D A B OE 489 Rc divdux 31 D A B OE 457 Rc divwx 31 D A B OE 491 Rc divwux 31 D A B OE 459 Rc mulhdx 31 D A B 0 73 Rc mulhdux 31 D A B 0 9 Rc mulhwx 31 D A B 0 75 Re mulhwux 31 D A B 0 11 Re mulld 1 31 D A B OE 233 Rc mulli 07 D A SIMM mullwx 31 D A B OE 235 Rc negx 31 D A 00000 OE 104 Rc subfx 31 D A B OE 40 Rc subfcx 31 D A B OE 8 Rc subficx 08 D A SIMM subfex 31 D A B OE 136 Rc subfmex 31 D A 00000 OE 232 Rc subfzex 31 D A 00000 OE 200 Rc 1 64 bit instruction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 15 Instruction Set Listings Table A 4 Integer Compare Instructions Name 0 56 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 cmp 31 CD JOJL A B 0000000000 0 cmpi 11 cfiD O L A SIMM cmpl 31 cfiD O L A B 32 0 cmpli 10 cfiD O L A UIMM Table A 5 Integer Logical Instructions
435. ns of W I and MB 4 8 4 2 MEVMESI State Definition S nia e ee EE EE 4 9 4 3 Memory Coherency Actions on Load Operations sesesssesessseesseesesseeesseeesstessersseesseessees 4 12 4 4 Memory Coherency Actions on Store Operations 00 0 0 eeeeeeeesseecsseceseceseeeeaeecsaecneeeseeeenees 4 12 4 5 e300c1 PLRU Replacement Way Selection sisesciic cseccisavaccecssavscaccuseacades EEN teceeuteaweracs 4 23 4 6 e300c2 PLRU Replacement Way Selection w occisccsceglesdssdeseeaseensttessodtenons lt edaghanvanteed inde nnetoenace 4 23 4 7 PLRU Bit Update Rules issiasecccsegassvenseccinnivcesssaniadaudagscesanegedssvesGeaaes snaenn REENEN 4 25 4 8 e300 Bus Operations Caused by Cache Control Instructions 0 ceeceeeseeeeeeeeeeeeeeeeeeeeeeaes 4 29 4 9 Snoop Response to CSB Transactions E 4 30 4 10 AAS ATI AEN nen a a a aie uaa nd eeu 4 33 4 1 HIDO Bits Used to Perform Cache Locking cece ceeeeesseceececeeeeeceeneeceeeeeceeneeceeeeeeeeeenaes 4 33 4 12 HID2 Bits Used to Perform Cache Way locking cceecccessseceeseeeeeseceenaeceenaeeeeaeeeenaeeeaes 4 34 4 13 MSR Bits Used to Perform Cache Locking 0 cece eessecesseecesseeceeeeeceseeeceeneeceeeeceeeeeenteeeenaes 4 34 4 14 Example BAT Settings TE 4 35 4 15 MSR Bits for Disabling Interrupts eet Abee 4 36 4 16 e300c1 Core DWLCK 0 2 Ee ee ee dde 4 38 4 17 e300c2 and e300c3 Core DWLCK 0 2 Encodmgs eee eeecceesseceeseeceeseeeenaeeeenaeeeeaeeeens 4 39 4 18 Example BAT Settings for Cache
436. nsists of 4 bit fields CRO CR7 that reflect the results of certain arithmetic operations and provides a mechanism for testing and branching Floating point status and control register FPSCR The FPSCR contains all floating point exception signal bits exception summary bits exception enable bits and rounding control bits needed for compliance with the IEEE 754 standard Figure 2 2 shows the bit fields of the FPSCR The FPSCR is not supported on the e300c2 core For more detailed information on the FPSCR see the Programming Environments Manual Access User read write 2 3 4 5 6 7 8 9 10 11 12 13 14 15 19 20 21 22 23 24 25 26 27 28 29 30 31 FX FEX VX OX UX ZX XX VXSNAN VXISI VXIDI VXZDZ VXIMZ VXVC FR FI FPRF VXSOFT VXSQRT VXCVI VE OE UE ZE XE NI RN All zeros Figure 2 2 Floating Point Status and Control Register FPSCR e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Register Model FPSCR bits are described in Table 2 1 Table 2 1 FPSCR Bit Settings Bits Name Description 0 FX Floating point exception summary Every floating point instruction except mtfsfi and mtfsf implicitly sets FX if that instruction causes any FPSCR floating point exception bit to transition from 0 to 1 The merfs mtfsfi
437. nsitions Bus master The owner of the address or data bus the device that initiates or requests the transaction C Cache High speed memory containing recently accessed data or instructions subset of main memory Cache block A small region of contiguous memory that is copied from memory into a cache The size of a cache block may vary among processors the maximum block size is one page In PowerPC processors cache coherency is maintained on a cache block basis Note that the term cache block is often used interchangeably with cache line Cache coherency An attribute wherein an accurate and common view of memory is provided to all devices that share the same memory system Caches are coherent if a processor performing a read from its cache is supplied with data corresponding to the most recent value written to memory or to another processor s cache e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 2 Freescale Semiconductor Cache flush An operation that removes from a cache any data from a specified address range This operation ensures that any modified data within the specified address range is written back to main memory This operation is generated typically by a Data Cache Block Flush debf instruction Caching inhibited A memory update policy in which the cache is bypassed and the load or store is performed to or from main memory Cast out A cache block that must be written to memory when
438. nslation and is similar to page address translation however fewer higher order effective address bits are translated into physical address bits more lower order address bits at least 17 are untranslated to form the offset into a block Also instead of segment descriptors and a TLB block address translations use the on chip BAT registers as a BAT array If an effective address matches the corresponding field of a BAT register the information in the BAT register is used to generate the physical address in this case the results of the page translation occurring in parallel are ignored even if the segment corresponds to the direct store interface space e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Memory Management Effective Address Segment Descriptor Match with BAT Located Registers T 0 Page Address Block Address Translation Translation see Section 6 3 Block Address Translation T 1 51 Virtual Address Direct Store Interface Translation Look Up in Page Table DSI ISI Interrupt 0 31 0 31 Physical Address Physical Address Figure 6 4 Address Translation Types 6 1 4 Memory Protection Facilities In addition to the translation of effective addresses to physical addresses the MMUs provide access protection of supervisor areas from user access and can designate areas of memory as read only as well as no execute or guarded T
439. nstruction Remove statements noting that broadcasting a sequence of debz instructions may cause snoop accesses to be retired indefinitely Remove statement that debi causes a direct storage interrupt and added that debi is used to invalidate cache blocks Revise some section titles for clarity Change first paragraph to specify e300c1 implementation Change first paragraph to specify e300c1 implementation Change second paragraph and Table 4 7 title to specify e300c1 PLRU implementation In Table 4 17 relabel Ways Locked column as e300c1 Ways Locked In Table 4 20 relabel Ways Locked column as e300c1 Ways Locked Remove references to direct stores no longer supported Remove Imw and stmw from list of instructions mentioned in Table 5 2 that cause an alignment exception when in little endian mode Remove mention of power on reset as a cause for system reset in Table 5 3 hreset is very similar to power on reset and is the only listed cause of system reset now Remove Imw and stmw from list of instructions mentioned in Table 5 3 that cause an alignment exception when in little endian mode Following Table 5 4 add table entitled Bit Settings for Program Interrupts to show SRRI bit settings on a program interrupt Change statement that mentioned CSRR1 loading only bits 16 31 from MSR to loading all bits 0 31 from MSR for the e300 core Add statement that the tea signal is still monitored even when H
440. nstruction Number of CR updates from both CQO and CQ1 must not exceed one Number of GPR updates from both CQO and CQ1 must not exceed two Number of FPR updates from both CQO and CQ1 must not exceed one e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Instruction Timing 7 7 Instruction Latency Summary Table 7 1 through Table 7 6 list the latencies associated with each instruction executed by the e300 core Note that the instruction latency tables contain no 64 bit architected instructions These instructions will trap to an illegal instruction interrupt handler when encountered Recall that the term latency is defined as the total time it takes to execute an instruction and make ready the results of that instruction Table 7 1 provides the latencies for the branch instructions Table 7 1 Branch Instructions Mnemonic Primary Extended Unit i Latency Opcode Opcode in Cycles be l a 16 BPU 1 b i a 18 BPU 1 bell 19 016 BPU 1 bectr l 19 528 BPU 1 1 These operations may be folded for an effective cycle time of 0 Table 7 2 provides the latencies for the system register instructions Table 7 2 System Register Instructions Mnemonic Primary Extended Unit Latency Opcode Opcode in Cycles sc 17 1 SRU 3 rfi 19 050 SRU 3 rfci 19 051 SRU 3 isync 19 150 SRU 1 amp mf
441. nstruction the contents of the DCMP or ICMP register is loaded into the first word of the selected TLB entry SPR 977 DCMP Access Supervisor read write SPR 981 ICMP o 1 24 25 26 31 R V VSID H API W Reset All zeros Figure 6 12 DCMP and ICMP Registers Table 6 11 describes the bit settings for the DCMP and ICMP registers Table 6 11 DCMP and ICMP Bit Settings Bits Name Description 0 V Valid bit Set by the processor on a TLB miss interrupt 1 24 VSID Virtual segment ID Copied from VSID field of corresponding segment register 25 H Hash function identifier Cleared by the processor on a TLB miss interrupt 26 31 API Abbreviated page index Copied from API of effective address 6 5 2 1 3 Primary and Secondary Hash Address Registers HASH1 and HASH2 HASH1 and HASH2 contain the physical addresses of the primary and secondary PTEGs for the access that caused the TLB miss interrupt Only bits 7 25 differ between them For convenience the processor automatically constructs the full physical address by routing bits 0 6 of SDR1 into HASH1 and HASH2 and clearing the lower six bits These registers are read only and are constructed from the contents of the DMISS or IMISS register The format for HASH and HASH2 is shown in Figure 6 13 e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Memory Management SPR 978 HA
442. nstruction Cache Instructions in The Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Instruction and Data Cache Operation Since the sync instruction strongly serializes the memory subsystem performance of code containing several icbi instructions can be improved by batching the icbi instructions together such that only one syne instruction is used to synchronize all the icbi instructions in the batch 4 6 Cache Operations This section describes the three types of operations that can occur to the caches and how these operations are implemented in the e300 core 4 6 1 Data Cache Fill Operations A cache block fill is caused by a cacheable load or store miss in the cache The cache block that corresponds to the missed address is updated by a burst transfer of the data from system memory If a read miss occurs in a system with multiple bus masters and the data is modified in another cache the modified data is first written to external memory before the cache fill occurs When the core is configured with a 64 bit data bus cache blocks are loaded in four beats of 8 bytes each When the core is configured with a 32 bit bus cache block loads are performed with eight beats of 4 bytes each The burst load is performed as critical double word first The data cache is blocked to subsequent load store operations until the load completes The critical double word is sim
443. nstruction address breakpoint interrupt de 1 Program Program interrupt due to the following EXECUNON e Illegal instruction e Privileged instruction e Trap 2 System call System call interrupt 3 Floating point unavailable Floating point unavailable interrupt 4 Program Floating point enabled interrupt 5 Alignment One to the following e Floating point not word aligned e Imw stmw Iwarx or stwex not word aligned e ecwix or ecowx operands not aligned e String access with little endian bit MSR LE set e The operand of a load store load multiple store multiple load string or store string instruction crosses a protection boundary e The instruction is Iswi Iswx stswi stswx and the core is in little endian mode Note that PowerPC little endian mode is not supported on the e300 core 6 Data access BAT page protection violation 7 DTLB miss A store or load miss 8 Alignment A dcbz to a write through or caching inhibited page 9 Data access A TLB page protection violation 10 DTLB miss A change bit not set on a store operation Post instruction 0 Trace One of the following execution e MSR SE 1 e MSR BE 1 for branches e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Interrupts and Exceptions Interrupt priorities are described in detail in Interrupt Priorities in Chapter 6 Interrupts in the Programming Environments Manual 5 1 2 Summary of Front End Interrupt Handling The foll
444. nstruction is executed When the interrupt is taken DAR is set to the data address that causes the breakpoint and DSISR 9 is set to indicate a data address breakpoint The address of the instruction associated with the breakpoint condition is stored in SRRO The instruction retires after returning from the DSI interrupt and all registers and memory accesses are committed to memory An unrecoverable state occurs whenever DABR or DABR2 values are set to an interrupt vector These values must not be set to match within the DSI interrupt handler or the core may enter an indeterminate or unrecoverable processor core state 10 1 4 Data Address Control Register DBCR DBCR is a supervisor level SPR on the e300 core that controls the compare type and match type conditions for DABR and DABR2 Note that DABR or DABR2 or both must be enabled before the effects of DBCR are realized The e300 core includes additional bits in DBCR 6 7 that contain the status of whether a data address breakpoint has matched See Section 2 2 17 Data Address Breakpoint Control Register DBCR for bit descriptions 10 1 5 Other Debug Resources In addition to the four breakpoint registers and the two breakpoint control registers other internal register values control and monitor the effects of breakpoint conditions Table 10 1 shows these registers and their bits e300 Power Architecture Core Family Reference Manual Rev 3 2 Freescale Semiconductor Debug F
445. nstructions This section describes the user level cache management instructions defined by the VEA See Section 3 2 6 3 Memory Control Instructions OEA for information about supervisor level cache segment register manipulation and translation lookaside buffer management instructions The instructions listed in Table 3 28 provide user level programs the ability to manage on chip caches when they exist Table 3 28 User Level Cache Instructions Name Mnemonic Operand Syntax Data Cache Block Flush dcbf rA rB Data Cache Block Set to Zero dcbz rA rB Data Cache Block Store dcbst rA rB Data Cache Block Touch dcbt rA rB Data Cache Block Touch for Store dcbtst rA rB Instruction Cache Block Invalidate icbi rA rB e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Instruction Set Model As with other memory related instructions the effect of the cache management instructions on memory are weakly ordered If the programmer needs to ensure that cache or other instructions have been performed with respect to all other processors and system mechanisms a sync instruction must be placed in the program following those instructions Note that when data address translation is disabled MSR DR 0 the Data Cache Block Set to Zero debz instruction allocates a cache block in the cache and may not verify that the physical address is valid If a cache block is created for
446. nt register is enabled and the conditions are met an instruction address breakpoint interrupt 0x01300 or DSI interrupt 0x00300 occurs The conditions for these exceptions are described in Section 10 1 6 Interrupt Vectors for Debugging 10 1 1 Instruction Address Breakpoint Registers IABR IABR2 IABR and IABR2 can be used to cause a breakpoint interrupt if a specified instruction address is encountered ABR and IABR2 control the instruction address breakpoint interrupt ABR CEA and IABR2 CEA hold the effective address to which each instruction s address is compared The interrupt for each breakpoint is enabled by setting IABR BE or IABR2 BE respectively The interrupt is taken when there is an instruction address breakpoint match on the next instruction to complete The instruction tagged with the match cannot complete before the instruction address breakpoint interrupt 0x01300 is taken The address of the instruction that matches the breakpoint condition is stored in SRRO The tagged instruction retires after returning from the interrupt rfi or rfci The results are then committed to the destination registers and address If the IABR or IABR2 values are set to any interrupt vector range an unrecoverable state occurs The IABR or IABR2 values should never be set to match within the instruction address breakpoint interrupt handler Allowing a breakpoint within any handler may result in an indeterminate or unrecoverable processor
447. nter 16 Kbyte 16 Kbyte D Cache Cache Debug COP PLL amp Clock JTAG Interface Multiplier a Touch Load Buffer Control Decrementer Copy Back Buffer Core Interface 32 Bit Address Bus Kc 64 Bit Data Bus e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Instruction Timing Figure 7 4 Instruction Flow Diagram for the e300c3 The instruction pipeline stages are described as follows The instruction fetch stage includes the clock cycles necessary to request instructions from the memory system and the time the memory system takes to respond to the request Instruction fetch timing depends on many variables such as whether the instruction is in the branch target instruction cache or in the on chip instruction cache Instruction fetch timing increases when it is necessary to fetch instructions from system memory The variables that affect fetch timing include the processor to bus clock ratio the amount of bus traffic and whether any cache coherency operations are required Because there are so many variables unless otherwise specified the instruction timing examples below assume optimal performance and that the instructions are available in the instruction queue in the same clock cycle that they are requested The fetch stage ends when the instruction is dispatched The decode dispatch stage consists of the time it takes to fully decode the instruction and dispatch it f
448. nter the checkstop state as directed by the MSR smi _ System management interrupt If smi is asserted and MSR EE is set the core initiates a system management interrupt ckstp I O Checkstop interrupt Assertion of this signal by the core is used to generate a chip wide hard stop pm_event_in Performance monitor interrupt Initiates a performance monitor interrupt to the core Core Status tben Asserted by the system logic to enable the time base qack Quiescent acknowledge Assertion Indicates that all bus activity that requires snooping has terminated or paused and that the core may enter the quiescent or low power state qreq O Quiescent request Indicates that the core is requesting all CSB activity normally required to be snooped to terminate or to pause so the core may enter the quiescent low power state Once the core enters a quiescent state it no longer snoops CSB activity tlbisync TLB instruction synchronize If asserted the tlbsync instruction causes instruction execution to stop If negated instruction execution may continue or resume after the completion of a tlbsyne instruction Clocks pll_cfg 0 6 PLL configuration select Configurations are as shown in Table 8 2 clk_out O Assertion provides PLL clock output for PLL testing and monitoring The c k_out signal clocks at either the core clock frequency or bus clock frequency if enabled by the appropriate HIDO bits
449. nterface Multiplier Touch Load Buffer Copy Back Buffer 32 Bit Address Bus 64 Bit Data Bus 64 Bit Two Instructions Branch Processing Unit 64 Bit Two Instructions Instruction Unit FPR File FP Rename Registers Figure 1 1 e300c1 Core Block Diagram Floating Point Unit 32 Kbyte Cache Core Interface e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Overview Figure 1 2 is a block diagram of the e300c2 core Note that it does not support floating point operations 64 Bit Two Instructions E Sequeniial Branch Fetcher Processing Unit 64 Bit Instruction Queue System Register Unit 64 Bit Instruction Unit 64 Bit Two Instructions 64 Bit Integer Load Store FPR File Unit2 Unit XER FP Rename Registers Completion Unit Completes up to two instructions per clock 64 Bit Power Time Base Dissipation Counter Control Decrementer 16 Kbyte D Cache 16 Kbyte Cache Debug COP PLL amp Clock JTAG Interface Multiplier Touch Load Buffer Copy Back Buffer Core Interface 32 Bit Address Bus rl 64 Bit Data Bus Figure 1 2 e300c2 Core Block Diagram e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Overview Figure 1 3 shows a block diagram of the e300c3 core Note that the e300c3 supports floating point operations and includes
450. ntext established by the interrupt 16 EE External interrupt enable 0 The processor ignores external interrupts system management interrupts and decrementer interrupts 1 The processor is enabled to take an external interrupt system management interrupt or decrementer interrupt 17 PR Privilege level O The processor can execute both user and supervisor level instructions 1 The processor can only execute user level instructions 18 FP Floating point available this bit is read only on the e300c2 core 0 The processor prevents dispatch of floating point instructions including floating point loads stores and moves 1 The processor can execute floating point instructions and can take floating point enabled exception type program interrupts 19 ME Machine check enable 0 Machine check interrupts are disabled 1 Machine check interrupts are enabled 20 FEO Floating point exception mode 0 see Table 5 9 This bit is read only on the e300c2 core 21 SE Single step trace enable 0 The processor executes instructions normally 1 The processor generates a trace interrupt upon the successful completion of the next instruction 22 BE Branch trace enable 0 The processor executes branch instructions normally 1 The processor generates a trace interrupt upon the successful completion of a branch instruction 23 FE1 Floating point exception mode 1 see Table 5 9 This bit is read only on the e300c2 core 2
451. ntexts where results are allowed to be boundedly undefined are constrained to ones that could have been achieved by executing an arbitrary sequence of defined instructions in valid form starting in the state the machine was in before attempting to execute the given instruction Branch folding The replacement with target instructions of a branch instruction and any instructions along the not taken path when a branch is either taken or predicted as taken Branch prediction The process of guessing whether a branch will be taken Such predictions can be correct or incorrect the term predicted as it is used here does not imply that the prediction is correct successful The PowerPC architecture defines a means for static branch prediction as part of the instruction encoding Branch resolution The determination of whether a branch is taken or not taken A branch is said to be resolved when the processor can determine which instruction path to take If the branch is resolved as predicted the instructions following the predicted branch that may have been speculatively executed can complete see Completion If the branch is not resolved as predicted instructions on the mispredicted path and any results of speculative execution are purged from the pipeline and fetching continues from the nonpredicted path Burst A multiple beat data transfer whose total size is typically equal to a cache block Bus clock Clock that causes the bus state tra
452. ntly See Set associative Set associative Aspect of cache organization in which the cache space is divided into sections called sets The cache controller associates a particular main memory address with the contents of a particular set or region within the cache Shadowing Shadowing allows a register to be updated by instructions that are executed out of order without destroying machine state information Signaling NaN A type of NaN that generates an invalid operation program interrupt when it is specified as arithmetic operands See Quiet NaN Significand The component of a binary floating point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right Simplified mnemonics Assembler mnemonics that represent a more complex form of a common operation e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 10 Freescale Semiconductor Slave The device addressed by a master device The slave is identified in the address tenure and is responsible for supplying or latching the requested data for the master during the data tenure Snooping Monitoring addresses driven by a bus master to detect the need for coherency actions Snoop push Response to a snooped transaction that hits a modified cache block The cache block is written to memory and made available to the snooping device Split transaction A transaction with independent r
453. nto rA and the memory element byte half word word or double word addressed by EA is loaded into rD Implementation Note In some implementations the load half word algebraic instructions lha and Ihax and the load with update Ibzu Ibzux lhzu lhzux Ihau lhaux lwu and lwux instructions may execute with greater latency than other types of load instructions In the e300 core these instructions operate with the same latency as other load instructions Table 3 14 lists the integer load instructions Table 3 14 Integer Load Instructions Name Mnemonic Operand Syntax Load Byte and Zero Ibz rD d rA Load Byte and Zero Indexed Ibzx rD rA rB Load Byte and Zero with Update Ibzu rD d rA Load Byte and Zero with Update Indexed Ibzux rD rA rB e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Instruction Set Model Table 3 14 Integer Load Instructions continued Name Mnemonic Operand Syntax Load Half Word Algebraic me oan Load Half Word Algebraic Indexed Ihax rD rA rB Load Half Word Algebraic with Update Ihau rD d rA Load Half Word Algebraic with Update Indexed Ihaux rD rA rB Load Half Word and Zero Ihz rD d rA Load Half Word and Zero Indexed Ihzx rD rA rB Load Half Word and Zero with Update Ihzu rD d rA Load Half Word and Zero with Update Indexed Ihzux rD rA rB Load Word and Zero Iwz rD d rA Load Word and Zero Indexed
454. ntrol registers PMLCa0 PMLCa3 control each individual performance monitor counter Each counter has a corresponding PMLCa register UPMLCa0 UPMLCa3 provide user level read access to PMLCa0 PMLCa3 The performance monitor interrupt is assigned to interrupt vector OxOFO0 Its priority is less than the fixed interval interrupt and greater than the decrementer interrupt e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Performance Monitor Software communication with the performance monitor is achieved through PMRs rather than SPRs The DMR are used for enabling conditions that can trigger the performance monitor interrupt 11 2 Performance Monitor Registers The performance monitor provides a set of PMRs for defining enabling and counting conditions that trigger the performance interrupt It also the performance monitor interrupt vector The supervisor level performance monitor registers in Table 11 1 are accessed with mtpmr and mfpmr Attempting to read or write supervisor level registers in user mode causes a privilege exception Table 11 1 Performance Monitor Registers Supervisor Level Number PMR 0 4 PMR 5 9 Name Abbreviation 16 00000 10000 Performance monitor counter 0 PMCO 17 00000 10001 Performance monitor counter 1 PMC1 18 00000 10010 Performance monitor counter 2 PMC2 19 00000 10011 Performance monitor counter 3 PMC3 144 00100
455. o 256 Mbyte segments This segmented memory model provides a way to map 4 Kbyte pages of effective e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Memory Management addresses to 4 Kbyte pages in physical memory page address translation while providing the programming flexibility afforded by a large virtual address space 52 bits The segment page address translation mechanism may be superseded by the BAT mechanism described in Section 6 3 Block Address Translation If not the translation proceeds in the following two steps 1 From effective address to the virtual address which never exists as a specific entity but can be considered to be the concatenation of the virtual page number and the byte offset within a page 2 From virtual address to physical address The following section highlights those areas of the memory segment model defined by the OEA that are specific to the e300 core 6 4 1 Page History Recording Reference R and change C bits reside in each PTE to keep history information about the page They are maintained by a combination of the core hardware and the table search software The operating system uses this information to determine which areas of memory to write back to disk when new pages must be allocated in main memory Reference and change recording is performed only for accesses made with page address translation and not for translations made with the BAT
456. o 0 Com 83 PMC1 overflow N A _ PMC1 32 transitions from 1 to 0 Com 84 PMC2 overflow N A PMC2 32 transitions from 1 to 0 Com 85 PMC3 overflow N A PMC3 32 transitioned from 1 to 0 Interrupt Events Com 86 Interrupts taken Nonspec Com 87 External input interrupts taken Nonspec Com 88 Critical input interrupts taken Nonspec Com 89 System call and trap interrupts Nonspec Ref 90 Transitions of TBL bit selected by Nonspec Counts transitions of the TBL bit selected by PMGCO TBSEL PMGCO TBSEL e300 Performance Monitor Events Com 96 i cache hits Spec Number of fetches that hit in i cache Com 97 Instructions folded Spec Number of instructions folded used to determine true number of instructions completed Com 100 Stalls due to completion buffer Spec Cycles issue stalled due to full completion buffer Com 101 Reserved Reserved Com 104 Stalled completion Spec Cycles that completion is stalled Com 105 Stalles due to load Spec Cycles that completion is stalled due to load Com 106 Stalles due to floating point Spec Cycles that completion is stalled due to floating point instruction Com 108 Load and stores to cacheable Spec Number of loads and stores to cacheable space in the data cache space Com 109 Loads and stores that hit in cache Spec Number of loads and stores that hit in the data cache 1 For chaining events if a counter is configured to count its own overflow bit that counter does not
457. o acquire exclusive use of a memory location for the purpose of modifying it e f the addressed block is in the invalid state the core takes no action e f the addressed block in the cache is in the exclusive or shared state the core initiates an additional snoop action to change the state of the cache block to invalid e f the addressed block in the cache is in the modified state the block is flushed to memory and the block is invalidated e Any associated reservation is canceled MESI state only The RWITM atomic operations appear on the bus in response to stwex instructions and are snooped like RWITM instructions sync No action is taken TLB invalidate No action is taken e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Instruction and Data Cache Operation In addition to the retries described in Table 4 9 the core may signal retry for any bus transaction due to internal conflicts that prevent the appropriate snooping The following scenarios cause the core to signal a retry e Snoop hits to a block in the M state flush or clean This case is a normal snoop hit and results in retry being signaled if the snooped transaction is a flush or clean request If the snooped transaction is a kill request retry is not signaled e Snoop hits to line in the cast out buffer The cast out buffer is kept coherent with main memory and snoop operations that hit in t
458. o execute an unimplemented reserved instruction invokes the illegal instruction error handler a program interrupt See Section 5 5 7 Program Interrupt 0x00700 for additional information about illegal and invalid instruction interrupts The following types of instructions are included in this class e Implementation specific instructions for example Load Data TLB Entry tlbld and Load Instruction TLB Entry tlbli instructions e Optional instructions defined by the PowerPC architecture but not implemented by the core for example Floating Square Root fsqrt and Floating Square Root Single fsqrts instructions 3 2 2 Addressing Modes This section provides an overview of conventions for addressing memory and calculating effective addresses as defined by the PowerPC architecture for 32 bit implementations For more detailed information see Conventions in Chapter 4 Addressing Modes and Instruction Set Summary of the Programming Environments Manual 3 2 2 1 Memory Addressing A program references memory using the effective logical address computed by the processor when it executes a memory access or branch instruction or when it fetches the next sequential instruction As described in Section 3 1 1 Data Organization in Memory and Memory Operands bytes in memory are numbered consecutively starting with zero Each number is the address of the corresponding byte 3 2 2 2 Memory Operands Memory operands
459. o single step through rfi rfci and branch instructions Note that single stepping on an mtmsr may give unwanted results The new value moved into MSR upon execution of the mtmsr might cause single stepping to be disabled if the new value of MSR SE is cleared Thus it is recommended that the value of MSR SE be changed to enable or disable single stepping indirectly by changing the value of SRRO within an interrupt handler and relying on rfi to set or clear MSR SE 10 2 2 Branch Tracing When MSR BE branch trace enable is set the processor generates a trace interrupt OxOOD00 upon the successful completion of a branch instruction 10 2 3 Breakpoint Address Matching Options When an instruction address breakpoint is set and a condition is matched an instruction address breakpoint interrupt 0x01300 occurs along with execution of the matched instruction The instruction retires after the return from the interrupt handler When a data address and data translation condition match occurs a DSI interrupt 0x00300 occurs along with execution of the matched instruction The instruction retires after the return from the interrupt handler occurs and the instruction has updated memory or registers as appropriate On the e300 a match occurs when an address equals the effective address in a corresponding breakpoint register The e300 can also match addresses on a greater than or equal to or less than basis as an additional matching condit
460. o the instruction at which instruction processing should resume when the interrupt handler returns control to the interrupted process All instructions in the program flow preceding this one will have completed and no subsequent instruction will have completed This may be the address of the instruction that caused the interrupt or the next one as in the case of a system call interrupt The instruction addressed can be determined from the interrupt type and status bits This address is used to resume instruction processing in the interrupted process typically when an rfi instruction is executed The SRRO register is shown in Figure 5 1 SPR 26 Access Supervisor read write 0 a R W SRRO holds EA for resuming program execution Reset All zeros Figure 5 1 Machine Status Save Restore Register 0 SRRO The save restore register 1 SRR1 is used to save machine status the contents of the MSR on interrupts and to restore those values when rfi is executed SRR1 is shown in Figure 5 2 SPR 27 Access Supervisor read write Interrupt specific information and MSR bit values Reset All zeros Figure 5 2 Machine Status Save Restore Register 1 SRR1 e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Interrupts and Exceptions Typically when an interrupt occurs SRR1 0 15 are loaded with interrupt specific information and bits 16 31 of MSR are placed into the correspo
461. oad miss operation including for lwarx uses the READ bus transaction rather than a RWITM bus transaction as is required for the MEI protocol The debt instruction causes a READ transaction and the debtst instruction causes a RWITM transaction Note that MMU translation should be enabled when using these instructions In e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Instruction and Data Cache Operation addition instruction fetches are performed as READ transactions and signal the g b pin according to the true WIMG status including signaling global when the MMU translation is disabled real addressing mode 4 4 2 2 1 MESI State Transitions Figure 4 6 shows the state transitions when the core is configured for MESI coherency protocol HID2 MESI 1 Figure 4 6 assumes that the WIM bits for the page or block are set to 001 that is write back caching not inhibited and memory coherency required snoop w push hit read hit snoop w flush hit CA read miss snoop whkill hit snoop w push hit snoop w push hit Shared signaled CAMWTICI write hit RO line fill snoop w flush hit snoop w flush hit snoop w kill hit snoop whkill hit CA write miss CA read miss RWITM RWITM RO line fill line fill Modified CA write hit no broadcast Exclusive read hit CA WT CI write hit read hit WT CI write hit CA Caching Allowed WT Write Through Cl Caching Inhibited RO Read o
462. ock address translation IBAT and data block address translation DBAT registers After the BAT registers have been set up the MMU must be enabled The following assembly code enables both instruction and data memory address translation Enable instruction and data memory address translation This corresponds to setting IR and DR in the MSR bits 26 amp 27 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Instruction and Data Cache Operation mfmsr rl ori rli ri Ox0030 sync mtmsr rl isync 4 10 3 1 3 Disabling Interrupts for Data Cache Locking To ensure that interrupt handler routines do not execute while the cache is being loaded which could possibly pollute the cache with undesired contents all interrupts must be disabled This is accomplished by clearing the appropriate bits in the machine state register MSR See Table 4 15 for the bits within the MSR that must be cleared to ensure that interrupts are disabled Table 4 15 MSR Bits for Disabling Interrupts Bits Name Description 16 EE External interrupt enable 19 ME Machine check enable 20 FEO Floating point exception mode 0 23 FE1 Floating point exception mode 1 24 CE Critical interrupt enable 1 The floating point exception may not need to be disabled because the example code shown in this document that performs cache locking does not execute any floating point operations Th
463. of the device save the device configuration parameters and put the device into a power saving mode Furthermore every time the device driver is called it needs to check the power status of the device and restore the device to the full on state if the device is in a power saving mode 9 2 Dynamic Power Management Dynamic power management DPM automatically powers up and down the individual execution units of the core based on the contents of the instruction stream For example if no floating point instructions are being executed the floating point unit is automatically powered down Note that floating point instructions are not supported in the e300c2 Power is not actually removed from the execution unit instead each execution unit has an independent clock input which is automatically controlled on a clock by clock basis Because CMOS circuits consume negligible power when they are not switching stopping the clock to an execution unit effectively eliminates its power consumption The operation of DPM is completely transparent to software or any external hardware Dynamic power management is enabled by setting HIDO DPM on power up following a hard reset sequence hreset 9 3 Programmable Power Modes Hardware can enable a power management state through external asynchronous interrupts The hardware interrupt causes the transfer of program flow to interrupt handler code The appropriate mode is then set by the software The core provides
464. of the instruction timing provided by the superscalar parallel execution supported by the PowerPC architecture and the e300 core e Section 1 1 6 Bus Interface Unit BIU describes the signals implemented on the core The e300 core is a high performance superscalar processor core The PowerPC architecture allows optimizing compilers to schedule instructions to maximize performance through efficient use of the PowerPC instruction set and register model The multiple independent execution units allow compilers to optimize instruction throughput Compilers that take advantage of the flexibility of the PowerPC architecture can additionally optimize system performance The following sections summarize the features of the core including both those that are defined by the architecture and those that are unique to the various core implementations Specific features of the core are listed in Section 1 1 1 Features 1 3 1 Register Model The PowerPC architecture defines register to register operations for most computational instructions Source operands for these instructions are accessed from the registers or are provided as immediate values embedded in the instruction opcode The three register instruction format allows specification of a target register distinct from the two source operands Load and store instructions transfer data between registers and memory The e300 core has two levels of privilege supervisor mode of operation
465. oint inexact exception enable 29 NI Floating point non IEEE mode If this bit is set results need not conform with IEEE standards and the other FPSCR bits may have meanings other than those described here If the bit is set and if all implementation specific requirements are met and if an IEEE conforming result of a floating point operation would be a denormalized number the result produced is zero retaining the sign of the denormalized number Any other effects associated with setting this bit are described in the user s manual for the implementation Effects of the setting of this bit are implementation dependent 30 31 RN Floating point rounding control 00 Round to nearest 01 Round toward zero 10 Round toward infinity 11 Round toward infinity The remaining user level registers are SPRs Note that the PowerPC architecture provides a separate mechanism for accessing SPRs the mtspr and mfspr instructions These instructions are commonly used to explicitly access certain registers while other SPRs may be accessed as the side effect of executing other instructions XER register XER The 32 bit XER indicates overflow and carries for integer operations It is set implicitly by many instructions e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Register Model Link register LR The 32 bit LR provides the branch target address for the Branch Conditional to Link
466. oint status and control FPSCR 5 1 save restore registers SRR 2 10 5 8 5 9 5 10 5 14 5 15 5 16 5 17 SPRGan 2 10 2 11 2 22 5 11 machine state register MSR 4 34 MMU registers 2 9 2 18 2 21 6 29 6 33 data block address translation regs DBATnU L 2 9 2 11 2 20 2 21 data TLB compare register DCMP 2 18 6 32 6 34 data TLB miss address reg DMISS 2 18 6 31 6 34 hash address regs primary secondary HASHn 2 19 instruction block address translation regs IBATnU L 2 9 2 11 2 20 2 21 instruction TLB compare register ICMP 2 18 6 32 6 34 instruction TLB miss address reg IMISS 2 18 6 31 6 34 primary secondary hash addr HASH1 HASH2 6 32 required physical address register RPA 2 20 SDRI1 2 9 segment registers SRn 2 10 performance monitor 11 7 counter registers PMCO 3 11 5 global control 0 PMGCO0 11 3 local control A PMLCa0 PMLCa3 11 4 user counter registers UPMCO0 3 11 7 user global control 0 UPMGCO 11 4 user local control A UPMLCa0 UPMLCa3 11 5 summary 2 1 supervisor level reg summary 2 6 time base facility TBL TBU for reading 1 19 2 6 for writing 2 10 user level reg summary 2 3 XER 32 bit 2 5 Rename registers definition 7 2 operation 7 16 Reservation station definition 7 2 Reservations memory with lwarx and stwex 8 7 Reset hard reset sequence 9 1 reset exception 5 7 settings caused by hard reset 5 19 soft reset 5 18 5 19 5 20 s
467. omatically sets MSR TGPR for these cases allowing these interrupt handlers to have four registers that are used as scratchpad space without having to save or restore this part of the machine state that existed when the interrupt occurred Note that MSR TGPR is cleared when the rfi instruction is executed because the old MSR value with MSR TGPR 0 saved in SRR1 is restored Refer to Section 6 5 2 2 Software Table Search Operation for code examples that take advantage of these registers e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Memory Management e Also the core automatically saves the values of CR CRO of the executing context to SRR1 0 3 Thus the interrupt handler can set CR CRO bits and branch accordingly in the interrupt handler routine without having to save the existing CR CRO bits However the interrupt handler must restore these bits to CR CRO before executing the rfi instruction or branching to the DSI or ISI interrupt handler In addition SRR1 CRFO must be cleared before branching to the DSI interrupt handler on a data access page fault For an instruction access page fault SRR1 0 2 3 must be cleared before branching to the ISI handler See Figure 6 17 for synthesizing a page fault interrupt when no PTE is found e SRR1 D I identifies an instruction or data miss and SRR1 L S identifies a load or store miss SRRI WAY identifies the associativity class
468. omplete and the branch guess that forced this instruction to be predicted was resolved to be incorrect e Soft reset The latency of a soft reset interrupt is affected by recoverability The time to reach a recoverable state may depend on the time needed to complete or except an instruction at the point of completion the time needed to drain the completed store queue see Section 7 1 Terminology and Conventions for the definition and the time waiting for a correct empty state so that a valid MSR IP may be saved For lower priority externally generated interrupts a delay may be incurred waiting for another interrupt generated while reaching a recoverable state to be serviced Further delays are possible for other types of interrupts depending on the number and type of instructions that must be completed before those interrupts may be serviced See Section 5 1 2 Summary of Front End Interrupt Handling to determine possible maximum latencies for different interrupts 5 5 Interrupt Definitions Table 5 10 shows all the types of interrupts that can occur with the e300 core and the MSR bit settings when the processor transitions to supervisor mode The state of these bits prior to the interrupt is typically stored in SRR1 or CSRR1 for critical interrupts on the e300 core Note that MSR CE is cleared for the system reset machine check and critical interrupts Table 5 10 MSR Setting Due to Interrupt
469. on 9 Set if a data address breakpoint interrupt occurs when the data 0 28 in the DABR or DABR2 matches the next data access load or store instruction to complete in the completion unit The different breakpoints are enabled as follows e Write breakpoints enabled when DABR 30 is set e Read breakpoints enabled when DABR 31 is set e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Overview Table 1 2 Exceptions and Interrupts continued Interrupt Type ISI Vector Offset hex 00400 Exception Conditions Caused when an instruction fetch cannot be performed for any of the following reasons e The effective logical address cannot be translated That is there is a page fault for this portion of the translation so an ISI interrupt must be taken to load the PTE and possibly the page into memory e The fetch access violates memory protection indicated by SRR1 4 set If the key bits Ks and Kp in the segment register and the PP bits in the PTE are set to prohibit read access instructions cannot be fetched from this location External interrupt 00500 Caused when MSRI EE 1 and the int signal is asserted Alignment 00600 Caused when the core cannot perform a memory access for any of the reasons described below e The operand of a floating point load or store instruction is not word aligned e The operands of Imw stmw Iwarx and stwex
470. on 3 passes into the last FPU execute stage Note that all three FPU stages are full To allow for the potential need for denormalization the dispatch logic prevents instruction 7 fadd from being dispatched in the next clock cycle In cycle 4 target instructions 8 and 9 are fetched Instruction 1 completes in cycle 4 allowing instruction 2 which had finished executing in the previous clock cycle to be removed from the CQ Instruction 6 replaces instruction 3 in the first stage of the FPU Also as will be shown in cycle 5 a single cycle stall occurs when the FPU pipeline is full In cycle 5 instruction 3 completes instruction 6 continues through the FPU pipeline and although the first stage of the FPU pipeline is free instruction 7 cannot be dispatched because of the potential need for one of the previous floating point instructions to require denormalization Because instruction 7 cannot be dispatched neither can instruction 8 This dispatch stall causes the instruction queue to become full when instructions 10 and 11 are fetched In cycle 6 instruction 12 is fetched Instruction 7 is dispatched to the first FPU stage so instruction 8 can also be dispatched to the IU Instructions 9 and 10 move to IQO and IQ1 but because instructions 9 10 and 11 are integer instructions only one instruction is dispatched in each of the next two clock cycles Note that moving instruction 12 fadd up further in the program flow would improve dispatch
471. on E E a oa E skew a 4 11 Load and Store Coherency Summarg 4 12 Coherency in Single Processor Systems eet gue d ers EEN gen 4 12 Core Initiated Load Store Operatons 4 13 Performed Loads and Stores isi e s1sieecdeetiest ncsraciaeseetieaetetitiesleatiecta eu deleeeapeaele 4 13 Sequential Consistency of Memory Accesses ceesceceeccecseececeeeeeceeeeeneecsteeeees 4 13 Enforcing Load Store Ordering ag 0 ose Go sled deed ade ada acoseA 4 14 Een ETH eer EE 4 14 Cache EE 4 14 Cache Control Parameters in HIDO and HID2 000 0 eee ceeneeeeeeeeceeeeeceeeeeeeneeeens 4 14 Cache Parity Error Reporting HIDO ECPE cee eeeeeeeneeceeeceeeeeeeesteeeeeaeees 4 14 Data Cache Enable HIDO DCE 00 0000 ecececesessessseseseeesesesesesesesesenene 4 14 Data Cache Lock HIDOIDELOCKT cc ccccessssccecececeesessnseceeeseesesensnteaeees 4 15 Data Cache Wav Jock HUDOIDWELCKRT ce eeccceesseeesseceesteeeenaeeeenaeeesaeeeeaees 4 15 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor vii Paragraph Page Number Title Number 4 5 1 5 Data Cache Flash Invalidate HIDO DCF 0 000 0 eeeeeeeeeeeeeseseeeeeseneee 4 15 4 5 1 6 Instruction Cache Enable HIDO ICE 0 000 ee eceeceeesesesseeseseseseseseseseseeens 4 16 4 5 1 7 Instruction Cache Lock HIDO ILOCK 0 ec cceceeeeeeessesesesesesesesesenene 4 16 4 5 1 8 Instruction Cache Way Lock HID2 TWLCK eee eeecceceececseeeeeeteeeesneeeesaeees 4 16
472. on is incorrect when the branch is resolved the IQ and all subsequently executed instructions are purged instructions executed before the predicted branch are allowed to complete and instruction fetching resumes down the correct path There are several situations where instruction sequences create dependencies that prevent a branch instruction from being resolved immediately thereby causing execution of the subsequent instruction stream based on the predicted outcome of the branch instruction The instruction sequences and the resulting action of the branch instruction is described as follows e An mtspr LR followed by a belr Fetching is stopped and the branch waits for the mtspr to execute e An mtspr CTR followed by a bectr Fetching is stopped and the branch waits for the mtspr to execute e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Instruction Timing e An mtspr CTR followed by a be CTR Fetching is stopped and the branch waits for the mtspr to execute Note Branch conditions can be a function of the CTR and CR if the CTR condition is sufficient to resolve the branch then a CR dependency is ignored e A be CTR followed by another be CTR Fetching is stopped and the second branch waits for the first to be completed e A be CTR followed by a bectr Fetching is stopped and the bectr waits for the first branch to be completed e A branch LK 1 followed by a branch
473. on of the instructions however may indirectly cause bus transactions to be performed or their completion may be linked to the bus Table 4 8 summarizes how these instructions may operate with respect to the CSB Table 4 8 e300 Bus Operations Caused by Cache Control Instructions Operation Cache State Next Cache State CSB Operation Comment sync Don t care No change None Waits for data path related queues to complete bus activity icbi Don t care l None Seng dcbi Don t care l None or Kill block Broadcast dependent on whether HIDO ABE is set in MEI mode dcbf M l Write with kill Block is pushed dcbf E S l None or Write Broadcast dependent on whether HIDO ABE is set in MEI mode dcbst M E MEI mode Write Block is pushed S MESI mode dcbst ES No change None dcbz E M None Clear all bytes in the block dcbz S M Kill block Invalidate cache line Allocate the tag Clear all bytes in the block dcbz l M Kill block dcbt M E No change None deht l No change Read with intent Load the block into cache to modify MEI Read MESI dcbtst l No change Read with intent Load the block into cache to modify dcbtst E M No change None Table 4 8 assumes that the WIM bits are set to 001 that is since the cache is operating in write back mode caching is permitted and coherency is enforced Table 4 8 does not include caching inhibited or write through cases nor does i
474. on to signals Change Table 2 10 description of manufacturer ID to Freescale from Motorola Add statements that reserved bits should be cleared for future compatibility Before the description of the machine state register add table entitled Assigned PVR Values to show PVR values of different processors Add footnote to Table 2 2 MSR Bit Settings to show that reserved bits must be set to zero Remove bullet describing the EAR register and the eciwx ecowx instructions no longer supported e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 2 1 2 3 2 14 2 1 2 12 2 20 3 2 4 3 6 3 20 3 2 5 3 3 28 3 2 5 3 3 28 3 2 5 3 3 28 3 2 6 3 1 3 32 Chapter 4 4 2 4 3 4 3 4 4 4 6 7 4 23 4 10 3 1 7 4 38 4 10 3 2 6 4 41 Chapter 5 5 1 5 4 5 1 1 5 6 5 1 1 5 7 5 2 1 1 5 10 5 2 1 2 5 11 5 5 2 5 21 5 5 3 5 24 5 5 3 5 25 Revision History Change the name of bit 7 in Table 2 6 from MESI to MESISTATE Change SVR value in Figure 2 17 System Version Register SVR to show that it is determined by the SoC Remove third bullet on page noting that Imw and stmw cause an alignment exception when in true little endian mode Remove statement that cache control instructions that deal with direct store segments are treated as no ops since direct stores are no longer supported Remove statements noting that incoherency may occur if a write through store is followed by a debz i
475. onductor 25 Register Model Note that if the DABR DABR2 register values are set to match on any interrupt vector an indeterminate or unrecoverable processor state may occur Table 2 15 describes the fields in DABR and DABR2 Table 2 15 Data Address Breakpoint Registers DABR and DABR2 Bit Settings Bits Name Description 0 28 CEA Data address breakpoint 29 BT Breakpoint translation enable Match if MSR DR DABR BT 30 WBE Data write enable Matching on data writes enabled 31 RBE Data read enable Matching on data reads enabled A data address breakpoint match is detected for a load or store instruction if the following conditions are met for any byte accessed e EAO EA28 DABR CEA e MSR DR DABR BT e The instruction is a store and DABR WBE 1 or the instruction is a load and DABR RBE 1 Even if the above conditions are satisfied it is undefined whether a match occurs in the following cases e A store conditional indexed instruction stwex in which the store is not performed e A load or store string instruction Iswx or stswx with a zero length e A dcbz dcba eciwx or ecowx instruction For the purpose of determining whether a match occurs eciwx is treated as a load and dcbz dcba and ecowx are treated as stores Note that eciwx and ecowx may generate illegal transactions in some implementations and are not supported in the e300 The cache management instructions other than dcbz and de
476. onfiguration Registers PMR 128 131 Hardware Implementation Registers SPR 1008 SPR 1009 SPR 1011 Memory Management Registers Instruction BAT Registers SPR 528 SPR 529 SPR 530 SPR 531 SPR 532 SPR 533 SPR 534 SPR 535 IBAT4U SPR 560 IBATAL 1 SPR 561 SPR 562 SPR 563 SPR 564 SPR 565 SPR 566 SPR 567 Machine State Register Register MBAR SPR311 Data BAT Registers DBATOU SPR 536 DBATOL SPR 537 DBAT1U SPR 538 DBATIiL SPR539 DBAT2U SPR 540 DBAT2L SPR541 DBAT3U SPR 542 DBAT3L SPR 543 DBAT4U SPR 568 DBAT4L SPR 569 DBAT5U SPR 570 DBAT5L SPR 571 DBAT6U SPR 572 DBAT6L SPR 573 DBAT7U SPR 574 DBAT7L SPR575 Interrupt Handling Registers SPRGs DSISR SPR 272 DSISR SPR 18 SPR 273 SPR 274 SPR 275 SPR 276 Save and Restore Registers SRRO SPR 26 Data Address Register Memory Base Address System Processor Version Register Software Table Search Registers SPR 976 SPR 977 SPR 978 SPR 979 SPR 980 SPR 981 SPR 982 Segment Registers Miscellaneous Registers Decrementer Time Base Facility For Writing TBL SPR 284 TBU SPR 285 SPR277 DAR sPR19 Breakpoint Registers SPR 278 SPR 279 Critical Interrupt Registers CSRRO SPR 58 CSRR1 SPR59 Instruction Data Address Breakpoint Register IABR SPR 1010 IABR2 SPR 1018 DABR SPR 1013 DABR2 SPR 317 Instruction Data Address Breakpoint Control
477. onfiguring pll_cfg 0 6 to PLL bypass mode then disabling sysclk 11 DPM Dynamic power management enable 0 Dynamic power management is disabled 1 Functional units enter a low power mode automatically if the unit is idle This does not affect operational performance and is transparent to software or any external hardware 12 15 Reserved should be cleared 16 ICE Instruction cache enable 0 The instruction cache is neither accessed nor updated All pages are accessed as if they were marked cache inhibited WIM x1x Potential cache accesses from the bus snoop and cache operations are ignored In the disabled state for the L1 caches the cache tag state bits are ignored and all instruction fetches are propagated to the coherent system bus CSB as single beat or burst transactions depending on the value of HID2 IFEB For those transactions however ci reflects the state of the bit in the MMU for that page regardless of cache disabled status ICE is zero at power up 1 The instruction cache is enabled 17 DCE Data cache enable 0 The data cache is neither accessed nor updated All pages are accessed as if they were marked cache inhibited WIM x1x Potential cache accesses from the bus snoop and cache operations are ignored In the disabled state for the L1 caches the cache tag state bits are ignored and all data read and write accesses are propagated to the CSB as single beat transactions For those t
478. ons in program order before the interrupt causing instruction are completed The interrupt is then taken without completing the interrupt causing instruction If any other exception condition is created in completing these previous instructions in the machine that interrupt takes priority over the pending instruction dispatch execution interrupt which is then forgotten Post instruction execution trace This type of interrupt is generated following execution and completion of an instruction while a trace mode is enabled If executing the instruction produces e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Interrupts and Exceptions conditions for another type of interrupt that interrupt is taken and the post instruction execution interrupt is forgotten for that instruction 5 2 Interrupt Processing When an interrupt is taken the processor uses the save restore registers SRRO and SRR1 to save the contents of the machine state register for user level mode and to identify where instruction execution should resume after the interrupt is handled 5 2 1 Interrupt Processing Registers The e300 core implements the SRRO and SRR1 registers that are used for saving processor state on an interrupt Additionally the e300 core implements CSRRO and CSRR1 to specifically save state for critical interrupt interrupts 5 2 1 1 SRRO and SRR1 Bit Settings When an interrupt occurs SRRO is set to point t
479. ons in the execution units the interrupt is taken immediately upon determination of the correct restart address for loading SRRO e Asynchronous nonmaskable The system reset and the machine check interrupt are nonmaskable asynchronous interrupts They may not be recoverable or they may provide a limited degree of recoverability All interrupts report recoverability through MSR RI e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Overview 1 3 4 2 Implementation Specific Interrupt Model As specified by the PowerPC architecture all interrupts can be described as either precise or imprecise and either synchronous or asynchronous Asynchronous interrupts some of which are maskable are caused by events external to the processor s execution synchronous interrupts which are all handled precisely by the e300 core are caused by instructions A system management interrupt is an implementation specific interrupt The interrupt classes are shown in Table 1 1 The interrupts are listed in Table 5 3 in order of highest to lowest priority Table 1 1 Interrupt Classifications Synchronous Asynchronous Precise Imprecise Interrupt Type Asynchronous nonmaskable Imprecise Machine check System reset Asynchronous maskable Precise External interrupt Decrementer System management interrupt Critical interrupt Synchronous Precise Instruction caused interrupts Although
480. operation Maybe No No No 4 Out of order store operation for instructions that will cause no other kind of Maybe No No No precise interrupt in the absence of system caused imprecise or floating point assist interrupts 5 All other out of order store operations Maybe No Maybe No 6 Zero length load Iswx Maybe Yes No No 7 Zero length store stswx Maybe Yes Maybe Yes 8 Store conditional stwex that does not store Maybe Yes Maybe Yes 9 In order instruction fetch Yes Yes No No 10 Load instruction or eciwx Yes Yes No No 11 Store instruction ecowx or debz instruction Yes Yes Yes Yes 12 debt debtst dcebst or debf instruction Maybe Yes No No e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Memory Management Table 6 8 Model for Guaranteed R and C Bit Settings continued R Bit Set C Bit Set Priority Scenario OEA G2Core OEA G2 Core 13 icbi instruction Maybe No No No 14 debi instruction Maybe Yes Maybe Yes 1 Floating point not supported on the e300c2 2 If Cis set Ris guaranteed to also be set 3 This includes the case when the instruction was fetched out of order and R was not set does not apply for the e300 core 4 Not supported on the e300 core 5 The dcbi instruction should never be used on the e300 core For more information see Page History Recording in Chapt
481. or Lookaside Buffers in Chapter 2 Register Set in the Programming Environments Manual e300 Power Architecture Core Family Reference Manual Rev 3 44 Freescale Semiconductor Instruction Timing Chapter 7 Instruction Timing This chapter describes how the e300 core processor fetches dispatches and executes instructions and how it reports the results of instruction execution It gives detailed descriptions of how the core execution units work and how those units interact with other parts of the processor such as the instruction fetching mechanism register files and caches It gives examples of instruction sequences showing potential bottlenecks and how to minimize their effects Finally it includes tables that identify the unit that executes each instruction implemented on the core the latency for each instruction and other information that is useful for the assembly language programmer 7 1 Terminology and Conventions This section provides an alphabetical glossary of terms used in this chapter These definitions are provided as a review of commonly used terms and as a way to point out specific ways these terms are used in this chapter e Branch prediction The process of guessing whether a branch will be taken Such predictions can be correct or incorrect the term predicted as it is used here does not imply that the prediction is correct successful The PowerPC architecture defines a means for static branch predict
482. ord of pte used load on data store or store with on instruction fetch SE Se Se Se SE DE DE Se 5e SQ 5Q Se se Se Sse Se CESTE CF CY CY IT EI fh er er CET Ee EF ET machine e300 place labels for rel branches t H ae LL e P3 3 dMiss 976 dcmp 977 hashl 978 hash2 979 iMiss 980 icmp 981 rpa 982 CD D dar 19 dsisr 18 srr0 26 SE a 2I csect tlbmiss PR vec0 globl vec0 JOT vec300 org vec400 Entry vec0 0x300 vec0 0x400 Vec 1000 rr srrl msr lt tgpr gt gt iMiss gt iCmp gt hashl gt hash2 gt Instruction TB miss flow q ea that missed the compare value for the va that missed gt address of instruction that missed gt 0 3 cr0 4 lru way bit 16 31 pointer to first hash pteg pointer to second hash pteg saved MSR address compare value compare value by tlblx tlb change bit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 39 Memory Management Register usage r0 is saved counter rl is junk r2 is pointer to pteg r3 is current compare value Org vec0 0x1000 tlbInstrMiss mfspr r2 hashl get first pointer addi E a 5 load 8 for counter mfctr r0 save counter mfspr r3 iCmp get first compare value addi D er r2 8 pre dec the pointer im0 mtctr rl load counter iml lwzu
483. ori rl rl rl 0x0030 e300 Power Architecture Core Family Reference Manual Rev 3 40 Freescale Semiconductor Instruction and Data Cache Operation sync mtmsr rl isync 4 10 3 2 3 Disabling Interrupts for Instruction Cache Locking To ensure that interrupt handler routines do not execute while the cache is being loaded which could possibly pollute the cache with undesired contents all interrupts must be disabled This is accomplished by clearing the appropriate bits in the machine state register MSR See Table 4 19 for the bits within the MSR that must be cleared to ensure that interrupts are disabled Table 4 19 MSR Bits for Disabling Interrupts Bit Name Description 16 EE External interrupt enable 19 ME Machine check enable 20 FEO Floating point exception mode 0 23 FE1 Floating point exception mode 1 24 CE Critical interrupt enable The floating point exception may not need to be disabled because the example code shown in this document that performs cache locking does not execute any floating point operations The following assembly code disables all asynchronous interrupts Clear the following bits from the MSR EE 16 ME 19 FEO 20 FEL 23 ME 24 mfmsr ed lis r2 OxFFFF ori r2 r2 Ox667F and El Eby E2 sync mtmsr el isync 4 10 3 2 4 Preloading Instructions into the Instruction Cache To optimize performance processors that impl
484. ose registers that serve a variety of functions such as providing controls indicating status configuring the core and performing special operations During normal execution a program can access the registers as shown in Figure 1 4 depending on the program s access privilege supervisor or user determined by the privilege level bit MSR PR Note that GPRs and FPRs are accessed through operands that are part of the instructions Access to registers can be explicit that is through the use of specific instructions for that purpose such as Move to Special Purpose Register mtspr and Move from Special Purpose Register mfspr instructions or implicit as the part of the execution of an instruction Some registers are accessed both explicitly and implicitly In the e300 core all SPRs are 32 bits wide The following SPRs are accessible by user level software e Link register LR The LR can be used to provide the branch target address and to hold the return address after branch and link instructions The LR is 32 bits wide in 32 bit implementations e Count register CTR The CTR is decremented and tested automatically as a result of branch and count instructions The CTR is 32 bits wide in 32 bit implementations e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Overview e XER register The 32 bit XER contains the summary overflow bit integer carry bit overflow bit and a field specifyin
485. otection ege geed 6 23 MISTS DIE SCE PLO ead saad cats EI EE AE E E as ncs tee San 6 23 TEB Organization EE 6 23 TEB Entry Invalido einan a e E a E 6 25 Page Address Translation Summary 6 25 Page E RE 6 25 Page Table Search Operation Conceptual FLOW 00 eee eeceeeceeseecsseceseeeeeeeeaeeeaeens 6 25 Implementation Specific Table Search Operation cc ceeeeeeeseesseceeeeeeeeeeseeesaeens 6 29 Resources for Table Search Operations 0 ecccceesceceseceeeececeeeeeceeeeeeeseeeeseeeeenaeees 6 29 Data and Instruction TLB Miss Address Registers DMISS and IMISS 6 31 Data and Instruction TLB Compare Registers DCMP and CMD 6 32 Primary and Secondary Hash Address Registers HASH1 and HASH2 6 32 Required Physical Address Register RA 6 33 Software Table Search Operation ege ti acceae cote eae enee eier 6 34 Flow for Example Interrupt Handlerz ee eeececeeeeeceeececeeeeeesneeeeneeeenaeeeenaes 6 34 Code for Example Interrupt Handlers AA 6 38 Page Table Updates ccsicic wine eee eee ha ele eal eee Ue eels 6 44 Segment Register E EE 6 44 Chapter 7 Instruction Timing erreneren duet cee t vat iepara ae cieee neste tenia 7 1 Instruction Timing Overview avciiccisccccesassescavessenaiadaseeyedeasvesuaeesntsdeonswesaatesaveaecasnscnedesesenndeds 7 3 Taming eet 7 9 General Tnstr cti e 7 9 Instruction Fetch TER 7 10 ee 7 10 C che EE 7 11 ELE 7 14 Instruction Dispatch and Completion Considerations ce
486. ough accesses except when the store instructions are separated by a sync or eieio instruction the e300 core does not implement this combined store capability A store operation that uses the write through attribute may cause any part of valid data in the cache to be written back to main memory The eieio instruction is treated as a no op by the e300 core The definition of the external memory location to be written to in addition to the on chip cache depends on the implementation of the memory system and can be illustrated by the following examples e RAM The store is sent to the RAM controller to be written into the target RAM e I O device The store is sent to the memory mapped I O control hardware to be written to the target register or memory location e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Instruction and Data Cache Operation In systems with multilevel caching the store must be written to at least a depth in the memory hierarchy that is seen by all processors and devices Accesses that correspond to W 0 are considered write back For this case although the store operation is performed to the cache it is only made to external memory when a copy back operation is required Use of the write back mode W 0 can improve overall performance for areas of the memory space that are seldom referenced by other masters in the system 4 4 1 2 Caching Inhibited Attribute I If I 1
487. our blocks per set The organization of the e300c2 and e300c3 data cache is shown in Figure 4 2 Block 0 Block 1 Words 0 7 Block 2 Address Tag 2 Words 0 7 Block 3 Address Tag 3 Words 0 7 if 8 Words Block gt Figure 4 2 e300c2 and e300c3 Data Cache Organization Each block consists of 32 bytes of data 32 parity bits state bits and an address tag Each cache block contains eight contiguous words from memory that are loaded from an eight word boundary that is bits e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 3 Instruction and Data Cache Operation A27 A31 are not part of the cache block address thus a cache block never crosses a page boundary Misaligned accesses across a page boundary can incur a performance penalty Bits A20 A26 provide an index to select a set Bits A27 A31 select a byte within a block The tags consists of bits PAO PA19 Address translation occurs in parallel such that higher order bits the tag bits in the cache are physical The state bits implement either a standard four state MESI coherency protocol or a three state MEI protocol Cache coherency is enforced by on chip bus snooping logic Since the data cache tags are single ported a load or store simultaneous with a snoop access represents a resource contention The snoop access is given first access to the tags Load or store operatio
488. out the debug features of the e300 core This chapter also describes trace facility debug features for e300 core Appendix A Instruction Set Listings lists all the PowerPC instructions while indicating those instructions that are not implemented by the e300 core it also includes the instructions that are specific to the e300 core Instructions are grouped according to mnemonic opcode function and form Also included is a quick reference table that contains general information such as the architecture level privilege level and form and indicates if the instruction is 64 bit and optional Appendix B Instructions Not Implemented provides a list of the 32 and 64 bit instructions that are not implemented in the e300 core Appendix C Revision History lists the major differences between revisions of this reference manual This reference manual also includes a glossary and an index Suggested Reading This section lists additional reading that provides background for the information in this reference manual as well as general information about the PowerPC architecture General Information The following documentation available through Morgan Kaufmann Publishers 340 Pine Street Sixth Floor San Francisco CA provides useful information about the PowerPC architecture and computer architecture in general The PowerPC Architecture A Specification for a New Family of RISC Processors Second Edition by Internationa
489. owing list of interrupt categories describes how the e300 core handles interrupts up to the point of signaling the appropriate interrupt to occur Note that a recoverable state is reached if the completed store queue is empty drained not canceled and any instruction that is next in program order and has been signaled to complete has completed If MSR RI is clear the core is in a nonrecoverable state by default Also completion of an instruction is defined as performing all architectural register writes associated with that instruction and then removing that instruction from the completion buffer queue Asynchronous nonmaskable nonrecoverable system reset caused by the assertion of hreset These interrupts have highest priority and are taken immediately regardless of other pending interrupts or recoverability A nonpredicted address is guaranteed Asynchronous maskable nonrecoverable machine check A machine check interrupt takes priority over any other pending interrupt except a nonrecoverable system reset caused by the assertion of either reset or internally during POR A machine check interrupt is taken immediately regardless of recoverability A machine check interrupt can occur only if the machine check enable bit MSR ME is set If MSR ME is cleared the processor goes directly into checkstop state when a machine check exception condition occurs A nonpredicted address is guaranteed Asynchronous nonmaskable recoverable system
490. page is in copy back mode data being stored to that page is written only to the on chip cache If a page is in write through mode writes to that page update the on chip cache on hits and always update main memory If a page is cache inhibited data in that page will never be stored in the on chip cache All three of these modes of operation have advantages and disadvantages A decision as to which mode to use depends on the system environment as well as the application The following sections describe how performance is impacted by each memory update mode For details about the operation of the on chip cache and the memory update modes see Chapter 4 Instruction and Data Cache Operation e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Instruction Timing 7 5 1 Copy Back Mode When data is stored in a location marked as copy back store operations for cacheable data do not necessarily cause an external bus cycle to update memory Instead memory updates only occur on modified line replacements cache flushes or when another processor attempts to access a specific address for which there is a corresponding modified cache entry For this reason copy back mode may be preferred when external bus bandwidth is a potential bottleneck for example in a multiprocessor environment Copy back mode is also well suited for data that is closely coupled to a processor such as local variables If more than
491. pair B BAT block address translation mechanism A software controlled array that stores the available block address translations on chip Beat A single state on the e300 bus interface that may extend across multiple bus cycles A e300 transaction can be composed of multiple address or data beats Biased exponent An exponent whose range of values is shifted by a constant bias Typically a bias is provided to allow a range of positive values to express a range that includes both positive and negative values Big endian A byte ordering method in memory where the address n of a word corresponds to the most significant byte In an addressed memory word the bytes are ordered left to right 0 1 2 3 with 0 being the most significant byte See Little endian Block An area of memory that ranges from 128 Kbytes to 256 Mbytes whose size translation and protection attributes are controlled by the BAT mechanism Boundedly undefined A characteristic of certain operation results that are not rigidly prescribed by the PowerPC architecture Boundedly undefined results for a given operation may vary among implementations and between execution attempts in the same implementation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 1 Although the architecture does not prescribe the exact behavior for when results are allowed to be boundedly undefined the results of executing instructions in co
492. pendent instruction and data block address translation IBAT and DBAT arrays each containing 8 pairs of BATs for a total of 16 BAT registers Effective addresses are compared simultaneously with all eight entries in the BAT array during block translation Figure 2 1 lists SPR numbers for the BAT registers SDRI1 The SDR1 register specifies the page table base address used in virtual to physical address translation Note that physical address is referred to as real address in the architecture specification e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Register Model Segment registers SRs The OEA defines sixteen 32 bit segment registers SRO SR15 The fields in the segment register are interpreted differently depending on the value of bit 0 Interrupt handling registers Data address register DAR After a data access or an alignment interrupt the DAR is set to the effective address generated by the faulting instruction The SPRGO SPRG7 registers are provided for operating system use which reduce the latency that may be incurred because of saving registers to memory while in a handler and also assist in searching the page tables in software If software table searching is not enabled then these registers may be used for any supervisor purpose Note that the e300 core implements four additional SPRGs SPRG4 SPRG7 than previous PowerPC cores These additional registers a
493. performance monitor register that is not implemented and is not privileged PMRN 5 0 results in an illegal instruction exception type program interrupt When MSR PR 1 specifying a performance monitor register that is privileged PMRN 5 1 results in a privileged instruction execution type program interrupt When MSR PR 0 specifying an unimplemented performance monitor register is boundedly undefined Other registers altered none e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 39 Instruction Set Model mtpmr mtpmr Move To Performance Monitor Register Integer Unit mtpmr PMRN rS 01 1 1 1 1 rS PMRNS 9 PMRNoO 4 O 1 1 10 01 1 10 o 0 5 6 10 11 15 16 20 21 31 PMREG PMRN lt GPR rS PMRN denotes a performance monitor register as listed in Table 11 1 and Table 11 2 The contents of GPR rS are placed into the designated performance monitor register When MSR PR 1 specifying a performance monitor register that is not implemented and is not privileged PMRN 5 0 results in an illegal instruction exception type program interrupt When MSR PR 1 specifying a performance monitor register that is privileged PMRN 5 1 results in a privileged instruction execution type program interrupt When MSR PR 0 specifying an unimplemented performance monitor register is boundedly undefined Other registers altered none e300 Power Architecture Core Fami
494. performed on the clock cycle following the snoop The e300 core includes a new instruction cancel extension The instruction cancel extension improves utilization of the instruction cache during cancel operations It allows a new instruction fetch to be issued to the cache or to the bus if a canceled instruction fetch is pending or active on the bus This supports hit under cancel and miss under cancel instruction fetch operations 1 1 6 Bus Interface Unit BIU Because the caches are on chip write back caches the most common transactions are burst read memory operations burst write memory operations and single beat noncacheable or write through memory read and write operations There can also be address only operations variants of the burst and single beat operations for example global memory operations that are snooped and atomic memory operations and address retry activity for example when a snooped read access hits a modified cache block Memory accesses can occur in single beat 1 8 bytes and four beat burst 32 bytes data transfers on the 64 bit data bus The address and data buses operate independently to support pipelining and split transactions during memory accesses The e300 bus interface unit BIU has been enhanced to allow a pipeline slot to become available once a previous transaction has been granted the data bus that is as early as when the data tenure starts rather e300 Power Architecture Core Family Reference
495. ph Number 4 10 3 1 5 4 10 3 1 6 4 10 3 1 7 4 10 3 1 8 4 10 3 2 4 10 3 2 1 4 10 3 2 2 4 10 3 2 3 4 10 3 2 4 4 10 3 2 5 4 10 3 2 6 4 10 3 2 7 4 10 3 2 8 5 1 5 1 1 5 1 2 5 2 5 2 1 5 2 1 1 5 2 1 2 5 2 1 3 5 2 1 4 52 2 5 2 3 5 2 4 5 2 3 5 2 6 5 3 5 4 5 5 5 5 1 5 5 1 1 5 5 1 2 5 5 1 3 5 5 2 5 5 2 1 5 35 22 Page Title Number Loading the Data Cache ici seceisiscgcattassaciva teieni oiii n ai 4 37 Locking the Entire Data Cache s c csacs cecoteseqntenesteenativougccsusnaies aehanecateedundeneetoenaee 4 38 Way Locking the Data TE 4 38 Invalidating the Data Cache Even if Locked AA 4 39 Instruction Cache Lockng Drocedures 4 39 Enabling the Instruction Cache 5 2ic csssdecsesecaanaesdaceasuaded e csyanacteanedenta sucdadsvanseant 4 40 Address Translation for Instruction Cache Locking ceeeseeeeeeeeeeeeeeeeeees 4 40 Disabling Interrupts for Instruction Cache Lockmg 4 4 Preloading Instructions into the Instruction Cache 4 4 Locking the Entire Instruction Cache x4 22 3 2cch ceca etek e eect aienaases 4 43 Way Lockine the Instruction Cache 2 4 ci cncieias eit eaiendns 4 43 Invalidating the Instruction Cache Even if Locked 0 0 cee eeeeeseceseeeeeeenees 4 44 Instruction Cache Way Protection 0 ceccccessecesscecesececesececseccececeecseeeeenteeessaes 4 45 Chapter 5 Interrupts and Exceptions Joere eessen 5 2 ee ei het 5 6 Summary of Front End Interrupt Handling 5 7 Interrupt Processin
496. porary buffers used by instructions that have finished execution but have not completed e Reservation station A buffer between the dispatch and execute stages that allows instructions to be dispatched even though the results of instructions on which the dispatched instruction may depend are not available e Retirement Removal of the completed instruction from the CQ e Stage The term stage is used in two different senses depending on whether the pipeline is being discussed as a physical entity or a sequence of events In the latter case a stage is an element in the pipeline during which certain actions are performed such as decoding the instruction performing an arithmetic operation or writing back the results A stage is typically described as taking a processor clock cycle to perform its operation however some events such as dispatch and write back happen instantaneously and may be thought to occur at the end of the stage An instruction can spend multiple cycles in one stage An integer multiply for example takes multiple cycles in the execute stage When this occurs subsequent instructions may stall In some cases an instruction may also occupy more than one stage simultaneously especially in the sense that a stage can be seen as a physical resource for example when instructions are dispatched they are assigned a place in the CQ at the same time they are passed to the execute stage They can be said to occupy both the co
497. procedures described in Chapter 7 Memory Management in the Programming Environments Manual augmented with information in this chapter The memory subsystem uses the physical address for the access For a complete discussion of effective address calculation see Section 3 2 2 3 Effective Address Calculation 6 1 2 MMU Organization Figure 6 1 shows the conceptual organization of a PowerPC MMU in a 32 bit implementation note that it does not describe the specific hardware used to implement the memory management function for a particular processor Processors may optionally implement on chip TLBs and may optionally support the automatic search of the page tables for PTEs In addition other hardware features invisible to the system software not depicted in the figure may be implemented Figure 6 2 and Figure 6 3 show the conceptual organization of the core instruction and data MMUs respectively The instruction addresses shown in Figure 6 2 are generated by the processor for sequential instruction fetches and addresses that correspond to a change of program flow Data addresses shown in Figure 6 3 are generated by load and store instructions and by cache instructions As shown in the figures after an address is generated the higher order bits of the effective address EAO EA19 or a smaller set of address bits EAO EAn in the cases of blocks are translated into physical address bits PAO PA19 The lower order address bits A20
498. program interrupt is generated To invalidate all entries of both TLBs 32 tlbie instructions must be executed incrementing the value in EA 15 19 by 1 each time See Chapter 8 Instruction Set in the Programming Environments Manual for detailed information about the tlbie instruction 6 4 4 Page Address Translation Summary Figure 6 8 provides the detailed flow for the page address translation mechanism The figure includes the checking of the N bit in the segment descriptor and then expands on the TLB Hit branch of Figure 6 6 The detailed flow for the TLB Miss branch is described in Section 6 5 1 Page Table Search Operation Conceptual Flow Note that as in the case of block address translation if the debz instruction is attempted to be executed either in write through mode or as cache inhibited W 1 or I 1 the alignment interrupt is generated The checking of memory protection violation conditions for page address translation is described in Chapter 7 Memory Management in the Programming Environments Manual for 32 bit implementations 6 5 Page Table Search Operation As stated earlier the operating system must synthesize the table search algorithm for setting up the tables The core TLB miss interrupt handlers also use this algorithm with the assistance of some hardware generated values to load TLB entries when TLB misses occur as described in Section 6 5 2 Implementation Specific Table Search Operation
499. protect the system from undesired accesses caused by speculative load operations or instruction prefetches that could lead to the generation of the machine check interrupt Also the guarded bit can be used to prevent speculative load operations or prefetches from occurring to certain peripheral devices where such speculative accesses could produce undesired results for example a destructive read from a FIFO buffer The processor will perform speculative out of order accesses to this area of memory only as follows e Speculative load operations from guarded memory areas are performed only if the corresponding data is resident in the cache e The processor prefetches from guarded areas but only when required and only within the memory boundary dictated by the cache block That is if an instruction is certain to be required for execution by the program it is fetched and the remaining instructions in the block may be prefetched even if the area is guarded 4 4 1 5 W I and M Bit Combinations Table 4 1 summarizes the six combinations of the WIM bits A setting of zero or one for the G bit is allowed for each of these WIM bit combinations Table 4 1 Combinations of W I and M Bits WIM Setting Meaning 000 Data may be cached Loads or stores whose target hits in the cache use that entry in the cache Memory coherency is not enforced by hardware 001 Data may be cached Loads or stores whose targe
500. pt decrementer interrupt hard or soft reset or machine check input mcp signal A return to full power state from a nap state takes only a few processor clock cycles e Sleep Sleep mode reduces power consumption to a minimum by disabling all internal functional units then external system logic may disable the PLL and sysclk Returning the core to the full power state requires the enabling of the PLL and sysclk followed by the assertion of an external asynchronous interrupt system management interrupt hard or soft reset or mcp signal after the time required to relock the PLL e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Overview 1 1 7 2 Time Base Decrementer The time base is a 64 bit register accessed as two 32 bit registers that is incremented once every four bus clock cycles external control of the time base is provided through the time base enable tben signal The decrementer is a 32 bit register that generates a decrementer interrupt after a programmable delay The contents of the decrementer register are decremented once every four bus clock cycles and the decrementer interrupt is generated as the count passes through zero 1 1 7 3 JTAG Test and Debug Interface The core provides JTAG and hardware debug functions for facilitating board testing and chip debugging The JTAG test interface based on IEEE 1149 1 provides a means for boundary scan testing of the core and the a
501. pted by a DSI interrupt associated with the address translation of the second page In this case the core performs some or all of the memory references from the first page and none of the memory references from the second page before taking the interrupt On return from the DSI interrupt the load or store multiple instruction will re execute from the beginning For additional information refer to DSI Interrupt 0x00300 in Chapter 6 Interrupts in the Programming Environments Manual e The PowerPC architecture defines the load multiple word Imw instruction with rA in the range of registers to be loaded as an invalid form It defines the load multiple and store multiple instructions with misaligned operands that is the EA is not a multiple of four to cause an alignment interrupt The core defines the load multiple word Imw instruction with rA in the range of registers to be loaded as an invalid form e The PowerPC architecture describes some preferred instruction forms for the integer load and store multiple instructions that may perform better than other forms in some implementations None of these preferred forms affect instruction performance in the core Table 3 17 Integer Load and Store Multiple Instructions Name Mnemonic Operand Syntax Load Multiple Word Imw rD d rA Store Multiple Word stmw rS d rA 3 2 4 3 7 Integer Load and Store String Instructions The integer load and store string instru
502. ptional data cache operation broadcast feature enabled by HIDO ABE that allows for coherent system management All of the data cache control instructions except debz debi deht and dcbst require that HIDO ABE be enabled to broadcast Instruction fetch burst feature allows all instruction fetches from caching inhibited space to be performed on the bus as burst transactions Interrupts The e300 core offers hardware support for misaligned little endian accesses Little endian load store accesses that are not on a word boundary except for strings and multiples generate interrupts under the same circumstances as big endian accesses The e300 core supports true little endian mode to minimize the impact on software porting from true little endian systems e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 7 Overview An input interrupt signal cint is provided to trigger the critical interrupt exception on the e300 core Also an interrupt pm_event_int is provided on the e300c3 to support the performance monitor e Bus clock PLL configuration signals include seven signals for settings and control pll_cfg 0 6 e Debug features Breakpoint status recorded in DBCR and IBCR control registers Two signals for the debug interface stopped and ext_halt Performance monitor registers for system analysis in the e300c3 Figure 1 1 Figure 1 2 and Figure 1 3 provide block diag
503. r ptr HASH1 8 compare_value lt ICMP DCMP Read Lower Word of Next PTE from Memory ptr lt ptr 8 OPa 0 Otherwise compare_value oN temp compare_value Otherwise Read Upper Word of PTE temp lt ptr 4 Secondary Hash RPA lt temp Complete Setup for P Load Secondary etup for Page itar Instruction Access and Si PTEG Pointer temp G 1 ompare_value H lt 1 Set Counter cnt lt 8 See Figure 6 17 Otherwise lt ea gt 4 IMISS DMISS Check R C Bits and Set as Needed Setup for Protection Violation Exception See Figure 6 18 See Figure 6 16 Load TLB Entry tlbli lt ea gt or tlbld lt ea gt Restore Old Counter and CRO Bits eturn to Executing Program fi Figure 6 15 Flow for Example Software Table Search Operation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 35 Memory Management The flow for checking the R and C bits and setting them appropriately is shown in Figure 6 16 Check R C Bits and Set as Needed Handler for Data Store Op Otherwise temp C 0 Check Protection pp 10 Otherwise Bet R Bit lemp lt temp OR 0x100 10 PP pp 00 Store Byte 7 of PTE to Memory pp 11 01 ptr 2 lt temp Byte 7 Setup for Return to TLB Miss Protection Violation Interrupt Flow See Figure 6 18 See Figure 6 15 SRR1 KEY 1 Otherwi
504. r code examples that take advantage of these registers In addition the core also automatically saves the values of CR CRO of the executing context to SRR1 0 3 whenever one of the three TLB miss interrupts occurs Thus the interrupt handler can set CR CRO bits and branch accordingly in the interrupt handler routine without having to save the existing CR CRO bits However the interrupt handler must restore these bits to CR CRO before executing the rfi instruction There are also four other bits saved in SRR1 whenever a TLB miss interrupt occurs that give information about whether the access was an instruction or data access and if it was a data access whether it was for a load or a store instruction Also these bits give some information related to the protection attributes for the access and which set in the TLB will be replaced when the next TLB entry is loaded Refer to Section 6 5 2 1 Resources for Table Search Operations for more information on these bits and their use e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Memory Management 6 2 Real Addressing Mode If address translation is disabled MSR IR 0 or MSR DR 0 for a particular access the effective address is treated as the physical address and is passed directly to the memory subsystem as described in Chapter 7 Memory Management in the Programming Environments Manual Note that the default WIMG bits 0b
505. r instructions regardless of the setting of HID2 13 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Memory Management Table 6 6 MMU Registers continued Register Description SDR1 The SDR1 register specifies the variable used in accessing the page tables in memory SDR1 is defined as a 32 bit register for 32 bit implementations This is a special purpose register that is accessed by the mtspr and mfspr instructions Instruction TLB miss address When a TLB miss interrupt occurs the IMISS or DMISS register contains the 32 bit effective and data TLB miss address address of the instruction or data access respectively that caused the miss Note that the e300 registers IMISS and DMISS core always loads a big endian address into the DMISS register These registers are implementation specific Primary and secondary hash The HASH1 and HASH2 registers contain the primary and secondary PTEG addresses that address registers HASH1 correspond to the address causing a TLB miss These PTEG addresses are automatically and HASH2 derived by the core by performing the primary and secondary hashing function on the contents of IMISS or DMISS for an ITLB or DTLB miss interrupt respectively These registers are implementation specific Instruction and data PTE The ICMP and DCMP registers contain the word to be compared with the first word of a PTE in compare registers th
506. r maintaining optional TLBs bie instruction in G2 core e300 core specific 64 entry 32 entry byway two way set associative ITLB 64 entry 32 entry byway two way set associative DTLB Segment descriptors Architecturally defined Stored as segment registers on chip e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Memory Management Table 6 1 MMU Features Summary continued Architecturally Defined Feature Category e300 Core Specitic Feature Page table search e300 core specific Three MMU interrupts defined ITLB miss DTLB miss on load and support DTLB miss on store or C 0 MMU related bits set in SRR1 for these interrupts IMISS and DMISS registers missed effective address HASH1 and HASH2 registers PTEG addr ICMP and DCMP registers for comparing PTEs RPA register for loading TLBs tlbli rB instruction for loading ITLB entries tlbld rB instruction for loading DTLB entries Shadow registers for GPRO GPR3 can use r0 r3 in table search handler without corruption of r0 r3 in context that was previously executing called T PRO TGPR3 6 1 1 Memory Addressing A program references memory using the effective logical address computed by the processor when it executes a load store or cache instruction and when it fetches the next instruction The effective address is translated to a physical address according to the
507. r marked as shared if operating in MESI mode If no push is required from the data cache the debst instruction instead conditionally generates an address broadcast operation on the bus The address broadcast is contingent upon whether HIDO ABE is set or whether the target address is mapped as memory coherency required M 1 in MESI mode The debst instruction is always effective on the data cache or the bus regardless of WIMG settings or whether the data cache is enabled 4 5 2 5 Data Cache Block Flush dcbf Instruction The effective address is computed translated and checked for protection violations as defined in the PowerPC architecture This instruction is treated as a load to the addressed byte with respect to address translation and protection If the address hits in the cache and the block is in the modified state the modified block is written back to memory and the cache block is invalidated If the address hits in the cache and the cache block is in the exclusive or shared state the cache block is invalidated If the address misses in the cache no action is taken The function of this instruction is independent of the WIMG bit settings of the block or PTE containing the effective address However the execution of debf broadcasts an address only flush transaction on the CSB if HIDO ABE is set or if operating in four state MESI mode and the target address is marked memory coherency required Execution of a debf instruction aff
508. r would have attempted to execute next if no interrupt conditions were present SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 MSR POW 0 FP 0 FEI 0 RI 0 TGPR 0 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 When an external interrupt is taken instruction execution for the handler begins at offset 0x00500 from the physical base address indicated by MSR IP The e300 core only recognizes the interrupt condition int asserted if the MSR EE bit is set it ignores the interrupt condition if the MSR EE bit is cleared To guarantee that the external interrupt is taken the int signal must be held asserted until the e300 core takes the interrupt If the int signal is negated before the interrupt is taken the e300 core is not guaranteed to take an external interrupt The interrupt handler must send a command to the device that asserted int acknowledging the interrupt and instructing the device to negate int before the handler re enables recognition of external interrupts 5 5 6 Alignment Interrupt 0x00600 This section describes conditions that can cause alignment interrupts in the e300 core The e300 core implements the alignment interrupt as it is defined in the PowerPC architecture For information on bit e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Interrupts and Exceptions settings and how interrupt conditions are detected re
509. rams of the e300 cores that show how the execution units IU FPU BPU LSU and SRU operate independently and in parallel It should be noted that this is a conceptual diagram and does not attempt to show how these features are physically implemented on the device The e300 core provides address translation and protection facilities including an ITLB DTLB and instruction and data BAT arrays Instruction fetching and issuing are handled in the instruction unit Translation of addresses for cache or external memory accesses are handled by the MMUs Both units are discussed in more detail in Section 1 1 2 Instruction Unit and Section 1 1 5 1 Memory Management Units MMUs 1 1 2 Instruction Unit As shown in Figure 1 1 Figure 1 2 and Figure 1 3 the e300 core instruction unit containing a fetch unit instruction queue dispatch unit and BPU provides centralized control of instruction flow to the execution units The instruction unit determines the address of the next instruction to be fetched based on information from the sequential fetcher and from the BPU The instruction unit fetches the instructions from the instruction cache into the instruction queue The BPU receives branch instructions from the fetcher and uses static branch prediction to allow fetching from a predicted instruction stream while a conditional branch is evaluated The BPU folds out for unconditional branch instructions and conditional branch instructions u
510. ransactions however ci reflects the state of the bit in the MMU for that page regardless of cache disabled status DCE is zero at power up 1 The data cache is enabled 18 ILOCK Instruction cache lock 0 Normal operation 1 The entire instruction cache is locked that is all eight ways of the cache are locked A locked cache supplies data normally on a hit but the access is treated as a cache inhibited transaction on a miss On a miss the transaction to the bus is single beat or burst depending on the value of HID2 IFEB however ci still reflects the state of the bit in the MMU for that page regardless of whether the cache is locked or disabled To prevent locking during a cache access an isync instruction must precede the setting of ILOCK e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Register Model Table 2 5 e300 HIDO Field Descriptions continued Bits Name Function 19 DLOCK Data cache lock 0 Normal operation 1 The entire data cache is locked that is all eight ways of the cache are locked A locked cache supplies data normally on a hit but is treated as a cache inhibited transaction on a miss On a miss the transaction to the bus is single beat however ci still reflects the state of the bit in the MMU for that page regardless of whether the cache is locked or disabled A snoop hit to a locked L1 data cache p
511. rates on true little endian instructions and data from memory The critical interrupt is an additional interrupt in the e300 core and has higher priority order than the system management interrupt Also debug features are improved in the e300 Additional SPRG interrupt handling registers are provided for enhancing flexibility for the operating system The e300c3 include a performance monitor facility that provides the ability to monitor and count predefined events such as core clocks misses in the instruction cache data cache or L2 cache types of instructions dispatched mispredicted branches and other occurrences The count of such events which may be an approximation can be used to trigger the performance monitor interrupt Section 1 1 7 5 Core Performance Monitor describes the operation of the performance monitor diagnostic tool This functionality is fully described in Chapter 11 Performance Monitor 1 1 1 Features This section describes the major features of the e300 core e High performance superscalar microprocessor core As many as three instructions issued and retired per clock two instructions plus one branch instruction As many as five instructions in execution per clock Single cycle execution for most instructions e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Overview Pipelined floating point unit FPU for all single precision and double precis
512. ration Registers Machine State Register Hardware Implementation Registers SPR 1008 SPR 1009 SPR 1011 Register MBAR SPR311 Instruction BAT Registers SPR 528 SPR 529 SPR 530 SPR 531 SPR 532 SPR 533 SPR 534 SPR 535 SPR 560 SPR 561 SPR 562 SPR 563 SPR 564 SPR 565 Data BAT Registers SPR 536 SPR 537 SPR 538 SPR 539 SPR 540 SPR 541 SPR 542 SPR 543 SPR 568 SPR 569 SPR 570 SPR 571 SPR 572 SPR 573 SPR 566 SPR 574 SPR 567 SPR 575 Interrupt Handling Registers DSISR SPR 272 SPR 18 SPR 273 Save and Restore Registers SPR 274 SRRO SPR 26 RR1 SPR 275 S SPR 27 SPR 276 Data Address Register Memory Base Address Overview System Processor Version Register Memory Management Registers Software Table Search Registers SPR 976 SPR 977 SPR 978 SPR 979 SPR 980 SPR 981 SPR 982 Segment Registers Miscellaneous Registers Decrementer Time Base Facility For Writing TBL SPR 284 TBU SPR 285 SPR277 Gap SPR 19 Breakpoint Registers SPR 278 Instruction Data Address SPR 279 Breakpoint Register SPR 1010 SPR 1018 SPR 1013 SPR 317 Instruction Data Address Breakpoint Control IBCR SPR 309 DBCR SPR310 Performance Monitor PMR 144 147 1 These registers are e300 core implementation specific not defined by the PowerPC architecture e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 17 Overview Figure 1 4 e300
513. rchitecture Core Family Reference Manual Rev 3 Freescale Semiconductor 37 Instruction and Data Cache Operation 4 10 3 1 6 Locking the Entire Data Cache Locking of the entire data cache is controlled by the data cache lock bit HIDO DLOCK bit 19 Setting HIDO DLOCK to 1 locks the entire data cache To unlock the data the HIDO DLOCK must be cleared to 0 Setting the DLOCK bit must be preceded by a sync instruction to prevent the data cache from being locked during a data access The following assembly code locks the entire data cache Set the DLOCK bit in HIDO bit 19 mfspr rl HIDO ori rl SL 0x1000 sync mtspr HIDO rl isync NOTE The debz instruction ignores DLOCK and always allocates a tag Therefore it is recommended that when setting DLOCK the user also set the way lock bits to lock the maximum number of ways Refer to Section 4 10 3 1 7 Way Locking the Data Cache for more information 4 10 3 1 7 Way Locking the Data Cache Data cache way locking is controlled by HID2 DWLCK bits 24 26 Table 4 16 shows the HID2 DWLCK 0 2 settings for the e300c1 core embedded processor Table 4 16 e300c1 Core DWLCK 0 2 Encodings DWLCK 0 2 Ways Locked Ob000 No ways locked 0b001 Way 0 locked 0b010 Ways 0 and 1 locked 0b011 Ways 0 1 and 2 locked 0b100 Ways 0 1 2 and 3 locked 0b101 Ways 0 1 2 3 and 4 locked 0b110 Ways 0 1 2 3 4 and
514. rd A and the four data beats are ordered in the following manner Beat A B C D If the address requested is in double word C the address placed on the bus will be that of double word C and the four data beats are ordered in the following manner Beat 1 2 3 C D A B Figure 4 9 Double Word Address Ordering Critical Double Word First 4 9 3 Instruction Fetch Burst Enable for Caching Inhibited Space In the G2 core all instruction fetches to caching inhibited CI space results in single beat bus transactions and in the fetched instructions not being cached The ci_b signal reflects the setting of the I bit for the block or page that contains the address of the current transaction In the e300 core the instruction fetch burst enable extension allows all instruction fetches from caching inhibited instruction space to be performed on the CSB as burst transactions similar to caching allowed instruction space even though the instructions are not cached This allows for greater performance to instruction space that is not desired to be cached With this performance extension up to an entire cache line may be returned up to 8 instructions with one bus operation The remaining instructions in the burst transaction that follow the critical instruction are forwarded to the instruction fetch unit as they are normally requested until an instruction redirect or instruction stall occurs Because instruc
515. re can be improved by observing the following guidelines e Implement good static branch prediction setting of y bit in BO field e When branch prediction is uncertain or an even probability predict fall through e To reduce mispredictions separate the instruction that sets CR bits from the branch instruction that evaluates them separation by more than nine instructions ensures that the CR bits will be immediately available for evaluation e When branching conditionally to a location specified by count registers CTRs or link registers LRs or when branching conditionally based on the value in the count register separate the mtspr instruction that initializes the CTR or LR from the branch instruction performing the evaluation Separation of the branch and mtspr instruction by more than nine instructions ensures the register values will be immediately available for use by the branch instruction e Schedule instructions such that they can dual dispatch e Schedule instructions to minimize stalls when an execution unit is busy e Avoid using serializing instructions e Schedule instructions to avoid dispatch stalls due to renamed resource limitations Only five instructions can be in execute complete stage at any one time Only five GPR destinations can be in execute complete deallocate stage at any one time Note that load with update address instructions use two destination registers Only four FPR destinations can be in
516. re Core Family Reference Manual Rev 3 Freescale Semiconductor 27 Instruction Set Listings Table A 35 X Form continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 debi 2 31 00000 A B 470 0 dcbst 31 00000 A B 54 0 dcbt 31 00000 A B 278 0 dcbtst 31 00000 A B 246 0 dcbz 31 00000 A B 1014 0 eciwx 31 D A B 310 0 ecowx 31 S A B 438 0 eieio 31 00000 00000 00000 854 0 eqvx 31 A B 284 Re extsbx 31 S A 00000 954 Rc extshx 31 S A 00000 922 Rc extswx 31 S A 00000 986 Rc fabsx 63 D 00000 B 264 Rc fcfidx 63 D 00000 B 846 Rc fcmpo 63 crfD 00 A B 32 0 fempu 63 crfD 00 A B 0 0 fetidx 63 D 00000 B 814 Re fetidzx 63 D 00000 B 815 Re fctiwx 63 D 00000 B 14 Re fctiwzx 63 D 00000 B 15 Re fmrx 63 D 00000 B 72 Re fnabsx 63 D 00000 B 136 Re fnegx 63 D 00000 B 40 Re frspx 63 D 00000 B 12 Re icbi 31 00000 A B 982 0 icbt gt 31 00000 A B 22 0 Ibzux 31 A B 119 0 Ibzx 31 A B 87 0 Idarx 31 D A B 84 0 Idux 31 D A B 53 0 Idx 31 D A B 21 0 Ifdux 31 D A B 631 0 Ifdx 31 D A B 599 0 Ifsux 31 D A B 567 0 lfsx 31 D A B 535 o e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Name Ihaux Ihax Ihbrx Ihzux Ihzx Iswi 2 Iswx 2 lwarx Iwaux Iwax Iwbrx lwzux lwzx merfs merxr mfcr mffsx mfmsr 2 mfsr 2 mfsr
517. re corrupted registers lwz r23 0x05e4 r0 mtcrf Oxff r23 lwz r23 0x05e8 r0 lwz r22 0x05ec r0 lwz r21 0x05f0 r0 lwz r20 0x05f4 r0 lwz r0 0x05fc r0 sync EEL FE KK KK I I ee ee I ee KK Kk e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Chapter 10 Debug Features This chapter describes the debug features of the PowerPC architecture with respect to the e300 core The e300 includes the trace facility debug features The enhanced debug features are described as follows e Addition of breakpoint status bits to IBCR and DBCR 10 1 Breakpoint Resources The e300 core provides enhanced debug facilities instruction address breakpoint data address breakpoint and program single stepping to enable software debug events The original IABR and single step functions are supplemented by the new debug features The debug facilities consist of a set of debug control registers DBCR IBCR a set of instruction address breakpoint registers ABR IABR2 and a set of data address breakpoint registers DABR DABR2 These registers used together enable various breakpoint functions Additional hardware debug facilities exist through the JTAG debug interface These registers are accessible only to supervisor level programs by the mfspr and mtspr instructions The SPR addresses for the registers can be found in Table 3 32 of Chapter 3 Instruction Set Model When an instruction or data address breakpoi
518. re not defined by the PowerPC architecture The format of these registers is defined in Section 2 2 11 SPRGO SPRG7 DSISR The DSISR defines the cause of data access and alignment interrupts Machine status save restore register 0 1 SRRO SRR1 The SRRO and SRR1 are used to save machine status on interrupts and to restore machine status when an rfi instruction is executed For more information refer to Chapter 6 Memory Management NOTE The e300 core implements the KEY bit bit 12 in the SRR1 register to simplify the table search software For more information refer to Chapter 6 Memory Management Note that to support critical interrupts two new registers CSRRO and CSRR1 are implemented on the e300 core which are not defined by the PowerPC architecture These registers have the same bit assignments as SRRO and SRR1 albeit with different SPR numbers as described in Section 2 2 Implementation Specific Registers Miscellaneous registers The time base facility TB for writing The TB is a 64 bit register pair that can be used to provide time of day or interval timing It consists of two 32 bit registers time base upper TBU and time base lower TBL The TB is incremented once every four core input clock cycles Decrementer DEC The DEC register is a 32 bit decrementing counter that provides a mechanism for causing a decrementer interrupt after a programmable delay The DEC is decrem
519. red PTE then a page fault must be synthesized The handler must restore the machine state and clear MSR TGPR before invoking the DSI interrupt 0x00300 Software table search operations are discussed inChapter 6 Memory Management When a data TLB miss on load interrupt is taken instruction execution for the handler begins at offset 0x01100 from the physical base address indicated by MSR IP e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Interrupts and Exceptions Table 5 21 Instruction and Data TLB Miss Interrupts Register Settings Register Setting Description SRRO Set to the address of the next instruction to be executed in the program for which the TLB miss interrupt was generated SRR1 0 3 Loaded from condition register CRO field 4 11 Cleared 12 KEY Key for TLB miss SR Ks or SR Kp depending on whether the access is a user or supervisor access 13 D I Data or instruction access 0 Data TLB miss 1 Instruction TLB miss 14 WAY Next TLB set to be replaced set per LRU 0 Replace TLB associativity set 0 1 Replace TLB associativity set 1 15 S L Store or load data access 0 Data TLB miss on load 1 Data TLB miss on store or C 0 16 31Loaded from MSR 16 31 MSR POW 0 FP 0 FEI 0 RI 0 TGPR 1 ME CE LE Set to value of ILE ILE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 5 5 16 Data TLB Miss on Store Interrupt 0x01200
520. resented to the CSB 4 4 3 2 Sequential Consistency of Memory Accesses The PowerPC architecture requires that all memory operations executed by a single processor be sequentially consistent with respect to that processor This means that all memory accesses appear to be executed in program order with respect to interrupts and data dependencies The e300 load store unit is very simple allowing only one load at a time and only one or one and a half stores at a time However loads are allowed to bypass caching allowed stores once interrupt checking has been performed for the store but data dependency checking is handled in the load store unit so that a load will not bypass a store with an address match Loads do not bypass caching inhibited stores Although memory accesses that miss in the cache are forwarded to the CSB interface all potential synchronous interrupts have been resolved before the cache In addition although subsequent memory accesses can address the cache full coherency checking between the cache and the memory queue is provided to avoid dependency conflicts e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Instruction and Data Cache Operation 4 4 3 3 Enforcing Load Store Ordering As defined by the PowerPC architecture the eieio instruction provides an ordering function for the effects of certain classes of load and store instructions executed by a given processor These classe
521. rformance model described in Chapter 3 Operand Conventions in the Programming Environments Manual 3 2 Instruction Set Summary This section describes instructions and addressing modes defined for the e300 core These instructions are divided into the following functional categories e Integer instructions These include arithmetic and logical instructions For more information see Section 3 2 4 1 Integer Instructions e Floating point instructions These include floating point arithmetic instructions as well as instructions that affect the floating point status and control register FPSCR For more information see Section 3 2 4 2 Floating Point Instructions Floating point instructions are not supported on the e300c2 core e Load and store instructions These include integer and floating point load and store instructions For more information see Section 3 2 4 3 Load and Store Instructions e Flow control instructions These include branching instructions condition register logical instructions and other instructions that affect the instruction flow For more information see Section 3 2 4 4 Branch and Flow Control Instructions e Trap instructions These are used to test for a specified set of conditions see Section 3 2 4 5 Trap Instructions e Processor control instructions These are used for synchronizing memory accesses and managing caches TLBs and segment registers For more in
522. ri rl rl 0x100 set reference bit srw Elsa St 8 get byte 7 of pte tlbld ro load the dtlb stb rl 6 r2 update page table rfi return to executing program e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 41 Memory Management Register usage rO is saved counter rl is junk r2 is pointer to pteg r3 is current compare value dataSecHash andi rl r3 0x0040 S if we have done second hash bne doDSI if so go to DSI interrupt mfspr r2 hash2 get the second pointer ori r3 r3 0x0040 change the compare value addi ET rs 38 load 8 for counter addi D er s 8 pre dec for update on load b Om try second hash C 0 in dtlb and dtlb miss on store flow Entry Vec 1200 rr gt address of store that caused the interrupt srrl gt 0 3 cr0 4 lru way bit 5 1 16 31 saved MSR msr lt tgpr gt gt 1 dMiss gt ea that missed dcmp gt the compare value for the va that missed hashl gt pointer to first hash pteg hash2 gt pointer to second hash pteg Register usage r0 is saved counter rl is junk r2 is pointer to pteg r3 is current compare value csect tlbmiss PR Org vec0 0x1200 tlbCeq0 mfspr r2 hashl get first pointer addi EL e D load 8 for counter mfctr ro save counter mfspr r3 dCmp get first compare value addi Es EZ 8 pre dec the pointer ceq0 mtctr TAL load counter ceql lwzu ri 8 r2 get next pte cmp CO Arh ES see
523. rity interrupt exists and any instruction other than rfi rfci mtmsr or isync is successfully completed Note that other processors will take the trace interrupt on isync instructions when MSR SE is set the e300 core does not take the trace interrupt on isyne instructions Single step instruction trace mode is described in Section 5 5 12 1 Single Step Instruction Trace Mode e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Interrupts and Exceptions e When MSR BE is set the branch trace interrupt is taken after each branch instruction is completed e The e300 core deviates from the architecture by not taking trace interrupts on isync instructions Single step instruction trace mode is described in Section 5 5 12 2 Branch Trace Mode Successful completion implies that the instruction caused no other interrupts A trace interrupt is never taken for an sc or trap instruction that takes a trap interrupt MSR SE and MSR BE are cleared when the trace interrupt is taken In the normal use of this function MSR SE and MSR BE are restored when the interrupt handler returns to the interrupted program using an rfi instruction Register settings for the trace mode are described in Table 5 20 Table 5 20 Trace Interrupt Register Settings Register Setting Description SRRO Set to the address of the instruction following the one for which the trace interrupt was generated
524. rl 8 r2 get next pte cmp CO El E see if found pte bdnzft eq iml dec count br if cmp ne and if count not zero bne instrSecHash if not found set up second hash or exit L rl 4 r2 load tlb entry lower word andi T3y ely RB check G bit bne doISIp if guarded take an ISI mtctr ro restore counter mfspr r0 iMiss get the miss address for the tlbli mfspr 3 SEL get the saved cr0 bits mtcrf 0x80 r3 restore CRO mtspr rpa rl set the pte ori rl rl 0x100 set reference bit srwi rl rl 8 get byte 7 of pte tlbli ro load the itlb stb rl 6 r2 update page table rfi return to executing program Register usage rO is saved counter rl is junk r2 is pointer to pteg r3 is current compare value instrSecHash andi rl r3 0x0040 S if we have done second hash bne doIsSI if so go to ISI interrupt mfspr r2 hash2 get the second pointer ori r3 r3 0x0040 change the compare value addi rl 0 8 load 8 for counter addi E2 625 48 pre dec for update on load b im0 try second hash entry Not Found synthesize an ISI interrupt guarded memory protection violation synthesize an ISI interrupt Entry r0 is saved counter rl is junk r2 is pointer to pteg r3 is current compare value doISIp mfspr Hoy SERI get srrl andi r2 r3 0xffff clean upper srrl e300 Power Architecture Core Family Reference Manual Rev 3 40 Freescale Semiconductor Memory Management addis r2 r2 0x0800 or in srr lt 4 gt 1 to flag prot violation b Tsiis doISI
525. rom the instruction queue to the appropriate execution unit Instruction dispatch requires the following Instructions can be dispatched only from the two lowest instruction queue entries QO and IQ1 A maximum of two instructions can be dispatched per clock cycle Only one instruction can be dispatched to each execution unit per clock cycle There must be a vacancy in the specified execution unit A rename register must be available for each destination operand specified by the instruction For an instruction to dispatch the appropriate execution unit must be available and there must be an open position in the CQ If no entry is available the instruction remains in the IQ The execute stage consists of the time between dispatch to the execution unit or reservation station and the point at which the instruction vacates the execution unit Most integer instructions have a one cycle latency results of these instructions can be used in the clock cycle after an instruction enters the execution unit However integer multiply and divide instructions take multiple clock cycles to complete The IU can process all integer instructions The e300c2 e300c3 integrate two integer units The latency for multiply instructions in both units is now a maximum of 2 cycles to complete execution Also it is now possible to dispatch and execute two integer instruction types at one time i e two multiply instructions which significant
526. ry groups PTEGs ICMP and DCMP contain a duplicate of the first word in the page table entry PTE for which the table search is looking The required physical address RPA register is loaded by the core with the second word of the correct PTE during a page table search The system version register SVR is available on the e300 core which identifies the specific version model and revision level of the system on a chip SOC integration System memory base address MBAR is an implementation specific register available on the e300 core It supports a temporary storage for the system level memory map The instruction and data address breakpoint registers IABR IABR2 DABR DABR2 are loaded with an instruction or data address respectively that is compared to instruction addresses in the dispatch queue or to the data address in the LSU When an address match occurs a breakpoint interrupt is generated One instruction breakpoint control register IBCR and one data breakpoint control register DBCR are implemented in the e300 core To support critical interrupts two registers CSRRO and CSRR1 are included in the e300 core Eight SPRG registers SPRGO SPRG7 are in the e300 core Block address translation BAT arrays The e300 core has eight instruction and eight data BAT registers e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor 1 3 2 Overview The hardware implementation
527. ry word the bytes are ordered left to right 3 2 1 0 with 3 being the most significant byte See Big endian Mantissa The decimal part of logarithm MEI modified exclusive invalid Cache coherency protocol used to manage caches on different devices that share a memory system Note that the PowerPC architecture does not specify the implementation of a MEI protocol to ensure cache coherency Memory access ordering The specific order in which the processor performs load and store memory accesses and the order in which those accesses complete Memory mapped accesses Accesses whose addresses use the page or block address translation mechanisms provided by the MMU and that occur externally with the bus protocol defined for memory Memory coherency An aspect of caching in which it is ensured that an accurate view of memory is provided to all devices that share system memory e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 6 Freescale Semiconductor Memory consistency Refers to agreement of levels of memory with respect to a single processor and system memory for example on chip cache secondary cache and system memory Memory management unit MMU The functional unit that is capable of translating an effective logical address to a physical address providing protection mechanisms and defining caching methods Modified state MEI state M in which one and only one caching device has the valid
528. s As TLB entries are on chip copies of PTEs in the page tables in memory they are similar in structure TLB entries consist of two words the high order word contains the VSID and API fields of the high order word of the PTE and the low order word contains the RPN C bit WIMG bits and PP bits as in the low order word of the PTE In order to uniquely identify a TLB entry as the required PTE the TLB entry also contains five more bits of the page index EA 10 14 in addition to the API bits of the PTE e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 23 Memory Management When an instruction or data access occurs the effective address is routed to the appropriate MMU EA 0 3 select 1 of the 16 segment registers and the remaining effective address bits and the virtual address from the segment register is passed to the TLB EA 15 19 then select two entries in the TLB the valid bit is checked and EA 10 14 VSID and API fields EA 4 9 for the access are then compared with the corresponding values in the TLB entries If one of the entries hits the PP bits are checked for a protection violation and the C bit is checked If these bits do not cause an interrupt the RPN value is passed to the memory subsystem and the WIMG bits are then used as attributes for the access Also note that the segment registers do not have a valid bit and so they should also be initialized before translation i
529. s In addition the cache must be globally flushed before it is disabled to prevent coherency problems when it is re enabled Although snooping is not performed when the data cache is disabled cache operations caused by the dcbz dcbf dcbst and deht instructions are not affected by disabling the cache Thus there is a risk of coherency errors Regardless of the state of DCE load and store operations are assumed to be weakly ordered Thus the LSU can perform load operations that occur later in the program ahead of store operations even when the data cache is disabled However strongly ordered load and store operations can be enforced through the setting of the I bit of the page WIMG bits when address translation is enabled Note that when address translation is disabled the default WIMG bits cause the I bit to be cleared accesses are assumed to be caching allowed thus the accesses are weakly ordered Refer to Section 4 4 1 2 Caching Inhibited Attribute I for a description of the operation of the I bit and Section 6 2 Real Addressing Mode for a description of the WIMG bits when address translation is disabled 4 5 1 3 Data Cache Lock HIDO DLOCK The entire contents of the data cache may be locked by setting the data cache lock bit HIDO DLOCK For a locked data cache there are no new tag allocations except those caused by dchz instructions The setting of DLOCK must be preceded by a sync instruction to prevent the
530. s r3 r3 0x0008 test the KEY bit SRR1 bit 12 beq chk2 if KEY 0 goto chk2 b doDSIp else DSIp chk2 ori rl rl 0Ox180 set reference and change bit sth rl GEZ update page table b ceq2 and back we go entry Not Found synthesize a DSI interrupt Entry rO is saved counter rl is junk r2 is pointer to pteg r3 is current compare value doDSI mfspr e SCEL get srrl rlwinm Pla r3 97 656 get srrli lt flag gt to bit 6 for load store zero rest addis rl rl 0x4000 or in dsisr lt 1l gt 1 to flag pte not found b dsil doDSIp mfspr E3 srr get srrl rlwinm L1G E 19776 6 get srri lt flag gt to bit 6 for load store zero rest addis rl rl 0x0800 or in dsisr lt 4 gt 1 to flag prot violation e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 43 Memory Management dsil mtctr ro restore counter andi Ee 43 OxXELEL clear upper bits of srrl mtspr ser 2 set srrl mtspr dsisr rl load the dsisr mfspr rl dMiss get miss address rlwinm r2 r2 0 31 31 test LE bit beq dsi2 if little endian then xor r1 r1 0x07 de mung the data address Ges mtspr dar rl put in dar mfimsr ro get msr xoris Y0 CD 0x2 flip the msr lt tgpr gt bit mtcrf 0x80 r3 restore CRO mtmsr ro flip back to the native gprs b vec300 branch to DSI interrupt 6 5 3 Page Table Updates TLBs are defined as noncoherent caches of the PTEs TLB entries must be flushed explicitly with the TLB invalidate entr
531. s 16 Kbytes 4 way 8 words 4 Kbytes 4 10 2 Cache Locking Register Summary Table 4 11 through Table 4 13 outline the registers and bits used to perform cache locking on the e300 core Refer to Section 2 2 1 Hardware Implementation Register 0 HIDO for a complete description of the HIDO and MSR registers Refer to Section 2 2 3 Hardware Implementation Register 2 HID2 for a complete description of the HID2 register Table 4 11 HIDO Bits Used to Perform Cache Locking Bits Name Description 16 ICE Instruction cache enable This bit must be set for instruction cache locking See Section 4 10 3 1 1 Enabling the Data Cache 17 DCE Data cache enable This bit must be set for data cache locking See Section 4 10 3 1 1 Enabling the Data Cache 18 ILOCK Instruction cache lock Set to lock the entire instruction cache See Section 4 10 3 2 5 Locking the Entire Instruction Cache 19 DLOCK Data cache lock Set to lock the entire data cache See Section 4 10 3 1 6 Locking the Entire Data Cache e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Instruction and Data Cache Operation Table 4 11 HIDO Bits Used to Perform Cache Locking continued Bits Name Description 20 ICFI Instruction cache flash invalidate Setting and then clearing this
532. s are purged and instruction fetching continues along the alternate path See Chapter 8 Instruction Timing in the Programming Environments Manual for more information about how branches are executed 3 2 4 4 1 Branch Instruction Address Calculation Branch instructions can change the instruction sequence Instruction addresses are always assumed to be word aligned the processor ignores the two low order bits of the generated branch target address Branch instructions compute the effective address EA of the next instruction address using the following addressing modes e Branch relative e Branch conditional to relative address e Branch to absolute address e Branch conditional to absolute address e Branch conditional to link register e Branch conditional to count register 3 2 4 4 2 Branch Instructions Table 3 21 lists the branch instructions provided by the processors that implement the PowerPC architecture To simplify assembly language programming a set of simplified mnemonics and symbols is provided for the most frequently used forms of branch conditional compare trap rotate and shift and certain other instructions See Appendix F Simplified Mnemonics in the Programming Environments Manual for a list of simplified mnemonic examples Table 3 21 Branch Instructions Name Mnemonic Operand Syntax Branch b ba bl bla target_addr Branch Conditional bc bca bcl bcla BO Bl target_addr Bra
533. s enabled EAO EA31 Segment Registers 0 7 8 31 EAO EA3 EA4 EA14 Se 1 eee 0 EA15 EA19 Select Compare t CH oO E D amp el PAO PA19 Figure 6 7 Segment Register and TLB Organization e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Memory Management 6 4 3 2 TLB Entry Invalidation For processors such as the e300 core that implement TLB structures to maintain on chip copies of the PTEs that are resident in physical memory the optional tlbie instruction provides a way to invalidate the TLB entries Note that the execution of the tlbie instruction in the e300 core invalidates four entries both the ITLB entries indexed by EA 15 19 and both the indexed entries of the DTLB The architecture allows tlbie to optionally enable a TLB invalidate signaling mechanism in hardware so that other processors also invalidate their resident copies of the matching PTE The core does not signal the TLB invalidation to other processors and does not perform any action when a TLB invalidation is performed by another processor The tlbsync instruction causes instruction execution to stop if the tlbisync input signal is also asserted If tlbisync is negated instruction execution may continue or resume after the completion of a tlbsyne instruction The tibia instruction is not implemented in the e300 core and when its opcode is encountered an illegal instruction
534. s is not provided The PVR consists of the fields as described in Table 2 2 Architecturally the PVR consists of two 16 bit fields as described in Table 2 2 and Figure 2 3 The e300c1 core version number is 0x8083 and the revision level starts at 0x0010 and changes for each revision of the core The e300c2 core version number is 0x8084 and the revision level starts at 0x0010 The e300c3 core version number is 0x8085 and the revision level starts at 0x0010 Table 2 2 Architectural PVR Field Descriptions Bits Name Description 0 15 Version A 16 bit number that uniquely identifies a particular processor version This number can be used to determine the version of a processor it may not distinguish between different end product models if more than one model uses the same processor 16 31 Revision A 16 bit number that distinguishes between various releases of a particular version that is an engineering change level The value of the revision portion of the PVR is implementation specific The processor revision level is changed for each revision of the device SPR 287 Access Supervisor read only H 15 16 31 R Version Number Revision Level W Reset 0x8083_0010 for e300c1 0x8084_0010 for e300c2 0x8085_0010 for e300c3 Figure 2 3 e300 Processor Version Register e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Table 2 3 describes some of the PVR values
535. s jixcise sideaasepicesvssgecezs EE sosis ri AEE aE E E EE EEE S aS A 17 Floating Point Arithmetic Instructions erdeetec geed EEGENEN A 17 Floating Point Multiply Add Instructions ssseseeeeeeeeeseesseseessrserssressessresressrssresressesseseresresse A 18 Floating Point Rounding and Conversion Instruectons A 18 Floating Point Compare Instructions e ce esceeesecsseceeeeesseeceaeceseeseneeeaeecaeesseeesaeecaeenseensees A 18 Floating Point Status and Control Register Instructons ee eeeeeeseceseeeeeeeeseecnaeenseeees A 18 Inite Ser Load Ree A 19 Inte per Store trett deeg ESA A 19 Integer Load and Store with Byte Reverse Instructions 2 0 0 cece eeseeeseeeseeceeeceseeeeeeesneeenaeenes A 20 Integer Load and Store Multiple Instructions 20 0 cece eeeceeeseeesseceseceeeeeeneessaecneeseaeeeaeenes A 20 Integer Load and Store String Instructions 2 0 cece eeseeseecesecseeeesacecaecnseesseeesseecsaeenseesees A 20 Memory Synchronization Instructions i c 20 2 1tesecce estes deccicaschen acted sl achat ceased A 21 Floating Point Load Instructions guest ae ue A 21 Floating Point Store Instructions eege AE A 21 Floating Point Move Tote HCH tegt eebe ege ie ee Eege eet A 22 Branch E ee A 22 Condition Register Logical Instructions ee eeegeubteuAENegeden Ee eeeseeestennsiendcets doeataesteeds A 22 System Linkage Instructions dek Steed iene A 22 Trap instructions ET A 23 Processor Control Future eeben deed A 23 Cache Management Oste eege ee A
536. s mapping for a particular access The PTEs reside in page tables in memory As defined for 32 bit implementations by the PowerPC architecture segment descriptors reside in 16 on chip segment registers The e300 core provides the ability to invalidate a TLB entry The TLB Invalidate Entry tlbie instruction invalidates the TLB entry indexed by the EA and operates on both the instruction and data TLBs simultaneously invalidating four TLB entries both sets in each TLB The index corresponds to bits 15 19 of the EA To invalidate all entries within both TLBs 32 tlbie instructions should be issued incrementing this field by one each time The core provides two implementation specific instructions tlbld and tlbli that are used by software table search operations following TLB misses to load TLB entries on chip For more information on tlbld and tlbli refer to Section 3 2 8 Implementation Specific Instructions Note that the tlbia instruction is not implemented on the core Refer to Chapter 6 Memory Management for more information about the TLB operations for the core Table 3 35 lists the TLB instructions Table 3 35 Translation Lookaside Buffer Management Instructions Name Mnemonic Operand Syntax Load Data TLB Entry tlbld rB Load Instruction TLB Entry tibli rB TLB Invalidate Entry tlbie rB TLB Synchronize tlbsync Because the presence and exact semantics of the translation lookaside buffer
537. s occur when the most significant bit of PMCn is equal to 1 It is recommended that CE be cleared when counter PMCn is selected for chaining 6 8 Reserved should be cleared 9 15 EVENT Event selector Up to 128 events selectable See Section 11 5 2 Event Selection 16 31 Reserved should be cleared 11 2 4 User Local Control A Registers UPMLCa0 UPMLCa3 The PMLCa contents are reflected to UPMLCa0 UPMLCa3 which can be read by user level software with mfpmr using PMR numbers in Table 11 2 11 2 5 Performance Monitor Counter Registers PMC0 PMC3 The performance monitor counter registers PMC0 PMC3 shown in Figure 11 3 are 32 bit counters that can be programmed to generate interrupt signals when they overflow Each counter is enabled to count up to 128 events e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Performance Monitor PMCO PMR16 UPMCO PMRO Access PMCO PMC3 Supervisor only PMC1 PMR17 UPMC1 PMR1 UPMCO UPMC3 Supervisor user read only PMC2 PMR18 UPMC2 PMR2 PMC3 PMR19 UPMC3 PMR3 o oi 31 R OV Counter value W Reset All zeros Figure 11 3 Performance Monitor Counter Registers PMC0 PMC3 User Performance Monitor Counter Registers UPMCO UPMC3 PMCs are cleared by a hard reset Table 11 5 describes PMC register fields Table 11 5 PMCO PMC3 Field Descriptions Bits Name Descriptio
538. s of instructions are automatically ordered by the LSU on the e300 core and thus core treats the eieio instruction as a no op For the e300 core caching inhibited load and store operations are performed in strict program order Therefore the caching inhibited WIMG bit may be used to strictly order memory accesses In addition the sync instruction may be used to enforce the ordering of any class of load store operations The syne instruction provides a stronger ordering on the bus by requiring the transaction to complete on the bus The e300 does not broadcast eieio and sync instructions 4 4 3 4 Atomic Memory References The Load Word and Reserve Indexed Iwarx and Store Word Conditional Indexed stwex instructions provide an atomic update function for a single aligned word of memory While an Iwarx instruction should normally be paired with an stwex instruction with the same effective address an stwex instruction to any address will cancel the reservation For detailed information on these instructions refer to Chapter 3 Instruction Set Model in this book and Chapter 8 Instruction Set in the Programming Environments Manual 4 5 Cache Control The core provides several means of cache control through the use of the memory cache access attributes WIMG bits implementation specific control bits in the HIDO and HID2 registers and dedicated cache control instructions Memory block page level cache control is provided by the W
539. s used to specify a GPR to be used as a destination rS The rS instruction field is used to specify a GPR to be used as a source Real address mode An MMU mode when no address translation is performed and the effective address specified is the same as the physical address The processor s MMU is operating in real address mode if its ability to perform address translation has been disabled through the MSR registers IR and or DR bits Record bit Bit 31 or the Rc bit in the instruction encoding When it is set updates the condition register CR to reflect the result of the operation Referenced bit One of two page history bits found in each page table entry The processor sets the referenced bit whenever the page is accessed for a read or write See also Page access history bits Register indirect addressing A form of addressing that specifies one GPR that contains the address for the load or store Register indirect with immediate index addressing A form of addressing that specifies an immediate value to be added to the contents of a specified GPR to form the target address for the load or store Register indirect with index addressing A form of addressing that specifies that the contents of two GPRs be added together to yield the target address for the load or store e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 9 Rename register Temporary buffers used by instructions that
540. saction is a single beat read caching inhibited and the block in the cache is in the exclusive state the snoop causes no bus activity and the block remains in the exclusive state If the block is in the cache in the modified state the core initiates a push of the modified block out to memory and marks the cache block as exclusive Read atomic operations appear on the bus in response to war instructions and generate the same snooping responses as read operations MESI The read operation is used by most single beat read and burst read operations on the bus In the MESI protocol burst reads are treated as reads All burst reads observed on the bus are snooped as if they were reads A read on the bus with the gbi signal asserted causes the following responses e If the addressed block in the cache is in the invalid state the core takes no action e If the addressed block in the cache is in the exclusive state the core signals shared and the address snoop forces the state of the addressed block to shared e f the addressed block in the cache is in the modified state the address snoop signals retry and shared and initiates a push of the modified block out of the cache and changes the state of the block to shared Read atomic operations appear on the bus in response to Iwarx instructions and generate the same snooping responses as read operations Read with intent to modify RWITM RWITM atomic MEI MESI A RWITM operation is issued t
541. sal during the load or store accesses is performed between memory or the data cache and the register files Table 3 16 Integer Load and Store with Byte Reverse Instructions Name Mnemonic Operand Syntax Load Half Word Byte Reverse Indexed Ihbrx rD rA rB Load Word Byte Reverse Indexed Iwbrx rD rA rB Store Half Word Byte Reverse Indexed sthbrx rS rA rB Store Word Byte Reverse Indexed stwbrx rS rA rB Implementation Note In some implementations load byte reverse instructions Ihbrx and Iwbrx may have greater latency than other load instructions however these instructions operate with the same latency as other load instructions in the core 3 2 4 3 6 Integer Load and Store Multiple Instructions The integer load store multiple instructions are used to move blocks of data to and from the GPRs In some implementations these instructions are likely to have greater latency and take longer to execute perhaps much longer than a sequence of individual load or store instructions that produce the same results e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Instruction Set Model Implementation notes The following describes the e300 core implementation of the load store multiple instruction e The load multiple and store multiple instructions may have operands that require memory accesses crossing a 4 Kbyte page boundary As a result these instructions may be interru
542. scribed cycle by cycle as follows 0 In cycle 0 instructions 0 and 1 are fetched from the instruction cache and are placed in the two entries in the instruction queue OO and IQ1 where they can be dispatched on the next clock cycle In cycle 1 instructions 0 and 1 are dispatched to the IU and FPU respectively Notice that for instructions to be dispatched they must be assigned positions in the CQ In this case because the CQ is empty instructions 0 and 1 take the two lowest CQ entries CQO and CQ1 Instructions 2 and 3 are fetched from the instruction cache At least two IQ positions were available in the IQ in cycle 1 so in cycle 2 instructions 4 and 5 are fetched Instruction 4 is a branch unconditional instruction that resolves immediately as taken Because the branch is taken and does not update CTR or LR it can be folded from the IQ Instruction 0 completes writes back its results and vacates the CQ by the end of the clock cycle Instruction 1 enters the second FPU execute stage instruction 2 enters the single stage IU and instruction 3 is dispatched into the first FPU stage In cycle 3 target instructions 6 and 7 are fetched replacing the folded br instruction 4 and instruction 5 Instruction 1 enters the last FPU execute stage instruction 2 has executed but must remain in the CQ until instruction 1 completes Note that it can make its results available to subsequent instructions but cannot be removed from the CQ Instructi
543. scribes the additional features and functionality changes These addenda are intended for use with the corresponding book Implementation Variances Relative to Rev I of the Programming Environments Manual is available at http www freescale com Technical summaries Each device has a technical summary that provides an overview of its features This document is roughly the equivalent to the overview Chapter 1 of an implementation s user s or reference manual Application notes These short documents contain useful information about specific design issues useful to programmers and engineers working with Freescale processors Additional literature is published as new processors become available For a current list of documentation refer to http www freescale com Conventions This document uses the following notational conventions mnemonics Instruction mnemonics are shown in lowercase bold italics Italics indicate variable command parameters for example bectrx Book titles in text are set in italics 0x0 Prefix to denote hexadecimal number 0b0 Prefix to denote binary number e300 Power Architecture Core Family Reference Manual Rev 3 xxv Freescale Semiconductor rA rB rAl0 rD frA frB frC frD REG FIELD JB x amp 0000 Instruction syntax used to identify a source GPR Contents of a specified GPR or the value 0 Instruction syntax used to identify a destination GPR Instruction syntax used to identif
544. se an interrupt If a prior memory access instruction causes direct store error interrupts the results are guaranteed to be determined before this instruction is executed e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Instruction Set Model e Previous instructions complete execution in the context privilege protection and address translation under which they were issued e The instructions following the sc or rfi instruction execute in the context established by these instructions 3 2 2 4 2 Execution Synchronization An instruction is execution synchronizing if all previously initiated instructions appear to have completed before the instruction is initiated or in the case of the Synchronize sync and Instruction Synchronize isync instructions before the instruction completes For example the Move to Machine State Register mtmsr instruction is execution synchronizing It ensures that all preceding instructions have completed execution and will not cause an interrupt before the instruction executes but does not ensure subsequent instructions execute in the newly established environment For example if the mtmsr sets MSR PR unless an isyne immediately follows the mtmsr instruction a privileged instruction could be executed or privileged access could be performed without causing an interrupt even though MSR PR indicates user mode 3 2 2 4 3 Instruction Related Interrupts There are
545. se etup for Protection Violation See Figure 6 18 Store Bytes 6 7 of PTE to Memory ptr 2 lt temp Bytes 6 7 Return to TLB Miss Interrupt Flow See Figure 6 15 Figure 6 16 Check and Set R and C Bit Flow e300 Power Architecture Core Family Reference Manual Rev 3 36 Freescale Semiconductor Memory Management Figure 6 17 shows the flow for synthesizing a page fault interrupt when no PTE is found Setup for Page Fault Interrupt Data TLB Miss Handlers Instruction TLB Miss Handlers SRR1 SRR1 AND OxFFFF Branch to ISI Interrupt Handler DSISR 6 lt SRR1 15 Clear Upper Bits of SRR1 SRR1 lt SRR1 AND OxFFFF DSISR 1 lt 1 dtemp lt DMISS SRR1 31 1 Little Endian Mode Otherwise dtemp lt dtemp XOR 0x07 DAR lt dtemp Restore CRO Bits MSR TGPR 0 Branch to DSI Interrupt Handler Figure 6 17 Page Fault Setup Flow e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 37 Memory Management Figure 6 18 shows the flow for managing the cases of a TLB miss on an instruction access to guarded memory and a TLB miss when C 0 and a protection violation exists The setup for these protection violation exceptions is very similar to that of page fault conditions as shown in Figure 6 17 except that different bits in SRR1 and DSISR are set Setup for Protection Violation Interrupts Data TLB Miss Handlers Instruct
546. seesseesseesseresseesssresseesseesseresseeessesse 5 28 5 5 7 1 IEEE Floating Point Exception Program Interrupts 0 0 0 0 eeececeeeeeeeeeeeeeeeeeeeees 5 29 5 5 7 2 Illegal Reserved and Unimplemented Instructions Program Interrupts 5 29 5 5 8 Floating Point Unavailable Interrupt OXO0800 0 eee eeeececeeeeeceeeeeeeteeeceteeeeeneeeees 5 29 5 5 9 Decrementer Interrupt 0X00900 nania n e E E A E 5 30 5 5 10 Critical Intetr pt OxOOAUO ercsi acen useen aat g n ii 5 30 5 5 11 System Call Interrupt Ox00CO0 s c 5 icisssisssecsiseccsasacs sesetasnteneetecdeaenaussad toasdeceasedessces 5 31 5 5 12 Trace Interrupt EISE arsit eaaeo E tege 5 31 5 5 12 1 Single Step Instruction Trace Mode AA 5 32 5 5 12 2 Branch Trace bet EE ee sada cae E EREE EA 5 32 5 5 13 Performance Monitor Interrupt OXOOFOO ssesesseessesssesssesesseessresseesseessseessseesseese 5 33 5 5 14 Instruction TLB Miss Interrupt OXO1000 0 0 0 eeeeeeeneceececeeeeeceeeeeceteeeceeeeeeeteeeees 5 33 5 5 15 Data TLB Miss on Load Interrupt OX01100 00 eee eeeeeceeeeeceeeeeeeteeeceteeeeeeeeens 5 33 5 5 16 Data TLB Miss on Store Interrupt OO 200 5 34 5 5 17 Instruction Address Breakpoint Interrupt Us 3001 5 34 5 5 18 System Management Interrupt 0x01400 0 eee eeeceecseeeeseeeeceeeeeceteeecsteeecseeeesaes 5 36 Chapter 6 Memory Management 6 1 MMU Features gereest ee See ee 6 2 6 1 1 Memory CES SS INS sa scat ea a otras gash oan eae a a ite mas ei
547. seessessrssesssesosstenseseesscestensssot 2 20 Critical Interrupt Save Restore Register 0 CSRRO ose ceeeeseeeeseeceeeeeeeeaeeeeaeens 2 21 Critical Interrupt Save Restore Register 1 CSRR1 0 eee eeeeeeeeeeseeeneceeeeeeeenaeee 2 22 EE EE Ee 2 22 System Version Resister S VIR tee dE EES 2 23 System Memory Base Address MBAR 2 23 Instruction Address Breakpoint Registers TABR and IABR2 cee eeeeeeeteeeee 2 23 Instruction Address Breakpoint Control Register OBCR 2 24 Data Address Breakpoint Register DABR and DADBRI A 2 25 Data Address Breakpoint Control Register ODBCR ce sceeeeeeeeseeeeneeeeeeeseeenneens 2 27 Performance Monitor Registers 5 2cteca ive nce Mn avdn ie eee BAe 2 28 Chapter 3 Instruction Set Model E TEE 3 1 Data Organization in Memory and Memory Operands eee eeeeeseceeeeereeeneeeneeeees 3 1 Endian Modes and Byte Ordering EE 3 1 Alignment and Misaligned AcCesses c cseccseccestercorteecerteccenseesontescestecceenseseenteee 3 2 Floating Point Execution Model 3 3 Effect of Operand Placement on Performance eee eeseeseceseeesseceseeeseeeeseesnaeeneeees 3 4 Instruction Set Summary oiiasissscacsavsavcdea ENEE ERENNERT 3 4 Classes of IS thu CM OMS ees donde iei E teas Ge ee ee a ae a 3 5 Definition of Boundedly Undefined AA 3 5 Defined Tetris 3 6 Ilegal Instr ction E 3 6 Reserved Instruction Class icc cgieseieed as deta eeede geet ieia 3 7 Addressing Mee buteurs EC Ego 3 7 Memory e
548. shed to memory and invalidated 4 4 3 Core Initiated Load Store Operations Load and store operations are assumed to be weakly ordered In general the load store unit LSU can perform load operations that occur later in the program ahead of store operations even when the data cache is disabled Any load followed by any store is performed in order See Section 4 4 3 2 Sequential Consistency of Memory Accesses for more information 4 4 3 1 Performed Loads and Stores The PowerPC architecture defines a performed load operation as one that has the addressed memory location bound to the target register of the load instruction The architecture defines a performed store operation as one where the stored value is the value that any other processor will receive when executing a load operation that is of course until it is changed again With respect to the core caching allowed WIMG nOnn loads and caching allowed write back WIMG 00nn stores are performed when they have arbitrated to address the cache block in the data cache or the coherent system bus CSB Loads are considered performed at the data cache only if the cache contains a valid copy of that address Write back stores are considered performed at the data cache only if the cache contains a valid copy of that address Caching inhibited WIMG nInn loads caching inhibited WIMG nInn stores and write through WIMG 10nn stores are performed when they have been successfully p
549. shown in Table 3 23 are provided to test for a specified set of conditions If any of the conditions tested by a trap instruction are met the system trap handler is invoked If the tested conditions are not met instruction execution continues normally Table 3 23 Trap Instructions Name Mnemonic Operand Syntax Trap Word tw TO rA rB Trap Word Immediate twi TO rA SIMM See Appendix F Simplified Mnemonics in the Programming Environments Manual for a complete set of simplified mnemonics 3 2 4 6 Processor Control Instructions UISA level processor control instructions are used to read from and write to the condition register CR 3 2 4 6 1 Move To From Condition Register Instructions Table 3 24 lists the instructions provided by the core for reading from or writing to the CR Table 3 24 Move To From Condition Register Instructions Name Mnemonic Operand Syntax Move from Condition Register mfcr rD Move to Condition Register Fields mtcrf CRM rS Move to Condition Register from XER merxr crfD e300 Power Architecture Core Family Reference Manual Rev 3 24 Freescale Semiconductor Instruction Set Model 3 2 4 7 Memory Synchronization Instructions UISA Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events and the order in which memory operations are seen by other processors or memory access mechanisms
550. sponding segment descriptor then determines if the access is to memory memory mapped or to the direct store interface space selected when the direct store translation control bit T bit in the corresponding segment descriptor is set Note that the direct store interface existed in previous processors only for compatibility with I O devices that use this interface When an access is determined to be to the direct store interface space the core takes a DSI interrupt as described in Section 5 5 3 DSI Interrupt 0x00300 if it is a data access The G2 core takes an ISI interrupt as described in Section 5 5 4 ISI Interrupt 0x00400 if it is an instruction access For memory accesses translated by a segment descriptor the interim virtual address is generated using the information in the segment descriptor Page address translation corresponds to the conversion of this virtual address into the 32 bit physical address used by the memory subsystem In most cases the physical address for the page resides in an on chip TLB and is available for quick access However if the page address translation misses in an on chip TLB the MMU causes a search of the page tables in memory using the virtual address information and a hashing function to locate the required physical address When this occurs the core vectors to the interrupt handlers that search the page tables with software Block address translation occurs in parallel with page address tra
551. sserted to the e300 core using the ext_halt pin When DBCR and IBCR are configured for an OR combinational signal type the breakpoint signals iabr iabr2 and dabr dabr2 reflect their respective breakpoints When the DBCR and IBCR are configured for AND combinational signal type only the iabr2 and dabr2 breakpoint signals are asserted after the AND condition is met that is both instruction breakpoints occurred or both data breakpoints occurred e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 33 Overview e When the core_stopped pin is asserted the e300 core has entered a stopped state and all internal clocking has stopped indicating that a hardware debug event has occurred e The ext_halt input pin can be used to force the core into halted state The halted state may be a hardstop conditional upon the HARDSTOP condition being set through the JTAG debug interface The breakpoint signaling conditions are described in Chapter 10 Debug Features 1 4 Differences Between Cores The e300 core has similar functionality to the G2_LE core Table 1 3 describes the differences between the G2_LE and the e300 Table 1 3 Differences Between e300 and G2_LE Cores e300 Core G2_LE Core Impact New HIDO bits bam The e300 core has a new HIDO bit defined to enable cache parity error reporting ECPE New HID1 bits The e300 core has new HID1 bits defined to extend the number of
552. sses that access memory directly that is reads and writes when caching is disabled caching inhibited accesses and stores in write through mode Four beat burst transactions which always transfer an entire cache block 32 bytes are initiated when a line is read from or written to memory e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Overview 1 3 7 2 Signals The e300 core signals are grouped as follows 1 3 8 Interrupts Resets These signals include the external interrupt signal int critical interrupt signal cint checkstop signals performance monitor signal pm_event_in and both soft reset and hard reset signals They are used to interrupt and under various conditions to reset the core JTAG debug interface signals The JTAG based on the IEEE 1149 1 standard interface and debug unit provides a serial interface to the system for performing monitoring and boundary tests Two additional signals are added to the e300 core to allow observation of the internal clock state of the core stopped and to allow the external input to force the core into a halted state ext_halt Core status and control These signals include the memory reservation signal machine quiesce control signals time base enable signal and the tlbisync signal Clock control These signals provide for system clock input and frequency control Test interface signals Signals like address matching
553. ssigveuslavesdesuseSeaudaneasataasonnadensnceees 1 32 1 3 7 2 EE 1 33 1 3 8 MDS DUS Feature eege reegt Eege 1 33 1 3 8 1 Breakpoint SiS maT ieee ed peat asco deeds each ended alate areas aes 1 33 1 4 Differences Between Eed A s 1 34 1 5 Differences Between e300 Cores is cciesasssdaccseendcnsasacdaneace dnaadeunsderensesdeataansead couebedeasecessces 1 36 Chapter 2 Register Model 2 1 PowerPC Resister Setoa a T E EO E Ne ee 2 1 D2 Implementation Specific EE 2 10 2 2 1 Hardware Implementation Register 0 HIDO ss sssssssssssessesrssseessressessseresseessseesseese 2 12 22 2 Hardware Implementation Register 1 OIDI 2 15 2 2 3 Hardware Implementation Register 2 OHIO 2 16 2 2 4 Data and Instruction TLB Miss Address Registers MISS and MISO EE 2 18 e300 Power Architecture Core Family Reference Manual Rev 3 iv Freescale Semiconductor Paragraph Number 22D 2 2 6 2 2 1 2 2 8 2 2 9 2 2 10 2 2 11 2 2 12 2 2 13 2 2 14 2 2 15 2 2 16 22 17 2 2 18 3 1 3 1 1 3 1 2 3 1 3 3 1 4 3 1 5 3 2 3 2 1 3 2 1 1 3 2 1 2 3 2 1 3 3 2 1 4 3 2 2 3 221 3 2262 3 2 2 3 3 2 2 4 322A 3 2 2 4 2 3 2 2 4 3 3 2 3 Page Title Number Data and Instruction TLB Compare Registers DCMP and IGMP d cn lt dateensevstaseeguaetendansedees EE EE ee 2 18 Primary and Secondary Hash Address Registers HASHI and HASH2 EE 2 19 Required Physical Address Register RPA 2 20 BAT Registers BAT4 BAT 7 ssssssssessssssssseessesscsseeseesses
554. st Qut Op rationS resse loons E E A A N 4 21 4 6 5 Cache Block Kee 4 22 4 6 6 Data Cache RTE e 4 22 4 6 7 Cache Block Replacement Selection ssssssessssseesseessresseesseeesetesstesseesseessseessseesseest 4 22 4 7 Tel Cache Parity etatoeugeeentehieneuugeeheue egene 4 25 4 8 Bu Aale 4 26 4 9 Caches and CSB Transactions EE 4 27 4 9 1 Simgle Beat Trans EE 4 27 4 9 2 amer ati eA COI aos saci ao sachs ee der 4 27 4 9 3 Instruction Fetch Burst Enable for Caching Inhibited Space eee eeeceeeeeeeeeeteeeees 4 28 4 9 4 CSB Operations Caused by Cache Control Instructions 0 0 0 0 eeeeceeeeceeeeeeeeeeneeeees 4 29 4 9 5 STOOPING e cvadnndvavacdasaccasvarsansaetetenceade Ea E a S Vaavausuanssadeusdacadeaspenede atonsaetessmeasvuntersetes 4 29 4 10 Applications Information Cache Lockg AA 4 32 4 10 1 Cache Locking Terminology ege degen eegene 4 32 4 10 2 Cache Locking Register HEEN enee gedd 4 33 4 10 3 Performing Cache Bockia opier a o o eutetuertnceate 4 34 4 10 3 1 Data Cache Locking Procedures s seseesseesseesseesseeesseessessessseessseesseessresseessee 4 35 4 10 3 1 1 Enabling the Data ee Zeenen EE EE 4 35 4 10 3 1 2 Address Translation for Data Cache Locking eeieeeseeceeneeceeeeeceeeeeenteeeesaes 4 35 4 10 3 1 3 Disabling Interrupts for Data Cache Locking A 4 36 4 10 3 1 4 Invalidating the Data EE 4 36 e300 Power Architecture Core Family Reference Manual Rev 3 viii Freescale Semiconductor Paragra
555. st and debug interface 1 14 1 33 JTAG signals 8 1 8 3 8 9 L Little endian mode 5 14 Load store unit LSU 1 1 7 3 caching allowed loads stores 4 13 execution timing 7 24 latencies of load and store instructions 7 32 load store ordering 4 14 overview 1 10 performed loads and stores definition 4 13 Logical addresses translation into physical addresses 6 1 LR link register 2 6 lwarx stwex atomic memory references 4 14 8 7 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Index 5 Machine check interrupt 5 3 5 21 5 22 checkstop state 5 22 enabling with MSR ME 5 13 SRR1 bit settings 5 9 MBAR system memory base address reg 2 11 2 23 Memory accesses 1 32 Memory management unit MMU data organization 3 1 segmented memory 6 19 see also Memory management unit MMU sequential consistency of accesses 4 13 Memory cache access attributes WIMG bits 4 6 4 8 caching inhibited accesses I bit 4 7 6 16 ci internal signal 8 2 performance 7 25 address translation flow 6 11 address translation mechanisms 6 8 6 11 block address translation BAT 6 8 6 11 6 19 block diagram 6 5 6 7 cache locking 4 34 4 35 4 37 4 40 4 41 4 45 data accesses DMMU 6 1 direct address translation 6 11 6 19 enabling disabling translation data address translation MSR DR 5 13 instruction address translation MSR IR 5 13 features summary 6 2 hashing fu
556. state no action is taken e If the addressed block is in the exclusive or shared state the address snoop forces the state of the addressed block to invalid e If the addressed block is in the modified state the address snoop signals retry and initiates a flush of the modified block out of the cache and changes the state of the block to invalid e Any associated reservation is canceled Write with flush Write with flush atomic MEI MESI Write with flush and write with flush atomic operations occur after the processor issues a store or stwex instruction respectively e If the addressed block is in the invalid state no action is taken e Ifthe addressed block is in the shared state the address snoop forces the state of the addressed block to invalid e If the addressed block is in the exclusive state the address snoop forces the state of the addressed block to invalid e f the addressed block is in the modified state the address snoop signals retry initiate a push of the modified block out of the cache and change the state of the block to invalid e Any associated reservation is canceled e300 Power Architecture Core Family Reference Manual Rev 3 30 Freescale Semiconductor Instruction and Data Cache Operation Table 4 9 Snoop Response to CSB Transactions continued Snooped Transaction Type e300 Core Response Kill block MEI MESI The kill block operation is an address only bus transaction
557. stem e300 Power Architecture Core Family Reference Manual Rev 3 4 Freescale Semiconductor Memory Management Instruction Accesses Data Accesses A20 A31 MMU 32 Bit EAO EA19 G EA4 EA19 EA15 EA19 EAO EA3 IBATOU IBATOL IBAT7L Rn Upper 24 Bits of Virtual Address EAO EA14 eee ee eee _ _DBATOU _ On Chip EA0 EA14 DBATOL TLBs Optional eee ete Si egegeg DBAT7U DBAT7L EA go ee Al Page Table Search Logic Optional PAO PA14 a PA15 PA19 x PA0 PA19 o rq H Optional Geer PAO PA31 Figure 6 1 MMU Conceptual Block Diagram 32 Bit Implementations e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 5 Memory Management Instruction Unit A20 A31 BPU EAO EA19 EAO EA3 IBAT Array IBATOU IBATOL IBAT3U IBAT3L Segment Registers EAO EA14 IBAT4U Se IBATAL Cache e dE KS IBAT7 ee Paoa 427 PAO PA19 IMISS SPR980 ICMP SPR981 SPR25 SPR978 SPR979 SPR982 PAO PA19 Cache Hit Miss PAO PA31 Figure 6 2 e300 Core IMMU Block Diagram e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Memory Management Load Store Unit A20 A31 DBAT Array DBATOU DBATOL
558. struction with a zero length e Accesses generated by a stwex instruction when no store is performed because a reservation does not exist e Accesses that cause interrupts and are not completed 6 4 1 2 Change Bit The change bit of a page is located both in the PTE in the page table and in the copy of the PTE loaded into the TLB if a TLB is implemented as in the e300 core Whenever a data store instruction is executed successfully if the TLB search for page address translation results in a hit the change bit in the matching TLB entry is checked If it is already set the processor does not change the C bit If the TLB change bit is 0 it is set and a table search operation is performed to also set the C bit in the corresponding PTE in the page table The e300 core causes a data TLB miss on store interrupt for this case so that the software can perform the table search operation for setting the C bit Refer to Section 6 5 2 Implementation Specific Table Search Operation for an example code sequence that handles these conditions The change bit in both the TLB and PTE in the page tables is set only when a store operation is allowed by the page memory protection mechanism and all conditional branches occurring earlier in the program have been resolved such that the store is guaranteed to be in the execution path Furthermore the following conditions may cause the C bit to be set e The execution of an stwex instruction is allowed by
559. sts the integer compare instructions Integer Compare Instructions Table 3 4 Integer Compare Instructions Name Mnemonic Operand Syntax Compare cmp crfD L rA rB Compare Immediate cmpi crfD L rA SIMM Compare Logical cmpl crfD L rA rB Compare Logical Immediate cmpli crfD L rA UIMM The erfD operand can be omitted if the result of the comparison is to be placed in CRO Otherwise the target CR field must be specified in the instruction erfD field e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 11 Instruction Set Model For more information refer to Appendix F Simplified Mnemonics in the Programming Environments Manual 3 2 4 1 3 Integer Logical Instructions The logical instructions shown in Table 3 5 perform bit parallel operations Logical instructions with the CR update enabled and instructions andi and andis set CR field CRO to characterize the result of the logical operation These fields are set as if the sign extended low order 32 bits of the result were algebraically compared to zero Logical instructions without CR update and the remaining logical instructions do not modify the CR Logical instructions do not affect the XER SO XER OV and XER CA bits For simplified mnemonics examples for the integer logical operations see Appendix F Simplified Mnemonics in the Programming Environments Manual Table 3 5 Integer Logical Instructions
560. stwex instruction with the same effective address used for both instructions of the pair Note that the reservation granularity is 32 bytes The concept behind the use of the lwarx and stwex instructions is that a processor may load a semaphore from memory compute a result based on the value of the semaphore and conditionally store it back to the same location only if that location has not been modified since it was first read and determine if the store was successful The conditional store is performed based on the existence of a reservation established by the preceding lwarx instruction If the reservation exists when the store is executed the store is performed which sets a bit in the CR If the reservation does not exist when the store is executed the target memory location is not modified and a bit is cleared in the CR If the store was successful the sequence of instructions from the read of the semaphore to the store that updated the semaphore appear to have been executed atomically that is no other processor or mechanism modified the semaphore location between the read and the update thus providing the equivalent of a real atomic operation However in reality other cores may have read from the location during this operation In the e300 core the reservations are made on behalf of aligned 32 byte sections of the memory address space The Iwarx and stwex instructions require the EA to be aligned Interrupt handling software shoul
561. sued to the cache or to the bus if a cancelled instruction fetch is pending or active on the bus This is also called hit under cancel capability Before the last sentence in the first paragraph of the section add the following sentence In the e300c2 however each of the two execution units can execute one multiply for a total of two multiply instructions executed in parallel Add two figures Instruction Timing lInteger Execution in the e300c1Core and Instruction Timing lInteger Execution in the e300c2 to show difference in integer instruction throughput between e300c1 and e300c2 Modify Table 7 4 Integer Instructions to include latency information for the e300c2 core Replace statement that the data bus can be selected to be 32 or 64 bits wide with the following The address bus is 32 bits wide and the data bus is 64 bits wide In Figure 8 1 Core Interface Signals add tlbisync signal to the diagram In Table 8 1 Summary of Selected Internal Signals add tlbisync signal description to the I O description table e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Revision History C 3 Changes From Revision 0 to Revision 1 Major changes to the e300 PowerPC Core Reference Manual from Revision 0 to Revision 1 are as follows Section Page Book 1 0 1 1 Chapter 1 Chapter 4 1 3 1 7 2 1 17 1 3 1 7 2 1 17 1 3 4 2 1 24 1 25 1 3
562. t IU Instruction storage interrupt ISI ISI interrupt Interrupt Interrupt Privileged mode or privileged state Supervisor level privilege Problem mode or problem state User level privilege Real address Physical address Relocation Translation Storage locations Memory Storage the act of Access Store in Write back Store through Write through Table iii describes instruction field notation used in this manual Table iii Instruction Field Conventions The Architecture Specification Equivalent to BA BB BT crbA crbB crbD respectively BF BFA crfD crfS respectively D d DS ds FLM FM e300 Power Architecture Core Family Reference Manual Rev 3 xxxii Freescale Semiconductor Table iii Instruction Field Conventions continued FRA FRB FRC FRT FRS frA frB frC frD frS respectively FXM CRM RA RB RT RS rA rB rD rS respectively Sl SIMM U IMM Ul UIMM LA IIl 0 0 shaded e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xxxiii e300 Power Architecture Core Family Reference Manual Rev 3 XXXIV Freescale Semiconductor Chapter 1 Overview This chapter provides an overview of features for the embedded microprocessors in the e300 core family which are PowerPC microprocessors built on Power Architecture technolo
563. t the core bus snooping logic responds with the appropriate snoop status for example a retry Additional snoop action may be forwarded to the cache as a result of a snoop hit in some cases a cache push of modified data or cache block invalidation The specific response depends on the transaction type There are several bus transaction types defined for the CSB The core snoops these transactions and performs the appropriate action to maintain memory coherency as described in Table 4 9 Table 4 9 Snoop Response to CSB Transactions Snooped Transaction e300 Core Response Type Clean block MEI No action is taken MESI The clean block operation is an address only bus transaction initiated when a debst instruction is executed The core may have the following response e f the addressed block is in the invalid or shared state no action is taken and the state of the addressed block is unchanged e f the addressed block is in the exclusive state the address snoop forces the state of the addressed block to shared e Ifthe addressed block is in the modified state the address snoop signals retry and initiate a push of the modified block out of the cache and changes the state of the block to shared Flush block MEI No action is taken MESI The flush block operation is an address only bus transaction that initiates when a dcbf instruction is executed The core may have the following response e If the addressed block is in the invalid
564. t 0x0100 When a soft reset occurs registers are set as shown in Table 5 13 A soft reset is recoverable provided that attaining the recoverable state does not cause a machine check interrupt This interrupt case is third in priority following hard reset and machine check When a soft reset occurs registers are set as shown in Table 5 13 in addition to the clearing of HIDO ICE Table 5 13 Soft Reset Interrupt Register Settings Register Setting Description SRRO Set to the effective address of the instruction that the processor would have attempted to complete next if no interrupt conditions were present SRR1 0 15 Cleared 16 31 Loaded from MSR 16 31 Note that if the processor state is corrupted to the extent that execution cannot be reliably restarted SRR1 80 is cleared MSR POW 0 FP 0 FEI 0 RI 0 TGPR 0 ME CE 0 LE Set to value of ILE LE FEO 0 IP EE 0 SE 0 IR 0 PR 0 BE 0 DR 0 5 5 1 3 Byte Ordering Considerations All interrupt routines are executed in the endian mode determined by the setting of the MSR ILE MSR LE and HID2 LET bits see Table 1 2 for endian mode indication when the interrupt is taken A special case is the system reset interrupt for both hard and soft reset for the e300 core When the tle signal is negated at the time hreset is negated the system interrupt handler of the device enters into big endian mode If MSR ILE MSR LE and HID2 LET are subsequent
565. t completely describe the mechanisms for the operations described 4 9 5 Snooping The core maintains data cache coherency in hardware by coordinating activity between the data cache memory system and bus interface logic e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 29 Instruction and Data Cache Operation The global gb signal asserted as part of the address attributes during a bus transaction enables the snooping hardware of the core Address bus masters assert gbl to indicate that the current transaction is a global access that is an access to memory shared by more than one device If gb is not asserted for the transaction that transaction is not snooped by the core The gbl signal is not asserted for instruction fetches except when HIDO IFEM is set and the instruction address is marked memory coherency required gbl is asserted for all data read or write operations when using real addressing mode Normally gb reflects the M bit value specified for the memory reference in the corresponding MMU translation descriptor s Care must be taken to minimize the number of pages marked as global because the retry protocol enforces coherency and this incurs additional overhead As global bus transactions are performed on the bus by other bus masters the core bus snooping logic monitors the addresses that are referenced These addresses are compared with the cache tags If there is a snoop hi
566. t hits in the cache use that entry in the cache Memory coherency is enforced by hardware nio Caching is inhibited The access is performed to external memory completely bypassing the cache Memory coherency is not enforced by hardware nii Caching is inhibited The access is performed to external memory completely bypassing the cache Memory coherency must be enforced by external hardware processor provides hardware indication that access is global 100 Data may be cached Load operations whose target hits in the cache use that entry in the cache Stores are written to external memory The target location of the store may be cached and is updated on a hit Memory coherency is not enforced by hardware 101 Data may be cached Load operations whose target hits in the cache use that entry in the cache Stores are written to external memory The target location of the store may be cached and is updated on a hit Memory coherency is enforced by hardware e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Instruction and Data Cache Operation 4 4 2 Coherency Support The data cache of the e300 core supports both the three state MEI and four state MESI cache coherency protocols as selected by the HID2 MESI control parameter For these modes each 32 byte block in the data cache is always in one of the following states modified M exclusive E shared S or invalid 1 The shared state is only avai
567. t if HIDO EMCP is cleared the core ignores the assertion of the mcp signal but continues to monitor the tea signal A machine check interrupt also occurs when an address or data parity error is detected on the bus and the address or data parity error is enabled in HIDO See Section 2 2 1 Hardware Implementation Register 0 HIDO for more information Note that the e300 core makes no attempt to force recoverability on a machine check however it does guarantee that the machine check interrupt is always taken immediately upon request with a nonpredicted address saved in SRRO regardless of the current machine state Because pending stores in the store queue see Figure 7 6 are not canceled when a machine check interrupt occurs two consecutive stores that result in the assertion of tea can cause the processor to checkstop To prevent a checkstop in this case a syne instruction must be placed between two stores that can result in assertion of fea Software can use the machine check interrupt handler in a recoverable mode to probe memory For this case a sync load sync instruction sequence is used If the load access results in a system error for example the assertion of fea the processor can handle this in a recoverable state If the syne instruction is not used a second access to the same address as the first load could cause the processor to enter the checkstop state If the MSR ME bit is set the interrupt is recognized and hand
568. t instructions which include the following e Floating point arithmetic instructions e Floating point multiply add instructions e Floating point rounding and conversion instructions e Floating point compare instructions e Floating point status and control register instructions e Floating point move instructions See Section 3 2 4 3 Load and Store Instructions for information about floating point loads and stores Implementation Note The e300c2 core does not support floating point instructions The PowerPC architecture supports a floating point system as defined in the IEEE 754 standard but requires software support to conform with that standard All floating point operations conform to the IEEE 754 standard except if software sets the non IEEE mode bit NI in the FPSCR The e300 core is in the e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 13 Instruction Set Model nondenormalized mode when the NI bit is set in the FPSCR If a denormalized result is produced a default result of zero is generated The generated zero has the same sign as the denormalized number The core performs single and double precision floating point operations compliant with the IEEE 754 floating point standard Implementation note Single precision denormalized results require two additional processor clock cycles to round When loading or storing a single precision denormalized number the load store unit may take
569. t is set to the effective address of the instruction that the processor would have attempted to complete next if no critical interrupt had occurred The format of CSRRO is shown in Figure 2 14 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 21 Register Model SPR 58 Access Supervisor read write 0 29 30 31 R CSRRO W Reset All zeros Figure 2 14 Critical Interrupt Save Restore Register 0 CSRRO For information on how specific interrupts affect CSRRO refer to the descriptions of individual interrupts in Chapter 5 Interrupts and Exceptions 2 2 10 Critical Interrupt Save Restore Register 1 CSRR1 CSRRI is used to save machine status on interrupts and to restore machine status when an rfci instruction is executed Figure 2 15 shows the CSRR1 format Figure 2 15 Critical Interrupt Save Restore Register 1 CSRR1 SPR 59 Access Supervisor read write 0 31 R CSRR1 W Reset All zeros Figure 2 16 Critical Interrupt Save Restore Register 0 CSRRO For information on how specific interrupts affect CSRR1 refer to the individual interrupts in Chapter 5 Interrupts and Exceptions 2 2 11 SPRGO SPRG7 The core provides four additional SPRG SPRG4 SPRG7 registers for general operating system use such as performing a fast state save or for supporting multiprocessor implementations The formats of the SPRG registers are shown in Figure 2 17 Note that S
570. t is the only cache control instruction that may broadcast without being specifically enabled using HIDO ABE The debz instruction is essentially a store operation rather than a cache management operation and therefore always participates in normal cache coherency as directed by the M bit The core initiates an alignment interrupt when the operand of a dcbz is in a page that is write through or caching inhibited or when the data cache is disabled The debz instruction is not executed in these cases If the debz instruction is otherwise not aborted it allocates into the cache and performs an address broadcast if necessary for normal store coherency The resulting cache line is marked as modified In the case where the dchz operation hits in the cache it re allocates to that cache line Note that the dcbz instruction ignores HIDO DLOCK setting and always allocates a tag That is when DLOCK is set the dcbz occurs as if the cache were not locked e300 Power Architecture Core Family Reference Manual Rev 3 18 Freescale Semiconductor Instruction and Data Cache Operation 4 5 2 4 Data Cache Block Store dcbst Instruction This instruction is treated as a load to the addressed byte with respect to address translation and protection If the address hits in the data cache the target cache line is pushed to memory as a burst write bus transaction The cache line remains in the data cache and is marked as exclusive if operating in MEI mode o
571. t number whose exponent has a reserved value usually the format s minimum and whose explicit or implicit leading significand bit is zero Direct mapped cache A cache in which each main memory address can appear in only one location within the cache operates more quickly when the memory request is a cache hit Effective address EA The 32 bit address specified for a load store or an instruction fetch This address is then submitted to the MMU for translation to either a physical memory address or an I O address e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 3 Exception A condition that if enabled generates an interrupt Exclusive state MEI state E in which only one caching device contains data that is also in system memory Execution synchronization A mechanism by which all instructions in execution are architecturally complete before beginning execution appearing to begin execution of the next instruction Similar to context synchronization but doesn t force the contents of the instruction buffers to be deleted and refetched Exponent In the binary representation of a floating point number the exponent is the component that normally signifies the integer power to which the value two is raised in determining the value of the represented number See also Biased exponent Fall through branch fall through A not taken branch On the e300 core fall through branch instr
572. t of integer add and compare instructions executed by the system register unit SRU Floating point instructions that access or modify the FPSCR or CR mtfsb1 merfs mtfsfi mffs and mtfsf Instructions that manage caches and TLBs Instructions that directly access the GPRs load and store multiple word and load and store string instructions Instructions defined by the architecture to have synchronizing behavior e Dispatch serialized inhibit the dispatching of subsequent instructions until the serializing instruction is retired Dispatch serialization is used for instructions that access renamed resources used by the dispatcher and for instructions requiring refetch serialization including the following The load multiple instructions Imw Iswi and Iswx The mtspr XER and merxr instructions The synchronizing instructions sync isync mtmsr rfi rfci and se e Refetch serialized instructions inhibit dispatching of subsequent instructions and force the refetching of subsequent instructions after the serializing instructions are retired The context synchronizing instruction isyne is refetch serializing 7 3 3 3 Execution Unit Considerations As previously noted the e300 core can dispatch and retire two instructions per clock cycle The peak dispatch rate is affected by the availability of execution units on each clock cycle For an instruction to be dispatched the required execution unit must be availabl
573. t pipelined NOTE Floating point instructions are not supported on the e300c2 Table 7 6 provides latencies for the load and store instructions Table 7 6 Load and Store Instructions Mnenionic Primary Extended Unit Latency Opcode Opcode in Cycles lwarx 31 020 LSU 2 1 icbt 31 022 LSU 2 e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Table 7 6 Load and Store Instructions continued Muenmonl Primary Extended Unit Latency Opcode Opcode in Cycles lwzx 31 023 LSU 2 1 dcbst 31 054 LSU 2 5 amp Iwzux 31 055 LSU 2 1 dcbf 31 086 LSU 2 5 amp Ibzx 31 087 LSU 2 1 Ibzux 31 119 LSU 2 1 stwex 31 150 LSU 8 stwx 31 151 LSU 2 1 stwux 31 183 LSU 2 1 stbx 31 215 LSU 2 1 dcbtst 31 246 LSU 2 stbux 31 247 LSU 2 1 deht 31 278 LSU 2 Ihzx 31 279 LSU 2 1 tlbie 31 306 LSU 3 amp Ihzux 31 311 LSU 2 1 Ihax 31 343 LSU 2 1 Ihaux 31 375 LSU 2 1 sthx 31 407 LSU 2 1 sthux 31 439 LSU 2 1 debi 31 470 LSU 2 amp Iswx 31 533 LSU 2 n amp Iwbrx 31 534 LSU 2 1 Hey 31 535 LSU 2 1 tlbsync 31 566 LSU 2 amp lfsux 31 567 LSU 2 1 Iswi 31 597 LSU 2 m Hds 31 599 LSU 2 1 Ifdux 31 631 LSU 2 1 stswx 31 661 LSU 1 n amp stwbrx 31 662 LSU 2 1 stfsx 31 663 LSU 2 1 stfsux 31 695 LSU 2 1 stswi 31 725 LSU 1 n amp e300 Power Architecture Core Family Reference Manua
574. t to be performed ahead of one that may have preceded it in the sequential model for example speculative operations An operation is said to be performed out of order if at the time that it is performed it is not known to be required by the sequential execution model See In order Out of order execution A technique that allows instructions to be issued and completed in an order that differs from their sequence in the instruction stream e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Glossary 7 Overflow An condition that occurs during arithmetic operations when the result cannot be stored accurately in the destination register s For example if two 32 bit numbers are multiplied the result may not be representable in 32 bits Since the 32 bit registers of the e300 cannot represent this sum an overflow condition occurs Page A region in memory The OEA defines a page as a 4 Kbyte area of memory aligned on a 4 Kbyte boundary Page access history bits The changed and referenced bits in the PTE keep track of the access history within the page The referenced bit is set by the MMU whenever the page is accessed for a read or write operation The changed bit is set when the page is stored into See Changed bit and Referenced bit Page fault A page fault is a condition that occurs when the processor attempts to access a memory location that does not reside within a page not currently resident in
575. t unit The functional unit in the e300c1e300 processor responsible for executing all floating point instructions Flush An operation that causes a cache block to be invalidated and the data if modified to be written to memory Folding See Branch folding e300 Power Architecture Core Family Reference Manual Rev 3 Glossary 4 Freescale Semiconductor Fraction In the binary representation of a floating point number the field of the significand that lies to the right of its implied binary point General purpose register GPR Any of the 32 registers in the general purpose register file These registers provide the source operands and destination results for all integer data manipulation instructions Integer load instructions move data from memory to GPRs and store instructions move data from GPRs to memory Guarded The guarded attribute pertains to out of order execution When a page is designated as guarded instructions and data cannot be accessed out of order Harvard architecture An architectural model featuring separate caches and other memory management resources for instructions and data Hashing An algorithm used in the page table search process IEEE 754 A standard written by the Institute of Electrical and Electronics Engineers that defines operations and representations of binary floating point numbers Illegal instructions A class of instructions that are not implemented for a particular PowerPC processor
576. ta cache because of self modifying code for example the program must first perform a debf or debst of that address to flush that data block back to main memory for the icbt to access it A sync instruction must follow an icbt instruction to ensure all cache load operations are completed An isync instruction must also follow an icbt if instruction fetching to the newly touched and loaded instructions may occur immediately after the icbt was executed 4 10 3 2 5 Locking the Entire Instruction Cache Locking the entire instruction cache is controlled by the instruction cache lock bit HIDO ILOCK bit 18 Setting HIDO ILOCK locks the entire instruction cache and clearing HIDO ILOCK allows the instruction cache to operate normally The setting of the HIDO ILOCK should be preceded by an isyne instruction to prevent the instruction cache from being locked during an instruction access The following assembly code locks the contents of the entire instruction cache Set the ILOCK bit in HIDO bit 18 mfspr rl HIDO ori rl rl 0x2000 sync isync mtspr HIDO rl isync 4 10 3 2 6 Way Locking the Instruction Cache Instruction cache way locking is controlled by the HID2 IWLCK bits 16 18 Table 4 20 shows the HID2 IWLCK 0 2 settings for the core Table 4 20 e300c1 Core IWLCK 0 2 Encodings IWLCK 0 2 Ways Locked 0b000 No ways locked 0b001 Way 0 locked 0b010 Ways 0 and 1 locked 0b011 Ways 0
577. tase canta een 6 15 6 5 Instruction Stret eegene NEES 6 17 6 6 ITIP aS E 6 17 6 7 Table Search Operations to Update History Bits TLB Hit Case 6 20 6 8 Model for Guaranteed R and C Bit Senge 6 22 6 9 Implementation Specific Resources for Table Search Operatons 6 30 6 10 Implementation Specific SRRI Bits ics ssentciiacctattaastedalesaeaeeuatei coneseeelsaleaeieacenl casing 6 31 6 11 DECEME and IC MP Eet eege 6 32 6 12 HASHI and IAS H2 Bit E EE 6 33 6 13 RPA Bit Stine sccctascesscas ecassdaatawastia NEE ENNEN EE ENNEN 6 33 7 1 Branch Ee 7 28 7 2 System Register Instructions issii siisi isie eren nis ESEE A EESE EEEN aN ii 7 28 7 3 Condition Register Logical Instructions cee eeeeeeeceeseeesseceseeeeeeesaeecsaecsseeeaeecaecnseensneenaee es 7 29 7 4 nteger Instructions ccc cab as os See asain a Satach a Bates a Maat E A TE ace Namen eaata ase 7 29 7 5 Floating Point Jeton erer 7 31 7 6 EE and Store Strut ONS EE 7 32 8 1 Summary of Selected Internal Signals 00 0 0 ee eescecssccecesececseececscecseececseeeecseeecsseeecsteeeesaeees 8 2 8 2 Core PILL EE e TEE 8 4 9 1 e300 Core Programmable Power Modes eege geed EE 9 2 10 1 Other Debug and Support Register Beete eier 10 3 10 2 Debug Interrupts and Conditions beer 10 3 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xix Table Number 10 3 10 4 10 5 10 6 11 1 11 2 11 3 11 4 11 5 11 6 11 7 11 8 11 9 A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 A 9 A
578. tatus Output Enable lt _ Debug Control Input Enable e JTAG Debug Interface Debug Control e Test Interface 15v Figure 1 7 Core Interface The core interface supports bus pipelining allowing the address tenure of one transaction to overlap the data tenure of another The extent of the pipelining depends on external arbitration and control circuitry Similarly the core supports split bus transactions for systems with multiple potential bus masters one device can have mastership of the address bus while another has mastership of the data bus Allowing multiple bus transactions to occur simultaneously increases the available bus bandwidth for other activity and as a result improves performance The core clocking structure allows the bus to operate at integer multiples of the core cycle time The following sections describe the core bus support for memory operations Note that some signals perform different functions depending on the addressing protocol used The e300c4 can optionally support the core complex bus CCB See Chapter 8 Core Interface Operation for more information on the CCB 1 3 7 1 Memory Accesses The e300 core CSB is a 64 bit data bus With a 64 bit CSB memory accesses allow transfer sizes of 8 16 24 32 40 48 56 or 64 bits in one bus clock cycle Data transfers occur in either single beat transactions or four beat burst transactions Single beat transactions are caused by noncached acce
579. te Instruction List Sorted by Mnemonic continued Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 merxr 31 ef 00 00000 00000 512 0 mfcr 31 D 00000 00000 19 0 mffsx 63 D 00000 00000 583 Re mfmsr 2 31 D 00000 00000 83 0 mtepr 31 D spr 339 0 mfsr 2 31 D 0 SR 00000 595 0 mfsrin 2 31 D 00000 B 659 0 mftb 31 D tbr 371 0 mterf 31 S 0 CRM o 144 0 mtfsb0x 63 crbD 00000 00000 70 Re mtfsb1x 63 crbD 00000 00000 38 Rc mtfsfx 63 0 FM 0 B 711 Re mtfsfix 63 crfD 00 00000 IMM 0 134 Rc mtmsr 2 31 S 00000 00000 146 0 mtspr 31 S spr 467 0 mtsr 2 31 S 0 SR 00000 210 0 mtsrin 2 31 S 00000 B 242 0 mulhdx 31 D A B 0 73 Re mulhdux 31 D A B 0 9 Re mulhwx 31 D A B 0 75 Re mulhwux 31 D A B 0 11 Rc mulldx 31 D A B OE 233 Rc mulli 7 D A SIMM mullwx 31 D A OE 235 Rc nandx 31 S A 476 Rc negx 31 D A 00000 JOE 104 Rc norx 31 S A 124 Rc orx 31 S A 444 Rc orcx 31 S A 412 Rc ori 24 S A UIMM oris 25 S A UIMM rfi 2 19 00000 00000 00000 50 0 ridelx 30 S A B mb Rc ders 30 S A B me Rc ridicx 30 S A sh mb sh Re ridiclx 1 30 S A sh mb sh Rc e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Instruction Set Listings Table A 1 Complete Instruction List Sorted by Mnemonic continued Name 0 5 6 7 8 9 10 11 12 13 14 15
580. te register MSR and the hardware implementation register 0 HIDO When entering into a power mode other than full power the core will request entry via a greq signal and will only enter another power mode after an acknowledge qack is received The four power modes are as follows e Full power This is the default power state of the e300 core The e300 core is fully powered and the internal functional units are operating at the full processor clock speed If the dynamic power management mode is enabled functional units that are idle will automatically enter a low power state without affecting performance software execution or external hardware e Doze All the functional units of the e300 core are disabled except for the time base decrementer registers and the bus snooping logic When the processor is in doze mode an external asynchronous interrupt system management interrupt decrementer interrupt hard or soft reset or machine check brings the e300 core into the full power state The core in doze mode maintains the PLL in a fully powered state and locked to the system external clock input sysclk so a transition to the full power state takes only a few processor clock cycles e Nap tThe nap mode further reduces power consumption by disabling bus snooping leaving only the time base register and the PLL in a powered state The core returns to the full power state on receipt of an external asynchronous interrupt system management interru
581. te the LR or CTR take an entry in the completion queue Figure 7 5 e300 Core Processor Pipeline Stages e300 Power Architecture Core Family Reference Manual Rev 3 8 Freescale Semiconductor Instruction Timing 7 3 Timing Considerations The e300 core is a superscalar processor as many as three instructions can be dispatched to the execution units one branch instruction to the branch processing unit and two instructions dispatched from the dispatch queue to the other execution units during each clock cycle Only one instruction can be dispatched to each execution unit Although instructions appear to the programmer to execute in program order the core improves performance by executing multiple instructions at a time using hardware to manage dependencies When an instruction is dispatched the register file provides the source data to the execution unit The register files and rename register have sufficient bandwidth to allow dispatch of two instructions per clock under most conditions The BPU decodes and executes branches immediately after they are fetched When a conditional branch cannot be resolved due to a CR data dependency the branch direction is predicted and execution continues from the predicted path If the prediction is incorrect the following steps are taken 1 The instruction queue is purged and fetching continues from the correct path 2 Any instructions ahead of the predicted branch in the CQ are allowed to co
582. tecture and more specifically at another For example conditions that cause a floating point interrupt are defined by the UISA while the interrupt mechanism itself is defined by the OEA For ease in reference topics in this book are presented in the same order as the Programming Environments Manual Topics build on one another beginning with a description and complete summary of the e300 core register model and followed by the instruction set model and progressing to more specific architecture based topics regarding the cache interrupt and memory management models As such chapters may include information from multiple levels of the architecture For example the discussion of the cache model uses information from both the VEA and the OEA The PowerPC Architecture A Specification for a New Family of RISC Processors defines the architecture from the perspective of the three programming environments and remains the defining document for the PowerPC architecture For information about ordering Freescale documentation see Suggested Reading on page xxv The information in this book is subject to change without notice as described in the disclaimers on the title page of this book As with any technical documentation it is the readers responsibility to be sure they are using the most recent version of the documentation For more information contact your sales representative To locate any published errata or updates for this reference
583. ted For address only transactions that bypass translation always asserted Write through Assertion indicates that a single beat transaction is write through reflecting the value of the W bit of the WIMG bits for the block or page that contains the address of the current transaction e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Core Interface Operation Table 8 1 Summary of Selected Internal Signals continued Signal 1 0 Comments or Meaning when Asserted tea Transfer error acknowledge Indicates that a data bus error occurred on the core interface Causes a machine check interrupt and possibly causes the processor to enter checkstop state if machine check enable bit is cleared MSR ME 0 External Interrupts hreset Hard reset Assertion resets the core sreset Soft reset When sreset is asserted the processor attempts to reach a recoverable state by allowing the next instruction to either complete or cause an interrupt blocking the completion of subsequent instructions and allowing the completed store queue to drain Unlike a hard reset no registers or latches are initialized int External interrupt Initiates an external interrupt to the core cint Critical interrupt Initiates a critical interrupt to the core mcp Machine check interrupt Indicates that the e300 core should initiate a machine check interrupt or e
584. teedsavtseandevsgeeasanesdecssavesncduleusdnaseaesasedewedbes 3 21 Floating Point Store Instructions reene 3 22 Branch ee Ee 3 23 Condition Register Logical Instructions ccceeececsseceesseceesneeeeseceeseceeaeceenaeeeeaeeeeaeeenas 3 24 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xvii Tables Table Page Number Title Number 3 23 Ree e E 3 24 3 24 Move To From Condition Register Instructions ceeseceesceceeeeeceeeeeceeeeeceeneecseeeenteeeesaes 3 24 3 25 Memory Synchronization Instructions UISA EE 3 26 3 26 Move From Time Base Tobes a a a i 3 26 3 27 Memory Synchronization Instructpons NBA 3 27 3 28 User Level Cache Instructions iscsi inaite issia 3 27 3 29 System Linkage Mista Cri GMs esni rn n e a a aa a 3 28 3 30 Move To From Machine State Register Instructions ss sesesseeesseesseesseeseeessressesseesseeesseee 3 29 3 31 Move To From Special Purpose Register Instructions es ssssssssesssssesesseeessressersseesseeesseee 3 29 3 32 Implementation Specific SPR Encodings mfspr sssssesseessesesseessessesseessessreseesseeseeseessee 3 29 3 33 Performance Monitor APU Instructions geriet Eeer eed aes 3 32 3 34 Segment Register Manipulation Instructions ceeccecesecesececseeeeceeceeceeeeeceeeeeceeeeeeteeeenaes 3 32 3 35 Translation Lookaside Buffer Management Instructions eeseeeeseeceeeceeeeeeeceeeeeenteeeenaes 3 33 4 1 Combinatio
585. ter 3 Instruction Set Model This chapter describes the operand conventions as they are represented in two levels of the PowerPC architecture It also provides detailed descriptions of conventions used for storing values in registers and memory accessing the core registers and the representation of data in these registers This chapter explains the following e Operand conventions e e300 core instruction set 3 1 Operand Conventions This section describes the integer and floating point operand conventions It also describes the big and little endian byte ordering for the e300 core Note that floating point instructions or operands are not supported on the e300c2 core 3 1 1 Data Organization in Memory and Memory Operands Bytes in memory are numbered consecutively starting with 0 Each number is the address of the corresponding byte Memory operands may be bytes half words words or double words or for the load store multiple and move assist instructions a sequence of bytes or words The address of a memory operand is the address of its first byte that is of its lowest numbered byte Operand length is implicit for each instruction 3 1 2 Endian Modes and Byte Ordering The PowerPC architecture supports both big and little endian byte ordering The default byte and bit ordering is big endian See Section 3 1 2 Byte Ordering in the Programming Environments Manual for more information about big and little endian byte order
586. terrupt e Asynchronous maskable interrupts that is the external system management and decrementer interrupts are enabled by setting the MSR EE bit When MSR EE 0 recognition of these interrupt conditions is delayed MSR EE is cleared automatically when an interrupt is taken to delay recognition of conditions causing those interrupts e A machine check interrupt can occur only if the machine check enable bit MSR ME is set If MSR ME is cleared the processor goes directly into checkstop state when a machine check exception condition occurs Individual machine check exceptions can be enabled and disabled through bits in the HIDO register as described in Table 2 5 e The e300 core enables the critical interrupt with the MSR CE bit e System reset interrupts cannot be masked e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Interrupts and Exceptions 5 2 3 Steps for Interrupt Processing After it is determined that the interrupt can be taken by confirming that any instruction caused interrupts occurring earlier in the instruction stream have been handled and by confirming that the interrupt is enabled for the exception condition the processor does the following 1 The machine status save restore register 0 SRRO is loaded with an instruction address that depends on the type of interrupt See the individual interrupt description for details about how this register is used for spec
587. that MSR IR and MSR DR are cleared whenever an interrupt occurs 6 5 2 1 1 Data and Instruction TLB Miss Address Registers DMISS and IMISS The DMISS and IMISS registers have the same format as shown in Figure 6 11 They are loaded automatically on a data or instruction TLB miss The DMISS and IMISS contain the effective page address of the access which caused the TLB miss interrupt The contents are used by the processor when calculating the values of HASH1 and HASH2 and by the tlbld and tlbli instructions when loading a new e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Memory Management TLB entry Note that the core always loads a big endian address into the DMISS register These registers are both read and write accessible However great caution should be used when writing to these registers Effective Address 31 Figure 6 11 DMISS and IMISS Registers 6 5 2 1 2 Data and Instruction TLB Compare Registers DCMP and ICMP The DCMP and ICMP registers are shown in Figure 6 12 These registers contain the first word in the required PTE The contents are constructed automatically from the contents of the segment registers and the effective address DMISS or IMISS when a TLB miss interrupt occurs Each PTE read from the tables in memory during the table search process should be compared with this value to determine whether or not the PTE is a match Upon execution of a tlbld or tlbli i
588. that provide a mechanism for translating additional blocks as large as 256 Mbytes from the 32 bit effective address space into the physical memory space This can be used for translating large address ranges whose mappings do not change frequently BATs are software controlled arrays that store the available block address translations on chip The core supports block address translation through the use of two independent instruction and data block address translation IBAT and DBAT arrays each array is comprised of four additional entries used for instruction accesses and four additional entries used for data accesses IBAT4 IBAT7 and DBAT4 DBAT7 are implementation specific registers on the e300 core which are optionally enabled in HID2 The format of these registers is the same as that of IBATO IBAT3 and DBATO DBATS3 Each BAT array entry consists of a pair of BAT registers an upper and a lower BAT register for each entry Figure 2 12 and Figure 2 13 show the format and bit definitions of the upper and lower BATs for 32 bit processor cores respectively e300 Power Architecture Core Family Reference Manual Rev 3 20 Freescale Semiconductor Register Model SPR 528 IBATOU 536 DBATOU Access Supervisor read write 530 IBAT1U 538 DBAT1U 532 IBAT2U 540 DBAT2U 534 IBAT3U 542 DBAT3U 560 IBAT4U 568 DBAT4U 562 IBAT5U 570 DBAT5U 564 IBAT6U 572 DBAT6U 566 IBAT7U 574 D
589. the corresponding cache the physical address is used to access system memory In addition to loads stores and instruction fetches the core performs software table search operations following TLB misses cache cast out operations when pseudo least recently used PLRU cache lines are written to memory after a cache miss and cache line snoop push out operations when a modified cache line experiences a snoop hit from another bus master The core uses separate address and data buses and a variety of control and status signals for performing reads and writes The address bus is 32 bits wide and the data bus is 64 bits wide The bus can run at the full processor clock frequency or at an integer division of the processor clock speed The implementation of the internal voltage of the e300 core is process dependent all I O signals for the device depend on the device implementation Note that the e300 core has no direct external I O connection 8 3 Interrupt Checkstop and Reset Signals This section describes external interrupts checkstop operations and hard and soft reset inputs 8 3 1 External Interrupts Assertion of the external interrupt input signals int smi pm_event_in in e300c3 and mcp of the core eventually forces the processor to take an external interrupt or a system management interrupt smi if MSR EE is set or performance monitor interrupt or the machine check interrupt if MSR ME and HIDO EMCP are set e300 Power Archite
590. the memory protection mechanism but a store operation is not performed because no reservation exists e The execution of an stswx instruction is allowed by the memory protection mechanism but a store operation is not performed because the specified length is zero e The store operation is not performed because an interrupt occurs before the store is performed Again note that although the execution of the debt and debtst instructions may cause the R bit to be set they never cause the C bit to be set e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 21 Memory Management 6 4 1 3 Scenarios for Reference and Change Bit Recording This section provides a summary of the model defined by the OEA that is used by the processors for maintaining the reference and change bits In some scenarios the bits are guaranteed to be set by the processor in some scenarios the architecture allows that the bits may be set not absolutely required and in some scenarios the bits are guaranteed to not be set In implementations that do not maintain the R and C bits in hardware such as the e300 core software assistance is required For these processors the information in this section still applies except that the software performing the updates is constrained to the rules described that is must set bits shown as guaranteed to be set and must not set bits shown as guaranteed to not be set Table 6 8 defines
591. tialization ann n nanan anna n set up HID registers for the various processors of this family hid setup taken from minix s mpxPowerPC s mfspr Hol pve pvr reg srawi rod EL Z resetTest603 cmpi Dien BI Eech lt 3 bne cr0 endHIDSetup addi EU S 0 oris CD r0 Ox8000 enable machine check pin EMCP oris r0 r0 0x0010 enable dynamic power mgmt DPM oris CD CD 0x0020 enable SLEEP power mode ori CD CD Ox8000 nable the Icache ICE ori ro r0 0x4000 nable the Dcache DCE ori CD r0 O0x0800 invalidate Icache ICFI ori ro r0 0x0400 invalidate Dcache DCFI mtspr hid0 ro isync Hf A ee Ke eA A I eA A I I kkk then when the processor is in a loop force an SMI interrupt Hh eee Ie eA I A A I A I ee orig 0x00001400 System Management Interrupt force big endian mode stw r0 0x05f8 r0 need nop every second inst stw r0 0x05fc r0 mfmsr rO ori CD CD CD ori r0 r0 0x0001 force big endian LE bit ori r0 r0 r0 xori r0 r0 0x0001 force big endian LE bit ori rO TOEO mtmsr rO ori rO fe EI isync ori CD CD ro save off additional registers to be corrupted stw r20 0x05f4 r0 mfspr r21 SKE put srrO in r21 stw r21 0x05f0 T0 put r21 in 0x05f0 mfspr P22 po Bee put srrl in r22 stw r22 0x05ec r0 put r22 in 0x05ec stw r23 0x05e8 r0 mfcr E23 stw r23 0x05e4 xr0 xor CD CD CD e300 Power Architecture Core Family R
592. tical interrupt SPRs CSRRO and CSRR1 eight SPRGs SPRGO SPRG7 eight pairs of instruction BATs IBATO IBAT7 eight pairs of data BATs DBATO DBAT7 one system version register SVR one system memory base address MBAR one instruction address breakpoint control IBCR one data address breakpoint control DBCR a new instruction breakpoint register IABR2 and two data address breakpoint registers DABR and DABR2 are integrated into the core Supervisor level SPRs include the following e The DSISR defines the cause of data access and alignment interrupts The cause of a DSI interrupt for a data breakpoint match with DABR and DABR2 can be determined by the value of the DSISR DABR bit bit 9 e The data address register DAR holds the address of an access after an alignment or DSI interrupt For example it contains the address of the breakpoint match condition e The decrementer register DEC is a 32 bit decrementing counter that provides a mechanism for causing a decrementer interrupt after a programmable delay e SDR1 specifies the page table format used in virtual to physical address translation for pages Note that physical address is referred to as real address in the architecture specification e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Overview The machine status save restore register 0 SRRO is used for saving the address of the instruction that caused the
593. tion fetching does not normally wrap within the cache line an instruction fetch initiated to double word address 3 of a 32 byte block is still performed as a single beat read on the CSB This feature is enabled by setting the HID2 IFEB register bit When the HID2 IFEB register bit is set and the HIDO ICE register bit is cleared instruction cache disabled any instruction fetch results in burst transactions on the bus similar to caching allowed instruction space but the instruction is not cached This feature should not be enabled for systems that do not support bursting from all caching inhibited instruction space which could otherwise be accessed while burst mode is enabled This feature affects only caching inhibited instruction fetches not caching inhibited load or store operations e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Instruction and Data Cache Operation 4 9 4 CSB Operations Caused by Cache Control Instructions Table 4 8 provides an overview of the bus operations initiated by cache control instructions The cache control TLB management and synchronization instructions supported by the core may affect or be affected by the operation of the CSB None of the instructions actively broadcast through address only transactions on the bus except for dcbz and no broadcasts by other masters are snooped by the core except for kills and those required by the MESI protocol The operati
594. tion unit stalls The count of such events can be used to trigger the performance monitor interrupt The performance monitor can be used to do the following Improve system performance by monitoring software execution and then recoding algorithms for more efficiency For example memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms Characterize processors in environments not easily characterized by benchmarking Help system developers bring up and debug their systems The performance monitor uses the following resources The performance monitor mark bit in the MSR MSR PMM This bit controls which programs are monitored The move to from performance monitor registers PMR instructions mtpmr and mfpmr The external input pm_event_in PMRs The performance monitor counter registers PMCO PMC3 are 32 bit counters used to count software selectable events Each counter counts up to 128 events UPMCO UPMC3 provide user level read access to these registers Reference events are those that should be applicable to most microprocessor microarchitectures and be of general value They are identified in Table 11 9 The performance monitor global control register PMGC0 controls the counting of performance monitor events It takes priority over all other performance monitor control registers UPMGCO provides user level read access to PMGCO The performance monitor local co
595. tions OEA Processor control instructions are used to read from and write to the condition register CR machine state register MSR and special purpose registers SPRs and to read from the time base register TBU or TBL e300 Power Architecture Core Family Reference Manual Rev 3 28 Freescale Semiconductor Instruction Set Model 3 2 6 2 1 Move To From Machine State Register Instructions Table 3 30 lists the instructions provided by the core for reading from or writing to the MSR Table 3 30 Move To From Machine State Register Instructions Name Mnemonic Operand Syntax Move from Machine State Register mfmsr rD Move to Machine State Register mtmsr rs 3 2 6 2 2 Move To From Special Purpose Register Instructions Simplified mnemonics are provided for the mtspr and mfspr instructions so they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand See Appendix F Simplified Mnemonics in the Programming Environments Manual for simplified mnemonic examples The mtspr and mfspr instructions are shown in Table 3 31 Table 3 31 Move To From Special Purpose Register Instructions Name Mnemonic Operand Syntax Move from Special Purpose Register mfspr rD SPR Move to Special Purpose Register mtspr SPR rS For mtspr and mfspr instructions the SPR number coded in assembly language does not appear directly as a 10 bit binary number in the instructio
596. tor Overview Table 1 3 Differences Between e300 and G2_LE Cores continued e300 Core G2_LE Core Impact Instruction fetch bursts to caching inhibited space Single beat instruction fetches to caching inhibited space The 300 s instruction fetch burst extension allows all caching inhibited instruction fetches to be performed on the bus as burst transactions even though the instructions are not cached This improves performance for instruction space that is caching inhibited because up to eight instructions are returned with one bus operation The G2_LE core must use single beat instruction fetches for caching inhibited space returning only two instructions per bus operation Instruction cache way protection The e300 core can protect locked ways in the instruction cache from invalidation the G2_LE does not support instruction cache way protection Data cache queue sharing The e300 has a new data cache queue sharing extension that allows the two burst write queues in the bus unit to be used interchangeably for cache replacements and snoop pushes Thus the data cache can support two outstanding cache replacements or two outstanding snoop push operations on the bus at any given time icbt instruction The e300 supports a new instruction cache block touch instruction that facilitates preloading the instruction cache before locking the G2_LE core requires speculatively fetching instructions before
597. truction The icbt instruction performs a bus read operation from the bus and allocates into the instruction cache This instruction is new to the e300 core and supplements the instruction cache locking mechanisms and the new way protect feature The icbt instruction is treated as a no op if touch load operation is disabled by HIDO NOPTT The icbt instruction is effective regardless of WIMG settings instruction or data cache enable status and the instruction cache lock status that is unconditional allocate This allows the icbt instruction to easily initialize the locked portion of the instruction cache before enabling Note that to prevent user level code from inadvertently overwriting a supervisor level page that has been locked in the instruction cache that page of memory should be protected with appropriate MMU translation and access privileges or by using HIDO NOPIT to treat the cht instruction as a no op The icbt instruction is dispatched to the load store unit and data cache and therefore goes through data side address translation It is treated similarly to debt for translation and storage protection purposes The data cache then issues the icbt operation to the bus unit without checking for a data cache hit Therefore if the instruction block is already residing in the data cache for example due to self modifying code the program must first perform a debf of that address to flush that data block back to main memory for the
598. tructions control the order in which memory operations are performed with respect to asynchronous events and the order in which memory operations are seen by other processors or memory access mechanisms See Chapter 4 Instruction and Data Cache Operation for additional information about these instructions and about related aspects of memory synchronization Implementation Notes The following describes how the core handles memory synchronization in the VEA e The Instruction Synchronize isync instruction causes the core to discard all prefetched instructions wait for any preceding instructions to complete and then branch to the next sequential instruction having the effect of clearing the pipeline behind the isync instruction s The Enforce In Order Execution of I O eieio instruction is used to ensure memory reordering of noncacheable memory access Because the core does not reorder noncacheable memory accesses the eieio instruction is treated as a no op Table 3 27 lists the VEA memory synchronization instructions for the core Table 3 27 Memory Synchronization Instructions VEA Name Mnemonic Operand Syntax Enforce In Order Execution of I O eieio Instruction Synchronize isync 3 2 5 3 Memory Control Instructions VEA Memory control instructions include the following types e Cache management instructions e Segment register manipulation instructions e Translation lookaside buffer management i
599. ts required to reach the locked ways to the value 1 so that the binary tree is traversed down one of the remaining branches The final PLRU value for the selected way of the cache set is written back to the cache for load hits store hits including debz and normal cache allocations The value written back is adjusted to ensure that way will not be immediately selected the next time s a replacement is required for that particular cache set The PLRU value written back is the PLRU value pointing to that way with its three critical bits inverted The three critical bits are the three bits that were traversed down the binary tree to reach the final way selection The remaining bits are left unchanged Inverting the three critical bits essentially converts the PLRU value to a PMRU pseudo most recently used value In the case of way locking the way locking override described above is not factored into the PLRU write back value In the case of a cache hit the PLRU write back value has the three critical bits inverted for the specific hitting way 4 7 L1 Cache Parity The e300 core includes parity checking for both the instruction and data caches A machine check interrupt is taken upon detecting an instruction or data cache parity error when HIDO ECPE and MSR ME are e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 25 Instruction and Data Cache Operation both set For the instruction cache parity is chec
600. ts of GPRO GPR3 remain unchanged while MSR TGPR 1 Attempts to use GPR4 GPR31 with MSR TGPR 1 yield undefined results Temporarily replaces TGPRO TGPR3 with GPRO GPR3 for use by TLB miss routines The TGPR bit is set when either an instruction TLB miss data read miss or data write miss interrupt is taken The TGPR bit is cleared by an rfi instruction 15 ILE Interrupt little endian mode When an interrupt occurs this bit is copied into MSR LE to select the endian mode for the context established by the interrupt 16 EE External interrupt enable 0 The processor ignores external interrupts system management interrupts and decrementer interrupts 1 The processor is enabled to take an external interrupt system management interrupt or decrementer interrupt 17 PR Privilege level D The processor can execute both user and supervisor level instructions supervisor mode 1 The processor can only execute user level instructions user mode e300 Power Architecture Core Family Reference Manual Rev 3 12 Freescale Semiconductor Interrupts and Exceptions Table 5 8 MSR Bit Settings continued Bits Name Description 18 FP Floating point available this bit is read only on the e300c2 core 0 The processor prevents dispatch of floating point instructions including floating point loads stores and moves If execution of one of these types of instructions is attempted a floatin
601. ts to point to way LO of each set For the e300 core the proper use of the ICFI and DCFI bits is to set and clear them with two consecutive mtspr operations 22 23 E Reserved should be cleared 24 IFEM Enable M bit on bus for instruction fetches 0 M bit not reflected on bus for instruction fetches Instruction fetches are treated as nonglobal on the bus 1 Instruction fetches reflect the M bit from the WIM settings 25 DECAREN Decrementer auto reload not supported on the e300c1 0 Normal operation 1 Decrementer loads last mtdec value for precise periodic interrupt 25 26 Reserved should be cleared Bit 25 reserved in e300c1 only 27 FBIOB Force branch indirect on the bus 0 Register indirect branch targets are fetched normally 1 Forces register indirect branch targets to be fetched externally 28 ABE Address broadcast enable Controls whether certain address only operations such as cache operations are broadcast on the bus 0 Address only operations affect only local caches and are not broadcast 1 Address only operations are broadcast on the bus Affected instructions are debi dcbf and dcbst Note that these cache control instruction broadcasts are not snooped by the e300 core Refer to Section 4 3 3 Data Cache Control for more information e300 Power Architecture Core Family Reference Manual Rev 3 14 Freescale Semiconductor Register Model Table 2 5 e300 HIDO Field Descriptions continued
602. ttached system logic The hardware debug function accesses the JTAG test port providing a means for executing test routines and facilitating chip and software debugging All instruction and data address breakpoints are accessible in the IBCR and DBCR See Section 1 3 8 Debug Features for more information The stopped signal allows observation of the internal clock state and ext_halt can be used to force the core into a halted state See Chapter 5 Interrupts and Exceptions for more information on test signals 1 1 7 4 Clock Multiplier The internal clocking of the e300 core is generated from and synchronized to the external clock signal sysclk by means of a voltage controlled oscillator based PLL The PLL provides programmable internal processor clock multiplier ratios which multiply the externally supplied clock frequency The bus clock is the same frequency and is synchronous with sysclk The configuration of the PLL can be read by software from the hardware implementation register 1 HID1 1 1 7 5 Core Performance Monitor The performance monitor provides the ability to count predefined events and processor clocks associated with particular operations such as cache misses mispredicted branches or the number of cycles an execution unit stalls The count of such events can be used to trigger the performance monitor interrupt The performance monitor can be used to do the following e Improve system performance by monitorin
603. two integer units System Register Completion Unit Performance Monitor Completes up to two instructions per clock Power Time Base Dissipation Counter Control Decrementer Debug COP JTAG Interface PLL amp Clock Multiplier R Sequeniial Fetcher 64 Bit Instruction Queue Load Store Unit D MMU DBAT Array 16 Kbyte D Cache Touch Load Buffer Copy Back Buffer 32 Bit Address Bus 64 Bit Data Bus 64 bit Iwo Instructions Branch Processing Unit Floating Point Unit 16 Kbyte Cache Core Interface Figure 1 3 e300c3 Core Block Diagram e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Overview The e300c1 provide independent on chip 32 Kbyte eight way set associative physically addressed caches for instructions and data and on chip instruction and data memory management units MMUs The e300c2 and e300c3 include 16 Kbyte four way set associative instruction and data caches The MMUs contain 64 entry two way set associative data and instruction translation lookaside buffers DTLB and ITLB that provide support for demand paged virtual memory address translation and variable sized block translation The TLBs use a least recently used LRU replacement algorithm and the caches use a pseudo least recently used algorithm PLRU The core also supports block address translation through the use of two independent instruction
604. uction is executed no other integer instructions can also begin to execute In the e300c2 e300c3 e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor EN Instruction Timing however each of the two execution units can execute one multiply for a total of two multiply instructions executed in parallel See Table 7 4 for integer instruction execution timing Figure 7 9 shows how the e300c1 core handles integer instructions Execution of multiply half word unsigned instructions may take 2 6 cycles in the e300c1 The example shows the worst case timing for the following instruction sequence D mulhwu add 1 2 mulhwu 3 add Fetch in IQ In Dispatch Entry IQ0 IQ1 Wm Execute z Complete In CQ EEN In Retirement Entry CQ0 CQ1 Completion Queue Figure 7 9 Instruction Timing Integer Execution in the e300c1Core Figure 7 10 shows how the e300c2 e300c3 handle integer instructions The example shows the timing for the same instruction sequence as th
605. uctions are removed from the instruction stream at dispatch That is these instructions are allowed to fall through the instruction queue through the dispatch mechanism without either being passed to an execution unit and or given a position in the CQ Feed forwarding A e300 feature that reduces the number of clock cycles that an execution unit must wait to use a register When the source register of the current instruction is the same as the destination register of the previous instruction the result of the previous instruction is routed to the current instruction at the same time that it is written to the register file With feed forwarding the destination bus is gated to the waiting execution unit over the appropriate source bus saving the cycles which would be used for the write and read Fetch Retrieving instructions from either the cache or main memory and placing them into the instruction queue Finish Finishing occurs in the last cycle of execution In this cycle the CQ entry is updated to indicate that the instruction has finished executing Floating point register FPR Any of the 32 registers in the floating point register file These registers provide the source operands and destination results for floating point instructions Load instructions move data from memory to FPRs and store instructions move data from FPRs to memory The FPRs are 64 bits wide and store floating point values in double precision format Floating poin
606. ue LRU value The hreset signal can be asserted for the following reasons e System power on reset e System reset from a panel switch The following is also true after a hard reset operation e External checkstops are enabled e The on chip test interface has given control of the I Os to the rest of the chip for functional use e Because the reset interrupt has data and instruction translation disabled MSR DR and MSR IR both cleared the chip operates in real addressing mode as described in Section 6 2 Real Addressing Mode 5 5 1 2 Soft Reset As described in Section 5 1 2 Summary of Front End Interrupt Handling the soft reset interrupt is a type of system reset interrupt that is recoverable nonmaskable and asynchronous When sreset is asserted the processor attempts to reach a recoverable state by allowing the next instruction to either complete or cause an interrupt blocking the completion of subsequent instructions and allowing the completed store queue to drain see Section 7 1 Terminology and Conventions for the definition e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 19 Interrupts and Exceptions Unlike a hard reset no registers or latches are initialized however the instruction cache is disabled HIDO ICE 0 After sreset is recognized as asserted the processor begins fetching instructions from the system reset routine at offse
607. ultaneously written to the cache and forwarded to the requesting unit thus minimizing stalls due to load delays 4 6 2 Instruction Cache Fill Operations When the core is configured with a 64 bit data bus instruction cache blocks are loaded in four beats of 8 bytes each When the core is configured with a 32 bit bus cache block loads are performed with eight beats of 4 bytes each The burst fetch is performed as critical double word first On a cache miss the critical and following double words fetched from memory are simultaneously written to the instruction cache and forwarded to the dispatch queue thus minimizing stalls due to cache fill latency The instruction cache allows sequential fetching from the remainder of the cache block during a cache block load 4 6 3 Instruction Fetch Cancel Extension In superscalar architectures instructions are routinely fetched ahead of the time they are needed but these prefetched instructions may be cancelled due to instruction redirection such as by branches or interrupts The instruction fetch cancel extension improves the utilization of the instruction cache during such cancel operations The instruction fetch cancel extension allows a new instruction fetch to be issued to the cache or to the bus if a cancelled instruction fetch is pending or active on the bus This supports hit under cancel and miss under cancel instruction fetch operations This feature is enabled using HID2 IFEC 4 6 4 Data Cach
608. ument 10 4 10 5 Change IABR_ADDR variable to IABR CEA and IABR2_ADDR variable to IABR2 CEA to eliminate redundant information 10 2 1 10 4 Remove Breakpoint Enabled section as it was redundant with other information in this chapter e300 Power Architecture Core Family Reference Manual Rev 3 6 Freescale Semiconductor Glossary The glossary contains an alphabetical list of terms phrases and abbreviations used in this book Some of the terms and definitions included in the glossary are reprinted from IEEE Standard 754 1985 JEEE Standard for Binary Floating Point Arithmetic copyright 1985 by the Institute of Electrical and Electronics Engineers Inc with the permission of the IEEE A Architecture A detailed specification of requirements for a processor or computer system It does not specify details of how the processor or computer system must be implemented instead it provides a template for a family of compatible implementations Asynchronous interrupt interrupts that are caused by events external to the processor s execution In this document the term asynchronous interrupt is used interchangeably with the word interrupt Atomic access A bus access that attempts to be part of a read write operation to the same address uninterrupted by any other access to that address the term refers to the fact that the transactions are indivisible The PowerPC architecture implements atomic accesses through the lwarx stwex instruction
609. umerous system level performance optimizations For accesses performed with real addressing mode MSR IR 0 or MSR DR 0 for instruction or data access respectively the WIMG bits are automatically generated as 0b0011 the data is write back caching is enabled memory coherency is enforced and memory is guarded Careless use of these bits may create situations where coherency paradoxes are observed by the processor In particular this can happen when the state of these bits is changed without appropriate precautions For example when flushing the pages that correspond to the changed bits from the caches of all processors in the system is required or when the address translations of aliased physical addresses referred to as real addresses in the architecture specification specify different values for any of the WIM bits The core considers any of these cases to be a programming error that may compromise the coherency of memory These paradoxes can occur within a single processor or across several devices as described in Section 4 4 2 4 Coherency in Single Processor Systems 4 4 1 1 Write Through Attribute W When an access is designated as write through W 1 if the data is in the cache a store operation updates the cached copy of the data In addition the update is written to the external memory location as described below While the PowerPC architecture permits multiple store instructions to be combined for write thr
610. upport for all 32 bit PowerPC instructions e The core provides two implementation specific instructions used for software table search operations following TLB misses Load Data TLB Entry tlbld Load Instruction TLB Entry tlbli e The core implements the following instruction which is added to support critical interrupts also supported on the G2_LE This is a supervisor level context synchronizing instruction e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Overview Return from Critical Interrupt rfci e The core implements the following instruction which is added to support easy start up initialization or reloading of the instruction cache Instruction Cache Block Touch icbt e The core provides the following performance monitor instructions Move to Performance Monitor Register mtpmr Move from Performance Monitor Register mfpmr 1 3 3 Cache Implementation The following sections describe the general cache characteristics as implemented in the PowerPC architecture and the core implementation 1 3 3 1 PowerPC Cache Characteristics The PowerPC architecture does not define hardware aspects of cache implementations The e300 core controls the following memory access modes on a page or block basis e Write back write through mode e Caching inhibited mode e Memory coherency Note that in the core a cache block is defined as eight words The VEA defines c
611. upt prefix bit MSR IP The IP bit is set on assertion of hreset 8 3 4 Core Quiesce Control Signals The core quiesce control signals greq and qack allow the processor to enter a low power state and bring bus activity to a quiescent state in an orderly fashion The system quiesce state is entered by configuring the processor to assert the greq output This signal allows the system to terminate or pause any bus activities that are normally snooped When the system is ready to enter the system quiesce state it asserts gack At this time the core may enter a quiescent low power state during which it stops snooping bus activity 8 4 IEEE 1149 1 Compliant Interface The e300 core boundary scan interface is a fully compliant implementation of the IEEE 1149 1 standard This section describes the core IEEE 1149 1 JTAG interface 8 4 1 IEEE 1149 1 Interface Description The e300 core has five dedicated JTAG signals described in Table 8 1 The tdi and tdo scan ports are used to scan instructions as well as data into the various scan registers for JTAG operations The scan operation is controlled by the test access port controller which is controlled by the tms input sequence The scan data is latched in at the rising edge of tck e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 9 Core Interface Operation Test reset trst is a JTAG optional signal used to reset the TAP controller asynchronousl
612. uses modified blocks to be flushed to system memory if they are the target of a debf instruction whereas by definition in the PowerPC architecture the debi instruction only invalidates modified blocks 3 2 6 3 2 Segment Register Manipulation Instructions The instructions listed in Table 3 34 provide access to the segment registers for the e300 core These instructions operate completely independently from the MSR IR and MSR DR bit settings Refer to Synchronization Requirements for Special Registers and TLBs in Chapter 2 Register Set in the Programming Environments Manual for serialization requirements and other recommended precautions to observe when manipulating the segment registers Table 3 34 Segment Register Manipulation Instructions Name Mnemonic Operand Syntax Move from Segment Register mfsr rD SR Move from Segment Register Indirect mfsrin rD rB e300 Power Architecture Core Family Reference Manual Rev 3 32 Freescale Semiconductor Instruction Set Model Table 3 34 Segment Register Manipulation Instructions continued Name Mnemonic Operand Syntax Move to Segment Register mtsr SR rS Move to Segment Register Indirect mtsrin rS rB 3 2 6 3 3 Translation Lookaside Buffer Management Instructions The address translation mechanism is defined in terms of segment descriptors and page table entries PTEs used by the processors to locate the effective to physical addres
613. ut this chapter the phrase next instruction implies the next instruction to complete in program order Note that interrupts can occur while an interrupt handler routine is executing and multiple interrupts can become nested It is up to the interrupt handler to save the states to allow control to ultimately return to the original excepting program Unless a catastrophic condition causes a system reset or machine check interrupt only one interrupt is handled at a time If for example a single instruction encounters multiple interrupt conditions those conditions are handled sequentially After the interrupt handler is finished instruction execution continues until the next interrupt is encountered However in many cases there is no attempt to re execute the instruction This method of recognizing and handling interrupts sequentially guarantees that interrupts are recoverable To prevent loss of state information interrupt handlers should save the information stored in SRRO and SRR1 or in CSRRO CSRR1 for critical interrupts soon after the interrupt is taken This prevents loss of e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 1 Interrupts and Exceptions information due to a system reset or machine check interrupt or to an instruction caused interrupt in the interrupt handler before disabling external interrupts In this chapter the following terminology is used to describe the various st
614. ve physically addressed PLRU replacement algorithm on the e300c1 16 Kbyte four way set associative instruction and data caches on the e300c2 and e300c3 Cache write back or write through operation programmable on a per page or per block basis Features for instruction and data cache locking and protection BPU that performs CR lookahead operations Address translation facilities for 4 Kbyte page size variable block size and 256 Mbyte segment size A 64 entry two way set associative ITLB and DTLB Eight entry data and instruction BAT arrays providing 128 Kbyte to 256 Mbyte blocks Software table search operations and updates supported through fast trap mechanism 52 bit virtual address 32 bit physical address e Facilities for enhanced system performance A 64 bit split transaction internal data bus interface to the coherent system bus CSB with burst transfers Support for one level address pipelining on the CSB interface True little endian mode for compatibility with other true little endian devices e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Overview Critical interrupt support Hardware support for misaligned little endian accesses Integrated power management Internal processor bus clock multiplier ratios Three power saving modes doze nap and sleep Automatic dynamic power reduction when internal functional units are idle In system testability and debugging f
615. ved for e300c1 should be cleared 11 ELRW Enable weighted LRU This bit enables the use of an adjusted weighted LRU 0 Normal operation 1 The dcbt dcbtst and dcbz instructions use and adjusted weighted LRU such that they always select and replace the lowest unlocked way in the data cache 12 NOKS Reserved for e300c1 should be cleared For e300c2 300c3 No kill for snoop This bit enables the forcing of kill type snoops to flush data instead of killing it 0 Normal operation 1 Forces write with kill snoops to flush instead of kill snoop can never kill data 13 HBE High BAT enable Regardless of the setting of HID2 HBE these BATs are accessible by mfspr and mtspr 0 IBAT 4 7 and DBAT 4 7 are disabled 1 IBAT 4 7 and DBAT 4 7 are enabled 14 15 Reserved should be cleared 16 18 IWLCK 0 2 Instruction cache way lock Useful for locking blocks of instructions into the instruction cache for time critical applications that require deterministic behavior 000 no ways locked 001 way 0 locked 010 way 0 through way 1 locked 011 way 0 through way 2 locked 100 way 0 through way 3 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 101 way 0 through way 4 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 110 way 0 through way 5 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 111 way 0 through way 6 locked in e300c1 Way 0 through way 2 locked in e300c2 e300c3 Setting HI
616. volves an individual bus operation that reduces the effective bus bandwidth The frequent use of misaligned accesses is discouraged because they can compromise the overall performance 3 1 4 Floating Point Execution Model The e300c1 core provides hardware support for all single and double precision floating point operations for most value representations and all rounding modes The PowerPC architecture provides for hardware to implement a floating point system as defined in ANSI IEEE Standard 754 1985 IEEE Standard for Binary Floating Point Arithmetic For detailed information about the floating point execution model refer to Chapter 3 Operand Conventions in the Programming Environments Manual Note that the e300c2 core does not support floating point operations The IEEE 754 standard includes 64 and 32 bit arithmetic The standard requires that single precision arithmetic be provided for single precision operands The standard permits double precision arithmetic instructions to have either or both single precision or double precision operands but states that single precision arithmetic instructions should not accept double precision operands The UISA follows these guidelines e Double precision arithmetic instructions may have single precision operands but always produce double precision results e Single precision arithmetic instructions require all operands to be single precision and always produce single precision results
617. word to be compared with the first word of a PTE in the table search software routine to determine if a PTE contains the address translation for the instruction or data access The contents of ICMP and DCMP are automatically derived by the core when a TLB miss interrupt occurs HASH1 and HASH2 The HASH1 and HASH2 registers contain the primary and secondary PTEG addresses that correspond to the address causing a TLB miss These PTEG addresses are automatically derived by the core by performing the primary and secondary hashing function on the contents of IMISS or DMISS for an ITLB or DTLB miss interrupt respectively RPA The system software loads a TLB entry by loading the second word of the matching PTE entry into the RPA register and then executing the tlbli or tlbId instruction for loading the ITLB or DTLB respectively Instructions tlbli rB Loads the contents of the ICMP and RPA registers into the ITLB entry selected by lt ea gt and SRR1 WAY tlbld rB Loads the contents of the DCMP and RPA registers into the DTLB entry selected by lt ea gt and SRR1 WAY In addition the core contains the following features that do not specifically control the MMU but that are implemented to increase performance and flexibility in the software table search routines whenever one of the three TLB miss interrupts occurs e Temporary GPRO GPR3 These registers are available as r0 r3 when MSR TGPR is set The e300 core aut
618. xception is generated when any of the conditions specified in a trap instruction is met A floating point unavailable interrupt is caused by an attempt to execute a floating point instruction including floating point load store and move instructions when the floating point available bit is cleared MSR FP 0 Decrementer 00900 The decrementer interrupt occurs when DEC O changes from 0 to 1 This interrupt is enabled with MSR EE Critical interrupt 00A00 A critical interrupt is taken when cint is asserted and MSR CE 1 Reserved 00B00 00BFF System call 00C00 A system call interrupt occurs when a System Call sc instruction is executed e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor Interrupts and Exceptions Table 5 2 Interrupts and Exception Conditions continued Interrupt Type Vector OfSOL Exception Conditions hex Trace 00D00 A trace interrupt is taken when MSR SE 1 or when the currently completing instruction is a branch and MSR BE 1 Reserved 00E00 The e300 core does not generate an interrupt to this vector Other devices may use this vector for floating point assist interrupts Performance OOFOO Caused when pm_event_in is asserted monitor Instruction 01000 An instruction translation miss interrupt is caused when the effective address for an translation miss instruction fetch cannot be
619. xecuted by the branch processing unit BPU The BPU receives branch instructions from the fetch unit and performs CR lookahead operations on conditional branches to resolve them early achieving the effect of a zero cycle branch in many cases Some branch instructions can redirect instruction execution conditionally based on the value of bits in the CR When the branch processor encounters one of these instructions it scans the execution pipelines to determine whether an instruction in progress may affect the particular CR bit If no interlock is found the branch can be resolved immediately by checking the bit in the CR and taking the action defined for the branch instruction If an interlock is detected the branch is considered unresolved and the direction of the branch is predicted using static branch prediction as described in Conditional Branch Control in Chapter 4 Addressing Modes and Instruction Set Summary in the Programming Environments Manual The interlock is e300 Power Architecture Core Family Reference Manual Rev 3 22 Freescale Semiconductor Instruction Set Model monitored while instructions are fetched for the predicted branch When the interlock is cleared the branch processor determines whether the prediction was correct based on the value of the CR bit If the prediction is correct the branch is considered completed and instruction fetching continues If the prediction is incorrect the fetched instruction
620. xori V D xoris V D 64 bit instruction Supervisor level instruction Optional in the PowerPC architecture Load and store string or multiple instruction Supervisor and user level instruction e300 core implementation specific instruction oa A OUO N e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 41 Instruction Set Listings e300 Power Architecture Core Family Reference Manual Rev 3 42 Freescale Semiconductor Appendix B Instructions Not Implemented This appendix provides a list of the 32 and 64 bit instructions that are not implemented in the e300 core It also provides a list of the 64 bit SPR encodings not implemented by the e300 Note that any attempt to execute unimplemented instructions generates an illegal instruction exception Table B 1 lists the 32 bit instructions that are optional to the PowerPC architecture and not implemented by the e300 core Table B 1 32 Bit Instructions Not Implemented by the e300 core Mnemonic Instruction eciwx External Control In Word Indexed ecowx External Control Out Word Indexed fsqrt Floating Square Root Double Precision fsqrts Floating Square Root Single tlbia TLB Invalidate All Table B 2 provides a list of 64 bit instructions that are not implemented by 32 bit implementation such as the e300 core Table B 2 64 Bit Instructions Not Implemented by the e300 core
621. y Protection Violation Store Access with PTE C 0 Otherwise Page Table Z Search Operation PA 0 31 lt RPN II A 20 31 See Figure 6 9 Continue Access to Memory Subsystem with WIMG Bits from PTE Figure 6 8 Page Address Translation Flow for 32 Bit Implementations TLB Hit e300 Power Architecture Core Family Reference Manual Rev 3 26 Freescale Semiconductor Memory Management 3 The PTE in the selected PTEG is tested for a match with the virtual page number VPN of the access The VPN is the VSID concatenated with the page index field of the virtual address For a match to occur the following must be true PTE H 0 PYE V 1 PTE VSID VA 0 23 PTE API VA 24 29 4 Ifamatch is not found step 3 is repeated for each of the other seven PTEs in the primary PTEG If a match is found the table search process continues as described in step 8 If a match is not found within the eight PTEs of the primary PTEG the address of the secondary PTEG is generated 5 The first PTE PTEO in the secondary PTEG is read from memory Again because PTE reads typically have a WIM bit combination of 0b001 an entire cache line is burst into the on chip cache 6 The PTE in the selected secondary PTEG is tested for a match with the virtual page number VPN of the access For a match to occur the following must be true PTE H 1 PTE V 1 PTE VSID VA 0 23 PTE API VA 24 29
622. y The trst signal assures that the JTAG logic does not interfere with the normal operation of the device this signal can be asserted concurrently with the assertion of hreset The e300 core implements the JTAG debug in the same manner as does the G2 core with the exception of the 33 bit Run_N counter register in which the most significant 32 bits form a 32 bit counter The function of the least significant bit remains unchanged The Run_N counter is used by the debug functions to control the number of processor cycles that the processor runs before halting e300 Power Architecture Core Family Reference Manual Rev 3 10 Freescale Semiconductor Chapter 9 Power Management The e300 core is specifically designed for low power operation It provides both automatic and program controllable power reduction modes for progressive reduction of power consumption This chapter describes the hardware support provided by the e300 core for power management 9 1 Overview The e300 core has explicit power management features that are described in this chapter Note that the design of the core is fully static allowing the internal processor core state to be preserved when no internal clock is present The device drivers must be modified for power management because operating systems service I O requests by system calls to the device drivers When a device driver is called to reduce the power of a device it needs to be able to check the power state
623. y a source FPR Instruction syntax used to identify a destination FPR Abbreviations or acronyms for registers are shown in uppercase text Specific bits fields or ranges appear in brackets For example MSR LE refers to the little endian mode enable bit in the machine state register In certain contexts such as a signal encoding this indicates a don t care Used to express an undefined numerical value NOT logical operator AND logical operator OR logical operator Indicates reserved bits or bit fields in a register Although these bits may be written to as either ones or zeros they are always read as zeros Acronyms and Abbreviations Table i contains acronyms and abbreviations that are used in this reference manual Table i Acronyms and Abbreviated Terms Term Meaning ABE Address bus enable ALU Arithmetic logic unit BAT Block address translation BATL Block address translation lower BATU Block address translation upper BE Branch trace enable BIST Built in self test BIU Bus interface unit BL Block size mask BPU Branch processing unit BUID Bus unit ID CE Critical interrupt enable CIA Current instruction address CMOS Complementary metal oxide semiconductor CMP IABR compare type e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor xxvii Table i Acronyms and Abbreviated Terms continued
624. y be locked in the cache and the desired instructions may not be in their expected way within the cache The icbt is a new feature to the e300 core and supplements the instruction cache locking mechanisms and the new way protect feature The icbt instruction performs a bus read operation from the bus and allocates into the instruction cache The icbt instruction functions independently of WIMG settings instruction or data cache enable status or the instruction cache lock status i e unconditional allocate This allows the icbt instruction to be used to easily initialize the locked portion of the instruction cache before enabling Note that to prevent user mode code from inadvertently over writing a supervisor mode page that has been locked in the instruction cache that page of memory should be protected with appropriate MMU translation and access privileges or HIDO NOPTT should be set in order to no op further uses of the instruction The icbt instruction is dispatched to the load store unit and data cache and therefore goes through data side address translation It is treated similarly to debt for translation and storage protection purposes The data cache then issues the icbt operation to the bus unit without checking for a data cache hit e300 Power Architecture Core Family Reference Manual Rev 3 42 Freescale Semiconductor Instruction and Data Cache Operation Therefore if the instruction block may already be residing in the da
625. y instruction tlbie whenever the corresponding PTE is modified Because the core is intended primarily for uniprocessor environments it does not provide coherency checking for TLBs between multiple processors If the e300 core is used in a multiprocessor environment where TLB coherency is required synchronization must be implemented in software Processors may write reference and change bits with unsynchronized atomic byte store operations Note that each V R and C bits reside in a distinct byte of a PTE Therefore extreme care must be taken to use byte writes when updating only one of these bits Explicitly altering certain MSR bits using the mtmsr instruction PTEs or certain system registers may have the side effect of changing the effective or physical addresses from which the current instruction stream is being fetched This kind of side effect is defined as an implicit branch Implicit branches are not supported and an attempt to perform one causes boundedly undefined results Therefore PTEs must not be changed in a manner that causes an implicit branch Chapter 2 Register Set in the Programming Environments Manual lists the possible implicit branch conditions that can occur when system registers and MSR bits are changed 6 5 4 Segment Register Updates Synchronization requirements for using the move to segment register instructions mtsr and mtsrin are described in Synchronization Requirements for Special Registers and f
626. y locked portion of the instruction cache to have the same persistence as main memory while still allowing the remaining unlocked portions of the instruction cache to be managed by the program 4 5 1 11 Cache Operation Broadcasting HIDO ABE In MEI mode the core does not broadcast cache operations caused by cache instructions They are intended for the management of the local cache but not for other caches in the system The HIDO ABE bit enables the cache management instructions to unconditionally cause a bus transaction either a push as a result of modified data that needs to be written back to memory or an address only transaction The use of ABE is generally intended to extend cache management to a front side level 2 cache In MESI mode the cache management instructions are automatically broadcast on the bus as defined by the M bit of the WIMG field 4 5 2 Cache Control Instructions The PowerPC architecture defines instructions for controlling both the instruction and data caches when they exist The core interprets the cache control instructions debt dcbtst dcbz dcbst dcbf debi icbt and icbi as if they pertain only to the core caches They are not intended for use in managing other caches in the system The debz instruction causes an address only broadcast on the bus if the addressed cache block is marked memory coherency required global through the WIMG bits This broadcast is performed for coherency reasons the debz instru
627. y page cause an interrupt to be taken Finally there is a facility in the VEA and OEA that allows pages or blocks to be designated as guarded preventing out of order accesses that may cause undesired side effects For example areas of the memory map that are used to control I O devices can be marked as guarded so that accesses for example instruction prefetches do not occur unless they are explicitly required by the program For more information on memory protection see Memory Protection Facilities in Chapter 7 Memory Management in the Programming Environments Manual 6 1 5 Page History Information The MMUs of these processors also define reference R and change C bits in the page address translation mechanism that can be used as history information relevant to the page This information can then be used by the operating system to determine the areas of memory to write back to disk when new pages must be allocated in main memory While these bits are initially programmed by the operating system into the page table the architecture specifies that the R and C bits may be maintained either by the processor hardware automatically or by some software assist mechanism that updates these bits when required as needed by the core The software table search routines used by the core set the R bit when a PTE is accessed the core causes an interrupt to vector to the software table search routines when the C bit in the corresponding TL
628. y to disable bus snooping including all bus activity Once the processor has entered a quiescent state it no longer snoops bus activity When the system logic has ensured that snooping is no longer necessary it allows the processor to enter the nap or sleep mode and causes the assertion of the core gack input signal for the duration of the nap mode period Nap mode is characterized by the following features s Time base decrementer still enabled e Most functional units disabled including bus snooping e PLL running and locked to internal sysclk To enter the nap mode the following conditions must occur e Set nap bit HIDO 9 1 MSR POW bit is set e e300 core asserts greq e System asserts gack e The processor core enters nap mode after several processor clocks To return to full power mode one of the following conditions must occur e Assert int smi or mcp internal signals e Decrementer interrupt e Hard reset or soft reset Transition to full power takes only a few processor cycles gack can remain asserted however greq negates before any bus transaction begins 9 3 1 5 Sleep Mode Sleep mode consumes the least amount of power of the four modes since all functional units are disabled To conserve the maximum amount of power the PLL and internal sysclk signals can be disabled Due to the fully static design of the e300 core the internal processor state is preserved when no internal clock is present Because the ti
629. ystem register unit that allows the dispatch and execution of multiple integer add and compare instructions on each cycle Refer to Chapter 7 Instruction Timing for more information Because the PowerPC architecture can be applied to such a wide variety of implementations instruction timing among processor cores varies accordingly 1 3 7 Core Interface The core interface is specific for each processor core implementation The e300 core provides a versatile core interface that allows for a wide range of implementations The interface includes a 32 bit address bus a 64 bit data bus and 56 control and information signals see Figure 1 7 The core interface allows for address only transactions as well as address and data transactions The core control and information signals include the address arbitration address start e300 Power Architecture Core Family Reference Manual Rev 3 Freescale Semiconductor 31 Overview address transfer transfer attribute address termination data arbitration data transfer data termination and core state signals Test and control signals provide diagnostics for selected internal circuits Address Arbitration lt _ gt Data Arbitration Address Start lt gt Data Transfer Address Transfer lt _ Data Termination Transfer Attribute lt gt Interrupt Checkstops e300 Core Address Termination lt gt Reset Clocks _ gt Processor S
630. ystem reset 5 3 5 18 Return from critical interrupt rfci 5 16 rfci 5 16 rfi 5 15 Rotate and shift instructions 3 12 A 16 RPA required physical address register 2 20 6 33 S SDR1 2 9 see also Memory management unit MMU Segment registers SRn 1 19 SR manipulation instructions A 24 Segmented memory model see Memory management unit MMU Self modifying code 3 17 Sequential consistency of memory accesses 4 13 Serializing instructions 7 17 Signals checkstop 8 9 cint 5 4 5 13 5 30 clock signals 8 3 coherency system bus CSB internal signals 8 2 external interrupt signals 8 3 int 5 4 5 12 5 25 overview 1 33 pll_cfgn 8 4 qack 8 3 qreq 8 3 smi 5 5 5 12 5 36 test interface signals 8 4 Single stepping 10 1 10 3 trace enable MSR SE 5 13 5 32 10 3 10 4 Sleep mode 9 2 9 4 9 6 Snooping 4 29 4 32 core bus interface unit BIU 4 2 global signaling and M bit 7 25 retry core initiated 4 32 see also Caches coherency snoop response to CSB transactions 4 30 4 31 Soft reset see also Reset 5 18 5 19 5 20 Software debug facilities see Debug facilities SPR encodings not implemented in e300 B 2 SPRGO SPRG7 1 5 2 10 2 11 2 22 5 11 conventional uses 5 11 SPRs special purpose registers 1 18 1 21 SRn segment registers 0 15 1 19 2 10 SRRO 1 save restore regs 0 1 2 10 5 8 5 10 5 14 5 15 5 16 5 17 bit settings for machine check interrupt 5 9 bit settings for table s
631. ze exception conditions out of order they are presented strictly in order When an instruction caused interrupt is recognized any unexecuted instructions that appear earlier in the instruction stream including any that have not yet entered the execute stage are required to complete before the interrupt is taken Any interrupts caused by those instructions are handled first Likewise asynchronous precise interrupts are recognized when they occur but are not handled until the instruction currently in the completion stage successfully completes execution or generates an interrupt and the completed store queue is emptied Unless a catastrophic condition causes a system reset or machine check interrupt only one interrupt is handled at a time If for example a single instruction encounters multiple interrupt conditions those conditions are handled sequentially After the interrupt handler completes the instruction execution continues until the next interrupt condition is encountered However in many cases there is no attempt to re execute the instruction This method of recognizing and handling interrupts sequentially guarantees that interrupts are recoverable To prevent the program state from being lost due to a system reset a machine check interrupt or an instruction caused interrupt in the interrupt handler interrupt handlers should save the information stored in SRRO and SRR1 early and before enabling external interrupts The PowerPC arch

e300 Power Architecture™ Core Family Reference Manual Supports

Contents

Download Pdf Manuals

Related Search

Related Contents

e300 Power Architecture&trade; Core Family Reference Manual Supports

Contents

Download Pdf Manuals

Related Search

Related Contents

e300 Power Architecture™ Core Family Reference Manual Supports