
MPC7450 RISC Microprocessor Family Product Brief

1. [Figure 2. MPC7448 Microprocessor Block Diagram: diagram text not recoverable from extraction] 1.1 MPC7451 Microprocessor Overview The functionality between the MPC7451 and the MPC7450 is t
2. SPR 22 Thermal Management Register Instruction Cache Throttling Control Register ICTC SPR 1019 SPR 532 SPR 533 SPR 534 SPR 535 SPR 560 SPR 561 SPR 562 SPR 563 SPR 564 SPR 565 SPR 566 SPR 567 Exception IBAT2U IBAT2L IBAT3U IBAT3L IBAT4U IBAT4L IBAT5U IBAT5L IBAT6U IBAT6L IBAT7U IBAT7L SPRGs SPRG0 SPR 272 DBAT1L DBAT2U DBAT2L DBAT3U DBAT3L DBAT4U DBAT4L DBAT5U DBAT5L DBAT6U DBAT6L DBAT7U SPR 574 DBAT7L SPR 575 Handling Registers Data Address Register SPR 539 SR15 SPR S40 PTE High Low SPR 541 Registers Si oe PTEH SPR 981 543 PTELO SPR 982 SPR 568 TLB Miss Register SPR 569 SPR 570 TLBMISS SPR 980 SPR 571 spR1 SPR 572 SDR1 SPR 573 SPR25 Save and Restore Registers SPRG1 SPR 273 DAR SPR 19 SRRO SPR 26 SPRG2 SPR 274 SRR1 SPR 27 DSISR SPRG3 SPR 275 DSISR SPR 18 SPR 276 SPR 277 SPR 278 SPR 279 SPRG4 SPRG5 SPRG6 SPRG7 Load Store Cache Memory Subsystem Registers L2 Error Control and Capture Registers Instruction Cache Interrupt Control Register L2CAPTDATAHI SPR 988 Control Register L2CAPTDATALO SPR 989 ICTRL SPR 1011 LDSTCR_ SPR 1016 Memory S
3. [Figure 1. MPC7450 Microprocessor Block Diagram: diagram text not recoverable from extraction]
4. MPC7450 RISC Microprocessor Family Product Brief, Rev. 5 (MPC7450TS, 11/2004), Freescale Semiconductor. Information in this document is provided solely to enable system and software implementers to use Freescale Semiconductor products. There are no express or imp
5. The MPC7450 bus interface includes a 64-bit data bus with 8 bits of data parity, a 36-bit address bus with 5 bits of address parity, and additional control signals to allow for unique system-level optimizations. The bus interface protocol is configured using the BMODE0 configuration signal at reset. If BMODE0 is asserted at the negation of HRESET, the MPC7450 uses the MPX bus protocol; if BMODE0 is negated during the negation of HRESET, the MPC7450 uses a limited subset of the 60x bus protocol. Note that the inverse state of BMODE[0:1] at the negation of HRESET is saved in MSSCR0[BMODE]. 2.8 MPC7450 Bus Operation Features The MPC7450 has a separate address and data bus, each with its own set of arbitration and control signals. This allows for decoupling the data tenure from the address tenure of a transaction and provides for a wide range of system bus implementations, including: Nonpipelined bus operation; Pipelined bus operation; Split transaction operation. The MPC7450 supports only the normal memory-mapped address segments defined in the PowerPC architecture. Access to direct-store segments results in a DSI exception. 2.8.1 MPX Bus Features The MPX bus has the following features: Extended 36-bit address bus plus 5 bits of odd parity (41 bits total). 64-bit data bus plus 8 bits of odd parity (72 bits total); a 32-bit data bus mode is not supported. Support for a four-state (MESI) cache coherence protocol. On-chip snoopin
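The bus-mode selection and the odd-parity convention described in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Freescale code: the function names are hypothetical, BMODE0 is passed as a logical asserted/negated value rather than a pin level, and the grouping of one parity bit per data byte is an assumption (the excerpt gives only the totals, 8 parity bits for 64 data bits).

```python
def select_bus_mode(bmode0_asserted: bool) -> str:
    """Bus protocol chosen from the sampled state of BMODE0 at the negation of HRESET."""
    # Per the text: asserted selects MPX, negated selects a limited 60x subset.
    return "MPX" if bmode0_asserted else "60x (limited subset)"


def odd_parity_bit(value: int) -> int:
    """Return the parity bit that makes the total count of 1 bits (data + parity) odd."""
    ones = bin(value).count("1")
    return 0 if ones % 2 == 1 else 1


def data_parity(data64: int) -> list:
    """Odd-parity bits for a 64-bit data value.

    Assumption for illustration: one parity bit per byte,
    listed from the most significant byte to the least significant.
    """
    if not 0 <= data64 < (1 << 64):
        raise ValueError("data64 must fit in 64 bits")
    return [odd_parity_bit((data64 >> shift) & 0xFF) for shift in range(56, -8, -8)]
```

With these definitions, an all-zero data value needs every parity bit set to 1, because each byte contributes zero 1 bits and the parity bit must make the count odd.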
6. [Figure 8. MPX Bus Signal Groups in the MPC7447 and MPC7457: signal diagram not recoverable from extraction. Notes from the figure: for the MPC7457 there are 19 L3_ADDR signals, L3_ADDR[0:18]; the L3 cache interface is not supported in the MPC7447.] Figure 9 illustrates the signal configuration in MPX bus mode for the MPC7447A. [Figure 9 signal diagram: not recoverable from extraction]
7. There are no microarchitectural differences between the MPC7447A and the MPC7447. The MPC7447A provides new functionality to reduce the power consumption of the microprocessor. The following features were also added to the MPC7447A: Additional bits in the HID1 register for dynamic frequency switching (DFS). Temperature diode. Other than the new features, the MPC7447A supports the same functionality as the MPC7447. 1.8 MPC7448 Microprocessor Overview The MPC7448 operates similarly to the MPC7447A. However, the MPC7448 has a number of changes over the core in the MPC7447A. Some of these changes are feature improvements, and some are performance improvements or changes necessary for feature improvements. The following changes were added to the MPC7448: Larger L2 cache (1 Mbyte). L2 data error correction code (ECC). Extended L2 pipeline. Expanded DFS capability (DFS2 and DFS4 modes). Out-of-order issue of AltiVec instructions. Second cacheable store miss. Additional bits in the HID1 register for dynamic frequency switching (DFS) and PLL configuration. Signals with new functionality: DFS2, DFS4, PLL_CFG[5], BVSEL[1], and LVRAM. This document also describes the functionality of the MPC7448. All information herein applies to the MPC7448 except where otherwise noted; in particular, the L3 cache information does not apply to the MPC7448, which does not support the L3 cache or the L3 cache interface. The SPRGs provide additional
8. 2 Mbyte 1 Mbyte 2 Mbyte 4 Mbyte Parity Byte Byte L3 bus ratios 2 1 2 5 1 3 1 3 5 1 4 1 5 1 6 1 2 1 2 5 1 3 1 3 5 1 4 1 5 1 6 1 6 5 1 7 1 7 5 1 8 1 Signals L3 address signals L3_ADDR 0 17 L3_ADDR 0 18 PLL configuration signals PLL_CFG 0 3 PLL_CFG 0 4 System Interface System bus multipliers 2 2 5 3 3 5 4 4 5 5 5 5 6 6 5 7 7 5 8 2 5 5 5 6 6 5 7 7 5 8 8 5 9 9 5 10 10 5 11 11 5 12 12 5 13 13 5 14 15 16 17 18 19 20 21 22 23 24 25 28 32 L3 cache interface is not supported on the MPC7441 and MPC7447 7 Differences Between MPC7447 and MPC7447A Table 7 compares the key features of the MPC7447A with the key features of the earlier MPC7445 and MPC7447 All are based on the MPC7450 RISC microprocessor and are very similar architecturally The MPC7447A is identical to the MPC7447 but includes the DFS and temperature diode features Table 7 Microarchitecture Comparison Microarchitectural Specs MPC7447A MPC7447 Basic Pipeline Functions Logic inversions per cycle 18 Pipeline stages up to execute 5 MPC7450 RISC Microprocessor Family Product Brief Rev 5 58 Freescale Semiconductor Differences Between MPC7447 and MPC7447A Table 7 Microarchitecture Comparison continued Microarchitectural Specs MPC7447A MPC7447 Total pipeline stages minimum
9. 32-entry vector register file (VRs). Vector permute unit (VPU). Vector integer unit 1 (VIU1) handles short-latency AltiVec integer instructions, such as vector add instructions (for example, vaddsbs, vaddshs, and vaddsws). Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions, such as vector multiply-add instructions (for example, vmhaddshs, vmhraddshs, and vmladduhm). Vector floating-point unit (VFPU). Three-stage load/store unit (LSU): Supports integer, floating-point, and vector instruction load/store traffic. Four-entry vector touch queue (VTQ) supports all four architected AltiVec data stream operations. Three-cycle GPR and AltiVec load latency (byte, half word, word, vector) with single-cycle throughput. Four-cycle FPR load latency (single, double) with single-cycle throughput. No additional delay for misaligned access within double-word boundary. Dedicated adder calculates effective addresses (EAs). Supports store gathering. Performs alignment, normalization, and precision conversion for floating-point data. Executes cache control and TLB instructions. Performs alignment, zero padding, and sign extension for integer data. Supports hits under misses (multiple outstanding misses). Supports both big- and little-endian modes, including misaligned little-endian accesses. Three issue queues, FIQ (floating-point issue queue), VIQ (vector issue queue), and
10. [Figure 11. Programming Model: MPC7441/MPC7451 Microprocessor Registers. Register diagram not recoverable from extraction; legible entries include PMC3 (SPR 957), PMC4 (SPR 958), PMC5 (SPR 945), PMC6 (SPR 946), EAR (SPR 282), SIAR (SPR 955), DEC (SPR 22), and ICTC (SPR 1019). Footnotes: 1. MPC7441/MPC7451-specific register; may not be supported on other processors that implement the PowerPC architecture. 2. Register defined as optional in the PowerPC architecture. 3. Register defined by the AltiVec technology. 4. MPC7451-specific register.] Figure 12 shows the MPC7445, MPC7455, MPC7447, MPC7457, and MPC7447A register set. [Figure 12 register diagram: not recoverable from extraction; legible entries include TBL (TBR 268), TBU (TBR 269), CTR (SPR 9), UPMC1 (SPR 937), UPMC2 (SPR 938), UPMC3 (SPR 941), UPMC4 (SPR 942), UPMC5 (SPR 929), and UPMC6 (SPR 930).]
11. AltiVec execution units per clock cycle from the bottom two VIQ entries (VIQ1-VIQ0). This means an instruction in VIQ1 does not have to wait for an instruction in VIQ0 that is waiting for operand availability. The FIQ can accept one instruction from the dispatch unit per clock cycle. It looks at the first instruction in its queue and determines if the instruction can be issued to the FPU in this cycle. The execute stage accepts instructions from its issue queue when the appropriate reservation stations are not busy. In this stage, the operands assigned to the execution stage from the issue stage are latched. The execution unit executes the instruction (perhaps over multiple cycles), writes results on its result bus, and notifies the CQ when the instruction finishes. The execution unit reports any exceptions to the completion stage. Instruction-generated exceptions are not taken until the excepting instruction is next to retire. Most integer instructions have a 1-cycle latency, so results of these instructions are available 1 clock cycle after an instruction enters the execution unit. The FPU, LSU, IU2, VIU2, VFPU, and VPU units are pipelined, as shown in Chapter 7, "AltiVec Technology Implementation," in the MPC7450 RISC Microprocessor Family User's Manual. Note that AltiVec computational instructions are executed in the four independent, pipelined AltiVec execution units. The VPU has a two-stage pipeline, the VIU1 has a one-stage pip
12. Architectural Implementation The PowerPC architecture defines the term "cache block" as the cacheable unit. The VEA and OEA define cache management instructions a programmer can use to affect cache contents. 3.3.2 MPC7450 Microprocessor Cache Implementation The MPC7450 cache implementation is described in Section 1.2.4, "On-Chip L1 Instruction and Data Caches," Section 1.2.5, "L2 Cache Implementation," and Section 1.2.6, "L3 Cache Implementation." The BPU also contains a 128-entry BTIC that provides immediate access to cached target instructions. For more information, see Section 1.2.2.2, "Branch Processing Unit (BPU)." 3.4 Exception Model The following sections describe the PowerPC exception model and the MPC7450 implementation. A detailed description of the MPC7450 exception model is provided in Chapter 4, "Exceptions," of the MPC7450 RISC Microprocessor Family User's Manual. 3.4.1 PowerPC Exception Model The OEA portion of the PowerPC architecture defines the mechanism by which processors that implement the PowerPC architecture invoke exceptions. Exception conditions may be defined at other levels of the architecture. For example, the UISA defines conditions that may cause floating-point exceptions; the OEA defines the mechanism by which the exception is taken. The PowerPC exception mechanism allows the processor to change to supervisor state as a result of unusual conditions arising in the execution of instruct
13. FPR VR 16 16 16 6 6 6 MPC7450 RISC Microprocessor Family Product Brief Rev 5 54 Freescale Semiconductor Differences Between MPC7450 and MPC7400 MPC7410 Table 4 MPC7450 and MPC7400 MPC7410 Feature Comparison continued Microarchitectural Feature MPC7451 MPC7400 MPC7410 Maximum Execution Throughput Short latency integer units IU1s 3 2 Vector units 2 any 2 of 4 units 2 permute integer Floating point unit 1 1 Out of Order Window Size in Execution Queues Short latency integer units 1 entry 3 queues 1 entry 2 queues Vector units In order 4 queues In order 2 queues Floating point unit In order In order Branch Processing Resources Prediction structures BTIC BHT link stack BTIC BHT BTIC size associativity 128 entry 4 way 64 entry 4 way BHT size 2K entry 512 entry Link stack depth 8 none Unresolved branches supported 3 2 Branch taken penalty BTIC hit 1 0 Minimum misprediction penalty 6 4 Execution Unit Timings Latency Throughput Aligned load integer float vector 3 1 4 1 3 1 2 1 2 1 2 1 Misaligned load integer float vector 4 2 5 2 4 2 3 2 3 2 3 2 L1 miss L2 hit latency 9 data access 9 11 13 instruction access lU1s adds subs shifts rotates compares logicals 1 1 1 1 Integer multiply 32 8 32 16 32 32 3 1 3 1 4 2 2 1
14. Microprocessor Features 2.2 Instruction Flow As shown in Figure 1, the MPC7450 instruction unit provides centralized control of instruction flow to the execution units. The instruction unit contains a sequential fetcher, 12-entry instruction queue (IQ), dispatch unit, and branch processing unit (BPU). It determines the address of the next instruction to be fetched based on information from the sequential fetcher and from the BPU. See Chapter 6, "Instruction Timing," of the MPC7450 RISC Microprocessor Family User's Manual for a detailed discussion of instruction timing. The sequential fetcher loads instructions from the instruction cache into the instruction queue. The BPU extracts branch instructions from the sequential fetcher. Branch instructions that cannot be resolved immediately are predicted using either the MPC7450-specific dynamic branch prediction or the architecture-defined static branch prediction. Branch instructions that do not affect the LR or CTR are often removed from the instruction stream. Chapter 6, "Instruction Timing," of the MPC7450 RISC Microprocessor Family User's Manual describes when a branch can be removed from the instruction stream. Instructions dispatched beyond a predicted branch do not complete execution until the branch is resolved, preserving the programming model of sequential execution. If branch prediction is incorrect, the instruction unit flushes all predicted path instructions, and instructions are
15. PVR (SPR 287): Processor version register. Read-only register that identifies the version (model) and revision level of the processor. Table 1. Register Summary for MPC7450 (continued; columns: Name, SPR, Description) SDAR: Sampled data address register. The MPC7450 does not implement the optional registers (SDAR or the user-level, read-only USDAR register) defined by the PowerPC architecture. Note that in previous processors the SDAR and USDAR registers could be written to by boot code without causing an exception; this is not the case in the MPC7450. A mtspr or mfspr SDAR or USDAR instruction causes a program exception. SDR1 (SPR 25): Sample data register. Specifies the base address of the page table entry group (PTEG) address used in virtual-to-physical address translation. Implementation Note: The SDR1 register has been modified (with the SDR1[HTABEXT] and SDR1[HTMEXT] fields) for the MPC7450 to support the extended 36-bit physical address (when HID0[XAEN] = 1). SIAR (SPR 955; table footnote 3): Sampled instruction address register. Contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor exception condition. USIAR provides user-level read access to the SIAR. SPRG0-SPRG3 (SPRs 272-275): Provided for operating system us
16. RISC Microprocessor Family User's Manual. Note that the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448 do not support the L3 cache or L3 cache interface. The MPC7450 has three power-saving modes (nap, sleep, and deep sleep), which progressively reduce power dissipation. When functional units are idle, a dynamic power management mode causes those units to enter a low-power mode automatically without affecting operational performance, software execution, or external hardware. Chapter 1, "Overview," of the MPC7450 RISC Microprocessor Family User's Manual describes how the power management can be used to reduce power consumption when the processor, or portions of it, are idle. It also describes how the instruction cache throttling mechanism reduces the instruction dispatch rate. The information in these sections is described more fully in Chapter 10, "Power and Thermal Management," of the MPC7450 RISC Microprocessor Family User's Manual. The performance monitor facility provides the ability to monitor and count predefined events such as processor clocks, misses in the instruction cache, data cache, or L2 cache, types of instructions dispatched, mispredicted branches, and other occurrences. The count of such events (which may be an approximation) can be used to trigger the performance monitor exception. Chapter 1, "Overview," of the MPC7450 RISC Microprocessor Family User's Manual describes the operation of the performance monitor diagnostic
17. Tags are sectored to support either two or four cache blocks per tag entry, depending on the L2 cache size. Each sector (32-byte cache block) in the L3 cache has three status bits that are used to implement the MESI cache coherency protocol. Accesses to the L3 cache can be designated as write-back or write-through, and the L3 maintains cache coherency through snooping. The L3 interface can be configured to use 1 or 2 Mbytes of the SRAM area as a private memory space. The MPC7457, in particular, can support 1, 2, or 4 Mbytes of private memory. Accesses to private memory do not propagate to the system bus. The MPC7450 can also be configured to use 1 Mbyte of SRAM as L3 cache and a second Mbyte as private memory. Also in this case, private memory accesses do not propagate to the L3 cache or the external system bus. The private memory space provides a low-latency, high-bandwidth area for critical data or instructions. Accesses to the private memory space do not propagate to the L3 cache, nor are they visible to the external system bus. The private memory space is also not snooped, so the coherency of its contents must be maintained by software or not at all. For more information, see Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450 RISC Microprocessor Family User's Manual. The L3 cache control register (L3CR) provides control of L3 cache configuration and interface timing. The L3 private memory control register (L3PM) configures the private mem
18. The MPC7450 L2 cache is implemented with an on-chip, 256-Kbyte, eight-way set-associative, physically addressed memory available for storing data, instructions, or both. In the MPC7447, MPC7457, and MPC7447A, the L2 cache is 512 Kbytes. In the MPC7448, the L2 cache is 1 Mbyte. The L2 cache supports parity generation and checking for both tags and data. It responds with a 9-cycle load latency for an L1 miss that hits in L2. In the MPC7448, the L2 load access time is 11 cycles with ECC disabled and 12 cycles with ECC enabled. The L2 cache is fully pipelined for single-cycle throughput in the MPC7450 (2-cycle throughput in the MPC7448). For information about the L2 cache implementation, see Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450 RISC Microprocessor Family User's Manual. The L3 cache is implemented with an on-chip, eight-way set-associative tag memory and with external, synchronous SRAMs for storing data, instructions, or both. The external SRAMs are accessed through a dedicated L3 cache port that supports a single bank of 1 or 2 Mbytes of synchronous SRAMs for L3 cache data. The L3 data bus is 64 bits wide and provides multiple SRAM options as well as quick quad-word forwarding to reduce latency. Alternately, the L3 interface can be configured to use half or all of the SRAM area as a direct-mapped, private memory space. For information about the L3 cache implementation, see Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450
19. and MPC7448. Interrupts/resets: These signals include the external interrupt signal, checkstop signals, and both soft reset and hard reset signals. They are used to interrupt and, under various conditions, to reset the processor. Processor status and control: These signals enable the time base facility and are used to select the bus mode and control sleep mode. Clock control: These signals determine the system clock frequency. They are also used to synchronize multiprocessor systems. Test interface: The JTAG (IEEE 1149.1a-1993) interface and the common on-chip processor (COP) unit provide a serial interface to the system for performing board-level boundary-scan interconnect tests. Voltage selection: These signals control the electrical characteristics of the I/O circuitry of the device as appropriate to support various signaling levels. NOTE: Active-low signals are shown with overbars, for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active low, such as AP[0:4] (address bus parity signals) and TT[0:4] (transfer type signals), are referred to as asserted when they are high and negated when they are low. 2.9.3 MPX Bus Mode Functional Groupings Figure 7 illustrates the signal configuration in MPX bus mode for the MPC7450, MPC7451, MPC7441, MPC7455, and MPC7445, showing how the signals are grouped. A pinout dia
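The asserted/negated convention in the NOTE above maps directly onto a small helper. The sketch below is illustrative only: the function name and the lookup table are hypothetical, and the polarity entries are limited to the four signals the NOTE names.

```python
# Polarity of the signals named in the NOTE (True means active low).
ACTIVE_LOW = {
    "ARTRY": True,   # address retry, shown with an overbar
    "TS": True,      # transfer start, shown with an overbar
    "AP": False,     # address bus parity signals AP[0:4]
    "TT": False,     # transfer type signals TT[0:4]
}


def is_asserted(signal: str, level_high: bool) -> bool:
    """Translate an electrical level into asserted (True) or negated (False)."""
    if ACTIVE_LOW[signal]:
        # Active-low signals are asserted when the pin is low.
        return not level_high
    # Other signals are asserted when the pin is high.
    return level_high
```

For example, `is_asserted("TS", level_high=False)` is true because TS is active low, while `is_asserted("TT", level_high=False)` is false.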
20. are divided into the following categories: Integer instructions. These include computational and logical instructions: Integer arithmetic instructions; Integer compare instructions; Integer logical instructions; Integer rotate and shift instructions. Floating-point instructions. These include floating-point computational instructions, as well as instructions that affect the FPSCR: Floating-point arithmetic instructions; Floating-point multiply-add instructions; Floating-point rounding and conversion instructions; Floating-point compare instructions; Floating-point status and control instructions. Load and store instructions. These include integer and floating-point load and store instructions: Integer load and store instructions; Integer load and store multiple instructions; Floating-point load and store; Primitives used to construct atomic memory operations (lwarx and stwcx. instructions). Flow control instructions. These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow: Branch and trap instructions; Condition register logical instructions. Processor control instructions. These instructions are used for synchronizing memory accesses and management of caches, TLBs, and the segment registers: Move to/from SPR instructions; Move to/from MSR; Synchronize MPC7450 RI
21. core clock is synchronized to SYSCLK with the aid of a VCO-based PLL. The PLL_CFG[0:4] signals (PLL_CFG[0:5] in the MPC7448) are used to program the internal clock rate to a multiple of SYSCLK, as defined in the hardware specifications. The bus clock is maintained at the same frequency as SYSCLK. SYSCLK does not need to be a 50% duty cycle signal. The MPC7450 generates the clock for the external L3 synchronous data RAMs. The clock frequency for the RAMs is divided down from (and phase-locked to) the MPC7450 core clock frequency using a divisor selected through L3CR[L3CLK]. Note that the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448 do not support the L3 cache or the L3 cache interface. 2.10 Power and Thermal Management The MPC7450 is designed for low-power operation. It provides both automatic and program-controlled power reduction modes. If an MPC7450 functional unit is idle, it automatically goes into a low-power mode. This mode does not affect operational performance. Dynamic power management automatically supplies or withholds power to execution units individually, based upon the contents of the instruction stream. The operation of dynamic power management is transparent to software or any external hardware. The following three programmable power modes are available to the system: Nap: Instruction fetching is halted. Only those clocks for time base, decrementer, and JTAG logic remain running. The MPC7450 goes into the doze state to snoop
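The clock relationships stated in the excerpt above (the core clock is a multiple of SYSCLK, the bus clock equals SYSCLK, and the L3 RAM clock is the core clock divided by a selected divisor) reduce to simple arithmetic. The following Python sketch works through one case; the function names and the example frequency, multiplier, and divisor are hypothetical, and the values actually permitted are those in the hardware specifications, not anything implied here.

```python
from fractions import Fraction


def core_clock_hz(sysclk_hz: int, multiplier) -> Fraction:
    """Core clock as a multiple of SYSCLK (multiplier may be fractional, e.g. '7.5')."""
    return sysclk_hz * Fraction(multiplier)


def l3_clock_hz(core_hz, divisor) -> Fraction:
    """L3 RAM clock, divided down from the core clock by the selected divisor."""
    return Fraction(core_hz) / Fraction(divisor)


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
sysclk = 100_000_000                     # 100 MHz SYSCLK (also the bus clock)
core = core_clock_hz(sysclk, "7.5")      # 750 MHz core clock
l3 = l3_clock_hz(core, 3)                # 250 MHz L3 clock with a 3:1 ratio
```

Using `Fraction` keeps half-step multipliers such as 7.5 exact, which avoids floating-point rounding when the results are compared against nominal frequencies.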
22. exception for cache-inhibited AltiVec loads and stores and write-through stores that execute when in 60x bus mode. Both exceptions are described fully in Chapter 4, "Exceptions," of the MPC7450 RISC Microprocessor Family User's Manual. Also, the default setting for the VSCR[NJ] bit has changed from being non-Java compliant (VSCR[NJ] = 1) in the MPC7400/7410 to having a default setting of Java-compliant (VSCR[NJ] = 0) in the MPC7450. The AltiVec implementation is described fully in Chapter 7, "AltiVec Technology Implementation," of the MPC7450 RISC Microprocessor Family User's Manual. 4 Differences Between MPC7450 and MPC7400/MPC7410 Table 4 compares the key features of the MPC7450 with the earlier MPC7400/MPC7410. To achieve a higher frequency, the number of logic levels per clock cycle is reduced. In addition, the pipeline of the MPC7450 is extended (compared to the MPC7400) while maintaining the same level of performance in terms of number of instructions executed per clock cycle. Table 4 shows these differences.
Table 4. MPC7450 and MPC7400/MPC7410 Feature Comparison
Microarchitectural Feature                  MPC7451       MPC7400/MPC7410
Basic Pipeline Functions
  Logic inversions per cycle                18            28
  Pipeline stages up to execute             5             3
  Total pipeline stages (minimum)           7             4
  Pipeline maximum instruction throughput   3 + branch    2 + branch
Pipeline Resources
  Instruction queue size                    12            6
  Completion queue size                     16            8
  Renames GPR
23. fetched from the correct path. 2.2.1 Instruction Queue and Dispatch Unit The instruction queue (IQ), shown in Figure 1, holds as many as 12 instructions and loads as many as 4 instructions from the instruction cache during a single processor clock cycle. The fetcher attempts to initiate a new fetch every cycle. The two fetch stages are pipelined, so as many as four instructions can arrive to the IQ every cycle. All instructions except branch (bx), Return from Exception (rfi), System Call (sc), Instruction Synchronize (isync), and no-op instructions are dispatched to their respective issue queues from the bottom three positions in the instruction queue (IQ0-IQ2) at a maximum rate of three instructions per clock cycle. Reservation stations are provided for the three IU1s, IU2, FPU, LSU, VPU, VIU2, VIU1, and VFPU. The dispatch unit checks for source and destination register dependencies, determines whether a position is available in the CQ, and inhibits subsequent instruction dispatching as required. Branch instructions can be detected, decoded, and predicted from entries IQ0-IQ7. See Chapter 6, "Instruction Timing," of the MPC7450 RISC Microprocessor Family User's Manual. 2.2.2 Branch Processing Unit (BPU) The BPU receives branch instructions from the IQ and executes them early in the pipeline, achieving the effect of a zero-cycle branch in some cases. Branches with no outstanding dependencies (CR, LR, or CTR unresolved) can be processed and r
24. follows: Instruction cache: one parity bit per instruction. Data cache: one parity bit per byte of data. No snooping of instruction cache except for icbi instruction. Caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each way. Data cache supports AltiVec LRU and transient instructions. Critical double- and/or quad-word forwarding is performed as needed. Critical quad-word forwarding is used for AltiVec loads and instruction fetches. Other accesses use critical double-word forwarding. On-chip level 2 (L2) cache has the following features: Integrated 256-Kbyte, eight-way set-associative unified instruction and data cache (512 Kbyte for the MPC7457, MPC7447, and MPC7447A; 1 Mbyte for the MPC7448). Fully pipelined to provide 32 bytes per clock cycle to the L1 caches. Total latency of 9 processor cycles for L1 data cache miss that hits in the L2. In the MPC7448, total latency of 11 processor cycles for L1 data cache miss that hits in the L2 with ECC disabled, 12 cycles when ECC is enabled. Uses one of two random replacement algorithms (selectable through L2CR). Cache write-back or write-through operation programmable on a per-page or per-block basis. Organized as 32-byte blocks and 2 blocks (sectors) per line (a cache block is the block of memory that a coherency sta
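The L2 figures quoted above (eight ways, 32-byte blocks, two blocks per line) determine how many lines and sets each cache size implies. The Python sketch below works that arithmetic through; the function name is hypothetical, and applying the same associativity and line organization to the 512-Kbyte and 1-Mbyte variants is an assumption based on how the excerpt lists them alongside the 256-Kbyte cache.

```python
def l2_geometry(size_bytes: int, ways: int = 8,
                block_bytes: int = 32, blocks_per_line: int = 2) -> dict:
    """Derive line and set counts for a set-associative cache of the given size."""
    line_bytes = block_bytes * blocks_per_line      # 64 bytes per line
    lines = size_bytes // line_bytes                # total cache lines
    sets = lines // ways                            # lines are spread across the ways
    return {
        "line_bytes": line_bytes,
        "words_per_line": line_bytes // 4,          # 32-bit words
        "lines": lines,
        "sets": sets,
    }


KBYTE = 1024
for name, size in [("256 Kbyte", 256 * KBYTE),
                   ("512 Kbyte", 512 * KBYTE),
                   ("1 Mbyte", 1024 * KBYTE)]:
    g = l2_geometry(size)
    print(f"{name}: {g['lines']} lines, {g['sets']} sets, "
          f"{g['words_per_line']} words per line")
```

The 16 words per line that this yields agrees with the description elsewhere in the brief of a cache line holding 16 contiguous words handled as two 8-word blocks.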
25. [Figure 9. MPX Bus Signal Groups in the MPC7447A: signal diagram not recoverable from extraction] Figure 10 illustrates the MPC7448's signal configuration in MPX bus mode. [Figure 10 signal diagram: not recoverable from extraction]
26. in a cache line share the same address tag; each block maintains three separate status bits for the 8 words of the cache block, the unit of memory at which coherency is maintained. Thus, each cache line can contain 16 contiguous words from memory that are read or written as 8-word operations. The MPC7450 integrated L2 cache organization is shown in Figure 5. [Figure 5. L2 Cache Organization for MPC7450. Each line has one address tag and two blocks with status bits: Block 0 (words 0-7) and Block 1 (words 8-15).] Figure 6 shows L2 cache organization for the MPC7447, MPC7457, MPC7447A, and MPC7448. [Figure 6. L2 Cache Organization for the MPC7447, MPC7457, MPC7447A, and MPC7448. Each line has two blocks with status bits: Block 0 (words 0-7) and Block 1 (words 8-15).] The L2 cache controller contains the L2 cache control register (L2CR), which does the following: Includes bits for enabling
27. memory, whether the access causes a snoop hit from another device that generates additional activity, and other conditions that affect memory accesses. To improve throughput, the MPC7450 implements pipelining, superscalar instruction issue, branch folding, removal of fall-through branches, three-level speculative branch handling, and multiple execution units that operate independently and in parallel. As an instruction passes from stage to stage, the subsequent instruction can follow through the stages as the preceding instruction vacates them, allowing several instructions to be processed simultaneously. Although it may take several cycles for an instruction to pass through all the stages, when the pipeline is full, one instruction can complete its work on every clock cycle. Figure 14 represents a generic four-stage pipelined execution unit, which, when filled, has a throughput of one instruction per clock cycle. [Figure 14. Pipelined Execution Unit. Stages E0-E3. Clock 0: instruction A enters E0. Clock 1: B, A. Clock 2: C, B, A. Clock 3: D, C, B, A (full pipeline). Clock 4: E, D, C, B (full pipeline).] Figure 15 shows the entire path that instructions take through the fetch1, fetch2, decode/dispatch, execute, issue, complete, and write-back stages, which is consi
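The pipeline fill behavior described for Figure 14 can be sketched in a few lines of Python. This is a minimal illustration only, assuming an idealized four-stage unit with no stalls; the function name `pipeline_occupancy` and the instruction labels A through E are hypothetical and do not come from the product brief.

```python
def pipeline_occupancy(instructions, stages=4):
    """Return, for each clock, the instruction held in every stage (None = empty).

    Idealized model (an assumption): one instruction enters stage 0 per clock
    and every instruction advances exactly one stage per clock.
    """
    total_clocks = len(instructions) + stages - 1
    table = []
    for clock in range(total_clocks):
        row = []
        for stage in range(stages):
            idx = clock - stage  # instruction that entered the pipeline `stage` clocks earlier
            row.append(instructions[idx] if 0 <= idx < len(instructions) else None)
        table.append(row)
    return table


if __name__ == "__main__":
    for clock, row in enumerate(pipeline_occupancy(list("ABCDE"))):
        print(f"Clock {clock}:", " ".join(name or "-" for name in row))
```

In this model the pipeline is full from clock 3 onward, and N instructions drain in N + 3 clocks, which is where the one-instruction-per-clock throughput figure comes from.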
28. reset.
Machine check (0x00200): Assertion of TEA during a data bus transaction, assertion of MCP, an address bus parity error on the MPX bus, a data bus parity error on the MPX bus, an L1 instruction cache error, an L1 data cache error, and a memory subsystem detected error, including the following: L2 data parity error; L2 tag parity error; L3 SRAM error; L3 tag parity error; single-bit and multiple-bit L2 ECC errors. MSR[ME] must be set. Note that the L3 cache is not supported on the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448.
DSI (0x00300): As specified in the PowerPC architecture. Also includes the following: a hardware table search due to a TLB miss on load, store, or cache operations results in a page fault; any load or store to a direct-store segment (SR[T] = 1); a lwarx or stwcx. instruction to memory with cache-inhibited or write-through memory/cache access attributes.
ISI (0x00400): As specified in the PowerPC architecture.
External interrupt (0x00500): MSR[EE] = 1 and INT is asserted.
Alignment (0x00600): A floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx, or ecowx instruction operand is not word-aligned. A multiple/string load/store operation is attempted in little-endian mode. An operand of a dcbz instruction is on a page that is write-through or cache-inhibited for a virtual mode access. An attempt to execute a dcbz instruction occurs when the cache is disabled or locked.
Program (0x00700): As specified in the PowerPC arch
29. size and flushing and invalidating the L2 cache.
L2ERRINJHI (985), L2ERRINJLO (986), L2ERRINJCTL (987), L2CAPTDATAHI (988), L2CAPTDATALO (989), L2CAPTDATAECC (990), L2ERRDET (991), L2ERRDIS (992), L2ERRINTEN (993), L2ERRATTR (994), L2ERRADDR (995), L2ERREADDR (996), L2ERRCTL (997): L2 error registers. The L2 cache supports error injection into the L2 data, data ECC, or tag, which can be used to test error recovery software by deterministically creating error scenarios. L2ERRINJHI, L2ERRINJLO, and L2ERRINJCTL are error injection registers. The rest of the registers (error control and capture registers) control the detection and reporting of tag parity, ECC, and L2 configuration errors.
L3CR (1018): L3 cache control register. Includes bits for enabling parity checking, setting the L3-to-processor clock ratio, and identifying the type of RAM used for the L3 cache implementation.
L3ITCR0 (984), L3ITCR1 (1001, note 7), L3ITCR2 (1002, note 7), L3ITCR3 (1003, note 7): L3 cache input timing control register. Includes bits for controlling the input AC timing of the L3 cache interface.
L3OHCR (1000, note 7): L3 cache output hold control register. Includes bits for controlling the output AC timing of the L3 cache interface of the MPC7457.
L3PM (983): The L3 private memory register. Configures the base address of the range of addresses that the L3 uses as private memory (not cache).
LDSTCR (1016): Load/store control register. Controls data L1 cache way-locking.
MMCR0 (952, note 8): Monitor mode control r
30. 1, 3-1, 4-2. Scalar float: 5-1. VSFX (vector simple): 1-1. VCFX (vector complex): 4-1. VFPU (vector float): 4-1. VPER (vector permute): 2-1.
MMUs. TLBs (instruction and data): 128-entry, 2-way. Tablewalk mechanism: hardware + software. Instruction BATs/data BATs: 8/8.
L1 I Cache/D Cache Features. Size: 32K/32K. Associativity: 8-way. Locking granularity: way. Parity on I cache: word. Parity on D cache: byte. Number of D cache misses (load/store): 5/1 (MPC7447A); 5/2 (MPC7448). Data stream touch engines: 4 streams.
On-Chip Cache Features. Cache level: L2. Size/associativity: 512-Kbyte/8-way (MPC7447A); 1-Mbyte/8-way (MPC7448).
Table 8. Microarchitecture Comparison (continued). Microarchitectural Specs / MPC7447A / MPC7448.
Access width: 32 bytes (MPC7447A); 16 bytes (MPC7448, note 2). Number of 32-byte sectors/line: 2. Parity: byte. ECC: no (MPC7447A); yes (MPC7448).
Thermal Control. Dynamic frequency switching (DFS): yes. Thermal diode: yes.
Note 1: 12 cycles with ECC enabled. Note 2: See Section 3.1.3.2, L2 Cache Block, for more information.
9 Document Revision History. Table 9 provides a revision history for this product brief.
Table 9. Document Revision History. Revision / Substantive Changes.
3: Added information on the MPC7447 and MPC7457.
4: Added information on the MPC7447A.
5: Added information on the MPC7448.
31. 3 2 5 4. Scalar floating-point: 5-1 (MPC7451); 3-1 (MPC7400/MPC7410). VIU1 (vector integer unit 1, shorter-latency vector integer): 1-1; 1-1. VIU2 (vector integer unit 2, longer-latency vector integer): 4-1; 3-1. VFPU (vector floating-point): 4-1; 4-1. VPU (vector permute): 2-1; 1-1.
MMUs. MMUs (instruction and data): 128-entry, 2-way; 128-entry, 2-way. Table search mechanism: hardware and software; hardware.
Table 4. MPC7450 and MPC7400/MPC7410 Feature Comparison (continued). Microarchitectural Feature / MPC7451 / MPC7400/MPC7410.
L1 Instruction Cache/Data Cache Features. Size: 32K/32K; 32K/32K. Associativity: 8-way; 8-way. Locking granularity/style: 4-Kbyte/way; full cache. Parity on instruction cache: word; none. Parity on data cache: byte; none. Number of data cache misses (load/store): 5/1; 8 (any combination). Data stream touch engines: 4 streams; 4 streams.
On-Chip L2 Cache Features (MPC7451; the MPC7400/MPC7410 have tags and controller only, see off-chip cache support below). Cache level: L2. Size/associativity: 256 Kbytes/8-way. Access width: 256 bits. Number of 32-byte sectors/line: 2. Parity: byte.
Off-Chip Cache Support. Cache level: L3; L2. On-chip tag logical size: 1 Mbyte, 2 Mbytes; 512 Kbytes, 1 Mbyte, 2 Mbytes. Associativity: 8-way; 2-way. Number of 32-byte sectors/line: 2, 4; 1, 2, 4. Off-chip data SRAM support
32. 450. For the full table, see Table 1-1 in the MPC7450 RISC Microprocessor Family User's Manual.
Table 1. Register Summary for MPC7450. Name (SPR): Description.
UISA Registers.
CR: Condition register. The 32-bit CR consists of eight 4-bit fields, CR0-CR7, that reflect results of certain arithmetic operations and provide a mechanism for testing and branching.
CTR (9): Count register. Holds a loop count that can be decremented during execution of appropriately coded branch instructions. The CTR can also provide the branch target address for the Branch Conditional to Count Register (bcctrx) instruction.
FPR0-FPR31: Floating-point registers (FPRn). The 32 FPRs serve as the data source or destination for all floating-point instructions.
FPSCR: Floating-point status and control register. Contains floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits for compliance with the IEEE 754 standard.
GPR0-GPR31: General-purpose registers (GPRn). The thirty-two GPRs serve as data source or destination registers for integer instructions and provide data for generating addresses.
LR (8): Link register. Provides the branch target address for the Branch Conditional to Link Register (bclrx) instruction, and can be used to hold the logical address of the instruction that follows a branch and link instruction, typically used for linking to subroutines.
U
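The CR layout described in Table 1 (eight 4-bit fields, CR0-CR7, in a 32-bit register) can be illustrated with a short Python sketch. It assumes the PowerPC big-endian bit numbering, in which CR0 occupies the most significant four bits; the helper names `cr_field` and `decode_compare_field` are hypothetical, and the LT/GT/EQ/SO bit names are the ones the PowerPC architecture uses for compare results, not something stated in this brief.

```python
# Bit names within one CR field for compare results, most significant bit first.
# (Assumption: taken from the PowerPC architecture, not from this product brief.)
COMPARE_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def cr_field(cr, n):
    """Return the 4-bit field CRn (n = 0..7) from a 32-bit CR value.

    CR0 is the most significant nibble, CR7 the least significant.
    """
    if not 0 <= n <= 7:
        raise ValueError("CR field index must be in 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def decode_compare_field(field):
    """Map a 4-bit field value to a dict of named flag bits."""
    return {name: bool(field & (8 >> i)) for i, name in enumerate(COMPARE_BIT_NAMES)}


if __name__ == "__main__":
    cr = 0x20000008  # CR0 = 0b0010 (EQ set), CR7 = 0b1000 (LT set)
    print(decode_compare_field(cr_field(cr, 0)))
    print(decode_compare_field(cr_field(cr, 7)))
```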
33. 7. Pipeline maximum instruction throughput: 3 + branch.
Pipeline Resources. Instruction buffer size: 12. Completion buffer size: 16. Renames (integer, float, vector): 16, 16, 16.
Maximum Execution Throughput. SFX: 3. Vector: 2 (any 2 of 4 units). Scalar floating-point: 1.
Out-of-Order Window Size in Execution Queues. SFX integer units: 1 entry x 3 queues. Vector units: in order, 4 queues. Scalar floating-point unit: in order.
Branch Processing Resources. Prediction structures: BTIC, BHT, link stack. BTIC size, associativity: 128-entry, 4-way. BHT size: 2K-entry. Link stack depth: 8. Unresolved branches supported: 3. Branch taken penalty (BTIC hit): 1. Minimum misprediction penalty: 6.
Execution Unit Timings (Latency-Throughput). Aligned load (integer, float, vector): 3-1, 4-1, 3-1. Misaligned load (integer, float, vector): 4-2, 5-2, 4-2. L1 miss, L2 hit latency: 9 data/13 instruction. SFX (add, sub, shift, rot, cmp, logicals): 1-1. Integer multiply (32 x 8, 32 x 16, 32 x 32): 3-1, 3-1, 4-2. Scalar float: 5-1. VSFX (vector simple): 1-1. VCFX (vector complex): 4-1. VFPU (vector float): 4-1.
Table 7. Microarchitecture Comparison (continued). Microarchitectural Specs / MPC7447A / MPC7447.
VPER vec
34. Basic Pipeline Functions. Logic inversions per cycle: 18. Pipeline stages up to execute: 5. Total pipeline stages (minimum): 7. Pipeline maximum instruction throughput: 3 + branch.
Pipeline Resources. Instruction buffer size: 12. Completion buffer size: 16. Renames (integer, float, vector): 16, 16, 16.
Maximum Execution Throughput. SFX: 3. Vector: 2 (any 2 of 4 units). Scalar floating-point: 1.
Out-of-Order Window Size in Execution Queues. SFX integer units: 1 entry x 3 queues. Vector units: in order, 4 queues. Scalar floating-point unit: in order.
Branch Processing Resources. Prediction structures: BTIC, BHT, link stack. BTIC size, associativity: 128-entry, 4-way.
Table 8. Microarchitecture Comparison (continued). Microarchitectural Specs / MPC7447A / MPC7448.
BHT size: 2K-entry. Link stack depth: 8. Unresolved branches supported: 3. Branch taken penalty (BTIC hit): 1. Minimum misprediction penalty: 6.
Execution Unit Timings (Latency-Throughput). Aligned load (integer, float, vector): 3-1, 4-1, 3-1. Misaligned load (integer, float, vector): 4-2, 5-2, 4-2. L1 miss, L2 hit latency: 9 data/13 instruction (MPC7447A); 11 data/15 16 instruction (MPC7448). SFX (add, sub, shift, rot, cmp, logicals): 1-1. Integer multiply (32 x 8, 32 x 16, 32 x 32): 3
35. C7450 Microprocessor Exception Classifications. Synchronous/Asynchronous | Precise/Imprecise | Exception Types.
Asynchronous, nonmaskable | Imprecise | System reset, machine check.
Asynchronous, maskable | Precise | External interrupt, system management interrupt, decrementer exception, performance monitor exception.
Synchronous | Precise | Instruction-caused exceptions.
The exception classifications are discussed in greater detail in Section 4.2, MPC7450 Exception Recognition and Priorities. For a better understanding of how the MPC7450 implements precise exceptions, see Chapter 6, Instruction Timing, of the MPC7450 RISC Microprocessor Family User's Manual. Table 3 lists the exceptions implemented in the MPC7450 and the conditions that cause them. Table 3 also notes the MPC7451-specific exceptions. The three software table search exceptions support software page table searching and are enabled by setting HID0[STEN]. See Section 4.6.15, TLB Miss Exceptions, and Chapter 5, Memory Management, of the MPC7450 RISC Microprocessor Family User's Manual.
Table 3. Exceptions and Conditions. Exception Type (Vector Offset): Causing Conditions.
Reserved (0x00000).
System reset (0x00100): Assertion of either HRESET or SRESET or at power on
36. Caches. The MPC7450 implements separate L1 instruction and data caches. Each cache is 32-Kbyte, eight-way set-associative. As defined by the PowerPC architecture, they are physically indexed. Each cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits EA[27-31] are zeros); thus, a cache block never crosses a page boundary. An entire cache block can be updated by a four-beat burst load across a 64-bit system bus. Misaligned accesses across a page boundary can incur a performance penalty. The data cache is a nonblocking, write-back cache with hardware support for reloading on cache misses. The critical double word is transferred on the first beat and is forwarded to the requesting unit, minimizing stalls due to load delays. For vector loads, the critical quad word is handled similarly but is transferred on the second beat. The cache being loaded is not blocked to internal accesses while the load completes. The MPC7450 L1 cache organization is shown in Figure 3. [Figure 3. L1 Cache Organization. Blocks 0-7, each block holding 8 words (words 0-7).] The instruction cache provides up to four instructions per clock cycle to the instruction queue. The instruction cache can be invalidated entirely or on a cache-block basis. It is inv
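The L1 cache figures quoted above imply a specific geometry, which the following Python sketch works out. It is illustrative arithmetic only, assuming a conventional set-associative layout and a 32-bit address; the constant and function names are hypothetical, while the sizes (32 Kbyte, eight-way, eight-word blocks, 64-bit bus) come from the text.

```python
CACHE_BYTES = 32 * 1024   # 32-Kbyte L1 cache
WAYS = 8                  # eight-way set-associative
BLOCK_BYTES = 8 * 4       # eight 4-byte words per cache block
BUS_BYTES = 64 // 8       # 64-bit system bus moves 8 bytes per beat

SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)   # number of sets
BURST_BEATS = BLOCK_BYTES // BUS_BYTES       # beats needed to fill one block


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the block).

    The low 5 bits (EA[27-31] in PowerPC bit numbering) select the byte
    within the 32-byte block; the next 7 bits select one of the sets.
    """
    offset = addr % BLOCK_BYTES
    set_index = (addr // BLOCK_BYTES) % SETS
    tag = addr // (BLOCK_BYTES * SETS)
    return tag, set_index, offset


if __name__ == "__main__":
    print("sets:", SETS, "burst beats:", BURST_BEATS)
    print([hex(x) for x in split_address(0x12345678)])
```

Under these assumptions the offset and set index together cover 12 bits, which matches a 4-Kbyte page offset and is consistent with the statement that a cache block never crosses a page boundary.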
37. Freescale Semiconductor. MPC7450TS Product Brief, Rev. 5, 11/2004. MPC7450 RISC Microprocessor Family Product Brief. This product brief provides an overview of the MPC7450 microprocessor features, including a block diagram showing the major functional components. This document also provides information about how the MPC7450 implementation complies with the PowerPC and AltiVec architecture definitions. The MPC7450 RISC Microprocessor Family User's Manual supports the MPC7441, MPC7445, MPC7451, MPC7455, MPC7457, MPC7447, MPC7447A, and MPC7448. Any differences between the MPC7450 and the other microprocessors, including the MPC7451, are noted in the user's manual. 1 MPC7450 Microprocessor Overview. This section describes the features and general operation of the MPC7450 and provides a block diagram showing the major functional units. The MPC7450 implements the PowerPC architecture and is a reduced instruction set computer (RISC) microprocessor. The MPC7450 consists of a processor core, 32-Kbyte separate L1 instruction and data caches, a 256-Kbyte L2 cache (512 Kbyte for MPC7457 and 1 Mbyte for the MPC7448), and an internal L3 controller with tags that support a glueless backside L3 cache through a dedicated high-bandwidth interface. The MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448 do not support the L3 cache and the L3 interface. The core is a high-performance superscalar design supporting multiple execution units, including four independent uni
38. GIQ (general-purpose issue queue) can accept as many as one, two, and three instructions, respectively, in a cycle. Instruction dispatch requires the following: Instructions can be dispatched only from the three lowest IQ entries (IQ0, IQ1, and IQ2). A maximum of three instructions can be dispatched to the issue queues per clock cycle. Space must be available in the completion queue (CQ) for an instruction to dispatch (this includes instructions that are assigned a space in the CQ but not in an issue queue). Rename buffers: 16 GPR (general-purpose register) rename buffers; 16 FPR (floating-point register) rename buffers; 16 VR (vector register) rename buffers. Dispatch unit: The decode/dispatch stage fully decodes each instruction. Completion unit: The completion unit retires an instruction from the 16-entry CQ when all instructions ahead of it have been completed, the instruction has finished execution, and no exceptions are pending. Guarantees sequential programming model (precise exception model). Monitors all dispatched instructions and retires them in order. Tracks unresolved branches and flushes instructions after a mispredicted branch. Retires as many as three instructions per clock cycle. L1 cache has the following characteristics: Two separate 32-Kbyte instr
39. [Block diagram from the MPC7450 Microprocessor Overview section. Labels not recoverable from extraction.]
40. L2 cache (512 Kbyte for MPC7447, MPC7457, and MPC7447A; 1 Mbyte for the MPC7448) and L3 cache controller simultaneously. Store miss queue transactions are queued up in the L2 cache controller and sent to the L3 cache if necessary. If no match is found in the L2 or L3 cache tags, the physical address is used to access system memory. In addition to loads, stores, and instruction fetches, the MPC7450 performs hardware table search operations following TLB misses; L1, L2, and L3 cache castout operations; and cache-line snoop push operations when a modified cache line detects a snoop hit from another bus master. 2.9.1 System Interface Operation. The primary activity of the MPC7450 system interface is transferring data and instructions between the processor and system memory. There are three types of transfer accesses: Single-beat transfers: These memory accesses allow transfer sizes of 1, 2, 3, 4, or 8 bytes in one bus clock cycle. Single-beat transactions are caused by uncacheable read and write operations that access memory directly, that is, when caching is disabled, cache-inhibited accesses, and stores in write-through mode. Two-beat burst (16-byte) data transfers: Generated to support caching-inhibited or write-through AltiVec loads and stores (only generated in MPX bus mode) and for caching-inhibited instruction fetch
41. MMCR0 (936), UMMCR1 (940), UMMCR2 (928): User monitor mode control registers (UMMCRn). Used to enable various performance monitor exception functions. UMMCRs provide user-level read access to MMCR registers.
UPMC1-UPMC6 (937, 938, 941, 942, 929, 930): User performance monitor counter registers (UPMCn). Used to record the number of times a certain event has occurred. UPMCs provide user-level read access to PMC registers.
USIAR (939): User sampled instruction address register. Contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor exception condition. USIAR provides user-level read access to the SIAR.
VR0-VR31: Vector registers (VRn). Data source and destination registers for all AltiVec instructions.
VRSAVE (256, note 2): Vector save/restore register. Defined by the AltiVec technology to assist application and operating system software in saving and restoring the architectural state across process context-switched events. The register is maintained only by software to track live or dead information on each AltiVec register.
VSCR: Vector status and control register. A 32-bit vector register that is read and written in a manner similar to the FPSCR.
XER (1): Indicates overflows and carries for integer operations. Implementation Note: To emulate the POWER architecture lscbx instruction, XER[16-23] can be read with mfspr[XER] and written with mtspr[XER].
MPC7450 RI
42. MSUG2 DDR, LW, PB2; LW, PB2, PB3. Data path width: 64; 64. Private memory SRAM sizes: 1 Mbyte, 2 Mbytes; 512 Kbyte, 1 Mbyte, 2 Mbytes. Parity: byte; byte. Note 1: Numbers in parentheses are for 2:1 SRAM.
5 Differences Between MPC7441/MPC7451 and MPC7445/MPC7455. Table 5 compares the key differences between the MPC7451 and the MPC7455. The table provides the section number where the details of the differences are discussed. Differences between the two processors are defined throughout the MPC7450 RISC Microprocessor Family User's Manual. Table 4 provides a high-level overview of the differences. Table 5 shows these differences.
Table 5. MPC7451 and MPC7455 Differences. Microarchitectural Feature / MPC7441/MPC7451 / MPC7445/MPC7455.
MMU.
Block address translation (BAT), maps regions of memory: 16 BAT registers (MPC7441/MPC7451); 32 BAT registers (MPC7445/MPC7455), with 8 additional instruction and 8 additional data BAT registers: IBAT4U, IBAT4L, IBAT5U, IBAT5L, IBAT6U, IBAT6L, IBAT7U, IBAT7L, DBAT4U, DBAT4L, DBAT5U, DBAT5L, DBAT6U, DBAT6L, DBAT7U, DBAT7L.
SPRGs, used by system software for software table searches: 4 SPRs (MPC7441/MPC7451); 8 SPRs (MPC7445/MPC7455), with 4 additional registers, SPRG4-SPRG7.
Additional HID0 bits (MPC7445/MPC7455): HID0[HIGH_BAT_EN] = 1 enables additional BATs.
Block size range (MPC7445/MPC7455): HID0[XBSEN] = 1. 128 K
43. [Register diagram residue. Recoverable register names and SPR numbers: SPRG3 (SPR 275), DSISR (SPR 18), FPSCR, UMMCR0 (SPR 936), UMMCR1 (SPR 940), UMMCR2 (SPR 928), LDSTCR (SPR 1016), ICTRL (SPR 1011), L3PM (SPR 983), VR0, VRSAVE (SPR 256), VSCR, MSSCR0 (SPR 1014), MSSSR0 (SPR 1015), L2CR (SPR 1017), L3CR (SPR 1018), L3ITCR0 (SPR 984), PMC1 (SPR 953), PMC2 (SPR 954), MMCR0 (SPR 952), BAMR (SPR 951), TBL (SPR 284), DABR (SPR 1013). Register groups named in the diagram: cache/memory subsystem registers, AltiVec registers, performance monitor registers, miscellaneous registers.]
44. [Figure 10. MPX Bus Signal Groups in the MPC7448. Signal diagram not recoverable from extraction.] Signal functionality is described in detail in Chapter 8, Signal Descriptions, and Chapter 9, System Interface Operation, of the MPC7450 RISC Microprocessor Family User's Manual. 2.9.3.1 Clocking. For functional operation, the MPC7450 uses a single clock input signal, SYSCLK, from which clocking is derived for the processor core, the L3 interface, and the MPX bus interface. Additionally, internal clock information is made available at the pins to support debug and development. The MPC7450's clocking structure supports a wide range of processor-to-bus clock ratios. The internal processor
45. Instruction synchronize. Order loads and stores. Memory control instructions: These instructions provide control of caches, TLBs, and SRs. Supervisor-level cache management instructions. User-level cache instructions. Segment register manipulation instructions. Translation lookaside buffer management instructions. This grouping does not indicate the execution unit that executes a particular instruction or group of instructions. Integer instructions operate on byte, half-word, and word operands. Floating-point instructions operate on single-precision (one word) and double-precision (one double word) floating-point operands. The PowerPC architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and word operand loads and stores between memory and a set of 32 GPRs. It also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (FPRs). Computational instructions do not modify memory. To use a memory operand in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written back to the target location with distinct instructions. Processors that implement the PowerPC architecture follow the program flow when they
46. Table 1. Register Summary for MPC7450 (continued). Name (SPR): Description.
VEA.
TBL (TBR 268), TBU (TBR 269): Time base facility. Consists of two 32-bit registers, time base lower and upper registers (TBL/TBU). TBL (TBR 268) and TBU (TBR 269) can only be read from and not written to. TBU and TBL can be read with the move from time base register (mftb) instruction. Implementation Note: Reading from SPR 284 or 285 using the mftb instruction causes an illegal instruction exception.
OEA.
BAMR (951): Breakpoint address mask register. Used in conjunction with the events that monitor IABR hits.
DABR (1013, note 2): Data address breakpoint register. Optional register implemented in the MPC7450 and used to cause a breakpoint exception if a specified data address is encountered.
DAR (19): Data address register. After a DSI or alignment exception, DAR is set to the effective address (EA) generated by the faulting instruction.
DEC (22): Decrementer register. A 32-bit decrementer counter used with the decrementer exception. Implementation Note: In the MPC7450, DEC is decremented and the time base increments at 1/4 of the system bus clock frequency.
DSISR (18): DSI source register. Defines the cause of DSI and alignment exceptions.
EAR (282): External access register. Used with eciwx and ecowx. Note
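The DEC implementation note above (the decrementer counts at 1/4 of the system bus clock frequency) translates directly into the count needed for a given interval. The sketch below is illustrative arithmetic only; the function name `dec_ticks` and the 100 MHz bus frequency in the example are assumptions, not values from the product brief.

```python
DEC_DIVIDER = 4          # DEC and the time base tick at bus clock / 4
DEC_MAX = 2**32 - 1      # DEC is a 32-bit counter


def dec_ticks(interval_us, bus_hz):
    """Return the number of DEC ticks that elapse in `interval_us` microseconds.

    Uses integer arithmetic so the result is exact when it divides evenly.
    Raises ValueError if the count does not fit in the 32-bit register.
    """
    ticks = (interval_us * bus_hz) // (DEC_DIVIDER * 1_000_000)
    if ticks > DEC_MAX:
        raise ValueError("interval too long for a 32-bit decrementer")
    return ticks


if __name__ == "__main__":
    # Hypothetical 100 MHz system bus, 1 ms interval.
    print(dec_ticks(1_000, 100_000_000))
```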
47. This means an instruction in VIQ1 does not have to wait for an instruction in VIQ0 that is waiting for operand availability. Note that for the MPC7450, double- and single-precision versions of floating-point instructions have the same latency. For example, a floating-point multiply-add instruction takes 5 cycles to execute, regardless of whether it is single (fmadds) or double precision (fmadd). The MPC7450 has independent on-chip, 32-Kbyte, eight-way set-associative, physically addressed L1 (level one) caches for instructions and data, and independent instruction and data memory management units (MMUs). Each MMU has a 128-entry, two-way set-associative translation lookaside buffer (DTLB and ITLB) that saves recently used page address translations. Block address translation is implemented with the four-entry (eight-entry for the MPC7455, MPC7457, MPC7447, MPC7447A, and MPC7448) instruction and data block address translation (IBAT and DBAT) arrays defined by the PowerPC architecture. During block translation, effective addresses are compared simultaneously with all BAT entries, as described in Chapter 5, Memory Management, of the MPC7450 RISC Microprocessor Family User's Manual. For information about the L1 caches, see Chapter 3, L1, L2, and L3 Cache Operation, of the MPC7450 RISC Microprocessor Family User's Manual.
48. W: Power management enable (MPC7450-specific and optional to the PowerPC architecture). 0: Power management is disabled. 1: Power management is enabled. The processor can enter a power-saving mode determined by HID0[NAP, SLEEP] when additional conditions are met.
29, PMM: Performance monitor marked mode (MPC7450-specific and optional to the PowerPC architecture). See Chapter 11, Performance Monitor, of the MPC7450 RISC Microprocessor Family User's Manual. 0: Process is not a marked process. 1: Process is a marked process.
MSSCR0 (1014): Memory subsystem control register. Used to configure and operate many aspects of the memory subsystem.
MSSSR0 (1015): Memory subsystem status register. Used to configure and operate the parity functions in the L2 and L3 caches for the MPC7450.
PIR (1023): Processor identification register. Provided for system use. MPC7450 does not change PIR contents.
PMC1-PMC6 (953, 954, 957, 958, 945, 946; note 3): Performance monitor counter registers (PMCn). Used to record the number of times a certain event has occurred. UPMCs provide user-level read access to these registers.
PTEHI (981), PTELO (982): The PTEHI and PTELO registers are used by the tlbld and tlbli instructions to create a TLB entry. When software table searching is enabled (HID0[STEN] = 1) and a TLB miss exception occurs, the bits of the page table entry (PTE) for this access are located by software and saved in the PTE registers.
49. ace is 128 bits; the L1/L2/L3 bus interface allows up to 256 bits. The L1 data cache is fully pipelined to provide 128 bits/cycle to or from the VRs. The L2 cache is fully pipelined to provide 32 bytes per processor clock cycle to the L1 cache. In the MPC7448, the L2 cache is pipelined to provide 32 bytes every other clock cycle to the L1 cache. As many as eight outstanding, out-of-order cache misses are allowed between the L1 data cache and the L2/L3 bus. As many as 16 out-of-order transactions can be present on the MPX bus. Store merging for multiple store misses to the same line. Only coherency action taken (address-only) for store misses merged to all 32 bytes of a cache block (no data tenure needed). Support for a second cacheable store miss. Three-entry finished store queue and five-entry completed store queue between the LSU and the L1 data cache. Separate additional queues for efficient buffering of outbound data (such as castouts and write-through stores) from the L1 data cache and L2 cache. Multiprocessing support features include the following: Hardware-enforced MESI cache coherency protocols for data cache. Load/store with reservation instruction pair for atomic memory references, semaphores, and other multiprocessor operations. Power and thermal management: The following three power-saving modes are available to the system: Nap: Instruction fetching is halted. Only those clocks for the t
50. al BAT registers, organized as four pairs of instruction BAT registers (IBAT4U-IBAT7U paired with IBAT4L-IBAT7L) and four pairs of data BAT registers (DBAT4U-DBAT7U paired with DBAT4L-DBAT7L), are available. Thus, the MPC7455 can define a total of 16 blocks implemented as 32 BAT registers. Because BAT upper and lower words are loaded separately, software must ensure that BAT translations are correct during the time that both BAT entries are being loaded. The MPC7450 implements IBAT[G]; however, attempting to execute code from an IBAT area with G = 1 causes an ISI exception.
SPR numbers listed alongside this entry: DBAT1U/L (538, 539), DBAT2U/L (540, 541), DBAT3U/L (542, 543), DBAT4U/L (568, 569; note 4), DBAT5U/L (570, 571), DBAT6U/L (572, 573; note 4), DBAT7U/L (574, 575; note 4).
Table 1. Register Summary for MPC7450 (continued). Name (SPR): Description.
ICTC (1019): Instruction cache throttling control register. Has bits for enabling instruction cache throttling and for controlling the interval at which instructions are fetched. This controls overall junction temperature.
ICTRL (1011): Instruction cache and interrupt control register. Used in configuring interrupts and error reporting for the instruction and data caches.
L2CR (1017): L2 cache control register. Includes bits for enabling parity checking, setting the L2 cache
51. alidated and disabled by setting HID0[ICFI] and then clearing HID0[ICE]. The instruction cache can be locked by setting HID0[ILOCK]. The instruction cache supports only the valid/invalid states. The data cache provides four words per clock cycle to the LSU. Like the instruction cache, the data cache can be invalidated all at once or on a per-cache-block basis. The data cache can be invalidated and disabled by setting HID0[DCFI] and then clearing HID0[DCE]. The data cache can be locked by setting HID0[DLOCK]. The data cache tags are dual-ported, so a load or store can occur simultaneously with a snoop. The MPC7450 also implements a 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC). The BTIC is a cache of branch instructions that have been encountered in branch/loop code sequences. If the target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically the BTIC contains the first four instructions in the target stream. The BTIC can be disabled and invalidated through software. As with other aspects of MPC7450 instruction timing, BTIC operation is optimized for cache-line alignment. If the first target instruction is one of the first five instructions in the cache block, the BTIC entry holds four instructions. If the first target instruction is the last instruction before the cache block boundary, it is the only instruction in the
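The BTIC alignment rule can be summarized as a small function. This is a sketch under stated assumptions, not a description of the hardware logic: the name btic_valid_instructions is hypothetical, a cache block is taken to hold eight 4-byte instructions (32 bytes), and the result for a target at position 5 (three instructions) is inferred from the pattern rather than stated in the text. The cases the brief does state (positions 0 through 4 give four instructions, the next-to-last position gives two, the last position gives one) are reproduced.

```python
INSTRUCTIONS_PER_BLOCK = 8   # 32-byte cache block / 4-byte instructions
BTIC_ENTRY_CAPACITY = 4      # a BTIC entry holds at most four instructions


def btic_valid_instructions(target_index):
    """Number of valid target instructions a BTIC entry can hold.

    target_index is the position (0..7) of the first target instruction
    within its instruction cache block. The entry cannot extend past the
    cache block boundary, so it is limited by the instructions remaining.
    """
    if not 0 <= target_index < INSTRUCTIONS_PER_BLOCK:
        raise ValueError("target_index must be in 0..7")
    remaining = INSTRUCTIONS_PER_BLOCK - target_index
    return min(BTIC_ENTRY_CAPACITY, remaining)
```

A branch target at position 2 therefore fills the entry with four instructions, while a target at position 6 leaves only two valid instructions.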
52. are in the normal execution state. However, the flow of instructions can be interrupted directly by the execution of an instruction or by an asynchronous event. Either kind of exception may cause one of several components of the system software to be invoked. Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. A carry from bit 0 is ignored in 32-bit implementations. 3.2.2 AltiVec Instruction Set. The AltiVec instructions are divided into the following categories: Vector integer arithmetic instructions: These include arithmetic, logical, compare, rotate, and shift instructions. Vector floating-point arithmetic instructions: These include floating-point arithmetic instructions as well as a discussion on floating-point modes. Vector load and store instructions: These include load and store instructions for vector registers. The AltiVec technology defines LRU and transient type instructions that can be used to optimize memory accesses. LRU instructions: The AltiVec architecture specifies that the lvxl and stvxl instructions differ from other AltiVec load and store instructions in that they leave cache entries in a least recently used (LRU) state instead of a most recently used state. Transient instructions: The AltiVec architecture describes a difference between static and transient memory accesses. A static memory access should have some reasonable degree of locality a
53. bld instructions are used in reloading the TLBs during a software table search operation. The following exceptions support software table searching if HID0[STEN] is set and a TLB miss occurs: For an instruction fetch, an ITLB miss exception. For a data load, a DTLB miss-on-load exception. For a data store, a DTLB miss-on-store exception. The MPC7450 implements the optional TLB invalidate entry (tlbie) and TLB synchronize (tlbsync) instructions that can be used to invalidate TLB entries. For more information about the tlbie and tlbsync instructions, see Section 5.4.4.2, TLB Invalidation. 3.6 Instruction Timing. This section describes how the MPC7450 microprocessor performs operations defined by instructions and reports the results of instruction execution. The MPC7450 design minimizes average instruction execution latency, which is the number of clock cycles it takes to fetch, decode, dispatch, issue, and execute instructions and make results available for subsequent instructions. Some instructions, such as loads and stores, access memory and require additional clock cycles between the execute phase and the write-back phase. Latencies depend on whether an access is to cacheable or noncacheable memory, whether it hits in the L1, L2, or L3 cache, whether a cache access generates a write-back to
54. bytes to 256 Mbytes increases block size. Block size range: 128 Kbytes to 4 Gbytes. 6 Differences Between MPC7441/MPC7451 and MPC7447/MPC7457. Table 6 compares the key differences between the MPC7451 and the MPC7457. The table provides the section number where the details of the differences are discussed. Differences between the two processors are defined throughout the MPC7450 RISC Microprocessor Family User's Manual. Table 4 provides a high-level overview of the differences. Table 6 shows these differences. Table 6. MPC7451 and MPC7457 Differences. Columns: Microarchitectural Feature; MPC7441/MPC7451; MPC7447/MPC7457. L2 Cache. Cache level: L2; L2. Size/associativity: 256 Kbyte, 8-way; 512 Kbyte, 8-way. Access width: 256 bits; 256 bits. Number of 32-byte sectors/line: 2; 2. Parity: Byte; Byte. Off-Chip Cache Support. Cache level: L3; L3. On-chip tag logical size: 1 Mbyte 2 Mbytes 1 Mbyte 2 Mbytes 4 Mbytes. Associativity: 8-way; 8-way. Number of 32-byte sectors/line: 2; 2. Off-chip data SRAM support: MSUG2 DDR, LW, PB2; MSUG2 DDR, LW, PB2. Data path width: 64 bits; 64 bits. Private memory SRAM sizes: 1 Mbyte
55. caches. Supports parity generation and checking for both tags and data (enabled through L3CR). Same choice of two random replacement algorithms used by the L2 cache (selectable through L3CR). Configurable core-to-L3 frequency divisors. 64-bit external L3 data bus sustains 64 bits per L3 clock cycle. Supports MSUG2 dual data rate (DDR) synchronous burst SRAMs, PB2 pipelined synchronous burst SRAMs, and pipelined (register-register) late-write synchronous burst SRAMs. Separate memory management units (MMUs) for instructions and data: 52-bit virtual address; 32- or 36-bit physical address. Address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte segments. Memory programmable as write-back/write-through, caching-inhibited/caching-allowed, and memory coherency enforced/memory coherency not enforced on a page or block basis. Separate IBATs and DBATs (four each) also defined as SPRs. Eight IBATs and eight DBATs in the MPC7455, MPC7445, MPC7457, MPC7447, MPC7447A, and MPC7448. Separate instruction and data translation lookaside buffers (TLBs). Both TLBs are 128-entry, two-way set-associative, and use an LRU replacement algorithm. TLBs are hardware- or software-reloadable (that is, on a TLB miss, a page table search is performed in hardware or by system software). Efficient data flow: Although the VR/LSU interf
56. cal address bits to the cache and the cache lookup completes. For caching-inhibited accesses or accesses that miss in the cache, the untranslated lower-order address bits are concatenated with the translated higher-order address bits; the resulting 32- or 36-bit physical address is used by the memory subsystem and the bus interface unit, which accesses external memory. The TLBs store page address translations for recent memory accesses. For each access, an effective address is presented for page and block translation simultaneously. If a translation is found in both the TLB and the BAT array, the block address translation in the BAT array is used. Usually the translation is in a TLB and the physical address is readily available to the on-chip cache. When a page address translation is not in a TLB, hardware or system software searches for one in the page table following the model defined by the PowerPC architecture. Instruction and data TLBs provide address translation in parallel with the on-chip cache access, incurring no additional time penalty in the event of a TLB hit. The MPC7450 instruction and data TLBs are 128-entry, two-way set-associative caches that contain address translations. The MPC7450 can initiate a hardware or system software search of the page tables in memory on a TLB miss. 2.4 On-Chip L1 Instruction and Data
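The two ideas in this passage, forming a physical address by concatenation and preferring a BAT match over a TLB match, can be illustrated with a short model. It is a minimal sketch that assumes 4-Kbyte pages (a 12-bit untranslated offset); the names physical_address and select_translation are hypothetical and do not correspond to any hardware interface.

```python
PAGE_OFFSET_BITS = 12  # 4-Kbyte pages: the low 12 bits are not translated


def physical_address(real_page_number, effective_address):
    """Concatenate the translated page number with the untranslated offset."""
    offset = effective_address & ((1 << PAGE_OFFSET_BITS) - 1)
    return (real_page_number << PAGE_OFFSET_BITS) | offset


def select_translation(bat_hit, tlb_hit):
    """Pick the translation source for one access.

    Block (BAT) and page (TLB) lookups are presented the address at the
    same time; when both match, the BAT translation is the one used.
    """
    if bat_hit:
        return "BAT"
    if tlb_hit:
        return "TLB"
    return "page table search"
```

With a 24-bit real page number the result is a 36-bit physical address; for instance, physical_address(0xABCDEF, 0x12345678) evaluates to 0xABCDEF678.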
57. causes an illegal instruction exception. TLBMISS, 980: The TLBMISS register is automatically loaded when software searching is enabled (HID0[STEN] = 1) and a TLB miss exception occurs. Its contents are used by the TLB miss exception handlers (the software table search routines) to start the search process. 1. MPC7441, MPC7445, MPC7447, MPC7447A, MPC7448, MPC7451, MPC7455, and MPC7457-specific register that may not be supported on other processors that implement the PowerPC architecture. 2. Register is defined by the AltiVec technology. 3. Defined as optional register in the PowerPC architecture. 4. MPC7445, MPC7447, MPC7447A, MPC7448, MPC7455, and MPC7457-specific register. 5. MPC7448-specific register. 6. MPC7451, MPC7455, and MPC7457-specific register. 7. MPC7457-specific register. 3.2 Instruction Set. All PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formats are consistent among all instruction types, permitting efficient decoding to occur in parallel with operand accesses. This fixed instruction length and consistent format greatly simplifies instruction pipelining. For more information, see Chapter 2, Programming Model, of the MPC7450 RISC Microprocessor Family User's Manual. 3.2.1 PowerPC Instruction Set. The PowerPC instructions
58. corresponding BTIC entry. If the next-to-last instruction in a cache block is the target, the BTIC entry holds two valid target instructions, as shown in Figure 4. [Figure 4. Alignment of Target Instructions in the BTIC. For an instruction cache block holding T0 through T7: a branch target of T2 gives a BTIC entry of T2, T3, T4, T5; a branch target of T6 gives a BTIC entry of T6, T7.] BTIC ways are updated using a FIFO algorithm. For more information and timing examples showing cache hit and cache miss latencies, see Chapter 6, Instruction Timing, of the MPC7450 RISC Microprocessor Family User's Manual. 2.5 L2 Cache Implementation. The L2 cache is a unified cache that receives memory requests from both the L1 instruction and data caches independently. The integrated L2 cache on the MPC7450 is a unified (containing both instructions and data) 256-Kbyte on-chip cache. In the MPC7447, MPC7457, and MPC7447A, the L2 cache has been increased to a 512-Kbyte on-chip cache. In the MPC7448, the L2 cache is 1 Mbyte. It is eight-way set-associative and organized with 32-byte blocks and two blocks/line. Each line consists of 64 bytes of data, organized as two blocks (also called sectors). Although all 16 words
59. dered the MPC7450's master pipeline. The FPU, LSU, IU2, VIU2, VFPU, and VPU are multiple-stage pipelines. The MPC7450 contains the following execution units: Branch processing unit (BPU). Three integer unit 1s (IU1a, IU1b, and IU1c), which execute all integer instructions except multiply, divide, and move to/from SPR instructions. Integer unit 2 (IU2), which executes miscellaneous instructions including the CR logical operations, integer multiplication and division instructions, and move to/from special-purpose register instructions. 64-bit floating-point unit (FPU). Load/store unit (LSU). The AltiVec unit contains the following four independent execution units for vector computations (the latencies are shown in Chapter 7, AltiVec Technology Implementation): AltiVec permute unit (VPU). AltiVec integer unit 1 (VIU1). Vector integer unit 2 (VIU2). Vector floating-point unit (VFPU). A maximum of two AltiVec instructions can be issued in order to any combination of AltiVec execution units per clock cycle. In the MPC7448, a maximum of two AltiVec instructions can be issued out of order to any combination of AltiVec execution units per clock cycle from the bottom two VIQ entries (VIQ1 and VIQ0). An instruction in VIQ1 does not have to wait for an instruction in VIQ0 that is waiting for ope
60. e SPRG3. SPRG4 through SPRG7, 276 through 279: The SPRG4 through SPRG7 provide additional registers to be used by system software for software table searching. SR0 through SR15: Segment registers (SRn). Note that the MPC7450 implements separate instruction and data MMUs. It associates architecture-defined SRs with the data MMU. It reflects SR values in separate shadow SRs in the instruction MMU. SRR0, 26; SRR1, 27: Machine status save/restore registers (SRRn). Used to save the address of the instruction at which execution continues when rfi executes at the end of an exception handler routine. SRR1 is used to save machine status on exceptions and to restore machine status when rfi executes. Implementation Note: When a machine check exception occurs, the MPC7450 sets one or more error bits in SRR1. Refer to the individual exceptions for individual SRR1 bit settings. SVR, 286: System version register. Read-only register provided for future product compatibility. TBL, 284; TBU, 285 (for writing): Time base. A 64-bit structure (two 32-bit registers) that maintains the time of day and operating interval timers. The TB consists of two registers, time base upper (TBU) and time base lower (TBL). The time base registers can be written to only by supervisor-level software. TBL (SPR 284) and TBU (SPR 285) can only be written to and not read from. TBL and TBU can be written to with the move to special-purpose register (mtspr) instruction. Implementation Note: Reading from SPR 284 or 285
61. echnology Implementation, of the MPC7450 RISC Microprocessor Family User's Manual provides complete details. 3.1 PowerPC Registers and Programming Model. The PowerPC architecture defines register-to-register operations for most computational instructions. Source operands for these instructions are accessed from the registers or are provided as immediate values embedded in the instruction opcode. The three-register instruction format allows specification of a target register distinct from the two source operands. Load and store instructions transfer data between registers and memory. The PowerPC architecture also defines two levels of privilege: supervisor mode of operation (typically used by the operating system) and user mode of operation (used by the application software). The programming models incorporate 32 GPRs, 32 FPRs, SPRs, and several miscellaneous registers. The AltiVec extensions to the PowerPC architecture augment the programming model with 32 VRs, one status and control register, and one save and restore register. Each processor that implements the PowerPC architecture also has a unique set of implementation-specific registers to support functionality that may not be defined by the PowerPC architecture. Having access to privileged instructions, registers, and other resources allows the operating system to control the application environment (providing virtual memory and protecting operating system and critical machine resources). Inst
62. ectural state when the MPC7450 must recover from a mispredicted branch or any exception. An instruction is retired as it is removed from the CQ. For a more detailed discussion of instruction completion, see Chapter 6, Instruction Timing, of the MPC7450 RISC Microprocessor Family User's Manual. 2.2.4 Independent Execution Units. In addition to the BPU, the MPC7450 provides the ten execution units described in the following sections. 2.2.4.1 AltiVec Vector Permute Unit (VPU). The VPU executes permutation instructions such as pack, unpack, merge, splat, and permute on vector operands. 2.2.4.2 AltiVec Vector Integer Unit 1 (VIU1). The VIU1 executes simple vector integer computational instructions, such as addition, subtraction, maximum and minimum comparisons, averaging, rotation, shifting, comparisons, and Boolean operations. 2.2.4.3 AltiVec Vector Integer Unit 2 (VIU2). The VIU2 executes longer-latency vector integer instructions, such as multiplication, multiplication/addition, and sum-across with saturation. 2.2.4.4 AltiVec Vector Floating-Point Unit (VFPU). The VFPU executes all vector floating-point instructions. A maximum of two AltiVec instructions can be issued in order to any combination of AltiVec execution units per clock cycle. In the MPC7448, a maximum of two AltiVec instructions can be issued out of
63. egisters (MMCRn). Enable various performance monitor exception functions. UMMCR0 through UMMCR2 provide user-level read access to these registers. (Name and SPR columns for this entry include MMCR1 8 956 and MMCR2 944.) MSR: Machine state register. Defines the processor state. The MSR can be modified by the mtmsr, sc, and rfi instructions. It can be read by the mfmsr instruction. When an exception is taken, MSR contents are saved to SRR1. See Section 4.2, MPC7450 Exception Recognition and Priorities. The following bits are optional in the PowerPC architecture. Note that setting MSR[EE] masks decrementer and external interrupt exceptions and MPC7450-specific system management and performance monitor exceptions. Bit, Name, Description. 6, VEC: AltiVec available. MPC7450 and AltiVec technology specific; optional to the PowerPC architecture. 0: AltiVec technology is disabled. 1: AltiVec technology is enabled. Note: When a non-stream AltiVec instruction accesses VRs or the VSCR when VEC = 0, an AltiVec unavailable exception is generated. This does not occur for data streaming instructions (dst[t], dstst[t], and dss); the VRs and the VSCR are available to data streaming instructions even if VEC = 0. VRSAVE can be accessed even if VEC = 0. 13, PO
64. eline and the VIU2 and VFPU have four-stage pipelines. As many as 10 AltiVec instructions can be executing concurrently. The complete and write-back stages maintain the correct architectural machine state and commit results to the architected registers in the proper order. If completion logic detects an instruction containing an exception status, all following instructions are cancelled, their execution results in rename buffers are discarded, and the correct instruction stream is fetched. The complete stage ends when the instruction is retired. Three instructions can be retired per clock cycle. If no dependencies exist, as many as three instructions are retired in program order. Section 6.7.4, Completion Unit Resource Requirements, describes completion dependencies. The write-back stage occurs in the clock cycle after the instruction is retired. 3.7 AltiVec Implementation. The MPC7450 implements the AltiVec registers and instruction set as they are described in the AltiVec Technology Programming Environments Manual in Chapter 2, AltiVec Register Set, and in Chapter 6, AltiVec Instructions. Two additional implementation-specific exceptions have been added; they are as follows: The AltiVec assist exception, which is used in handling denormalized numbers in Java mode. An alignment
65. es identically to the MPC7455 except that it does not support the L3 cache and the L3 cache interface. This document also describes the functionality of the MPC7445. All information herein applies to the MPC7445, except where otherwise noted (in particular, the L3 cache information does not apply to the MPC7445). 1.5 MPC7457 Microprocessor Overview. The MPC7457 operates similarly to the MPC7455. However, the following changes are visible to the programmer or system designer: Larger L2 cache (512 Kbytes). Additional support for L3 private memory size (4 Mbytes). An additional L3_ADDR signal (L3_ADDR[18]). Modifications to bits in the L3 control register (L3CR). All information that applies to the MPC7455 also applies to the MPC7457, except where otherwise noted (in particular, the increased L2 cache and the additional L3 cache support are new for the MPC7457). 1.6 MPC7447 Microprocessor Overview. The MPC7447 is a lower-pin-count device that operates identically to the MPC7457 except that it does not support the L3 cache and the L3 cache interface. This document also describes the functionality of the MPC7447. All information herein applies to the MPC7447, except where otherwise noted (in particular, the L3 cache information does not apply to the MPC7447). 1.7 MPC7447A Microprocessor Overview
66. es in MPX mode. Four-beat burst (32-byte) data transfers: Initiated when an entire cache block is transferred into or out of the internal caches. Because the first-level caches on the MPC7450 are write-back caches, burst-read memory operations are the most common memory accesses, followed by burst-write memory operations and single-beat (caching-inhibited or write-through) memory read and write operations. Memory accesses can occur in single-beat (1, 2, 3, 4, and 8 bytes), double-beat (16 bytes), and four-beat (32 bytes) burst data transfers. For memory accesses, the address and data buses are independent to support pipelining and split transactions. The bus interface can pipeline as many as 16 transactions and, in MPX bus mode, supports full out-of-order split-bus transactions. The MPC7450 bursts out of reset in MPX bus mode, fetching eight instructions on the MPX bus at a time. Access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the MPC7450 to be integrated into systems that implement various fairness and bus-parking procedures to avoid arbitration overhead. Typically, memory accesses are weakly ordered to maximize the efficiency of the bus without sacrificing coherency of the data. The MPC7450 allows load operations to bypass store operations (except when a dependency exists). Because the processor can dynamically
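The relationship between transfer size and the number of data beats follows from the 64-bit (8-byte) data bus. The sketch below only restates the sizes listed in the text; the function name transfer_beats and the choice to reject unlisted sizes are assumptions made for illustration.

```python
BUS_WIDTH_BYTES = 8                          # 64-bit data bus
SUPPORTED_SIZES = {1, 2, 3, 4, 8, 16, 32}    # transfer sizes named in the text


def transfer_beats(nbytes):
    """Number of data beats for a transfer of nbytes on the 64-bit bus."""
    if nbytes not in SUPPORTED_SIZES:
        raise ValueError("transfer size not listed in the brief")
    # Round up to whole bus-width beats (ceiling division).
    return (nbytes + BUS_WIDTH_BYTES - 1) // BUS_WIDTH_BYTES


# A four-beat burst moves one 32-byte cache block, which is eight
# 4-byte instructions, matching the eight-instruction fetch out of reset.
INSTRUCTIONS_PER_BURST = 32 // 4
```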
67. esolved immediately. For branches in which only the direction is unresolved due to a CR or CTR dependency, the branch path is predicted using either architecture-defined static branch prediction or MPC7450-specific dynamic branch prediction. Dynamic branch prediction is enabled if HID0[BHT] is set. For bclr branches where the target address is unresolved due to an LR dependency, the branch target can be predicted using the hardware link stack. Link stack prediction is enabled if HID0[LRSTK] is set. When a prediction is made, instruction fetching, dispatching, and execution continue from the predicted path, but instructions cannot complete and write back results to architected registers until the prediction is determined to be correct (resolved). When a prediction is incorrect, the instructions from the incorrect path are flushed from the processor and processing begins from the correct path. Dynamic prediction is implemented using a 2048-entry branch history table (BHT), a cache that provides two bits per entry that together indicate four levels of prediction for a branch instruction: not taken, strongly not taken, taken, strongly taken. When dynamic branch prediction is disabled, the BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore, when an unresolved conditional branch instr
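A two-bit-per-entry history table with four prediction levels is commonly modeled as an array of saturating counters. The sketch below shows that conventional scheme only; the brief does not describe the MPC7450 update rule or how the table is indexed, so the saturating update, the initial counter value, and the index function (word address modulo 2048) are all assumptions, and the class name BranchHistoryTable is hypothetical.

```python
BHT_ENTRIES = 2048
LEVELS = ("strongly not taken", "not taken", "taken", "strongly taken")


class BranchHistoryTable:
    """Conventional 2-bit saturating-counter predictor (illustrative)."""

    def __init__(self):
        # Assumed reset state: every entry starts at "not taken".
        self.counters = [1] * BHT_ENTRIES

    def _index(self, address):
        # Assumed indexing: drop the 2 byte-offset bits, wrap at table size.
        return (address >> 2) % BHT_ENTRIES

    def level(self, address):
        return LEVELS[self.counters[self._index(address)]]

    def predict_taken(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In this model a branch that has reached "strongly taken" still predicts taken after a single not-taken outcome, which is the usual motivation for keeping two bits rather than one.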
68. g to maintain L1 data cache, L2, and L3 cache coherency for multiprocessing applications and DMA environments. Support for address-only transfers (useful for a variety of broadcast operations in multiprocessor applications). Address pipelining. Support for up to 16 out-of-order transactions using 4 data transaction index (DTI[0:3]) signals. Full data streaming. Support for data intervention in multiprocessor systems. 2.8.2 60x Bus Features. The following list summarizes the 60x bus interface features: Extended 36-bit address bus plus 5 bits of odd parity (41 bits total). 64-bit data bus plus 8 bits of odd parity (72 bits total); a 32-bit data bus mode is not supported. Support for a four-state (MESI) cache coherence protocol. On-chip snooping to maintain L1 data cache, L2, and L3 cache coherency for multiprocessing applications and DMA environments. Support for address-only transfers (useful for a variety of broadcast operations in multiprocessor applications). Address pipelining. Support for up to 16 outstanding transactions. No reordering is supported. 2.9 Overview of System Interface Accesses. The system interface includes address register queues, prioritization logic, and a bus control unit. The system interface latches snoop addresses for snooping in the L1 data, L2, and L3 caches
69. gram and tables showing pin numbers are included in the hardware specifications. Note that the left side of each figure depicts the signals that implement the MPX bus protocol and the right side of each figure shows the remaining signals on the MPC7450 (not part of the bus protocol). [Signal group diagram for the MPC7450, MPC7451, MPC7441, MPC7455, and MPC7445. Group labels: Address Arbitration, Address Transfer, Address Transfer Attributes, Address Transfer Termination, Data Arbitration, Data Transfer, Data Transfer Termination. Signal names legible in the figure include L3_ADDR[17:0], L3_DATA[0:63], L3_DP[0:7], L3_VSEL, L3_CLK[0:1], L3_ECHO_CLK[0:3], L3_CNTL[0:1], BR, BG, MCP, TBST, TSIZ[0:2], SRESET, HRESET, GBL, CKSTP_IN, CKSTP_OUT, TBEN, AACK, QREQ, ARTRY, QACK, HIT, BMODE[0:1], PMON_IN, PMON_OUT, DBG, SYSCLK, DRDY, PLL_CFG[0:4], CLK_OUT, DP[0:7], and TA.]
70. he instruction at the appropriate vector offset is fetched and the exception handler routine begins executing in supervisor mode. Handling: Exception handling is performed by the software at the appropriate vector offset. Exception handling is begun in supervisor mode. The term interrupt describes the external interrupt, the system management interrupt, and sometimes the asynchronous exceptions. Note that the PowerPC architecture uses the word exception to refer to IEEE-defined floating-point exception conditions that may cause a program exception to be taken; see Section 4.6.7, Program Exception (0x00700). The occurrence of these IEEE exceptions may or may not cause an exception to be taken. IEEE-defined exceptions are referred to as IEEE floating-point exceptions or floating-point exceptions. 3.4.2 MPC7450 Microprocessor Exceptions. As specified by the PowerPC architecture, exceptions can be either precise or imprecise and either synchronous or asynchronous. Asynchronous exceptions are caused by events external to the processor's execution; synchronous exceptions are caused by instructions. The types of exceptions are shown in Table 2. Note that all exceptions except for the performance monitor, AltiVec unavailable, instruction address breakpoint, system management, AltiVec assist, and the three software table search exceptions are described in Chapter 6, Exceptions, in The Programming Environments Manual. Table 2. MP
71. he same. This document describes the functionality of the MPC7450, and any differences in data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics can be found in the hardware specifications. 1.2 MPC7441 Microprocessor Overview. The MPC7441 is a lower-pin-count device that operates identically to the MPC7451 except that it does not support the L3 cache and the L3 cache interface. This document also describes the functionality of the MPC7441. All information herein applies to the MPC7441, except where otherwise noted (in particular, the L3 cache information does not apply to the MPC7441). 1.3 MPC7455 Microprocessor Overview. The MPC7455 operates similarly to the MPC7451. However, the following changes are visible to the programmer or system designer: Four additional IBAT and four additional DBAT registers. Additional HID0 bits (HID0[HIGH_BAT_EN] and HID0[XBSEN]). Four additional SPRG registers. The additional IBATs and DBATs provide mapping for more regions of memory. For more information on new features, see Chapter 5, Memory Management Unit, of the MPC7450 RISC Microprocessor Family User's Manual. The SPRGs provide additional registers to be used by system software for software table searching. If the SPRGs are not used for software table searches, they can be used by other supervisor programs. 1.4 MPC7445 Microprocessor Overview. The MPC7445 is a lower-pin-count device that operat
72. ilable from the instruction cache. Typically, a fetch that hits the BTIC provides the first 4 instructions in the target stream. 2048-entry branch history table (BHT) with 2 bits per entry for four levels of prediction: not taken, strongly not taken, taken, strongly taken. Up to three outstanding speculative branches. Branch instructions that do not update the count register (CTR) or link register (LR) are often removed from the instruction stream. Eight-entry link register stack to predict the target address of Branch Conditional to Link Register (bclr) instructions. Four integer units (IUs) that share 32 GPRs for integer operands. Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions except multiply, divide, and move to/from special-purpose register instructions. IU2 executes miscellaneous instructions including the CR logical operations, integer multiplication and division instructions, and move to/from special-purpose register instructions. 64-bit floating-point unit (FPU). Five-stage FPU. Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations. Supports non-IEEE mode for time-critical operations. Hardware support for denormalized numbers. Thirty-two 64-bit FPRs for single- or double-precision operands. Four vector units and
73. ime base, decrementer, and JTAG logic remain running. The part goes into the doze state to snoop memory operations on the bus and then back to nap using a QREQ/QACK processor-system handshake protocol. Sleep: Power consumption is further reduced by disabling bus snooping, leaving only the PLL in a locked and running state. All internal functional units are disabled. Deep sleep: When the part is in the sleep state, the system can disable the PLL. The system can then disable the SYSCLK source for greater system power savings. Power-on reset procedures for restarting and relocking the PLL must be followed upon exiting the deep sleep state. In the MPC7447A and MPC7448, DFS (dynamic frequency switching) conserves power by lowering processor operating frequency. The MPC7447A has the ability to divide the processor-to-system bus ratio by two during normal functional operation. The MPC7448 has the additional ability to divide by four. Instruction cache throttling provides control of instruction fetching to limit device temperature. Performance monitor can be used to help debug system designs and improve software efficiency. In-system testability and debugging features through JTAG boundary-scan capability. Reliability and serviceability: Parity checking on system bus and L3 cache bus. Parity checking on L1, L2, and L3 cache arrays. MPC7450
74. ime the memory system takes to respond to the request. Instructions retrieved are latched into the instruction queue (IQ) for subsequent consideration by the dispatcher. Instruction fetch timing depends on many variables, such as whether an instruction is in the branch target instruction cache (BTIC), the on-chip instruction cache, or the L2 or L3 cache. Those factors increase when it is necessary to fetch instructions from system memory and include the processor-to-bus clock ratio, the amount of bus traffic, and whether any cache coherency operations are required. The decode/dispatch stage fully decodes each instruction; most instructions are dispatched to the issue queues (branch, isync, rfi, and sc instructions do not go to issue queues). The three issue queues, FIQ, VIQ, and GIQ, can accept as many as one, two, and three instructions, respectively, in a cycle. Instruction dispatch requires the following: Instructions are dispatched only from the three lowest IQ entries (IQ0, IQ1, and IQ2). A maximum of three instructions can be dispatched to the issue queues per clock cycle. Space must be available in the CQ for an instruction to dispatch (this includes instructions that are assigned a space in the CQ but not an issue queue). The issue stage reads source operands from rename regi
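The dispatch constraints listed above can be combined into a small per-cycle model. This is a simplified sketch, not the processor's dispatch logic: it assumes dispatch proceeds in program order and stops at the first instruction that cannot dispatch, it ignores how full each issue queue already is, and the names dispatch_count, QUEUE_LIMITS, and cq_free are hypothetical.

```python
# Per-cycle acceptance limits of the three issue queues, as given in the text.
QUEUE_LIMITS = {"FIQ": 1, "VIQ": 2, "GIQ": 3}
MAX_DISPATCH = 3   # only IQ0, IQ1, and IQ2 are candidates each cycle


def dispatch_count(iq_entries, cq_free):
    """How many instructions dispatch this cycle under the stated limits.

    iq_entries: target issue queue for IQ0, IQ1, IQ2 in order; use None for
                instructions that take a CQ entry but no issue queue entry
                (branch, isync, rfi, sc).
    cq_free:    number of free completion queue (CQ) entries.
    """
    accepted = {queue: 0 for queue in QUEUE_LIMITS}
    dispatched = 0
    for queue in iq_entries[:MAX_DISPATCH]:
        if dispatched >= cq_free:
            break                      # no CQ space left
        if queue is not None:
            if accepted[queue] >= QUEUE_LIMITS[queue]:
                break                  # that issue queue is saturated this cycle
            accepted[queue] += 1
        dispatched += 1
    return dispatched
```

For example, two floating-point instructions at the head of the IQ dispatch over two cycles in this model, because the FIQ accepts only one instruction per cycle.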
75. ion and report their results. To prevent loss of state information, exception handlers must save the information stored in the machine status save/restore registers, SRR0 and SRR1, soon after the exception is taken, before another exception event can overwrite it. Because exceptions can occur while an exception handler routine is executing, multiple exceptions can become nested. It is the exception handler's responsibility to save the necessary state information if control is to return to the excepting program. In many cases, after the exception handler handles an exception, there is an attempt to execute the instruction that caused the exception. Instruction execution continues until the next exception condition is encountered. Recognizing and handling exception conditions sequentially guarantees that the machine state is recoverable and processing can resume without losing instruction results. The following terms are used to describe the stages of exception processing: recognition, taken, and handling. Recognition: Exception recognition occurs when the condition that can cause an exception is identified by the processor. Taken: An exception is said to be taken when control of instruction execution is passed to the exception handler; that is, the context is saved and t
76. ions and from external signals, bus errors, or various internal conditions. When exceptions occur, information about the state of the processor is saved to certain registers and the processor begins execution at an address (exception vector) predetermined for each exception. Processing of exceptions begins in supervisor mode. Although multiple exception conditions can map to a single exception vector, often a more specific condition may be determined by examining a register associated with the exception, for example, the DSISR and the floating-point status and control register (FPSCR). Also, software can explicitly enable or disable some exception conditions. The PowerPC architecture requires that exceptions be taken in program order; therefore, although a particular implementation may recognize exception conditions out of order, they are handled strictly in order with respect to the instruction stream. When an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that have not yet entered the execute state, are required to complete before the exception is taken. In addition, if a single instruction encounters multiple exception conditions, those exceptions are taken and handled sequentially. Likewise, exceptions that are asynchronous and precise are recognized when they occur but are not handled until all instructions currently in the execute stage successfully complete execut
77. itecture
Floating-point unavailable | 0x00800 | As specified in the PowerPC architecture.
Decrementer | 0x00900 | As defined by the PowerPC architecture, when the msb of the DEC register changes from 0 to 1 and MSR[EE] = 1.
Reserved | 0x00A00-00BFF |
System call | 0x00C00 | Execution of the System Call (sc) instruction.
Trace | 0x00D00 | MSR[SE] = 1, or a branch instruction is completing and MSR[BE] = 1. The MPC7451 operates as specified in the OEA by taking this exception on an isync.
Table 3. Exceptions and Conditions (continued)
Exception Type | Vector Offset | Causing Conditions
Reserved | 0x00E00 | The e600 core does not generate an exception to this vector. Other processors that implement the PowerPC architecture may use this vector for floating-point assist exceptions.
Reserved | 0x00E10-00EFF |
Performance monitor | 0x00F00 | The limit specified in PMCn is met and MMCR0[ENINT] = 1 (e600-specific).
AltiVec unavailable | 0x00F20 | Occurs due to an attempt to execute any non-streaming AltiVec instruction when MSR[VEC] = 0. This exception is not taken for data streaming instructions (dstx, dss, or dssall) (e600-specific).
ITLB miss | 0x01000 | An instruction translation miss exception is caused when HID0[STEN] = 1 and the effective address for an instruction fetch cannot be translated by
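The decrementer row above combines two conditions: the most-significant bit of DEC changing from 0 to 1, and MSR[EE] = 1. The following is a minimal sketch of that check. The function name is hypothetical, and the MSR[EE] mask value (0x00008000) comes from the 32-bit PowerPC MSR layout rather than from this brief. The sketch models only the instantaneous condition and does not model an exception held pending while MSR[EE] = 0.

```python
MSR_EE = 0x00008000   # external interrupt enable bit (assumed from the PowerPC MSR layout)
DEC_MSB = 0x80000000  # most-significant bit of the 32-bit DEC register
DECREMENTER_VECTOR_OFFSET = 0x00900  # from the exception table


def decrementer_condition_met(old_dec, new_dec, msr):
    """True when the DEC msb changed from 0 to 1 and MSR[EE] = 1."""
    msb_went_high = (old_dec & DEC_MSB) == 0 and (new_dec & DEC_MSB) != 0
    return msb_went_high and (msr & MSR_EE) != 0
```

A count-down from 0x00000000 to 0xFFFFFFFF sets the msb, so that transition is the usual trigger in this model.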
78. Address transfer termination: These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated. Data arbitration: The MPC7450 uses these signals to arbitrate for data bus mastership. Data transfer: These signals, which consist of the data bus and data parity signals, are used to transfer the data and to ensure the integrity of the transfer. Data transfer termination: Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, data termination signals also indicate the end of the tenure. In burst accesses, data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. Data termination signals also indicate whether a condition exists that requires the data phase to be repeated. Many other MPC7450 signals control and affect other aspects of the device aside from the bus protocol. They are as follows: L3 cache address/data: The MPC7450 has separate address and data buses for accessing the L3 cache. Note that the L3 cache interface is not supported by the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448. L3 cache clock/control: These signals provide clocking and control for the L3 cache. Note that the L3 cache interface is not supported by the MPC7441, MPC7445, MPC7447, MPC7447A
79. lied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document. Freescale Semiconductor reserves the right to make changes without further notice to any products herein. Freescale Semiconductor makes no warranty, representation, or guarantee regarding the suitability of its products for any particular purpose, nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters which may be provided in Freescale Semiconductor data sheets and/or specifications can and do vary in different applications, and actual performance may vary over time. All operating parameters, including "Typicals", must be validated for each customer application by customer's technical experts. Freescale Semiconductor does not convey any license under its patent rights nor the rights of others. Freescale Semiconductor products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Freescale Semiconductor product could create a situation where personal injury or death may occur. Should Buyer purchase or use Freescale Semicond
80. memory operations on the bus and then back to nap using a QREQ/QACK processor-system handshake protocol. Sleep: Power consumption is further reduced by disabling bus snooping, leaving only the PLL in a locked and running state. All internal functional units are disabled. Deep sleep: The system can disable the PLL. The system can then disable the SYSCLK source for greater system power savings. Power-on reset procedures for restarting and relocking the PLL must be followed upon exiting deep sleep. The dynamic frequency switching (DFS) feature in the MPC7447A conserves power by lowering processor operating frequency. The MPC7447A adds the ability to divide the processor-to-system-bus ratio by two during normal functional operation. With the introduction of DFS4 mode in the MPC7448, the processor-to-system-bus ratio can also be divided by four. Chapter 10, "Power and Thermal Management," in the MPC7450 RISC Microprocessor Family User's Manual provides information on power saving with DFS in the MPC7447A and the MPC7448. The MPC7450 also provides an instruction cache throttling mechanism to effectively reduce the instruction execution rate without the complexity and overhead of dynamic clock control. When used with the dynamic power management, instruction cache throttling provides the system designer with a flexible way to control device temperature while allowing the processor to continue operating. For thermal management, the MPC7450
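The effect of DFS on core frequency is simple arithmetic: the processor-to-system-bus ratio is divided by two (MPC7447A and MPC7448) or by four (MPC7448 only). The sketch below shows the calculation under assumed, purely illustrative numbers. The function name and parameters are hypothetical, and the 100 MHz bus and 10:1 ratio used in the usage note are not taken from any data sheet.

```python
def core_frequency_mhz(bus_mhz, bus_ratio, dfs_divider=1):
    """Core frequency after DFS divides the processor-to-bus ratio.

    dfs_divider -- 1 (DFS off), 2 (MPC7447A and MPC7448),
                   or 4 (DFS4 mode, MPC7448 only).
    """
    if dfs_divider not in (1, 2, 4):
        raise ValueError("DFS divides the ratio by two or four only")
    return bus_mhz * bus_ratio / dfs_divider
```

With an assumed 100 MHz bus and a 10:1 ratio, `core_frequency_mhz(100, 10, 2)` gives half the full-speed core frequency and `core_frequency_mhz(100, 10, 4)` gives a quarter of it.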
81. mory accesses can occur out of order. Synchronizing instructions can be used to enforce strict ordering. When there are no data dependencies and the guarded bit for the page or block is cleared, a maximum of one out-of-order cacheable load operation can execute per clock cycle from the perspective of the LSU. Loads to FPRs require a 4-cycle total latency. Data returned from the cache is held in a rename register until the completion logic commits the value to a GPR, FPR, or VR. Stores cannot be executed out of order and are held in the store queue until the completion logic signals that the store operation is to be completed to memory. The MPC7450 executes store instructions with a maximum throughput of one per clock cycle and a 3-cycle total latency to the data cache. The time required to perform the load or store operation depends on the processor-to-bus clock ratio and whether the operation involves the on-chip caches, the L3 cache, system memory, or an I/O device. 2.3 Memory Management Units (MMUs) The MPC7450's MMUs support up to 4 Petabytes (2^52) of virtual memory and 64 Gigabytes (2^36) of physical memory for instructions and data. The MMUs control access privileges for these spaces on block and page granularities. Referenced and changed status is maintained by the processor for each page to support demand-paged virtual memory systems. The memory management units are contained within the load/store unit. The LSU calculates effective addresses for da
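The two address-space sizes quoted above follow directly from the address widths (52-bit virtual, 36-bit physical). A short check, with a hypothetical helper name and binary prefixes assumed for "Petabytes" and "Gigabytes":

```python
def address_space_bytes(address_bits):
    """Number of bytes addressable with the given address width."""
    return 2 ** address_bits


PETABYTE = 2 ** 50  # binary prefix, assumed
GIGABYTE = 2 ** 30  # binary prefix, assumed

virtual_pb = address_space_bytes(52) // PETABYTE   # 52-bit virtual address
physical_gb = address_space_bytes(36) // GIGABYTE  # 36-bit physical address
```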
82. n the architecture specification for memory accesses and I/O accesses (I/O accesses are assumed to be memory-mapped). In addition, the MMU provides access protection on a segment, block, or page basis. Note that the MPC7450 does not implement the optional direct-store facility. Two general types of memory accesses generated by processors that implement the PowerPC architecture require address translation: instruction accesses and data accesses generated by load and store instructions. In addition, the addresses specified by cache instructions and the optional external control instructions also require translation. Generally, the address translation mechanism is defined in terms of the segment descriptors and page tables that the processors use to locate the effective-to-physical address mapping for memory accesses. The segment information translates the effective address to an interim virtual address, and the page table information translates the virtual address to a physical address. The segment descriptors used to generate the interim virtual addresses are stored as on-chip segment registers on 32-bit implementations (such as the MPC7450). In addition, two translation lookaside buffers (TLBs) are implemented on the MPC7450 to keep recently used page address translations on-chip. Although the PowerPC OEA de
83. nd be referenced several times or reused over some reasonably long period of time. A transient memory reference has poor locality and is likely to be referenced a very few times or over a very short period of time. The following instructions are interpreted to be transient: dstt and dststt (transient forms of the two data stream touch instructions); lvxl and stvxl. Vector permutation and formatting instructions: These include pack, unpack, merge, splat, permute, select, and shift instructions, described in Section 2.5.5, "Vector Permutation and Formatting Instructions." Processor control instructions: These instructions are used to read and write from the AltiVec status and control register, described in Section 2.3.4.6, "Processor Control Instructions (UISA)." Memory control instructions: These instructions are used for managing of caches (user level and supervisor level), described in Section 2.3.5.3, "Memory Control Instructions (VEA)." 3.2.3 MPC7450 Microprocessor Instruction Set The MPC7450 instruction set is defined as follows: The MPC7450 provides hardware support for all 32-bit PowerPC instructions. The MPC7450 implements the following instructions optional to the PowerPC architecture: External Control In Word Indexed (eciwx); External Control Out Wo
84. [Figure 7. MPX Bus Signal Groups in the MPC7450, MPC7451, MPC7441, MPC7455, and MPC7445. Signal groups shown: L3 cache address/data, L3 cache clock/control, interrupts/resets, processor status/control, clock control, and test interface (JTAG). Note: The L3 cache interface is not supported in the MPC7441 or the MPC7445.] Figure 8 illustrates the signal configuration in MPX bus mode for the MPC7447 and the MPC7457. [Figure 8 signal diagram. Signal groups shown: address arbitration, address transfer, address transfer attributes, address transfer termination, data arbitration, data transfer, and data transfer termination.]
85. nit 2 (VIU2): performs longer-latency integer calculations. Vector floating-point unit (VFPU). The ability to execute several instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency and throughput for MPC7450-based systems. Most integer instructions (including VIU1 instructions) have a one-clock-cycle execution latency. Several execution units feature multiple-stage pipelines; that is, the tasks they perform are broken into subtasks executed in successive stages. Typically, instructions follow one another through the stages, so a four-stage unit can work on four instructions when its pipeline is full. So, although an instruction may have to pass through several stages, the execution unit can achieve a throughput of one instruction per clock cycle. AltiVec computational instructions are executed in four independent, pipelined AltiVec execution units. A maximum of two AltiVec instructions can be issued in order to any combination of AltiVec execution units per clock cycle. Moreover, the VIU2, VFPU, and VPU are pipelined, so they can operate on multiple instructions. The VPU has a two-stage pipeline; the VIU2 and VFPU each have four-stage pipelines. As many as ten AltiVec instructions can be executing concurrently. In the MPC7448, a maximum of two AltiVec instructions can be issued out of order to any combination of AltiVec execution units per clock cycle from the bottom two VIQ entries (VIQ1, VIQ0).
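The relationship between pipeline depth and throughput described above can be made concrete. This is a minimal sketch assuming an ideal pipeline with no stalls and one instruction entering per clock cycle; the function name is hypothetical.

```python
def cycles_to_finish(instruction_count, stages):
    """Clock cycles for a stall-free pipeline to finish a run of instructions.

    The first instruction needs `stages` cycles to pass through the unit;
    each later instruction finishes one cycle after its predecessor.
    """
    if instruction_count == 0:
        return 0
    return stages + instruction_count - 1
```

For a four-stage unit such as the VIU2 or VFPU, one instruction takes 4 cycles, but 100 back-to-back independent instructions take 103 cycles in this model, which approaches one instruction per clock cycle.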
86. of the architecture described above. For more information about the PowerPC architecture, see Programming Environments Manual for 32-Bit Implementations of the PowerPC Architecture. Specific MPC7450 features are listed in Chapter 1, "Overview," of the MPC7450 RISC Microprocessor Family User's Manual. This section describes the PowerPC architecture in general and specific details about the implementation of the MPC7450 as a low-power, 32-bit device that implements this architecture. The structure of this section follows the user's manual organization. Each subsection provides an overview of that chapter. Registers and programming model: Describes the registers for the operating environment architecture common among processors of this family and describes the programming model. It also describes the registers that are unique to the MPC7450. Instruction set and addressing modes: Describes the PowerPC instruction set and addressing modes for the PowerPC operating environment architecture and defines and describes the PowerPC instructions implemented in the MPC7450. The information in this section is described more fully in Chapter 2, "Programming Model," of the MPC7450 RISC Microprocessor Family User's Manual. Cache implementation: Describes the cache model that is defined generally by the
87. optimize run-time ordering of load/store traffic, overall performance is improved. Note that the synchronize (sync) and enforce in-order execution of I/O (eieio) instructions can be used to enforce strong ordering. The system interface is synchronous. All MPC7450 inputs are sampled and all outputs are driven on the rising edge of the bus clock cycle. The hardware specifications give timing information. The system interface is specific for each microprocessor that implements the PowerPC architecture. 2.9.2 Signal Groupings Signals are provided for implementing the bus protocol, clocking, and control of the L3 caches, as well as separate L3 address and data buses. Test and control signals provide diagnostics for selected internal circuits. The MPC7450 MPX and 60x bus interface protocol signals are grouped as follows: Address arbitration: The MPC7450 uses these signals to arbitrate for address bus mastership. Address transfer start: These signals indicate that a bus master has begun a transaction on the address bus. Address transfer: These signals include the address bus and address parity signals. They are used to transfer the address and to ensure the integrity of the transfer. Transfer attribute: These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited.
88. order to any combination of AltiVec execution units per clock cycle from the bottom two VIQ entries (VIQ1, VIQ0). An instruction in VIQ1 does not have to wait for an instruction in VIQ0 that is waiting for operand availability. Moreover, the VIU2, VFPU, and VPU are pipelined, so they can operate on multiple instructions. 2.2.4.5 Integer Units (IUs) The integer units (three IU1s and IU2) are shown in Figure 1. The IU1s execute shorter-latency integer instructions, that is, all integer instructions except multiply, divide, and move to/from special-purpose register instructions. IU2 executes integer instructions with latencies of 3 cycles or more. IU2 has a 32-bit integer multiplier/divider and a unit for executing CR logical operations and move to/from SPR instructions. The multiplier supports early exit for operations that do not require full 32 × 32-bit multiplication. 2.2.4.6 Floating-Point Unit (FPU) The FPU, shown in Figure 1, is designed such that double-precision operations require only a single pass, with a latency of 5 cycles. As instructions are dispatched to the FPU's reservation station, source operand data can be accessed from the FPRs or from the FPR rename buffers. Results in turn are written to the rename buffers and are made available to subsequent instructions. Instructions start execution from the bottom reservation station only and execute in program order. The FPU contains a single-precision multiply-add array and the floating
89. ormance Counters PMC1 PMC2 PMC3 PMC4 PMC5 PMC6 Exception Control Register Memory Subsystem Status Control Configuration Registers Processor Version Register PVR SPR 1008 SPR 1009 SPR 287 Data BAT Registers DBAT0U DBAT0L DBAT1U DBAT1L DBAT2U DBAT2L DBAT3U DBAT3L DBAT4U DBAT4L DBAT5U DBAT5L DBAT6U DBAT6L DBAT7U SPR 574 DBAT7L SPR 575 Handling Registers Data Address Register DAR SPR 528 SPR 529 SPR 530 SPR 531 SPR 532 SPR 533 SPR 534 SPR 535 SPR 560 SPR 561 SPR 562 SPR 563 SPR 564 SPR 565 SPR 566 SPR 567 SPR 536 SPR 537 SPR 538 SPR 539 SPR 540 SPR 541 SPR 542 SPR 543 SPR 568 SPR 569 SPR 570 SPR 571 SPR 572 SPR 573 SPR 272 SPR 273 SPR 274 SPR 275 SPR 276 SPR 277 SPR 278 SPR 279 SPR 19 DSISR DSISR Instruction Cache Interrupt Control SPR 1016 Register ICTRL SPR 1011 L2 Cache Control Register L2CR SPR 1017 L3 Cache Output Hold Control Register L3OHCR SPR 1000 Performance Monitor Registers Monitor Control Registers MMCR0 2 MMCR1 2 MMCR2 1 SPR 1014 SPR 1015 SPR 953 SPR 954 SPR 957 SPR 958 SPR 945 SPR 946 SPR 952 SPR 956 SPR 944 Machine State Register MSR Processor ID PIR Register SPR 1023 Segment Registers SR0 SR1 SR15 PTE High Lo
90. ory feature. The L3 cache interface provides two clock outputs that allow the clock inputs of the SRAMs to be driven at select frequency divisions of the processor core frequency. For the MPC7457, the L3 cache interface provides two sets of two differential clock outputs. Requests from the L3 cache generally result from instruction misses, data load or store misses, write-through operations, or cache management instructions. Requests from the L1 and L2 cache are compared against the L3 tags and serviced by the L3 cache if they hit; if they miss in the L3 cache, they are forwarded to the bus interface. Note that the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448 do not support the L3 cache and the L3 interface. 2.7 System Interface The MPC7450 supports two interface protocols: MPX bus protocol and a subset of the 60x bus protocol. Note that although this protocol is implemented by the MPC603e, MPC604e, MPC740, and MPC750 processors, it is referred to as the 60x bus interface. The MPX bus protocol is derived from the 60x bus protocol. The MPX bus interface includes several additional features that provide higher memory bandwidth than the 60x bus and more efficient use of the system bus in a multiprocessing environment. Because the MPC7450's performance is optimized for the MPX bus, its use is recommended over the 60x bus.
91. parity checking on the L2. Provides for instruction-only and data-only modes. Provides hardware flushing for the L2. Selects between two available replacement algorithms for the L2 cache. The L2 implements the MESI cache coherency protocol using three status bits per sector. Requests from the L1 cache generally result from instruction misses, data load or store misses, write-through operations, or cache management instructions. Requests from the L1 cache are compared against the L2 tags and serviced by the L2 cache if they hit; if they miss in the L2 cache, they are forwarded to the L3 cache. The L2 cache tags are fully pipelined and non-blocking for efficient operation. Thus the L2 cache can be accessed internally while a load for a miss is pending (allowing hits under misses). A reload for a cache miss is treated as a normal access and blocks other accesses for only 1 cycle. For more information, see Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450 RISC Microprocessor Family User's Manual. 2.6 L3 Cache Implementation The unified L3 cache receives memory requests from L1 and L2 instruction and data caches independently. The L3 cache interface is implemented with an on-chip, two-way set-associative tag memory with 2,048 (2K) tags per way and a dedicated interface with support for up to 2 Mbytes of external synchronous SRAMs. Note that the L3 cache is not supported on the MPC7441, MPC7445, MPC7447, MPC7447A, and the MPC7448.
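The miss-forwarding order that the brief describes (an L1 miss is checked against the L2 tags, then the L3 tags, then forwarded to the bus interface) can be sketched as a simple lookup cascade. This is an illustrative model only: the function name and the use of Python sets to stand in for tag arrays are assumptions, and real tag comparison works on sets, ways, and sectors rather than whole addresses.

```python
def service_level(address, l2_tags, l3_tags=None):
    """Return which level services a request that missed in the L1 cache.

    l2_tags -- set of addresses currently held in the L2 (stand-in for tags).
    l3_tags -- set of addresses held in the L3, or None on parts without
               an L3 interface (MPC7441, MPC7445, MPC7447, MPC7447A,
               MPC7448).
    """
    if address in l2_tags:
        return "L2"
    if l3_tags is not None and address in l3_tags:
        return "L3"
    return "bus"  # missed everywhere: forwarded to the bus interface
```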
92. point status and control register (FPSCR). The multiply-add array allows the MPC7450 to implement multiply and multiply-add operations efficiently. The FPU is pipelined so that one single- or double-precision instruction can be issued per clock cycle. Note that an execution bubble occurs after four consecutive, independent floating-point arithmetic instructions execute to allow for a normalization special case. Thirty-two 64-bit floating-point registers are provided to support floating-point operations. Stalls due to contention for FPRs are minimized by automatic allocation of the 16 floating-point rename registers. The MPC7450 writes the contents of the rename registers to the appropriate FPR when floating-point instructions are retired by the completion unit. The MPC7450 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. 2.2.4.7 Load/Store Unit (LSU) The LSU executes all load and store instructions, as well as the AltiVec LRU and transient instructions, and provides the data transfer interface between the GPRs, FPRs, VRs, and the cache/memory subsystem. The LSU also calculates effective addresses and aligns data. Load and store instructions are issued and translated in program order; however, some me
93. provides a supervisor-level instruction cache throttling control register (ICTC). Chapter 10, "Power and Thermal Management," of the MPC7450 RISC Microprocessor Family User's Manual provides information about how to configure the ICTC register for the MPC7450. 2.11 Performance Monitor The MPC7450 incorporates a performance monitor facility that system designers can use to help bring up, debug, and optimize software performance. The performance monitor counts events during execution of instructions related to dispatch, execution, completion, and memory accesses. The performance monitor incorporates several registers that can be read and written to by supervisor-level software. User-level versions of these registers provide read-only access for user-level applications. These registers are described in Chapter 1, "Overview," of the MPC7450 RISC Microprocessor Family User's Manual. Performance monitor control registers MMCR0, MMCR1, and MMCR2 can be used to specify which events are to be counted and the conditions for which a performance monitoring exception is taken. Additionally, the sampled instruction address register, SIAR (USIAR), holds the address of the first instruction to complete after the counter overflowed. Attempting to write to a user-level read-only performance monitor register cause
94. r MSR Processor Version Register HID0 SPR 1008 Processor ID Register PVR SPR 287 HID1 SPR 1009 Memory Management Registers Instruction BAT Registers PIR SPR 1023 Data BAT Registers Segment Registers SR0 XER SPR 1 GPR1 IBAT0U SPR 528 DBAT0U SPR 536 SR1 Link Register LR SPR 8 GPR31 IBAT0L SPR 529 DBAT0L SPR 537 F IBAT1U SPR 530 DBAT1U SPR 538 IBAT1L SPR 531 Performance Monitor Registers Floating Point Registers FPR0 FPR1 Performance Counters UPMC1 SPR 937 UPMC2 SPR 938 UPMC3 SPR 941 UPMC4 SPR 942 UPMC5 SPR 929 UPMC6 SPR 930 Sampled Instruction Address USIAR SPR 939 Monitor Control UMMCR0 SPR 936 UMMCR1 SPR 940 UMMCR2 SPR 928 FPR31 Condition Register CR Floating Point Status and Control Register FPSCR AltiVec Registers Vector Save Restore Vector Registers Register VR0 VRSAVE SPR 256 Vector Status and Control Register VSCR Miscellaneous Registers Time Base Data Address For Writing Breakpoint Register TBL SPR 284 DABR SPR 1013 TBU SPR 285 External Access Instruction Address Register Breakpoint Register EAR IABR SPR 1010 Decrementer DEC SPR 282
95. rand availability. Moreover, the VIU2, VFPU, and VPU are pipelined, so they can operate on multiple instructions. The MPC7450 can complete as many as three instructions on each clock cycle. In general, the MPC7450 processes instructions in seven stages (fetch1, fetch2, decode/dispatch, issue, execute, complete, and write-back), as shown in Figure 15. Note that the pipeline example in Chapter 6, "Instruction Timing," of the MPC7450 RISC Microprocessor Family User's Manual is similar to the four-stage VFPU pipeline in Figure 15. [Figure 15. Superscalar Pipeline Diagram. Stages shown: Fetch1 and Fetch2 (maximum four-instruction fetch per clock cycle); Decode/Dispatch (maximum three-instruction dispatch per clock cycle); issue queues (VIQ and FIQ labels legible); Execute Stage, with unit pipelines VIU2 E0 to E3, FPU E0 to E4, IU2 E0 to E2, LSU E0 to E2, VPU E0 to E1, VIU1, and IU1; Complete (maximum three-instruction completion per clock cycle).] The instruction pipeline stages are described as follows: Instruction fetch: Includes the clock cycles necessary to request an instruction and the t
96. rd Indexed (ecowx); Data Cache Block Allocate (dcba); Floating Select (fsel); Floating Reciprocal Estimate Single-Precision (fres); Floating Reciprocal Square Root Estimate (frsqrte); Store Floating-Point as Integer Word (stfiwx); Load Data TLB Entry (tlbld); Load Instruction TLB Entry (tlbli). 3.3 On-Chip Cache Implementation The following subsections describe the PowerPC architecture's treatment of cache in general, and the MPC7450-specific implementation, respectively. A detailed description of the MPC7450 cache implementation is provided in Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450 RISC Microprocessor Family User's Manual. 3.3.1 PowerPC Cache Model The PowerPC architecture does not define hardware aspects of cache implementations. For example, processors that implement the PowerPC architecture can have unified caches, separate L1 instruction and data caches (Harvard architecture), or no cache at all. These microprocessors control the following memory access modes on a page or block basis: Write-back/write-through mode; Caching-inhibited/caching-allowed mode; Memory coherency required/memory coherency not required mode. The caches are physically addressed, and the data cache can operate in either write-back or write-through mode, as specified by the PowerPC architecture.
97. registers to be used by system software for software table searching. If the SPRGs are not used for software table searches, they can be used by other supervisor programs. 2 MPC7450 Microprocessor Features This section describes the features of the MPC7450. The interrelationships of these features are shown in Figure 1. 2.1 Overview of the MPC7450 Microprocessor Features Major features of the MPC7450 are as follows: High-performance, superscalar microprocessor. As many as 4 instructions can be fetched from the instruction cache at a time. As many as 3 instructions can be dispatched to the issue queues at a time. As many as 12 instructions can be in the instruction queue (IQ). As many as 16 instructions can be at some stage of execution simultaneously. Single-cycle execution for most instructions. One instruction throughput per clock cycle for most instructions. Seven-stage pipeline control. Eleven independent execution units and three register files. Branch processing unit (BPU) features static and dynamic branch prediction. 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC), a cache of branch instructions that have been encountered in branch/loop code sequences. If a target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made ava
98. ructions that control the state of the processor, the address translation mechanism, and supervisor registers can be executed only when the processor is operating in supervisor mode. Figure 11 through Figure 13 show all the MPC7450 registers available at the user and supervisor level. The numbers to the right of the SPRs indicate the number that is used in the syntax of the instruction operands to access the register. For more information, see Chapter 2, "Programming Model," of the MPC7450 RISC Microprocessor Family User's Manual. The OEA defines numerous SPRs that serve a variety of functions, such as providing controls, indicating status, configuring the processor, and performing special operations. During normal execution, a program can access the registers shown in Figure 11 through Figure 13, depending on the program's access privilege (supervisor or user), determined by the privilege-level bit, MSR[PR]. GPRs, FPRs, and VRs are accessed through operands that are part of the instructions. Access to registers can be explicit (that is, through the use of specific instructions for that purpose, such as Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit, as the part of the execution of an instruction.
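To show how an SPR number ends up in an instruction, the sketch below assembles the 32-bit encoding of mfspr. It relies on the PowerPC instruction format rather than on anything stated in this brief: primary opcode 31, extended opcode 339, and a 10-bit SPR field whose two 5-bit halves are stored swapped. The function name is hypothetical.

```python
def encode_mfspr(rd, spr):
    """Encode `mfspr rD,SPR` as a 32-bit PowerPC instruction word.

    Assumed from the PowerPC instruction format (not from this brief):
    primary opcode 31, extended opcode 339, and the SPR number split into
    two 5-bit halves that are swapped in the instruction.
    """
    if not 0 <= rd < 32 or not 0 <= spr < 1024:
        raise ValueError("rD must be 0..31 and SPR must be 0..1023")
    spr_field = ((spr & 0x1F) << 5) | (spr >> 5)  # swap the two halves
    return (31 << 26) | (rd << 21) | (spr_field << 11) | (339 << 1)
```

For instance, `hex(encode_mfspr(3, 272))` reads SPRG0 (SPR 272) into r3, and `encode_mfspr(0, 8)` is the encoding that assemblers also accept as mflr r0.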
99. s a program exception regardless of the MSR[PR] setting. When a performance monitor exception occurs, program execution continues from vector offset 0x00F00. Chapter 11, "Performance Monitor," of the MPC7450 RISC Microprocessor Family User's Manual describes the operation of the performance monitor diagnostic tool incorporated in the MPC7450.

3 MPC7450 Microprocessor Architectural Implementation

The PowerPC architecture consists of three layers. Adherence to the PowerPC architecture can be described in terms of which of the following levels of the architecture is implemented:

PowerPC user instruction set architecture (UISA): Defines the base user-level instruction set, user-level registers, data types, floating-point exception model, memory models for a uniprocessor environment, and programming model for a uniprocessor environment.

PowerPC virtual environment architecture (VEA): Describes the memory model for a multiprocessor environment, defines cache control instructions, and describes other aspects of virtual environments. Implementations that conform to the VEA also adhere to the UISA, but may not necessarily adhere to the OEA.

PowerPC operating environment architecture (OEA): Defines the memory management model, supervisor-level registers, synchronization requirements, and the exception model. Implementations that conform to the OEA also adhere to the UISA and the VEA.

The MPC7450 implementation supports the three levels
100. s the memory management specification of the PowerPC OEA for 32-bit implementations but adds capability for supporting 36-bit physical addressing. Thus, it provides 4 Gbytes of effective address space accessible to supervisor and user programs, with a 4-Kbyte page size and 256-Mbyte segment size. In addition, the MPC7450 MMUs use an interim virtual address (52 bits) and hashed page tables in the generation of 32- or 36-bit physical addresses (depending on the setting of HID0[XAEN]). Processors that implement the PowerPC architecture also have a BAT mechanism for mapping large blocks of memory. Blocks range from 128 Kbytes to 256 Mbytes and are software programmable. The MPC7450 provides table search operations performed in hardware. The 52-bit virtual address is formed, and the MMU attempts to fetch the PTE that contains the physical address from the appropriate TLB on-chip. If the translation is not found in either the BAT array or in a TLB (that is, a TLB miss occurs), the hardware performs a table search operation (using a hashing function) to search for the PTE. Hardware table searching is the default mode for the MPC7450; however, if HID0[STEN] = 1, the table search is performed in software instead. In this case, the TLBMISS register saves the effective address of the access that requires a software table search. The PTEHI and PTELO registers and the tlbli and tl
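The formation of the 52-bit interim virtual address mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard 32-bit PowerPC segmented layout (the upper 4 bits of the effective address select one of 16 segment registers, each holding a 24-bit VSID, followed by a 16-bit page index and a 12-bit byte offset). The function names and the segment-register contents are hypothetical and are not taken from the product brief.

```python
# Sketch only: field widths follow the 32-bit PowerPC segmented model.
# The VSID values used below are made-up illustration values.

PAGE_OFFSET_BITS = 12   # 4-Kbyte page
PAGE_INDEX_BITS = 16    # 256-Mbyte segment / 4-Kbyte page = 2**16 pages
VSID_BITS = 24          # virtual segment ID held in a segment register


def split_effective_address(ea):
    """Split a 32-bit effective address into (segment, page index, offset)."""
    ea &= 0xFFFFFFFF
    segment = ea >> 28                               # EA[0-3] selects SR0-SR15
    page_index = (ea >> PAGE_OFFSET_BITS) & 0xFFFF   # EA[4-19]
    offset = ea & 0xFFF                              # EA[20-31]
    return segment, page_index, offset


def virtual_address(ea, segment_vsids):
    """Concatenate VSID, page index, and byte offset into a 52-bit value."""
    segment, page_index, offset = split_effective_address(ea)
    vsid = segment_vsids[segment] & ((1 << VSID_BITS) - 1)
    return ((vsid << (PAGE_INDEX_BITS + PAGE_OFFSET_BITS))
            | (page_index << PAGE_OFFSET_BITS)
            | offset)


# Hypothetical segment-register contents, one VSID per segment.
hypothetical_vsids = [0x100 + n for n in range(16)]
va = virtual_address(0x30001234, hypothetical_vsids)
print(hex(va))
```

The widths add up to 24 + 16 + 12 = 52 bits, which matches the virtual address size quoted in the text. The hashed page table lookup that follows this step is not modeled here.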
101. scribes one MMU (conceptually), the MPC7450 hardware maintains separate TLBs and table search resources for instruction and data accesses that can be performed independently (and simultaneously). Therefore, the MPC7450 is described as having two MMUs, one for instruction accesses (IMMU) and one for data accesses (DMMU). The block address translation (BAT) mechanism is a software-controlled array that stores the available block address translations on-chip. BAT array entries are implemented as pairs of BAT registers that are accessible as supervisor special-purpose registers (SPRs). There are separate instruction and data BAT mechanisms. In the MPC7450, they reside in the instruction and data MMUs, respectively. The MMUs, together with the exception processing mechanism, provide the necessary support for the operating system to implement a paged virtual memory environment and for enforcing protection of designated memory areas. Section 4.3, "Exception Processing," describes how the MSR controls critical MMU functionality.

3.5.2 MPC7450 Microprocessor Memory Management Implementation

The MPC7450 implements separate MMUs for instructions and data. It maintains a copy of the segment registers in the instruction MMU; however, read and write accesses to the segment registers (mfsr and mtsr) are handled through the segment registers in the data MMU. The MPC7450 MMU is described in Section 1.2.3, "Memory Management Units (MMUs)." The MPC7450 implement
102. so contains the branch target address for Branch Conditional to Link Register (bclrx) instructions. The CTR contains the branch target address for Branch Conditional to Count Register (bcctrx) instructions. Because the LR and CTR are SPRs, their contents can be copied to or from any GPR. Also, because the BPU uses dedicated registers rather than GPRs or FPRs, execution of branch instructions is largely independent from execution of integer and floating-point instructions.

2.2.3 Completion Unit

The completion unit operates closely with the instruction unit. Instructions are fetched and dispatched in program order. At the point of dispatch, the program order is maintained by assigning each dispatched instruction a successive entry in the 16-entry CQ. The completion unit tracks instructions from dispatch through execution and retires them in program order from the three bottom CQ entries (CQ0-CQ2). Instructions cannot be dispatched to an execution unit unless there is a CQ vacancy. Branch instructions that do not update the CTR or LR are often removed from the instruction stream. Those that are removed do not take a CQ entry. Branches that are not removed from the instruction stream follow the same dispatch and completion procedures as non-branch instructions but are not dispatched to an issue queue. Completing an instruction commits execution results to architected registers (GPRs, FPRs, VRs, LR, and CTR). In-order completion ensures the correct archit
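The in-order retirement rule described for the completion unit can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class and method names are hypothetical, each instruction is reduced to a name and a "finished" flag, and no timing or rename-register behavior is modeled.

```python
from collections import deque

CQ_ENTRIES = 16     # completion queue depth stated in the text
RETIRE_WIDTH = 3    # only the bottom three entries (CQ0-CQ2) can retire per cycle


class CompletionQueue:
    """Toy model: instructions enter in program order and retire in order."""

    def __init__(self):
        self.entries = deque()   # oldest instruction (CQ0) is at the left

    def can_dispatch(self):
        # Dispatch requires a vacant CQ entry.
        return len(self.entries) < CQ_ENTRIES

    def dispatch(self, name):
        if not self.can_dispatch():
            raise RuntimeError("no CQ vacancy")
        self.entries.append({"name": name, "finished": False})

    def finish(self, name):
        # Execution may finish out of order; retirement may not.
        for entry in self.entries:
            if entry["name"] == name:
                entry["finished"] = True

    def retire(self):
        """Retire up to three finished instructions, oldest first."""
        retired = []
        while (self.entries and len(retired) < RETIRE_WIDTH
               and self.entries[0]["finished"]):
            retired.append(self.entries.popleft()["name"])
        return retired


cq = CompletionQueue()
for name in ("a", "b", "c", "d"):
    cq.dispatch(name)
cq.finish("b")
cq.finish("c")
blocked = cq.retire()      # "a" is oldest and unfinished, so nothing retires
cq.finish("a")
first_group = cq.retire()  # "a", "b", "c" retire together, in program order
```

The point of the sketch is that a younger finished instruction never retires ahead of an older unfinished one, which is what keeps the architected state precise.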
103. sters and register files and determines when instructions are latched into the execution unit reservation stations. The GIQ, FIQ, and VIQ (AltiVec) issue queues have the following similarities: operand lookup in the GPRs, FPRs, and VRs, and their rename registers; issue queues issue instructions to the proper execution units; each issue queue holds twice as many instructions as can be dispatched to it in one cycle (the GIQ has six entries, the VIQ has four, and the FIQ has two). The three issue queues are described as follows. The GIQ accepts as many as three instructions from the dispatch unit each cycle. IU1, IU2, and all LSU instructions (including floating-point and AltiVec loads and stores) are dispatched to the GIQ. Instructions can be issued out of order from the bottom three GIQ entries (GIQ2-GIQ0). An instruction in GIQ1 destined for an IU1 does not have to wait for an instruction in GIQ0 that is stalled behind a long-latency integer divide instruction in the IU2. The VIQ accepts as many as two instructions from the dispatch unit each cycle. All AltiVec instructions (other than load, store, and vector touch instructions) are dispatched to the VIQ. In the MPC7450, as many as two instructions can be issued to the four AltiVec execution units, but unlike the GIQ, instructions in the VIQ cannot be issued out of order. In the MPC7448, a maximum of two AltiVec instructions can be issued out of order to any combination of
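The GIQ behavior described above (issue out of order, but only from the bottom three entries) can be sketched as a small selection routine. This is a simplified illustration, not the hardware algorithm: the function name, the unit labels, and the rule of one issue per unit per cycle are assumptions made for the example. The FIQ dispatch width of one instruction per cycle is inferred from the "twice as many" rule and the two-entry FIQ.

```python
# Dispatch width per cycle for each issue queue; the FIQ value is inferred.
DISPATCH_PER_CYCLE = {"GIQ": 3, "VIQ": 2, "FIQ": 1}
# Each queue holds twice as many instructions as can be dispatched per cycle.
QUEUE_DEPTH = {queue: 2 * width for queue, width in DISPATCH_PER_CYCLE.items()}


def giq_issue(giq, busy_units, window=3):
    """Issue from the bottom `window` GIQ entries to units that are free.

    giq: list of (name, unit) tuples, index 0 is GIQ0 (the oldest entry).
    busy_units: set of unit labels that cannot accept an instruction now.
    Returns the names issued this cycle and removes them from giq.
    """
    taken = set(busy_units)
    issued = []
    for entry in list(giq[:window]):    # copy, so removal below is safe
        name, unit = entry
        if unit in taken:
            continue                    # stalled entry does not block younger ones
        taken.add(unit)                 # assume one issue per unit per cycle
        issued.append(name)
        giq.remove(entry)
    return issued


# GIQ0 waits for IU2 (busy with a long divide); GIQ1 and GIQ2 can still issue.
giq = [("mullw", "IU2"), ("add", "IU1a"), ("lwz", "LSU"), ("sub", "IU1a")]
issued = giq_issue(giq, busy_units={"IU2"})
```

In this example the entry at index 3 is outside the three-entry window, so it stays queued even though its unit would otherwise be a candidate in a later cycle.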
105. ta loads and stores; the instruction unit calculates effective addresses for instruction fetching. The MMU translates the effective address to determine the correct physical address for the memory access. The MPC7450 supports the following types of memory translation:

Real addressing mode: In this mode, translation is disabled by clearing bits in the machine state register (MSR): MSR[IR] for instruction fetching or MSR[DR] for data accesses. When address translation is disabled, the physical address is identical to the effective address. When extended addressing is disabled (HID0[XAEN] = 0), a 32-bit physical address is used, PA[4-35]. For more details, see Chapter 5, "Memory Management Unit," of the MPC7450 RISC Microprocessor Family User's Manual.

Page address translation: Translates the page frame address for a 4-Kbyte page size.

Block address translation: Translates the base address for blocks (128 Kbytes to 256 Mbytes [MPC7441, MPC7450, MPC7451] or 4 Gbytes [MPC7445, MPC7455, MPC7457, MPC7447, MPC7447A, and MPC7448]).

If translation is enabled, the appropriate MMU translates the higher-order bits of the effective address into physical address bits. Lower-order address bits are untranslated and are the same for both logical and physical addresses. These bits are directed to the on-chip caches, where they form the index into the eight-way set-associative tag array. After translating the address, the MMU passes the higher-order physi
106. te describes. In the MPC7448, supports error correction and detection using a SECDED (single-error correction, double-error detection) protocol. Every 64 bits of data comes with 8 bits of error detection/correction, which can be programmed as ECC across the 64 bits of data, byte parity, or no error detection/correction. Supports parity generation and checking for both tags and data (enabled through L2CR). In the MPC7448, tag parity is enabled separately in the L2ERRDIS register, and data parity can be enabled through L2CR only when ECC is disabled. In the MPC7448, error injection modes are provided for testing.

Level 3 (L3) cache interface (not supported on the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448): Provides critical double-word forwarding to the requesting unit. On-chip tags support 1 or 2 Mbytes of external SRAM that is eight-way set-associative. Maintains instructions, data, or both instructions and data (selectable through L3CR). Cache write-back or write-through operation programmable on a per-page or per-block basis. Organized as 64 bytes/line configured as 2 blocks (sectors) with separate status bits per line for the 1-Mbyte configuration. Organized as 128 bytes/line configured as 4 blocks (sectors) with separate status bits per line for the 2-Mbyte configuration. 1, 2, or 4 Mbytes (4 Mbytes is only for the MPC7457) of the L3 SRAM can be designated as private memory. Supports same four-state (MESI) coherency protocol as L1 and L2
107. that the EAR and the eciwx and ecowx instructions are optional in the PowerPC architecture.

HID0, HID1 (SPR 1008, 1009): Hardware implementation-dependent registers. Control various functions, such as the power management features and locking, enabling, and invalidating the instruction and data caches. The HID1 includes bits that reflect the state of PLL_CFG[0:4] (PLL_CFG[0:5] for the MPC7448) clock signals and control other bus-related functions.

IABR (SPR 1010): Instruction address breakpoint register. Used to cause a breakpoint exception if a specified instruction address is encountered.

IBAT0U/L (SPR 528, 529), IBAT1U/L (SPR 530, 531), IBAT2U/L (SPR 532, 533), IBAT3U/L (SPR 534, 535), IBAT4U/L (SPR 560, 561), IBAT5U/L (SPR 562, 563), IBAT6U/L (SPR 564, 565), IBAT7U/L (SPR 566, 567), DBAT0U/L (SPR 536, 537): Block address translation (BAT) registers. The PowerPC OEA includes an array of block address translation registers that can be used to specify four blocks of instruction space and four blocks of data space. The BAT registers are implemented in pairs: four pairs of instruction BATs (IBAT0U-IBAT3U and IBAT0L-IBAT3L) and four pairs of data BATs (DBAT0U-DBAT3U and DBAT0L-DBAT3L). There are four additional pairs of instruction BATs and four additional pairs of data BATs in the MPC7455, MPC7457, MPC7447, MPC7447A, and MPC7448. Sixteen additional BAT registers have been added for the MPC7455. These registers are enabled by setting HID0[HIGH_BAT_EN]. When HID0[HIGH_BAT_EN] = 1, the 16 addition
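The SPR numbering of the BAT registers listed above follows a regular pattern that can be captured in a small helper. This is a sketch, and the function name is hypothetical. The numbers for IBAT0-IBAT7 and DBAT0 come from the table; the base of 568 for DBAT4-DBAT7 is an inference from the register figures elsewhere in the brief, which show DBAT7U as SPR 574 and DBAT7L as SPR 575.

```python
def bat_spr(kind, index, half):
    """Return the SPR number of a BAT register.

    kind:  "I" for instruction BATs, "D" for data BATs
    index: 0-7 (4-7 exist only on parts with the additional BATs)
    half:  "U" for the upper register, "L" for the lower register
    """
    if kind not in ("I", "D") or half not in ("U", "L") or not 0 <= index <= 7:
        raise ValueError("unknown BAT register")
    if index < 4:
        base = 528 if kind == "I" else 536    # original four pairs
        pair = index
    else:
        base = 560 if kind == "I" else 568    # additional four pairs
        pair = index - 4
    # Each pair occupies two consecutive SPR numbers: upper, then lower.
    return base + 2 * pair + (0 if half == "U" else 1)


print(bat_spr("I", 4, "U"), bat_spr("D", 7, "L"))
```

Such a helper would typically feed the SPR operand of an mtspr or mfspr instruction; the additional pairs are only meaningful when HID0[HIGH_BAT_EN] is set.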
108. the ITLB (e600-specific).

DTLB miss on load (0x01100): A data load translation miss exception is caused when HID0[STEN] = 1 and the effective address for a data load operation cannot be translated by the DTLB (e600-specific).

DTLB miss on store (0x01200): A data store translation miss exception is caused when HID0[STEN] = 1 and the effective address for a data store operation cannot be translated by the DTLB, or when a DTLB hit occurs and the changed bit in the PTE must be set due to a data store operation (e600-specific).

Instruction address breakpoint (0x01300): IABR[0-29] matches EA[0-29] of the next instruction to complete and IABR[BE] = 1 (e600-specific).

System management interrupt (0x01400): MSR[EE] = 1 and SMI is asserted (e600-specific).

Reserved (0x01500-015FF).

AltiVec assist (0x01600): This e600-specific exception supports denormalization detection in Java mode as specified in the AltiVec Technology Programming Environments Manual in Chapter 3, "Operand Conventions."

Reserved (0x01700-02FFF).

3.5 Memory Management

The following subsections describe the memory management features of the PowerPC architecture and the MPC7450 implementation, respectively.

3.5.1 PowerPC Memory Management Model

The primary function of the MMU in a processor that implements the PowerPC architecture is the translation of logical (effective) addresses to physical addresses (referred to as real addresses i
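The e600-specific vector offsets listed in the exception table above lend themselves to a simple lookup. The following sketch includes only the offsets that appear in this excerpt; the dictionary and function names are hypothetical, and offsets from other parts of the table are deliberately left out rather than guessed.

```python
# Vector offsets taken from the exception table excerpt above.
E600_VECTORS = {
    0x01100: "DTLB miss on load",
    0x01200: "DTLB miss on store",
    0x01300: "Instruction address breakpoint",
    0x01400: "System management interrupt",
    0x01600: "AltiVec assist",
}

# Inclusive reserved ranges from the same excerpt.
RESERVED_RANGES = [(0x01500, 0x015FF), (0x01700, 0x02FFF)]


def describe_vector(offset):
    """Map a vector offset to the exception name given in the excerpt."""
    if offset in E600_VECTORS:
        return E600_VECTORS[offset]
    for low, high in RESERVED_RANGES:
        if low <= offset <= high:
            return "Reserved"
    return "Not listed in this excerpt"


print(describe_vector(0x01200))
```

Note that the two DTLB miss vectors are only taken when HID0[STEN] = 1, that is, when software table searching is enabled.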
109. the memory hierarchy, address register queues, and the reservation controlled by the Load Word and Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.) instructions. Accesses are prioritized with load operations preceding store operations. Note that the L3 cache interface is not supported on the MPC7441, MPC7445, MPC7447, MPC7447A, and MPC7448. Instructions are automatically fetched from the memory system into the instruction unit, where they are issued to the execution units at a peak rate of three instructions per clock cycle. Conversely, load and store instructions explicitly specify the movement of operands to and from the integer, floating-point, and AltiVec register files and the memory system. When the MPC7450 encounters an instruction or data access, it calculates the effective address and uses the lower-order address bits to check for a hit in the on-chip, 32-Kbyte L1 instruction and data caches. During L1 cache lookup, the instruction and data memory management units (MMUs) use the higher-order address bits to calculate the virtual address, from which they calculate the physical (real) address. The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction or data cache. If the access misses in the corresponding cache, the transaction is sent to the L1 load miss queue or the L1 store miss queue. L1 load miss queue transactions are sent to the internal 256 Kbyte
110. tion Address: USIAR (SPR 939). Monitor Control: UMMCR0 (SPR 936), UMMCR1 (SPR 940), UMMCR2 (SPR 928). AltiVec Registers: Vector Registers (VR0 ...); Vector Save/Restore Register VRSAVE (SPR 256); Vector Status and Control Register VSCR. General-Purpose Registers: GPR0, GPR1, ..., GPR31. Floating-Point Registers: FPR0, FPR1, ..., FPR31. Condition Register: CR. Floating-Point Status and Control Register: FPSCR.

SUPERVISOR MODEL (OEA). Hardware Implementation Registers: HID0, HID1. Memory Management Registers, Instruction BAT Registers: IBAT0U, IBAT0L, IBAT1U, IBAT1L, IBAT2U, IBAT2L, IBAT3U, IBAT3L, IBAT4U, IBAT4L, IBAT5U, IBAT5L, IBAT6U, IBAT6L, IBAT7U, IBAT7L. SPRGs: SPRG0, SPRG1, SPRG2, SPRG3, SPRG4, SPRG5, SPRG6, SPRG7. Miscellaneous Registers: Time Base (For Writing) TBL (SPR 284), TBU (SPR 285); Data Address Breakpoint Register DABR (SPR 1013); External Access Register EAR (SPR 282); Instruction Address Breakpoint Register IABR (SPR 1010); Decrementer DEC (SPR 22). Thermal Management Register: Instruction Cache Throttling Control Register ICTC (SPR 1019). Load/Store: LDSTCR. Registers: MSSCR0, MSSSR0. Perf
111. tool. This functionality is fully described in Chapter 11, "Performance Monitor," of the MPC7450 RISC Microprocessor Family User's Manual. Figure 1 shows the parallel organization of the execution units (shaded in the diagram). The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is a conceptual model showing basic features rather than an attempt to show how features are implemented physically. Figure 2 shows the organization of the MPC7448 execution units.
112. tor permute.

MMUs. TLBs (instruction and data): 128-entry, 2-way. Tablewalk mechanism: hardware + software. Instruction BATs/Data BATs: 8/8.

L1 I Cache/D Cache Features. Size: 32K/32K. Associativity: 8-way. Locking granularity: way. Parity on I cache: word. Parity on D cache: byte. Number of D cache misses (load/store): 5/1. Data stream touch engines: 4 streams.

On-Chip Cache Features. Cache level: L2. Size/associativity: 512-Kbyte/8-way. Access width: 256 bits. Number of 32-byte sectors/line: 2. Parity: byte.

Table 7. Microarchitecture Comparison (continued). Microarchitectural Specs: MPC7447A, MPC7447. Thermal Control, dynamic frequency switching (DFS): Yes (MPC7447A), No (MPC7447). Thermal diode: Yes (MPC7447A), No (MPC7447).

8 Differences Between MPC7447A and MPC7448

The MPC7448 has a number of changes over the core in the MPC7447A. Some of these changes are feature improvements (larger 1-Mbyte L2 cache, expanded DFS capability, L2 data ECC). Some are performance improvements (second store miss) or changes necessary for feature improvements (extended L2 pipeline). Table 8 describes the differences between the MPC7447A and the MPC7448.

Table 8. Microarchitecture Comparison. Microarchitectural Specs: MPC7447A, MPC7448
113. ts that execute AltiVec instructions. The MPC7450 implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits. The MPC7450 provides virtual memory support for up to 4 Petabytes (2^52) of virtual memory and real memory support for up to 64 Gigabytes (2^36) of physical memory. Freescale Semiconductor, Inc., 2004. All rights reserved.

The MPC7450 also implements the AltiVec instruction set architectural extension. The MPC7450 is a superscalar processor that can dispatch and complete three instructions simultaneously. It incorporates the following execution units: 64-bit floating-point unit (FPU). Branch processing unit (BPU). Load/store unit (LSU). Four integer units (IUs): three shorter-latency IUs (IU1a-IU1c) execute all integer instructions except multiply, divide, and move to/from special-purpose register (SPR) instructions; the longer-latency IU (IU2) executes miscellaneous instructions, including condition register (CR) logical operations, integer multiplication and division instructions, and move to/from SPR instructions. Four vector units that support AltiVec instructions: vector permute unit (VPU); vector integer unit 1 (VIU1), which performs shorter-latency integer calculations; vector integer u
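The address-space sizes quoted in the overview follow directly from the address widths, as a quick calculation shows. This is a small worked check, assuming the "Petabytes" and "Gigabytes" in the text are binary units (powers of 1024); the constant and function names are chosen for the example.

```python
EFFECTIVE_ADDRESS_BITS = 32   # 32-bit effective addresses
VIRTUAL_ADDRESS_BITS = 52     # interim virtual address
PHYSICAL_ADDRESS_BITS = 36    # extended physical addressing

GIBI = 2 ** 30
PEBI = 2 ** 50


def address_space_bytes(bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** bits


effective_gib = address_space_bytes(EFFECTIVE_ADDRESS_BITS) // GIBI
virtual_pib = address_space_bytes(VIRTUAL_ADDRESS_BITS) // PEBI
physical_gib = address_space_bytes(PHYSICAL_ADDRESS_BITS) // GIBI
print(effective_gib, virtual_pib, physical_gib)
```

The three results correspond to the 4-Gbyte effective, 4-Petabyte virtual, and 64-Gbyte physical address spaces of the MPC7450.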
114. ubsystem. L2 Error Control and Capture Registers: L2CAPTECC (SPR 990), L2ERRDET (SPR 991), L2ERRDIS (SPR 992), L2ERRINTEN (SPR 993), L2ERRATTR (SPR 994), L2ERRADDR (SPR 995), L2ERREADDR (SPR 996), L2ERRCTL (SPR 997). L2 Cache Control Register: L2CR (SPR 1017). Status/Control Registers: MSSCR0 (SPR 1014), MSSSR0 (SPR 1015). L2 Error Injection Registers: L2ERRINJHI (SPR 985), L2ERRINJLO (SPR 986), L2ERRINJCTL (SPR 987).

Performance Monitor Registers. Performance Counters: PMC1 (SPR 953), PMC2 (SPR 954), PMC3 (SPR 957), PMC4 (SPR 958), PMC5 (SPR 945), PMC6 (SPR 946). Monitor Control Registers: MMCR0 (SPR 952), MMCR1 (SPR 956), MMCR2 (SPR 944). Breakpoint Address Mask Register: BAMR (SPR 951). Sampled Instruction Address Register: SIAR (SPR 955).

Notes: 1. MPC7448-specific register; may not be supported on other processors that implement the PowerPC architecture. 2. Register defined as optional in the PowerPC architecture. 3. Register defined by the AltiVec technology.

Figure 13. Programming Model: MPC7448 Microprocessor Registers

Some registers can be accessed both explicitly and implicitly. In the MPC7450, all SPRs are 32 bits wide. Table 1 describes registers implemented by the MPC7
115. Figure 11 shows the MPC7441 and MPC7451 register set.

USER MODEL (VEA). Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269).

USER MODEL (UISA). Count Register: CTR (SPR 9). XER: XER (SPR 1). Link Register: LR (SPR 8). General-Purpose Registers: GPR0, GPR1, ..., GPR31. Floating-Point Registers: FPR0, FPR1, ..., FPR31. Condition Register: CR. Performance Monitor Registers, Performance Counters: UPMC1 (SPR 937), UPMC2 (SPR 938), UPMC3 (SPR 941), UPMC4 (SPR 942), UPMC5 (SPR 929), UPMC6 (SPR 930). Sampled Instruction Address: USIAR (SPR 939).

SUPERVISOR MODEL (OEA). Configuration Registers: Hardware Implementation Registers HID0 (SPR 1008), HID1 (SPR 1009); Machine State Register MSR; Processor Version Register PVR (SPR 287); PIR (SPR 1023). Memory Management Registers: Instruction BAT Registers IBAT0U (SPR 528), IBAT0L (SPR 529), IBAT1U (SPR 530), IBAT1L (SPR 531), IBAT2U (SPR 532), IBAT2L (SPR 533), IBAT3U (SPR 534), IBAT3L (SPR 535); Data BAT Registers DBAT0U (SPR 536), DBAT0L (SPR 537), DBAT1U (SPR 538), DBAT1L (SPR 539), DBAT2U (SPR 540), DBAT2L (SPR 541), DBAT3U (SPR 542), DBAT3L (SPR 543); Segment Registers SR0, SR1, ..., SR15; PTE High/Low Registers PTEHI (SPR 981), PTELO (SPR 982); SDR1 (SPR 25); TLB Miss Register TLBMISS (SPR 980). Exception Handling Registers: SPRGs SPRG0 (SPR 272), SPRG1 (SPR 273), SPRG2 (SPR 274); Data Address Register DAR (SPR 19); Save and Restore Registers SRR0 (SPR 26), SRR1 (SPR 27); DSISR
116. uction and data caches (Harvard architecture). Instruction and data caches are eight-way set-associative. Instruction and data caches have 32-byte cache blocks. A cache block is the block of memory that a coherency state describes; it corresponds to a cache line for the L1 data cache. Cache directories are physically addressed. The physical (real) address tag is stored in the cache directory. The caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each set. Cache write-back or write-through operation is programmable on a per-page or per-block basis. The instruction cache can provide four instructions per clock cycle; the data cache can provide four words per clock cycle. Two-cycle latency and single-cycle throughput for instruction or data cache accesses. Caches can be disabled in software. Caches can be locked in software. Supports a four-state modified/exclusive/shared/invalid (MESI) coherency protocol. A single coherency status bit for each instruction cache block allows encoding for the following two possible states: invalid (INV) and valid (VAL). Two status bits (MESI[0-1]) for each data cache block allow encoding for coherency as follows: 00 = invalid (I), 01 = shared (S), 10 = exclusive (E), 11 = modified (M). Separate copy of data cache tags for efficient snooping. Both L1 caches support parity generation and checking (enabled through bits in the ICTRL register) as
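The L1 cache parameters above determine how an address selects a cache set, and the MESI status encoding maps directly onto an enumeration. The sketch below assumes the 32-Kbyte L1 cache size quoted elsewhere in the brief; the names are hypothetical and the model ignores tags, parity, and locking.

```python
from enum import IntEnum

CACHE_BYTES = 32 * 1024   # L1 size quoted elsewhere in the brief
WAYS = 8                  # eight-way set-associative
BLOCK_BYTES = 32          # 32-byte cache blocks
SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)


class Mesi(IntEnum):
    """Two-bit data cache coherency encoding, MESI[0-1]."""
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def block_offset(address):
    """Byte position inside the 32-byte cache block."""
    return address % BLOCK_BYTES


def l1_set_index(address):
    """Set selected by the address bits just above the block offset."""
    return (address // BLOCK_BYTES) % SETS


print(SETS, l1_set_index(0x1234), block_offset(0x1234), Mesi(0b10).name)
```

With these parameters the set index and block offset together span SETS x BLOCK_BYTES = 4096 bytes, the same as the 4-Kbyte page size. That is why the untranslated low-order address bits are enough to index the cache while the MMU translates the upper bits in parallel.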
117. uction is encountered, the MPC7450 executes instructions from the predicted target stream, although the results are not committed to architected registers until the conditional branch is resolved. Unresolved branches are held in a three-entry branch queue. When the branch queue is full, no further conditional branches can be processed until one of the conditions in the branch queue is resolved. When a branch is taken or predicted as taken, instructions from the untaken path must be flushed and the target instruction stream must be fetched into the IQ. The BTIC is a 128-entry, four-way set-associative cache that contains the most recently used branch target instructions (up to four instructions per entry) for b and bc branches. When a taken branch instruction of this type hits in the BTIC, the instructions arrive in the IQ two clock cycles later, a clock cycle sooner than they would arrive from the instruction cache. Additional instructions arrive from the instruction cache in the next clock cycle. The BTIC reduces the number of missed opportunities to dispatch instructions and gives the processor a 1-cycle head start on processing the target stream. The BPU contains an adder to compute branch target addresses and three user-accessible registers: the link register (LR), the count register (CTR), and the condition register (CR). The BPU calculates the return pointer for subroutine calls and saves it in the LR for certain types of branch instructions. The LR al
118. uctor products for any such unintended or unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Freescale Semiconductor was negligent regarding the design or manufacture of the part. Freescale and the Freescale logo are trademarks of Freescale Semiconductor, Inc. The PowerPC name is a trademark of IBM Corp. and is used under license. All other product or service names are the property of their respective owners. Freescale Semiconductor, Inc., 2004.
119. virtual environment architecture. It also provides specific details about the MPC7450 cache implementation. The information in this section is described more fully in Chapter 3, "L1, L2, and L3 Cache Operation," of the MPC7450 RISC Microprocessor Family User's Manual.

Exception model: Describes the exception model of the PowerPC operating environment architecture and the differences in the MPC7450 exception model. The information in this section is described more fully in Chapter 4, "Exceptions," of the MPC7450 RISC Microprocessor Family User's Manual.

Memory management: Describes generally the conventions for memory management. This section also describes the MPC7450's implementation of the 32-bit PowerPC memory management specification. The information in this section is described more fully in Chapter 5, "Memory Management," of the MPC7450 RISC Microprocessor Family User's Manual.

Instruction timing: Provides a general description of the instruction timing provided by the superscalar, parallel execution supported by the PowerPC architecture and the MPC7450. The information in this section is described more fully in Chapter 6, "Instruction Timing," of the MPC7450 RISC Microprocessor Family User's Manual.

AltiVec implementation: Points out that the MPC7450 implements AltiVec registers, instructions, and exceptions as described in the AltiVec Technology Programming Environments Manual. Chapter 7, "AltiVec T
120. w Registers: PTEHI (SPR 981), PTELO (SPR 982). TLB Miss Register: TLBMISS (SPR 980). SDR1: SDR1 (SPR 25). Save and Restore Registers: SRR0 (SPR 26), SRR1 (SPR 27). Cache/Memory Subsystem Registers: L3 Private Memory Register L3PM (SPR 983); L3 Cache Control Register L3CR (SPR 1018); L3 Cache Input Timing Control Register L3ITCR0 (SPR 984). Breakpoint Address Mask Register: BAMR (SPR 951). Sampled Instruction Address Register: SIAR (SPR 955).

Notes: 1. MPC7445-, MPC7447-, MPC7455-, and MPC7457-specific register; may not be supported on other processors that implement the PowerPC architecture. 2. Register defined as optional in the PowerPC architecture. 3. Register defined by the AltiVec technology. 4. MPC7455- and MPC7457-specific register. 5. MPC7457-specific register.

Figure 12. Programming Model: MPC7445, MPC7447, MPC7455, MPC7457, MPC7447A Registers

Figure 13 shows the MPC7448 register set. USER MODEL (VEA). Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269). USER MODEL (UISA). Count Register: CTR (SPR 9). General-Purpose Registers. SUPERVISOR MODEL (OEA). Hardware Implementation Registers. Configuration Registers: Machine State Registe

