MIPS32® 74Kc™ Processor Core Datasheet

1. backup structure for the ITLB. If a fetch address cannot be translated by the ITLB, the JTLB attempts to translate it in the following clock cycle, or when available. If successful, the translation information is copied into the ITLB for future use. The JTLB port used for ITLB miss access is shared with other MMU management activities. Fixed Mapping Translation (FMT): The FMT is much simpler and smaller than the TLB-style MMU, and is a good choice when the full protection and flexibility of the TLB are not needed. Like a TLB, the FMT performs virtual-to-physical address translation and provides attributes for the different segments. Those segments that are unmapped in a TLB implementation (kseg0 and kseg1) are handled identically by the FMT. Instruction Cache: The instruction cache is an on-chip memory block of 0/16/32/64 KB, with 4-way associativity. All size references made will assume a default size of 32 KB. Because the instruction cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the tag access rather than having to wait for the physical address translation. A tag entry holds 21 bits of physical address, a valid bit, a lock bit, and an optional parity bit. There are 7 precode bits per instruction pair, making a total of 28 bits per tag entry. The data array line consists of 256 bits (8 MIPS32 instructions) of data. Each instruction doubleword (64 bits) has 8 bits of byte parity. The I
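The cache sizes quoted above can be turned into set counts and index widths with a little arithmetic. The following is a minimal sketch, assuming only the figures stated in this datasheet (4 ways, 256-bit or 32-byte lines, sizes of 0/16/32/64 KB); the function name `cache_geometry` is hypothetical and not part of any MIPS tool.

```python
def cache_geometry(size_kb, ways=4, line_bytes=32):
    """Return (sets, offset_bits, index_bits, way_bytes) for one L1 cache.

    Defaults follow the datasheet: 4-way set associative, 32-byte lines.
    """
    if size_kb == 0:
        return (0, 0, 0, 0)  # cache not present in this configuration
    total_bytes = size_kb * 1024
    way_bytes = total_bytes // ways          # bytes covered by one way
    sets = way_bytes // line_bytes           # lines per way
    offset_bits = line_bytes.bit_length() - 1  # log2 of line size
    index_bits = sets.bit_length() - 1         # log2 of set count
    return (sets, offset_bits, index_bits, way_bytes)


for size_kb in (0, 16, 32, 64):
    print(size_kb, cache_geometry(size_kb))
```

For the default 32 KB configuration this gives 256 sets per way and an 8 KB way, so the index uses virtual address bits above a 4 KB page offset. That is the usual reason a virtually indexed cache of this size needs care with aliasing.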
2. Q15, Q31 • Saturating arithmetic • SIMD instructions operate on 2x16-bit or 4x8-bit data simultaneously. Instruction Fetch Unit: • 4 instruction fetch per cycle • 8-entry Return Prediction Stack • Combined Majority Branch Predictor using three 256-entry Branch History Tables (BHT) • 64-entry, 4-way associative jump register cache to predict target for indirect jumps • Hardware prefetching of the next 1 or 2 sequential cache lines on a miss; number of prefetched lines (0, 1, or 2) controllable via configuration bits. Dual Out-of-Order Instruction Issue: • Separate ALU and AGEN pipes [MIPS32 74Kc Processor Core Datasheet, Revision 01.07] • AGEN pipe executes load/store and control transfer instructions • ALU pipe executes all other instructions • 32 (18 ALU + 14 AGEN) completion buffers hold execution results until instructions are graduated in program order. Programmable Memory Management Unit: • 16/32/48/64 dual-entry, dual-ported TLB shared by Instruction and Data MMU • 4-entry ITLB (4KB, 16KB page size) • 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M byte page size supported in JTLB • Optional simple Fixed Mapping Translation (FMT) mechanism. Programmable L1 Cache Sizes: • Individually configurable instruction and data caches • Instruction Cache sizes of 0/16/32/64 KB • Data Cache sizes of 0/16/32/64 KB • 4-way set associative • 32-byte cache line size • Virtually indexed, physically t
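To illustrate what saturating SIMD arithmetic on 2x16-bit data means in practice, here is a small software model. It is a sketch of the general technique only: the function names are hypothetical, and it does not claim to reproduce the exact semantics or flag behavior of any particular DSP ASE instruction.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_signed16(halfword):
    """Interpret the low 16 bits of an integer as a signed value."""
    halfword &= 0xFFFF
    return halfword - 0x10000 if halfword & 0x8000 else halfword


def saturate16(value):
    """Clamp a wide result into the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def saturating_add_2x16(a, b):
    """Add two 32-bit words lane by lane as 2 x 16-bit signed values."""
    result = 0
    for shift in (0, 16):
        lane = saturate16(to_signed16(a >> shift) + to_signed16(b >> shift))
        result |= (lane & 0xFFFF) << shift
    return result


print(hex(saturating_add_2x16(0x7FFF0001, 0x00010001)))
```

In the example call, the upper lane would overflow (0x7FFF + 1) and is clamped to 0x7FFF, while the lower lane adds normally. A wrapping add would instead flip the upper lane to a large negative value, which is what saturation is meant to avoid in signal-processing code.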
3. SLTIU, SLTU, SLL (shift < 8), and SRL (31 < shift < 25) will complete and bypass the results from the AC stage to both ALU and AGEN pipe consumers. • ADD, ADDU, ADDI, and ADDIU instructions can bypass the results in AC to the consumers in the ALU pipe. If the consumer instructions are in the AGEN pipe, these instructions will bypass the results from the AB stage. • The AC stage is aligned with the start of the Multiply/Divide Unit (MDU) and the CorExtend unit. [Copyright © 2006-2011 MIPS Technologies Inc. All rights reserved.] • Results bypass for all operations is performed in the AB stage. The results are also prepared for writing into the completion buffer in the following cycle. The one exception to this rule is ADD operations bypassing to consumer instructions in the ALU pipe. The latency of an ALU operation is 1 or 2 cycles. For 2-cycle operations, the first cycle is required to perform the arithmetic operation and the second cycle is required to select and forward the results to potential consumer instructions. The ALU supports a throughput of 1 operation per cycle. AGEN Pipe: The AGEN pipe spans 5 stages, as follows: • The first stage (EM) is used to select the operands that are read from the register file and completion buffer. The register file and completion buffer read stage overlays the DM stage of the IDU and does not contribute to the pipe
4. arrives from the BIU. Graduation Unit (GRU): The Graduation Unit is responsible for committing execution results into architectural state and releasing buffers and resources used by these instructions. The GRU is also responsible for evaluating the exception conditions reported by execution units and taking the appropriate exception. Asynchronous interrupts are also funneled into the GRU, which prioritizes those events with the existing conditions and takes the appropriate interrupt. The GRU receives information about the program order of instructions from the Graduation FIFO (GFIFO). The GFIFO is written by the IDU at dispatch time. The GFIFO entry has a pointer to the completion buffer and associated structures where various attributes such as PC, exception information, etc. are held. The GRU will read up to 2 completed instructions from the GFIFO every cycle and then read the corresponding completion buffer and associated information. After processing the exception conditions, the destination register(s) are updated and the completion buffers are released. The GRU also sends graduation information to the IDU so that it can update the rename maps to reflect the state of execution results (i.e., GPRs, Accumulators, etc.). The GRU also sends resolved branch information to the IFU so that bran
5. data, external events, or program errors. Interrupt Handling: The 74Kc core supports six hardware interrupt pins, two software interrupts, a timer interrupt, and a performance counter interrupt. These interrupts can be used in any of three interrupt modes, as defined by Release 2 of the MIPS32 Architecture: • Interrupt compatibility mode, which acts identically to that in an implementation of Release 1 of the Architecture. • Vectored Interrupt (VI) mode, which adds the ability to prioritize and vector interrupts to a handler dedicated to that interrupt, and to assign a GPR shadow set for use during interrupt processing. The presence of this mode is denoted by the VInt bit in the Config3 register. This mode is architecturally optional. As it is always present on the 74Kc core, the VInt bit will always read 1. • External Interrupt Controller (EIC) mode, which redefines the way in which interrupts are handled in order to provide full support for an external interrupt controller that handles prioritization and vectoring of interrupts. This mode is optional in the Release 2 architecture. The presence of this mode is denoted by the VEIC bit in the Config3 register. On the 74Kc core, the VEIC bit is set externally by the static input, SI_EICPresent, to allow system logic to indicate the presence of an external interrupt controller. If the 74Kc core is configured to use shadow registers, the VI and EIC interrupt modes can
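Software that has already read Config3 (for example with a coprocessor 0 move in kernel mode) can test the two bits named above to see which interrupt modes are available. The sketch below is illustrative only: the bit positions (VInt at bit 5, VEIC at bit 6) are taken from the MIPS32 Privileged Resource Architecture rather than from this datasheet and should be checked against that specification, and `interrupt_modes` is a hypothetical helper name.

```python
# Assumed bit positions from the MIPS32 PRA definition of Config3;
# this datasheet names the bits but does not give their positions.
CONFIG3_VINT_BIT = 5
CONFIG3_VEIC_BIT = 6


def interrupt_modes(config3):
    """List the interrupt modes implied by a raw Config3 value."""
    modes = ["compatibility"]  # always available
    if (config3 >> CONFIG3_VINT_BIT) & 1:
        modes.append("VI")
    if (config3 >> CONFIG3_VEIC_BIT) & 1:
        modes.append("EIC")
    return modes


# On a 74Kc, VInt always reads 1; VEIC follows the SI_EICPresent input.
print(interrupt_modes(0x20))
print(interrupt_modes(0x60))
```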
6. States of America shall be the governing law. The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation, or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government ("Government"), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party. MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPSr3, MIPS32, MIPS64, microMIPS32, microMIPS64, MIPS-3D, MIPS16, MIPS16e, MIPS-Based, MIPSsim, MIPSpro, MIPS Technologies logo, MIPS-VERIFIED, MIPS-VERIFIED logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, MI4K, 5K, 5Kc, 5Kf, 24K, 24Kc, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, 1074K, 1074Kc, 1074Kf, R3000, R4000, R5000, ASMACRO, Atlas, At the core
7. and Data Scratchpad RAM arrays, with reference design supporting DMA interfaces for loading the arrays. An Enhanced JTAG (EJTAG) block allows for software debugging of the processor and includes a TAP controller as well as optional instruction and data virtual address/value breakpoints. Additionally, real-time tracing of instruction program counter, data address, and data values can be supported. Figure 1 shows a block diagram of the 74Kc core. [Figure 1: 74Kc Core Block Diagram. Recoverable labels: OCP interface to on-chip bus(es); ISPRAM interface to Instruction Scratchpad RAM (4 KB to 1 MB); DSPRAM interface to Data Scratchpad RAM (4 KB to 1 MB); D-cache (0-64 KB, 4-way set associative); CorExtend interface; debug off-chip interface; off/on-chip trace interface.] 74Kc Core Features: 14-stage ALU and 15-stage AGEN pipelines: • 12-stage ALU fetch and execution pipe • 13-stage AGEN fetch and execution pipe • Common 2-stage graduation pipe. 32-bit address paths. 128-bit data path for instruction cache and 64- or 128-bit data path for data cache. 64-bit data paths to external interface. MIPS32 Release 2 Instruction Set and Privileged Resource Architecture. MIPS16e Code Compression. MIPS DSP ASE, Revision 2.0: • 3 additional pairs of accumulator registers • Fractional data types
8. is accessed in one cycle and the corresponding instruction data array is accessed in the following cycle. While the instruction data is being accessed, the tag data is compared to the translated address to determine a hit. The result of this hit is used to select the way of the instruction data in the following cycle, thus completing the 3-cycle sequence. The data cache and tag arrays are accessed in the same cycle. The JTLB is also accessed at the same time for virtual-to-physical address translation. The virtual tag match with the virtual address is used to select the data cache way in order to bypass data as soon as possible. The result of the JTLB compare is used to further determine a match with the physical tag in the tag array, to validate the virtual tag match. If the two comparisons do not agree, the data cache access is deemed to be a miss. The data cache refill can be done via a 64- or 128-bit interface and is a synthesis-time configuration option. Table 2 lists the attributes of the 74Kc core instruction and data caches. Table 2: 74Kc Core Instruction and Data Cache Attributes (Parameter: Instruction / Data). Size (1): 0, 16, 32, or 64 KB / 0, 16, 32, or 64 KB. Organization: 4-way set associative / 4-way set associative. Line Size: 32 bytes / 32 bytes. Read Unit: 128 bits / 64 or 128 bits. Write Unit: 128 bits. Write Policies: 64
9. or 128 bits; non-coherent write-through without write-allocate, write-back with write-allocate. Cache Locking: per line / per line. (1) Logical size of instruction cache. The cache contains some extra bits used for precoding the instruction type. Cache Protocols: The 74Kc core supports the following cache protocols: • Uncached: Addresses in a memory area specified as uncached are not read from the cache. Stores to uncached addresses are written directly to main memory, without changing the contents of the cache. • Non-Coherent Write-through, no write-allocate: Loads and instruction fetches first search the cache, reading main memory only if the desired data does not reside in the cache. On data store operations, the cache is first searched to see if the target address is in the cache. If it is, the cache contents are updated and main memory is also written. If the cache lookup misses, only main memory is written. • Writeback, write-allocate: Stores that miss in the cache will cause a cache refill. Store data, however, is only written to the cache. Cache lines that are written by stores will be marked as dirty. If a dirty line is selected for replacement, the cache line will be written back to main memory. • Uncached accelerated: As with the uncached protocol, data is never loaded into the cache. In this mode, store data can be gathered in a write buffer before being sent out on the bus as a b
10. sideband signals: None. Implementation restrictions: 1. MReqInfo is handled in a user-defined way; 3 bits are used to send cacheable attribute information or encode the type of L2 CACHE instruction, and 1 bit is used to signify SYNC. 2. MAddrSpace is used (2 bits) to indicate L2/L3 access. 3. MTagID is used (4 bits) to identify the transaction. Tags 0-3 identify D-cache reads; tags 4, 5, 12, and 13 identify I-cache reads; Tag 6 identifies SYNC, and Tag 7 identifies Write operations and CACHE ops. The remaining values are reserved. 4. Core clock is synchronous but must be a multiple of the OCP clock. The ratios supported are 1:1, 1:1.5, 1:2, 1:2.5, 1:3, 1:3.5, 1:4, 1:5, and 1:10. A helper pulse is required by the Core to transfer data from/to the OCP interface without any hazards. Write Buffer: The BIU contains a merging write buffer. The purpose of this buffer is to store and combine write transactions before issuing them to the external interface. The write buffer is organized as four 32-byte buffers. Each buffer contains data from a single 32-byte aligned block of memory. When using the write-through cache policy, the write buffer significantly reduces the number of write transactions on the external interface and reduces the amount of stalling in the core caused by the issuance of multiple writes in a short period of time. The write buffer also holds eviction data for write-back lines. The load/store unit opportunistica
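The merging behavior described for the write buffer can be pictured with a small software model. This is a sketch under stated assumptions, not the core's actual logic: the datasheet gives the organization (four 32-byte buffers, each tied to one 32-byte aligned block), while the oldest-first flush policy and the class name `MergingWriteBuffer` are assumptions made for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 32   # each buffer covers one 32-byte aligned block (datasheet)
NUM_BUFFERS = 4   # four buffers (datasheet)


class MergingWriteBuffer:
    """Toy model: writes to the same aligned block merge into one entry."""

    def __init__(self):
        self.entries = OrderedDict()  # block address -> {offset: byte}
        self.issued = []              # transactions sent to the bus

    def write(self, address, data):
        for i, byte in enumerate(data):
            addr = address + i
            block = addr & ~(LINE_BYTES - 1)
            if block not in self.entries:
                if len(self.entries) == NUM_BUFFERS:
                    self._flush_oldest()  # assumed policy: oldest first
                self.entries[block] = {}
            self.entries[block][addr - block] = byte  # later byte wins

    def _flush_oldest(self):
        block, data = self.entries.popitem(last=False)
        self.issued.append((block, data))

    def flush_all(self):
        while self.entries:
            self._flush_oldest()


wb = MergingWriteBuffer()
wb.write(0x1000, b"\x01\x02\x03\x04")
wb.write(0x1004, b"\x05\x06\x07\x08")
wb.flush_all()
print(len(wb.issued))
```

In this model the two 4-byte stores land in the same 32-byte block, so a single transaction is issued instead of two. That merging is the effect the datasheet credits with reducing external write traffic under a write-through policy.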
11. specify which shadow register to use on entry to a particular vector. The shadow registers further improve interrupt latency by avoiding the need to save context when invoking an interrupt handler. Modes of Operation: The 74Kc core supports four modes of operation: user mode, supervisor mode, kernel mode, and debug mode. User mode is most often used for application programs. Supervisor mode provides an intermediate privilege level with access to the ksseg address space. Supervisor mode is not supported with the fixed mapping MMU. Kernel mode is typically used for handling exceptions and operating system kernel functions, including CP0 management and I/O device accesses. An additional Debug mode is used during system bring-up and software development. Refer to "EJTAG Debug Support" for more information on debug mode. Memory Management Unit (MMU): The 74Kc core contains a Memory Management Unit (MMU) that is primarily responsible for converting virtual addresses to physical addresses and providing attribute information for different segments of memory. At synthesis time, the type of MMU can be chosen independently from the following options: • Translation Lookaside Buffer (TLB) • Fixed Mapping Translation (FMT). The following sections explain the MMU options in more detail. Translation Lookaside Buffer (TLB): The basic TLB f
12. their destination marked unavailable. • Load misses and Stores (hits and misses) are activated in the LSU for further processing. address. This translated physical address is used to compare against tags in the instruction cache to determine a hit. The functionality of the IFU is spread across 4 core-visible pipeline stages in MIPS32 mode. Additional stages are in the shadow of execution and do not account for the minimum recirculation path in the event of a PC redirection. In the MIPS16e mode, the IFU takes an additional 3 stages to recode and expand the compressed code. There is a 12-entry Instruction Buffer to decouple the instruction fetch from execution. Up to 4 instructions can be written into this buffer, but a maximum of 2 instructions can be read from this buffer by the IDU. The IFU can also be configured to allow for hardware prefetching of cache lines on a miss. When an instruction cache miss is detected, the IFU can prefetch the next 0, 1, or 2 lines (besides the missed line) to reduce average miss latency. The number of prefetched lines can be configured by software via Config7 register settings. MIPS16e Application Specific Extension: The 74Kc core includes support for the MIPS16e ASE. This ASE improves code density by using 16-bit encoding of many MIPS32 instructions plus some MIPS16e-specific instructions. PC-relative loads allow quic
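The decoupling role of the 12-entry Instruction Buffer can be seen in a simple occupancy model. The sketch below uses the three figures given above (12 entries, up to 4 written and up to 2 read per cycle); the within-cycle ordering (reads before writes) and the function name are assumptions for illustration, and the model ignores branches, redirects, and stalls.

```python
BUFFER_ENTRIES = 12  # datasheet: 12-entry Instruction Buffer
MAX_WRITE = 4        # datasheet: up to 4 instructions written per cycle
MAX_READ = 2         # datasheet: at most 2 read per cycle by the IDU


def simulate_instruction_buffer(fetched_per_cycle):
    """Return (accepted, issued, occupancy) for each simulated cycle."""
    occupancy = 0
    history = []
    for fetched in fetched_per_cycle:
        # Assumption: the IDU drains the buffer before the IFU refills it.
        issued = min(MAX_READ, occupancy)
        occupancy -= issued
        accepted = min(fetched, MAX_WRITE, BUFFER_ENTRIES - occupancy)
        occupancy += accepted
        history.append((accepted, issued, occupancy))
    return history


for cycle, state in enumerate(simulate_instruction_buffer([4] * 7)):
    print(cycle, state)
```

With a steady fetch of 4 per cycle the buffer fills after a few cycles, and from then on the fetch side is limited to the 2 instructions per cycle that the IDU removes. The slack built up in the meantime is what lets the IDU keep issuing across a short fetch bubble.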
13. two types of simple hardware breakpoints implemented in the 74Kc core: Instruction breakpoints and Data breakpoints. During synthesis, the 74Kc core can be configured to support the following breakpoint options: • Zero instruction, zero data breakpoints • Four instruction, two data breakpoints. Instruction breaks occur on instruction fetch operations, and the break is set on the virtual address. Instruction breaks can also be made on the ASID value used by the MMU. A mask can be applied to the virtual address to set breakpoints on a range of instructions. Data breakpoints occur on load and/or store transactions. Breakpoints are set on virtual address and ASID values, similar to the Instruction breakpoint. Data breakpoints can also be set based on the value of the load/store operation. Finally, masks can be applied to the virtual address, ASID value, and the load/store value. In debug mode, EJTAG can request that a soft reset be masked. This request is signalled via the EJ_SRestE pin. When this pin is deasserted, the system can choose to block some sources of soft reset. Hard resets, such as power-on reset or a reset switch, should not be blocked by this signal. This reset pin has no effect inside the core. Fast Debug Channel: The 74Kc CPU includes the EJTAG Fast Debug Channel (FDC) as a mechanism for efficient bidirectional data transfer between the CPU and the debug probe. Data is transferred serially via the TAP i
14. 1-bit-per-clock, radix-2 iterative SRT algorithm. The operands are always normalized, i.e., leading zeroes in the divisor and dividend are removed. This reduces the total number of cycles required to produce the result. Divide operations block the MDU and will not allow another MDU operation to enter until the current operation is complete. The MDU, however, looks ahead and informs the IDU that a divide operation is about to complete, which prevents any bubbles in the MDU pipeline. Table 1 lists the repeat rate (i.e., peak rate in cycles at which these operations may be issued consecutively) and latency (number of cycles until a result is available) for the 74Kc core multiply and divide instructions. The approximate latency and repeat rates are listed in terms of pipeline clocks. Table 1: 74Kc Core Integer Multiply/Divide Unit Latencies and Repeat Rates (columns: Opcode; Operand Size (mul rt) (div rs); Latency; Repeat Rate). MULT, MULTU, MADD, MADDU, MSUB, MSUBU: 32 bits; 5. MUL: 32 bits. DIV, DIVU: 8 bits, 32 bits; Min 11, Min 11; Max 50, Max 50. (1) If there is no data dependency, a MUL can be issued every cycle. CorExtend Unit: The CorExtend unit allows the user to add a functional unit to the 74Kc core pipeline with access to all programmer-visible GPR and Accumulator state. The user will be provided with a template to define the operand for
15. FU interface consists of 128 bits (4 MIPS32 instructions) with 16 bits of parity. The LRU replacement bits (6 bits) are shared among the 4 ways of the data and tag array and are stored in a separate array. The instruction cache block also contains and manages the two instruction line fill buffers. Besides accumulating data to be written to the cache, instruction fetches that reference data in the line fill buffer are serviced either by a bypass of that data or by data coming from the external interface. The instruction cache control logic controls the bypass function. The 74Kc core supports instruction cache locking. Cache locking allows critical code segments to be locked into the cache on a per-line basis, enabling the system programmer to maximize the efficiency of the system cache. The cache locking function is always available on all instruction cache entries. Entries can then be marked as locked or unlocked on a per-entry basis using the CACHE instruction. Data Cache: The data cache is an on-chip memory block of 0/16/32/64 KB, with 4-way associativity. Because the data cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access. A tag entry holds 21 bits of physical address, a valid bit, a lock bit, and an optional parity bit. At each tag entry there is also a corresponding 21-bit virtual tag. The data entry
16. MIPS Technologies. MIPS32 74Kc Processor Core Datasheet, June 03, 2011. The MIPS32 74Kc core from MIPS Technologies is a high-performance, low-power, 32-bit RISC Superscalar core designed for custom system-on-chip (SoC) applications. The core is designed for semiconductor manufacturing companies, ASIC developers, and system OEMs who want to rapidly integrate their own custom logic and peripherals with a high-performance RISC processor. Fully synthesizable and highly portable across processes, it can be easily integrated into full SoC designs, allowing developers to focus their attention on end-user products. The 74Kc core implements the MIPS32 Release 2 Architecture in a superscalar, out-of-order execution pipeline. The deeply pipelined core can support a peak issue and graduation rate of 2 instructions per cycle. The 74Kc core also implements the MIPS DSP ASE, Revision 2.0, which provides support for signal processing instructions, and includes support for the MIPS16e ASE and the 32-bit privileged resource architecture. This architecture is supported by a wide range of industry-standard tools and development systems. The 74Kc core has a Level-1 (L1) Instruction Cache, which is configurable at 0, 16, 32, or 64 KB in size. It is organized as 4-way set associative. Up to four instruction cache misses can be outstanding. The instruction cache is virtually indexed and physically tagged to make the data access independent of vir
17. User Defined Instruction Set Extensions: • Allows user to define and add instructions to the core at build time • Maintains full MIPS32 compatibility • Includes access to GPRs and Accumulator registers • Instruction operand format (source, destination registers) and latency specified by a programmable template • Allows latencies of 3, 5, or > 5 cycles when destination is a GPR/Accumulator. Single-cycle latency is allowed when there is no modification to the architectural state of the 74Kc core • Allows in-order issue of CorExtend instructions that do not modify the 74Kc core architectural state • Supported by industry-standard development tools. Relocatable Reset Vector: • Support for user (pin) programmable reset vector in a multi-core environment. Power Control: • Minimum frequency: 0 MHz • Power-down mode (triggered by WAIT instruction) • Support for software-controlled clock divider • Support for top-level, block-level, fine-grained, and data cache clock gating. EJTAG Debug 5.0: • Support for single stepping • Instruction address and data address/value breakpoints • TAP controller is chainable for multi-CPU debug • Cross-CPU breakpoint support • Relocatable debug handler. MIPS Trace: • PDtrace version 6 compliant. Testability: • Full scan design achiev
18. able. Consumers that have already been dispatched are replayed through the pipe and held back at the IDU on their second pass through the pipe. Loads that hit in the data cache and bypass to the AGEN pipe have a 4-cycle load-use latency, while those that bypass to the ALU pipe will have a 3-cycle load-use latency. Graduated load misses and store hits and misses are sent in order to the Load Store Graduation Buffer (LSGB). The LSGB has corresponding data and address buffers to hold all relevant attributes. LSGB entries are processed in FIFO order, with data cache updates and requests made at one canonical point. Cache fill requests are merged and processed at this point. A 4-entry Fill Store Buffer (FSB) tracks outstanding fill requests and fills the data cache when the line is completely received. Each FSB entry can hold an entire cache line. The Load Data Queue (LDQ) keeps track of outstanding load misses and forwards the critical data to the main pipe as soon as it becomes available. The FSB also holds data for store instructions (regardless of a hit or miss in the cache) that have not yet updated the cache. Loads that reference the same line as the pending store in the FSB will receive the store data bypassed if they are younger than the store, and the incoming line is merged with the store data before being written into the cache. Loads that are older than the store are tracked in the Load Data Queue (LDQ) and will receive the data when it
19. agged • Cache line locking support • Up to 4 outstanding I-cache misses • Virtual tag-based hit prediction in data cache • Up to 4 unique outstanding D-cache line misses and 9 total load misses • Writeback and write-through support in data cache • Non-blocking data cache prefetches • Optional parity support. Scratchpad RAM support: • Independent Instruction and Data Scratchpad RAMs • Scratchpad RAM size from 4KB to 1MB • Independent of cache configuration • 64-bit OCP interfaces for external DMA • OCP port runs at the same core/bus clock ratio as the BIU interface. Front-side L2 support: • Support for inline L2 cache • L2 cache can be configured to be bypassable. Bus Interface: • OCP version 2.1 interface with 32-bit address and 64-bit data • OCP version 2.1 interface runs at core/bus clock ratios of 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, or 10 via a separate synchronous bus clock • Clock ratio can be changed dynamically • Burst size of four 64-bit beats • 4-entry write buffer • "Simple" byte enable mode allows easier bridging to other bus standards • Extensions for front-side L2 cache. Multiply/Divide Unit: • Maximum issue rate of one 32x32 multiply per clock • 7-cycle multiply latency • Iterative SRT divide algorithm; minimum 10 and maximum 50 clock latency (dividend (rs) sign extension-dependent). CorExtend
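The bus interface figures above lend themselves to a quick worked calculation: the OCP clock follows from the core clock and the chosen ratio, and a burst of four 64-bit beats carries exactly one 32-byte cache line. The sketch below uses only the ratios and burst shape listed in this datasheet; the helper names and the 1000 MHz core clock in the example are hypothetical.

```python
from fractions import Fraction

# Core:bus clock ratios listed in the datasheet.
SUPPORTED_RATIOS = [
    Fraction(r) for r in ("1", "1.5", "2", "2.5", "3", "3.5", "4", "5", "10")
]

BURST_BEATS = 4   # four beats per burst (datasheet)
BEAT_BITS = 64    # 64-bit data path (datasheet)


def ocp_clock_mhz(core_mhz, ratio):
    """Return the OCP bus clock for a core clock and a supported ratio."""
    ratio = Fraction(str(ratio))
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported core/bus clock ratio: {ratio}")
    return Fraction(core_mhz) / ratio


def burst_bytes():
    """Bytes moved by one fixed-length burst."""
    return BURST_BEATS * BEAT_BITS // 8


print(float(ocp_clock_mhz(1000, 2.5)), burst_bytes())
```

The burst size matching the 32-byte cache line means a line refill or eviction maps onto a single burst request.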
20. ch history tables can be updated. Load misses and store hits/misses are sent to the LSGB for further processing. When the LSU receives the data back from outside, it directly updates the architectural state, but the GRU ensures that the LSGB is kept up to date so that only the latest data is written. If there is no space in the LSGB, the GRU will stop graduating load/store instructions, which holds up the release of completion buffers. The GRU also handles instructions such as CACHE, MTC0, and TRAP-on-condition type operations that require serialized operation. During such operations, the GRU throttles down to graduating 1 instruction per cycle; otherwise the GRU will always attempt to graduate 2 instructions per cycle. System Control Coprocessor (CP0): In the MIPS architecture, CP0 is responsible for the virtual-to-physical address translation and cache protocols, the exception control system, the processor's diagnostic capability, the operating modes (kernel, user, supervisor, and debug), and whether interrupts are enabled or disabled. Configuration information, such as cache size and associativity, and the presence of features like MIPS16e or a floating point unit, are also available by accessing the CP0 registers. CP0 also contains the state used for identifying and managing exceptions. Exceptions can be caused by a variety of sources, including boundary cases in
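The graduation policy described for the GRU (up to 2 instructions per cycle in program order, throttling to 1 around serialized operations) can be sketched as a grouping function. This is a simplified illustration: it assumes every instruction has already completed, treats "throttles to 1 per cycle" as "a serialized instruction graduates alone", and ignores exceptions and LSGB back-pressure; the function name is hypothetical.

```python
def graduation_cycles(instructions):
    """Group instructions into per-cycle graduation sets.

    instructions: list of (name, serialized) tuples in program order.
    Returns a list of lists, one inner list per cycle.
    """
    cycles = []
    i = 0
    while i < len(instructions):
        name, serialized = instructions[i]
        group = [name]
        i += 1
        # Pair with the next instruction only if neither one is serialized.
        if not serialized and i < len(instructions) and not instructions[i][1]:
            group.append(instructions[i][0])
            i += 1
        cycles.append(group)
    return cycles


program = [
    ("ADD", False),
    ("SUB", False),
    ("CACHE", True),   # serialized operation per the datasheet
    ("LW", False),
    ("OR", False),
]
print(graduation_cycles(program))
```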
21. ctionality that is supported. Table 3: OCP Performance Report (Core Attribute: Description). Core name: 74Kc. Core identity: core code 0x10a, revision code 0x1; additional identification is available in the PrID and EBase Coprocessor0 registers. Process dependent: Yes. Frequency range, Area, Power estimate: Core is synthesizable, so these parameters vary according to process libraries and implementation. Special reset requirements: No. Number of Interfaces: 1 OCP master. Master OCP Interface: Operations issued: RD, WR. Issue rate (per OCP cycle): One per cycle for all types listed above, except for a non-standard RD (SYNC), which depends on ACK latency. Maximum number of operations outstanding: 12 operations (4 LSU reads, 4 IFU reads, and 4 WBB flushes). All writes are posted, so the OCP fabric determines the maximum number of outstanding writes. Burst support and its effect on issue rates: Fixed burst length of four 64-bit beats with single request per burst. Burst sequences of WRAP or XOR are supported. High-level flow control: None. Number of threads supported and use of those threads: All transactions utilize a single thread. Table 3: OCP Performance Report (Continued). Connection ID and use of connection information: None. Use of
22. el, block-level, fine-grain, D-cache, or none; N/A. Control and Observe flops: Present or not; N/A. Repeat rate for CorExtend instructions using private state: 1 through 15; N/A. Number of CorExtend completion buffers: 1 through 15; N/A. Sideband inputs to external CorExtend module: Bus width in bits; N/A. Sideband outputs to external CorExtend module: Bus width in bits; N/A. (1) These bits indicate the presence of external blocks. Bit will not be set if interface is present, but block is not. Revision History: Change bars (vertical lines) in the margins of this document indicate significant changes in the document since its last release. Change bars are removed for changes that are more than one revision old. This document may refer to Architecture specifications (for example, instruction set descriptions and EJTAG register definitions), and change bars in those sections indicate changes since the previous version of the relevant Architecture document. Revision; Date; Description. 00.50; May 31, 2006; Initial document. 01.00; January 30, 2007; Preliminary external release. 01.01; May 18, 2007; General Access Release. Updates for single-cycle ALU operations, instruction latencies, and pipeline stages. 01.02; November 1, 2007; Pipeline stage merge changes, L2 support, ADD latency, and Misc. chang
23. es. 01.03; December 14, 2007; Add support for sequential hardware prefetching by IFU. Change ALU and AGEN pipestage count. 01.04; November 14, 2008; Corrected outstanding I-cache misses. Updates for ISPRAM, PDtrace features. 01.05; June 04, 2010; Added FDC, JRC information. Reflects ITLB page size change, additional probe data transfer width. 01.06; March 30, 2011; Minor edits. 01.07; June 03, 2011; Corrected FDC-related information. Copyright © 2006-2011 MIPS Technologies Inc. All rights reserved. Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries. This document contains information that is proprietary to MIPS Technologies, Inc. ("MIPS Technologies"). Any copying, reproducing, modifying, or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS Technologies or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines. Any document provided in source format (i.e., in a modifiable form such as in FrameMaker or Microsoft Word format) is subject to
24. es test coverage in excess of 99% (dependent on library and configuration options). PC, data address, data value, performance counter value, and processor pipeline inefficiency tracing, with trace compression. Optional memory BIST for internal SRAM arrays. Support for on-chip and off-chip trace memory. Pipeline Flow. The 74Kc core implements a 14/15-stage pipeline. Three extra fetch stages are conditionally added when executing MIPS16e instructions. This pipeline allows the processor to achieve a high frequency while maintaining optimal area and power numbers. Figure 2 shows the 74Kc core pipeline. [Figure 2: 74Kc Core Pipeline. Legible stage labels: IT, ID, IS, IB, DD, DR, DS, DM, WB, GC, grouped under IFU, IDU, and GRU, plus the IFU stages added for MIPS16e mode.] Instruction Fetch Unit (IFU). IT (Instruction Tag Read): I-cache tag arrays accessed; Branch History Table, JRC accessed; ITLB address translation performed; instruction watch and EJTAG break compares done. ID (Instruction Data Read): I-cache data array accesses; tag compare, detect I-cache hit. IS (Instruction Select): way select; target calculation start. IB (Instruction Buffer): Instruction Buffer write; target calculation done. IR (Instruction Recode): MIPS16e instruction recode.
25. escription of these fields. The value of some options that do not have a functional effect on the core are not visible to software. For a core that has already been built, software can determine the value of many of these options by
Table 4: Build-time Configuration Options
Configuration Option | Choices | Software Visibility
Memory Management Type | TLB or FMT | Config_MT
TLB Size | 16, 32, 48, or 64 dual entries | Config1_MMUSize
Integer Register File sets | 1, 2, or 4 | SRSCtl_HSS
Instruction/Data hardware breakpoints | 0/0 or 4/2 | DCR_IB, IBS_BCN
Fast Debug FIFO Sizes | Min (2 Tx, 2 Rx), Useful (12 Tx, 4 Rx) | FDCFG
MIPS Trace support | Present or not | Config3_TL
MIPS Trace memory location | On-core, off-chip, or both | TCBCONFIG_OnT, TCBCONFIG_OfT
MIPS Trace on-chip memory size | 256 B to 8 MB | TCBCONFIG_SZ
MIPS Trace triggers | 0 to 8 | TCBCONFIG_TRIG
MIPS Trace source field bits in trace word | 0, 2, or 4 | TCBCONTROLB_TWSrcWidth
CorExtend Block | Present or not | Config_UDI
Data ScratchPad RAM interface | Present or not | Config_DSP
Instruction ScratchPad RAM interface | Present or not | Config_ISP
I-cache size | 0, 16, 32, or 64 KB | Config1_IL, Config1_IS
D-cache size | 0, 16, 32, or 64 KB | Config1_DL, Config1_DS
D-cache hardware aliasing support | Present or not (for 32 KB and 64 KB only; MMU type is TLB) | Config7_AR
Cache parity | Present or not | ErrCtl_PE
Memory BIST | Integrated (March C or March C plus IFA-13), custom, or none | N/A
Clock gating | Top-lev
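As an illustration of how software might query a register field to recover build-time options, the following minimal sketch decodes a raw Config1 value. The bit positions and encodings used here are assumptions based on the generic MIPS32 privileged architecture layout, not taken from this datasheet; the function name `decode_config1` and the example value are hypothetical. The sketch does not handle the special sets-per-way encoding 7, and the TLB entry count is only meaningful when the MMU type is TLB.

```python
def decode_config1(config1: int) -> dict:
    """Decode TLB size and L1 cache sizes from a raw 32-bit Config1 value.

    Field positions are assumptions from the generic MIPS32 layout:
    MMUSize[30:25], IS[24:22], IL[21:19], IA[18:16],
    DS[15:13], DL[12:10], DA[9:7].
    """

    def cache_bytes(sets_field: int, line_field: int, assoc_field: int) -> int:
        if line_field == 0:
            return 0                      # line-size encoding 0: no cache present
        sets_per_way = 64 << sets_field   # encodings 0..6 give 64..4096 sets per way
        line_bytes = 2 << line_field      # encoding 4 gives a 32-byte line
        ways = assoc_field + 1            # encoding 3 gives 4 ways
        return sets_per_way * line_bytes * ways

    return {
        "tlb_entries": ((config1 >> 25) & 0x3F) + 1,
        "icache_bytes": cache_bytes((config1 >> 22) & 0x7,
                                    (config1 >> 19) & 0x7,
                                    (config1 >> 16) & 0x7),
        "dcache_bytes": cache_bytes((config1 >> 13) & 0x7,
                                    (config1 >> 10) & 0x7,
                                    (config1 >> 7) & 0x7),
    }


# Hypothetical register value: 64 dual TLB entries, 32 KB I-cache, no D-cache.
example = (63 << 25) | (2 << 22) | (4 << 19) | (3 << 16)
print(decode_config1(example))
```

On real hardware the value would come from a coprocessor 0 read in kernel mode; the sketch only shows the field arithmetic.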
26. holds 64 bits of data per way, with optional parity per byte. There are 4 data entries for each tag entry. The tag and data entries exist for each way of the cache. There is an additional array that holds the dirty and LRU replacement algorithm bits for all 4 ways (6 bits LRU, 4 bits dirty, and optionally 4 bits dirty parity). When using 4 KB pages in the TLB and 32 or 64 KB cache sizes, virtual aliasing can occur, in which a single physical address can exist in multiple cache locations if it was accessed via different virtual addresses. For a 32 KB data cache, there is an implementation option to eliminate virtual aliasing. If this option is not selected, or a 32 or 64 KB cache is implemented, software must take care of any aliasing issues by using a page coloring scheme or some other mechanism. The 74Kc core supports a data cache locking mechanism identical to that used in the instruction cache. Critical data segments are locked into the cache on a per-line basis. The locked contents can be updated on a store hit but will not be selected for replacement on a cache miss. The cache locking function is always available on all data cache entries. Entries can then be marked as locked or unlocked on a per-entry basis using the CACHE instruction. Cache Memory Configuration. The 74Kc core's on-chip instruction and data caches are usually implemented from readily available single-port synchronous SRAMs. The instruction tag array
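The aliasing condition above follows from simple arithmetic: with a virtually indexed 4-way cache, each way spans one quarter of the cache, and aliasing becomes possible once a way is larger than a page. The sketch below, under that assumption, counts the resulting page colors and checks whether two virtual addresses share a color. The helper names `page_colors` and `same_color` are illustrative only and do not describe any hardware or OS interface.

```python
PAGE_BYTES = 4096   # smallest TLB page size named in the text
WAYS = 4            # the D-cache is 4-way set associative


def page_colors(cache_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Number of page colors; a value above 1 means virtual aliasing can occur."""
    way_bytes = cache_bytes // WAYS
    return max(1, way_bytes // page_bytes)


def same_color(vaddr_a: int, vaddr_b: int, cache_bytes: int,
               page_bytes: int = PAGE_BYTES) -> bool:
    """True if both virtual addresses index the same region of a cache way."""
    colors = page_colors(cache_bytes, page_bytes)
    return (vaddr_a // page_bytes) % colors == (vaddr_b // page_bytes) % colors


for size_kb in (16, 32, 64):
    print(size_kb, "KB cache:", page_colors(size_kb * 1024), "color(s)")
```

A page coloring allocator would only map a physical page at virtual addresses of one color, so that every mapping of that page lands on the same cache index.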
27. ions (see Figure 1). Instruction Fetch Unit (IFU). The Instruction Fetch Unit (IFU) is responsible for fetching instructions from the Instruction Cache, Instruction Scratchpad, or Memory and feeding them to the execution units. The IFU can fetch up to 4 instructions at a time from an aligned PC. The IFU uses majority branch prediction based on a gshare predictor. There are three 256-entry Branch History Tables that are indexed by different combinations of instruction PC and Global History. The majority of these 3 predictions is used to determine the predicted direction of a conditional branch. The IFU also has an 8-entry Return Prediction Stack to predict subroutine return addresses and a 64-entry jump indirect target address predictor. A 4-way, 16-entry-per-way buffer learns and predicts the target addresses for indirect jumps. The IFU has a 4-entry microTLB which is used to translate the virtual address into the physical
Write execution results into ALU and AGEN completion buffers. Update all GRU structures to indicate instruction completion. Oldest 2 entries that have completed execution are identified, and their addresses are obtained to read the completion buffers and associated information to graduate 2 instructions. GC (Graduation Complete): Two instructions are graduated and Register File data is obtained for update. Load misses are graduated with
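The majority scheme described above can be sketched as three tables of 2-bit saturating counters whose predictions are combined by a vote. The datasheet does not give the index functions, so the three hashes below (PC only, PC XOR history, PC XOR shifted history), the 8-bit history length, and the initial counter value are all assumptions chosen for illustration; the class name `MajorityPredictor` is hypothetical.

```python
class MajorityPredictor:
    ENTRIES = 256          # entries per Branch History Table, from the text
    HISTORY_BITS = 8       # assumed global-history length

    def __init__(self) -> None:
        # 2-bit saturating counters, initialised to "weakly not taken" (assumed)
        self.tables = [[1] * self.ENTRIES for _ in range(3)]
        self.history = 0

    def _indices(self, pc: int) -> tuple:
        word = pc >> 2     # drop the byte offset within a 32-bit instruction
        return (
            word % self.ENTRIES,                          # assumed: PC only
            (word ^ self.history) % self.ENTRIES,         # assumed: gshare-style hash
            (word ^ (self.history >> 1)) % self.ENTRIES,  # assumed: shifted history
        )

    def predict(self, pc: int) -> bool:
        votes = sum(self.tables[t][i] >= 2
                    for t, i in enumerate(self._indices(pc)))
        return votes >= 2  # majority of the three table predictions

    def update(self, pc: int, taken: bool) -> None:
        for t, i in enumerate(self._indices(pc)):
            counter = self.tables[t][i]
            self.tables[t][i] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


predictor = MajorityPredictor()
for _ in range(20):
    predictor.update(0x1000, True)
print(predictor.predict(0x1000))
```

Voting across differently indexed tables reduces the chance that a single aliased entry flips the prediction, which is the usual motivation for this kind of combined predictor.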
28. ites the instruction ID, completion buffer ID, and related information into structures in the Graduation Unit (GRU). The GRU reads instructions and corresponding results from the completion buffer, graduates the instructions, and updates the architectural state of the machine. Execution Units. The 74Kc core execution unit implements two pipes: an ALU pipe for handling all arithmetic operations (logical, shift, add, subtract) and an AGEN pipe for handling all load/store operations and control transfer instructions, along with an autonomous multiply/divide unit (MDU) and CorExtend unit. The MDU and CorExtend pipe share control logic with the ALU pipe. There is a 31-entry, 32-bit register file that is shared by both pipes. There is a separate 18-entry, 64-bit completion buffer for the ALU pipe and a 14-entry, 32-bit completion buffer for the AGEN pipe. ALU Pipe. The ALU pipe spans four stages as follows. The first two stages (AF, AM) of the ALU pipe are used to prepare operands: read the register file and completion buffer, and mux-select all operands for the arithmetic operation. Execution is performed in the AC stage, which includes: an Arithmetic Logic Unit (ALU) for performing arithmetic and bitwise logical operations; a shifter; and a Leading Zero/One detect unit for implementing the CLZ and CLO instructions. All logical operations, some arithmetic operations (ADD (rt=0), ADDU (rt=0), LUI, SEH, SEB, ZEH, ZEB, SLT, SLT
29. k access to constants. SAVE and RESTORE macro instructions provide for single-instruction stack frame setup/teardown for efficient subroutine entry/exit. Instruction Decode and Dispatch Unit (IDU). This unit is responsible for receiving instructions from the IFU and dispatching them to the execution units when their operands and required resources are available. Up to two instructions can be received in order from the IFU per cycle. The instructions are assigned an instruction ID and a completion buffer ID, which identifies a buffer location to hold results temporarily. The instruction is also renamed by looking up in a Rename Map, and the source registers are replaced (if necessary) by completion buffer IDs of producer instructions, so that operands may be bypassed as soon as possible. Renamed instructions are assigned to one of two pipes (ALU or AGEN) and written into the Decode and Dispatch Queue (DDQ). The oldest instruction that has all the operands ready and meets all resource requirements is dispatched independently to the corresponding pipe. It is possible that instructions will be dispatched out of order relative to program order. Dispatched instructions do not stall in the pipe and write the results into the completion buffer. The IDU also keeps track of the progress of the instruction through the pipe, updating the availability of operands in the Rename Map and in all dependent instructions in the DDQ. The IDU also wr
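The oldest-ready dispatch rule described above can be sketched as follows. This is a behavioural model only: the `DdqEntry` record, its field names, and the use of a single list tagged by pipe are assumptions made for illustration, and resource checks are folded into the single `ready` flag.

```python
from dataclasses import dataclass


@dataclass
class DdqEntry:
    age: int      # position in program order; lower is older
    pipe: str     # "ALU" or "AGEN"
    ready: bool   # operands available and resource requirements met


def select_for_dispatch(ddq):
    """Pick, independently for each pipe, the oldest entry that is ready."""
    chosen = {}
    for entry in sorted(ddq, key=lambda e: e.age):
        if entry.ready and entry.pipe not in chosen:
            chosen[entry.pipe] = entry
    return chosen


queue = [
    DdqEntry(age=0, pipe="ALU", ready=False),   # oldest, still waiting on an operand
    DdqEntry(age=1, pipe="ALU", ready=True),
    DdqEntry(age=2, pipe="AGEN", ready=True),
    DdqEntry(age=3, pipe="AGEN", ready=True),
]
picked = select_for_dispatch(queue)
print({pipe: e.age for pipe, e in picked.items()})
```

In this example the ALU entry with age 1 is dispatched ahead of the older entry with age 0, which is the out-of-order behaviour the text refers to.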
30. le byte enables is on an uncached tri-byte load (LWL/LWR). In SimpleBE mode, such a read will be converted into a word read on the external interface. Writes with non-simple byte enable patterns can arise when a sequence of stores is processed by the merging write buffer, or from uncached tri-byte stores (SWL/SWR). In SimpleBE mode, these stores will be broken into multiple write transactions. EJTAG Debug Support. The 74Kc core includes an Enhanced JTAG (EJTAG) block for use in software debugging of application and kernel code. For this purpose, in addition to standard user/supervisor/kernel modes of operation, the 74Kc core provides a Debug mode. Debug mode is entered when a debug exception occurs (resulting from a hardware breakpoint, single-step exception, etc.) and continues until a debug exception return (DERET) instruction is executed. During this time, the processor executes the debug exception handler routine. The EJTAG interface operates through the Test Access Port (TAP), a serial communication port used for transferring test data in and out of the 74Kc core. In addition to the standard JTAG instructions, special instructions defined in the EJTAG specification define which registers are selected and how they are used. There are several types of simple hardware breakpoints defined in the EJTAG specification. These breakpoints stop the normal operation of the CPU and force the system into debug mode. There are
31. lly pulls dirty data from the cache and sends it to the BIU. It is gathered in the write buffer and sent out as a bursted write. For uncached accelerated references, the write buffer can gather multiple writes together and then perform a bursted write in order to increase the efficiency of the bus. Uncached accelerated gathering is supported for word or doubleword. Gathering of uncached accelerated stores starts on cache-line-aligned addresses, i.e., 32-byte-aligned addresses. Uncached accelerated stores that do not meet the conditions required to start gathering are treated like regular uncached stores. When an uncached accelerated store meets the requirements needed to start gathering, a gather buffer is reserved for this store. All subsequent uncached accelerated word or doubleword stores to the same 32-byte region will write sequentially into this buffer, independent of the word address associated with these latter stores. The uncached accelerated buffer is tagged with the address of the first store. SimpleBE Mode. To aid in attaching the 74Kc core to structures that cannot easily handle arbitrary byte enable patterns, there is a mode that generates only simple byte enables. Only byte enables representing naturally aligned byte, halfword, word, and doubleword transactions will be generated. The only case in which a read can generate non-simp
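To make the notion of a "simple" byte enable pattern concrete, the sketch below treats the byte enables of the 64-bit data bus as an 8-bit mask and accepts only naturally aligned byte, halfword, word, and doubleword patterns. The second helper shows one possible way a non-simple pattern could be broken into simple transactions; the actual splitting order used by the core is not stated in this document, so the greedy largest-first strategy and both function names are assumptions.

```python
BUS_BYTES = 8  # 64-bit data bus: one enable bit per byte lane


def is_simple_be(mask: int) -> bool:
    """True for a naturally aligned byte, halfword, word, or doubleword pattern."""
    for size in (1, 2, 4, 8):
        for start in range(0, BUS_BYTES, size):
            if mask == ((1 << size) - 1) << start:
                return True
    return False


def split_into_simple(mask: int) -> list:
    """Greedily cover a byte-enable mask with simple patterns (assumed strategy)."""
    parts = []
    for size in (8, 4, 2, 1):
        for start in range(0, BUS_BYTES, size):
            block = ((1 << size) - 1) << start
            if (mask & block) == block:
                parts.append(block)
                mask &= ~block   # these byte lanes are now covered
    return parts


tri_byte = 0b0000_0111               # three low byte lanes, as in a tri-byte store
print(is_simple_be(tri_byte))
print([bin(p) for p in split_into_simple(tri_byte)])
```

For the tri-byte pattern the sketch yields one halfword and one byte transaction, which is consistent with the statement that such stores are broken into multiple writes.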
32. mat and latency for the new instruction(s) to be added. Up to 15 new instructions may be added. Each instruction may select up to 2 source GPRs and/or Accumulator from the complete architectural state of 32 GPRs and 4 accumulators. The instruction may have a destination of either a GPR, an accumulator, or a private state. The latency for each instruction is also selectable to be either 3, 5, or >5 cycles. Instructions with a destination of private state have a latency of 1 cycle. The CorExtend unit may also have private architectural state, and the existence of such state can be indicated in the template to restrict out-of-order issue. If there is no private state, or there is no dependence on private state, then the IDU, along with the ALU and MDU pipes, manages the dependency checking, operand delivery, and results update. If a CorExtend instruction has its source and/or destination operands from its own private state, it will be issued in program order. The CorExtend unit is synthesized along with the core and will have an external interface for access to any state within that unit. The number of completion buffers for CorExtend instructions is selectable at synthesis configuration time from 1 to 15, and this will determine the number of CorExtend instructions that can be in flight before graduating. This is analogous to the ALU and AGEN completion buffers. The repeat rate of CorExtend instructions that can be issued back
33. med by comparing the upper bits of the virtual address (along with the ASID) with each of the entries in the tag portion of the joint TLB structure. The JTLB is organized as pairs of even and odd entries that map pages ranging in size from 4 KB to 256 MB, in factors of four, to the 4 GB physical address space. The JTLB is organized in page pairs to minimize the overall size. Each tag entry corresponds to two data entries: an even page entry and an odd page entry. The highest-order virtual address bit not participating in the tag comparison is used to determine which of the data entries is used. Because page size can vary on a page-pair basis, the determination of which address bits participate in the comparison, and which bit is used to make the even-odd determination, is decided dynamically during the TLB lookup. Instruction TLB (ITLB). The ITLB is a 4-entry structure dedicated to performing translations for the instruction stream. The ITLB maps only 4 KB or 16 KB pages/subpages. For 4 KB or 16 KB pages, the entire page is mapped in the ITLB. If the main TLB page size is between 4 KB and 16 KB, only the current 4 KB subpage is mapped. Similarly, for page sizes larger than 16 KB, the current 16 KB subpage is mapped. The ITLB is managed by hardware and is transparent to software. The larger JTLB is used as a
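The even/odd selection described above can be worked through numerically. In the sketch below, the bit just above the page offset selects the even or odd data entry, and the bits above that form the value compared against the tag. The function names are hypothetical, the ASID comparison is omitted, and `itlb_mapping_bytes` simply encodes the subpage rule quoted in the text.

```python
KB = 1024


def split_vaddr(vaddr: int, page_bytes: int):
    """Return (tag bits, even/odd select bit, page offset) for one page size."""
    shift = page_bytes.bit_length() - 1      # log2 of the (power-of-two) page size
    offset = vaddr & (page_bytes - 1)
    odd = (vaddr >> shift) & 1               # highest-order bit outside the tag compare
    tag = vaddr >> (shift + 1)               # bits that take part in the tag compare
    return tag, odd, offset


def itlb_mapping_bytes(jtlb_page_bytes: int) -> int:
    """Size of the page or subpage the ITLB maps for a given JTLB page size."""
    if jtlb_page_bytes < 16 * KB:
        return 4 * KB                        # 4 KB page, or a 4 KB subpage
    return 16 * KB                           # 16 KB page, or a 16 KB subpage


for page in (4 * KB, 16 * KB):
    tag, odd, offset = split_vaddr(0x00403123, page)
    print(page // KB, "KB:", hex(tag), odd, hex(offset))
```

The same virtual address selects the odd entry with 4 KB pages and the even entry with 16 KB pages, which is why the selection has to be made dynamically during the lookup.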
34. n start. Logical operations, some shift and arithmetic operations complete and bypass the results. AB (ALU Results Bypass): Complete Integer Execution and bypass results. EM (AGEN Operand Mux): Select source operands for Load/Store index computation and set up for execution. EA (AGEN Effective Address Compute): Compute Effective Address for Load/Store instructions; select source operands for Store data and Branch/Jump instructions; start JTLB access. Load Store Unit (LSU). EC (Cache Access): Access D-cache and D-tag arrays, reading Virtual and Physical tags along with data; continue JTLB access; AGEN pipe resolves conditional branch and Jump instruction. ES (D-Cache way select): Select D-cache way based on Virtual tag match with Effective Address; start Physical Tag compare with JTLB data; AGEN pipe redirects IFU in the event of branch mis-predict or register indirect jump. EB (Cache Data Bypass): Complete data selection and align load data; bypass results (selected data) to both AGEN and ALU pipes; validate Virtual tag match with Physical tag comparison. Graduation Unit (GRU). WB (Writeback): Consolidate and propagate D-cache hit/miss information. 74Kc Core Logic Blocks. The 74Kc core consists of the logic blocks defined in the following subsect
35. n that supports slowing or halting the clocks to reduce system power consumption during idle periods. data addresses, data values, performance counters, and processor pipeline inefficiencies. The trace information is collected in an on-chip or off-chip memory for post-capture processing by trace regeneration software. Software-only control of trace is possible in addition to probe-based control. An optional on-chip trace memory may be configured in size from 256 B to 8 MB; it is accessed either through load instructions or the existing EJTAG TAP interface, which requires no additional chip pins. Off-chip trace memory is accessed through a special trace probe and can be configured to use 4, 8, 16, or 64 data pins plus a clock. The 74Kc core provides two mechanisms for system-level low-power support: register-controlled power management and instruction-controlled power management. Register-Controlled Power Management. The RP bit in the CP0 Status register provides a software mechanism for placing the system into a low-power state. The state of the RP bit is available externally via the SI_RP signal pin. The external agent then decides whether to place the device in a low-power mode, such as reducing the system clock frequency. Three additional bits, Status_EXL, Status_ERL, and Debug_DM, support the power management function by allowing the user to change the power s
36. nterface. A pair of memory-mapped FIFOs buffer the data, isolating software running on the CPU from the actual data transfer. Software can configure the FDC block to generate an interrupt based on the FIFO occupancy or can poll the status. [Figure 4: Fast Debug Channel. Legible labels: CPU, TxFIFO, RxFIFO, TAP, EJ_TDI, EJ_TDO.] MIPS Trace. The 74Kc core includes optional MIPS Trace support for real-time tracing of instruction addresses, Clock and Test Considerations. The following sections describe clocking, power management, and testability features. Clocking. The 74Kc core has various clock domains. Core domain: This is the main core clock domain, controlled by the SI_ClkIn clock input. OCP domain: This domain controls the OCP bus interface logic. This domain is synchronous to SI_ClkIn but can be run at lower frequencies; core-to-bus ratios of 1:1, 1:1.5, 1:2, 1:2.5, 1:3, 1:3.5, 1:4, 1:5, and 1:10 are supported. TAP domain: This is a low-speed clock domain for the EJTAG TAP controller, controlled by the EJ_TCK pin. It is asynchronous to SI_ClkIn. Power Management. The 74Kc core offers a number of power management features, including low-power design, active power management, and power-down modes of operation. The core is a static desig
37. of the user experience. BusBridge, Bus Navigator, CLAM, CorExtend, CoreFPGA, CoreLV, EC, FPGA View, FS2, FS2 FIRST SILICON SOLUTIONS logo, FS2 NAVIGATOR, HyperDebug, HyperJTAG, IASim, JALGO, Logic Navigator, Malta, MDMX, MED, MGB, microMIPS, OCI, PDtrace, the Pipeline, Pro Series, SEAD, SEAD-2, SmartMIPS, SOC-it, System Navigator, and YAMON are trademarks or registered trademarks of MIPS Technologies, Inc. in the United States and other countries. All other trademarks referred to herein are the property of their respective owners. Template: nDb0.02, Built with tags: 2B. MIPS32 74Kc Processor Core Datasheet, Revision 01.07, MD00496. Copyright 2006-2011 MIPS Technologies, Inc. All rights reserved.
38. IK (Instruction Decode): MIPS16e branch decode; MIPS16e target validate. IX (Instruction Expansion): MIPS16e macro expansion. Instruction Decode and Dispatch Unit (IDU). DD (Decode): Access Rename Map and get source register availability to resolve source dependency; decode instructions and assign pipe and instruction identifier; check execution resources. DR (Rename): Update Rename Map at destination register to resolve output dependency; send instruction information to Graduation Unit (GRU); send instruction to Decode and Dispatch Queue (DDQ). DS (Select for Dispatch): Check for operand and resource availability and mark valid instructions as ready for dispatch; select 1 out of 8 (6-entry DDQ + 2 staging registers) ready instructions in each ALU and AGEN pipe independently. DM (Instruction Mux): Read out the selected instruction from the previous stage and update the selection information; generate controls for source operand bypass mux; ALU pipe will start premuxing operands based on the selected instruction; AGEN pipe will start reading source operands from Register File and Completion Buffers. Integer Execution Unit (IEU). AF (ALU Register file Read): AGEN pipe will complete reading source operands from Register File and Completion Buffers. AM (ALU Operand Mux): Select source operands and set up for execution. AC (ALU Compute): Integer Executio
39. s the SI_Sleep signal, which is part of the system interface, whenever it has entered low-power mode (sleep mode). It will enter sleep mode when all bus transactions are complete and there are no running instructions. The WAIT instruction can put the processor in a mode where no instructions are running. When the WAIT instruction is seen by the IFU, subsequent instruction fetch is stopped. The WAIT instruction is dispatched down the pipe and graduated. Upon graduation of the WAIT, the GRU waits for the processor to reach a quiescent state and allows the processor to enter sleep mode. Local Clock Gating. A significant portion of the power consumed by the 74Kc core is often in the clock tree and clocking registers. The core has support for extensive use of local gated clocks. Clock gating can be turned on at the top level, block level, or at the register (fine-grained) level. Power-conscious implementors can use these gated clocks to significantly reduce power consumption within the core. Build-Time Configuration Options. The 74Kc core allows a number of features to be customized based on the intended application. D-Cache Clock Gating. Any load instruction involves reading of four ways of the data array, though the required data may be available only in one of the four ways of the D-cache. The way information for four recently used D-cache lines is stored in a data structure, and a subsequent load to one of those lines enables
40. stage delay of the instruction. The data address for load/store operations is calculated using a 32-bit adder in the EA stage. Data cache access and JTLB access for load/store instructions are performed in the EC stage. The EC stage is also used for resolving conditional branches and register indirect jumps. The ES and EB stages are used by the load/store instructions to select the appropriate way of data from the data cache, to compare the JTLB results with the physical tags, align the data, resolve any exceptions, and to bypass the data (if applicable) back into the ALU and AGEN pipes. The ES stage is also used to send the redirect PC to the IFU if there is a mis-predicted branch/jump instruction. Multiply/Divide Unit (MDU). The 74Kc core includes a multiply/divide unit (MDU) that contains a separate pipeline for integer multiply and divide operations. This unit also executes multiply-class instructions in the DSP ASE. This pipeline operates in parallel with the integer unit pipeline and has a separate write port to the ALU completion buffer. The MDU consists of a pipelined 32x32 multiplier, result/accumulation registers (HI and LO), a divide state machine, and the necessary multiplexors and control logic. The MDU supports execution of one multiply or multiply-accumulate operation every clock cycle. Divide operations are implemented with a simple
41. tate if an exception or error occurs while the 74Kc core is in a low-power state. Depending on what type of exception is taken, one of these three bits will be set to 1 and be reflected in the SI_EXL, SI_ERL, and EJ_DebugM outputs. The external agent can look at these signals and determine whether to leave the low-power state to service the exception. The following four power-down signals are part of the system interface and change state as the corresponding bits in the CP0 registers are set or cleared. The SI_RP signal represents the state of the RP bit (27) in the CP0 Status register. The SI_EXL signal represents the state of the EXL bit (1) in the CP0 Status register. The SI_ERL signal represents the state of the ERL bit (2) in the CP0 Status register. The EJ_DebugM signal represents the state of the DM bit (30) in the CP0 Debug register. Instruction-Controlled Power Management. The second mechanism for invoking power-down mode is through execution of the WAIT instruction. When the WAIT instruction is executed, the internal clock is suspended; however, the internal timer and some of the input pins (SI_Int[5:0], SI_NMI, and SI_Reset) continue to run. When the CPU is in this instruction-controlled power management mode, any interrupt, NMI, or reset condition causes the CPU to exit this mode and resume normal operation. The 74Kc core assert
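The mapping between the four power-down signals and their CP0 register bits can be summarised in a few lines. The sketch below is a software model only: the bit positions are those quoted in the text, the signal names are reconstructed from the garbled extraction and should be checked against the original datasheet, and `should_wake` is a hypothetical policy for the external agent rather than anything the core defines.

```python
def power_down_signals(status: int, debug: int) -> dict:
    """Model the power-down outputs as functions of CP0 Status and Debug."""
    return {
        "SI_RP": (status >> 27) & 1,      # RP bit (27) of Status
        "SI_EXL": (status >> 1) & 1,      # EXL bit (1) of Status
        "SI_ERL": (status >> 2) & 1,      # ERL bit (2) of Status
        "EJ_DebugM": (debug >> 30) & 1,   # DM bit (30) of Debug
    }


def should_wake(signals: dict) -> bool:
    """Assumed policy: leave the low-power state when any exception level is active."""
    return bool(signals["SI_EXL"] or signals["SI_ERL"] or signals["EJ_DebugM"])


# Hypothetical register values: reduced power requested, then an exception arrives.
idle = power_down_signals(status=1 << 27, debug=0)
busy = power_down_signals(status=(1 << 27) | (1 << 1), debug=0)
print(idle, should_wake(idle))
print(busy, should_wake(busy))
```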
42. the clock to only one of the data arrays, thereby saving the memory power required for a read operation on three ways of the D-cache. Also, for additional power savings, the D-cache data array clocks are disabled for store instructions and idle cycles. This optional feature significantly reduces the power consumed by the D-cache data array. Internal Scan. The 74Kc supports full mux-based scan for maximum test coverage, with a configurable number of scan chains. ATPG test coverage can exceed 99%, depending on standard cell libraries and configuration options. Memory BIST. The core provides an integrated memory BIST solution for testing the internal cache SRAMs, scratchpad memories, and on-chip trace memory using BIST controllers and logic tightly coupled to the cache subsystem. These BIST controllers can be configured to utilize the March C or IFA-13 algorithms. Memory BIST can also be inserted with a CAD tool or other user-specified method. Wrapper modules and signal buses of configurable width are provided within the core to facilitate this approach. Table 4 summarizes the key configuration options that can be selected when the core is synthesized and implemented. querying an appropriate register field. Refer to the MIPS32 74Kc Processor Core Family Software User's Manual for a more complete d
43. to back is also configurable at synthesis time. This parameter controls the repeat rate of instructions that may either read or write private state. Load Store Unit (AGEN pipe). The Load Store Unit is responsible for interfacing with the core pipe and handling load/store instructions to read/write data from data caches and/or memory. This unit is capable of handling loads and stores issued out of order. Loads, however, are not issued by the IDU until all prior stores have been issued. Data cache sizes of 0K, 16K, 32K, and 64K bytes are supported. The cache is 4-way set associative and uses an LRU replacement algorithm. There are separate virtual and physical tag arrays corresponding to the data array. The virtual tag is accessed in parallel with the data cache array and is compared against the virtual address to predict the way. The physical tag is always compared with the result of the JTLB to validate the way selection. In addition to the data cache, the LSU also supports a scratchpad RAM for sizes ranging from 4 KB to 1 MB. The LSU interfaces to a 16/32/48/64 dual-entry JTLB. The LSU can handle both integer and floating-point load/store instructions and has a 64-bit data path. Loads are non-blocking in the 74Kc core. Loads that miss in the data cache are allowed to graduate with their destination register marked unavailable. Consumers of this destination register are held back at the IDU until all their operands become avail
44. tual-to-physical address translation. Instruction cache tag and data access are staggered across 2 cycles, with up to 4 instructions fetched per cycle. The superscalar 74Kc core can dispatch up to 2 instructions per cycle into one of the arithmetic logic unit (ALU) or address generation (AGEN) pipes. The AGEN pipe executes all Load/Store and Control Transfer instructions, while the ALU pipe executes all other instructions. Instructions are issued and executed out of order; however, the results are buffered and the architectural state of up to 2 instructions per cycle is updated in program order. The L1 Data Cache is configurable at 0, 16, 32, or 64 KB in size. It is organized as 4-way set associative. Data cache misses are non-blocking, and up to four may be outstanding. The data cache is virtually indexed and physically tagged to make the data access independent of virtual-to-physical address translation. The tag array also has a virtual address portion, which is used to compare against the virtual address being accessed and generate a data cache hit prediction. This virtual address hit prediction is always backed up by a comparison of the translated physical address against the physical tag. To achieve high frequencies while using commercially available SRAM generators, the cache access and hit determination is spread across three pipeline stages, dedicating an entire cycle for the SRAM access. The synthesizable 74Kc core includes a high-performance Multipl
45. unctionality is specified by the MIPS32 Privileged Resource Architecture. A TLB provides mapping and protection capability with per-page granularity. The 74Kc core implementation allows a wide range of page sizes to be present simultaneously. The TLB contains a fully associative dual-ported Joint TLB (JTLB). To enable higher clock speeds, a smaller instruction micro-TLB (ITLB) is also implemented. When an instruction address is calculated, the virtual address is compared to the contents of the appropriate ITLB. If the address is not found in the ITLB, the JTLB is accessed. If the entry is found in the JTLB, that entry is then written into the ITLB; if the address is not found in the JTLB, a TLB exception is taken. For data accesses, the virtual address is looked up in the JTLB only, and a miss causes a TLB exception. Figure 3 shows how the ITLB and JTLB are implemented in the 74Kc core. [Figure 3: Cache Access for Address Translation. Legible labels: Instruction Address Calculator, Instruction Cache Tag RAM, Comparator, Instruction Hit/Miss, Data Address Calculator, Data Cache Tag RAM, Comparator, Data Hit/Miss, Virtual Address.] Joint TLB (JTLB). The JTLB is a dual-ported, fully associative TLB cache containing 16, 32, 48, or 64 dual entries, mapping up to 128 virtual pages to their corresponding physical addresses. The address translation is perfor
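The lookup order described above (ITLB first, then JTLB with an ITLB refill, otherwise a TLB exception; data accesses use the JTLB only) can be sketched as a small software model. The class and method names are hypothetical, page numbers stand in for full tag matching, the ASID and page-size handling are omitted, and the oldest-first ITLB replacement is an assumption, since the text does not state the replacement policy.

```python
from collections import OrderedDict


class TlbMiss(Exception):
    """Stands in for the TLB exception taken when the JTLB has no match."""


class TinyMmu:
    ITLB_ENTRIES = 4   # ITLB size, from the text

    def __init__(self, jtlb):
        self.jtlb = dict(jtlb)        # virtual page number -> physical frame number
        self.itlb = OrderedDict()     # small cache of recent instruction translations

    def translate_fetch(self, vpn):
        if vpn in self.itlb:          # ITLB hit
            return self.itlb[vpn]
        if vpn not in self.jtlb:      # miss in both structures
            raise TlbMiss(vpn)
        if len(self.itlb) >= self.ITLB_ENTRIES:
            self.itlb.popitem(last=False)   # assumed policy: evict the oldest entry
        self.itlb[vpn] = self.jtlb[vpn]     # refill the ITLB from the JTLB
        return self.itlb[vpn]

    def translate_data(self, vpn):
        if vpn not in self.jtlb:      # data accesses consult the JTLB only
            raise TlbMiss(vpn)
        return self.jtlb[vpn]


mmu = TinyMmu({vpn: vpn + 0x100 for vpn in range(6)})
print(hex(mmu.translate_fetch(2)), list(mmu.itlb))
```

With 64 dual entries the JTLB can hold 64 x 2 = 128 page mappings, which is where the figure of 128 virtual pages comes from.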
46. ursted write. This is more efficient than sending out separate individual writes, as is done in uncached mode. Scratchpad RAM. The 74Kc core allows blocks of scratchpad RAM to be attached to the load/store and/or instruction units. These allow low-latency access to a fixed block of memory. These blocks can be modified by the user. A reference design is provided that includes an SRAM array and an external DMA port that allows the system to directly access the array. L2 Cache Support. The 74Kc core supports building a Level 2 cache on the front-side bus, inline with the memory access. This L2 cache is unified and contains both instruction and data segments. The L2 cache can be configured to be bypassable, i.e., memory accesses from the 74Kc core can bypass the L2 cache and directly access the main memory. The L2 cache configuration and functional details are provided in the document MIPS SOC-it L2 Cache Controller Datasheet (MD00502). Bus Interface (BIU). The Bus Interface Unit (BIU) controls the external interface signals. The primary interface implements the Open Core Protocol (OCP). Additionally, the BIU includes a write buffer. Open Core Protocol (OCP) Interface. Table 3 shows the OCP Performance Report for the 74Kc core. This table lists characteristics of the core and the specific OCP fun
47. use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions. UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS TECHNOLOGIES, INC. MIPS Technologies reserves the right to change the information contained in this document to improve function, design, or otherwise. MIPS Technologies does not assume any liability arising out of the application or use of this information, or of any error or omission in such information. Any warranties, whether express, statutory, implied, or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose, are excluded. Except as expressly provided in any written license agreement from MIPS Technologies or an authorized third party, the furnishing of this document does not give recipient any license to any intellectual property rights, including any patent rights, that cover the information in this document. The information contained in this document shall not be exported, reexported, transferred, or released, directly or indirectly, in violation of the law of any country or international law, regulation, treaty, Executive Order, statute, amendments or supplements thereto. Should a conflict arise regarding the export, reexport, transfer, or release of the information contained in this document, the laws of the United
48. y/Divide Unit (MDU). The MDU is fully pipelined to support a single-cycle repeat rate for 32x32 MAC instructions. The CorExtend block can utilize the accumulator registers in the MDU block, allowing specialized functions to be efficiently implemented. The MIPS DSP ASE Revision 2.0 provides support for a number of powerful data processing operations. There are instructions for fractional arithmetic (Q15/Q31) and for saturating arithmetic. Additionally, for smaller data sizes, SIMD operations are supported, allowing 2x16-bit or 4x8-bit operations to occur simultaneously. Another feature of the ASE is the inclusion of additional HI/LO accumulator registers to improve the parallelization of independent accumulation routines. All 32-bit operand arithmetic DSP instructions (except multiply) are executed in the ALU pipe, while the 64-bit operand arithmetic and multiply-class DSP instructions are executed in the MDU pipe. The Bus Interface Unit (BIU) implements the Open Core Protocol (OCP), which has been developed to address the needs of SoC designers. This implementation features 64-bit read and write data buses to efficiently transfer data to and from the L1 caches. The BIU also supports a variety of core/bus clock ratios to give greater flexibility for system design implementations. MIPS32 74Kc Processor Core Datasheet, Revision 01.07, MD00496. Copyright 2006-2011 MIPS Technologies, Inc. All rights reserved. Optional support for external Instruction
