
University of Stuttgart, Diploma Thesis: Design of a Memory Management Unit for the System-on-a-Chip Platform LEON



Figure 9.4: Process stack (register windows spilled onto the user stack, each spill frame holding the sp and fp of successive windows; trap frames separating the kernel stack from the user stack; one function stack frame shown).

... of the fixed-size memory chunk is used as the kernel stack. User programs can only enter the kernel by way of a trap command (ta 0x10, the Linux system call trap) or when interrupted by an IRQ. Both the trap handler and the interrupt handler will switch to the current process's kernel stack.

9.4.2 Scheduling

As noted in the previous section, a user program enters the kernel by way of a system call or when interrupted, i.e. by a hardware event. The control path inside the kernel will then sometimes call the schedule() function, which handles the task switching. This is visualized in the following 3 figures. The real switching is done at the end of the schedule() function, which deserts the current process's schedule() stack frame and switches into t... On SPARC the current process's task_struct pointer is always present in the %g6 register.
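As an aside, the %g6 convention just mentioned is what makes looking up the current task cheap. A minimal sketch of the idea in C, mirroring how SPARC Linux pins the pointer in a global register (the exact declaration is an assumption, not the port's verbatim code):

    /* The current process's task_struct pointer lives in %g6, so
       evaluating "current" costs no memory access at all. */
    struct task_struct;                             /* opaque here */
    register struct task_struct *current asm("g6");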
Figure 2.3: Page table hierarchy traversal (item present in memory? access authorized? If yes, the physical address is produced; otherwise a protection violation/mode exception or a page fault exception is raised and the page is fetched from disc). Related to [14].

Along with the physical memory address, the page table entries store additional flags which enable the operating system to swap unused pages out of physical memory onto disk and to implement protection on pages. In paging only one hardware-implemented logical entity exists. Therefore protection in paging must be done by subdividing the entire virtual address space into attributed regions, merely setting another layer on top. This subdivision can only be done at the granularity defined by the page size (usually 4k) and is limited by the fact that these regions are bound to one virtual address space; overlapping may occur [23]. (On 64-bit computers a flat table would even grow to 2^52 entries, which is not tolerable at all; therefore another data structure has to be used instead of a lookup table, for instance an inverted page table, which is a hash [23].) The sub-partitioning into different protected regions is partly done at compile time. In segmentation, on the other hand, each segment forms a logical entity with its own rights.

2.3 Hardware support

Hardware support includes:

- Table Lookaside Buffer (page table ent...
Figure 9.7: Process 2 running after the call of schedule() (user stack, kernel thread, window spills, the deserted schedule() frame).

Chapter 10 Appendix A: Components

Figure 10.1: MMU component (DCache and ICache interfaces with translation, probe and flush requests; mmctrl1/mmctrl2 control registers; TLB with table walk; AMBA interface via the acache master, mcmmo/mcmmi).

DCache interface (out: mmctrl2, fault status and fault address register):

    dir  desc
    in   request translation
    in   request probe
    in   request flush
    in   translation request parameters (su = supervisor, ...)
    in   virtual address, or virtual flush/probe address
    out  asserted as long as an operation is still unfinished
    out  returns the physical address on a translation operation
    out  the PTE (zero on a probe operation)
    out  address cacheable
    out  asserted on a protection, privilege or translation error
    in   MMU ctx and ctx pointer control register from DCach...
11.7 Subdirectory mmu/vhdl  86
11.8 Subdirectory mmu/xess (XESS board development)  86
12 Appendix C: MMU source  87
12.1 Source code  87
Bibliography  89

0.1 Abbreviation index

AMBA      Advanced Microcontroller Bus Architecture
API       Application Programming Interface
ASB       Advanced System Bus (AMBA)
ASI       Alternate Space Identifier
CAM       Content Addressable Memory (fully associative tag match)
CPLD      Complex Programmable Logic Device
DCache    Data Cache
FPGA      Field Programmable Gate Array
IC        Integrated Circuit
ICache    Instruction Cache
LRU       Least Recently Used
MMU       Memory Management Unit
OS        Operating System
PIPT      Physically Indexed, Physically Tagged
PTE       Page Table Entry
PTD       Page Table Descriptor
RTOS      Real-Time Operating System
SoC       System on a Chip
SPARC V8  SPARC Architecture Manual, Volume 8
SRMMU     SPARC Reference MMU
TLB       Table Lookaside Buffer (a PTE cache)
VHDL      Very High Speed Integrated Circuit Hardware Description Language
VIPT      Virtually Indexed, Physically Tagged
VIVT      Virtually Indexed, Virtually Tagged

Chapter 1 Introduction

This diploma thesis is inspired by the idea of getting a full-featured Linux API running on the open-source System-on-a-Chip (SoC) platform LEON, which is a synthesisable VHDL implementation of the SPARC Architecture Manual V8 standard. Linux has recently gained wide acceptance. With a ric...
...cam.vhd: The TLBCAM testbench tests the function of the TLBCAM component. It uses simple test vectors written directly into tlbcam.vhd.

11.4.1.2 tw_tb.vhd: The TW testbench tests a table walk. The virtual addresses of the virtual address file are entered one after another.

11.4.1.3 tlb_tb.vhd: The TLB testbench tests a sequence of virtual address translations. The virtual addresses of the virtual address file are entered one after another.

11.4.1.4 mmu_tb.vhd: The MMU testbench simulates the concurrent translation requests from DCache and ICache. It needs three input files: a memory image, a virtual address vector file and a command file. The DCache and ICache request sequence can be specified in the command file mmu_cmd.txt, which has the following format. Each line is one command. Starting from the first command, every second command is a DCache command; starting from the second command, every second command is an ICache command. DCache commands take the format read|write|flush|probe addrnr flags; ICache commands take the format read addrnr flags. On a DCache or ICache read or write command, flags can be su (supervisor) or u (user); on a DCache flush or probe command, flags can be page, segment, region, context or entire. addrnr is the index into the virtual address file's array of addresses. See mmu/tbench/comp/mmu_cmd.txt f...
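For illustration, a hypothetical mmu_cmd.txt in the format just described could look as follows (the addrnr values are invented):

    read 0 su
    read 1 su
    write 2 u
    read 3 u
    flush 0 page
    read 4 su

Here lines 1, 3 and 5 would be consumed by the DCache (a supervisor read, a user write, a page flush), while lines 2, 4 and 6 drive three ICache fetches.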
6.3 Virtually tagged, virtually indexed (VTVI) SRMMU

The main advantage of a VTVI design is that the TLB lookup is positioned after the cache. This leaves the pipeline and cache interworking unchanged; only on a cache miss is the TLB lookup initiated. Because two virtual addresses can point to the same physical address (a synonym), the cache tag has to be extended with the context number, so that addresses from the different virtual address spaces of different processes are distinct. This leads to multiple cache lines if the same physical address is referenced by multiple contexts, which is a drawback. Cache snooping is not possible. A VTVI integration is shown in figure 6.3.

Figure 6.3: Pipeline with virtually tagged and virtually indexed cache (FE DE EX ME WB stages; instruction cache and data cache in front of a combined instruction/data TLB, memory controller, AMBA master, AMBA bus).

The VTVI design is the one proposed by the SRMMU. It is the easiest design to implement because it is fairly sequential.

6.3.1 Writebuffer

In LEON, on a store command the writebuffer, if empty, will be initialized and will work in parallel to the pipeline. When using a VIVT cache, the writebuffer can be initialized with virtual addresses, in which case the address translation is done after initializing the writebuffer, or as a phys...
...cated at the start of physical memory, which would equal a mapping of vaddr PAGE_OFFSET -> paddr 0. The free memory located after the kernel image would be mapped into the virtual address space right behind it, so that the translation of a virtual to a physical address, and vice versa, can be made by simply subtracting or adding the PAGE_OFFSET offset. You have to keep in mind that the MMU is always enabled when the kernel is running, so that access to a given physical address can only be made by issuing the corresponding virtual address. This effectively means that this memory is mapped into kernel space at compile time; at bootup the page table region would be initialised accordingly.

There are some limitations, though. The virtual address space upward from PAGE_OFFSET is limited by the remaining space and by other regions that are defined at compile time. For instance, on the SPARC architecture the region where memory can be directly mapped into kernel space is defined as 0xf0000000-0xfc000000, that is, 192 MB. Therefore memory is divided into 2 parts, Lomem and Himem. Lomem is the part that is constantly mapped and which can easily be managed; to access parts of Himem, a new page table mapping has to be created dynamically every time. Himem is normally reserved for user space mappings.

9.3.1 Memory Management

Figure: Virtual address space layout (user space at 0x0; kernel space from 0xf0000000, followed by the regions at 0xf1...).
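The add/subtract translation described above is exactly what the kernel's __pa()/__va() helpers perform. A minimal sketch, assuming the linear mapping vaddr PAGE_OFFSET -> paddr 0 from the text (on LEON the physical base differs, so a real port would also add the RAM base):

    #define PAGE_OFFSET 0xf0000000UL

    /* virtual -> physical, valid only inside the linearly mapped region */
    #define __pa(vaddr) ((unsigned long)(vaddr) - PAGE_OFFSET)
    /* physical -> virtual */
    #define __va(paddr) ((void *)((unsigned long)(paddr) + PAGE_OFFSET))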
    unsigned int *ioarea = (unsigned int *) 0x20000000;
    ioarea[0x100] = 0;
    ioarea[0x100] = 1;

This will cause a rising edge on the dump signal of iram. The memory content will be written into the same directory as the initialization file of iram that is specified in the tbench configuration. Because the memory is composed of several iram components, the memory content will be split into several files. To merge them, the imagemerge.pl script is provided in the mmu/tsource/scripts directory; it takes the initialization filename as an argument. Note: first call make in this directory, because imagemerge.pl needs the imagemerge.exe program to be compiled. An example is shown here:

    eiselekd@ralab10> ls testos_ram.dat* -la
    -rw-r--r-- 1 eiselekd SDA   45221 Aug 25 14:35 testos_ram.dat
    -rw-r--r-- 1 eiselekd SDA 1507360 Aug 25 14:38 testos_ram.dat_b0_ar0.dump
    -rw-r--r-- 1 eiselekd SDA 1507360 Aug 25 14:38 testos_ram.dat_b0_ar1.dump
    -rw-r--r-- 1 eiselekd SDA 1507360 Aug 25 14:38 testos_ram.dat_b1_ar0.dump
    -rw-r--r-- 1 eiselekd SDA 1507360 Aug 25 14:38 testos_ram.dat_b1_ar1.dump
    eiselekd@ralab10> imagemerge.pl testos_ram.dat
    Using base testos_ram.dat
    testos_ram.dat_b0_ar0.dump : Bank 0 arr
    testos_ram.dat_b0_ar1.dump : Bank 0 arr
    testos_ram.dat_b1_ar0.dump : Bank 1 arr
    testos_ram.dat_b1_ar1.dump : Bank 1 arr 1
    imagemerge.exe testos_ram.dat.merge testos_ram.dat_b0_ar0.dump testos_ram.dat_b0_ar1.dump testos_ram...
    Trying to open file testos_ram.dat_b0_ar0.dump
    ...
9.2 Linux bootup

The standard LEON boot loader generator mkprom requires the text section to be located at address 0x600. mkprom is a binary distribution that cannot be changed, so, to be more flexible, a custom boot loader has been written. The boot loader is automatically appended to the image when make hard is called, which creates SPARC_leon/LEON/vmlinux_leon.exo; this can be downloaded to the XESS board using xsload. Figure 9.3 visualizes the boot process.

Figure 9.3: Linux hardware bootup configuration (ROM holding the boot loader and the compressed data section; RAM receiving the decompressed data and the zeroed bss).

On powerup the processor jumps to the start of ROM (address 0x0), where the boot loader is located (1). The boot loader will then:

- decompress the data section to the start of RAM, address 0x40000000 (2); the data section is stored as LZSS-compressed binary data;
- clear the bss section area (3);
- initialize the MMU with the page table hierarchy that has been created at compile time and that is statically linked to the data section (4);
- jump to the start of the text section in ROM.

When simulating Linux in vsim, no boot loader is needed: make sim will create memory images for the ROM and RAM VHDL models that initialize the system with the text, data and bss sections right from the start. To be able to analyz...
In the case of a physical writebuffer, the writebuffer is in sync with the pipeline; therefore ICache and DCache will only issue one translation request at a time. In the case of a virtual writebuffer, where the writebuffer is not in sync with the integer unit pipeline, translation can be deferred and multiple DCache translation requests can be waiting in a queue. Therefore an MMU pipeline would make sense only for a virtual writebuffer design.

Figure 7.2: Pipeline view (fetch, Table Lookaside Buffer, table walk, memory request; TLB and table walk serialize the address translations toward the AMBA interface).

Figure: TLB hit datapath (virtual address register, tlbcam match, syncram, physical address; signals hit and hold).

At the rising edge the virtual address enters the TLB (1). It is matched concurrently by all TLB entries. Each can generate a hit signal; only one element per match operation is expected to raise the hit signal, which is used to form the syncram address (2). If multiple hit signals are raised, the result is unspecified. The entry that signalled a hit will be marked in the Least Recently Used (LRU) component (3). If a hit has occurred, the hold signal will be deasserted...
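In software, the concurrent match can be modelled with a loop over the entries. The sketch below is illustrative only (the entry layout and the size are invented); it captures the one-hot expectation just described, with the hit index forming the syncram address:

    #include <stdint.h>

    #define TLB_ENTRIES 8            /* illustrative size */

    struct tlb_tag {
        uint32_t vpn;                /* virtual page number */
        uint32_t ctx;                /* context number, part of the tag */
        int      valid;
    };

    /* Returns the index of the single hitting entry, or -1 on a miss.
       In hardware all comparisons happen concurrently and at most one
       hit signal may be raised; otherwise the result is unspecified. */
    int tlb_match(const struct tlb_tag *tlb, uint32_t vpn, uint32_t ctx)
    {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].ctx == ctx)
                return i;            /* forms the syncram address */
        return -1;                   /* miss: the table walk is started */
    }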
- Virtual flush/probe address: the index part of the virtual address.
- Type:

    0      page:    probe until the level 3 entry; flush the level 3 PTE
    1      segment: probe until the level 2 entry; flush level 2 & 3 PTEs/PTDs
    2      region:  probe until the level 1 entry; flush level 1, 2 & 3 PTEs/PTDs
    3      context: probe until the level 0 entry; flush level 0, 1, 2 & 3 PTEs/PTDs
    4      entire:  probe until the level n entry; flush all PTEs/PTDs
    5-0xF  none

A probe operation will return the PTE of the page table hierarchy, not the physical address that is coded into the PTE. The probe operation can also be done in software, using the ASI MMU physical address pass-through and traversing the page table hierarchy by hand. In fact, the probe operation is rarely used.

5.2.2.1 flush

A flush operation takes the form sta %r, [addr] asi_flush_probe, where the data supplied in %r is ignored and addr forms the flush criteria (see above). Entries from the TLB that satisfy the given criteria are flushed. For detailed information refer to [21], Appendix H, p. 250.

5.2.2.2 probe

A probe operation takes the form lda [addr] asi_flush_probe, %r, where addr forms the probe criteria (see above). The return value is either the PTE or zero. For detailed information refer to [21], Appendix H, p. 250.
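In C, the two instruction forms can be wrapped in inline assembly. A minimal sketch, assuming the ASI number 0x3 that the text cites for flush/probe and the address format above (virtual flush/probe address in bits 31:12, type in bits 11:8):

    #define ASI_FLUSH_PROBE 0x03   /* assumption: the V8-suggested number */

    /* flush: sta %r, [addr] asi -- the stored value is ignored */
    static inline void srmmu_flush(unsigned long vaddr, unsigned int type)
    {
        unsigned long addr = (vaddr & ~0xfffUL) | ((type & 0xf) << 8);
        __asm__ __volatile__("sta %%g0, [%0] %1"
                             : : "r"(addr), "i"(ASI_FLUSH_PROBE) : "memory");
    }

    /* probe: lda [addr] asi, %r -- returns the PTE or zero */
    static inline unsigned long srmmu_probe(unsigned long vaddr, unsigned int type)
    {
        unsigned long addr = (vaddr & ~0xfffUL) | ((type & 0xf) << 8);
        unsigned long pte;
        __asm__ __volatile__("lda [%1] %2, %0"
                             : "=r"(pte) : "r"(addr), "i"(ASI_FLUSH_PROBE));
        return pte;
    }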
...memory stage and aaa would trap in the ICache fetch stage (i.e. a page fault): the aaa trap could bypass the st trap in the pipeline, because the st trap could be deferred by the writebuffer. This again would cause a trap-in-trap situation; the writebuffer (this time a one-element writebuffer) could not be emptied. This situation can only be avoided if, on a trap in ICache, the writebuffer in DCache is forced to be emptied first, so that a possible trap of the deferred store is initiated before the ICache trap, before the pipeline can continue.

6.3.1.2 Physical writebuffer

A physical writebuffer runs in sync with the pipeline, and therefore the trap-in-trap problem does not occur. However, an address translation has to be made before the writebuffer can be initialized, so part of its parallelism to the pipeline is lost.

6.4 Design chosen

In this chapter, 2 alternatives were presented: the virtual and the physical writebuffer design. Both designs were implemented; in the end, however, the physical writebuffer was chosen, because this way the pipeline can be left unchanged. In the first design, a VTVI cache design with a virtual writebuffer was implemented. This made it possible to program the MMU as a plug-and-play component, leaving the DCache unchanged except for the added context information: all the MMU had to do was to intercept the AMBA requests, forwarding them only after the translation had finished. Although it has since been changed, this design is shown for completeness...
...problem (see section 6.3). The PTVI design still has to do a TLB lookup on every cache access, but because the cache index is virtual, the tag retrieval can be initiated right away while the TLB lookup is done in parallel. Because the tag is physical, no synonyms can occur; therefore no context information is needed. This also means that sharing cache lines among different processes is possible, provided the virtual address mappings are equal. Cache snooping is not possible; therefore, on a memory access by another AMBA master, cache integrity has to be maintained by software. A PTVI integration is shown in figure 6.2.

Figure 6.2: Pipeline with physically tagged and virtually indexed cache (FE DE EX ME WB stages; instruction and data cache indexed by the virtual index while the TLBs deliver the physical tag in parallel).

Because the TLB lookup has to be done in one cycle (until the tag arrives), either a dual-port syncram or a split instruction/data cache has to be implemented, so that instruction and data cache can work in parallel. Integration could be done in ICache and DCache with minor changes in the cache-pipeline interworking. Cache lines that store the Linux kernel could be shared by all processes, because the kernel is compiled to a fixed virtual address.
Table: probe return values for each probe type and page table level (0x5-0xF: undefined value; 0: zero; see [21]).

5.2.3 ASI MMU diagnostic access (I/D TLB)

The alternate space MMU diagnostic access I/D TLB gives direct read/write access to the TLB. This ASI is not intended for system operation but for system debugging only, and it is not required to be implemented. The SRMMU specification gives a suggestion for the coding of the address/data supplied by the lda [addr] asi_iodiag, %r and sta %r, [addr] asi_iodiag commands; however, these are highly implementation specific. The method implemented in this diploma thesis simply reads out the content of all TLB entries to RAM, using the AMBA interface that already connects the TLB for the write-back of page table entries whose referenced or modified bits have changed. This ASI can be removed once the system has proven to work properly. Write operations on the TLB are not supported.

5.2.4 ASI MMU physical address pass-through

An alternate load/store with the ASI MMU physical address pass-through bypasses the MMU translation; i.e., this can be used to modify the page table hierarchy on bootup. The original SPARC suggestion for this ASI is 0x20-0x2f. For detailed information refer to [21], Appendix I, p. 267.

5.2.5 ASI I/DCache flush

This ASI affects the cache itself, not the TLB. It is used by Linux and was thus added to the...
...custom build rules are defined.

9.1.1 LEON dependent parts

For the LEON port the directory SPARC_leon was added. It includes the makefile, linker scripts, helper programs, scripts, and subdirectories with boot loaders for hardware and for simulation in Modelsim.

9.1.1.1 make xconfig

The tcl script of SPARC_leon/drivers is included in SPARC/config.in. It adds a LEON customization window to the tk configuration screen, located in the "general setup" configuration entry. The LEON customization window itself holds input fields for XESS board and LEON synthesis settings. It defines various symbols on exiting, e.g. CONFIG_LEON and CONFIG_LEON_IU_IMPL. Future setting extensions can be allocated in this window; for instance, different board settings could be defined and chosen here.

9.1.1.2 make vmlinux

The makefile SPARC_leon/Makefile was added. It includes the 2 main rules, hard and sim. make hard will create a vmlinux_leon.exo image that can be downloaded to flashram; make sim will create sim_vmlinux_text.dump and sim_vmlinux_data.dump, memory images that are loaded from the Modelsim testbench tb_linux defined in mmu/tbench/tbleon_m.vhd of the mmu distribution. Both make sim and make hard will first create the image linux/vmlinux_leon.nc. This is done in a 2-step process: first, the normal vmlinux build rule is activated, which will create linux/vmlinux...
- Using the switch -qprom or -Ttext 0x600, the text segment will remain in flashram while only data and bss will be allocated in RAM by the boot loader.
- Using the load command of dsumon (the design has to be synthesized with the debug support unit monitor activated): the load command of dsumon loads the ELF program into memory.

11.6.1 image: creating page table hierarchies

To be able to test the MMU functionality, a static memory image containing an initialized page table hierarchy has to be created. This is done with the image tool, which takes a configuration file and creates a memory image from it. Its sources are located in mmu/tsource/imagecreate. When running make in mmu/tsource/imagecreate, the image executable will be copied into mmu/scripts. Use image with options:

    image [-a] [-v] [-m] [-c <config>] [<outfile>]
      -m          modify: read in outfile first
      -a          analyse mode
      -c <file>   configuration file
      -v          print out virtual addresses
      offset      image base address (default 0x0)

The default configuration file is config.txt. The default outfile is mem.dat. image will output 3 files: mem.dat, for initializing mmu/tbench/comp/mmram.vhd; mem.dat.image, for initializing mmu/tbench/iram.vhd; and mem.dat.dsu, an srecord file that can be used to directly link to an executable's data section. This way, in combination with the load command of dsumon (the hardware debug support unit monitor), a memory image con...
...entry which caused the fault: an entry in the context table, or an entry in a level 1, level 2 or level 3 page table.

- AT: The Access Type field defines the type of access which caused the fault:

    0  Load from User Data Space
    1  Load from Supervisor Data Space
    2  Load/Execute from User Instruction Space
    3  Load/Execute from Supervisor Instruction Space
    4  Store to User Data Space
    5  Store to Supervisor Data Space
    6  Store to User Instruction Space
    7  Store to Supervisor Instruction Space

- FT: The Fault Type field defines the type of the current fault:

    0  None
    1  Invalid address error
    2  Protection error
    3  Privilege violation error
    4  Translation error
    5  Access bus error
    6  Internal error
    7  Reserved

- Fault address register: this register holds the virtual address that caused the exception (virtual address, bits 31:0).

5.2.2 ASI flush/probe

The alternate space flush/probe gives access to the MMU's translation process and the TLB. A read access to ASI flush/probe will initiate a probe operation, either returning the PTE or zero; a write access will initiate a flush operation that removes entries from the TLB. The flush/probe criteria are coded into the written/read address and have different meanings for flush and probe. The address has the following format:

    virtual flush/probe address [31:12] | type [11:8] | reserved [7:0]
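A small decoder for these fault status fields might look as follows. The field meanings are from the text; the bit positions (L in 9:8, AT in 7:5, FT in 4:2) are taken from the SRMMU fault status register layout in the V8 manual and should be treated as an assumption here:

    #include <stdint.h>

    /* assumed layout: ... | L[9:8] | AT[7:5] | FT[4:2] | ... */
    static inline unsigned fsr_level(uint32_t fsr)       { return (fsr >> 8) & 0x3; }
    static inline unsigned fsr_access_type(uint32_t fsr) { return (fsr >> 5) & 0x7; }
    static inline unsigned fsr_fault_type(uint32_t fsr)  { return (fsr >> 2) & 0x7; }

    static const char *const ft_name[8] = {
        "none", "invalid address", "protection", "privilege violation",
        "translation", "access bus", "internal", "reserved",
    };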
    ...exo -o ... 800hq240-6-rom-25.exo.exo
    ... -x -c 0xFF -w -u 0 800hq240-6-rom-25.bit

11.5.2 selexo.sh: handling the board

selexo.sh covers the XESS domain in figure 8.1. To simplify configuration of the XESS board, the interactive script selexo.sh is provided in the mmu/scripts directory. selexo.sh uses the configuration file mmu/scripts/selexo_config, which determines the directories in which to search for exo files; adjust selexo_config to your environment. The main screen looks like this:

    eiselekd@ralab18 mmu/scripts> selexo.sh
    Reprogram clock
    Stop Processor
    Download exo
    Exit

selexo.sh handles 3 common tasks:

- Reprogramming the clock: this uses the CPLD design oscxsv.svf provided in the mmu/xess/svf directory.
- Stopping the processor: when reprogramming the flashram, the processor running on the FPGA has to be flushed first, because it still accesses the flashram. Stopping the processor is done by reprogramming the CPLD with downpar.svf, provided in the mmu/xess/svf directory; after that, when repowering the board, the FPGA remains empty.
- Downloading exo files: this shows a screen where up to two exo files can be selected for download. The previous selection is kept persistent. This task in turn uses the loadexo.sh script provided in the mmu/scripts directory. An example screen would look like this:
...in figure 6.4.1.

Figure 6.4.1: Intercepting AMBA requests in a virtual writebuffer design. The top half shows the original request (address out, data out, grant, ready, data in); the bottom half shows the modified AMBA request, where the virtual address is translated to the physical address first. The AMBA request is propagated to the AMBA bus only after the address translation has taken place.

Figure 6.4.2: New DCache state machine (read/write states wread/wwrite with translation states wread_trans/wwrite_trans; ASI states mmudiag, tlbflush, probe, asi, itag, idata; write operations in the ASI space on a cache miss).

The state machine in figure 6.4.2 shows the various ASI space transitions on the right side. On the left side, the read and write operations perform a translation in the states wread_trans and wwrite_trans before initializing the writebuffer. State wread will wait for the result to be returned, stalling the pipeline; if a memory access command follows immediately after the current command, the loadpend state is inserted. State wwrite will initialize the writebuffer with the translated address and will return immediately.
...physical address of the translation will be valid at the end of the following cycle and will be registered (4). Not shown in the above figure: the permission flags are checked against the information given by I/DCache (supervisor access, ICache or DCache access, and read or write access); on a protection or privilege violation the fault signal is raised. Also, the referenced/modified bits of the hit entry are updated. They will not be synchronized with memory until the entry is flushed or replaced.

Figure 7.1.1: TLB on a hit.

- The translation operation on a TLB miss is shown in figure 7.1.2.
- The TLBCAM address generation is shown in figure 7.1.3.
- The probe operation on a TLB hit/miss is shown in figure 7.1.4.
- The flush operation is shown in figure 7.1.5.

Figure 7.1.2: TLB on a miss (virtual address register, tlbcam, table walk, syncram; AMBA writeback and memory synchronisation; signals miss and hold).

On a TLB miss the table walk (TW) is initialized (1). Both the table walk in the TW component and the AMBA writeback operation of the TLB component use one single interface to the AMBA interface of the AMBA master (acache.vhd). The next update position is determined by the Least Recently Used (LRU) component. If the referenced/modified bit of the PTE of t...
...processes that use the same virtual address for different physical mappings. The use of the context number in the SRMMU suggests that the caches should be virtually tagged and virtually indexed, which is described in chapter 6 in more detail.

Figure 5.1: Translation overview (the context table pointer and context number select the root of the page table hierarchy; the virtual address fields 31:24, 23:18 and 17:12 index the page tables, yielding page table descriptors (PTDs) or page table entries (PTEs); the PTE's physical page number is combined with the page offset, bits 11:0).

The levels of the table hierarchy (2) are indexed through the different parts of the virtual address. If a page table descriptor (PTD) is found when indexing into a page table, the next level is traversed; if a page table entry (PTE) is found, the traversal is completed. PTEs and PTDs are distinguished by the ET field (3), where ET = 1 indicates a PTD and ET = 2 a PTE; ET = 0 indicates a missing entry (page fault). A level 1 PTE (context) maps 4 GB, a level 2 PTE (region) maps 16 MB, a level 3 PTE (segment) maps 256k, and a level 4 PTE (page) maps 4k. The PTE includes the physical page number and additional flags for protection, cacheability, and referenced/modified accounting. The physical address (4), the result of the translation of the MMU, is formed out of the physical page number...
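The traversal just described can be modelled in C. The sketch below is hypothetical (the memory-fetch helper is invented); the ET encoding and the 8/6/6-bit index widths follow the text and figure 5.1, while the PTP and PPN bit positions (PTP in bits 31:2 with the next table base at PTP << 6, PPN from bit 8) follow the V8 SRMMU specification:

    #include <stdint.h>

    enum { ET_INVALID = 0, ET_PTD = 1, ET_PTE = 2 };

    extern uint32_t ld_word(uint32_t pa);   /* models a memory fetch */

    /* Walk the hierarchy for one virtual address; root_pa is the base of
       the level 1 table selected via the context table. Returns 0 and the
       physical address on success, -1 on a page fault. Simplified to 4k
       pages: a PTE found at a higher level would keep more offset bits. */
    int table_walk(uint32_t root_pa, uint32_t va, uint32_t *pa)
    {
        static const unsigned shift[3] = { 24, 18, 12 };
        static const unsigned mask[3]  = { 0xff, 0x3f, 0x3f };

        uint32_t e = ld_word(root_pa + ((va >> shift[0]) & mask[0]) * 4);
        for (int lvl = 1; (e & 3) == ET_PTD; lvl++) {
            if (lvl > 2)
                return -1;                       /* malformed tree */
            uint32_t table = (e & ~3u) << 4;     /* next table = PTP << 6 */
            e = ld_word(table + ((va >> shift[lvl]) & mask[lvl]) * 4);
        }
        if ((e & 3) != ET_PTE)
            return -1;                           /* ET = 0: page fault */
        *pa = ((e >> 8) << 12) | (va & 0xfff);   /* PPN | page offset */
        return 0;
    }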
...the Linux kernel and deal with aspects related to memory management in Linux; for detailed descriptions refer, for instance, to [5]. Because of the wide range of this diploma thesis, spanning from hardware design to kernel hacking, naturally some parts will not be covered in full detail. Emphasis in this diploma thesis was put on the practical side. The source distribution can be downloaded from [7]. Development was done on XSV300 and XSV800 boards with a Xilinx Virtex FPGA chip for hardware prototyping.

Chapter 2 Memory Management

Historically, memory management evolved out of the needs implied by a multiuser/multitasking environment [14]. In such an environment, where multiple users share one memory resource, mechanisms have to be introduced to prohibit accidental accesses that would crash the whole system, and unauthorized accesses, to protect private data. One of the first OSs that pioneered the fundamental concepts of a multiuser/multitasking system was MULTICS, dating back to the late '60s. It implemented the concept of virtual addresses in hardware by 2 major techniques, paging and segmentation [16]. These principles hold to date: most current architectures use paging or segmentation or a combination of both. A virtual memory management scheme is defined by two main functions, translation and protection. Translation resolves the mapping of virtual addresses into physical addresses, which in turn is closely linked to memory allocation...
...where paging is somehow related to fixed-size allocation, whereas segmentation is related to variable-size allocation, each of which has its advantages and disadvantages. The second function is protection: each entity in paging and segmentation holds access parameters, which in turn reflect on the underlying physical resource. The SPARC memory management architecture, which is the target of this diploma thesis, only supports paging; therefore segmentation will only be covered briefly.

2.1 Virtual Address Spaces

Virtual addresses draw a clear border of abstraction. In a virtual memory management scheme, the actual physical memory configuration is transparent to the running program. A distinct feature of this is that programs can be programmed, at compile time, for an address space that is actually larger than the physical address space at runtime. The uniformity frees the programmer of memory considerations. The seamless integration of resources other than memory, such as files, reduces system and hardware dependencies [14]. Two of the main techniques for implementing virtual address spaces in hardware are paging and segmentation, which are discussed in the following sections.

2.2 Paging and Segmentation

Paging uses fixed-size pages as its base unit, usually 4k large. It provides one virtual address space in the logical domain, which is mapped to physical memory through one mapping entry for each page in the virtual a...
Figure: Virtual address space layout and Lomem mapping (kernel text, data and bss from 0xf0000000; the Lomem mapping at 0xf1000000; the NOCACHE region at 0xfc000000; in the physical resource, data/bss and nocache occupy ramstart up to kernel_end, the rest are buddy-system managed pages; RAM from 0x40000000 to 0x40200000).

There are three types of memory management systems in LEON Linux:

- the boot-time allocator, a simple memory management system that is used while the other memory management systems are still uninitialized during bootup;
- the MMU memory allocator, a memory management system for page table allocations and process structure allocations;
- the standard kernel buddy allocator, the standard Linux memory management system.

Bootup initialization is done in multiple phases:

1. The LEON boot loader initializes the MMU with a static page table. This maps the kernel as shown in figure 9.2. All further operations are done in virtual address space. This address space has to be reconfigured while running, which is part of the complexity.

2. Initialize the boot-time allocator: it will handle all available memory that is not occupied by the kernel image, using a free-page bitmap. Sub-page allocations are also supported for consecutive sub-page requests.
    f0004000 <__stext>:
    f00b3774  0x133c0304  sethi %hi(0xf00c1000), %o1    ! setup_arch
    f00b3778  0x90122000  mov   ...
    f00b377c  0x400023..  call  0xf00bc628 <setup_arch>
    f00b3780  0xd0226324  st    %o0, [%o1 + 0x324]      ! setup_arch

To avoid long printk output loops in simulation, call make LEON_SIM=1 sim; this will disable printk.

9.3 Memory

It is important to keep in mind that there is only one virtual address space in Linux. Even on a combined paging and segmentation architecture like the x86, where multiple virtual address spaces could be realized, only one virtual address space is provided (see section 2). This virtual address space is partitioned for the kernel into different regions at compile time. The partitioning differs for every architecture: on the i386 architecture, Linux locates the kernel at virtual address 0xc0000000, whereas on SPARC it is located at virtual address 0xf0000000. This value is defined with the PAGE_OFFSET macro. The underlying physical address space mapping could be totally arbitrary; however, to simplify managing the physical memory resource, the physical address space mapping is coupled as tightly as possible to the virtual address space. Therefore, on a machine with contiguous memory, the kernel image would for instance be lo...
...cache TLB is the most simple design to implement.

6.1 Physically tagged, physically indexed (PTPI)

In a PTPI design, one TLB lookup has to be made on every cache access. This requires the TLB lookup to be integrated into the pipeline to get reasonable performance. In a PTPI design, shared pages among different processes are possible with any virtual address mapping, which means sharing of cache lines among different tasks is possible. Cache snooping is possible. (In cache snooping, the AMBA bus is constantly checked to see whether another AMBA bus master is modifying a memory location which is stored in the cache.) A PTPI integration is shown in figure 6.1. Implementing such a scheme in LEON would be difficult, because it would in fact mean rewriting the pipeline. An advantage would be that DCache and ICache could be left unchanged, with snooping enabled.

6.2 Physically tagged, virtually indexed (PTVI)

The PTVI design combines the physical with the virtual cache design. The drawback of a pure physical cache is that the TLB lookup has to be done before every cache access; the drawback of a pure virtual cache design is that context information has to be added to avoid the synonym...

Figure 6.1: Pipeline with physically tagged and physically indexed cache (FE DE EX ME WB stages; TLB lookups in front of the instruction and data cache).
Figure 8.1: Design flow overview (svf images are loaded into the CPLD, bit files into the FPGA; software is converted with sparc-rtems-objcopy -O srec, adjust-vma 0x100000, into exo files for the flashram).

Figure 8.2: FPGA initialization from flashram (powerup configuration: flashprog.svf / xsload exo / flash_to_fpga.svf via the parallel port and CPLD into the ROM flashram; running operation: FPGA with ROM and RAM).

Chapter 9 Linux kernel

The Sun Linux kernel is written to run on different SPARCstations with different MMUs; an overview of all manufactured models can be found at [22]. The Linux kernel uses a monolithic kernel mapping. The boot loader (e.g. SILO) will load the kernel image to physical address 0x4000 and will then jump to address 0x4000, which is the _start symbol. Depending on the model, the MMU will already be activated at that time. That means program execution will begin in the virtual address space at 0xf0004000. Figure 9.1 shows the kernel mapping.

Figure 9.1 (begins): user space at 0x0, kernel space with text, data and bss from 0xf0000000 in the virtual address space; in the physical address space the kernel text...
...DCache controller's ASI decoder. An alternate store with the alternate space identifier I/DCache flush will flush I/DCache entries given a specific criterion. For detailed information refer to [21], Appendix I, p. 266. In the current implementation, any alternate store to ASI I/DCache flush will flush the whole I/DCache; a future enhancement could be the implementation of the fine-grade flush that is suggested by the SPARC standard.

Chapter 6 Design options

There are several possibilities for implementing an MMU for LEON, each of which would have to be integrated into a different place in the current design. For the cache-MMU integration, each of the three alternatives (physically tagged and physically indexed, physically tagged and virtually indexed, and virtually tagged and virtually indexed) has its drawbacks and advantages; they are discussed in the following chapter. The SRMMU actually suggests the VTVI design by introducing the context number register; however, a PTPI or PTVI design that complies with the SRMMU standard could also be implemented. In a VTVI design there are again 2 choices to choose from: a virtual writebuffer (translation after initialization of the writebuffer) or a physical writebuffer (translation before initialization of the writebuffer). The VTVI design with a physical writebuffer and a combined I/D...
...number and the offset; the sizes vary depending on the page table hierarchy level.

5.2 ASI Alternate Space Instructions

The privileged versions of the load/store integer instructions (ld, st), the load/store alternate instructions (lda, sta), can directly specify an arbitrary 8-bit address space identifier (ASI) for the load/store data access. The privileged alternate space load/store instructions take the form lda [addr] asi_ident, %r and sta %r, [addr] asi_ident, where asi_ident is the 8-bit ASI identifier. The address, and in case of a store the value, are interpreted in a specific way that differs for each ASI identifier. (For instance, for the ASI identifier DCache_flush, a store to any address flushes the DCache. In other ASI identifier spaces, addresses are mapped to special registers that can be stored and loaded just like normal memory.) The privileged load/store alternate instructions can be used by supervisor software to access special protected registers, such as the MMU and cache control registers, processor state registers, and other processor- or system-dependent values [21]. For the MMU, the SPARC Architecture Manual V8 suggests the ASI MMU register (0x4), ASI MMU flush/probe (0x3), ASI MMU bypass, and an optional ASI MMU diagnostic access I/D TLB (0x7). For fine-grade cache flushing (flushing depending on the ctx number and a virtual address pattern), five additional I/DCache flush...
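As a concrete illustration of these instruction forms, accessing an MMU register from supervisor code could look as follows in GCC inline assembly. This is a hedged sketch: it assumes the ASI number 0x4 suggested above and that the control register sits at address 0x0 inside that space, which is implementation specific:

    /* lda [addr] asi, %r : read an MMU register through ASI 0x4 */
    static inline unsigned int mmu_reg_read(void)
    {
        unsigned int v;
        __asm__ __volatile__("lda [%%g0] 0x04, %0" : "=r"(v));
        return v;
    }

    /* sta %r, [addr] asi : write it back */
    static inline void mmu_reg_write(unsigned int v)
    {
        __asm__ __volatile__("sta %0, [%%g0] 0x04" : : "r"(v) : "memory");
    }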
LRU component:

    dir  name      desc
    in   touch     same as LRUE
    in   touchmin  ...
    in   pos       same as LRUE
    in   mmctrl1   ...
    out  pos       outputs the tail entry address (the tail element should bubble to the front)

Figure 10.0.5: LRU component.

LRUE component (signals: movetop, pos, left/fromleft, right/fromright):

    dir  name       desc
    in   touch      force a compare of the entry addr; set the bubble-to-front mark
    in   pos        entry addr to compare to
    in   clear      1: topmost element; bubble-to-front stop
    in   fromleft   interconnecting elements
    in   fromright  interconnecting elements
    out  left, right  signal copy
    out  movetop    signal bubble-to-front by one
    out  pos        output entry addr

Figure 10.0.6: LRUE component (a single entry in the LRU component).

Chapter 11 Appendix B: MMU distribution

11.1 Distribution overview

    LEON2-1.0.5/
      mmu/
        modelsim/
        scripts/
        syn/
        tsource/
        tbench/
        vhdl/
        xess/

The MMU is built on top of the LEON distribution version LEON2-1.0.5. To install the mmu.tar.gz distribution, you first have to install LEON2-1.0.5.tar.gz; after that, unzip and untar mmu.tar.gz into the LEON2-1.0.5 directory. When working with Modelsim, the working directory has to be set to LEON2-1.0.5/mmu so that the original files can be accessed. The subdirectories LEON2-1.0.5/mmu/scripts and LEON2-1.0.5/mmu/xess are aimed at hardware development, targeted at a XESS Corporation XSV800 board with the following softwa...
    1) /home/eiselekd/archive/10hz20hz25hz/hello_LEON_25hz_38400baud.exo   Sat Aug 31 18:39:16 2002
    2) /home/eiselekd/archive/10hz20hz25hz/hello_LEON_10hz_19200baud.exo   Sat Aug 31 18:39:39 2002
    3) /home/eiselekd/archive/10hz20hz25hz/hello_LEON_10hz_38400baud.exo   Sat Aug 31 18:39:54 2002
    4) /home/eiselekd/exosoftarchive/10hz20hz25hz/hello_LEON_25hz_19200baud.exo  Sat Aug 31 18:39:30 200... (selected)
    5) /home/eiselekd/work/batch/20.000/800hq240-6-rom-20.exo.exo   Tue Sep  3 22:02:36 2002
    6) /home/eiselekd/work/batch/25.000/800hq240-6-rom-25.exo.exo   Fri Sep  6 09:32:06 2002
    7) /home/eiselekd/work/300batch/25.000/300pq240-4-rom-25.exo.exo  Tue Sep  3 18:27:55 2002
    x) Execute  q) Quit

11.6 Subdirectory mmu/tsource

When running LEON on a XSV300/800 board with the top entity mmu/xess/vhdl/leon_rom.vhd, the flashram appears as ROM (0x00000000) in the LEON address space. Several possibilities can be used to run a program (this is shown in the software part of figure 8.1):

- Using xsload to program the flash with a program that has been linked to address 0x00000000; this way only read-only data sections can be used (no global variables).
- Using mkprom to create a boot loader that extracts an ELF program to the specified location. Only text, data and bss segments are supported; therefore a custom build script has to be used to make sure all sections are placed in one of the three segments. The default link address for text should be 0x40000000; when using 0x600 i...
...U supports memory management in hardware. Chapter 2 gives an introduction to the theoretical concepts of memory management. After that, chapter 3 gives a brief overview of the SoC platform LEON, which had to be extended with an MMU. The LEON integer unit implements the SPARC architecture according to the SPARC Architecture Manual V8, which is described in chapter 4. The SPARC Architecture Manual V8 defines a reference MMU in its Appendix H, the SRMMU, which is described in chapter 5. The following chapters focus on the implementation of the MMU in hardware. Although the SRMMU suggests a specific design in detail, there is nevertheless a variety of design options to choose from; these options are discussed in chapter 6. Chapter 7 describes the actually implemented design, which is the main part of this report. (Footnotes: This source turned out to be well documented and, because of the RISC nature of the low-level assembler parts, relatively easy to understand. Of course excluding the tools for synthesis, which are not open source yet.) First a functional overview is given, for the MMU as a whole and then for each operation the MMU supports; each design component of the MMU is described separately, and some notes on future timing optimization follow. Chapter 8 focuses on the design flow, giving an overview of the tools involved. Chapter 9 starts with the Linux porting effort. It describes some fundamentals about the workings of...
University of Stuttgart, Diploma Thesis

Examiner: Prof. Dr. Hans-Joachim Wunderlich
Supervisors: Dr. Rainer Dorsch (hardware), Dr. Thomas Schöbel-Theuer (Linux)
Begin: 01.05.2002
End: 31.10.2002 (extended to 14.11.2002)
CR Classification: 7.1, C.1, C.5, D.4

Diploma Thesis Nr. 2013
Design of a Memory Management Unit for the System-on-a-Chip Platform LEON
Konrad Eisele
Division of Computer Architecture, Institute of Computer Science, Breitwiesenstr. 20-22, 70565 Stuttgart

A Memory Management Unit (MMU) for the SoC platform LEON was designed and integrated into LEON. The MMU complies with the SPARC Architecture Manual V8 reference MMU (SRMMU).

Contents

0.1 Abbreviation index
1 Introduction
2 Memory Management
  2.1 Virtual Address Spaces
  2.2 Paging and Segmentation
  2.3 Hardware Support
3 System-on-a-Chip platform LEON
  3.1 LEON Pipeline
  3.2 Cache subsystem
    3.2.1 Data cache (DCache)
    3.2.2 Instruction cache (ICache)
    3.2.3 AMBA ASB interface
4 SPARC standard
  4.1 ...
  4.2 SPARC V8 ...
...address out, data out, grant, ready, data in.

Figure 3.2: Simplified ASB query.

Figure: ICache state changes on a miss (fetch, decode and execution stages; streaming mode; AMBA request; mem ready, latch data using ico.mds; pipeline continues).

The instruction address will either be calculated in the execution stage, from a previous branch or jump command, or by the normal increment of the pc. It is valid at the beginning of the fetch stage and will be used to retrieve the tag. The miss detect is made in the fetch stage (1); if a miss was detected, the ICache will change into streaming mode (2) and issue a memory request (3), while the pipeline stalls in the decode stage waiting for a command (4). When the memory request returns, the result is strobed by the ico.mds signal into the pipeline, bypassing the normal pipeline propagation (the pipeline still stalls) (5). Now the decode stage becomes valid and can propagate one step at the next clock cycle (6). Meanwhile the ICache, which is still in streaming mode, issues the next mem...
...an RTOS typically handles fixed tasks in signal processing or in the industrial process measurement and control environment, like finite state machines for control flow or for detecting faults. The current non-MMU LEON is designed for such a realtime environment, where the RTOS has to be as slim as possible.

Figure 3.1.1: Load/store commands. The figure shows a ld word and a st word command on a DCache hit. The load command takes 1 execution cycle to process; in the execution stage the EAddress is generated (row 2). The store command takes 2 execution stage cycles to process: in the first cycle (row 2) the EAddress is generated, which then moves into the MAddress of the memory stage in the second cycle (row 3), while EData, the value to store, is retrieved from the register file. The store initializes the writebuffer, which drives the memory request so that the pipeline can continue operation.

- ME (Memory): the data cache is accessed. For a cache read hit, the data will be valid by the end of this stage, at which point it is aligned as appropriate. Store data is, on a cache hit, read out in the E stage (2 cycl...
...and segmentation map from virtual addresses to physical addresses. For the translation, several data structures are possible to store the mapping. It can, for example, be done by using one big flat table. In this case the virtual address forms an offset into the table, where the corresponding descriptor for the physical address is found; for the above paging example this would look somehow like the figure below.

Figure: Flat translation table (the virtual page number indexes the table directly; each entry holds the physical descriptor).

In the case of segmentation such a table is called a segment descriptor table; in the case of paging it is called a page table. In the case of paging with one flat page table, a 2^32 virtual address space with 4k-size pages would require 2^20 entries, which would occupy around 4 MB [23]. Therefore in most cases a sparse page table hierarchy is used; an example of this can be seen later, in section 5. In a page table hierarchy the virtual address is subdivided into several indices, each of which offsets into the next level of the page table tree. By interrupting the traversal in between, it is also possible to define larger pages than the fixed-size 4k pages. For instance, in the SPARC page table tree with 4 levels, level 0 maps 4G bytes (the whole virtual address space), level 1 maps 16M, level 2 maps 256K of memory, and level 3 maps the standard 4K pages. Figure 2.3 shows a schematic of the page table traversal (virtual address, TLB lookup with hit/miss, table lookup in main memory).
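The flat-table case, together with the present/authorized checks of figure 2.3, can be sketched in C. This is an illustrative model only (the entry layout and flag names are invented; 4k pages assumed):

    #include <stdint.h>

    enum fault { OK = 0, PAGE_FAULT, PROTECTION_FAULT };

    struct page_entry {
        uint32_t ppn;         /* physical page number */
        unsigned present:1;   /* item present in memory? */
        unsigned writable:1;  /* protection flag checked on access */
    };

    /* One flat page table, indexed directly by the virtual page number. */
    enum fault translate(const struct page_entry *table, uint32_t va,
                         int is_write, uint32_t *pa)
    {
        const struct page_entry *e = &table[va >> 12];
        if (!e->present)
            return PAGE_FAULT;        /* OS would fetch the page from disc */
        if (is_write && !e->writable)
            return PROTECTION_FAULT;  /* protection violation exception */
        *pa = (e->ppn << 12) | (va & 0xfff);
        return OK;
    }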
...andler handles this special case. On SPARC this is done by swapping %o6 (sp, the stack pointer), %o7 (the return address), %psr and %wim, and then issuing a jmpl %o7+8, %g0 command.

Figure 9.6: Schedule (process structs 1-3; storing and loading %wim; window spills on the user stack; trap frames on the kernel stack; kernel thread).

Figure (begins): process stacks after the switch (process structs 1-3; window spills; trap frames; kernel stack).
...are handled inside DCache. The translation operation can be bypassed in case the MMU is disabled or if the ASI MMU physical address pass-through is used; in this case the writebuffer of DCache and the instruction address buffer in ICache (3) will be initialized immediately and an AMBA request will be issued. In case the MMU is enabled, a translation will be requested from ICache and DCache. If an ICache and a DCache request are issued at the same time, they are serialized; the request that has to wait is buffered in the meantime (4). ICache's translation requests and DCache's translation, flush/probe and diagnostic access requests will then be issued serially to the Table Lookaside Buffer (TLB) (5). The translation, flush and probe operations will be described in the following section (functional overview); the diagnostic access operation has already been described in section 5.2.3. The translation, flush and probe operations initiate different match operations on the TLBCAM (6), which asserts a hit on success (7). In the case of a translation or probe operation, a miss will initiate a table walk that traverses the page table hierarchy. After the page table entry is retrieved from memory, the new entry will be stored in the TLBCAM (tag) and the syncram (data). The TLBCAM entry that is going to be replaced, which is determined by the LRU component, will be checked for memory synchronization (referenced/modified bit changed) (8). After comp...
...ate machine. The virtual writebuffer design gave rise to the trap-in-trap problem. This could have been fixed with changes in the pipeline; instead, however, in the second run the VTVI cache design with a physical writebuffer was chosen. In this case the DCache and ICache had to be reprogrammed.

6.4.1 VTVI DCache with physical writebuffer (dcache.vhd)

The DCache has been rewritten. It now implements a VTVI cache with a physical writebuffer; context information was added to the tag field. The ASI space decoder was extended to support flush/probe, MMU register, MMU bypass, I/D flush and diagnostic accesses. A DCache state diagram is shown in figure 6.4.2.

6.4.2 ICache (icache.vhd)

The instruction cache has been rewritten. It now implements a VTVI cache; context information was added to the tag field. The address translation is done before entering the ICache's streaming mode. An ICache state diagram is shown in figure 6.5.

Figure 6.5: New ICache state machine.

6.4.5 Other changes made to LEON

- AMBA interface (acache.vhd): the AMBA master interface was rewritten; it arbitrates between ICache, DCache and the table walk component. The ICache component has the highest priority.
- Cachemem (cachemem.vhd): context number information was added to every I/DCache line.
- ... : changed by adding the new ASI space identifiers. Because the ASI space identifiers proposed by [21] in A...
...cache is divided into cache lines with 8-32 bytes of data. Each line has a cache tag associated with it, consisting of a tag field and one valid bit for each 4-byte sub-block [18]. A simplified ICache schematic is shown in figure 3.2.3. Figure 3.2.4 shows the ICache behaviour on a miss: the ICache will change into streaming mode, fetching one entire cache line. Because the configurable cache line size is a power of 2 and smaller than the 4k page size, this operation will not cross a page boundary; therefore only one tag ...

Figure: DCache behaviour on a miss (address calculation, execution and memory stages, dcache state change, miss detect, AMBA request, mem ready, latch data using dco.mds, tpe to regfile, pipeline continues).

The address is calculated in the execution stage (1); it enters the DCache and is forwarded to the cache's syncram to retrieve the tag. The address is either register+register or register+immediate. The tag from the syncram becomes valid at the beginning of the memory stage (2). In the memory stage the miss detect is made, meaning that the tag is compared with the address (3). If a miss is detected (not equal), the DCache will change its state at the beginning of the write stage (4) and stall the pipeline (5). This implies that if the following command, now in the memory stage, is also a load one...
Figure 7.1.5: Flush operation (tlbcams signalling hit-and-modified, hit-not-modified, or no hit; AMBA writeback for entries needing synchronisation; final invalidation of all hit entries).

The flush operation is initiated through a write access with the ASI identifier MMU flush/probe. The flush operation in the current implementation takes at least one clock cycle per TLB entry to perform a flush match operation, which is different from the translation match and probe match operations. If memory synchronization is needed, additional cycles for the AMBA request are required, and at the end one cycle is needed to invalidate the matched TLB entries of the flush operation. The TLB entries are checked one by one. First, a hit against the virtual flush address (section 5.2.2) is generated. In case of a hit, and if the referenced/modified flags have been changed, the entry is written back to memory. This is shown in the above figure, where 3 examples are given: TLBCAM 1 signals a hit and that it needs synchronization, therefore an AMBA writeback operation is started (1); TLBCAM 2 signals a hit but does not need synchronization, therefore no AMBA writeback operation is started (2); TLBCAM 3 has no hit at all (3). In the last cycle, all entries that have hit are invalidated; in the example these would be entries 1 and 2, while entry 3 would be left valid (4).

7.2 Component Overview

7.2.1 Memory Management Unit (MMU)

The MMU component is shown in figure 10.1 and table 10.1. The main purposes of th...
...chy level of the fault.

Figure 10.0.1: TLB component.

TLBCAM component (signals: tagout, hit, write_op, match_op, probe_op, probe_zero, flush_op, acc, frset, lvl, tagin, tagwrite):

    dir  name        desc
    in   write_op    write operation after a table walk
    in   match_op    compare tag for translation
    in   flush_op    compare tag for flush
    in   probe_op    compare tag for probe
    in   frset       set invalid/ref/modified
    in   tagin       data for match_op, flush_op and probe_op
    in   tagwrite    tag data for write_op
    out  hit         signals a hit on match_op and probe_op
    out  probe_zero  the probe_op return value is zero
    out  acc         access flags of the PTE
    out  lvl         page table hierarchy level of the PTE
    out  ref         referenced flag of the PTE
    out  modified    modified flag of the PTE
    out  (sync)      1: synchronisation with memory needed
    out  tagout      complete tag (can be removed later on)

Figure 10.0.2: TLBCAM component.
43. d bold italic the directory linux arch sparc will be referenced as SPARC in the follow ing text linux gt arch gt sparc lt in the following text this gt leon directory will be referenced as S SPARC gt drivers gt 5 include gt asm sparc gt init gt gt kernel gt 1lib mm gt net The main Makefile is located in linux Makefile with the build rules included from linux Rules make 29 73 The standard build command are make xconfig make dep and make vmlinux make xconfig will open a tk configuration screen that is build from SPARC config in as the tcl source and SPARC defconfig as the default configuration make vmlinux will compile Future distributions should define its own architecture subdirectories for instance leon sparc instead of sparc this setting was only kept because development was done by modifying the existing source Introduc ing a own leon sparc architecture would require changing the include statement in all the source files and setting up its own configuration screen config in files This will create the main configuration file linux include linux autoconf h included from linux config that is the fist include of every source file 9 1 LINUX FILE ORGANIZATION AND MAKE SYSTEM 63 the kernel The top makefile linux Makefile includes SPARC Makefile where the SPARC system dependent parts are configured and
44. d not be managed by the buddy alloca tor 5 Allocate the data structures for the buddy allocator using the boot time allocator and ini tialize them the buddy allocator is still unused at that point 6 Allocate page table hierarchy through the MMU allocator and map NOCACHE into vir tual address space starting from Oxfc000000 see 3 in the above figure LEON Lomem into the virtual address space starting from Oxf1000000 see 2 in the above figure and the kernel starting from virtual address space Oxf0000000 see 1 in the above figure 7 Switch to the newly allocated page table hierarchy 8 Hand over all memory that is left from the boot time allocator to the buddy allocator including the boot time allocator s free page bitmap as well 9 4 Processes 9 4 1 Process stack Figure 9 4 shown the layout of the process stack The process stack in the SPARC architecture differs from others in that it has to support window spilling Remember that the SPARC register file is divided into register windows see section 5 Because of the modulo arithmetic of the window counter on a SAVE or RESTORE command which decrement or increment the window counter the current window counter can run into a already used window To avoided this the invalid window mask will mark the wraparound border 9 when running into a window marked by the invalid window mask a window underflow or window overflow trap is generated that wil
converting with htonl
    flags: c d r e x w v p pr
    setflags from 0004000d02 to 0004000d0a (ored 0000000008)
    add 4k pte to PTE table: pageia 004000d000 at pteia 0040009a04, vaddr (1:0 0 1)
    Output mem.dat
    Output mem.dat image
    Output mem.dat dsu

11.6.1.0.1 Analysing page table hierarchies. To analyse the page table hierarchy of a memory image, the image program can be called with the -a option. This will output the page table hierarchy:

    eiselekd@ralab10> image -a
    ## Analyzing mem.dat, ctxp: 0040009000
    CXT 00000 (1): 0040009400
    (1) PGD 00000 (1): 0040009800 (1) |c|d|r|e|w|v|p|pr|
    PMD 00000 pte: 0040000000-0040040000 (000 000 000) pte 0x000400000a
    CXT 00001: 004000c000
    PGD 00000: 0040009900
    PMD 00000: 0040009a00 -> 000001 pte: 004000b000-004000c000 (001 000 000 000) pte 0x0004000b0a
    -> PTE 00001 pte: 004000d000-004000e000 (001 000 000 001) pte 0x0004000d0a
    eiselekd@ralab10>

11.6.1.0.2 Dumping memory content of the testbench. The mmu/tbench/iram.vhd memory models of the testbench were extended so that memory dumps can be forced. A memory dump is initiated by the following sequence:

    unsigned int ioarea unsigned int
Figure 9.1: Linux kernel mapping on Sun (text, data and bss placed at 0x4000 in available RAM).

In the smallest possible configuration the kernel image is still 1 MB. The following is a dump of the sections of the Linux image vmlinux rewritten for LEON:

    Idx  Name   Size      VMA       LMA
    0    .text  000a3000  f0004000  00004000
    1    .data  00017400  f00a7000  000a7000
    2    .bss   000280 8  f00be400  f00be400

Linux running on LEON on a XSV board has 2 MB of RAM and 1 MB of ROM (flashram) available. Using the standard approach of allocating a monolithic kernel in RAM would leave only little memory for operation. Therefore another mapping was chosen: in operation, the text section resides in ROM, while the data and bss sections are allocated at the beginning of RAM. This is shown in figure 9.2.

Figure 9.2: Linux kernel mapping on LEON. (User space from 0x0, kernel space from 0xf0000000 with text, data and bss; physically, ROM occupies 0x0-0x100000 and RAM 0x40000000-0x40200000.)

9.1 Linux file organization and make system

For a detailed description of the Linux make system read the text file Documentation/kbuild/makefiles.txt in the Linux source distribution. The LEON Linux port builds on linux-2.4.17. The sources are organized in the following way: SPARC architecture and LEON dependent directories are marked
address space, which is shown in figure 2.1. Each mapping entry holds additional information for OS use. This is shown in figure 2.1.

Figure 2.1: Paging (logical level, mapping, physical level).

Segmentation, on the other hand, uses variable size segments. Each segment forms one independent virtual address space; however, only one mapping per segment is provided, therefore a segment has to be contiguous in physical memory. Each segment holds additional information, which includes its length and flags for OS use. This is shown in figure 2.2.

Figure 2.2: Segmentation (each segment is an independent linear address space mapped contiguously onto the physical level).

The paging example in figure 2.1 shows a mapping (1, 1, 2, 3, 4, 2). Adjacent pages in the virtual address space can be scattered in physical memory. This makes memory allocation in paging immune to (external) fragmentation. Paging forms the base for swapping (demand paging) in the OS, where not the whole virtual address space has to be constantly present in physical memory. Pages can be swapped in and out from hard disc on demand. The corresponding flags in the page table entries keep track of this process. Both paging
9.3  . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.3.1  Memory Management  . . . . . . . . . . . . . . 66
9.4  Processes  . . . . . . . . . . . . . . . . . . . 67
9.4.1  Process stack  . . . . . . . . . . . . . . . . 67
9.4.2  Scheduling  . . . . . . . . . . . . . . . . . . 68

10  Appendix A: Components  . . . . . . . . . . . . . 73

11  Appendix B: MMU distribution  . . . . . . . . . . 79
11.1  Distribution overview  . . . . . . . . . . . . . 79
11.2  Subdirectory mmu/modelsim  . . . . . . . . . . . 79
11.3  Subdirectory mmu/syn  . . . . . . . . . . . . . 80
11.4  Subdirectory mmu/tbench  . . . . . . . . . . . . 80
11.4.1  Testbenches for MMU components (mmu/tbench/comp)  80
11.4.1.1  . . . . . . . . . . . . . . . . . . . . . . 80
11.4.1.2  . . . . . . . . . . . . . . . . . . . . . . 80
11.4.1.3  TLB_tb.vhd  . . . . . . . . . . . . . . . . 80
11.4.1.4  mmu_tb.vhd  . . . . . . . . . . . . . . . . 80
11.5  Subdirectory mmu/scripts (XESS board development)  81
11.5.1  syn.pl: Xilinx tool chain build scripts  . . . 81
11.5.2  loadexo.sh: Handling the board  . . . . . . . 82
11.6  Subdirectory mmu/tsource  . . . . . . . . . . . 82
11.6.1  image: Creating page table hierarchies  . . . 83
11.6.1.0.1  Analysing page table hierarchies  . . . . 84
11.6.1.0.2  Dumping memory content of testbench  . . . 85
11.6.2  Small Operating System (SOS)
mmctrl2.

ICache interface:

dir  name            description
in   trans_ur        request translation
in   su_ur, read_ur  translation request parameters
out  hold_ur         asserted as long as an operation is still unfinished
out  data            return physical address on translation operation
out  cache           address cacheable
out  accexc          asserted on protection, privilege or translation error

ACache interface (this interface is used by the Table Walk component):

dir  name   description
in   mcmmi  input from AMBA to the Table Walk component inside the TLB
out  mcmmo  output to AMBA from the Table Walk component inside the TLB

Table 10.1: MMU component i/o.

TLB component (signals: req_ur, hold_ur, trans_op_ur, probe_op_ur, flush_op_ur, su, read, mmctrl1, probe_data, data, cache, isid, fault_inv, fault_pro, fault_pri, fault_trans, fault_mexc, fault_lvl; TLBCAM array plus SYNCRAM):

name         description
req_ur       request operation
trans_op     request translation
probe_op     request probe
flush_op     request flush
su           supervisor
read         read access
mmctrl1      ctxnr and ctxptr
hold_ur      asserted until operation is finished
probe_data   probe return data
data         physical address
cache        address cacheable
isid         (1:0) requester, DCache or ICache
fault_inv    invalid page fault
fault_pro    protection violation
fault_pri    privileged page access violation
fault_trans  translation error page fault (PTE invalid)
fault_mexc   memory exception during table walk
fault_lvl    page table hierarchy level of fault
The main functions of the MMU component are:

- serialize the concurrently arriving translation requests from ICache and DCache;
- implement the fault status and fault address registers (ASI space MMU register access).

DCache and ICache have to share a single TLB resource, therefore stalls can occur. DCache misses result in one TLB query for load/store operations. Double word accesses are double word aligned; therefore they will not cross a page boundary and only one translation is needed. ICache misses result in one TLB query; however, in streaming mode the translated physical address is used for increment, therefore for a 4-word ICache line only one in four instructions results in a TLB query.

Requesting a TLB operation for DCache is done by asserting req_ur. This can be a translation (trans_ur), a flush (flush_ur) or a probe (probe_ur) operation. Requesting a TLB translation for ICache is done by asserting trans_ur. The various parameters for these operations are registered inside the MMU so that concurrent requests from ICache and DCache can be serialized. Until the operation result is returned to DCache or ICache, hold_ur remains asserted.

7.2.2 Translation Lookaside Buffer (TLB)

The TLB component is shown in figure 10.0.1. Being a cache of previous translations, the TLB can be designed in similar ways as the data and instruction caches: direct mapped, n-way associative or fully associative. Most TLBs, however, are fully associative to maximize speed, implemented as registers.
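As an illustration of the fully associative lookup, a small software model is given below. In hardware all tag comparators work concurrently; the loop merely stands in for them. The struct and function names are illustrative and not taken from the VHDL sources.

    struct tlb_entry {
        unsigned int tag;   /* virtual page number plus context */
        unsigned int pte;   /* cached page table entry          */
        int          valid;
    };

    #define TLB_ENTRIES 32

    /* returns 1 on a hit and delivers the cached PTE; 0 means a
     * table walk is required */
    static int tlb_lookup(const struct tlb_entry *tlb, unsigned int tag,
                          unsigned int *pte_out)
    {
        int i;
        for (i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].tag == tag) {
                *pte_out = tlb[i].pte;   /* hit: no memory access needed */
                return 1;
            }
        }
        return 0;                        /* miss */
    }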
the execution stage; the command is written to the data cache at this time.

- WR (Write): the result of any ALU, logical, shift or cache read operation is written back to the register file [18].

In principle every command takes 5 cycles to finish if no stalls occur; however, the decode and execution stages are multi-cycle stages that can take up to 3 cycles. For instance, the store double (std) command remains 3 cycles in the execution stage before the pipeline continues. Memory access to the data cache is initiated through the load (ld), store (st), load alternate (lda), store alternate (sta), and the atomic load-store (ldst) and swap (swap) commands in the memory stage. Figure 3.1.1 shows a 1-cycle execution stage load (ld) and a 2-cycle execution stage store (st) command.

3.2 Cache subsystem

The LEON processor implements a Harvard architecture with separate instruction and data buses connected to two independent cache controllers. Both data cache (DCache) and instruction

(DCache schematic: the cachemem is addressed from EAddress or MAddress; tag out, data in, data out; pipeline hold; hit; states Waiting and Load; decode cycles.)

In the above figure, the address that drives the cachemem comes either from the execution stage or from the memory stage (1). Both read and write commands
the bootup trace, a simple annotation program was written. The VHDL testbench disassembles the processed commands and outputs them to the vsim tcl window. This output is stored in the transcript file. The transcript file is taken and annotated with the resolved function names for each program address; data addresses (i.e. on a sethi %hi(0xf00a7000), %g3) are also resolved where possible. This way a complete trace of the boot process is possible, however at a rate of around 10 Hz on a 2 GHz PC. This is done with the <SPARC>/leon/scripts/annotate.pl script, which uses the sim/vmlinux.text.dis and sim/vmlinux.data.dis files to resolve address-to-symbol mappings. sim/vmlinux.text.dis and sim/vmlinux.data.dis are created by make sim. All input and output files for <SPARC>/leon/scripts/annotate.pl are configured in <SPARC>/leon/scripts/annotate. An example of the annotated transcript file is shown below; the left side is the testbench output, the right side is the part added by <SPARC>/leon/scripts/annotate.pl.

Footnotes: Because the MMU is already activated, this is a jump to virtual address 0xf0004000 (physical address 0x4000). The static page table hierarchy will locate the kernel image contiguously in the virtual address space starting from 0xf0000000, even though it is split into ROM and RAM parts. The mmu/tbench/iram_m.vhd ram model will zero itself when starting the simulation, so no bss zeroing has to be made.
On a hit, the syncram address has to be valid at the end of the cycle; the page table entry appears from syncram after the start of the next clock cycle and can therefore only be registered at the end of the second cycle after the TLB query was received. The registration is done in the MMU component. In the current implementation the whole translation operation on a TLB hit takes 2 cycles to process.

Figure 7.1.3: TLB hit logic. (Probe of the virtual address; signal finished, hold 0; on a tlb match a hit is signalled, on a miss the Table Walk is started.)

The probe operation is initiated through a read access with the ASI identifier MMU flush/probe. A probe operation retrieves a PTE for a given virtual address pattern. On a probe operation, first the TLBCAM probe match operation is done (1), which is different from the translation match operation. If a hit occurs, either the PTE or zero is returned. If no hit occurred, the Table Walk is initiated (2) and either a PTE entry or zero is returned; see section 5.2.2 for a detailed description.

Figure 7.1.4: Probe operation with TLB hit/miss.

Figure 7.1.5: Flush operation. (On an MMU flush of a virtual address, the matching TLBCAM entries 1..n are invalidated one by one; modified entries are first written back to memory through the AMBA interface.)
ASIs are suggested.

5.2.1 ASI MMU register access

Alternate space MMU register access gives access to the MMU's control registers. The instructions lda [addr] asi_mmureg, r and sta r, [addr] asi_mmureg behave as you would expect. For detailed information refer to [21], Appendix H. There are 5 registers defined for the SRMMU: the Control Register, the Context Table Pointer Register, the Context Number Register, the Fault Status Register and the Fault Address Register.

- Control Register. This register includes the enable flag and implementation specific flags, among others:

    IMPL (31:28)  MMU implementation
    VER  (27:24)  MMU version
    SC   (23:8)   system control
    PSO           partial store order
    NF            no fault bit (when 1, a fault does not trap)
    E    (0)      enable (enabled when 1)

- Context Table Pointer Register. This register holds the root of the page table tree (context table pointer in bits 31:2, reserved bits 1:0).

- Context Number Register. This register stores the context number of the running process. It forms the offset into the context table.

- Fault Status Register. This register holds the status of the MMU on an exception (i.e. a page fault):

    reserved (31:18), EBE (17:10), L (9:8), AT (7:5), FT (4:2), FAV (1), OW (0)

  The L (Level) field is set to the page table level of the entry that caused the fault.
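In C such register accesses are usually wrapped in small inline-assembly helpers, as sketched below for a SPARC target. The ASI number 0x19 and the register offsets 0x000 to 0x400 follow the common SRMMU convention; these values and the helper names are assumptions in the sense that they are not quoted from this thesis.

    #define ASI_MMUREGS 0x19     /* assumed ASI number for MMU registers */

    /* register offsets within the ASI address space (SRMMU convention) */
    #define SRMMU_CTRL   0x000   /* control register        */
    #define SRMMU_CTXTBL 0x100   /* context table pointer   */
    #define SRMMU_CTX    0x200   /* context number register */
    #define SRMMU_FSR    0x300   /* fault status register   */
    #define SRMMU_FAR    0x400   /* fault address register  */

    static inline unsigned int mmu_read(unsigned int reg)
    {
        unsigned int val;
        __asm__ __volatile__("lda [%1] %2, %0"
                             : "=r" (val)
                             : "r" (reg), "i" (ASI_MMUREGS));
        return val;
    }

    static inline void mmu_write(unsigned int reg, unsigned int val)
    {
        __asm__ __volatile__("sta %0, [%1] %2"
                             : /* no outputs */
                             : "r" (val), "r" (reg), "i" (ASI_MMUREGS)
                             : "memory");
    }

For example, mmu_write(SRMMU_CTX, nr) would switch the context number on a task switch, mirroring the description of the Context Number register above.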
literature and a broad spectrum of online documentation, it is fairly well understood. Porting Linux onto LEON was especially inviting because Linux has already been ported to the SPARC architecture, running on Sun workstations. However, only SPARC processors with a Memory Management Unit (MMU) are supported. The current LEON distribution does not include an MMU, because LEON is targeted at embedded realtime applications, where nondeterministic page faults of the MMU could cause trouble for the realtime requirements of these applications. Also, in a deeply embedded environment, where normally only one fixed task has to run, the overhead of virtual memory management is quite significant. The SPARC Architecture Manual V8 (SPARC V8) does not require an MMU to be present; however, SPARC V8, which the LEON integer unit implements, already defines a SPARC Reference MMU (SRMMU). This suggested that, when adding the SRMMU to LEON, porting Linux would be a straightforward job. Therefore the main goal of this diploma thesis is the implementation of a SRMMU and its integration into the LEON SoC platform. This report concentrates on the hardware side: the design and implementation of a SRMMU. Running Linux on LEON may not be practical for embedded realtime applications; nevertheless there could be quite a few fields of application, PDAs or the like. Another nice aspect is that Linux running on the LEON SoC would be Open Source from gate level on. The MMU
Using registers instead of syncram for the data part would reduce the source code complexity significantly. However, an address decoder would be needed to access a given element; in a 32-element TLB with 32 bits for each element's data part this could be a significant hardware cost.

7.3.2 0-cycle penalty implementation

In a zero penalty implementation the TLB hit logic would work in parallel to the DCache/ICache hit logic. To avoid stalls caused by TLB sharing between DCache and ICache, a split DCache/ICache TLB or a dualport design should be used. For ASI accesses that are made from DCache, the ICache interface should be blocked. These changes would require some more work. Figure 7.3.1 shows a schematic of the proposed DCache/TLB interworking.

(Schematic: across Decode, Execution, Memory and Write the tag appears during the execution/memory transition on a store; the DCache hit logic and the TLB hit logic work in parallel; the unregistered syncram output, or a data part implemented in registers, drives the AMBA request directly, so no extra pipeline transition is needed.)

The DCache miss detect (1) and
the Current Window Pointer; this way a large pool of fast registers can be accessed while still keeping the instruction size small.

4.2.1 Register windows

Figure 4.1 illustrates the register windows for a configuration with 8 windows. The register windows are divided into 3 parts: ins, locals and outs. On a SAVE instruction, which adds 1 to the Current Window Pointer (CWP), the current window's outs become the new window's ins; on a RESTORE, which subtracts 1 from the CWP, it is vice versa. At the wraparound point one invalid window exists (window 7 in the above figure). This window is marked by the Window Invalid Mask (WIM). It is invalid because its out registers would overwrite the ins

Figure 4.1: SPARC register windows (taken from [21], p. 27).

of its neighbor, which is not desirable; therefore moving into a window marked invalid will cause a trap. Typically the trap handler will take care of spilling the registers onto the stack. The local registers are visible only to the current function, while the ins and outs are shared between caller and callee.

4.2.2 SPARC instruction overview

SPARC instructions are 32 bits wide. For the MMU design, mainly the instructions for memory access are relevant. For memory access only a few load-to-register and store-from-register commands are available. The destination
the TLB entry that is going to be replaced has to be synchronized with memory: the PTE is retrieved from syncram and updated in memory through the AMBA interface (2). When the Table Walk finishes, the retrieved PTE is stored in syncram and the corresponding TLB entry is initialized (3). After another cycle the translated physical address is valid (4). Not shown in the above figure: in addition, the PTE's physical memory address is stored, so that on a writeback operation, when synchronizing the PTE's referenced/modified flags, the page table hierarchy does not have to be traversed. Access permissions are checked.

Figure 7.1.2: TLB on a miss.

(Figure: the tag cam raises per-entry hit signals; the data parts sit in syncram; the hit signals of entries 1, 3, 5, 7, ... feed an or-tree that forms the syncram address.)

The TLB's tags are stored in fast registers; the associated page table entry is stored in syncram. Each TLB entry can assert a hit signal that generates the syncram address of the associated page table entry through an or-tree, which is shown in the above figure: for instance, bit 0 of the syncram address is formed by or-ing the hit signals of entries 1, 3, 5, ... This design was chosen to save hardware resources and to still be able to run a minimal MMU on a XSV300 board. Future implementations could implement the data part in registers too, which would simplify the design but increase hardware cost.
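The or-tree can be expressed compactly in C: because at most one entry hits, or-ing the indices of all hitting entries yields exactly the index of the hitting entry. The sketch below is illustrative only.

    /* hit[i] is TLB entry i's hit signal; n is the number of entries.
     * Address bit j is the OR of hit[i] over all i whose index has
     * bit j set; with a one-hot hit vector this returns the index of
     * the hitting entry. */
    static unsigned int hit_to_syncram_addr(const int hit[], int n)
    {
        unsigned int addr = 0;
        int i;
        for (i = 0; i < n; i++)
            if (hit[i])
                addr |= (unsigned int)i;
        return addr;
    }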
the board: the design is stored in flashram and the FPGA is initialized from it on powerup. To achieve this, first the flashram has to be initialized through the CPLD, which in turn requires reprogramming the CPLD using the flashprog.svf design file (1). After that the .exo FPGA design file can be uploaded to flashram (2). The .exo files are srecord format files; this simple ASCII format contains lines that comprise a destination address and the data that should be stored at that address. .exo files do not have to contain FPGA designs: for instance, sparc-rtems-objcopy -O srec will convert an object file (program) into srecord format that can then also be downloaded to flashram. In the last step the CPLD has to be reprogrammed again so that it will initialize the FPGA on powerup using the data in the flashram region 0x0-0x100000 (first megabyte). This is done using the flash_to_fpga.svf design file (3). For flash_to_fpga.svf to work, the .exo file has to be created so that the FPGA design is uploaded to flashram region 0x0-0x100000. On powerup the CPLD will now initialize the FPGA with the design from flashram region 0x0-0x100000 (4). After that the FPGA will begin operation. For the LEON SoC design this means that the FPGA is connected to RAM (2 MB) and ROM (flashram region 0x100000-0x200000, second megabyte) (5).

8.1.1.2 Modelsim

The Modelsim simulation environment from Mentor Graphics Corp. [9] comprises the four commands vlib, vmap,
the old schedule function's stack frame of the task that should be activated and that had been deserted on a previous switch. The schedule sequence is visualized in figures 9.5, 9.6 and 9.7.

Figure 9.5: Process 1 running, before the call of schedule. (Process structs 1-3, each with kernel stack, trap frames and window spills; user stack; kernel thread with a deserted schedule frame.)

There is a special case when switching processes: the fork command will create a new task whose stack has to be initialized by hand before letting schedule jump into it. Kernel threads (seen in the right row of the above figure) run completely in the kernel stack; they do not have a user stack. On interrupts and traps generated while a kernel thread is running, the trap frame is placed just on top of the current stack frame. The interrupt and irq handlers
the rtems documentation section from [2] or by searching the internet. The commands and their switches that are used most often are:

- sparc-rtems-gcc:
    -v            output all commands issued by gcc
    -Tseg addr    link segment seg to addr
    -T linkscript use a custom linkscript; the powerful format of the link script is described in the GNU documentation of the ld command

- sparc-rtems-objcopy:
    --remove-section sect  remove unused sections (i.e. .comment)
    --adjust-vma addr      reallocate sections to addr
    -O format              -O srec will output srec format that is used by xsload

- sparc-rtems-objdump:
    -d  dump disassembly
    -s  output hex dump
    -x  dump all information about all sections

Figure 8.1: Design flow. (Synthesis: vhd through synplify_pro or dc_shell to edf, then ngdbuild, map, par, bitgen to bit, and promgen to exo for the region below 0x100000; Modelsim: vcom with RAM/ROM models, ngd2vhdl/ngd2ver and ngdanno for gate level; Software: c sources and linkscript through sparc-rtems-gcc; XESS board: svf, exo and bit files loaded with xsload to CPLD, FPGA and UART.)
6.4.3  Other changes made to ...  . . . . . . . . . . 40

7  MMU design components  . . . . . . . . . . . . . . 43
7.1  Functional overview  . . . . . . . . . . . . . . 45
7.2  Component overview  . . . . . . . . . . . . . . . 50
7.2.1  Memory Management Unit (MMU)  . . . . . . . . . 50
7.2.2  Translation Lookaside Buffer (TLB)  . . . . . . 50
7.2.3  Translation Lookaside Buffer Entry (TLBCAM)  . 50
7.2.4  Table Walk (TW)  . . . . . . . . . . . . . . . 51
7.2.5  Least Recently Used (LRU) and LRU entry  . . . 51
7.3  Possible future optimizations  . . . . . . . . . 51
7.3.1  1-cycle penalty implementation  . . . . . . . . 51
7.3.2  0-cycle penalty implementation  . . . . . . . . 51
7.3.3  Flush optimization  . . . . . . . . . . . . . . 53

8  Design Flow  . . . . . . . . . . . . . . . . . . . 55
8.1  XESS XSV800 board development  . . . . . . . . . 55
8.1.1  Design flow  . . . . . . . . . . . . . . . . . 55
8.1.1.1  XESS board  . . . . . . . . . . . . . . . . . 56
8.1.1.2  Modelsim  . . . . . . . . . . . . . . . . . . 56
8.1.1.3  Synthesis  . . . . . . . . . . . . . . . . . 57
8.1.1.4  Software  . . . . . . . . . . . . . . . . . . 57

9  Linux kernel  . . . . . . . . . . . . . . . . . . . 61
9.1  Linux file organization and make system  . . . . 62
9.1.1  LEON dependent parts  . . . . . . . . . . . . . 63
9.1.1.1  make commands  . . . . . . . . . . . . . . . 63
9.1.1.2  . . . . . . . . . . . . . . . . . . . . . . . 63
9.2  . . . . . . . . . . . . . . . . . . . . . . . . . 64
ical writebuffer, in which case the translation is done before initializing the writebuffer. The difference between a physical and a virtual writebuffer is shown in figure 6.4.

6.3.1.1 Virtual writebuffer

A virtual writebuffer implies that on a MMU exception the pipeline state can not be recovered, because the exception takes place after the pipeline has already continued for some time (the exception is deferred). Without extra precautions this leads to some situations where a trap-in-trap would occur, which is prohibited in SPARC and would force the processor into error mode.

Figure 6.4: left, physical writebuffer; right, virtual writebuffer. (With a physical writebuffer the MMU translation is synchronized on the store before the writebuffer and the AMBA bus are used; with a virtual writebuffer the buffer is not synchronized on the store.)

One example, which does not occur in LEON because the writebuffer in LEON is only a one-element writebuffer, would be if 2 succeeding memory writes that both cause an exception were stored in the writebuffer; in this case the second store could not be emptied after the first exception caused a trap. Another example that could occur on LEON, and that would require adding extra logic, is when ICache and DCache both cause a trap at the same time. Jiri Gaisler pointed to this problem: st %l1, [%l2]; add 0x1, %l0. If the st would trap in DCache
. . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2.1  Register windows  . . . . . . . . . . . . . . . 23
4.2.2  SPARC instruction overview  . . . . . . . . . . 24

5  SPARC V8 Reference MMU (SRMMU)  . . . . . . . . . . 27
5.1  SPARC SRMMU translation overview  . . . . . . . . 27
5.2  ASI (Alternate Space Instructions)  . . . . . . . 28
5.2.1  ASI MMU register access  . . . . . . . . . . . 29
5.2.2  ASI flush/probe  . . . . . . . . . . . . . . . 31
5.2.2.1  Flush  . . . . . . . . . . . . . . . . . . . 32
5.2.2.2  Probe  . . . . . . . . . . . . . . . . . . . 32
5.2.3  ASI MMU diagnostic access (I/D/TLB)  . . . . . 32
5.2.4  ASI MMU physical address pass-through  . . . . 32
5.2.5  ASI DCache flush  . . . . . . . . . . . . . . . 32

6  Design options
6.1  Physically tagged, physically indexed (PTPI)
6.2  Physically tagged, virtually indexed (PTVI)
6.3  Virtually tagged, virtually indexed (VTVI), SRMMU  37
6.3.1  . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3.1.1  Virtual writebuffer  . . . . . . . . . . . . 37
6.3.1.2  Physical writebuffer  . . . . . . . . . . . . 38
6.4  Design  . . . . . . . . . . . . . . . . . . . . . 38
6.4.1  VTVI DCache physical writebuffer (DCache.vhd)  40
6.4.2  VTVI ICache  . . . . . . . . . . . . . . . . . 40
To minimize the miss rate, an appropriate replacement scheme is chosen. In VIPT or PIPT cache designs, TLB dualport tag cams or split instruction/data TLBs are required to get reasonable speed. Because LEON has to run on resource-poor hardware, the VIVT cache solution was best suited, as it allows a mixed tag cam / data ram on a TLB combined for instruction and data cache. The TLB receives the already serialized ICache and DCache requests from the MMU.

7.2.3 Translation Lookaside Buffer Entry (TLBCAM)

The TLBCAM component is shown in figure 10.0.2. The number of TLBCAM entries can be configured from 2 to 32. The TLBCAM components implement the concurrent comparison of the fully associative TLB. They also implement the concurrent flush and probe operations. In the current implementation, the flush match logic, which is working concurrently inside TLBE, could be moved to TLBCAM, because the flush operation is done one by one anyway (see figure 7.1.5). However, this would require adding lines to access the TLBCAM's tags, which in turn would require an address decoder to access a specific tag out of the TLBCAM array. The tagout line of the TLBCAM component is doing just this, but it was only added for ASI space MMU I/DCache diagnostic access and can be removed later.

7.2.4 Table Walk (TW)

The TW component is shown in figure 10.0.4. The main functions of the Table Walk component are:

- Perform a table walk on a normal translation.
It would map the whole virtual address space of context 0 one to one onto the physical address space, using region page table entries, where at 0x40000000 segment page table entries are used to cover the whole address space:

    G 0 0 EXEC
    M 0 64 40000000 EXEC

An example output of an image run is given below:

    eiselekd@ralab10> image
    init task (1): line 6: allocate for (0:0 0 0) 0000000000 init task 0
    add PGD tbl to CTX: pgdia 0040009000, ctxia 0040009400
    add PMD tbl to PGD: pmdia 0040009800 at pgdia 0040009400
    add pte to PMD tbl (1): 0040009800, pageia 0040000000 <size 0000040000>
    flags: ci di r wi p pr
    setflags from 0004000002 to 000400000a (ored 0000000008)
    line 8: allocate file testprogs/p1.o, taskid 1 (0:0 0 0) init
    Loading testprogs/p1.o: rs 356, aligned 356
    Copy file content to ia 004000b000, converting with htonl
    add PGD tbl to CTX: pgdia 0040009004, ctxia 004000c000
    add PMD tbl to PGD: pmdia 0040009900 at pgdia 004000c000
    add PTE tbl to PMD: pteia 0040009a00 at pmdia 0040009900
    flags: c d r e x w v p pr
    setflags from 0004000b02 to 0004000b0a (ored 0000000008)
    add 4k pte to PTE table: pageia 004000b000 at pteia 0040009a00, vaddr (1:0 0 0)
    line 9: allocate file testprogs/p2.o, taskid 1 (0:0 0 1)
    Loading testprogs/p2.o: rs 80, aligned 80
    Copy file content to ia 004000d000,
entity for a LEON design running on a XSV board that accesses flashram. The subdirectory mmu/xess/ucf includes the ucf constraint files used for the Xilinx tool ngdbuild. The ucf files define the pin mapping used for a design. For instance, when using the top level entity mmu/xess/vhdl/leon_rom.vhd running at 25 MHz, the constraint file mmu/xess/ucf/xsv_rom_25hz.ucf should be used.

12 Appendix C: MMU source

12.1 Source code

The source code of the MMU components is appended after page 88.

Bibliography

[1] Daniel Bretz. Digitales Diktiergerät als System-on-a-Chip mit FPGA-Evaluierungsboard. Master's thesis, Institute of Computer Science, University of Stuttgart, Germany, February 2001.
[2] OAR Corp. RTEMS Documentation. http://www.oarcorp.com/rtems/releases/4.5.0/rtemsdoc-4.5.0/share/index.html, 2002.
[3] OAR Corporation. RTEMS Web Site. http://www.oarcorp.com, 2002.
[4] XESS Corporation. XSV Board Manual. http://www.xess.com, 2001.
[5] Marco Cesati, Daniel P. Bovet. Understanding the Linux Kernel. O'Reilly, 2001.
[6] Virtex Datasheet DS003-2 (v2.6): Virtex 2.5V Field Programmable Gate Arrays. http://www.xilinx.com, July 2001.
[7] Source distribution: MMU for LEON. http://www.tamaki.de, 2002.
[8] Jiri Gaisler. LEON Web Site. http://www.gaisler.com, 2002.
[9] Mentor Graphics homepage. http://www.mentor.com, 2002.
[10] Free Software Foundation,
spill out the register window onto the stack. The OS, which handles the window spilling, and the program share the stack and have to be programmed to work together. This is done in the following way: the compiler[11] generates programs that always keep a valid stack pointer in %o6 (sp) after a SAVE, respectively %i6 (fp) before a SAVE. The SAVE command that increments the stack pointer is generated so that the first part of the function frame is reserved for a possible window spill. Therefore, for a given window that should be spilled, the OS can use the sp and fp registers to determine the stack location. Figure 9.4 in addition visualizes the kernel stack concept: each process has a fixed size memory chunk[13] which contains the task_struct process structure at the beginning. The rest

[10] This border can be multiple windows wide.
[11] To be more precise: the compiler.
[12] A function body of a program generated by the sparc-rtems-gcc cross compiler (not written to target Linux) will reserve at least 104 bytes of spilling memory on the stack by issuing SAVE %sp, -104, %sp. The Linux OS, on the other hand, will, on an 8-window SPARC processor with FPU disabled, only spill 64 bytes; sparc-rtems-gcc could be optimized here.
[13] On SPARC this is of size 0x2000 and aligned to 0x2000.
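To make the wraparound arithmetic concrete, a small C sketch is given below. NWINDOWS and the function name are illustrative; the sketch follows the text's convention that SAVE advances the window counter by one (modulo the number of windows), with wim as the Window Invalid Mask holding one bit per window.

    #define NWINDOWS 8

    /* 1 means the SAVE runs into the invalid window: a window
     * overflow trap is raised and the OS spills the window to the
     * spill area below %sp reserved by the compiler. */
    static int save_would_trap(int cwp, unsigned int wim)
    {
        int next = (cwp + 1) % NWINDOWS;  /* modulo arithmetic of the counter */
        return (wim >> next) & 1;
    }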
On completion the result is returned to the MMU from the TLB: a translation operation returns the physical address, which is used in DCache to initialize the writebuffer (9) and in ICache to initialize the instruction address buffer (10) that is used for increment in streaming. A probe operation returns the probed page table entry. A translation operation also checks for permissions; on a protection or privilege violation an exception is raised (11). DCache, ICache and Table Walk AMBA requests are handled using one single AMBA bus master interface (12). Another, alternative view is given in figure 7.2, which visualizes the translation process as a pipeline with 4 stages: Fetch, Table Lookaside Buffer, Table Walk and Memory Request. This figure is only for making the translation concept explicit.

7.1 Functional overview

The three main MMU operations are translation, flush and probe. Each of these operations is described below.

- The translation operation on a TLB hit is shown in figure 7.1.1. In the pipeline schematic figure, a zero wait state translation would be realized by feeding in translation addresses while the cache hit check is made. The pipeline propagation from Table Lookaside Buffer to Table Walk (in case of a miss) or Memory Request (in case of a hit) would be blocked if the cache asserted hit (no translation needed). Either a DCache or an ICache translation request would be allowed to perform such a pre-translation.
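A minimal C model of the request serialization and the hold_ur handshake described above is sketched below; the struct and names mirror the signals loosely and are not taken from the VHDL sources.

    enum mmu_op { OP_NONE, OP_TRANSLATE, OP_FLUSH, OP_PROBE };

    struct mmu_state {
        enum mmu_op pending;   /* registered operation parameters       */
        int from_icache;       /* requester: ICache (1) or DCache (0)   */
        int hold_ur;           /* asserted until the result is returned */
    };

    /* A request is accepted only while no other operation is in
     * flight; the rejected requester simply retries and is thereby
     * serialized behind the running operation. */
    static int mmu_request(struct mmu_state *m, enum mmu_op op, int from_icache)
    {
        if (m->hold_ur)
            return 0;               /* busy: other request served first */
        m->pending     = op;
        m->from_icache = from_icache;
        m->hold_ur     = 1;         /* cleared again when the op finishes */
        return 1;
    }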
share a single writebuffer to issue memory requests (2); therefore read and write commands have to wait until the writebuffer empties, in which case the pipeline stalls. On a read the pipeline will of course wait for the result to be returned from memory. On a store the pipeline will stall until the writebuffer is empty. The memory result will be aligned (3) and merged into the current cache line (4).

Figure 3.2.1: LEON DCache schematic and state transition diagram.

cache (ICache) share one single Advanced Microcontroller Bus Architecture (AMBA) Advanced System Performance Bus (ASB) master interface to access the memory controller.

3.2.1 Data cache (DCache)

The LEON DCache is a direct mapped cache configurable to 1-64 kbyte. It has a one-element writebuffer that operates in parallel to the pipeline after a store operation has initialized it. The write policy for stores is write-through with no allocate on write miss. The data cache is divided into cache lines of 8-32 bytes. Each line has a cache tag associated with it, containing a tag field and one valid bit per 4-byte sub-block [18]. A simplified DCache schematic is shown in figure 3.2.1. Figure 3.2.2 shows the DCache miss behavior. This part is especially important to understand, because the address translation will have to be done here.

3.2.2 Instruction cache

The LEON instruction cache is a direct mapped cache configurable to 1-64 kbyte. The instruction
tool chain with the tool promgen.

- vsim: when starting vsim from the command line it will start the interactive GUI that is controlled either from menus or using tcl commands. vsim <entityname> entered in tcl will load a design for simulation. See the Mentor Graphics User Manual for a full description of all available tcl commands. A typical session in vsim would be:

    vsim LEON          # load an entity
    view wave          # show wave window
    do savefile.do     # load a saved wave list
    run 10000          # run

8.1.1.3 Synthesis

Synplify Pro from Synplicity Inc. [12] was used for synthesis. Synplify Pro can either be operated interactively in the GUI or in batch mode. Batch mode is invoked with the -batch <batchfile> switch. A batch file is created by copying the Synplify project file, which is itself a tcl script, and appending the project -run command line. If multiple configurations should be compiled in one run, tcl loop constructs can be used, i.e.:

    set freqencies {25.000 20.000 10.000}
    foreach freq $freqencies {
        set_option -frequency $freq
        set_option -top_module LEON_rom
        project -result_file "work_batch/$freq-800hq240-6-rom-$freq.edf"
        project -run
    }

8.1.1.4 Software

The test software requires the GNU cross compilation system leccs to be installed, which can be downloaded from [8]. A documentation of the GNU gcc, as and the binutils suite can be downloaded in the
for stores and the source for loads; the addressing mode is register indirect with an optional offset (either immediate or register). This enables a simple pipeline with only one memory stage. A reference to an absolute memory address takes up to 3 instructions: two instructions for initializing the address register (load lower 13 bits and load upper 19 bits) and one for the memory operation. If a base pointer is already loaded into a register, like the stack pointer, a reference takes only one instruction if an immediate offset is used, or two instructions if a register first has to be loaded with the offset. Figure 4.2 gives an overview of the instruction layout. There are 3 main formats: format 1 represents absolute jumps (CALL, disp30); format 2 represents the command for initializing the upper part of a 32-bit register (SETHI, imm22) and conditional branches (disp22); format 3 represents the remaining arithmetic and control commands (rd, op3, rs1, i, asi or simm13, rs2). The callee has to issue a SAVE at the beginning and a RESTORE at the end when returning.

Figure 4.2: SPARC instruction overview (taken from [21], p. 44).
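As a small illustration of the two-instruction address setup, the following C fragment computes the immediates of the usual sethi/or pair, using the common split into upper 22 and lower 10 bits. (Because the simm13 immediate actually holds 13 bits, the text's split into 19 upper and 13 lower bits is an equally valid decomposition.) The address value is taken from the boot trace shown elsewhere in this report.

    #include <stdio.h>

    int main(void)
    {
        unsigned int addr = 0xf00a7000;   /* example absolute address  */
        unsigned int hi22 = addr >> 10;   /* sethi %hi(addr), %g3      */
        unsigned int lo10 = addr & 0x3ff; /* or %g3, %lo(addr), %g3    */
        printf("sethi imm22=0x%06x, lo10=0x%03x\n", hi22, lo10);
        return 0;
    }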
content index 0, bank 0: Abits 19, Echk 4. Read 1507328 bytes.
    Trying to open file testos.ram.dat_b0_ar1.dump: content index 1, bank 0: Abits 19, Echk 4. Read 1507328 bytes.
    Trying to open file testos.ram.dat_b1_ar0.dump: content index 0, bank 1: Abits 19, Echk 4. Read 1507328 bytes.
    Trying to open file testos.ram.dat_b1_ar1.dump: content index 1, bank 1: Abits 19, Echk 4. Read 1507328 bytes.
    Trying to open output file testos.ram.dat_merge

11.6.2 Small Operating System (SOS)

SOS is a set of routines for testing the MMU. They are also used by the image tool to create static page table hierarchies.

11.7 Subdirectory mmu/vhdl

This directory contains the sources for the MMU and the changed LEON design files. The newly added files for the MMU are mmu.vhd, mmuconfig.vhd, tlbcam.vhd, tlb.vhd, tw.vhd, lru.vhd and lrue.vhd. These files are described in chapter 10. The LEON design files that have been changed are marked with a postfix _m: acache_m.vhd, icache_m.vhd, dcache_m.vhd, cache.vhd, cachemem.vhd, iu.vhd, iface.vhd and sparcv8.vhd.

11.8 Subdirectory mmu/xess (XESS board development)

This directory includes the files needed when targeting XESS XSV800 and XSV300 boards. The mmu/xess/svf subdirectory includes CPLD designs that are used for reprogramming the CPLD so that flashram can be accessed and the UART signals are routed out. The mmu/xess/vhdl subdirectory contains the top level
for an example.

11.5 Subdirectory mmu/scripts (XESS board development)

The subdirectory mmu/scripts includes scripts to ease hardware development targeting a XSV board. The scripts automate the tool chain handling and the XSV board configuration. To use these scripts, add this directory to your PATH environment variable.

11.5.1 syn.pl: Xilinx tool chain build scripts

syn.pl covers part of the Synthesis domain of the design flow in figure 8.1; it simplifies working with the Xilinx tool chain. It uses the configuration file mmu/scripts/syn_config to determine the directories in which to search for ucf and edf files. Adjust syn_config to your environment. Various ucf files for the XESS board are provided in the subdirectory mmu/xess/ucf. syn.pl lets you interactively:

- choose the target technology (XSV800 or XSV300)
- choose the ucf file
- choose the edf file
- choose commands
- save/load a batch file

An example of a saved batch file of syn.pl would be:

    #!/bin/sh
    UCFDIR=/home/eiselekd/mmu/scripts/xess/ucf
    WORKDIR=/home/eiselekd/mmu/scripts/syn_work_batch/25.000-800hq240-6-rom-25
    UCF=xsv_romdsu_25hz.ucf
    TECH=XCV800HQ240-6
    cd $WORKDIR
    ngdbuild -a -p XCV800HQ240-6 -uc $UCFDIR/$UCF 800hq240-6-rom-25
    map -u 800hq240-6-rom-25
    echo "Maybe you should change the overall effort level using the -l switch (0-5)"
    par -w 800hq240-6-rom-25 800hq240-6-rom-25_par
    bitgen -w 800hq240-6-rom-25_par 800hq240-6-rom-25
    promgen -p
memory request (7). Until it arrives, the pipeline stalls again (8); (9), (10) and (11) repeat this pattern.

Figure 3.2.4: ICache miss pipeline behaviour.

Chapter 4: SPARC standard

The LEON integer unit implements the SPARC Architecture Manual V8 standard. This chapter tries to give a brief overview of its RISC nature; for the details refer to [21].

4.1 RISC

Other than the CISC architectures, which were developed by commercial companies, the RISC architecture emerged from a research and academic surrounding [19]. The RISC key phrase was coined by the Berkeley RISC I/II project led by David Patterson at UC Berkeley, dating back to 1980. The RISC I architecture later became the foundation of Sun Microsystems' [13] SPARC V7 standard, commercialized by SPARC International Inc. [20]. Another famous RISC architecture that resembles this development is the MIPS machine, developed at Stanford in a project led by John Hennessy and later commercialized by MIPS Technologies Inc. [15].

4.2 SPARC V8

The current version 8 (V8) of the SPARC standard was first published in 1990 and can be downloaded from [21]. Like other reduced instruction set (RISC) architectures, its features include a fixed size instruction format with few addressing modes and a large register file. The distinctive feature of SPARC is its windowed register file, where the instruction's source and destination register addresses are offset by the
Appendix I are already used by LEON, another partitioning was used.

- iu.vhd: the pipeline was slightly changed to propagate the supervisor mode flag to the execution stage.

Chapter 7: MMU design components

Figure 7.1: MMU schematic. (The pipeline stages Fetch, Decode, Memory and Write deliver virtual addresses, fault data, flush/probe and diagnostic accesses; the Memory Management Unit serializes them toward the Table Lookaside Buffer, built from TLBCAM entries plus syncram, and toward the Table Walk, which performs PTE writeback and memory requests over the AMBA bus via the memory controller.)

Figure 7.1 gives an overview of the MMU as a whole. Its individual components (MMU, TLB, TLBCAM, Table Walk, LRU) will be described in the next chapter in more detail. The figure tries to visualize the data paths in a simplified way. ICache and DCache receive virtual addresses for translation (1); in addition, DCache will also handle the various ASI identifiers for the MMU: MMU flush/probe and MMU I/D diagnostic access will be forwarded to the MMU (2); the other ASI identifiers
- Perform a table walk on a probe operation.
- AMBA interface, used for writeback operations of modified page table entries.

The Table Walk component traverses the page table hierarchy by issuing memory requests to the AMBA interface. The table walk is finished when a PTE is accessed, an invalid page table entry was found, or a memory exception occurred.

7.2.5 Least Recently Used (LRU) and LRU entry

The LRU and LRUE components are shown in figure 10.0.6 and figure 10.0.5. To get reasonable TLB hit rates, a simple Least Recently Used (LRU) logic was added. In principle it is an array of TLB addresses 1..n, where referenced addresses are marked with a bit and bubble to the front. The entry at the tail determines the address of the next TLB entry to replace. For instance, if TLB entry 3 asserted a hit on a translation operation, the corresponding element in the LRU address array that contains 3 will be marked. On each clock cycle it will move one position toward the top of the array, moving away from the tail position. The hardware needed for this scheme to work is one comparator for each array element and a swap logic for each adjacent array position.

7.3 Possible future optimizations

7.3.1 1-cycle penalty implementation

By using the syncram output unregistered in the first cycle in I/DCache to drive the AMBA bus, or by implementing the entries' data part in registers instead of syncram, the translation can be reduced to one cycle with minimal effort in source code change.
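The bubble scheme can be modelled in a few lines of C. The array and function names are illustrative, the arrays are assumed initialized with a permutation of the entry indices, and simultaneous marks on adjacent elements are handled in a simplified way compared to the pairwise swap hardware.

    #define TLB_ENTRIES 32

    static int lru[TLB_ENTRIES];     /* TLB addresses; tail is the victim */
    static int marked[TLB_ENTRIES];  /* referenced bit per array element  */

    /* mark the array element holding the TLB address that hit */
    static void lru_touch(int hit)
    {
        int i;
        for (i = 0; i < TLB_ENTRIES; i++)   /* one comparator per element */
            if (lru[i] == hit)
                marked[i] = 1;
    }

    /* one clock cycle: every marked element moves one position
     * toward the head of the array */
    static void lru_clock(void)
    {
        int i, t;
        marked[0] = 0;                      /* head reached: done */
        for (i = 1; i < TLB_ENTRIES; i++) {
            if (marked[i]) {
                t = lru[i - 1]; lru[i - 1] = lru[i]; lru[i] = t;
                marked[i - 1] = 1;
                marked[i] = 0;
            }
        }
    }

    /* the tail entry is the next TLB entry to replace */
    static int lru_victim(void) { return lru[TLB_ENTRIES - 1]; }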
are installed:

- Synthesis: Synplify Pro, Synplicity Inc. [12]
- Xilinx: Xilinx ISE 4.1 [6]
- XESS: xsload tool; a Linux port can be found at [1] [25]
- Cross compilation suite leccs for building the test software [8]

11.2 Subdirectory mmu/modelsim

This directory contains the Modelsim compile.do script that will compile the processor model in the correct order. Modelsim has to be started from the mmu directory as root for this to work. Alternatively, make vsim, executed in the mmu directory using the Makefile build system, can be used.

11.3 Subdirectory mmu/syn

This directory and its project subdirectories contain various Synplify Pro projects. The different projects differ only in the target frequency and the technology settings (XCV800 or XCV300); they use the same VHDL files. The design is not tested with other synthesis tools like the Synopsys design compiler. The top entity is mmu/xess/vhdl/leon_rom.vhd, which targets the XSV board with flashram accessed as ROM.

11.4 Subdirectory mmu/tbench

11.4.1 Testbenches for MMU components (mmu/tbench/comp)

The MMU test suite requires two files: a memory image and a virtual address vector file. The memory image is created using the image tool (see section 11.6.1). The address vector file is created using the image tool in analyze mode (-a) with option -v to extract the address vectors from the created image.

11.4.1.1
ries cache)
- updating of page table entry flags
- exception signals
- table walk

The most primitive form of translation would be to raise an exception on every memory access and let the OS do the translation from virtual to physical addresses in software. Hardware support accelerates this process by adding the Table Lookaside Buffer (TLB), which is in principle a cache of previous successful translations. In most cases it is built as a fully associative cache. With an appropriate processor design that tightly integrates the TLB into the overall structure, the translation can be done without any delay on a TLB hit. In the course of this diploma thesis it became clear that it is hard to add a TLB with zero wait states to a design previously not designed with a MMU in mind. On a TLB miss the page tables have to be traversed (table walk). This can be done in hardware or in software. The advantage of a software TLB miss handler could be that an advanced TLB updating scheme could be implemented to minimize TLB misses. Nevertheless, TLBs generally have a high hit ratio. Additional hardware support is provided by updating the referenced and modified flags of a page table entry and checking access permissions. The referenced flag logs any access to the page; the modified flag logs write accesses to a page. These flags in turn will be used by the OS on swapping operations. On a privilege or protection violation the hardware raises a signal that causes the processor
is only needed to determine the sizes of the text, data and bss sections, which are needed for the boot loaders and for building the static page table hierarchy. This static page table hierarchy for the boot loader is created by the helper program <SPARC>/leon/boot/create_pth, which outputs a page table hierarchy in srecord format that will be linked at the end of the data section of the final linux/vmlinux.leon_nc image.

[3] In a future distribution, when switching from architecture sparc to architecture leon-sparc, the xconfig screen could be reprogrammed completely. The current version is just an interim solution.
[4] The testbench will look for tsource/linux.ram.dump and tsource/linux.rom.dump relative to the directory where vsim was started. The make rule make sim will copy sim/vmlinux.text.dump and sim/vmlinux.data.dump to mmu/tsource/linux.rom.dump and mmu/tsource/linux.ram.dump if the directory mmu/tsource/linux exists. Therefore you have to add a symlink mmu to your home directory that points to the root of the MMU distribution.
[5] See section 9.2 for the difference between hardware and simulation images.

For the linker, instead of the <SPARC>/vmlinux.lds linker script, <SPARC>/leon/vmlinux.lds is used. The original linker script <SPARC>/vmlinux.lds defined more sections than the standard text, data and bss sections. Therefore the LEON linker script <SPARC>/leon/vmlinux.lds had to combine all sections into text, data and bss.
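The static mapping behind this can be pictured with a small address-conversion sketch. The constants follow figure 9.2 and the section dump (kernel text at virtual 0xf0000000 backed by ROM at 0x0; data and bss backed by the start of RAM at 0x40000000); the function name and the split point are illustrative assumptions, not the port's actual macros.

    #define KERNBASE   0xf0000000u   /* virtual base of the kernel image   */
    #define ROM_BASE   0x00000000u   /* .text is fetched from ROM          */
    #define RAM_BASE   0x40000000u   /* .data/.bss live at the RAM start   */
    #define DATA_VBASE 0xf00a7000u   /* start of .data in the section dump */

    /* sketch: translate a kernel virtual address under the static mapping */
    static unsigned int kvirt_to_phys(unsigned int vaddr)
    {
        if (vaddr < DATA_VBASE)                  /* .text section  */
            return vaddr - KERNBASE + ROM_BASE;
        return vaddr - DATA_VBASE + RAM_BASE;    /* .data and .bss */
    }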
    0xf00b2144 0x400001c7 call 0xf00b2860        <__init_begin> <start_kernel> (r 0xf00b2144 __init_begin)
    0xf00b2148 0x01000000 nop                    <__init_begin>
    0xf00b2860 0x9de3bf90 save %sp, -0x70, %sp   <start_kernel> (r 0x7e 0xf00b1f50)
    0xf00b2864 0x133c029e sethi %hi(0xf00a7800), %o1   <start_kernel> (r 0x79 0xf00a7800)
    0xf00b2868 0x7ffdbf50 call 0xf00225a8        <start_kernel> <printk> (r 0x7f 0xf0002868)
    0xf00b286c 0xd002630c ld [%o1 + 0x30c], %o0  <start_kernel> (r 0x78 0xf00955a0 bitops_end)
    0xf00225a8 0xd223a048 st %o1, [%sp + 0x48]   <printk>
    0xf00225ac 0xd423a04c st %o2, [%sp + 0x4c]   <printk>
    0xf00225b0 0xd623a050 st %o3, [%sp + 0x50]   <printk>
    0xf00225b4 0xd823a054 st %o4, [%sp + 0x54]   <printk>
    0xf00225b8 0xda23a058 st %o5, [%sp + 0x58]   <printk>
    0xf00225bc 0x81c3e008 retl                   <printk>
    0xf00225c0 0x90102000 mov 0, %o0             <printk> (r 0x78 0)
    0xf00b2870 0x400003bf call 0xf00b376c        <start_kernel> <setup_arch> (r 0x7f 0xf00b2870 start_kernel)
    0xf00b2874 0x9007bff4 add %fp, -0xc, %o0     <start_kernel> (r 0x78 0xf00b1fb4)
    0xf00b376c 0x9de3bf98 save %sp, -0x68, %sp   <setup_arch> (r 0x6e 0xf00b1ee8)
    0xf00b3770 0x113c0010 sethi %hi(0xf0004000), %o0   <setup_arch> <__stext> (r 0x68 0xf0
to trap. (Or dynamically, using the mmap call in Linux.)

Another interesting feature of segmentation is the possibility of dynamic linking at runtime, a feature proposed by the late MULTICS architecture [16]. In a segmented memory management scheme a function would be a segment with the appropriate access rights. A jump to function n would equal jumping to offset 0 of segment n. If every program uses a distinct n fixed at compile time, relinking a new function for all running programs in the running system would be possible by exchanging the segment descriptor in the segment descriptor table at position n. No recompilation of programs or rebooting of the whole system would be necessary [23].

Chapter 3: System on a Chip platform LEON

This chapter gives an overview of the LEON architecture. When adding a MMU to LEON, it had to be placed somewhere between the integer unit, the instruction cache (ICache), the data cache (DCache) and the AMBA memory interface. After giving a brief overview of the global LEON system architecture, the interaction of the LEON pipeline with the DCache and ICache will be presented in more detail. Figure 3.1 shows a simplified overview of the LEON architecture.

(Figure 3.1 shows: Integer Unit (iu.vhd), Debug Support Unit (dsu.vhd), Data Cache (dcache.vhd), Instruction Cache (icache.vhd), MMU (mmu.vhd), AMBA interface (acache.vhd), the Advanced High Performance Bus (AHB)
compilation system; tsim LEON simulator.

3.1 LEON pipeline

The LEON integer unit (IU) implements SPARC integer instructions as defined in the SPARC Architecture Manual V8. It is a new implementation, not based on previous designs. The implementation is focused on portability and low complexity; nevertheless it is very tightly woven, making it hard to integrate new features and understand the source code. The LEON pipeline is a 5-stage pipeline: fetch, decode, execute, memory and write back [18].

- FE (Instruction Fetch): if the instruction cache is enabled, the instruction is fetched from the instruction cache. Otherwise the fetch is forwarded to the memory controller. The instruction is valid at the end of this stage and is latched inside the IU.

- DE (Decode): the instruction is decoded and the operands are read. Operands may come from the register file or from internal data bypasses. CALL and branch target addresses are generated in this stage.

- EX (Execute): ALU, logical and shift operations are performed. For memory operations (e.g. LD) and for JMPL/RETT the address is generated.

Footnote: Real Time Operating Systems like eCos and RTEMS are aimed at systems with a small memory footprint and realtime requirements, suitable for deeply embedded applications. For instance, a simple "Hello world" application with the RTEMS RTOS linked to it requires 133k of memory and can easily be placed into ROM. Embedded applications running on
taining a preinitialized page table hierarchy can be directly loaded into the XESS board's memory. Note: the link script parameters and the -o offset have to match. The format of the configuration file is composed of one command per line. Comments are trailed by

- allocate a context level page (4 GB) of context ctxnr, with vaddr at physical address addr:
    c ctxnr addr flags
- allocate a region level page (16 MB) of context ctxnr, with vaddr index1 at physical address addr:
    r ctxnr index1 addr flags
- allocate a segment level page (256k) of context ctxnr, with vaddr index1/index2 at physical address addr:
    m ctxnr index1 index2 addr flags
- copy a file into memory and map it starting from virtual address index1/index2/index3 of context ctxnr:
    p ctxnr index1 index2 index3 file flags
- fill the whole region index1 of context ctxnr with segments starting at physical address addr:
    M ctxnr index1 addr flags
- fill the whole context ctxnr with regions starting at physical address addr:
    G ctxnr addr flags
- remove a pte entry at level level along the path of virtual address index1/index2/index3 of context ctxnr:
    ctxnr index1 index2 index3 level

Values allowed for flags are CACHE, DIRTY, REF, EXEC, WRITE, VALID, PRIV and PRIV_RDONLY. An example of a configuration file is given below.
the TLB's match operation for a translation request (2) work in parallel. On a DCache miss, which would cause a translation request, the TLB's translated address would already be ready if a TLB element hit (3). By using the optimizations suggested in the previous section, 7.3.1 (4), the AMBA request could be issued right away.

Figure 7.3.1: 0-cycle penalty implementation.

7.3.3 Flush optimization

Because only few TLBCAM entries generate a hit on a TLB flush, a future optimization could be to check sets of hit signals in one clock cycle (i.e. 0-3, 4-7) instead of one after another. Using a priority encoder, the next hit element (if any) of a block could be determined, or the whole block could be skipped right away.

Chapter 8: Design Flow

8.1 XESS XSV800 board development

(Figure: XSV prototyping board, taken from [25]; ATX power supply connector, 512K x 16 SRAM, stereo input and output jacks, 100 MHz programmable oscillator.)

8.1.1 Design flow

The design flow in the previous figure is partitioned into four regions:

- XESS board
- Synthesis
- Modelsim
- Software

Each of the above items is described in the following subsections.

8.1.1.1 XESS board

For a technical description of the XESS board refer to [4]. For a description of the Xilinx XCV300/800 Virtex FPGA chips refer to [6]. Figure 8.2 shows a configuration of the
Inc. GNU. http://www.gnu.org, 2002.
[11] RedHat Inc. eCos. http://www.redhat.com, 2002.
[12] Synplicity Inc. Synplicity. http://www.synplicity.com, 2002.
[13] Sun Microsystems. Sun Microsystems Inc. http://www.sun.com, 1999.
[14] Milan Milenkovic. Microprocessor memory management units. IEEE Micro, 10(2), p. 70-85, 1990.
[15] MIPS. MIPS Technologies Inc. http://www.mips.com, 2002.
[16] E. I. Organick. The Multics System. 1972.
[17] Embedded Linux Microcontroller Project. uClinux. http://www.uclinux.org, 2002.
[18] Gaisler Research. The LEON-2 User's Manual. http://www.gaisler.com, 2002.
[19] Michael Slater. RISC Multiprocessors. 1992.
[20] SPARC International, Inc. SPARC International Web Site. http://www.sparc.com, 2002.
[21] SPARC International, Inc. The SPARC Architecture Manual, Version 8. http://www.sparc.org, 1992.
[22] sun-hardware-faq. SUN SPARC models. http://www.sunhelp.org/faq/sunref1.html, 2002.
[23] Andrew S. Tanenbaum, Albert S. Woodhull. Operating Systems: Design and Implementation, Second Edition. 1997.
[24] LEOX Team. LEOX. http://www.leox.org, 2002.
[25] XESS Corporation web site. http://www.xess.com, 2002.

I hereby declare that I wrote this thesis independently and used only the stated resources.

Konrad Eisele
vcom and vsim.

- vlib: creates a design library directory. It is typically called as just vlib work, which creates the work library needed by vcom.

- vmap: in the VHDL source, a use <libraryname>.<package>.<all> statement will search for libraryname by resolving the libraryname-to-librarypath mapping, which is defined in the file pointed to by MODELSIM. vmap <libraryname> <librarypath> will modify this file. By just typing vmap, the current mappings and the location of the current modelsim.ini file can be displayed.

- vcom: vcom will compile a VHDL source file into the work library, i.e. into the library path that has been created by vlib <librarypath> and mapped by vmap work <librarypath>. If a VHDL component is instantiated, the component is searched in the current work library and linked. Therefore, for a gate level simulation of LEON, the gate level model has to be compiled into the work library before the testbench is recompiled, so that the testbench is linked to the gate level model.

Footnotes: When calling xsload.exe, the first step of reprogramming the CPLD to connect to flashram is actually done automatically; xsload will search for flashprog.svf in the directory pointed to by the XSTOOLS_BIN_DIR environment variable. Steps 2-3 are automated using the script loadexo.sh in the distribution (see section 11.5). This is done in the Xilinx
and the Advanced Peripheral Bus (APB), with AMBA arbiter, AHB/APB bridge (apbmst.vhd), debug serial link (dcom.vhd), timers (timers.vhd), memory controller (mctrl.vhd), UARTs (uart.vhd), I/O ports (ioport.vhd) and mcore.vhd; external connections: UART, PROM, I/O, SRAM, SDRAM, PIO ports.)

The LEON source distribution is a synthesisable VHDL implementation of the SPARC Architecture Manual V8 standard. It was developed by Jiri Gaisler and can be downloaded from [8] (figure 3.1: simplified LEON overview). It is provided under the GNU Public License (GPL) [10]. Its main features are:

- integer unit
- floating point unit
- on-chip AMBA bus, making it easy to integrate custom IP blocks into the system
- cache subsystem
- hardware debug unit
- memory controller
- UART

On the software side the following packages are available:

- The RTEMS Real Time Operating System (RTOS) [3] [8], which features a POSIX API. RTEMS is currently the standard application platform for programs running on LEON.
- Just recently a port of the eCos RTOS from RedHat Inc. [11] was announced by Jiri Gaisler [8]; it features a compatibility layer (EL/IX) that implements a POSIX API and some of the Linux APIs.
- The uClinux OS port for LEON [17] [24], an OS based on Linux that supports processors with no MMU.
- lecc: GNU based cross
Chapter 5: SPARC V8 Reference MMU (SRMMU)

The MMU for LEON that is the target of this diploma thesis implements a MMU that is compliant with the SPARC Reference MMU (SRMMU). The SPARC Architecture Manual V8 [21] does not require a MMU to be present; the standard rather specifies a reference MMU in Appendix H that is optional to implement. However, all commercial SPARC V8 implementations follow the SRMMU suggestion. The main features of the SRMMU are:

- 32-bit virtual address
- 36-bit physical address
- fixed 4K byte page size
- support for sparse address spaces with a 3-level map
- support for large linear mappings (4K, 256K, 16M, 4G bytes)
- support for multiple contexts
- page level protections
- hardware miss processing (table walk)

The following sections give an overview; for more information refer to [21].

5.1 SPARC SRMMU translation overview

Figure 5.1 gives a detailed overview of the translation process and the data structures that are involved. The first level of the page table hierarchy is that of the context table (1). It is indexed by the Context Number (CTXNR), a register that is initialized with a unique number associated with each process. On a process switch this register has to be updated. The 1-4 levels of the page

Footnote: The Context Number together with the virtual address forms the cache's tag; this way cache synonyms are avoided.
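To make the traversal concrete, the following C sketch walks the three table levels for a 32-bit virtual address, using the SRMMU index split of 8, 6 and 6 bits plus a 12-bit page offset for 4K pages. The ET encoding (0 invalid, 1 page table descriptor, 2 page table entry) follows [21]; next_table() is an assumed helper that hides the exact descriptor-to-address layout, so this is an illustration of the concept, not the hardware algorithm verbatim.

    #define IDX1(va) (((va) >> 24) & 0xff)   /* level-1 index, 8 bits */
    #define IDX2(va) (((va) >> 18) & 0x3f)   /* level-2 index, 6 bits */
    #define IDX3(va) (((va) >> 12) & 0x3f)   /* level-3 index, 6 bits */

    enum { ET_INVALID = 0, ET_PTD = 1, ET_PTE = 2 };

    extern unsigned int *next_table(unsigned int ptd);  /* assumed helper */

    static int et(unsigned int e) { return e & 3; }

    /* returns the PTE, or 0 to signal a page fault */
    unsigned int table_walk(const unsigned int *ctx_table, int ctxnr,
                            unsigned int va)
    {
        unsigned int e = ctx_table[ctxnr];
        int idx[3] = { IDX1(va), IDX2(va), IDX3(va) };
        int lvl;
        for (lvl = 0; lvl < 3; lvl++) {
            if (et(e) == ET_PTE)
                return e;              /* large linear mapping at this level */
            if (et(e) != ET_PTD)
                return 0;              /* invalid entry: page fault */
            e = next_table(e)[idx[lvl]];
        }
        return (et(e) == ET_PTE) ? e : 0;
    }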
A wait state has to be inserted to retrieve the tag for that command after the current memory command has written its value to the cache's syncram. The memory command will therefore stall in the write stage while the memory request is issued (6). When the result is ready, it is strobed into the pipeline on the dco.mds signal, bypassing the normal pipeline propagation; the pipeline still stalls (7). The result is saved to the register file on the falling edge of the write stage, after which the pipeline can continue (8).

Figure 3.2.2: DCache miss pipeline behaviour on a load command.

(ICache schematic: cachemem with addr, data out and tag out; line end; streaming; miss.)

On an ICache miss the ICache will change into streaming mode. The waddr buffer will hold the next memory address to retrieve (1). For each command that has been retrieved, waddr will be incremented (2).

Figure 3.2.3: LEON ICache schematic. A translation has to be done when changing into streaming mode.

3.2.5 AMBA ASB interface

The data and instruction caches have to share one single AMBA ASB master interface. Serializing the concurrent ICache and DCache requests is done by the ACache component. A simplified ASB query is shown in figure 3.2.

(Figure 3.2: simplified ASB query; Request, Wait, addr out, data.)
