fulltext - DiVA Portal
Contents
1. Context Switching in OR1200 Timer Interrupt
   Summary
   5 WISHBONE Specification and CONMAX IP Core
     5.1 Importance of Interconnection Standard
     5.2 WISHBONE in a Nutshell
       5.2.1 Overview
       5.2.2 WISHBONE Interface Signals
       5.2.3 WISHBONE Bus Transactions
         5.2.3.1 Single Read/Write Transaction
         5.2.3.2 Block Read/Write Transaction
         5.2.3.3 Read-Modify-Write (RMW) Transaction
         5.2.3.4 Burst Transaction
     5.3 CONMAX IP Core
       5.3.1 Introduction
       5.3.2 CONMAX Architecture
       5.3.3 Register File
       5.3.4 Parameters and Address Allocation
       5.3.5 Functional Notices
       5.3.6 Arbitration
     References
   6 Memory Blocks and Peripherals
     6.1 On-chip RAM and its Interface
       6.1.1 On-chip RAM Pros and Cons
       6.1.2 ALTERA 1-Port RAM IP Core and its Parameters
       6.1.3 Interface Logic to the WISHBONE bus
       6.1.4 Data Organization and Address Line Connection
       6.1.5 Memory Alignment and Programming Tips
       6.1.6 Miscellan
2. List of memory blocks and peripherals

Chapter 1 Introduction

This chapter gives an introduction to, and a chapter overview of, the master thesis by Xiang Li and Lin Zuo.

1.1 Background and Motivation

The goal of the thesis is to implement a low-cost computing platform by exclusively using open source Intellectual Property (IP) cores. The idea came from Johan Jorgensen, the hardware team leader at ENEA [1]. We discussed the reasons for starting this project, which are listed below:
• ENEA wants to gain knowledge and insight into the viability of using open cores.
• Based on that knowledge, they can decide whether or not to start business involving HW design (ASIC, FPGA, soft/hard IP cores).
• If the platform is successful, it might be continued and improved as a product.
• This project also proves that some embedded applications can be built with open cores.

Three factors in the recent development and innovation of System-on-Chip (SoC) technology make this thesis possible:

1. The design reuse methodology with IP cores. With the increasing complexity of digital electronic systems and the growing time-to-market pressure, engineers are forced to reuse as many previously made blocks, i.e. IP cores, as possible in new designs. This is called the design reuse methodology. The methodology produces more and mor
3. In the thesis project, the 32KB on-chip RAM is dedicated to the bootloader, so the FPGA project doesn't have to be updated all the time. The 2MB SSRAM is used to store the program received by the bootloader. Both ihex2mif and the bootloader can be found in the project archive file [5]. Without a direct JTAG connection for debugging the programs, there are 3 workarounds: (1) the serial communication, (2) the Or1ksim simulator and GDB, and (3) FPGA verification tools. The easiest way of debugging the software is to print characters through the serial connection to the PC. When certain information is displayed on the screen, it is certain that the program has reached a specific location in the source code. Another workaround is to simulate the programs with Or1ksim, which shows the values of the CPU registers after the program is paused. Or1ksim can also be connected to GDB, where we can debug at the source code level. For example, a breakpoint can be placed to observe the content of a variable when the program reaches it. Or1ksim can even simulate the timer interrupts, which we used to debug the RTOS context switch. The limitations of the method are that (1) it is only off-line simulation and (2) it cannot simulate most DE2-70 hardware peripherals. It is also possible to borrow the FPGA verification tools for debugging the software, mainly to place probes and observe the signals on the CP
4. [Figure 5.7: Master B is blocked by master A. Waveforms of the CYC, STB and ACK signals of masters A and B.]

In Figure 5.7 only the CYC, STB and ACK signals are given, because they are enough to describe the WISHBONE protocol. First, master A started a block transaction and asserted its CYC at the 3rd rising clock edge. At that time nobody owned the bus, so the arbiter gave the grant to master A and connected it to the slave. In the following 2 clock cycles, one operation was done. However, the other operation from master A was somehow delayed, so its CYC had to stay asserted because this is a block transaction. Because the arbiter gives grants by judging the CYC, master A possessed the network for as long as its CYC was 1. As a result, master B had to wait until the whole block transaction of A was finished, although it had been ready since the 6th rising clock edge. To summarize the example, the WISHBONE masters use the CYC to request grants from the arbiter when accessing slaves in multi-master systems. If the system has no preemption, i.e. once the grant is given to a master it won't be withdrawn when another higher-priority master becomes ready, the CYC can actually be used to hold the line until all data from a master is transferred. Usually the signals of groups 1 and 2 are enough to perform the block transactions through the WISHBONE network, but in the specification it also mentio
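The blocking behavior described above can be sketched as a small simulation. This is a minimal model under stated assumptions, not the CONMAX arbiter: the function name `arbitrate`, the fixed check order (A before B) and the same-cycle release and re-grant are all illustrative choices that do not come from the thesis.

```python
def arbitrate(cyc_a, cyc_b):
    """Return the bus owner per clock cycle for a non-preemptive arbiter.

    cyc_a / cyc_b are lists of 0/1 samples of each master's CYC signal.
    Assumption: a grant is only released when the owner drops its CYC.
    """
    owner = None
    grants = []
    for a, b in zip(cyc_a, cyc_b):
        requests = {"A": a, "B": b}
        # The current owner keeps the bus as long as its CYC stays at 1.
        if owner is not None and not requests[owner]:
            owner = None
        # A free bus goes to the first requester (A is checked first here).
        if owner is None:
            for name in ("A", "B"):
                if requests[name]:
                    owner = name
                    break
        grants.append(owner)
    return grants


# Master A holds CYC for cycles 2..6; master B is ready from cycle 5.
cyc_a = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
cyc_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(arbitrate(cyc_a, cyc_b))
```

In this run master B requests the bus while A still owns it and is only served once A releases its CYC, which mirrors the situation in Figure 5.7.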
5. Webpage, Free Software, from Wikipedia, http://en.wikipedia.org/wiki/Free_software, Last visit: 2011-01-31
[12] Webpage, Free Software Movement, from Wikipedia, http://en.wikipedia.org/wiki/Free_software_movement, Last visit: 2011-01-31
[13] Website, GNU Operating System, http://www.gnu.org, Last visit: 2011-01-31
[14] Website, Free Software Foundation (FSF), http://www.fsf.org, Last visit: 2011-01-31
[15] Webpage, Why Open Source Misses the Point of Free Software, by Richard Stallman, from GNU, http://www.gnu.org/philosophy/open-source-misses-the-point.html, Last visit: 2011-01-31
[16] Webpage, The Free Software Definition, from GNU, http://www.gnu.org/philosophy/free-sw.html, Last visit: 2011-01-31
[17] Webpage, BSD Licenses, from Wikipedia, http://en.wikipedia.org/wiki/BSD_licence, Last visit: 2011-01-31
[18] Webpage, The BSD License Problem, from GNU, http://www.gnu.org/philosophy/bsd.html, Last visit: 2011-01-31
[19] Webpage, The Modified BSD License: An Overview, from OSS Watch, http://www.oss-watch.ac.uk/resources/modbsd.xml, Last visit: 2011-01-31
[20] Webpage, The BSD License, from OSI, http://www.opensource.org/licenses/bsd-license.php, Last visit: 2011-01-31
[21] Webpage, GNU General Public License, from GNU, http://www.gnu.org/copyleft/gpl.html, Last visit: 2011-01-31
[22] Greg R. Vetter, Infectious Open Source Softwar
6. Why not send the MP3 file directly? In that case we would need LibMAD working on the DE2-70. This is not so difficult because LibMAD is written in ANSI C, but it uses several C standard library functions like malloc, which we had no time to make work on our hardware platform.

3.9 Download temp.wav into the DE2-70 board. The file will be placed in the 64MB SDRAM. The command is: player.exe w temp.wav d (w specifies the path of the WAV file; d tells the system to download). After this command is performed, the counter on the DE2-70, i.e. the 7-segment displays HEX1 and HEX0, should start counting. In theory we can download any WAV file smaller than 64MB, but you probably won't, because the downloading speed is too slow. We can achieve only 3KB stable UDP speed with the system. This may be enough to open a webpage, but it is not sufficient for music files. The low speed has multiple reasons: first, the CPU is running at only 50MHz, and more importantly, the platform (both the hardware and the software) is not working efficiently enough. There are lots of optimizations we could do, but we had no time.

3.10 When the WAV file has downloaded successfully, it looks like Figure 12. But the downloading process could go wrong if it is interfered with by other software on the PC. If HEX0 and HEX1 stop changing their values but the "Downloading finished" message doesn't show up, we will need to restart the system and try
7. As we can see, N operations are now completed in N+1 clock cycles, so the bus utilization is now N/(N+1). If N is a very large number, the utilization will theoretically approach 100%. So the burst transactions, if they can be performed properly, are the best scheme to achieve the top throughput. Incidentally, the burst transaction is also referred to as the advanced synchronous cycle termination at the beginning of chapter 4. The WISHBONE specification compares 3 different ways of termination (asynchronous, synchronous and advanced synchronous) and points out that the advanced synchronous one is the best. But the conclusion is not well presented, because the table in the specification shows that the asynchronous cycle termination is always the smallest of the three. The writer forgot to stress an important point again here: the WISHBONE is a synchronous bus standard. Even though asynchronous circuits are considered faster, we have to pick a regular and stable way to communicate in the WISHBONE. So finally the advanced synchronous termination is chosen because it is the better of the synchronous schemes. More WISHBONE signals are involved in the burst transactions, because their protocols are more complicated than those of the WISHBONE classic single, block and RMW transactions. These signals are the CTI and the BTE, i.e. the signals of group 6 described in the previous section. The CTI is used to identify the burst transactions. At e
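The utilization figure N/(N+1) quoted above is easy to check numerically. The helper below is only an illustration of that formula; the function name is hypothetical and it models nothing beyond the one extra clock cycle per burst stated in the text.

```python
def burst_utilization(n_ops):
    """Bus utilization when n_ops operations take n_ops + 1 clock cycles."""
    if n_ops < 1:
        raise ValueError("a burst needs at least one operation")
    return n_ops / (n_ops + 1)


for n in (1, 4, 16, 256):
    print(n, round(burst_utilization(n), 4))
```

Short bursts waste a large share of the cycles (50% for a single operation), while long bursts approach full utilization.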
8. Each read or write action is called an operation in the thesis. A single transaction contains only one read or one write operation, while the block, RMW and burst transactions can have multiple operations within one transaction. According to the specification, all 4 types of transactions are optional. But to be WISHBONE compliant, an IP core has to support at least 1 type of transaction to communicate with the others. In fact, almost all IP cores choose to implement the single transaction because it is the simplest, whereas the other 3 are rarely used. For example, all IP cores in the thesis project only work with single transactions.

5.2.3.1 Single Read/Write Transaction

The single read/write transaction is the most frequently used. Each transaction contains only 1 read or write operation initiated by the master. The slave responds to the request by returning the wanted data in case of reading, or accepting the exported data in case of writing. Figure 5.3 gives an example block diagram of the connections of the WISHBONE signals between a master and a slave. The diagram helps to demonstrate how the bus transactions are transmitted from the master to the slave through the connections. The readers can imagine that all waveforms below in this chapter are happening over the wires in the figure.

[Figure 5.3: An example of WISHBONE signal connections (signals shown include data_o[31:0], addr_o[31:0] and we_o).]

The name of burst
9. The ACK shows that the last data of the burst will be taken care of, so the master de-asserts all signals and terminates the burst. Slave: The slave latches the 4th data. The slave checks the CTI and notices that it is an EOB now. So it knows the 4th data will be the last one and does not need to assert the ACK anymore.

After the first example, we hope the readers have understood more about the burst transactions. The next one is another example, an incrementing read burst transaction, shown in Figure 5.12.

[Figure 5.12: An example of an incrementing reading burst transaction.]

Now the value of the CTI is no longer CON but INC, which shows that the current transaction is an incrementing burst. The BTE is now needed for incrementing bursts to indicate how the address grows. LIN means the address increases in a linear manner, so the addresses are B+0, B+1, B+2. B represents the base address. B+1 does not mean exactly B plus 1, but rather the address next to the base address. For instance, if the data bus is 32 bits wide, the value of B+1 would be the base address plus 4.

Edge 1. Master: The master starts the burst transaction. The master sets the CTI to INC and the BTE to LIN. Because this is a read transaction, the value of data_in is unknown at edge 1. Slave: The slave does nothing because it cannot yet see that the transaction has been initiated.

Edge 2. Ma
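The address sequence B+0, B+1, B+2 of a linear incrementing burst can be written out as follows. This is a minimal sketch assuming byte addressing, as in the 32-bit example above; the function and parameter names are hypothetical.

```python
def linear_burst_addresses(base, beats, data_width_bits=32):
    """Addresses of a linear (LIN) incrementing burst.

    Assumption: the bus is byte-addressed, so each beat advances the
    address by the data bus width in bytes (4 for a 32-bit bus).
    """
    step = data_width_bits // 8
    return [base + i * step for i in range(beats)]


print([hex(a) for a in linear_burst_addresses(0x1000, 3)])
```

With a 32-bit data bus, "B+1" is therefore the base address plus 4, and "B+2" the base address plus 8.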
10. t have to spend too much time on the examples in the appendix of how to implement a WISHBONE interconnection, because we use the CONMAX in the thesis project, which will be introduced in a later section of this chapter. (2) There are lots of Rules, Recommendations, Suggestions, Permissions and Observations in the specification. It might be helpful to skip them all when reading the document for the first time, and to read only the Rules the second time. (3) The timing diagrams in sections 3.2 to 3.5 of the WISHBONE specification are hard to understand and a bit confusing. Those diagrams are redrawn in this thesis.

5.2.1 Overview

The WISHBONE is not a complicated standard, but it contains almost all the main features that the other bus standards have. Considering that it was published in the year 2002, it was really a brilliant work at that time. One of the major contributions of the WISHBONE was to define the interface signals. If an IP core supports all basic signals with the correct functions as the WISHBONE specifies, it will be compatible with other IP cores that have WISHBONE interfaces. With the signals, the WISHBONE designed a set of protocols that standardize how the interfaces communicate, i.e. how the data is packaged and sent or received by the signals. These are called bus transactions. There are 4 types of transactions described in the specification: single, block, RMW and burst. In fact it is the IP cores re
11. [Figure 2.1: Difference between LGPL and GPL. (a) LGPL stays along with other licenses; (b) GPL infects other modules.]

However, the scenario will be totally different if Library A is covered by the GPL. Because the GPL forces the whole project to be licensed only under the GPL, all the remaining parts of the project, i.e. Library B and the Application, have to become GPLed, whatever their previous licenses are.

2.3.3 Evaluations on the GNU Licenses

So far the introduction to the GPL and the LGPL is finished. It is time to go back to our topic and discuss how the open cores will be influenced by these licenses. For commercial purposes, my answer is that the open cores covered by the LGPL are generally OK to use if the company won't spend too much effort on improving the cores, while the GPLed open cores are not suggested for commercial purposes unless they are used individually per silicon chip. Basically, the open cores covered by the LGPL are free to use because they are not infectious like the GPL. So there is no worry when connecting the open cores and the proprietary IP cores together to compose a larger system. Besides, most likely the utilized open cores will not be modified much compared with the initial version. Otherwise companies would rather
12. By now the first read operation to the 1st address is finished. The slave checks the STB, which is 1. Since it is capable of continuing to work, it (1) sets the ACK to 1 for 1 more cycle, (2) outputs the 2nd data based on the address pre-calculated at edge 2, and (3) calculates the next address. The master comes back from the wait state and resumes the signals it should have sent at edge 4. The slave wants to insert a wait state. As the STB is 0 at edge 5, the slave should keep all signals unchanged. But since this is a wait state, it remembers the data and outputs X instead. The master plans to insert another wait state. Without a wait state, the master would repeat the signals resumed at edge 5, because right now the ACK = 0. So once more it remembers these signals, which will be resumed later. The slave has been in the wait state.

Edge 7. Both the master and the slave resume from the wait state. The master recalls the signals remembered at edge 6. The slave repeats the signals remembered at edge 5.

Edge 8. Master: The master wants to work. So it (1) sets the STB to 1, (2) latches the current data (2nd), and (3) outputs the next address (3rd) and other signals. Slave: The slave wants to work too. It (1) asserts the ACK to 1, (2) returns the 3rd data according to the address pre-calculated at edge 4, and (3) calculates the next address based on the current address, the CT
13. He has this part comprehensively documented in his thesis [1], Chapter 4.5. Please refer to Lin's thesis for more information.

6.7 Summary

In Chapter 6 we have introduced the memory blocks and the peripherals. They are important components in the OpenRISC reference platform. The table below gives a review of all IP cores.

Name                         Section   Category   License
On-chip RAM Interface        6.1       Memory     DBU
ALTERA 1-Port RAM IP Core    6.1       Memory     Commercial
Memory Controller IP Core    6.2       Memory     BSD-like
UART16550 IP Core            6.3       UART       LGPL
GPIO IP Core                 6.4       IO         LGPL
WM8731 Interface             6.5       Audio      DBU
DM9000A Interface            6.6       Ethernet   DBU

Table 6.6: List of memory blocks and peripherals (DBU: Designed By Us)

The Memory Controller, UART16550 and GPIO IP cores are open cores from opencores.org. They impressed us with their high-quality source code and well-documented user manuals. We proved that they work well in an FPGA system. The open cores come with either the less restrictive BSD license or the LGPL, which won't cause trouble if the IP cores are used for commercial purposes. We believe the IP cores are good options for future projects.

References
[1] Lin Zuo, System on Chip design with Open Cores, Master Thesis, Royal Institute of Technology (KTH), ENEA, Sweden, 2008, Document Number: KTH/ICT/ECS 2008 112
[2] User manual, RAM Megafunction User Guide, ALTERA Corporation, Dece
14. Some basic functions were designed to control the OpenRISC Programmable Interrupt Controller (PIC), the Tick Timer (TT) and the UART16550 IP core. The HAL functions can be found in the thesis archive file [5].

3.4 Demo Application: A MP3 Music Player

In the previous sections, the hardware layer, the digital FPGA layer and the operating system layer of the computing platform were introduced. Based on those, user applications can be easily developed. We decided to implement a MP3 music player as a demo application because (1) it is very popular, as almost everyone plays MP3 files, and (2) it demonstrates most features of the platform, including the Audio CODEC and the Ethernet. The MP3 player was designed in 2 parts: a music player running on the DE2-70 board and a client program running on the PC. On the PC side, the selected MP3 file is first converted into WAV format by the client program. Then the client program sends the data of the WAV file to the DE2-70 board in UDP packets via the Ethernet connection. After all music data has been transferred, the client program issues a PLAY command to the music player. The client program can also send other commands, e.g. to control the volume. The client program uses libmad to decode the MP3 files. Libmad [19] is a high-quality MPEG audio decoder which is capable of 24-bit output. It is free software under the GPL. Libmad saved us the time of implementing MP3 de
15. and the HDL design of the Memory Controller IP core, but rather focus on explaining how to use the IP core. In the following sections, first the IP core and its attractive features are introduced. Then we continue with the configuration of the IP core, both from the hardware side and the software side. At the end, the possibility of performance improvement is discussed.

6.2.1 Introduction and Highlights

The Memory Controller is one of the open source IP cores from the OpenCores organization. Its source code is completely open and free to download at the link [5] of the opencores.org website. It is worth mentioning that the source code of the core is very well organized. This benefits the users who want to study, modify or improve the core. The IP core is released under a BSD-like license, which is not an exact BSD license but is even less restrictive. The full text of the license can be found in the header of every source file. For convenience, the license is copied here: "This source file may be used and distributed without restriction provided that this copyright statement is not removed from the file and that any derivative works contains the original copyright notice and the associated disclaimer." After this paragraph there is a long disclaimer, the same as in the BSD license, disclaiming any warranty and liability for the usage of the Memory Controller IP core. As we can see from the license, it is allowed fo
16. bits of the external bus must be pulled high or low with resistors in hardware. Unfortunately, on the DE2-70 board there is no such resistor. Besides, we didn't need the power-on configuration because the software startup code was stored in the FPGA on-chip RAM, where the CPU can find out how to initialize the Memory Controller. As a result, we decided to disable the power-on configuration with a small modification to the Memory Controller source code. To disable the power-on configuration, we first created a new definition:

`define MC_POC_VAL 32'h00000002

Then, in mc_rf.v, we changed the line of the POC to:

if(rst_r3) poc <= #1 `MC_POC_VAL;

In this way the POC register always gets the default value 0x0000_0002 when the Memory Controller resets, which sets the 32-bit bus width and disables external devices. In a similar way, we also changed the reset value of the BA_MASK register based on the address configuration.

6.2.2.3 Tri-state Bus

For most memory ICs the data ports are bidirectional. So the outputs of the Memory Controller must be tri-stated to high impedance when reading data from external memories. The Memory Controller doesn't include the design for the tri-state outputs (see page 45 of the user manual [6]). This is because there is no way to know the FPGA architecture in advance. The ALTERA FPGAs have tri-state gates only at the I/O pins, but for some Xilinx FPGAs there are internal tri-state
17. cations like adding new open cores to support the VGA/LCD display, the keyboard and mouse, etc. For academic research, we believe the WISHBONE interconnection is a good starting point, as mentioned in Section 5.1. Some topics are interesting, like how to improve the bus throughput, how to adapt the WISHBONE to multi-processor systems, and how to bridge the WISHBONE with other bus standards like PCI or ALTERA's Avalon.

7.3 What's New Since 2008

For personal reasons, the thesis writing was only finished in January 2011. In this section the latest news on the open core technologies since 2008 is listed:
• Since November 2007, the Swedish company ORSoC [3] has taken over the maintenance of the OpenCores organization and the OpenRISC project. Thanks to them, the open core community grows steadily. Some new features are provided by opencores.org, like a monthly newsletter, an online shop, an SVN file system and even the translation of the webpages into Chinese.
• For the OpenRISC OR1200 hardware there were 2 big updates according to the project news webpage [4]. On August 30th, 2010: "Big OR1200 update. Addition of verilog FPU, adapted from fpu100 and fpu projects, data cache now has choice of write-back or write-through modes." On January 19th, 2011: "OR1200 update, increasing cache configurability, improving Wishbone behavior, adding optional serial integer multiply and divide
18. http://opencores.org/openrisc,news, Last visit: 2011-01-31
[5] Webpage, GNU Toolchain for OpenRISC, from OpenCores Organization, http://opencores.org/openrisc,gnu_toolchain, Last visit: 2011-01-31
[6] Damjan Lampret, OpenRISC 1200 IP Core Specification, Revision 0.10, November 2010
[7] Jeremy Bennett, Julius Baxter, OpenRISC Supplementary Programmer's Reference Manual, Revision 0.2.1, November 23, 2010
[8] Jeremy Bennett, Or1ksim User Guide, Issue 1 for Or1ksim 0.4.0, June 2010
[9] Specification, WISHBONE B4: WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision B.4 Pre-Released, June 22, 2010
[10] Webpage, Altera DE2-115 Development and Education Board, from Terasic Technologies, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=502, Last visit: 2011-01-31
[11] Website, Terasic Technologies, http://www.terasic.com.tw, Last visit: 2011-01-31

Appendix A Thesis Announcement

This is the thesis announcement written by our industry supervisor, Johan Jorgensen.

A.1 Building a reconfigurable SoC using open source IP

The goal of this thesis is to evaluate the quality of open source IP blocks and their suitability for use in commercially available embedded systems. The project aims at building a low-cost SoC in an FPGA through the exclusive use of open source IP. The purpose of the thesis is to investigate the f
19. GNU Licenses
     2.4 The Price for Freedom: Comments on the GNU Philosophy
     2.5 Developing Open Cores or Not: How Open Source Products Could Benefit
     2.6 Utilizing Open Cores or Not: Pros and Cons
     2.7 The Future of Open Cores
     2.8 Conclusion
     References
   3 Platform Overview
     3.1 DE2-70 Board
     3.2 Digital System with Open Cores
       3.2.1 System Block Diagram
       3.2.2 Summary of Addresses
     3.3 Software Development and Operating System Layer
       3.3.1 Software Development
       3.3.2 Operating System and uC/OS-II RTOS
       3.3.3 uC/TCP-IP Protocol Stack
       3.3.4 Hardware Abstraction Layer (HAL) Library
     3.4 Demo Application: A MP3 Music Player
     3.5 Summary
     References
   4 OpenRISC 1200 Processor
     4.1 Introduction
     4.2 OR1200 Features and Architecture
     4.3 OR1200 Registers
     4.4 Interrupt Vectors
     4.5 Tick Timer
     4.6 Programmable Interrupt Controller (PIC)
     4.7 Porting uC/OS-II to OR1200
       4.7.1 Introduction
       4.7.2 uC/OS-II Context Switching
     References
20. It could even last forever if the slave doesn't respond at all. So far we have given a general idea of how the WISHBONE works with the single read/write transactions. Besides that, there are several important notes, listed below. (1) One of the most important things to notice is that the slave always gives ACKs of exactly 1 clock cycle. This is a sort of one-way handshake protocol, i.e. the master holds the sending signals and observes the replies from the slave all the time, but the slave only sends back a 1-clock-cycle ACK signal and doesn't care whether the master receives the ACK or not. Please remember that the master will be confused when it sees an ACK signal lasting multiple cycles. This will be interpreted as several operations having been done. Because of the one-way protocol, if a WISHBONE interconnection is so complicated that a master could somehow miss an ACK, the current bus transaction will be unable to finish and will last forever. This is quite exceptional, but if such cases do appear, a watchdog inside the master may be considered, which forces the master to restart or skip the transaction after a timeout. (2) The CYC signal should not be ignored, although, as everyone can see, the CYC and the STB have exactly the same waveforms in single transactions. According to the specification, the slaves are only allowed to respond when CYC = 1. This means the slaves only respond to the transactions when the logical AND of the ST
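The watchdog idea from note (1) can be sketched in a few lines. This is a hypothetical model, not code from the thesis or from any IP core: the function name, the return values and the way ACK samples are passed in are assumptions made for illustration.

```python
def run_single_transaction(ack_samples, timeout_cycles):
    """Wait for a 1-cycle ACK, giving up after timeout_cycles clock edges.

    ack_samples: iterable of 0/1 ACK values seen by the master on
    successive clock edges while it holds CYC and STB asserted.
    Returns (status, cycles_waited).
    """
    waited = 0
    for ack in ack_samples:
        waited += 1
        if ack:
            # The slave answered: the operation is complete.
            return "done", waited
        if waited >= timeout_cycles:
            # Watchdog fires: the master may restart or skip the transaction.
            return "timeout", waited
    # The trace ended before either an ACK or the timeout was seen.
    return "pending", waited


print(run_single_transaction([0, 0, 1], timeout_cycles=8))
print(run_single_transaction([0] * 10, timeout_cycles=4))
```

Without the timeout branch, the second call would keep waiting for as long as the slave stays silent, which is the "forever" case described above.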
21. OS-II Context Switching

The spirit of the porting is to make the OR1200 processor support context switching for the uC/OS-II. Context switching means switching a CPU from one process to another by storing and restoring the process-related and CPU-related states. This is the essence of multitasking. The uC/OS-II has many features like semaphores, mutexes, etc. Most of these features the uC/OS-II can manage by itself, so they don't have to be adapted for different CPUs. Only the multitasking feature cannot be done by the uC/OS-II alone without knowing the hardware details, because how to perform a context switch is tightly coupled with the CPU architecture. In uC/OS-II there are 2 ways to initiate a context switch: either actively triggered by a task, or passively managed by the uC/OS-II RTOS in the timer ISR. When a task has finished its job, it should give the CPU resources away so that the other tasks get the chance to use the CPU. In this case the task can call functions like OSTimeDly or OSTaskSuspend to suspend itself. These functions invoke a uC/OS-II internal function, OS_Sched, to perform task scheduling and find the ready task with the highest priority. After that, OS_Sched calls OS_TASK_SW to perform a context switch. OS_TASK_SW is a part of the porting and should be defined based on the CPU type. For the OR1200, OS_TASK_SW uses the instruction l.sys to make a system call, which manual
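The call chain described above (a task suspends itself, OS_Sched picks the highest-priority ready task, OS_TASK_SW performs the switch) can be modeled roughly as follows. This is a toy model, not uC/OS-II code: the class and method names are hypothetical, and `os_task_sw` merely records the switch where the real port would execute l.sys. It relies on the uC/OS-II convention that a lower number means a higher priority.

```python
class TinyKernel:
    """Toy model of task-triggered context switching."""

    def __init__(self, ready_priorities):
        self.ready = set(ready_priorities)
        self.current = None
        self.switch_log = []

    def os_task_sw(self, next_prio):
        # Stand-in for the CPU-specific part of the port (l.sys on OR1200).
        self.switch_log.append((self.current, next_prio))
        self.current = next_prio

    def os_sched(self):
        # Find the ready task with the highest priority (lowest number).
        if not self.ready:
            return
        highest = min(self.ready)
        if highest != self.current:
            self.os_task_sw(highest)

    def os_task_suspend(self, prio):
        # A task gives the CPU away, then the scheduler runs.
        self.ready.discard(prio)
        self.os_sched()


kernel = TinyKernel([3, 5, 7])
kernel.os_sched()          # task 3 starts running
kernel.os_task_suspend(3)  # task 3 suspends itself, task 5 takes over
print(kernel.switch_log)
```

Only `os_task_sw` would need to change for a different CPU, which reflects why that function is part of the porting layer.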
22. System call initiated by software

Floating Point   0xD00             Caused by floating point instructions when FPCSR status flags are set by FPU and FPCSR[FPEE] is set
Trap             0xE00             Caused by the l.trap instruction or by debug unit
Reserved         0xF00 - 0x1400    Reserved for future use
Reserved         0x1500 - 0x1800   Reserved for implementation-specific exceptions
Reserved         0x1900 - 0x1F00   Reserved for custom exceptions

Table 4.2: Exception types and causal conditions

The address 0x100 is the OR1200 starting address. Every time the power is turned on or the reset button is pressed, the OR1200 jumps to the address 0x100 and executes from there. So the startup program, or at least a proper jump instruction, has to be placed at the address 0x100. If there are errors on the WISHBONE bus, for example when reading data from a non-existing physical address, the CPU will jump to the address 0x200. For memory alignment errors, the CPU goes to the address 0x600. In Chapter 6 we will discuss the memory alignment in the OR1200. The addresses 0x500 and 0x800 are for the tick timer interrupt and the external interrupts triggered via the PIC, respectively. The address 0xC00 is for the system call. It is only initiated when the CPU executes the instruction l.sys. This instruction manually throws an exception and forces the CPU to branch to the address 0xC00. The system call is especially useful for operating systems when they need to
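The vector addresses mentioned in this excerpt can be collected in a small lookup table. The sketch below only restates the addresses given above; the dictionary name, the helper function and the short labels are illustrative.

```python
# Exception vector addresses as described in the text above.
EXCEPTION_VECTORS = {
    0x100: "reset",
    0x200: "bus error",
    0x500: "tick timer",
    0x600: "alignment",
    0x800: "external interrupt",
    0xC00: "system call",
    0xD00: "floating point",
    0xE00: "trap",
}


def describe_vector(address):
    """Return the exception name for a vector address, if listed here."""
    return EXCEPTION_VECTORS.get(address, "not described here")


for addr in (0x100, 0x500, 0xC00, 0x300):
    print(hex(addr), describe_vector(addr))
```

An exception handler table in the startup code would typically place one entry at each of these addresses, starting with the jump at 0x100.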
23. a significant role. This conclusion is not completely right. Indeed the cache is important, but the higher bus throughput should be the decisive factor for the shorter access time. And whether the burst transactions are supported or not greatly affects the bus throughput. It is easy to understand that the cache is not the main reason. If the data access on the bus is much slower than the CPU doing calculations, the CPU still has to stop and wait for the cache to be filled, regardless of the size of the cache. In this case it makes no big difference whether there are caches or not. The following table is copied from the NIOS II Processor Reference Handbook, Chapter 5 [10]. The table compares the features of the 3 types of NIOS cores. It clearly shows that the NIOS II/s and /f support the instruction cache and, more importantly, the pipelined memory access (burst), but the NIOS II/e does not. The pipelined memory access largely increases the bus throughput, and the cache provides space for data buffering. This is the reason that the NIOS II/s and /f have much better computation performance than the NIOS II/e.

Core                      Nios II/e           Nios II/s                Nios II/f
Objective                 Minimal core size   Small core size          Fast execution speed
Performance: DMIPS/MHz    0.15                0.74                     1.16
Performance: Max DMIPS    31                  127                      218
Performance: Max fMAX     200 MHz             165 MHz                  185 MHz
Pipeline                  1 stage             5 stages                 6 stages
Instruction Cache         -                   512 bytes to 64 KBytes   512 bytes to 64 KBytes
Bus
24. also emphasize this point many times by saying: "Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer." [15] [16] [26] In fact, the GNU even encourages people to charge as much money as they can when selling free software [26]. From this we can feel that, while the GNU is fighting to protect the freedom of access to knowledge, it is also trying to clarify that the free software licenses won't prevent people from making money. So on one hand, the GNU believes that knowledge should be the wealth of all human beings and that no one should set barriers to limit its propagation, so that people can benefit from the knowledge. On the other hand, they hope that businessmen can charge any price they want and make a substantial profit by selling free software, as long as they follow the licenses and promise not to restrict the freedom. This sounds great, like a win-win solution from which everyone benefits. The producers get the money and the consumers get the knowledge. But the question is: will this come true? Personally speaking, I really like this philosophy because it describes a great idea: everyone shares their knowledge and everyone works together to make the world a better place. I would also like to show my respect and highest praise to those free software developers and their work. However, at the same time it is needed to po
25. chosen to be the CPU of the computing platform because it is the most famous open source processor IP core. The CPU clock is set to 50 MHz in the system. More details of the OpenRISC OR1200 are given in Chapter 4. Because the OpenRISC uses WISHBONE as its bus protocol, we naturally had to organize the system around the WISHBONE bus. The CONMAX IP core is used to construct the WISHBONE infrastructure; it connects the CPU with the memory blocks and the peripherals. The WISHBONE bus protocol and the CONMAX IP core are discussed in detail in Chapter 5. Various types of memory are used in the system. The ALTERA on-chip RAM IP core provides an easy way to access the FPGA on-chip RAM, but we had to write an interface to adapt it to the WISHBONE network. The ALTERA on-chip RAM is not an open core, but it is free to use on ALTERA's FPGAs. The Memory Controller IP core makes it possible for the system to access the external SSRAM and SDRAM; it generates the signals to read and write those ICs. The system is also extended with multiple peripherals. The UART16550 IP core provides serial communication between the system and a PC. The GPIO IP core sets the output signals to the LEDs and captures the input signals from the buttons and switches. To use the external audio CODEC (WM8731) and the Ethernet PHY/MAC (DM9000A), we designed 2 interfaces to control those ICs. The memory blocks and the peripherals are described
26. implements a crossbar switch structure. With the CONMAX we can easily create a network just by connecting all other cores to it. This saved us a lot of time and made the system more reliable. To sum up, the WISHBONE standard can be divided into 3 aspects: the signals, the transactions and the interconnections. The signals and the transactions are relatively strictly defined by the standard, and all WISHBONE-compliant cores have to follow them, while the interconnection is more flexible to implement and depends on the situation of each project. Figure 5.2 gives an overview of the WISHBONE.

Figure 5.2: Overview of the WISHBONE (IP cores must have WISHBONE interface signals to get connected; IP cores communicate with each other using WISHBONE protocols; the interconnection can be organized in various ways; we chose the CONMAX IP core to implement the WISHBONE network in our thesis)

5.2.2 WISHBONE Interface Signals

The WISHBONE defines 2 types of interfaces: master and slave. An interface has to be either a master or a slave. Masters always start actions, such as requests for reads or writes. Slaves always respond to the requests. A connection can be made only between a master and a slave, not between two interfaces of the same type. This implies that the signals located at masters and slaves come in pairs, or rather that they are complementary. For instance, if a master has a signal n
27. is more popular than the registered feedback. Figure 5.4 depicts what single read/write transactions normally look like. Four transactions are contained in the figure.

Figure 5.4: An example of WISHBONE single transactions (waveform of stb, ack, ret, err, addr, we, data_i, data_o and sel)

At the 3rd rising clock edge, STB became 1 to signal the start of a single read transaction (WE = 0). On the next cycle, since the slave did not respond, the master held all signals unchanged. After a while, the slave answered by asserting ACK. On the next rising edge the master detected this ACK and latched the returned data. This is the first transaction. After another 3 cycles the master was ready again. This time it wrote data to the slave. The slave responded as soon as possible, in the following clock cycle. However, the slave somehow did not finish this operation correctly, so it returned a RET (retry, the WISHBONE RTY signal) to request another try. The master then repeated the operation and this time got an ACK. These are the 2nd and 3rd transactions. After that the master started another transaction; limited by the size of the figure, not all of its waveform is drawn. The master has to hold all signals until it gets the response from the slave
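The handshake described above can be sketched as a small C model of the master side. This is only an illustration under assumptions: the type and function names (`wb_resp_t`, `wb_single`) are hypothetical and do not come from the thesis or the WISHBONE specification, and the model folds a retried operation into one request that counts its retries, whereas the text counts the retried write as two transactions.

```c
#include <stddef.h>

/* Slave response sampled by the master at one rising clock edge. */
typedef enum { WB_NONE, WB_ACK, WB_ERR, WB_RTY } wb_resp_t;

/* Outcome of one master request. */
typedef enum { WB_OK, WB_FAIL, WB_TIMEOUT } wb_result_t;

/*
 * Hypothetical master model: walk through the slave responses seen on
 * consecutive clock edges. The master keeps STB (and all other signals)
 * asserted while the response is WB_NONE, finishes on ACK, gives up on
 * ERR, and repeats the same operation after RTY.
 * 'retries' receives the number of retries that were needed.
 */
wb_result_t wb_single(const wb_resp_t *resp, size_t n_edges, unsigned *retries)
{
    *retries = 0;
    for (size_t edge = 0; edge < n_edges; edge++) {
        switch (resp[edge]) {
        case WB_ACK:
            return WB_OK;        /* data latched, transaction done */
        case WB_ERR:
            return WB_FAIL;      /* abnormal termination */
        case WB_RTY:
            (*retries)++;        /* slave asks for another try */
            break;
        case WB_NONE:
            break;               /* wait state: hold all signals */
        }
    }
    return WB_TIMEOUT;           /* slave never answered */
}
```

Feeding the function the sequence "no response, no response, ACK" models the read in the figure, and "RTY, ACK" models the retried write.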
28. it again. If this step unfortunately keeps failing, a software tool like Wireshark [8] might be needed to monitor the TCP/IP packets and check what exactly is wrong.

3.11 Before playing the music, please lower the volume with the command player.exe -v. The software will ask for a value between 0 and 80. A value between 30 and 60 is fine; here we just fill in 40. The default value is set to 80, which is a mistake: too loud a sound will hurt your ears if you are wearing earphones. Also, the WM8731 sometimes does not work properly when the volume is set to 80.

Figure B.12: Set volume and play the music

3.12 Finally, type the play command player.exe -p. If everything is fine, you should hear the music. Cheers!

References
[1] Webpage: Altera DE2-70 Board, from Terasic Technologies. http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=226 Last visit: 2011-01-31.
[2] Webpage: Category 5 cable, from Wikipedia. http://en.wikipedia.org/wiki/Category_5_cable Last visit: 2011-01-31.
[3] Website: Cygwin. http://www.cygwin.com Last visit: 2011-01-31.
[4] Webpage: A dis
29. like to invent a new core than reuse an existing one. This means that opening the source of the improved parts, as required by the LGPL, would be acceptable for the companies. Companies need to be careful with GPLed open cores for 2 reasons: 1) the GPL is infectious, and 2) open source software and open cores differ. For software, the GPL defines a concept called "aggregate" in section 5: "A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an aggregate. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate." By this definition the GPL does not infect if 2 programs merely sit together and are not related to each other. For example, if a web browser and a media player are stored on the same computer, the media player's license will not influence the web browser even if the media player uses the GPL. As another example, if a compiler is GPLed, the source code processed by the compiler does not have to be open source. Unfortunately, the concept of aggregate does not work for open cores in the hardware world. Although open software and open cores are both in the form of source code at the beginning o
30. master. This ACK means "the slave is capable of accepting the 1st data, please continue". But NOTE that in burst transactions the 1st data is actually not processed here but at the next rising clock edge. The master checks the value of the ACK again. Now, as the ACK is 1, the master knows that the slave can take care of the 1st data of the burst and wants more, so it puts the 2nd data onto the bus. Because constant bursts always access one address, the values of the 1st, 2nd, 3rd and 4th addresses are actually the same. The slave first latches the 1st data at edge 3; the 1st data is accepted now. Meanwhile, the slave checks the CTI and sees that it is still CON. As the slave is still able to receive more data, it keeps the ACK at 1. The master checks the ACK and finds that it is still 1. The master sends another data to the slave. The slave latches the 2nd data. The slave checks the CTI and sees that it is still CON. The slave keeps asserting the ACK because it is able to handle more data. The master checks the ACK and finds that it is still 1. The master sends another data (the 4th) to the slave. Because the master knows that the 4th data is the last one of the burst, it sets the CTI to EOB. The slave latches the 3rd data. The slave checks the CTI and sees that it is still CON. The slave keeps asserting the ACK because it is able to handle more data. The master checks the ACK and finds that it is still 1
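The role of the CTI in such a constant-address burst can be illustrated with a simplified C model. It is a sketch under assumptions, not the thesis design: the CTI encodings (constant address 001, end of burst 111) are the ones defined for WISHBONE registered feedback cycles, but all names (`slave_t`, `slave_edge`, `burst_write`, `SLAVE_DEPTH`) are hypothetical, and the model deliberately ignores the one-edge offset between the master presenting a word and the slave latching it that the walkthrough above describes.

```c
#include <stddef.h>
#include <stdint.h>

/* CTI encodings used by WISHBONE registered feedback bus cycles. */
enum { CTI_CLASSIC = 0x0, CTI_CONST = 0x1, CTI_INCR = 0x2, CTI_EOB = 0x7 };

#define SLAVE_DEPTH 8          /* hypothetical buffer depth of the slave */

typedef struct {
    uint32_t buf[SLAVE_DEPTH];
    size_t   count;            /* number of data words latched so far */
    int      burst_done;       /* set when a word arrives with CTI = EOB */
} slave_t;

/* CTI the master drives together with word i of an n-word burst. */
uint8_t cti_for_word(size_t i, size_t n)
{
    return (i + 1 == n) ? CTI_EOB : CTI_CONST;
}

/*
 * One rising clock edge on the slave side: latch the word on the bus and
 * inspect the CTI that accompanies it. Returns the ACK value the slave
 * keeps driving (1 = it can take more data).
 */
int slave_edge(slave_t *s, uint32_t data, uint8_t cti)
{
    if (s->count < SLAVE_DEPTH)
        s->buf[s->count++] = data;
    if (cti == CTI_EOB)
        s->burst_done = 1;
    return !s->burst_done && s->count < SLAVE_DEPTH;
}

/* Master side: put one word per edge on the bus while ACK stays 1.
 * The slave must start empty (zero-initialized). */
size_t burst_write(slave_t *s, const uint32_t *data, size_t n)
{
    size_t sent = 0;
    int ack = 1;               /* slave already signalled it is ready */
    while (sent < n && ack) {
        ack = slave_edge(s, data[sent], cti_for_word(sent, n));
        sent++;
    }
    return sent;
}
```

For a 4-word burst the master drives CON for the first three words and EOB for the last one, after which the slave stops acknowledging.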
31. Figure B.4: Makefile script

Figure B.5: Build software project

The parameter -d specifies the RS232 port; please check which port is allocated in the Windows Device Manager, as shown in Figure B.6. In Windows the RS232 port has the format /dev/comX, where X is the port number. In Linux the format is usually /dev/ttySX.
32. (r1),r6
    l.sw 28(r1),r7
    l.sw 32(r1),r8
    l.sw 40(r1),r10
    ...
    l.sw 120(r1),r30
    l.sw 124(r1),r31

After pushing a context onto the stack, all registers of the context can be accessed through the SP register r1, so the context is effectively linked to the SP. The next step is to store the SP in the uC/OS-II Task Control Block (OS_TCB). In uC/OS-II each task has its own stack and OS_TCB. The OS_TCB maintains all information related to a task, including the task's SP. When uC/OS-II needs to switch to another task, it first finds the OS_TCB and then the stack pointer. With the SP, uC/OS-II can refer to the correct context of the task to be switched to. The following code stores the SP (r1) in the OS_TCB. OSTCBCur is a pointer to the OS_TCB of the currently running task, and the SP is the first member of the struct OS_TCB.

    l.movhi r3,hi(_OSTCBCur)      # move the high half-word to r3
    l.ori   r3,r3,lo(_OSTCBCur)   # move the low half-word to r3
    l.lwz   r4,0(r3)              # load the address of where to save the SP
    l.sw    0(r4),r1              # save the SP (r1) to that address

Afterwards the function OSTickISR is called, as required by the uC/OS-II book, Chapter 13 [15]. OSTickISR calls the uC/OS-II internal function OSIntExit to detect whether a higher-priority task has become ready. If so, the context of the higher-priority task should be resumed, which is done by the function OSIntCtxSw. Otherw
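For readers more comfortable with C than with OpenRISC assembly, the store of the SP can be expressed as follows. This is a minimal sketch: the `OS_TCB` here is a cut-down stand-in with only the stack pointer as its first member (the extra `OSTCBPrio` field is purely illustrative), and `ctx_offset` and `save_sp` are hypothetical helper names, not uC/OS-II functions.

```c
#include <stdint.h>

typedef uint32_t OS_STK;       /* one 32-bit stack entry */

/* Simplified stand-in for the uC/OS-II task control block: only the
 * position of the first member matters for this example. */
typedef struct os_tcb {
    OS_STK *OSTCBStkPtr;       /* saved stack pointer, first member */
    uint8_t OSTCBPrio;         /* illustrative extra field */
} OS_TCB;

OS_TCB *OSTCBCur;              /* points to the TCB of the running task */

/* Byte offset of general purpose register rN inside the saved context,
 * following the store pattern of the listing (r7 at 28, r31 at 124). */
unsigned ctx_offset(unsigned reg)
{
    return 4u * reg;
}

/* C equivalent of the four instructions that save the SP: load the
 * pointer held in OSTCBCur and store sp at offset 0 of that TCB. */
void save_sp(OS_STK *sp)
{
    OSTCBCur->OSTCBStkPtr = sp;
}
```

Because the stack pointer is the first member, the assembly can store to offset 0 of the TCB without knowing the rest of the structure layout.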
33. rih Not really. CYC is a bus request signal. If it is asserted, it validates all other signals. So if CYC is negated, LOCK is invalid. If a higher-priority bus master asserts CYC, the bus arbiter might grant that master the bus. Asserting LOCK prevents this. So far nobody uses LOCK.

Group 6: CTI_O (cycle type identification), BTE_O (burst type extension)

These 2 signals are even less used, but they are clearly defined in chapter 4 of the specification. They are designed for the WISHBONE registered feedback bus cycles, i.e. the burst transactions. Basically they are similar to the tag signals, which also provide extra information. From the CTI and the BTE the slave knows the status of the burst, so that it can prepare to handle it. The 2 signals are discussed again later in the burst transaction section.

5.2.3 WISHBONE Bus Transactions

In the WISHBONE specification each process of data transfer is called a bus cycle. There are 4 types of bus cycles defined in the specification: single, block, RMW and registered feedback bus cycles (the first 3 types are in Chapter 3 and the last one in Chapter 4 of the WISHBONE specification). In this thesis, however, the name "bus cycle" is replaced by "bus transaction", because the word "cycles" could be confused between clock cycles and bus cycles. The 4 bus cycles are here named single, block, RMW and burst bus transactions respectively
34. separate the instruction and data memory, utilize the burst transactions on the WISHBONE bus, and enlarge the throughput between the Memory Controller and the external memory devices, more instructions can be read in a given time. This improves the MIPS directly, because the CPU does not have to stay idle while waiting for new instructions. On the other hand, increasing the CPU frequency also improves the performance. For now the OR1200 has a system clock of 50 MHz. When we used a higher frequency, Quartus gave warnings about the internal timing. We believe the CPU clock frequency can still go higher if some optimizations are made at the FPGA level. On the software side, more effort can be invested in the OpenRISC toolchain to increase the productivity of software development: for example, updating to the latest toolchain, combining it with a front-end IDE like Eclipse [2], and building a JTAG connection and a debugger to download user programs more easily and to develop more complicated applications. There are also other interesting topics, such as porting the Linux operating system or supporting more library functions. Besides, testing of the system is always welcome; it detects existing bugs and makes the platform more stable. Writing user manuals is another thing worth doing, as it attracts people to use and improve the system.

7.2.2 Extension and Research Topics

New features can be added based on the requirements of the appli
35. side, while on the right side are the memory blocks and peripherals acting as WISHBONE slaves, which are discussed in this chapter. The numbers starting with m or s mark the interface IDs of the CONMAX used in the hardware design. There are 2 colors. The blue blocks are real silicon chips on the DE2-70 board, produced by different manufacturers. They are not part of the FPGA design. In this chapter we do not focus on the details of those IC chips; for these, please refer to their specifications and application notes. We care more about how to interface them with the FPGA system. The white blocks are the IP cores implemented inside the FPGA. We spent quite some time working on the IP cores during the thesis project, so naturally they are emphasized. These blocks play the leading roles in this chapter and appear one by one in the subsections below. Of all the IP cores, the Memory Controller, UART16550 and GPIO come from opencores.org. They are of very good quality and helped a lot to speed up the system design, because we did not have to implement those blocks again from scratch. Besides these 3 IP cores, the on-chip RAM is an IP core from ALTERA. The other white blocks, i.e. the on-chip RAM interface and the DM9000A and WM8731 interfaces, were designed by me or my partner Lin Zuo. (This thesis was done by 2 people.) On the FPGA level, my partner Lin was responsible for the CONMAX
36. struction by instruction and display the contents of the CPU registers. (The old toolchain is no longer available from the official website, but we included a copy in the thesis project archive.) This is a good way to study how the CPU works and how the system handles the stack and function calls. The Or1ksim can also work as a debug target connected to GDB, which allows debugging at the C source code level.

• GDB: GDB is the GNU Project Debugger. It is used together with the Or1ksim to debug the C source code.
• Makefile: GNU Make is a utility which automatically compiles the source files and builds them into an executable target. It saves us from typing the GCC commands line by line.

With these tools, the software development workflow can be described by the figure below.

Figure 3.5: Software development workflow (test.c and reset.S, the C and assembly files, are compiled by or32-gcc into the object files test.o and reset.o; or32-ld links them, following the linker script ram.ld, into the executable file test.or32; or32-objcopy produces the HEX file test.ihex and or32-objdump the disassembly file test.dis; Or1ksim and GDB are used for debugging)

First the source files are compiled into object files by GCC. Then the object files are linked together by the GNU Binutils into the executable target file. The linker follows the configuration in the linker script. The executable file is used in multiple ways. It can be translated into Int
37. the DE2-70 board there is a DM9000A Ethernet interface, so it became an interesting topic to add Ethernet support to our system. To communicate over an Ethernet connection, the TCP/IP protocols are necessary. A software implementation of the TCP/IP protocols is called a TCP/IP stack. Designing a TCP/IP stack from scratch is a huge amount of work and impossible for us to finish, but luckily there are existing ones. We decided to use the uC/TCP-IP protocol stack for 2 obvious reasons: 1) uC/TCP-IP is also a product of Micrium [16] and is designed to work together with the uC/OS-II RTOS; 2) uC/TCP-IP provides the hardware driver for the DM9000A. uC/TCP-IP was apparently the easiest option, but some work still had to be done to make it work with the OpenRISC CPU. My partner Lin Zuo was responsible for uC/TCP-IP and the DM9000A; for more details please refer to his thesis [18].

3.3.4 Hardware Abstraction Layer (HAL) Library

Another attempt we made in the thesis was to build a library that collects the functions controlling the hardware. The functions hide the details of operating the hardware. In this way, programmers at a higher level can develop software applications without learning exactly how the hardware system works. This library of functions is also referred to as the Hardware Abstraction Layer (HAL). Because of the limited time, we could take only a very first step for the HAL
38. to WAV format. This is done by another application, player.exe, a client running on the PC which decodes the MP3 file into WAV format with LibMAD and then sends the music data to the DE2-70. For LibMAD, check its website for more information [7]. The command is player.exe -m WINDOWS.mp3; the -m parameter specifies the path of the MP3 file. This is shown in Figure B.11.

Figure B.11: Decode MP3 file

3.8 After that, a temp.wav is generated in the build folder. It is 1.28 MB, while the MP3 is only 123 KB. This is the file that is going to be split up into UDP packets and transferred to the board
39. without modification, are permitted provided that the following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

The 3 clauses assert only minimal requirements compared to the other open source licenses. If we try to understand the clauses: Clause 1 means that people cannot remove the text of the BSD license from the source code. Clause 2 says that if someone redistributes the source code in another form, such as selling a software product, they have to announce that the product contains something covered by the BSD license and designed by someone else. Both Clause 1 and Clause 2 are very basic things that everyone should do in good conscience, even if they are not asked for by a license, just as a list of references is needed when writing an article to acknowledge the work of previous contributors. Clause 3 is a little more complicated to understand. It is used to prevent the name of the original author of the source code from being abused, and the drawbacks of the source
40. www.underbit.com/products/mad/ Last visit: 2011-01-31. This is where to find the information on LibMAD.

Chapter 4 OpenRISC 1200 Processor

4.1 Introduction

OpenRISC 1200 is an open source 32-bit processor IP core. It is very famous and has been widely used in many industrial and academic projects. From the start of the thesis until today, the OpenRISC project has always been at the top of the "Most popular projects" list of opencores.org [1]. Considering that opencores.org is the No. 1 well-known open core website, we can easily conclude that the OpenRISC processor is one of the most popular open core processors in the world. Information on the OpenRISC can be found at its official webpage [2], from which the HDL source code and the documents are free to download after registration. There is also a forum on the website for OpenRISC-related discussions; this is the major way to get technical support for the OpenRISC. Otherwise it is also possible to consult commercial companies. There are several confusing terms regarding the OpenRISC. In fact, OpenRISC is a project that aims to create a free, open source computing platform available under the GNU (L)GPL licenses [2]. The OpenRISC 1000, or OR1K for short, is the name of a CPU architecture. Based on the OR1K architecture there can be many different derivatives. The OpenRISC 1200, or OR1200 for short, is a 32-bit processor IP core implemented fol
41. 0                           WM8731 DAC Data Register
0xE0000000 to 0xE0000006    UART16550 Registers (8-bit)
0xF0000000 to 0xF0000024    GPIO Registers
0xFF000000 to 0xFF00003C    CONMAX Registers

Table 3.1: Summary of addresses

3.3 Software Development and Operating System Layer

The hardware system cannot work alone without running software, so we also spent much effort on developing software for the platform. In this section the software development workflow is introduced first. In complex systems, the software is often divided into multiple layers for a better architecture. In our case there are 2 layers: the operating system layer and the application layer. The operating system layer contains an operating system, hardware drivers, Application Programming Interface (API) functions, etc. They are designed to control the hardware platform and simplify application development. On top of the operating system layer, the application layer mainly implements a program with specific features, for example a music player. The operating system layer is described in this section and the application layer in the next section.

3.3.1 Software Development

The software development was done on a PC with Windows XP SP2. We did not use Linux, mainly for 2 reasons: 1) To design the FPGA project, ALTERA's Quartus is required, but we had problems with the Linux version of Quartus. 2) We did not have adequate knowledge of Linux, so working wit
42. 0 board is the foundation of the thesis project. The layout of the DE2-70 board is shown in Figure 3.2 [3].

Figure 3.2: DE2-70 board

A hardware block diagram is also given in Figure 3.3 [3]. In Figure 3.3 the hardware resources used by the thesis project are marked in grey. The 2 MB SSRAM and 64 MB SDRAM store the program code and data. The 24-bit audio CODEC plays the music. The 10/100 Ethernet PHY/MAC connects the board to a PC with TCP/UDP protocols. The RS-232 port is used for serial communication. Buttons and 7-segment LEDs work as general inputs and outputs. And of course most of the design is made inside the
43. 00280;
    *(volatile unsigned int *)(WM8731_REG) = 0x0000047F;
    *(volatile unsigned int *)(WM8731_REG) = 0x0000067F;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000812;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000A00;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000C00;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000E41;
    *(volatile unsigned int *)(WM8731_REG) = 0x00001023;
    *(volatile unsigned int *)(WM8731_REG) = 0x00001201;

Then the music data can be read from a file and sent one by one, like:

    #define WM8731_DAC_DATA 0xD0000010
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_1;
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_2;
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_3;

The music data are 32 bits wide and have the following format. (The addresses start with 0xD because the IP core is connected to CONMAX slave port 13.)

Figure 6.11: 32-bit music data format (left channel followed by right channel, each from MSB to LSB)

6.6 DM9000A Interface

The DM9000A is "a fully integrated and cost-effective low pin count single chip Fast Ethernet controller with a general processor interface, a 10M/100M PHY and 4K Dword SRAM" [21]. The DE2-70 has a DM9000A IC on the board as the Ethernet solution. Similarly to the WM8731, a WISHBONE interface is needed in the FPGA to drive the DM9000A chip. In the thesis project my partner Lin Zuo was responsible for the DM9000A interface
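As a hedged illustration of how one 32-bit music data word might be assembled, the sketch below packs a left and a right sample into one word and writes it to the DAC data register. It assumes 16 bits per channel with the left channel in the upper half, which is only inferred from the layout of Figure 6.11; the function names `pack_frame` and `play_frames` are hypothetical, and no flow control towards the WM8731 interface is modelled.

```c
#include <stdint.h>

/* Pack one stereo frame: left sample in bits 31..16, right sample in
 * bits 15..0 (assumed layout, inferred from Figure 6.11). */
uint32_t pack_frame(int16_t left, int16_t right)
{
    return ((uint32_t)(uint16_t)left << 16) | (uint16_t)right;
}

/* Write n frames to the DAC data register. On the board 'dac' would be
 * (volatile uint32_t *)0xD0000010; passing it as a parameter lets the
 * function be exercised on a PC as well. */
void play_frames(volatile uint32_t *dac,
                 const int16_t *left, const int16_t *right, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        *dac = pack_frame(left[i], right[i]);
}
```

The cast through `uint16_t` keeps negative samples from sign-extending into the other channel's half of the word.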
44. Figure B.9: Start software project via bootloader

Figure B.10: Ethernet connected

board is not turned off, the data stored in the FPGA and the external SSRAM stay valid, so there is no need to reprogram the FPGA and download the software project on every reset. We used four 7-segment LEDs in the project. Each 7-segment display plus its decimal point can show 8 bits, so they are organized to show the values of two 16-bit counters. On the DE2-70 board, HEX0 and HEX1 form the first counter, whose value is the number of valid UDP packets received. HEX2 and HEX3 form the other counter, which shows the number of invalid UDP packets received. The file software/build/WINDOWS.mp3 is the MP3 file used as the demo in our project. It is a very small MP3 file which lasts only 7 seconds. First convert the MP3 file
45. 02
BA_MASK    0x00000020    0x00000020
CSC        0x00000823    0x00000691
TMS        0xFFFFFFFF    0x07240230

Table 6.3: Memory Controller register configurations

6.2.4 Performance Improvement by Burst Transactions

In the previous sections we have introduced the Memory Controller IP core. Now let us talk about performance. As mentioned at the beginning, the Memory Controller IP core supports WISHBONE burst transactions, but due to the limited time of the thesis project we did not manage to investigate this feature. All data accesses between the CPU and the Memory Controller in the project are single reads or writes. Compared to burst accesses, single accesses consume a lot more time in the following 3 aspects: 1) For each new bus transaction, the CPU has to win the bus arbitration of the CONMAX. This takes at least 1 bus cycle if no other bus transaction is currently ongoing; otherwise it takes even longer. Burst transactions, which contain multiple read/write operations, are more efficient. 2) When a bus transaction arrives, the Memory Controller has to take several steps to handle it, such as analyzing the target address and converting the WISHBONE signals according to the external memory devices. For burst transactions, the Memory Controller gets the information about the next data to access in time, so it can better pipeline the internal operations to save time. The burst transac
46. 16 registers. The specification says each register is 32 bits wide, but the source code actually implements only 16 bits per register, so writing to the upper 16 bits does nothing and reading from those bits always returns zero. There is 1 arbiter in each slave interface, which reads the data stored in the registers to identify the priority of each master. For instance, if the register CFG12 contains the value 0x000080F0 (only the last 16 bits are valid), it means that for slave interface 12 the priorities from master 7 down to master 0 are 2, 0, 0, 0, 3, 3, 0, 0; i.e. masters 2 and 3 both have the highest priority, 3, to access slave 12. The next highest is master 7, and all other masters are on the third level. Please note that this configuration applies only to slave interface 12. The other 15 registers can be configured individually with different priorities.

5.3.4 Parameters and Address Allocation

There are several parameters used to configure the CONMAX core: dw, aw, rf_addr and pri_selN. All of them can be changed ONLY in the Verilog HDL source file. This means that after the CONMAX is compiled and downloaded into an FPGA, these parameters cannot be modified any further, so the designers have to decide the parameter values when designing the FPGA system. dw and aw stand for the width of the data bus and the address bus
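The register layout implied by the 0x000080F0 example (2 bits per master, master 0 in the lowest bits) can be checked with a few lines of C. This is a sketch derived only from that example; the helper names `conmax_prio` and `conmax_set_prio` are hypothetical and not part of the CONMAX core or any driver.

```c
#include <stdint.h>

/* Priority (0..3) of master m (0..7) in a CONMAX configuration register.
 * Each master owns 2 bits; master 0 sits in bits 1:0. */
unsigned conmax_prio(uint32_t cfg, unsigned m)
{
    return (cfg >> (2u * m)) & 0x3u;
}

/* Return cfg with the priority of master m replaced by prio. Only the
 * lower 16 bits are kept, since the upper 16 bits are not implemented. */
uint32_t conmax_set_prio(uint32_t cfg, unsigned m, unsigned prio)
{
    uint32_t mask = (uint32_t)0x3u << (2u * m);
    uint32_t bits = (uint32_t)(prio & 0x3u) << (2u * m);
    return ((cfg & ~mask) | bits) & 0xFFFFu;
}
```

Decoding 0x000080F0 with `conmax_prio` gives 2 for master 7 and 3 for masters 3 and 2, matching the priorities listed in the text.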
47. 2 bit Windows XP SP3 installed. I guess Windows Vista should work too.

B.1.6 Cygwin

Cygwin is a Linux-like environment for Windows [3]. It provides a good Linux-like environment, and it is small compared to virtual machine software such as VMware. The reason we need a Linux environment is the OpenRISC toolchain. The OpenRISC toolchain is derived from the GNU toolchain, including GCC, GNU Binutils, GDB, etc. To compile source code into binary files for the OpenRISC processor we have to use these tools, and they are not so friendly to Windows. Then why not just use a native Linux PC? Hmm, this is a good question. Actually we tried CentOS at the beginning, but I gave up soon. The official excuse is that we need ALTERA's Quartus for the FPGA project and we had some unenjoyable experiences with the Quartus Linux edition. But to be honest, the real reason is that I do not know Linux well enough and usually got stuck on some very basic operations. So finally I switched to Windows with Cygwin, where I can be more productive. For Linux pros the OS should not be a problem. Feel free to try out the project on a Linux PC, but remember to recompile everything that we compiled in Cygwin. Cygwin is good, but installing it is somewhat complicated, because it asks the user to choose the components to be installed from a long list, and the user has to remember the components each time Cygwin is reinstalled
48. 2 bit, and all registers inside the CPU are 32-bit as well. This means that, to be efficient, the processor should be able to read and write data with the same width as its registers, and in fact it often does. When fetching data to fill any of the CPU registers, a 32-bit value should be returned. If the RAM core is set to 8-bit width, the CPU has to access the memory block 4 times to assemble a 32-bit value. This certainly takes longer and should therefore be avoided. This is why we set the data width parameter q of the 1-port RAM to 32. The difference between the memory block width (32 bits) and the data granularity (8 bits) causes a problem: the CPU thinks there is 1 byte mapped to each address, but in fact 4 bytes are stored at each physical address of the memory block. To resolve the width mismatch, the solution is to shift the address connections by 2 and to introduce the SEL signals. Shifting the address connections by 2, i.e. connecting the WISHBONE address lines [14:2] to the 1-port RAM address inputs [12:0], discards the last 2 bits of the WISHBONE addresses. As a result, all addresses are implicitly converted during the transmission. For example, an access to the WISHBONE address 0x7 (0111 in binary) becomes an access to the physical address 0x1 (01 in binary), because the last 2 bits are ignored and the address is shifted. The address shifting maps 4 consecutive addresses on the CPU side to 1 physical address on the memory bloc
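The address conversion can be written down in C to make the 0x7 to 0x1 example concrete. The functions below are a sketch with hypothetical names; in particular, the mapping from the dropped 2 address bits to a SEL bit assumes big-endian byte lanes (byte offset 0 on data bits 31:24, selected by SEL bit 3), which is an assumption about the design rather than something stated in the text above.

```c
#include <stdint.h>

/* Word address seen by the 1-port RAM: WISHBONE address lines [14:2]
 * connected to the 13 RAM address inputs [12:0]. */
uint32_t ram_word_addr(uint32_t wb_addr)
{
    return (wb_addr >> 2) & 0x1FFFu;
}

/* Byte position inside the 32-bit word, i.e. the 2 address bits that
 * the shifted connection discards. */
unsigned byte_in_word(uint32_t wb_addr)
{
    return wb_addr & 0x3u;
}

/* SEL mask for a single-byte access, assuming big-endian byte lanes:
 * byte offset 0 maps to SEL bit 3, byte offset 3 to SEL bit 0. */
unsigned sel_for_byte(uint32_t wb_addr)
{
    return 0x8u >> byte_in_word(wb_addr);
}
```

With these helpers, the WISHBONE addresses 0x4 to 0x7 all yield the physical word address 0x1 and differ only in the SEL bit.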
49. 2KB in our project. For larger programs we designed a bootloader to download them via the serial connection. The bootloader is a small program that can be stored in the on-chip RAM. It is downloaded to the DE2-70 board through the first downloading method described above, so when the system starts the CPU executes the bootloader from the on-chip RAM. The bootloader initializes the hardware and then listens to the UART port to receive the program data from the PC. On the PC side a software tool was designed to interpret Intel HEX files. It reads the addresses and the data from the HEX files and sends them over the serial connection. A USB-to-RS232 cable was used to connect the PC and the DE2-70 board. When the bootloader receives data, it stores them at the correct addresses. When all data of a program have been received, the bootloader issues a command to the OpenRISC, which jumps to a specific address and executes the downloaded program from there. Setting up a JTAG connection for downloading and debugging requires 1) activating the OpenRISC CPU debug interface in the FPGA project, 2) a JTAG cable and, more importantly, 3) software support on the PC side. Normally, for a commercial processor, these are provided by the vendor. But for the OpenRISC on the DE2-70 we didn't have them when doing the thesis.

² By the way, now it is possible to buy an OpenRISC development board plus a JTAG debugger from ORSoC [14].
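The PC-side step of reading addresses and data out of an Intel HEX file can be sketched as below. This is not the thesis tool itself: the type and function names are hypothetical, and only the generic record layout ":LLAAAATT<data>CC" with its modulo-256 checksum is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded Intel HEX record (hypothetical type for this sketch). */
typedef struct {
    uint8_t  length;     /* number of data bytes */
    uint16_t offset;     /* 16-bit load offset */
    uint8_t  type;       /* 0x00 data, 0x01 end of file, 0x04 extended linear address */
    uint8_t  data[255];
} hex_record;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode two hex digits; returns -1 on a bad or missing digit. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return -1;
    return (hi << 4) | lo;
}

/* Parse one record line. Returns 0 on success, -1 on a malformed
 * record or a checksum mismatch. */
int parse_hex_record(const char *line, hex_record *rec)
{
    if (line == NULL || rec == NULL || line[0] != ':') return -1;
    const char *p = line + 1;
    int hdr[4];
    uint8_t sum = 0;

    /* Header: length, offset high, offset low, record type. */
    for (int i = 0; i < 4; i++, p += 2) {
        hdr[i] = hex_byte(p);
        if (hdr[i] < 0) return -1;
        sum = (uint8_t)(sum + hdr[i]);
    }
    rec->length = (uint8_t)hdr[0];
    rec->offset = (uint16_t)((hdr[1] << 8) | hdr[2]);
    rec->type   = (uint8_t)hdr[3];

    for (int i = 0; i < rec->length; i++, p += 2) {
        int b = hex_byte(p);
        if (b < 0) return -1;
        rec->data[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }

    /* All bytes including the checksum must add up to 0 modulo 256. */
    int chk = hex_byte(p);
    if (chk < 0) return -1;
    return ((uint8_t)(sum + chk) == 0) ? 0 : -1;
}
```

A sender built on this would typically remember the upper 16 address bits from the latest type 0x04 record and combine them with each data record's offset before transmitting the bytes over the serial line.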
50. [Figure B.7: Downloading and flushing finished. Console output of the download (hex dump of the transferred data), ending with the message "Warning 20003 Ignored record type 5 in HEX file".]

B.2.3 Download Music Data and Play

Now the software project has been downloaded into the SSRAM of the DE2-70. Before starting the program there are some configurations to do. The first thing is to edit the IP address. Please set it to 192.168.0.3, the IP mask to 255.255.255.0 and the default gateway to 192.168.0.1, as shown in Figure B.8. By the way, the IP address of the DE2-70 board is set to 192.168.0.2, to which the UDP packets are going to be sent. The music player program running on the DE2-70 uses these numbers as the IP address. After that, please disable all other network adapters on your PC if there ar
51. 6.2.3 Configurations for SSRAM and SDRAM

In the last section we discussed the Memory Controller configurations at the FPGA level. After the Memory Controller IP core is connected to the WISHBONE network and the hardware configurations have been done correctly, the OpenRISC CPU should be able to address its internal registers. This section talks about how to configure the Memory Controller internal registers for the SSRAM and SDRAM. The registers have to be properly initialized before the Memory Controller can drive the external memory ICs. On the DE2-70 board there is 1 chip of SSRAM (ISSI IS61LPS51236A) and 2 chips of SDRAM (ISSI IS42S16160B). All the configurations were decided based on their datasheets [8, 9]. The Memory Controller user manual, Section 4 [6], gives the full list of the internal registers. There are 3 global registers in the Memory Controller, valid for all CS signals: CSR, POC and BA_MASK. The POC and BA_MASK registers have been described before. We modified the HDL code to give these 2 registers default values at power-up. The Control Status Register (CSR) is used only for SDRAM and FLASH memory. For the SSRAM there is no need to change this register. For the SDRAM the REF_INT field is set to 3 (7.812 µs) in our case. This value comes from Table 1 on page 32 of the user manual [6], because there are 2 chips of 16M × 16 SDRAM on the DE2-70 board. Also the Refresh Prescaler field has to be configured for the SDRAM. Th
52. B and CYC are true. For example, in complicated WISHBONE networks the CYC is usually used to request grants from bus arbiters. There are situations in which the arbiter broadcasts bus transactions to all slaves but delivers the CYC signal only to the addressed slave. In such cases the slaves may respond incorrectly if they don't check the input value of the CYC.

3. All 3 signals ACK, RTY and ERR can be used to reply to the master; for instance, the 2nd transaction is finished with an RTY. But both the master and the slave IP cores have to support the signals and the function. Usually they have only an ACK signal, because the RTY and ERR are not mandatory in the specification.

4. In the first transaction the master received the response after 4 clock cycles, but the delay is not necessarily always 4 cycles. The slaves may need several cycles to process the data, and no ACK will be returned until they are finished. Therefore a bus transaction can last for a long time when much of it is spent waiting for a slow slave. Similarly, when the masters receive ACKs they may need some time to process data too. In such cases there will be breaks between 2 transactions, just like the delay between the 1st and 2nd transaction in the figure. In the best situation, without any delay, the ACK will be set to 1 by the slave on the next rising edge after the STB is asserted. And the master will start another transaction as soon as it g
53. [Figure 3.3: Hardware block diagram of the DE2-70. The diagram shows the Cyclone II 2C70 FPGA with 50 MHz, 28 MHz and external clock inputs, surrounded by the 24-bit audio CODEC, TV decoder, push-button switches, USB 2.0 host/device, 10/100 Ethernet PHY/MAC, SD card, IrDA transceiver, 8 MB flash, 64 MB SDRAM, 2 MB SRAM, 7-segment displays and expansion headers.]

Because the DE2-70 board is a product of Terasic, we won't spend too much text on the details of the board. For more information please refer to the DE2-70 user manual [3].

3.2 Digital System with Open Cores

A large amount of the thesis project time was spent on designing a digital system on the Cyclone II FPGA of the DE2-70 board, where we adapted the selected open cores to the FPGA and connected them together as a computing platform. In this section an overview is given of the open core system inside the FPGA.

3.2.1 System Block Diagram

If we take a closer look at the digital system, it can be drawn as the block diagram shown below. The white blocks are the digital blocks inside the FPGA. The blue ones are the hardware ICs on the DE2-70 board.

[Figure 3.4: Open core system block diagram. Blocks shown: on-chip RAM, Memory Controller with SSRAM and SDRAM, OpenRISC OR1200, DM9000A Interface with DM9000A, WM8731 Interface with WM8731, UART, and GPIO with buttons and LEDs.]

The OpenRISC OR1200 IP core is
54. [Figure B.6: Check USB to serial port ID. Device Manager screenshot listing the "Prolific USB-to-Serial Comm Port" entry under the COM and LPT ports.]

The parameter "f" specifies the path of the HEX file. The "p" tells the bootloader to display the data being transferred in the Cygwin window. The reason to do this is that the loading time is quite long: it takes about 5 minutes to finish downloading the HEX file, so the "p" shows that the system is still running. The downside of the "p" is that the large amount of data floods the screen. After downloading the HEX file, it will look like Figure B.7. Usually the bootloader works fine without any problem, but I have had the bad experience that occasionally the bootloader got stuck and then Windows XP gave a blue screen. I am not sure about the reason for the problem. Probably some driver crashes.

[Figure: Cygwin console window showing the hex dump of the data being transferred.]
55. GA, but the I2C bus timing is more complicated to implement than the SPI. The WM8731 data interface also has multiple modes to choose from. We selected the Left Justified mode because it is the most straightforward. Regarding the I2C bus, there is something interesting to mention. When we were working on the thesis project we didn't know anything about I2C, and the WM8731 datasheet [18] doesn't mention the term I2C at all. It uses "2-wire MPU serial control interface" instead. It took us several months after the thesis project was over to find out that what we had made was exactly an I2C interface. If we had known it earlier, a ready-to-use IP core from opencores.org like the I2C Controller [19] could have been used to save precious project time. Also, we could have followed the I2C bus standard better, for example by using a 100k or 400k baud rate. We are curious whether the WM8731 producer had some special considerations about I2C licensing [20]. Figure 6.10 below gives the internal structure of the WM8731 interface. It contains 3 blocks: one for the WISHBONE bus and two for the WM8731 I2C bus and data bus respectively.

[Figure 6.10: WM8731 Interface internal structure. The top-level file wm8731_interface_top.vhd contains wishbone_interface.vhd, control_interface.vhd and digital_interface.vhd with a data FIFO, connected to the WM8731.]

As shown in Figure 6.10, the WISHBONE interface receives WISHBO
56. I and the BTE. Explanations of Edges 9 and 10 are skipped.

Edge 11
Master: Since the ACK is 1 and the CTI is EOB, the master knows it is time to finish the burst. Because this is a read burst, the master has to latch the last (4th) data and then de-assert all signals.
Slave: The slave also realizes it is the end of the burst from the STB and the CTI. Because it is a read burst, the slave has nothing to do except de-asserting its signals.

Above all, almost all important parts of the WISHBONE specification have been covered, from the interface signals to the bus transactions. We hope this work helps people understand the specification more easily, so that the WISHBONE standard will be even more widely accepted and applied in real projects. Best wishes to WISHBONE in the coming competition among interconnection standards.

5.3 CONMAX IP Core

5.3.1 Introduction

The WISHBONE interCONnect MAtriX IP Core (CONMAX) is an IP core designed by Rudolf Usselmann in Verilog HDL. It constructs a WISHBONE interconnection with a crossbar switch structure, which can be used as the bus of a system. The CONMAX core supports up to 8 masters and 16 slaves, as well as 4 priority levels. This is enough to compose quite a complicated network. Using the core saves designers a lot of time that would otherwise go into working out how to organize all modules as a system. Because the CONMAX handles the traffic within the system, all users need to do
57. II software. In this project we chose the most general and simplest one, the 1-port RAM IP core. In Figure 6.1 ALTERA's 1-port RAM core is shown as the on-chip RAM. Because the 1-port RAM core has interface signals different from the WISHBONE bus signals, some extra conversion logic is needed to connect it to the WISHBONE network. The conversion logic is shown as the on-chip RAM interface in Figure 6.1. When the OpenRISC CPU tries to access the on-chip memory, the WISHBONE bus transactions are sent to the on-chip RAM interface through the CONMAX. There the signals are translated and forwarded to the 1-port RAM core. In the thesis archive, 3 design files of the on-chip RAM block are saved under the hardware/components/ram folder. The file ram0.vhd is automatically generated by Quartus. It is the description file of the ALTERA 1-port RAM. The file ram0_top.vhd, designed by us, includes the interface conversion logic. And the file ram0.mif contains the data to be stored in the 1-port RAM. The RAM is initialized with the data in ram0.mif every time the FPGA is programmed. In the following sections we will first discuss the advantages of using the on-chip RAM, which gave us reasons to spend time on it. Then the ALTERA 1-port RAM core as well as its interface logic will be introduced. After that we will explain how to organize the memory block, because it has to follo
58. Memory Controller and DM9000A Interface, while I took charge of the rest of the blocks as well as the system-level integration. For administrative reasons each of us had to write a separate thesis, so please refer to Lin's thesis [1] as well. His thesis contains more information on the IP cores that Lin was working on. This chapter intends not only to introduce those IP cores but also to share some design experience. If the thesis were all introduction, it would probably become just another specification of the IP cores, and definitely not as good as the ones written by the IP core designers.

6.1 On-chip RAM and its Interface

All processors need memories to store instructions and data. The OpenRISC OR1200 is no exception. When designing the platform we tried to add various types of memories to satisfy this requirement. One of them is the FPGA on-chip RAM. ALTERA's Cyclone II EP2C70 FPGA on the DE2-70 board contains 1,152,000 memory bits, which is about 140 KB. Apart from the memory used or reserved for the other hardware modules, the rest can be organized as an on-chip RAM block for the OpenRISC processor. In the thesis project we made an on-chip RAM block of 32 KB. ALTERA provides designers with a way to organize and utilize the on-chip memory resources easily and efficiently via ALTERA's memory IP cores. Several types of memory IP cores are supported by ALTERA's Quartus
59. NE transactions from the CPU. It also decides whether the received WISHBONE transactions contain control or music data. One of the 32 lines of the WISHBONE address signal makes the decision. If the address line is low, the data is considered control data and will be written into the target WM8731 register through the I2C bus. Otherwise the data is music data to be sent through the data bus. The control interface sends out the control data on the I2C bus. It uses a state machine to transmit serially, bit by bit, according to the I2C timing. It also adds an extra I2C control byte. The digital interface sends out the music data in Left Justified mode. To improve the real-time performance, a 32-bit, 8192-stage (32 KB) FIFO is included for buffering music data.

6.5.3 HDL Source Files and Software Programming

The WM8731 Interface is designed in VHDL. The source files can be found in hardware/components/wm8731. There are 5 files. The hierarchy of the files is shown in Figure 6.10. In the project, the only address line of the WM8731 Interface is connected to the 5th line (bit 4) of the 32-bit WISHBONE address bus. The address 0xD000_0000 is assigned for writing the WM8731 registers and the address 0xD000_0010 for playing the music data. We used the following code to initialize the WM8731:

#define WM8731_REG 0xD0000000
*(volatile unsigned int *)(WM8731_REG) = 0x00000080;
*(volatile unsigned int *)(WM8731_REG) = 0x000
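The control/music decision made by the interface can be modelled in a few lines of C. This is a host-side sketch rather than the VHDL of the thesis; the macro name WM8731_DATA and the function name are hypothetical, while the two addresses and the use of address bit 4 come from the text.

```c
#include <stdint.h>

#define WM8731_REG   0xD0000000u  /* control register writes (from the text) */
#define WM8731_DATA  0xD0000010u  /* music data writes (hypothetical macro name) */

/* Returns 1 when a WISHBONE write targets the music data path and 0 when it
 * targets the control path. Only address bit 4 (the 5th line) is examined. */
int wm8731_is_music_data(uint32_t wb_addr)
{
    return (int)((wb_addr >> 4) & 1u);
}
```

On the target, software would reach the two paths through volatile pointer writes to these two addresses, in the same style as the initialization code above.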
60. OB. The slave checks the STB, CTI and BTE and knows the burst is still in progress. The slave is able to handle more data, so it 1) returns the 4th data according to the previously calculated address B 3, 2) keeps the ACK at 1 to inform the master to keep transferring, and 3) calculates the next address, which is the address B 5, although this address will not be used. The master checks the ACK and finds it is 1, so it knows the slave is still working and the last data is ready to read. The master latches the 4th data. The master de-asserts all signals to terminate the burst. The slave checks the STB, CTI and BTE and knows this is the end of the burst, so there is nothing to do now except de-asserting the ACK to 0.

So far we have seen 2 examples, in which we explained what the master and the slave actually do at every rising clock edge. After these specific descriptions, it is time to summarize some general rules about the WISHBONE burst transactions.

1. According to the specification, all burst transactions have to be either read or write, i.e. a burst transaction must contain either all read operations or all write operations, but cannot have both within one burst.

2. The CTI and the BTE signals are used to help the slaves identify the type and the status of the current burst. All bursts end with an EOB in the CTI.

3. The slaves behave differently depending on whether it is a read bu
61. OS to the computing platform. Discussions were held about which operating system to use. Our supervisor Johan J rgensen proposed Linux. But because of our limited knowledge of Linux and the uncertainty about how fast the hardware platform could be created, we were not sure at the beginning whether there would be enough time to port Linux. So finally we decided to start with uC/OS-II. The uC/OS-II is a famous Real-Time Operating System (RTOS) from Micrium [16]. It has the following advantages for us:

• The source code of the uC/OS-II is available and can be used for academic purposes without requiring a license [17].
• It is simpler than Linux, and there are sufficient materials (books, online resources) to help understand how it works.
• We had lectures at school on the uC/OS-II and already had programming experience with it.
• There were people who had successfully ported the uC/OS-II to the OpenRISC CPU before, whose work can be taken as a reference.
• The uC/OS-II is an RTOS. It can better serve applications with real-time constraints. This suits our needs because the computing platform of the thesis project mainly aims at embedded applications.

In Chapter 4, where the OpenRISC CPU is discussed, a section is reserved for the details of porting the uC/OS-II RTOS to the OpenRISC.

3.3.3 uC/TCP-IP Protocol Stack

Many possibilities open up if a platform provides networking. On
62. OpenRISC 0 2888. Last visit: 2011-01-31.
[6] Website: Beyond Semiconductor. http://www.beyondsemi.com. Last visit: 2011-01-31.
[7] Website: EMBECOSM. http://www.embecosm.com. Last visit: 2011-01-31.
[8] Website: Dynalith Systems. http://www.dynalith.com. Last visit: 2011-01-31.
[9] User Manual: OpenRISC 1000 Architecture Manual. OpenCores Organization, Revision 1.3, April 5, 2006.
[10] Damjan Lampret. OpenRISC 1200 IP Core Specification. Revision 0.10, November 2010.
[11] Jeremy Bennett, Julius Baxter. OpenRISC Supplementary Programmer's Reference Manual. Revision 0.2.1, November 23, 2010.
[12] Webpage: GNU Toolchain for OpenRISC. http://opencores.org/openrisc,gnu_toolchain. Last visit: 2011-01-31.
[13] Jeremy Bennett. The OpenCores OpenRISC 1000 Simulator and Tool Chain Installation Guide. Application Note 2, Issue 3, November 2008.
[14] Lin Zuo. System-on-Chip design with Open Cores. Master Thesis, Royal Institute of Technology (KTH), ENEA, Sweden, 2008. Document Number: KTH ICT ECS 2008 112.
[15] Jean J. Labrosse. MicroC/OS-II: The Real-Time Kernel, 2nd Edition. Newnes, June 15, 2002. ISBN 978-1578201037.
[16] Webpage: The online page of the thesis and project archive. http://www.olivercamel.com/post/master_thesis.html. Last visit: 2011-01-31.

Chapter 5 WISHBONE Specification and CONMAX IP Core

In this chapter the WISHBONE specification and the CONMAX IP core will be introduced. Because they are very important in the system, a separate chap
63. [Figure B.1: Top level entity of the project. Editor screenshot of the VHDL port list, showing the SRAM interface ports (sram_a, sram_dq, sram_adsc_n, sram_adsp_n, sram_adv_n, sram_be_n, sram_ce1_n, sram_ce2, sram_ce3_n, sram_clk, sram_dpa, sram_gw_n, sram_oe_n, sram_we_n) and the beginning of the SDRAM interface ports.]

our ihex2mif tool under the folder tools/ihex2mif. In this way we can run any program as long as it is smaller than 64KB. However, if the program grows beyond the limit, the bootloader is needed anyway to load the program into the external 2MB SSRAM, which is large enough for many embedded programs. If the default ram0.mif has been modified and you want the bootloader back, rename the file tools/program_loader/server_openrisc/proloader_server.mif to ram0.mif and copy it back to the ram folder.

P.S. When only the MIF file is updated, without other hardware modifications, recompiling the whole FPGA project is not necessary. Just choose to update the MIF and run the assembler to generate a new SOF file again (short key in Q
64. SC system
6.7 Logic to activate a US…
6.8 32-bit address configuration for Memory Controller
6.9 WM8731 Interface in the OpenRISC system
6.10 WM8731 Interface internal structure
6.11 32-bit music data format
B.1 Top level entity of the project
B.2 Program ALTERA FPGA
B.3 Software project
B.4 Makefile script
B.5 Build software project
B.6 Check USB to serial port ID
B.7 Downloading and flushing finished
B.8 IP configuration
B.9 Start software project via bootloader
B.10 Ethernet connected
B.11 Decode MP3 file
B.12 Set volume and play the music

List of Tables
3.1 Summary of addresses
4.1 OR1200 GPRs (part)
4.2 Exception types and causal conditions
6.1 Parameters of 1-port RAM IP core
6.2 Data organization for 32-bit ports
6.3 Memory Controller register configurations
6.4 System performance test results
6.5 NIOS II processor comparison (part
65. They are allowed to be set to different numbers, but usually both are set to 32 bit. The rf_addr is a 4-bit argument. It defines the base address of the Register File. The pri_selN is a group of 16 arguments, each 2 bits wide, from pri_sel0 to pri_sel15. Each of them corresponds to one slave interface. The pri_selN specifies how many priority levels are supported. The 16 slaves can be set to support different priority levels if necessary. The CONMAX uses the highest 4 bits of the address to decide which one of the 16 slaves is accessed. For example, if the address width is 32 bit, a bus transaction accessing the addresses from 0xB0000000 to 0xBFFFFFFF will be sent to slave 11, regardless of which master the transaction comes from. This also implies that once a slave is connected to the CONMAX its address range is determined, e.g. the slave attached to slave port 8 will have the address range from 0x80000000 to 0x8FFFFFFF. The only exception is slave interface 15, because it includes the Register File, whose base address is set by the rf_addr. If the second highest 4 bits of the address match the rf_addr, the Register File is selected. For example, if the 4-bit rf_addr is configured as 0101, when writing the number 0x12345678 to the address 0xF500000C, the value 0x5678 will be written into the register for slave interface 3 (the last C is the address of register 3), because the h
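The decoding rule above can be written down as a small C model. It is a sketch with hypothetical function names, not CONMAX source code; the register index calculation assumes word-addressed registers, which matches the statement that offset 0xC addresses register 3.

```c
#include <stdint.h>

/* Slave port (0..15) selected by the highest 4 address bits. */
unsigned conmax_slave_port(uint32_t addr)
{
    return (unsigned)(addr >> 28);
}

/* Register File hit: slave port 15 and address bits [27:24] equal to rf_addr. */
int conmax_rf_selected(uint32_t addr, unsigned rf_addr)
{
    return conmax_slave_port(addr) == 15u &&
           ((addr >> 24) & 0xFu) == (rf_addr & 0xFu);
}

/* Assumption: one register per 32-bit word, so byte offset 0xC is register 3. */
unsigned conmax_rf_register(uint32_t addr)
{
    return (unsigned)((addr >> 2) & 0xFu);
}
```

For the example in the text, address 0xF500000C with rf_addr 0101 selects the Register File and register index 3, while any address in 0xB0000000 to 0xBFFFFFFF maps to slave port 11.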
66. U buses. This can be done either with off-line simulation (e.g. the Quartus built-in waveform simulator or ModelSim) or with in-system debugging (e.g. the Quartus SignalTap). The FPGA tools inspect the waveforms on the CPU buses. With the waveforms we can further analyze which instruction the CPU is executing or what address the CPU is accessing. This method is mostly used for examining combined software/hardware issues, and it is sometimes difficult to set up the trigger conditions for the signal sampling. This concludes the introduction of the software development workflow.

3.3.2 Operating System and uC/OS-II RTOS

An Operating System (OS) is an important program that runs on a CPU, manages hardware resources and provides common services for efficient execution of various software applications [15]. With an operating system, the performance of a computing platform will be greatly improved. Most existing OSs can benefit a platform in the aspects listed below:

• Support multitasking and schedule the tasks.
• Manage hardware resources, e.g. provide drivers for popular hardware devices.
• Simplify application development through API libraries and service functions.
• Standardize software development.
• Include useful middleware packages like a TCP/IP stack, command line console, file systems, etc.

Because of these advantages of an operating system, when planning the thesis it was determined that we were going to port an existing
67. a Xilinx FPGA board. In these cases I will have to say: I wish I could help, but... The reason is mainly that those boards probably do not have the audio CODEC WM8731 to play music or the DM9000A to support an Ethernet connection. This makes porting the project too difficult or even impossible.

B.1.2 A RS232 to USB Cable

A RS232-to-USB cable sets up a UART connection between a PC and the DE2-70 board, so it is possible to communicate with the OpenRISC processor using serial terminal software. Most PCs do not have RS232 ports nowadays. This is why we need a RS232-to-USB cable. The cable is easy to find. For example, just go to www.amazon.com and search for "RS232 USB". Another important reason to have a UART connection is the bootloader. Because we don't have a programmer or debugger, a bootloader was designed to download the program data to the DE2-70's SSRAM/SDRAM. We enclosed a terminal software tool in the thesis project zip file under the folder tools/uart_terminal. Terasic's control panel is another option, but it is not as handy as our bootloader.

B.1.3 An Ethernet Cable

An Ethernet cable [2] connects the DE2-70 to the PC's network adapter, so that UDP connections can be created to transfer music data.

B.1.4 A Speaker or an Earphone

To hear the music, an earphone or a speaker is needed.

B.1.5 A PC

Of course we need a PC. Our PC for the thesis project has 3
68. ages. It is always much smaller than the external RAMs because of its higher cost. In the thesis project we have only 32KB of on-chip RAM, but externally 2MB of SSRAM and 64MB of SDRAM. If the program data is over 32KB, it has to be stored in the external RAMs. The external memories will be discussed in later sections.

6.1.2 ALTERA 1-Port RAM IP Core and its Parameters

For the ALTERA 1-port RAM core itself there is not much to talk about, because the core is quite simple and straightforward to use. Like all other ALTERA IP cores, the 1-port RAM core is included in and installed together with the Quartus II software package. It can be launched from Quartus II > Tools > MegaWizard Plug-In Manager and then unfolding the Memory Compiler category. After the wizard is started, Quartus will ask for some parameters of the RAM to be implemented. Thanks to the plentiful explanations in the wizard, it shouldn't be hard to understand what the parameters are about. There is also a user guide document covering more details about this IP core [2]. In the following table all chosen parameters are listed. The table should be helpful when readers want to recreate the same RAM block.

Data Width (q): 32 bit
How Many 32-bit Words: 8192 (8192 × 32 bit = 32KB)
Memory Type: auto
Clocking Method: single clock
Extra Functions and Pins: register output port q; create a byte enable; create an aclr
Memory Initial
69. amed ADR_O which outputs an address, there should be an ADR_I in the slave which receives the output. To simplify, only the signals from the master side are introduced below. (Footnote: The "I" in CLK_I stands for an input port. It is an output if the port is named XXX_O.)

The WISHBONE standard describes a lot of signals and sorts them in ascending order. This is not easy for beginners to understand. In this thesis the signals are divided into 6 groups based on how frequently they are used, and each group is marked as basic or extended. When designing an IP core, not all signals specified in the standard have to be implemented, only the signals in the basic groups. So some WISHBONE compliant IP cores may have only the minimum basic signals, while some others may have more extended signals to support advanced WISHBONE functions like block read/write, burst, etc. Here are the 6 groups of the WISHBONE signals. Only groups 1 and 2 are the basic signals that every WISHBONE compliant IP core has to implement.

Group 1
CLK_I: The system clock
RST_I: The system reset

Clock and reset are the most basic signals that all WISHBONE interfaces should have. They are the only two signals that are always inputs, whether on the master side or the slave side. All of the CLK_I and all of the RST_I are connected together in a WISHBONE network, so all IP cores in the network share the same clock source and reset
70. an also be used to evaluate the software execution time. Starting the timer before a function call and stopping it afterwards gives the running time of the function. My partner Lin took this approach to compare the performance of the OR1200 based platform and the ALTERA NIOS II based platform. The details are documented in his thesis, Chapter 8 [14]. The structure of the tick timer is simple, as presented in Figure 4.4.

[Figure 4.4: Tick timer structure and registers]

There are 2 SPRs controlling the tick timer, the TTMR and the TTCR. The TTMR sets up the timer operations, and the TTCR holds the counted value. The value stored in the TTCR is incremented by 1 on every rising edge of the input clock, provided the timer is enabled. Bits 27-0 of the TTMR contain a user defined value which is continuously compared to bits 27-0 of the TTCR. If a match happens, the TTCR can restart counting from 0, keep counting regardless of the match, or stop. The behavior of the TTCR depends on the timer working mode. Bits 31 and 30 of the TTMR select 3 different timer working modes. If both bits are 0, the tick timer is disabled. Bit 29 of the TTMR enables the timer interrupt. If it is set, on every match between the TTMR and the TTCR an interrupt is generated and the interrupt pending bit (bit 28) is set to 1. The users need
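The TTMR layout described above can be captured with a few bit masks. The sketch below only composes and compares register values on the host; the names are hypothetical, and the numeric mode encodings (01 restart, 10 stop, 11 continue) are an assumption to be checked against the OpenRISC 1000 Architecture Manual. Writing the value into the SPR itself is not shown.

```c
#include <stdint.h>

#define TTMR_TP_MASK 0x0FFFFFFFu   /* bits 27-0: user defined compare value */
#define TTMR_IP      (1u << 28)    /* bit 28: interrupt pending */
#define TTMR_IE      (1u << 29)    /* bit 29: interrupt enable */

/* Bits 31-30: working mode. Encodings other than 0 are assumed, see lead-in. */
enum ttmr_mode {
    TTMR_MODE_DISABLED   = 0,
    TTMR_MODE_RESTART    = 1,  /* TTCR restarts from 0 on a match */
    TTMR_MODE_STOP       = 2,  /* TTCR stops on a match */
    TTMR_MODE_CONTINUOUS = 3   /* TTCR keeps counting regardless of the match */
};

/* Compose a TTMR value from mode, interrupt enable flag and compare value. */
uint32_t ttmr_make(enum ttmr_mode mode, int irq_enable, uint32_t period)
{
    return ((uint32_t)mode << 30) |
           (irq_enable ? TTMR_IE : 0u) |
           (period & TTMR_TP_MASK);
}

/* Match condition: bits 27-0 of TTMR equal bits 27-0 of TTCR. */
int ttmr_period_match(uint32_t ttmr, uint32_t ttcr)
{
    return ((ttmr ^ ttcr) & TTMR_TP_MASK) == 0u;
}
```

For instance, a periodic tick every 500000 clock cycles with the interrupt enabled would use ttmr_make(TTMR_MODE_RESTART, 1, 500000).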
71. an example. In the first structure the system has to do memory padding, because after allocating the first char the next integer must be 4-byte aligned. In the second structure the padding is not needed. Comparing the 2nd to the 1st, the 2 structures have the same members, but the first one wastes 4 bytes in total due to the memory padding. Please refer to the OpenRISC 1000 Architecture Manual [4], Chapter 16.1.2, for more information on this topic.

6.1.6 Miscellaneous

Two more settings of the ALTERA 1-port RAM core are described in this section. We enabled the asynchronous reset (aclr) signal of the 1-port RAM core for interfacing with the WISHBONE RST signal. Note that when the 1-port RAM core is reset, only the output register is cleared. The contents of the memory block remain as before. The registered output q is also enabled in the 1-port RAM core, because this is the prerequisite for adding the asynchronous reset signal. The registered output delays the result by 1 clock cycle, but makes the system more stable because both input and output signals are synchronized to the system clock.

So far we have introduced the on-chip RAM module of the hardware platform, including the ALTERA 1-port RAM core and its parameters, the interface logic connecting the 1-port RAM core to the WISHBONE bus, and some specific concerns like address connection shifting, memory alignment and memory padding.

6.2 Memory C
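The effect of member ordering on padding can be observed with sizeof and offsetof. The two structures below are hypothetical stand-ins, not the ones from the thesis figure, and the sizes mentioned in the comments assume a typical ABI with 4-byte int and 2-byte short.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout with poorly ordered members. */
struct sample_padded {
    char  tag;     /* 1 byte, typically followed by 3 padding bytes */
    int   value;   /* must start on an aligned boundary */
    short count;   /* 2 bytes, typically followed by 2 trailing padding bytes */
};

/* The same members, ordered from largest to smallest. */
struct sample_reordered {
    int   value;
    short count;
    char  tag;     /* typically followed by 1 trailing padding byte */
};

/* Print the layouts chosen by the compiler in use. */
void print_layouts(void)
{
    printf("padded:    size %zu, value at offset %zu\n",
           sizeof(struct sample_padded),
           offsetof(struct sample_padded, value));
    printf("reordered: size %zu, value at offset %zu\n",
           sizeof(struct sample_reordered),
           offsetof(struct sample_reordered, value));
}
```

On such an ABI the first structure usually occupies 12 bytes and the second 8, so simply reordering the members saves 4 bytes per instance.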
72. ance with Harvard architecture
4.3 16-bit SPR address format
4.4 Tick timer structure and registers
4.5 PIC structure
5.1 Interconnection is important in multiprocessor systems
5.2 Overview of the WISHBONE
5.3 An example of WISHBONE signal connections
5.4 An example of WISHBONE single transactions
5.5 An example of a WISHBONE block write transaction
5.6 Block transactions are helpful in multi-master systems
5.7 Master B is blocked by master A
5.8 An example of a WISHBONE RMW transaction
5.9 Maximum throughput with single transactions
5.10 Maximum throughput with burst transactions
5.11 An example of a constant writing burst transaction
5.12 An example of an incrementing reading burst transaction
5.13 A burst transaction with wait states
5.14 Core architecture overview
5.15 An example of using CONMAX
6.1 Hardware platform architecture
6.2 On-chip RAM module internal structure
6.3 Overview of data organization in OpenRISC systems
6.4 Legal and illegal memory accesses
6.5 Example of memory padding
6.6 Memory Controller in the OpenRI
73. ang Li (me) and my partner Lin Zuo at the ENEA Malmö/Lund branch, Sweden. Most of the implementation was done from January to July 2008. For administrative reasons we had to write 2 separate theses, which have similar structures but different focuses based on our responsibilities. For Lin Zuo's thesis please refer to [6]. The theses and the archived project files are available at this link [7].

1.2 Thesis Objectives

The thesis is to implement a computing platform with open cores. However, this is a very broad topic. After discussion with and approval by the supervisors, we elaborated and refined the thesis into detailed tasks. In this section the tasks of the thesis are summarized, including both those we achieved and those we could not complete due to the time limitation.

The original thesis announcement made by Johan Jörgensen is copied as Appendix A, from which we can get an overall idea of the initial purposes of the thesis:

1. Evaluate quality, difficulty of use and the feasibility of open source IPs.
2. Design the system in an FPGA and also evaluate the system performance.
3. Investigate license issues and their impact on commercial use of open source IP.
4. Port embedded Linux to the system.

Because it was hard to foresee how much work could be completed within the limited time, after discussion we defined the thesis tasks in 3 different levels (Level One, Two, Three). Below is the list of the tasks. The responsibility is marked in t
74. at a read followed by a write. If we redefined the rules of the WISHBONE specification to allow block transactions to include any type and any number of operations, the RMW would become a subcategory of the block transaction.

5.2.3.4 Burst Transaction

Throughput is always an important criterion for evaluating the performance of an interconnection architecture. Higher throughput means that a larger amount of data can be transferred in a given time period or, when the bus width is given, that as many read/write operations as possible are finished. The WISHBONE interconnection tries to achieve a good throughput too. This is why the specification spends the whole of chapter 4 describing registered feedback bus cycles, i.e. burst transactions.

The burst transactions are one of the four types of WISHBONE transactions and are different from the block transactions. In principle the block transactions do not increase the throughput of a system. Sometimes they may even ruin the performance, if a master holds a line too long but does not transfer data. The burst transactions, however, do improve the throughput through a set of carefully defined schemes.

The main idea of the scheme is to inform the slaves in advance that they are going to be addressed again and again within a bus transaction, so that they are prepared to respond continuously. At the same time, the masters can initiate operations one after another without waiting for the responses from the sl
75. at the same time. Sometimes an IP core may have more than one reset input; in such cases normally all reset inputs should be connected together so that only one reset signal drives the whole system.

The WISHBONE is a synchronous interconnection standard, which means all IP cores in the system examine inputs as well as change outputs at each clock rising edge. This is clearly stated in the specification when describing the WISHBONE features and objectives. On page 9, one of the features is described as "Synchronous design assures portability, simplicity and ease of use". And on page 12, the last several objectives are about synchronization, like "to create a synchronous protocol to insure ease of use, good reliability and easy testing. Furthermore, all transactions can be coordinated by a single clock", etc. [2]

Group 2

STB_O: When STB_O = 1, either a read or a write operation is ongoing. Meanwhile all data signals like the ADR, DAT and WE etc. are valid.
CYC_O: CYC_O stays high during the whole bus transaction. Besides indicating bus transactions, it is also used to request grants from bus arbiters when multiple masters are accessing one slave at the same time.
ACK_I: Acknowledgements from the slaves. When a read or write operation is finished, the slave informs the master by giving a one-clock-cycle ACK back. When a master finds ACK = 1 on a clock rising edg
76. ata caches and MMUs in the thesis project. (Footnote: This is because we wanted to have an easy start by simplifying the system as much as possible. Later on, when we would have liked to enable the caches and the MMUs, sadly there was not enough time left.) The debug unit and the power management unit were implemented but not used. Only the tick timer and the PIC were tested in hardware, and we understand completely how they work. So in the later sections some text will be spent on the TT and the PIC, but before that the OR1200 registers and the exceptions are introduced.

4.3 OR1200 Registers

There are 2 types of registers in the OR1200: the General Purpose Registers (GPRs) and the Special Purpose Registers (SPRs).

The OR1200 has 32 GPRs, all of them 32 bits wide. These registers can be accessed directly by the software with the names r0 to r31 in assembly code. For example, the following assembly code adds 128 to the data stored in the register r1 and then uses the value as the target address to save the value of the register r9. Because r1 is usually used as a stack pointer, this line actually pushes r9 into the stack with an offset of 128.

l.sw 128(r1), r9

When writing in higher level languages like C or C++, the compiler manages the usage of the GPRs. In this case the GPRs are transparent to the programmers.

A funny fact is that although the registers are called general purpose, they are assigne
77. aves, because the slave is assumed to know that the data is sent continuously and to be able to handle that.

For the single or block transactions, normally after initiating operations the masters have to stay and hold signals until an ACK is fed back. In the best case, communicating without any delay, the waveform looks like Figure 5.9.

Figure 5.9: Maximum throughput with single transactions

In the figure, firstly the STB is asserted to start an operation. Then the slave replies as soon as possible, on the next clock rising edge, and gives a valid ACK back. After the ACK is received, the master sends the next operation immediately. As we can see, in this scenario each operation takes 2 bus cycles to finish. This means we can get 50% bus utilization in the best case.

To get a still higher throughput, the burst transactions are used. A demo waveform is shown in Figure 5.10. In the figure the master starts a request by asserting the STB at the 3rd clock rising edge. Meanwhile it somehow tells the slave that this is a burst transaction. At the 4th clock rising edge the slave receives the message and gets prepared to handle the burst. By giving back a one-cycle ACK, the slave indicates that it is ready for one more read/write operation. The master sees the ACK at the 5th edge and continues. Then another 3 operations are done from the 5th to the 7th clock cycles.

Figure 5.10: Maximum throughput with burst transactions
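The 50% figure for single transactions can be reproduced with a little arithmetic. The sketch below is a simplified model, not taken from the WISHBONE specification: the function names are hypothetical, wait states are ignored, and the burst case assumes exactly one extra cycle at the start of the burst followed by one operation per cycle.

```c
#include <stdint.h>

/* Single transactions without wait states: every operation needs one
 * cycle to assert STB and one cycle to receive the ACK. */
static inline uint32_t single_cycles(uint32_t n_ops)
{
    return 2u * n_ops;
}

/* Burst transaction, simplified assumption of this sketch: one extra
 * cycle at the start of the burst, then one operation per cycle. */
static inline uint32_t burst_cycles(uint32_t n_ops)
{
    return n_ops ? n_ops + 1u : 0u;
}

/* Bus utilization in percent: completed operations per bus cycle. */
static inline uint32_t utilization_percent(uint32_t n_ops, uint32_t cycles)
{
    return cycles ? (100u * n_ops) / cycles : 0u;
}
```

Under these assumptions 4 single operations take 8 cycles (50% utilization), while a 4-operation burst takes 5 cycles (80%), and the utilization approaches 100% as the burst gets longer.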
78. ay not understand the design details of the open core and how to adapt it. In such cases a good market emerges for selling services, especially for those consulting companies which have a good background in providing services. In fact, almost all companies who provide IP core products also provide support services at the same time. Just think about Linux versus Windows.

The third benefit is that developing open source products may give a push to the sales of related physical products. The GNU philosophy makes the knowledge free to get, but after all the source code has to run on some physical thing. If the software is free, we can earn that part back by selling hardware products. So silicon companies might like open cores like the LEON processor if they can increase the sales of the chips. Or an embedded company may be happy to develop open source software for its hardware system, because this won't affect the sales of products like digital cameras, mobile phones or PDAs.

2.6 Utilizing Open Cores or Not: Pros and Cons

In this section we try to answer the other question: if it is not suggested to develop open cores for sale, how about utilizing the existing ones? Is it a good idea or not? Please note the difference. To develop open cores means to design an open core from scratch for sale, while utilizing open cores means to reuse existing open cores in new products. To answer t
79. be set to different priorities by writing numbers from 0 to 3 into the Register File. The priorities are 3 > 2 > 1 > 0. The masters with higher priorities always win the arbitration, but the masters with the same priority are still arbitrated in the round-robin way.

To support multiple levels of priorities, the value of the pri_selN has to be correctly configured. When the pri_selN is set to 00, only 1 priority level is supported. Writing the numbers 1, 2, 3 into the registers has no effect; all masters are treated as having the same priority.

When the pri_selN is set to 01, the CONMAX supports 2 priority levels. Now it is allowed to write 00 or 01 into the registers. The masters with 01 have higher priorities than those with 00. If the users somehow write 11 or 10 into the register, the sequence will be 11 = 01 > 10 = 00, because when the pri_selN is 01 the arbiter only judges the last bit in the Register File for the masters.

7. When the pri_selN is 10, the CONMAX supports 4 levels. In this case all priorities from 0 to 3 are valid, and 11 > 10 > 01 > 00.
8. If the pri_selN is configured to 11, it is just like setting the pri_selN to Ot.
9. When a master gets a grant, there is no way to interrupt it unless the master gives it up by de-asserting the CYC. Even if hi
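The effect of the pri_selN setting on the value written into the Register File can be summarized in a few lines of C. This is only a behavioural sketch of the rules described in this excerpt, not the CONMAX source code; the function name is hypothetical, and the pri_selN = 11 case is deliberately left unmodelled.

```c
#include <stdint.h>

/* Effective priority of a master, given the 2-bit pri_selN setting and
 * the 2-bit value written for that master in the Register File.
 * Returns -1 for pri_selN = 11, which this sketch does not model. */
static inline int effective_priority(uint8_t pri_sel, uint8_t reg_value)
{
    switch (pri_sel & 0x3u) {
    case 0x0: return 0;                  /* 1 level: the value is ignored   */
    case 0x1: return reg_value & 0x1u;   /* 2 levels: only the last bit     */
    case 0x2: return reg_value & 0x3u;   /* 4 levels: both bits are used    */
    default:  return -1;                 /* pri_selN = 11: not modelled     */
    }
}
```

With pri_selN = 01, the values 11 and 01 both map to priority 1 and the values 10 and 00 both map to priority 0, which is the ordering 11 = 01 > 10 = 00 given in the text.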
80. codes from being incorrectly attributed. For example, if someone gets some BSD-licensed source code which was made by a famous expert, he/she can improve the code and redistribute or sell it, but cannot use the expert's name in the advertisement without permission.

Under the 3 clauses there is a long disclaimer paragraph in the BSD license. According to it, the source code covered by the BSD license basically provides no warranties to the users. This is good for the authors, since they promise nothing through the license. And this is fair, because most authors give the source code away for free when they choose the BSD license, so of course they should not be responsible for any liability. But the users may take risks when working with source code under the BSD license.

Now let's evaluate the BSD license in a business world.

First of all, the license says nothing regarding money. This means that technically people can develop source code under the BSD license and sell it at any price they want, although this is not the usual case: someone who would like to make money by selling products would rather choose a commercial license stricter than the BSD license, one which prohibits making copies freely.

Another interesting point is that the BSD license sets no limitations on keeping the license after redistributing. This is different from the GNU licenses. If the open source codes are the
81. coding algorithm.

On the DE2-70 side, after the hardware is initialized, the music player keeps checking the uC/TCP-IP stack and the UART port. When a new UDP packet is received, the music data is buffered in the external 64MB SDRAM. When a PLAY command arrives through the serial connection, the music player copies the music data from the SDRAM and forwards it to the Audio CODEC. The Audio CODEC converts the music data to analog signals, which can be further amplified by a speaker etc.

The demo application is included in the thesis archive file [5]. It can be reproduced on any DE2-70 board. In Appendix B, detailed step-by-step instructions are given for interested readers who want to try out the demo MP3 player.

Footnote 1: At the beginning, the libmad was planned to be integrated into the music player on the DE2-70 board, so that less data would have to be sent over the Ethernet. But because support for some common C library functions like malloc was missing for the OpenRISC, the libmad had to be moved to the PC side.

3.5 Summary

Above all, we have introduced the computing platform from 4 different layers. To make a summary for this chapter as well as for the computing platform, a feature list is given below:

- General purpose and multi-functional embedded platform
- Low cost by using open cores and open source software
- Most source codes and design details are free, except for the ALTERA's built-in IPs lik
82. cussion on how to duplicate Cygwin environment http cygwin com ml cygwin 2008 04 msg00100 htm1 Last visit 2011 01 31 5 Webpage GNU Toolchain for OpenRISC http opencores org openrisc gnu toolchain Last visit 2011 01 31 6 Webpage The online page of the thesis and project archive http www olivercamel com post master_thesis html Last visit 2011 01 31 7 Website underbit technologies http www underbit com products mad Last visit 2011 01 31 This is where to find the information of the LibMAD 8 Website WireShark http www wireshark org Last visit 2011 01 31 Wireshark is the world s foremost network protocol analyzer
83. MASTER THESIS

Open Core Platform based on OpenRISC Processor and DE2-70 Board

Xiang LI

Company: ENEA
University: Royal Institute of Technology, School of Information and Communication Technology, Stockholm, Sweden
Industry Supervisor: Johan Jorgensen
KTH Supervisor & Examiner: Ingo Sander
Master Thesis Number: TRITA-ICT-EX-2011:62

Abstract

The trend of IP core reuse has been accelerating for years because of the increasing complexity of System on Chip (SoC) designs. As a result, many IP cores of different types have been produced. Meanwhile, similar to the free software movement, an open core community has emerged because some designers choose to share their IP cores under open source licenses. The open cores are growing fast due to their inherently attractive properties, like an accessible internal structure and usually no license cost.

Under this background the master thesis was proposed by the company ENEA (Malmö/Lund branch, Sweden). It intended to evaluate the quality of the open cores as well as the difficulty and the feasibility of building an embedded platform by exclusively using open cores.

We contributed such an open core platform. It includes 5 open cores from the OpenCores organization: the OpenRISC OR1200 processor, the CONMAX WISHBONE interconnection IP core, the Memory Controller IP core, the UART16550 an
84. d General Purpose IOs (GPIO) IP core. More than that, we added support for the DM9000A and WM8731 ICs for Ethernet and audio features. On the software side, the uC/OS-II RTOS and the uC/TCP-IP stack have been ported to the platform. The OpenRISC toolchain for software development was tested. And an MP3 music player application has been created to demonstrate the system.

The open core platform is targeted at Terasic's DE2-70 board with an ALTERA Cyclone II FPGA. It aims to have high flexibility for a wide range of embedded applications and at the same time very low cost.

The design of the thesis project is fully open and available online. We hope our work can be useful in the future as a starting point or a reference, both for academic research and for commercial purposes.

Keywords: SoC, OpenCores, OpenRISC, WISHBONE, DE2-70, uC/OS-II

Acknowledgement

This master thesis was started in January 2008. It took me and my partner Lin Zuo more than 6 months to implement the project. There were lots of difficulties when solving the technical issues, but the writing of the thesis was an even greater challenge than any I had faced before. Finally the writing was completed in January 2011. During this time many people generously gave me considerable help. Without their support the thesis wouldn't have come this far. So hereby I'd like to take the opportunity to express my deepest gratitude to the following people:

Johan J rg
85. d as a result the GPLed library may be gradually forgotten by people. To solve the problem, a compromise license, i.e. the LGPL, was designed and used particularly for libraries. Soon the FSF realized that 1) the LGPL can be used not only for libraries but for many other kinds of software as well, and 2) the choice between the GPL and the LGPL by authors is a development strategy [24] and does not only depend on whether the work is a library or not. So now the LGPL is renamed to GNU Lesser General Public License.

The LGPL is a set of additional permissions added to the GPL. By those permissions the LGPL removes the last of the four obligations described above. This is the only difference between the LGPL and the GPL; the other three obligations of the GPL still remain.

In section 0, the LGPL defines several more terms than the GPL:

"The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library.
A "Combined Work" is a work produced by combining or linking an Application with the Library.

In section 4, Combined Works, the LGPL says: "You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combin
86. d by the bootloader. Also you will get a myPrj.dis. This is a disassembly file which shows all instructions of the project. It is very helpful to check the disassembly file and understand what your software is actually doing.

Now let's start the bootloader and download the software to the DE2-70 board. The bootloader is comprised of 2 parts: a server that is already running on the FPGA with the OpenRISC processor, and a client that will now be started to send the HEX data file from the PC.

The executable proloader_client.exe in the build folder is the bootloader client that we are talking about. It was compiled by Cygwin GCC and thus can only run in Cygwin. Run the following command under the build folder:

proloader_client.exe -d /dev/com5 -f myPrj.ihex -p

[Screenshot: the project Makefile (targets all and myPrj, OR32_PREFIX tool definitions for gcc, ld, objdump and objcopy, CFLAGS, and the INCLUDE and VPATH entries for Lib_orpXL and the uC/OS-II and uC/TCP-IP Source and Port folders) next to a Cygwin terminal running make]
87. d speed. In a little more detail, I guess those open cores that have a simple structure, like a UART, a mouse controller or something similar that is easy to design and implement, will get a better chance to become popular. They are easy to develop for small teams or even individual engineers, and it is easy to achieve good quality. As they are free and good enough, why not use them to accelerate new system designs? On the contrary, the complex cores like processors will be held tightly by big companies for a long time yet. This is because complicated cores bring much more profit than simple ones. If all computers used open core processors instead of the products from Intel or ARM, this would drive those companies crazy.

So my prediction for the future of the open cores would be: the open cores will grow, but not too fast. And simple cores will get a better chance to become popular.

2.8 Conclusion

In this chapter we talked about open cores from a commercial perspective.

Firstly we talked about some basic concepts of the open cores. The open cores are the IP cores whose source code is covered by open source licenses, for example the BSD license and the GNU licenses.

Then we introduced the BSD license, the GPL and the LGPL in detail, and pointed out that there is no problem in using open cores covered by the BSD license or the LGPL in commercial products. But the GPL is not suggested, mainly because it will infect o
88. d with special roles by the compiler. This happens not only in the OR1200 but in many other CPUs as well.

Table 4.1 is partly copied from the OpenRISC 1000 Architectural Manual [9], page 334. It lists the usage of some GPRs. The value of r0 is always fixed to 0. r1 and r2 are the stack pointer (SP) and the frame pointer (FP), which point to the top and bottom of the stack. r3 to r8 are used to pass parameters during function calls. If a function has more than 6 parameters, the extra parameters have to be stored on the stack. r9 is the return address and r11 stores the returned data.

Register | Preserved across function calls | Usage
r13 |     | Temporary register
r12 |     | RVH: Return value high 32 bits of 64-bit value on 32-bit system
r11 |     | RV: Return value
r10 |     | Callee-saved register
r9  |     | LR: Link address register
r8  |     | Function parameter number 5
r7  |     | Function parameter number 4
r6  |     | Function parameter number 3
r5  |     | Function parameter number 2
r4  |     | Function parameter number 1
r3  | No  | Function parameter number 0
r2  | Yes | FP: Frame pointer
r1  | Yes | SP: Stack pointer

Table 4.1: OR1200 GPRs (part)

All OR1200 SPRs are listed in Section 4.3 of the Architectural Manual [9]. All of them use 16-bit addresses in the format shown in Figure 4.3. Bits 15-11 are the group index and bits 10-0 are the register address.

Footnote: OpenRISC 1000 instructions are described in Chapter 5 of the OpenRISC 1000 Architectural Manual [9].
89. demonstrates a simple example. The CPU wants to write a byte 0xBB to the address 0xB. It outputs 32-bit data onto the data bus while setting bit 3 of the SEL to high. During the transmission the address 0xB is converted to 0x2 because of the address shifting. When the memory block sees the bus transaction, the SEL lines are checked. The highest byte, marked by SEL3, is accepted and put into bits 31-24 at the physical address 0x2 of the memory block.

Figure 6.3: Overview of data organization in OpenRISC systems (the OpenRISC OR1200 CPU drives address 0xB, data 0xBB000000, SEL 1000 and a write action onto the bus; the ALTERA 1-port RAM IP core and its interface logic store the byte at word address 0x2)

The byteena signal of the ALTERA 1-port RAM core is the perfect option to handle the SEL input. Although the names are different, Byte Enable and Active Select lines are in fact the same thing. That is why we turned on the byteena signal when parameterizing the 1-port RAM core and fed the SEL inputs directly to the byteena port, as shown in Figure 6.2.

As the conclusion to this section: because we need to be able to efficiently read/write 32-bit data for the 32-bit OpenRISC processor, the data width of the memory block must be 32 bits. To reconcile the mismatch between the CPU's 8-bit granularity and the memor
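The address shifting and the SEL handling in the byte write example can be expressed as a short C sketch. The structure and function names are hypothetical, and the mapping of the two lowest address bits to a byte lane simply mirrors the example in the text (address 0xB selects SEL3 and bits 31-24); it is an assumption of this sketch rather than a statement about the OR1200 hardware.

```c
#include <stdint.h>

/* What appears on the bus for a single byte write (hypothetical type). */
struct wb_byte_write {
    uint32_t word_addr;  /* address after shifting, seen by the RAM     */
    uint8_t  sel;        /* 4-bit SEL / byteena pattern                 */
    uint32_t data;       /* 32-bit data word with the byte in its lane  */
};

/* Derive the word address, SEL pattern and data word for writing one
 * byte. Lane numbering follows the example in the text. */
static inline struct wb_byte_write make_byte_write(uint32_t byte_addr,
                                                   uint8_t value)
{
    struct wb_byte_write w;
    unsigned lane = byte_addr & 0x3u;         /* lowest 2 address bits   */

    w.word_addr = byte_addr >> 2;             /* drop the 2 lowest bits  */
    w.sel       = (uint8_t)(1u << lane);      /* one-hot byte select     */
    w.data      = (uint32_t)value << (8u * lane);
    return w;
}
```

For the example above, make_byte_write(0xB, 0xBB) gives the word address 0x2, the SEL pattern 1000 and the data word 0xBB000000.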
90. do context switches. We will come back to this later, in the section where porting the uC/OS-II to the OR1200 is discussed.

To serve different types of exceptions, the programmers need to write the Interrupt Service Routines (ISRs). More importantly, the ISRs have to be placed exactly at the correct entry addresses. Locating a piece of program at a specified physical address is done by the linker. Also note that the addresses from 0x0 to 0x2000 are all reserved for the interrupt vector table. Although the addresses from 0xF00 to 0x2000 are not defined yet, for compatibility reasons it is suggested to link the user programs starting from the address 0x2000.

For people who program the OR1200 without a debugger, it might be helpful to place several simple instructions at each exception entry, for example to turn on an LED. In this way it is easier to check whether an exception has been triggered when a program stops responding. We learnt this from experience in the thesis project.

4.5 Tick Timer (TT)

The OR1200 has a built-in Tick Timer (TT) unit, which counts the system clock pulses and can therefore provide time information if the clock frequency is given. The TT is useful for many purposes. It can generate fixed-time interrupts, which provide system ticks for Real Time Operating Systems (RTOS). For example, in the thesis project we used TT interrupts for the uC/OS-II RTOS to schedule tasks every 0.01 s. The TT c
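Because the TT counts system clock pulses, the number of pulses between two tick interrupts follows directly from the clock frequency. The helper below is a minimal sketch with a hypothetical name; the 50 MHz clock used in the usage note is the processor frequency given in the platform feature list of the thesis, and how the result is written into the timer registers is not shown here.

```c
#include <stdint.h>

/* Number of system clock pulses between two tick interrupts.
 * Returns 0 if ticks_per_second is 0. */
static inline uint32_t tt_period_cycles(uint32_t clock_hz,
                                        uint32_t ticks_per_second)
{
    return ticks_per_second ? clock_hz / ticks_per_second : 0u;
}
```

For a tick every 0.01 s (100 ticks per second) on a 50 MHz clock, tt_period_cycles(50000000u, 100u) yields 500000 clock pulses per tick.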
91. e CS signals. In this case the memory size can be 2^27 = 128 MBytes. For most embedded systems 128MB of RAM would be enough. If it is still not, it is an option to use multiple Memory Controllers on different CONMAX slave ports. There is always a way to allocate the addresses.

Footnote 1: Actually we don't even need this bit, because the CS0 can always be selected. The bit was reserved mainly for the purpose of testing the CS signal.

6.2.2.2 Power-On Configuration (POC)

Sometimes the initialization of the Memory Controller looks like an interesting paradox. To make the Memory Controller work properly, its internal registers have to be correctly configured by the CPU. But the CPU needs to read the software instructions stored in the external memories to know how to configure the Memory Controller. However, the external memories cannot be accessed if the Memory Controller is not working properly.

The Power-On Configuration of the Memory Controller tries to solve this problem. Every time the Memory Controller resets, it reads the signal levels from the external bus. The value is stored into the POC register. The last 4 bits of the POC are then used to initialize the CSCn register, to give a default basic working state to the Memory Controller, so that the external memories become accessible even though the timing configuration is probably not optimized.

To give a definite logic value to the Memory Controller POC register, the last 4
92. e Spreading Incentives or Promoting Resistance. Rutgers Law Journal 36, 53-162, 2004. SSRN: http://ssrn.com/abstract=585922 (Last visit: 2011-01-31)
[23] Webpage: Microsoft's Ballmer: "Linux is a cancer", Jun 1st 2001, from Linux. http://www.linux.org/news/2001/06/01/0003.html (Last visit: 2011-01-31)
[24] Webpage: Why You Shouldn't Use the Lesser GPL for Your Next Library, from GNU. http://www.gnu.org/licenses/why-not-lgpl.html (Last visit: 2011-01-31)
[25] Website: Gaisler Research. http://www.gaisler.com (Last visit: 2011-01-31) Gaisler Research is a privately owned company that provides IP cores and supporting development tools for embedded processors based on the SPARC architecture.
[26] Webpage: Selling Free Software, from GNU. http://www.gnu.org/philosophy/selling.html (Last visit: 2011-01-31)
[27] Webpage: The GNU Manifesto, from GNU. http://www.gnu.org/gnu/manifesto.html (Last visit: 2011-01-31) This is an old and good article, suggested reading, which describes the GNU philosophy well. An interesting point is in the section "Won't programmers starve?": by saying "just not paid as much as now", it reflects that the GNU project does realize that the free software movement will lower the money people could earn.

Chapter 3 Platform Overview

In the last chapter we discussed open source licenses and open cores in general. From this chapter and the r
93. e it knows the current read/write operation is done and it's OK to start the next one.
ADR_O: The address to access.
WE_O: Indicates either read or write. It is a write operation if WE = 1, else a read.
DAT_O: Data outputs from a master when writing to a slave.
DAT_I: Data inputs to a master when reading from a slave.
SEL_O: Indicates which fragments of the data are valid. For instance, for a 32-bit bus the SEL_O is 4 bits wide. If SEL_O[3:0] = 1000 during a read, it means the master only wants the highest byte of the data, DAT_I[31:24]. All other bits are not valid and won't be processed.

The 8 signals in group 2 are enough to perform basic WISHBONE functions. Together with the CLK_I and RST_I, all 10 signals are necessary for every WISHBONE compliant IP core.

Group 3

ERR_I: Indicates errors.
RTY_I: Retry signal; asks for a repeat of the last read/write operation.

Some IP cores have these signals in their interfaces. They are similar to the ACK_I but have different meanings. Once a read/write operation is successfully finished, a one-cycle ACK returns; but if it isn't, the slave may give a one-cycle ERR to tell the master that an error occurred in the last read/write operation, or send a RTY to ask for a retry. To enable this function, both the master and the slave have to support it, i.e. the master should have the ERR_I and the RTY_I and the slave the ERR_O and the RTY_O. This implies certain functional logic has to be design
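The meaning of the SEL_O lines on a 32-bit bus can be illustrated by expanding the 4-bit pattern into a mask over the data word. This is a small sketch with a hypothetical function name; it only models which bits of DAT_I or DAT_O are considered valid.

```c
#include <stdint.h>

/* Expand a 4-bit SEL pattern into a 32-bit mask: SEL bit n marks the
 * data bits [8n+7 : 8n] as valid. */
static inline uint32_t sel_to_mask(uint8_t sel)
{
    uint32_t mask = 0;

    for (unsigned lane = 0; lane < 4; ++lane) {
        if (sel & (1u << lane))
            mask |= (uint32_t)0xFFu << (8u * lane);
    }
    return mask;
}
```

With SEL_O[3:0] = 1000 the mask is 0xFF000000, i.e. only DAT_I[31:24] is valid, and with SEL_O[3:0] = 1111 the whole 32-bit word is valid.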
94. e CONMAX are cleared. In this default case all priorities of the masters are reset to 0, i.e. all masters have equal priorities.

The CONMAX works in a round-robin way for the masters with equal priorities. In the arbiter of every slave interface there is an FSM with 8 states. The states are used to decide which master is allowed to access this slave at the moment. Let's name the 8 states m0, m1, m2, ..., m7. If the current state is m(n), master N has the highest priority to access the slave. After master N has accessed successfully, the FSM jumps to the m(n) state. All other priorities are arranged in a circle. If the current state is m6, the priorities are m6 > m7 > m0 > m1 > m2 > m3 > m4 > m5.

For example, at power-up the FSM resets to state m0. Now the priorities for the 8 masters are m0 > m1 > m2 > ... > m7. At the next moment, if 2 masters m0 and m1 compete for the grant, m0 will win because the current state is m0, even though both masters have equal priorities in the Register File.

Note that the round-robin arbitration here has no random selection mechanism. Besides, please remember that when m(n) is accessing, the FSM turns to the m(n) state. Thus m(n) has the highest priority. This implies that if m(n) does not quit and is always involved in the subsequent competitions, no one else at the same priority level can get the grant of the bus any more. The masters can
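The circular priority order described here can be sketched as a small C function. It is a behavioural model of the description in this excerpt only, not the CONMAX arbiter source; the names are hypothetical, and it covers only masters of equal priority.

```c
#include <stdint.h>

#define N_MASTERS 8u

/* Pick the winning master among equal-priority requesters.
 * state:    index of the current FSM state (0..7), i.e. the master
 *           that currently has the highest priority
 * requests: bit n is set if master n requests the slave
 * Returns the index of the winner, or -1 if nobody is requesting. */
static inline int rr_pick(unsigned state, uint8_t requests)
{
    for (unsigned i = 0; i < N_MASTERS; ++i) {
        unsigned m = (state + i) % N_MASTERS;  /* walk around the circle */
        if (requests & (1u << m))
            return (int)m;
    }
    return -1;
}
```

In state m0 with m0 and m1 both requesting, rr_pick(0, 0x03) returns 0, as in the example in the text. In state m6 with m0 and m5 requesting, rr_pick(6, 0x21) returns 0, because m0 comes before m5 in the order m6 > m7 > m0 > ... > m5. Since the winner becomes the new state, a master that never releases its request keeps winning, which is the starvation effect noted above.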
95. Figure 4.3: 16-bit SPR address format (group index in the upper bits, register index in the lower bits, bit 15 down to bit 0)

However, it is not possible to access the SPRs directly with 16-bit addresses in the OR1200. They must be accessed with 2 instructions, l.mtspr and l.mfspr, and with the help of the GPRs.

Instruction l.mtspr means to move a value to an SPR. It has the following format in assembly code:

l.mtspr rA, rB, K

This instruction performs a logical OR of the value stored in the GPR rA with the constant K. The result is the target address of the SPR. Then it copies the data stored in the GPR rB to the SPR with the calculated address.

For example, in the instruction l.mtspr r0, r9, 32, firstly 32 is ORed with the value in r0, which is always 0. The result is 32, and it is the target address of the SPR. Comparing 32 with the address format shown in Figure 4.3, the group index is 0 and the register address is 32. This is the register EPCR0 [9]. The instruction actually copies the value stored in r9 to the SPR EPCR0.

Similarly, the instruction l.mfspr rD, rA, K reads a value from an SPR. It calculates the logical OR of the data stored in the GPR rA and the constant K. The result defines the target address of the SPR to be read. The value read from the SPR is stored in the GPR rD.

4.4 Interrupt Vectors

The OR1200 provides a vector-based interrupt system, which reserves a range of specific memory spaces. Once an exception happens, the OR1200 CPU
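The SPR address calculation of l.mtspr and l.mfspr and the 16-bit address format (group index in bits 15-11, register address in bits 10-0) can be checked with a short C sketch. The type and function names are hypothetical and serve only to illustrate the bit arithmetic.

```c
#include <stdint.h>

/* Decoded 16-bit SPR address (hypothetical type). */
struct spr_addr {
    unsigned group;  /* bits 15-11: group index      */
    unsigned index;  /* bits 10-0:  register address */
};

/* Target SPR address of l.mtspr / l.mfspr: the value of GPR rA ORed
 * with the constant K, kept to 16 bits. */
static inline uint16_t spr_target(uint32_t ra_value, uint16_t k)
{
    return (uint16_t)(ra_value | k);
}

/* Split a 16-bit SPR address into group index and register address. */
static inline struct spr_addr spr_decode(uint16_t addr)
{
    struct spr_addr a;

    a.group = (unsigned)(addr >> 11);
    a.index = (unsigned)(addr & 0x7FFu);
    return a;
}
```

For l.mtspr r0, r9, 32 the target address is spr_target(0, 32) = 32, which decodes to group 0 and register address 32, matching the EPCR0 example in the text.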
96. e IP cores that are ready to use. These IP cores provide great convenience when designing new systems. For example, in this thesis project we could quickly build a system with the selected IP cores. If we had had to create those IP cores ourselves, it would have been impossible for us to finish the system within the limited time.

2. The appearance of the Open Core community

Most IP cores are implemented in Hardware Description Languages (HDLs), either VHDL or Verilog HDL. The HDL source code can be further synthesized to digital circuits by software tools. Similarly to the free software community, an open core community has emerged, where the designers publish their IP core HDL source code protected by open source licenses. The open source IP cores are called, in short, Open Cores.

The OpenCores organization is the world's largest site and community for the development and discussion of open source hardware IPs [2, 3]. On the web site there are hundreds of ongoing or finished projects regarding open cores, which cover everything from CPUs to all kinds of peripherals. Some of the open cores are of very good quality and have been successfully used in commercial and industrial projects.

One of the most interesting advantages of the open cores is that they are free to access. This is especially critical for thesis students like us, who do not have enough budget to acquire commercial IP cores. Besides, building up a low-cost system with open cores for commercial pu
97. e RAM, FIFO, PLL, except for uC/OS-II and uC/TCP-IP. • Based on Terasic's DE2-70 board and ALTERA's Cyclone II FPGA. • OpenRISC OR1200 processor, 50 MHz, no cache, no MMU. • WISHBONE bus standard, implemented with the CONMAX IP core. • Memory Controller IP core for 2 MB SSRAM and 64 MB SDRAM. • RS232 by the UART16550 IP core. • Buttons, LEDs and 7-segment displays by the GPIO IP core. • WISHBONE interface for the WM8731 audio CODEC (DAC only). • WISHBONE interface for the DM9000A Ethernet controller. • Porting uC/OS-II to the OpenRISC processor. • Porting uC/TCP-IP to the OpenRISC processor (achieved 3KB speed for a stable connection). • LibMAD running on the PC is used to convert MP3 to WAV format. • Bootloader that downloads software binary files via RS-232. • ihex2mif, which converts the ihex format to ALTERA's mif format. References [1] Website, Terasic Technologies, http://www.terasic.com.tw, Last visit: 2011-01-31. [2] Webpage, Altera DE2-70 Board, from Terasic Technologies, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=226, Last visit: 2011-01-31. [3] User Manual, DE2-70 Development and Education Board User Manual, Terasic Technologies, Version 1.03, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=53&No=226&PartNo=4, Last visit: 2011-01-31. [4] Website, OpenCores.org, http://www.opencores.org, Last visit: 2011-01-31. [5] Webpage, The online page of the thesis and project archive, ht
98. e STB is 1, the slaves should latch the current data before they turn into the wait state. When the slaves come back from the wait states, they should resume all signals remembered before they fell into the wait states. Note that at the edges when slaves return from wait states, the only thing they do is to resume the remembered signals. The first IF-ELSE block is skipped at that clock edge. The last thing to explain about the burst transaction is the wait state. According to the WISHBONE specification, both the masters and the slaves are allowed to insert wait states at any time during a burst transaction when they temporarily cannot accept more data. The following is an example of wait states in a burst transaction. Since the rules summarized above also apply to the wait state cases, readers are encouraged to check them against the example. Figure 5.13: A burst transaction with wait states. Figure 5.13 is a constant read burst which deliberately inserts some wait states, both by the master and by the slave. When the master or the slave turns into a wait state, it outputs X, i.e. an unknown signal. Edge 1: The master initiates the burst. The slave does nothing. Edge 2, Master: The master wants to insert a wait state. I
99. • The OpenRISC toolchain [5] has greatly improved. The latest toolchain includes GCC 4.2.2 with uClibc 0.9.29, GDB 6.8 and or1ksim 0.3.0. A precompiled toolchain package for Cygwin is also available. • Some OpenRISC documents have been updated or created [6-8]. • The other 4 open cores, i.e. CONMAX, Memory Controller, UART16550 and GPIO, have not changed since 2008. • A new WISHBONE standard, Revision B.4 [9], has been released. This new version supports a pipelined traffic mode. • Micrium published a new RTOS kernel, uC/OS-III, but the source code is no longer open to academic users. uC/OS-II remains the same as before, but the example of porting uC/OS-II to the OpenRISC was removed from the website. Luckily, another porting example has now been added to the SVN of the OpenRISC project at opencores.org [4]. The source code of uC/TCP-IP was also removed from the Micrium website. • An enhanced DE2-115 board [10] with a Cyclone IV FPGA and more memory is available from Terasic [11]. Again, a lower price is offered for academic users. References [1] Lin Zuo, System on Chip design with Open Cores, Master Thesis, Royal Institute of Technology (KTH), ENEA, Sweden, 2008, Document Number KTH ICT ECS 2008 112. [2] Website, Eclipse, http://www.eclipse.org, Last visit: 2011-01-31. [3] Website, ORSoC, http://www.orsoc.se, Last visit: 2011-01-31. [4] Webpage, OpenRISC News, from OpenCores Organization
100. e because of the limited time of the thesis. 1.3 Chapter Overview. The thesis contains 7 chapters. An overview of the chapters is given below. As mentioned before, the contents of this thesis focus mainly on the tasks that Xiang Li was responsible for. For Lin Zuo's part, e.g. the system performance comparison, please refer to Lin's thesis [6]. Chapter 1 is the chapter you are reading, which gives an introduction to the thesis. Chapter 2 introduces 3 widely used open source licenses, the GPL, the LGPL and the BSD license, and discusses the impact of the licenses on the open cores. The chapter is placed before the system implementation chapters because investigating the licenses is a primary task: we do not want to violate the licenses while using open cores. It is also interesting to know the influence of the open source licenses if an open core based system is to be used for commercial purposes. Chapters 3 to 6 describe the implementation of the open core based computing platform. Chapter 3 gives an overall impression of the system architecture, including the hardware block diagram, the software development workflow and the description of a demonstration application. Chapter 4 focuses on the OpenRISC OR1200 CPU, which is the heart of the computing system. The processor is discussed from many aspects, such as hardware, software and porting the uC/OS-II RTOS. Chapter 5 is dedicated to the WISHBONE bus protocol and an open core im
101. e more than one. For example, my laptop has 2 network cards: one is wireless and the other is a normal 100/1000 Mbps network adapter. In this case, please disable the wireless network card. Meanwhile, please close all other software that may send TCP/IP packets to the Internet, like IE, MSN and anti-virus software that might upgrade itself automatically. Steps 3.1 to 3.3 make sure that only one program, our music player, sends UDP packets to the only target address, the DE2-70 board. The reason is that the thesis application is not reliable enough to handle all kinds of packets. Figure B.8: IP configuration. If somehow another program broadcasts a UDP packet during the time we are downloading the music fi
102. e value of the following expression must be as close as possible to 488.28 ns: (Prescaler + 1) / System Clock ≈ 488.28 ns. The system for the thesis project uses a 50 MHz clock, so the prescaler is set to 23, i.e. 10111 in binary. The other fields of the CSR can be left unchanged for the SDRAM. For each chip select signal there is a pair of registers that needs to be configured: CSC and TMS. The Chip Select Configuration (CSC) register determines the address range, the external memory device type etc. The Timing Select (TMS) register decides the timing parameters for the attached memory devices. For the SSRAM, the TMS is not used. The value can be left at its default, 0xFFFF_FFFF. (Footnote: the user manual, Section 4.5 [6], says its reset value is 0, but this is a mistake. After a reset, the hardware initializes all TMS registers to 0xFFFF_FFFF. See the HDL source file mc_rf.v.) For the SDRAM, the TMS value is defined by the datasheet and by Table 2 of the Memory Controller user manual [6], page 16. In our case the value is set to 0x0724_0230. Note that the timing parameters here are very conservative. For example, the read/write burst between the Memory Controller and the SDRAM chip is disabled. These parameters may be reconsidered to improve the SDRAM access performance. The following table summarizes the register values for the SSRAM and the SDRAM (columns: SSRAM, SDRAM). CSR: 0x00000000, 0x17000300. POC: 0x00000002, 0x000000
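The prescaler choice described above (the value for which (Prescaler + 1) / System Clock comes closest to 488.28 ns) can be checked with a small Python sketch. The function names are hypothetical, and the assumption that the prescaler sits in the top byte of the CSR is inferred only from the value 0x17000300 quoted in the text, not from the Memory Controller manual.

```python
TARGET_NS = 488.28   # target interval from the text, in nanoseconds


def best_prescaler(clock_hz):
    """Prescaler for which (prescaler + 1) / clock is closest to the target."""
    period_ns = 1e9 / clock_hz
    ticks = round(TARGET_NS / period_ns)   # nearest whole number of clock periods
    return max(ticks, 1) - 1


def prescaler_field(csr):
    """Assumption: the prescaler is the top byte of the CSR (0x17 = 23)."""
    return (csr >> 24) & 0xFF


p = best_prescaler(50_000_000)
print(p, bin(p), (p + 1) * 20.0)      # prints: 23 0b10111 480.0
print(prescaler_field(0x17000300))    # prints: 23
```

With a 50 MHz clock one period is 20 ns, so 24 periods give 480 ns and 25 periods give 500 ns; 480 ns is the closer of the two, which matches the prescaler value of 23 used in the thesis project.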
103. ed Work and reverse engineering for debugging such modifications. And in section 5, Combined Libraries, it says: "You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice." By these clauses, the LGPL clearly separates 2 different sets of components: the Library, which is an LGPLed module, and an Application from somewhere else, which isn't governed by the license. Because the LGPL was initially designed for libraries, it keeps using terms like library and application. According to the LGPL, the license does not affect other modules which merely connect to an LGPLed module. We can draw a picture to explain the rules. Suppose we have a software project including 2 existing libraries and a newly designed application linked to them. As Figure 2.1 (a) shows, Library A is covered by the LGPL, while Library B is from a company and covered by a commercial license. The Application links to the 2 libraries. In this case, the whole project is a combined work, and it is allowed to license the project under user-defined terms, as long as these terms do not conflict with the LGPL and also guarantee that the LGPLed library remains open source
104. ed in the IP cores. But these features are not compulsory according to the WISHBONE standard. If a slave doesn't have an ERR_O or an RTY_O, the ERR_I and the RTY_I of the master can be wired to ground. If a master does not support these signals, the ERR_O, RTY_O and ACK_O of the slave may be connected together with an OR gate and then sent to the ACK_I of the master. But in such a case the ERR and the RTY signals are treated as an ACK, so the error and retry information is ignored. Group 4: TGD_I (tag of input data), TGD_O (tag of output data), TGA_O (tag of address), TGC_O (tag of bus cycle). These 4 signals are called tag signals because they are attached to other signals to provide extra information, just like tags. For example, when a master is sending data serially to a slave, if some of the data is more special than the rest, it can be marked by the TGD_O, which is sent at the same time, so that the slave can recognize the special data when receiving the TGD_O signal. As another example, when CYC = 1, the value of TGC could be used to determine which kind of transaction is in progress; for instance, 00, 01, 10 and 11 could be used as tags to stand for single, block, RMW and burst transactions respectively. An interesting thing is that the WISHBONE doesn't specify the 4 signals in detail, such as the width of the signals or the meanings of the data patterns. This leaves great freedom to the users on how to utilize the signals. In princ
105. ehavior of the IP core always matches the descriptions in the user manual. So when doing the project we had good trust in the IP core and in fact did not run many simulations at the IP core level to verify the timing etc. before integrating it into the system. And the system did work without costing us extra time. The Memory Controller indeed greatly improved the productivity of our project. With the IP core, we managed to make the SDRAM IC work with the OpenRISC system within a week. If we had had to write a controller for the SDRAM ourselves, implementing the read/write timing, the dynamic refreshing and so on, it is hard to imagine how much time we would have spent on it. According to our supervisor Johan, it could take half a year even for experienced engineers. • WISHBONE compatible: The Memory Controller IP core has a WISHBONE interface which is fully compatible with the WISHBONE bus specification Rev. B.3. The widths of the address and the data bus of the IP core are fixed to 32 bits, but this is not a problem because the OpenRISC OR1200 CPU is also 32 bits wide. Unlike many other so-called WISHBONE compliant IP cores that only support single read/write operations, the Memory Controller also supports WISHBONE burst transactions. With burst transactions, the performance of the communication between the CPU and the memory can be enhanced a lot. But due to the limited time, we did not try out this feature in the t
106. el HEX format by objcopy. It can also be converted back to assembly code by objdump. Both objcopy and objdump are parts of the GNU Binutils. All the steps above can be done by the Makefile with 1 command in Cygwin. With the generated executable file, the next steps are to download it to the DE2-70 board, then run and debug the software program. However, these steps were not easy. Because we didn't have a JTAG connection between the PC and the OpenRISC CPU, it was not possible to send data to the target CPU directly. We made workarounds to achieve the downloading and debugging. For downloading the programs to the DE2-70 board, 2 methods were used: 1) with a MIF file linked to the ALTERA on-chip RAM IP core; 2) with a bootloader and the serial communication. In the FPGA project it is possible to link a MIF format file to the ALTERA on-chip RAM IP core. In this way, the data stored inside the on-chip RAM is initialized when the FPGA is programmed, and the CPU can immediately access the on-chip RAM and execute the programs from there. We designed a software tool, ihex2mif. After the or32 executable files are translated into Intel HEX files, ihex2mif can further convert them to the MIF format. There are 2 limitations of this downloading method: 1) the FPGA project has to be updated every time the MIF file is updated; 2) the software program must be small enough to fit into the on-chip RAM, which is 3
107. ely. There are also instructions which guide you through setting up the tools. For a Linux pro, this shouldn't be a problem. In the past, compiling from the source code used to be the major way to get the toolchain. Unfortunately, I am not a Linux pro; I tried but failed to compile a working toolchain under Cygwin. The toolchain I built myself never worked properly or efficiently. Luckily, the OpenRISC team now provides a pre-compiled toolchain package for Cygwin. Just unzip the package to the system path and all the tools for OpenRISC software development are ready to use. This toolchain package is available from the opencores.org website [5]. An old version, which we used during the thesis, is also included in the thesis zip file. B.1.8 Quartus II. The Quartus II is the FPGA development software designed by ALTERA. Because the DE2-70 board uses ALTERA's FPGA, the Quartus II cannot be replaced. We used Quartus II 8.0sp1 Web Edition for the thesis, with the web license acquired freely from www.altera.com. There is no need to pay for the license. B.1.9 The Thesis Archive File. Finally, don't forget to get the project archive file. This file can be downloaded from my blog [6]; it includes both the hardware and software projects as well as some tools and other material. This file will be downloadable until the end of year 2011, but there is no guarantee after that. B.2 Step by Step Instructions. Now let's start t
108. ensen, the ENEA hardware team leader and our industry supervisor. Johan is the best supervisor that I could expect. He is knowledgeable, experienced and full of brilliant ideas. He is good at communicating with and encouraging people. With a short conversation he could take my pressure away. As a supervisor, he participated in almost all aspects of the thesis. He proposed the topic, made the detailed plan together with us, helped to set up the working environment and kept checking our progress. When we were in trouble, he not only provided useful suggestions but sometimes even looked into the source code. When I was upset because the thesis writing went slowly, he was always supportive. It is my pleasure to have had Johan as a supervisor. Ingo Sander, our KTH supervisor and examiner. Ingo's lectures were very impressive and valuable to me, and opened many windows onto new fields. This was the reason I wanted to be his thesis student. Ingo actively monitored the thesis although we were in different cities. He followed our weekly reports and replied with advice and guidance. He tried to set higher requirements for us, which created bigger challenges, but I also gained more experience at the same time. And I especially appreciate his patience when the thesis writing failed to finish on time. The knowledge learnt from Ingo truly benefits me, and I use it every day in my career. Lin Zuo, my thesis partner and good
109. eous. 6.2 Memory Controller IP Core; 6.2.1 Introduction and Highlights; 6.2.2 Hardware Configurations; 6.2.2.1 Address Allocation; 6.2.2.2 Power-On Configuration (POC); 6.2.2.3 Tristate Bus; 6.2.2.4 Miscellaneous SDRAM Configurations; 6.2.2.5 Use Same Type of Devices on One Memory Controller; 6.2.3 Configurations for SSRAM and SDRAM; 6.2.4 Performance Improvement by Burst Transactions; 6.3 UART16550 IP Core; 6.4 GPIO IP Core; 6.5 WM8731 Interface; 6.5.1 Introduction; 6.5.2 Structure of the WM8731 Interface; 6.5.3 HDL Source Files and Software Programming; 6.6 DM9000A Interface; 6.7 Summary; References. 7 Conclusion and Future Work; 7.1 Conclusions; 7.2 Future Works; 7.2.1 Improve and Optimize the Existing System; 7.2.2 Extension and Research Topics; 7.3 What's New Since 2008; References. A Thesis Announcement; A.1 Building a reconfigurable SoC using open source IP; A.2 Further
110. er vendors have to ask for the license to make their products have standard interfaces. Furthermore, every time the standard is updated, the owner gets a chance to lead the direction of future development. 4. Sometimes the standard can even be the first criterion for IP core selection. For instance, because we used the WISHBONE in the thesis project, all the open cores chosen for the system had to be WISHBONE compliant. Some IP cores with different interfaces were given up because they could not be adapted into the system easily, although they might have very good quality. So now we know how important the interconnection architecture can be. Then we must emphasize a special feature of the WISHBONE standard: it is in the public domain. That the WISHBONE is in the public domain means it is not copyrighted, which is another way of saying that anyone can do anything with the WISHBONE without any limitation, but of course with no warranty at the same time. This is a great gesture from the developers of the WISHBONE: because they gave up the ownership they could have had, all other people benefit, since they are allowed to develop new products based on the WISHBONE without asking for a license. They can even turn the new WISHBONE based products into their own proprietary products, because the WISHBONE is not copyrighted. Besides, the WISHBONE will always be free. In comparison, ARM's AMBA is a copyrighted open
111. erally, the open cores under the LGPL pose no problem for commercial purposes, but the GPLed open cores do, because they are infectious. When products including open cores under the LGPL are published, a company basically has to do the following: 1. Announce the open cores contained in the product. 2. Attach a copy of both the GPL and the LGPL. 3. Publish the source code that generates the final product. 2.4 The Price for Freedom: Comments on the GNU Philosophy. The GNU GPL and LGPL have been introduced in the previous section. Now it is time to think about something at a higher level: the GNU philosophy we can learn from the licenses. Let's start from a misunderstanding about free software. Many people misunderstand the word free in free software as free of charge or zero price. So they feel strange and uncomfortable when they are asked to pay for copies of free software, like some Linux products from distributors. They may ask: "Isn't this FREE software?" This is a common misunderstanding. In the dictionary there are 2 meanings of the word free. One is not under control or not subject to obligations, i.e. freedom, and the other is available without charge, i.e. no cost. The GNU says clearly in the 3rd paragraph of the preamble of the GPL text: "When we speak of free software, we are referring to freedom, not price." They
112. erences [1] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs, Kluwer Academic, 3rd Edition, Jun. 2002, ISBN 1-4020-7141-8. [2] R. K. Gupta and Y. Zorian, Introducing Core-Based System Design, IEEE Design & Test of Computers, 14(4):15-25, Oct.-Dec. 1997. [3] Webpage, Semiconductor Intellectual Property Core, from Wikipedia, http://en.wikipedia.org/wiki/IP_core, Last visit: 2011-01-31. [4] Website, Open Source Initiative (OSI), http://www.opensource.org, Last visit: 2011-01-31. OSI is a non-profit corporation formed to educate about and advocate for the benefits of open source and to build bridges among different constituencies in the open source community. [5] Webpage, The Open Source Definition, from OSI, http://www.opensource.org/docs/osd, Last visit: 2011-01-31. [6] Webpage, Open Source Licenses, from OSI, http://www.opensource.org/licenses, Last visit: 2011-01-31. The licenses approved by the OSI comply with the open source definition. [7] Webpage, GNU, from Wikipedia, http://en.wikipedia.org/wiki/GNU, Last visit: 2011-01-31. [8] Webpage, GNU Project, from Wikipedia, http://en.wikipedia.org/wiki/GNU_Project, Last visit: 2011-01-31. [9] Webpage, Richard Stallman, from Wikipedia, http://en.wikipedia.org/wiki/Richard_Stallman, Last visit: 2011-01-31. [10] Webpage, Free Software Foundation, from Wikipedia, http://en.wikipedia.org/wiki/Free_Software_Foundation, Last visit: 2011-01-31.
113. est of the thesis we will come back to the engineering topics and focus on the thesis project, which implemented a computing platform with open cores. Before going too far into technical details, it is always good to have an overview of the system. This is the purpose of this chapter: Chapter 3 tries to give the readers an overall impression of the platform. In the following chapters, more details will be covered regarding the processor, the bus structure and the peripherals of the system. The open core computing platform can be divided into 4 layers, as shown in the figure below. From bottom to top these are the Hardware layer, the Digital FPGA layer, the Operating System layer and the Software Application layer. We will follow this thread to describe the system in this chapter. Figure 3.1: Platform overview (recoverable labels: MP3 Player Demo Application; DE2-70 Development Board Hardware). 3.1 DE2-70 Board. The thesis project is based on a DE2-70 FPGA board. The DE2-70 board is produced by Terasic [1, 2]. It is equipped with an ALTERA Cyclone II 2C70 FPGA together with large-volume RAM/ROM components and plenty of peripherals, including audio devices and an Ethernet interface. The DE2-70 board presents us with a reliable and powerful hardware platform. With careful FPGA design to drive the hardware, the board can be implemented as different systems with multimedia, networking and many other possible features. The DE2 7
114. etc. For more information about the WM8731, please refer to its datasheet [18]. In the thesis project we wanted to play music with the open core platform, so the OpenRISC processor needs to drive the WM8731 audio CODEC. Therefore an interface that connects the external WM8731 device to the WISHBONE network is required. Figure 6.9 below gives the overall connections. Figure 6.9: WM8731 Interface in the OpenRISC system (blocks: FPGA, OpenRISC Processor, WISHBONE Network, WM8731 Interface, WM8731). The interface was designed by us as an IP core. With the interface, it is possible to configure the WM8731 from the OpenRISC processor and send the music data. But receiving data from the WM8731 is not implemented, so the microphone and the line-in are not supported. 6.5.2 Structure of the WM8731 Interface. The WM8731 chip has 28 pins, but most of them have been taken care of by the designers of the DE2-70 board. All we need is to implement the digital logic in the FPGA to communicate with the WM8731. The WM8731 has a control interface and a data interface. The control interface is used to configure the WM8731 internal registers, while the data interface is used to transmit the music data from/to the WM8731. The control interface can be selected to work as either a 3-wire SPI or a 2-wire I2C interface. Unluckily, the DE2-70 board has the mode fixed to I2C. In this way it saves 1 pin from the FP
115. ets the ACK. The 2nd to 4th transactions in the figure describe this situation. 5.2.3.2 Block Read/Write Transaction. The WISHBONE specification defines block read/write transactions to transfer more than one data item in a bus transaction. A block transaction is almost the same as the single transactions, only multiple single read/write operations are now encapsulated in one transaction. To indicate that the current bus transaction is a block transaction, the CYC has to stay high during the whole transaction period. Figure 5.5: An example of a WISHBONE block write transaction (signals shown include stb, ack, err, addr, we, data_i, data_o and sel). Figure 5.5 shows a block write transaction which includes 4 write operations. As we can see, the CYC now stays high for the duration of the transaction, while the 4 write operations are no different from single write operations. According to the specification, block transactions must contain either all read or all write operations, but cannot have both types in one transaction. However, from my perspective this should not be a constraint. In principle both read and write operations are operations, and a block transaction is actually a batch of single operations, so to this ex
116. expect to make money with the cores, because we have discussed in the previous section that making a commercial product open source will result in less profit than not doing so. However, open source is still good in some other ways. In this section we will introduce how open source products can benefit the business. The first benefit is that developing open source products is a good strategy to bring a company more publicity. Many open source products are considered to be of better quality, at least in some aspects, because bugs are easier to find when everyone can look into the source. Therefore open source products spread easily through the Internet without much advertising. This makes producing an open source product a good way to announce the company itself, which can be compared to the discount offers of supermarkets. Every store frequently puts up eye-catching posters or slogans, like "buy a dozen cokes and get 70% off". When people are attracted into the store, they are likely to buy many other things besides the discounted products. So open source products can take this role as advertisements. The second benefit gained from open source products is that they are a good way to sell services. It is likely that many users who are going to use open source products do not know how to, or a company that is going to include an open core in its next product m
117. f there is no wait state, the master should hold all signals unchanged because the ACK is 0. But now the master must remember the values of all output signals and repeat them when it resumes from the wait state. By resetting the STB to 0, the master turns into the wait state. Slave: The slave, however, sees STB = 1 at edge 2. Since it can handle this read request, the slave sets the ACK to 1 and returns the 1st data. Because this is the first read operation in the burst, the slave outputs the data based on the 1st address sent from the master. Besides, the slave needs to predict the next address based on the 1st address and the CTI. Edge 3, Master: The master is back from the wait state. Now it should recall all signals logged at edge 2. Because at edge 2 the master should have kept its signals unchanged, the signals at edge 3 are the same as those at edge 1. Edge 3, Slave: The slave checks the ACK and finds that it is 0, so it holds all outputs (the 1st data) unchanged for 1 more cycle. Edge 4, Master: The master does not feel like working again. If there were no wait state, the master should latch the current data and output the next address, because the current ACK is 1. But since the master inserts another wait state, it only latches the 1st data and outputs X for the other signals
118. friend. As a partner, he contributed a solid part to the thesis. He worked really hard and took over many heavy tasks. For some of them I was not confident, but Lin made them come true. As a friend, he is a very funny guy. When our work seemed to be heading for a dead end, Lin could easily amuse me in his special way and ease the anxiety. Thanks to him, the thesis has become a successful and interesting memory when we look back on the life in Malmo. And my parents. Without their love and support throughout my life, I would never have got this chance to write a master thesis acknowledgement. Sincere thanks to all the people who helped me on the way to today's achievements. Thank you all. Contents: Abstract; Acknowledgement; List of Figures; List of Tables. 1 Introduction; 1.1 Background and Motivation; 1.2 Thesis Objectives; 1.3 Chapter Overview. 2 Open Cores in a Commercial Perspective; 2.1 Basic Concepts; 2.1.1 What is Open Core; 2.1.2 Formal Definition of Open Source; 2.1.3 Licenses Involved to Evaluate; 2.1.4 GNU and FSF; 2.1.5 Free Software Free of Charge; 2.2 BSD Licenses; 2.3 GNU Licenses; 2.3.1 GPL; 2.3.2 LGPL; 2.3.3 Evaluations on the
119. g in to sell the products at an even lower price. All of these actions will force the price of the product lower and lower, until the profit is so low that no one else wants to spend time on it any more. Figure 2.3 shows the trend. Figure 2.3: Increasing competitors force the price lower (axes: price and profit against the number of competitors). Besides, in our modern society the Internet is available to everyone. If the source code can be freely obtained from the network, people will likely download it instead of buying the product if they feel the price is too high. Although compiling the source code is a skill that not everyone has, many people would still rather find the instructions and learn than pay the companies for the product. In a sentence, open source makes products unprofitable. This is the price paid for the freedom. That's why most free software is free of charge or has a very low price: the market forces it to be. That's why most big companies neither support the free software movement nor like to make their products open source: it would reduce the money they could have earned. 2.5 Developing Open Cores or Not: How Open Source Products could Benefit. 2 questions were mentioned at the very beginning of the chapter. Now it is time to answer the first one: as a company, should we develop new open cores for sale? Generally speaking, the answer is NO if you
120. gher priority masters become ready during that time, they still have to wait for the next competition until the current master terminates and gives up the grant. 10. The 16 slaves can be configured individually in the Register File. This means one master can have different priorities when accessing different slaves. This concludes the description of the CONMAX. We hope it is clear enough to understand how the interconnection is organized and operated by the CONMAX, which is one of the most important components in the system. References [1] Webpage, SoC Interconnection: WISHBONE, OpenCores Organization, http://opencores.org/opencores,wishbone, Last visit: 2011-01-31. This is the official WISHBONE page, which recommends the WISHBONE as the preferred bus standard. [2] Specification, Specification for the WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision B.3, Released September 7, 2002. [3] Rudolf Usselmann, WISHBONE Interconnect Matrix IP Core, Rev. 1.1, October 3, 2002. [4] Webpage, a message formerly posted at the OpenCores forum, Wishbone spec clarification, http://osdir.com/ml/hardware.opencores.cores/2007-04/msg00030.html, Last visit: 2011-01-31. This message was posted at the OpenCores forum but has now been removed. A snapshot of the message is found at the link above. It contains useful information to explain the LOCK signal of the WISHBONE
121. gure 6.7: Logic to activate a CS

3 bits are enough to identify 8 chips. Bits [26:24] of the input addresses may be reserved for this purpose. For example, if CSC5[21:19] are set to 101 and the BA_MASK is set to 0x38, all input addresses of the form 0xX5XX_XXXX or 0xXDXX_XXXX will activate CS5. It is usually not the case that 8 memory chips are connected to the same Memory Controller, so it is possible to reserve fewer bits for the CS. For example, in our project only 1 bit is configured to enable CS0, because we need just 1 chip select signal.

Bits [31:28] should match the CONMAX IP core slave ID
Bit [27] selects Memory Controller registers or external memory
Bits [26:24] activate 1 of the 8 CSs of the Memory Controller
Bits [23:0] are reserved for the external memory device
Figure 6.8: 32-bit address configuration for Memory Controller

Figure 6.8 above summarizes the allocation of the address. The first 4-bit address section is decided by the CONMAX IP core slave port ID. The following bit selects internal registers or external memory ICs. After that, 3 bits can be reserved for 8 chip select signals. The remaining bits are left for the external memory address space. The format in Figure 6.8 has 24-bit addresses for the external memory IC, so the memory size can go up to 2^24 = 16 MBytes per chip. If only one CS is required, there is no need to spend 3 bits for identifying th
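The field layout of Figure 6.8 can be sketched in C as follows. This is only an illustration under stated assumptions: the type mc_addr_fields and the functions mc_decode and mc_bytes_per_chip are hypothetical names that do not appear in the thesis archive, and the meaning of bit 27 (1 selects the registers) is the setting used in this project, not a fixed property of the Memory Controller.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit address as laid out in Figure 6.8.
 * Hypothetical type, for illustration only. */
typedef struct {
    unsigned slave_id;    /* bits [31:28]: CONMAX slave port             */
    unsigned reg_select;  /* bit  [27]: 1 = registers (project setting)  */
    unsigned cs_bits;     /* bits [26:24]: chip select field             */
    uint32_t offset;      /* bits [23:0]: address inside one memory chip */
} mc_addr_fields;

/* Split an address into the four sections of Figure 6.8. */
static mc_addr_fields mc_decode(uint32_t addr)
{
    mc_addr_fields f;
    f.slave_id   = (addr >> 28) & 0xFu;
    f.reg_select = (addr >> 27) & 0x1u;
    f.cs_bits    = (addr >> 24) & 0x7u;
    f.offset     = addr & 0x00FFFFFFu;
    return f;
}

/* Bytes addressable per chip for a given number of offset bits. */
static uint32_t mc_bytes_per_chip(unsigned offset_bits)
{
    assert(offset_bits < 32);
    return (uint32_t)1 << offset_bits;
}
```

With 24 offset bits mc_bytes_per_chip returns 16 MBytes, which matches the limit stated above. Freeing chip select bits would raise offset_bits accordingly.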
122. h 8-bit BYTE granularity

[Table 6.2: Data organization for 32-bit ports (columns recoverable from the residue: address range, active select line SEL_I/SEL_O, active portion of the data bus, byte ordering)]

Table 6.2 shows that the system should use the address range [63:2] if there are 64 address lines. In our case only addresses [14:2] are used, because we have only 8192 address entries (2^13 = 8192). Besides the valid address range, the table also describes how the SEL signal should select the active portion of the 32-bit data bus. The rest of the section is dedicated to explaining Table 6.2. We hope it makes the configuration easier to understand.

Like most CPUs, the OpenRISC OR1200 has 8-bit (1 byte) granularity, i.e. from the CPU's perspective there is one byte stored at each unique address. This sometimes gives people the wrong impression that an N-byte memory block is organized as a width of 8 bits times a length of N. However this is not true. For example, in this project we set the data width to 32 bits on the 1-port RAM core, as mentioned in Table 6.1. The memory block is organized as 32 bit × 8192, which gives 32KB in total. The data width of the memory has to be 32 bits because the OpenRISC OR1200 is a 32-bit processor. A 32-bit processor implies that the width of the data bus of the CPU is 3
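A minimal C sketch may help to see how a byte address maps onto the address lines [14:2] and the SEL signals. The function names are hypothetical. The lane assignment assumes big-endian ordering, with byte offset 0 on DAT[31:24] selected by SEL[3]; this is an assumption of the sketch and should be checked against Table 6.2.

```c
#include <assert.h>
#include <stdint.h>

/* Word index presented on address lines [14:2] of the 8192-entry RAM. */
static uint32_t ram_word_index(uint32_t byte_addr)
{
    return (byte_addr >> 2) & 0x1FFFu;   /* 13 bits: 0..8191 */
}

/* SEL[3:0] for an access of `size` bytes (1, 2 or 4) at byte_addr.
 * Assumption: big-endian lanes, byte offset 0 is selected by SEL[3]. */
static unsigned sel_mask(uint32_t byte_addr, unsigned size)
{
    unsigned offset = byte_addr & 0x3u;
    assert(size == 1 || size == 2 || size == 4);
    assert(offset % size == 0);           /* aligned accesses only */
    unsigned lanes = (1u << size) - 1u;   /* 0x1, 0x3 or 0xF */
    return (lanes << (4u - size - offset)) & 0xFu;
}
```

The two lowest address bits never reach the RAM. They only decide which of the 4 byte lanes is active, which is why the memory can be 32 bits wide while the CPU still sees one byte per address.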
123. h Windows was more productive. To create a Linux-like environment under Windows we used Cygwin [6]. The Linux environment is needed for the GNU toolchain. The GNU toolchain is a collection of programming tools produced by the GNU project [7]. Natively the GNU toolchain does not support the OpenRISC CPU, but because the tools are open source, the OpenRISC developers borrowed them for the OpenRISC processor. We use the C programming language to develop software. The modified GNU toolchain for the OpenRISC converts C source code into executable OpenRISC instructions. It is easy to install the OpenRISC toolchain under Cygwin: just download the tools and follow the installation instructions from the official webpage [8]. When doing the thesis we used a very old version of the OpenRISC toolchain. The toolchain package was dated 2003-04-13. The OpenRISC development team has since released much newer versions. The OpenRISC toolchain contains a set of tools. Mostly we use the ones below [9-13]:
• GCC: It compiles the C source files into object files for the OpenRISC.
• GNU Binutils: It works as the linker, which links all object files, maps the absolute addresses based on the linker script, and produces the executable target file.
• Or1ksim: The Or1ksim is a low-level simulator for the OpenRISC 1000 architecture. Based on the executable file it can simulate the behavior of the OpenRISC processor. For example it can execute in
124. h gate resources. 4. Has a WISHBONE interface, which makes it easy to work with WISHBONE or OpenRISC based systems.

The GPIO IP core is easy to use. First it has to be correctly configured. This can be done in the setting file gpio_defines.v. Then the IP core needs to be connected to the WISHBONE network. If the interrupt is used, the WB_INTA_O signal has to be wired to the OpenRISC CPU. After that the GPIO registers are accessible by the CPU. Mostly the RGPIO_IN register is read for the input values, or the RGPIO_OUT register is written to set the output signals. For more details please refer to the GPIO IP Core Specification [17]. It includes enough information for using the IP core.

To conclude this section, the GPIO IP core is not complicated in functionality, but it is handy to have the IP core prepared. Most projects require general I/O features in various cases, like buttons, switches, LEDs or external buses. Using the GPIO IP core can certainly accelerate the project development.

6.5 WM8731 Interface

6.5.1 Introduction

On the DE2-70 board there is an audio chip, the WM8731, connected to the FPGA. The WM8731 is produced by Wolfson Microelectronics. It is a low-power stereo CODEC (enCOder and DECoder) with an integrated headset driver. It supports microphone-in, line-in and line-out, and it is designed for audio applications like portable MP3 players, speech players and recorders
125. h time to try out are collected as the future works. The thesis project was started in January 2008. Most of the implementation was done in about 6 months, but due to some personal reasons the writing of the thesis was not finished until January 2011. So an extra section is added to give the technology updates of the last 2 years.

7.1 Conclusions

The goal of the thesis is to implement an open core based computing platform on a DE2-70 FPGA board. First a summary of the tasks that we achieved is given. The readers can compare them with the tasks listed in Chapter 1, Section 2.
1. Studied open source licenses, including the GPL, the LGPL and the BSD license; analyzed the influences of the licenses on the project, both for academic and commercial usages.
2. Created a digital system on the DE2-70 board using ALTERA's tools and techniques, like the Quartus II, the on-chip RAM IP core, etc.
3. Utilized and integrated 5 open cores in the digital system: OR1200 processor, CONMAX, Memory Controller, UART16550 and GPIO.
4. Studied the WISHBONE interconnection protocol and added some explanations in the thesis for the WISHBONE bus transactions.
Learnt to use the OpenRISC toolchain for the software development, including GCC, GNU Binutils, GDB, Makefile and the OpenRISC simulator, etc.
Designed 2 software tools, ihex2mif and proloader, to help downloading user programs to the DE2-70 board. The ihex2m
126. he brackets.

The Level One tasks are mandatory for the thesis. They contain the very basic goals to create a working system with open cores.
• Study the open source licenses and investigate the impacts of the licenses for commercial usages. (Xiang and Lin)
• Build a FPGA system on a DE2-70 development board with the following open cores:
  - OpenRISC CPU (Xiang)
  - WISHBONE bus protocol and CONMAX IP core (Lin)
  - Memory Controller for SSRAM/SDRAM (Lin)
  - UART connection (Xiang)
• Set up the toolchain (compiler etc.) for software development and design applications for demonstrating the hardware platform. (Xiang)
• Port the uC/OS-II Real Time Operating System (RTOS) to the platform. (Xiang)
• Build an equivalent system with ALTERA technology and evaluate the performance compared to our system. (Lin)

The Level Two tasks are mainly to extend the system with more features.
• Add support for the WM8731 Audio CODEC such that the system can play music. (Xiang)
• Add support for the DM9000A Ethernet controller. (Lin)
• Port the uC/TCP-IP stack to the system such that the system can communicate with a PC with TCP/UDP protocols. (Lin)
• Design a software application to demonstrate these features. (Xiang)

The Level Three tasks are advanced tasks.
• Build a multiprocessor system with more than one OpenRISC CPU.
• Port Linux to the platform.

At the end we finished most tasks on Level One and Two, but failed to start the tasks of Level Thre
127. he external memories are selected. In our project MC_REG_SEL is set to wb_addr_i[27] == 1'b1 and MC_MEM_SEL is set to wb_addr_i[27] == 1'b0. It means that bit 27 of the 32-bit address is used to select the internal registers or the external memories. For example, on page 31 of the Memory Controller user manual [6] there is a list of registers. Assuming the CONMAX port 10 is in use, the address 0xA800_0010 will be mapped to the register CSC0, because bit 27 here is 1.

Third, the Chip Select (CS) configuration is the next step to consider. The Memory Controller supports a maximum of 8 CS signals, i.e. it is possible to connect up to 8 external memory ICs to the same Memory Controller. In the mc_defines.v users can define how many CS signals are going to be implemented. Unused chip selects are better commented out to save FPGA resources. CS0 is always enabled by default. There is no way to disable it.

Activating a CS requires a combination of the CSCn and BA_MASK registers. For each CS signal there is a Chip Select Configuration Register (CSCn), while the BA_MASK is valid for all the CS signals. The values of the CSCn and the BA_MASK registers are initialized by software. If the following equation is true, the corresponding CS signal is set to low to select the external IC:

(CSCn[23:16] AND BA_MASK[7:0]) == (input address[28:21] AND BA_MASK[7:0])

[Figure residue: CSCn[23:16] and the input address[28:21] are each bitwise ANDed with BA_MASK and the results are compared for a match] Fi
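The matching rule can be written as a short C sketch. The function names mc_csc_select_field and mc_cs_match are hypothetical and exist only for this illustration. The sketch also assumes that both sides of the comparison are masked with BA_MASK, which is how the equation above is read here.

```c
#include <assert.h>
#include <stdint.h>

/* Build a CSCn value whose bits [21:19] carry a 3-bit chip number.
 * Hypothetical helper; all other CSCn fields are left at zero. */
static uint32_t mc_csc_select_field(unsigned cs_bits)
{
    assert(cs_bits < 8);
    return (uint32_t)cs_bits << 19;
}

/* Returns 1 when the address matches the chip select described by
 * cscn under ba_mask, 0 otherwise. */
static int mc_cs_match(uint32_t cscn, uint32_t ba_mask, uint32_t addr)
{
    uint32_t mask = ba_mask & 0xFFu;        /* BA_MASK[7:0]         */
    uint32_t sel  = (cscn >> 16) & 0xFFu;   /* CSCn[23:16]          */
    uint32_t bits = (addr >> 21) & 0xFFu;   /* input address[28:21] */
    return (sel & mask) == (bits & mask);
}
```

For instance, with a BA_MASK of 0x38 only address bits [26:24] take part in the comparison, so mc_cs_match(mc_csc_select_field(5), 0x38, addr) holds for every address whose bits [26:24] equal 101, regardless of bit 27.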
128. he question will result in an evaluation of the pros and cons of using open cores. This is why we discuss the advantages as well as the possible risks in this section.

The most obvious advantage of using open cores is that they accelerate the design of new products. Reusing blocks that have already been designed is much better than redesigning them from the beginning.

Another very attractive point of open cores is the price, because most open cores are free to get. This greatly lowers the amount of money needed to start a new project, and is thus especially good for small companies. Take our thesis project as an example: all we needed for the project was a development board which cost 329 dollars, a PC and several cables. All the IP cores are free of charge. If we had used commercial cores, the price would have been too high to afford.

The third advantage is that the open cores are adaptable. Because all the source code is open, necessary changes can easily be made to integrate open cores into new systems. Commercial cores, in comparison, usually use certain techniques to hide the design details from the users. Making changes to these cores therefore takes quite a long time and requires help from the vendors.

One more advantage that often gets ignored is that open cores can make the users feel safe. Because open cores have all their design details in public, users can see those details although they may no
129. he source codes are covered by any one of them, you may call it open source.

2.1.3 Licenses Involved to Evaluate

Now we have established the open source licenses as the target to study. The next question is to define which licenses are involved, because the OSI list of licenses is really long and it is not possible to go through them all. All open cores used in the thesis project are covered by either the LGPL or the BSD license. Because the LGPL is based on the GPL, all 3 licenses, i.e. the GPL, the LGPL and the BSD license, will be introduced in the later sections. Actually the open cores used in the thesis are not covered by the exact BSD license but by a BSD-style license. However, the introduction to the original BSD license will still be useful.

2.1.4 GNU and FSF

Because the GPL and the LGPL will be introduced later, several words related to the background are worth mentioning. The letter G in GPL and LGPL stands for GNU, which is the name of a project to develop a Unix-like operating system that is completely free software. The GNU licenses were initially designed to protect the liberty of the free software of that project from being violated. Later they became more and more popular, and they are now widely used to cover a large proportion of free software all over the world. Sometimes the GNU is also treated, incorrectly, as an organization that is responsible for the project, probably because its website is named www.gnu.org. But in fact it i
130. he step by step instructions. Basically there are 3 big steps:
• Review the Quartus project and program the FPGA on the DE2-70.
• Download the software project to the DE2-70 with the bootloader.
• Run the software to send music data and play it on the board.

B.2.1 Quartus Project and Program FPGA

1.1 Start Quartus II and open the Quartus project in the hardware folder.
1.2 The top-level entity is in hardware/component/top/orpXL_top.vhd. The entity includes all the pins allocated on the EP2C70. Figure 1 shows the file.
1.3 As we can see, all modules of the project are saved in different folders under hardware. There is one module that needs to be mentioned a little more. The ram folder contains an on-chip RAM module. It is configured as 64KB. The ram/ram0.mif in the same folder is the data file that will be written into the RAM when the FPGA is programmed. In our thesis zip file this ram0.mif contains the data of the bootloader. So if you do not change this file, the bootloader will be downloaded to the board at the same time when the FPGA is programmed. It is possible to replace this ram0.mif with something else. For example, you may write your own program, convert it to the MIF format with

[Figure 1: screenshot of components/top/orpXL_top.vhd in the editor, showing the entity orpXL_top with the ports clk_50 and rst_n]
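To make the role of ram0.mif more concrete, the following C sketch writes a list of 32-bit words in the ALTERA MIF text format. It is not the ihex2mif tool from the thesis archive, whose source is not shown here. The function names are hypothetical, and the sketch covers only the output side, assuming the words have already been extracted from the program image.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format one MIF content line, e.g. "0001 : DEADBEEF;".
 * Returns the number of characters written (excluding the NUL). */
static int mif_format_word(char *buf, size_t n, uint32_t addr, uint32_t word)
{
    return snprintf(buf, n, "%04lX : %08lX;\n",
                    (unsigned long)addr, (unsigned long)word);
}

/* Write `depth` 32-bit words as a complete MIF file.
 * Returns 0 on success, -1 if the stream reports an error. */
static int mif_write(FILE *out, const uint32_t *words, uint32_t depth)
{
    char line[32];
    assert(out != NULL && words != NULL);
    fprintf(out, "WIDTH=32;\nDEPTH=%lu;\n", (unsigned long)depth);
    fprintf(out, "ADDRESS_RADIX=HEX;\nDATA_RADIX=HEX;\nCONTENT BEGIN\n");
    for (uint32_t i = 0; i < depth; i++) {
        mif_format_word(line, sizeof line, i, words[i]);
        fputs(line, out);
    }
    fputs("END;\n", out);
    return ferror(out) ? -1 : 0;
}
```

The addresses in a MIF file count memory words, not bytes, so a 32-bit wide RAM advances by one address per 4 bytes of program data.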
131. hesis project. That is a regret.

• Support a wide range of memory devices
The Memory Controller is designed for general purpose and is not specific to a certain type of memory IC. It supports a wide range of memory devices, including SSRAM, SDRAM, FLASH, ROM, EEPROM, etc. Almost all commonly used memory chips that can be configured for 32-bit computing systems can easily be driven by the Memory Controller. On the DE2-70 board we have a 512K × 36 SSRAM chip and two 16M × 16 SDRAM chips. The Memory Controller can work with both types after the parameters are set correctly.

6.2.2 Hardware Configurations

Before using the Memory Controller, it has to be correctly configured. There are 2 types of configurations. The first has to be done at the hardware level, for example the address allocation. To do the hardware configuration, the parameters of the HDL code have to be modified. Those parameters take effect in the FPGA internal logic after the compilation by the Quartus. The other type of configuration can be done at the software level by writing the Memory Controller registers, for example the timing configurations for the external memory ICs. The software programs are responsible for doing this, usually during the initialization phase.

In this section we introduce the hardware configurations, including the address allocation, the power-on configuration (POC) and some other HDL modifications. The software configurations like the tim
132. hive.pl?Language=English&No=226, Last visit: 2011-01-31.
Lin Zuo, System-on-Chip design with Open Cores, Master Thesis, Royal Institute of Technology (KTH), ENEA, Sweden, 2008, Document Number: KTH/ICT/ECS-2008-112.
Webpage: The online page of the thesis and project archive, http://www.olivercamel.com/post/master_thesis.html, Last visit: 2011-01-31.

Chapter 2 Open Cores in a Commercial Perspective

This chapter might look a bit odd in an engineering thesis, but it was really the very first task to do, even before the implementation of the thesis started. Because the thesis involves open cores, and most open cores are covered by various open source licenses like the GPL, the LGPL and the BSD license, a good understanding of those licenses is inevitably needed when evaluating open cores. To quote Johan Jörgensen, our thesis supervisor: "We want a platform that is reconfigurable, no-frills and cheap, that can be used for in-house projects, but we do not want to hit the mines of open source. So we need to know if open cores are a safe bet and what kind of performance they will have on a given FPGA."

In this chapter 3 of the most widely used open source licenses, the GPL, the LGPL and the BSD license, are introduced. If building an open core based system really comes true, another topic is also interesting for companies: would the open source licenses limit the system for commercial purposes? What are the benefits and the trade-offs? As the thes
133. if converts HEX files to the ALTERA MIF format. The proloader behaves as a bootloader, which loads the programs from a PC via a serial connection.
Started to create a Hardware Abstraction Layer (HAL) library to collect the hardware interface functions for easier software development at a higher level.
Understood how to support context switches with the OpenRISC CPU and ported the uC/OS-II RTOS to the OR1200.
Extended the computing platform with Audio and Ethernet features.
Developed an MP3 player application to demonstrate the whole system.

My partner Lin Zuo did some more tasks, like porting the uC/TCP-IP stack and comparing the performance with an equivalent system built with the ALTERA IP cores. For these parts please refer to his thesis [1].

Chapter 1 also mentioned 4 initial purposes of the thesis. Here they are repeated:
1. Evaluate quality, difficulty of use and the feasibility of open source IPs.
2. Design the system in a FPGA and also evaluate the system performance.
3. Investigate license issues and their impact on commercial use of open source IP.
4. Port embedded Linux to the system.

Only the porting of embedded Linux could not be done, due to the time limitation. The evaluation of the system performance is described in my partner's thesis [1]. An important motivation to do this thesis was to evaluate the quality, difficulty and feasibility of the open cores. The way of the evaluation was
134. ighest 0xF selects the slave interface 15, and the 2nd highest 4 bits, 0x5, match the rf_addr 0101. As we can see from the example, because some addresses are reserved for the Register File, the address space that can be used for the external slave is reduced. In the last example, the IP core connected to the slave port 15 will have 2 valid address ranges: from 0xF0000000 to 0xF4FFFFFF and from 0xF6000000 to 0xFFFFFFFF. All addresses starting with 0xF5 are reserved for the Register File.

5.3.5 Functional Notices

To describe some notices of the functions of the CONMAX, an example is designed with the waveform shown in Figure 5.15.

[Figure 5.15: An example of using CONMAX (waveform of clk, rst, m0_cyc_i, m0_stb_i, m0_addr_i, m1_cyc_i, m1_stb_i, m1_addr_i, s0_cyc_o, s0_stb_o, s0_addr_o, s15_cyc_o, s15_stb_o and s15_addr_o)]

The figure displays 2 masters and 2 slaves working with the CONMAX. The m0_xxx_i and m1_xxx_i are the signals coming from the 2 masters and are connected to the master interfaces 0 and 1 of the CONMAX. Similarly the s0_xxx_o and s15_xxx_o are the signals that are output from the CONMAX to the 2
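The Register File decoding described at the start of this excerpt can be checked with a few lines of C. The function name conmax_is_rf_access is hypothetical. The sketch only models the address comparison, not the CONMAX logic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when addr falls into the Register File window:
 * bits [31:28] select slave interface 15 and bits [27:24]
 * equal the 4-bit rf_addr parameter. */
static int conmax_is_rf_access(uint32_t addr, unsigned rf_addr)
{
    assert(rf_addr < 16);
    return ((addr >> 28) & 0xFu) == 15u &&
           ((addr >> 24) & 0xFu) == rf_addr;
}
```

With rf_addr set to 0101, every address starting with 0xF5 is claimed by the Register File, while 0xF4FFFFFF and 0xF6000000 still reach the external slave on port 15.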
135. ime, the programmers must check the flags in the PICSR and decide the sequence in which to handle the interrupts. There is no need to clear the flags in the PICSR after the interrupts are served. However, the interrupts must be cleared at the source nodes. For example, if the GPIO IP core triggers an interrupt, the ISR must read the GPIO registers to clear the interrupt signal from there. The PIC keeps sampling all external inputs and refreshes the PICSR flags automatically. So when the GPIO IP core pulls down the interrupt line, the flag is cleared at the same time in the PICSR.

4.7 Porting uC/OS-II to OR1200

4.7.1 Introduction

In this section we discuss how to port the uC/OS-II RTOS to the OR1200 processor. Porting means modifying the uC/OS-II so that it can work on the OR1200 based hardware platform. The standard reference for the uC/OS-II is the book MicroC/OS-II: The Real-Time Kernel [15]. It is written by Jean J. Labrosse, the creator of the uC/OS. Chapter 13 of the book describes in general how to port the uC/OS-II to different types of processors. It is a very important reference for us.

The uC/OS-II RTOS has good portability because most of the source code is written in ANSI C. When porting it to the OR1200, only a few files need to be adapted. They are OS_CPU.h and OS_CPU_C.c. In our thesis project archive [16] these files can be found under software/uCOS-II/Port.

4.7.2 uC
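The interrupt handling rule at the start of this excerpt (check the PICSR flags, choose the order, clear the interrupt at the source) can be sketched in C. Everything in the sketch is hypothetical: the function pic_dispatch, the demo handlers and the line numbers are not taken from the thesis archive, and the PICSR value is passed in as a parameter instead of being read from the SPR so that the code runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_LINES 32u                 /* assumption: one PICSR flag per bit */

typedef void (*isr_fn)(void);

/* Stand-ins for real ISRs. A real handler would read the device
 * registers here so that the device releases its interrupt line. */
static unsigned demo_uart_hits, demo_gpio_hits;
static void demo_uart_isr(void) { demo_uart_hits++; }
static void demo_gpio_isr(void) { demo_gpio_hits++; }

/* Serve every pending line in the order chosen by the programmer.
 * Returns the number of handlers that were called. The PICSR flags
 * are not written back: they follow the interrupt lines by themselves. */
static unsigned pic_dispatch(uint32_t picsr, const unsigned *order,
                             unsigned count, const isr_fn *handlers)
{
    unsigned served = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned line = order[i];
        assert(line < PIC_LINES);
        if ((picsr & ((uint32_t)1 << line)) && handlers[line] != NULL) {
            handlers[line]();
            served++;
        }
    }
    return served;
}
```

The order array is where the software priority lives: putting a line earlier in the array means its handler runs first when several flags are set at once.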
136. in Chapter 6. In total 5 open cores are used in the system. All of them are available at opencores.org [4]:
• OpenRISC OR1200 IP Core
• CONMAX IP Core
• Memory Controller IP Core
• UART16550 IP Core
• GPIO IP Core

My partner Lin Zuo and I have decided to open the designs we made to the public as well. So most parts of the FPGA project, i.e. the open cores and our designs, are open source. The FPGA project can be found in the project archive file at [5].

3.2.2 Summary of Addresses

From the perspective of the programmers, a hardware system is more or less a group of registers with different addresses. So a list of addresses can help to understand the system. A summary of the addresses of the system is given in Table 6.1 below. Because the OpenRISC has a 32-bit address bus, all addresses are 32 bits long. All addresses are statically allocated and can be accessed directly by software. The values of the addresses are decided partly by the CONMAX IP core and partly by the design of the peripheral IP cores themselves.

0x00000000 - 0x00007FFC    32KB On-chip RAM
0x10000000 - 0x101FFFFC    2MB SSRAM
0x18000000 - 0x1800004C    1st Memory Controller Control Registers
0x20000000 - 0x23FFFFFF    64MB SDRAM
0x28000000 - 0x2800004C    2nd Memory Controller Control Registers
0xC0000000                 DM9000A Index Register
0xC0000004                 DM9000A Data Register
0xD0000000                 WM8731 Control Register
0xD000001
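Such an address list translates directly into a lookup table in software. The C sketch below is an illustration only: the type map_region and the function map_lookup are hypothetical names, and the table contains only the 5 ranges whose start and end addresses are both visible in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    base;   /* first byte address of the region */
    uint32_t    size;   /* length in bytes                  */
    const char *name;
} map_region;

static const map_region address_map[] = {
    { 0x00000000u, 0x00008000u, "On-chip RAM (32KB)" },
    { 0x10000000u, 0x00200000u, "SSRAM (2MB)" },
    { 0x18000000u, 0x00000050u, "1st Memory Controller registers" },
    { 0x20000000u, 0x04000000u, "SDRAM (64MB)" },
    { 0x28000000u, 0x00000050u, "2nd Memory Controller registers" },
};

/* Returns the index of the region containing addr, or -1 if none. */
static int map_lookup(uint32_t addr)
{
    const size_t n = sizeof address_map / sizeof address_map[0];
    for (size_t i = 0; i < n; i++) {
        assert(address_map[i].size != 0);
        /* Unsigned subtraction wraps around, so an address below
         * the base also fails this single comparison. */
        if (addr - address_map[i].base < address_map[i].size)
            return (int)i;
    }
    return -1;
}
```

The top 4 bits of each base address equal the CONMAX slave port of the corresponding IP core, which is why the regions start at multiples of 0x10000000 or, for the Memory Controller registers, at an offset of 0x08000000 from them.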
137. information
A Step by Step Instruction to Repeat the Thesis Project
B.1 Hardware/Software Developing Environment
B.1.1 DE2-70 Board
B.1.2 A RS232 to USB Cable
B.1.3 An Ethernet Cable
B.1.4 A Speaker or an Earphone
B.1.5 A PC
B.1.6 Cygwin
B.1.7 OpenRISC Toolchain
B.1.8 Quartus II
B.1.9 The Thesis Archive File
B.2 Step by Step Instructions
B.2.1 Quartus Project and Program FPGA
B.2.2 Download Software Project by Bootloader
B.2.3 Download Music Data and Play
References

List of Figures
2.1 Difference between LGPL and GPL
2.2 Difference between open software and open core
2.3 Increasing competitors force the price lower
3.1 Platform overview
3.2 DE2-70 board
3.3 Hardware block diagram of the DE2-70
3.4 Open core system block diagram
3.5 Software development workflow
4.1 OR1200 architecture
4.2 Improve perform
138. ing between 2 machines, usually used together with the RS-232 standard. For embedded systems the UART is the easiest way and the top selected solution to set up a connection between the PC and the target board. With the serial connection it is possible to use terminal software like PuTTY on the PC to control or debug the target systems. For the thesis project we also needed such a connection for the open core platform. That became the reason to involve the UART16550 IP core.

Several features of the UART16550 are worth stressing:
1. The UART16550 is WISHBONE compliant. It has a WISHBONE interface, which makes the IP core easy to integrate into an OpenRISC based system. Thanks to this feature we spent only half a day to introduce the IP core to the project.
2. The UART16550 has 2 FIFOs, always enabled, for transmitting and receiving. The FIFO size is 16 bytes [14], and it is possible to set different interrupt triggering levels. The FIFOs give a large improvement in the UART communication performance. They can buffer more data before an overflow when the CPU has to interrupt the current task to process the UART data.
3. The UART16550 supports 4 interrupts. All of them share the same output signal INT_O, which needs to be routed to the OpenRISC processor.
4. The UART16550 IP core implements the UART logic only. It still needs an RS-232 transceiver, like the ADM3202 used on the DE2-70 board, to reach the RS-232 electrical character
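How much time the FIFO buys the CPU can be estimated with a small calculation, sketched here in C. The model is a simplification and rests on assumptions that are not from the thesis: 10 bits per character (8N1 framing), a receive interrupt raised exactly when the trigger level is reached, and the receiver shift register ignored. The function name uart_rx_budget_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FIFO_SIZE     16u   /* bytes, as stated for the UART16550 */
#define UART_BITS_PER_CHAR 10u   /* assumption: 8N1 framing            */

/* Approximate time in microseconds between the receive interrupt at
 * `trigger_level` bytes and the moment the FIFO is full. */
static uint32_t uart_rx_budget_us(unsigned trigger_level, uint32_t baud)
{
    assert(trigger_level >= 1 && trigger_level <= UART_FIFO_SIZE);
    assert(baud > 0);
    uint64_t free_chars = UART_FIFO_SIZE - trigger_level;
    return (uint32_t)(free_chars * UART_BITS_PER_CHAR * 1000000u / baud);
}
```

Under these assumptions a trigger level of 8 at 115200 baud leaves roughly 694 microseconds for the CPU to react, while a trigger level of 14 leaves about 173 microseconds. A lower trigger level means more interrupts but a larger safety margin.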
139. ing and embedded applications.
• Competitors include ARM10, ARC and Tensilica RISC processors.

The OR1200 has a clear architecture, shown in Figure 4.1. It has a CPU/DSP core at the center and includes data/instruction caches, data/instruction Memory Management Units (MMUs), a timer, an interrupt controller, a debug unit and a power management unit.

(Footnote: These 2 documents were updated in late 2010. We didn't have them when doing the thesis.)

[Figure 4.1: OR1200 architecture]

The OR1200 has a Harvard architecture, i.e. it has separate instruction and data buses. To maximize the CPU performance by utilizing the Harvard architecture, it is recommended to use 2 physically isolated memories to store the instructions and the data. This is shown in Figure 4.2 (a). Unfortunately, in the thesis project we used a non-optimized structure like Figure 4.2 (b), which limited the CPU performance. It was too late to change the design when we realized it.

[Figure 4.2 (a): Instruction/data accesses in parallel, with separate instruction RAM and data RAM on the WISHBONE network. (b): Instruction/data accesses share the same RAM port.]
Figure 4.2: Im
140. ing parameters for the SSRAM and the SDRAM will be discussed in the next section. All the Memory Controller hardware parameters are contained in the file mc_defines.v, where the users can easily modify them based on the system requirements. The file is globally included by all other source files.

6.2.2.1 Address Allocation

The Memory Controller IP core and the external memory devices attached to it have to be given a proper range of addresses, so that the CPU knows where to access the data in the physical spaces. The Memory Controller IP core has a 32-bit address bus. It can be divided into 4 sections when considering the address allocation.

First, as discussed before, the Memory Controller is directly connected to the WISHBONE network through the CONMAX IP core. The slave port of the CONMAX that the Memory Controller is connected to decides the highest 4 bits [31:28] of the addresses given to the Memory Controller (please refer to Section 5.3.4). For example, if the Memory Controller is on port 10 of the CONMAX, the assigned addresses range from 0xA000_0000 to 0xAFFF_FFFF.

Second, in the mc_defines.v 2 definitions, MC_REG_SEL and MC_MEM_SEL, determine whether the internal registers of the Memory Controller or the external memory spaces are being accessed. If the expression of the MC_REG_SEL is true, the Memory Controller internal registers are selected. If the expression of the MC_MEM_SEL is true, t
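The first two sections of the address can be computed as in the following C sketch. The function names are hypothetical. The choice of bit 27 for the register window is an assumption taken from the example address 0xA800_0010 used in this project, not a fixed property of the IP core.

```c
#include <assert.h>
#include <stdint.h>

/* First address of the window that CONMAX gives to a slave port. */
static uint32_t conmax_slave_base(unsigned port)
{
    assert(port < 16);
    return (uint32_t)port << 28;
}

/* Last address of the same window. */
static uint32_t conmax_slave_last(unsigned port)
{
    return conmax_slave_base(port) | 0x0FFFFFFFu;
}

/* Base of the Memory Controller register window inside that range.
 * Assumption: bit 27 set selects the registers (project setting). */
static uint32_t mc_register_base(unsigned port)
{
    return conmax_slave_base(port) | ((uint32_t)1 << 27);
}
```

For port 10 the window runs from 0xA0000000 to 0xAFFFFFFF and the registers start at 0xA8000000, so a register at offset 0x10 appears at 0xA8000010.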
141. int out the problem: a paradox regarding the concept of free. The GNU argues that freedom has no conflict with making money. But from my perspective, freedom and free of charge are actually the same thing, or rather, freedom will finally result in free of charge. So to speak, the GNU philosophy is incompatible with the business rules that are widely applied in our economic society nowadays.

This is easy to understand, because it is human nature that we always choose the better way for ourselves. If there is something that you can get freely, no one will pay to get it anymore. Take the air as an example, which everyone can breathe. Would you like to spend money on that? If a salesman comes to you and says "Hey, we have a new product called free air, which is no different from the normal air but costs just 100 dollars", will you buy it? The easier it is to get something, the cheaper it is, and vice versa. This explains why water is free of charge in some countries but has to be paid for in some others, and may be even more expensive than a life in the desert.

Therefore, while the GNU licenses are used to grant freedom to users, they also make free software easier to get. This will reduce the price of the free products and meanwhile lower the profit that developers could have had. Now let's think about what will happen from a business perspective. Suppose we are a software company
142. iple it is totally possible to define a custom protocol for a system, as long as both the master and the slave understand the tags in the same way. The tag signals need specialized logic design in both the master and the slave IP cores too. Again, these signals are not mandatory. Most existing WISHBONE IP cores do not have them.

Group 5
LOCK_O: Indicates the current bus transaction is uninterruptible.

LOCK is another signal that is almost never in use. The usage of the signal is not clearly described and is only mentioned in Chapter 3.3 of the WISHBONE specification (page 51) [2]. According to the waveform in the figure, when the block read/write transactions happen, the LOCK could be used together with the CYC to hold a transaction uninterrupted.

The information from the specification is not enough, but in the WISHBONE forum there was a message [4] about LOCK, answered by Richard Herveille, the author of the WISHBONE specification, which explains a little more about the signal. The message is copied below.

What is the purpose of the LOCK_O signal? From the description, it says that it indicates that the bus cycle is uninterruptible. However, the statement "Once the transfer has started, the INTERCON does not grant the bus to any other MASTER, until the current MASTER negates LOCK_O or CYC_O." If deasserting CYC_O causes the lock to end, then this signal doesn't really do anything more than asserting CYC_O by itself, does it
143. is just to connect all other IP cores to the CONMAX.

If we take a look at the website of opencores.org, there are several other IP cores that also implement similar functions. But we finally chose the CONMAX among the competitors for the following reasons: 1) it supports more masters and slaves, 2) it provides more priority levels, and 3) it is designed by Rudolf Usselmann from asics.ws [5]. According to our experience, the IP cores from that team have better quality and more detailed documents.

The CONMAX IP core is not complicated. Its source code is well structured and comes with a clear and concise document [3]. People who are good at Verilog HDL can skip the thesis and turn to the source code and the official IP core document instead.

5.3.2 CONMAX Architecture

There is a figure in the CONMAX document that exhibits the structure of the core well. It is copied here as Figure 5.14. As we can see, there are 8 master interfaces supporting a maximum of 8 WISHBONE masters, and also 16 slave interfaces for up to 16 slaves. Besides, there is a Register File included in the slave interface 15, which is used to save the priorities of the master interfaces.

[Figure 5.14: Core architecture overview (master interfaces 0 to 7, slave interfaces, and the Register File)]

5.3.3 Register File

The Register File is a group of
144. is part of the source code can be found in software/Application/Board/reset.S of the thesis project archive.

In the timer interrupt, the first thing we do is always to back up the context of the current task. If another higher-priority task becomes ready, its context will be loaded; in this case the context switch is really performed. Otherwise, the context of the current task will be resumed as if the timer interrupt had never happened.

The OR1200 context includes 3 SPRs and all GPRs. In a multitasking system, these registers are needed for each task. The 3 SPRs (EPCR, EEAR and ESR) are updated by the OR1200 when an interrupt is triggered; they store the program counter (PC), the effective address and the CPU status.

The following assembly code demonstrates how to save a context. All registers are pushed onto the stack of the current task by using the stack pointer (SP, GPR r1) as the base address plus different offsets.

    l.addi  r1,r1,-140      # get enough space for a context
    l.sw    36(r1),r9       # store r9 before using it temporarily
    l.mfspr r9,r0,32        # copy EPCR to r9
    l.sw    128(r1),r9      # save EPCR
    l.mfspr r9,r0,48        # copy EEAR to r9
    l.sw    132(r1),r9      # save EEAR
    l.mfspr r9,r0,64        # copy ESR to r9
    l.sw    136(r1),r9      # save ESR
    l.sw    8(r1),r2        # save GPRs to stack except r0, r1, r9:
    l.sw    12(r1),r3       # because r0 is always 0,
    l.sw    16(r1),r4       # r9 has been saved before,
    l.sw    20(r1),r5       # r1 as the SP will be saved later
    l.sw    24
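The offsets used in the assembly listing above follow a simple pattern: GPR rN is stored at byte offset 4*N, and the three SPRs follow at 128, 132 and 136, which gives the 140 bytes reserved on the stack. The C view below is only a sketch of that layout; the type name `or1200_context_t` and the helper `gpr_offset` are hypothetical and do not come from the thesis sources, and what the slots at offsets 0 and 4 hold is not shown in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 140-byte frame built by the save sequence.
 * GPR rN sits at byte offset 4*N (r2 -> 8, r9 -> 36), the SPRs follow.
 * Slots 0 and 1 (offsets 0 and 4) are an assumption of this sketch. */
typedef struct {
    uint32_t gpr[32];   /* offsets 0..124 */
    uint32_t epcr;      /* offset 128: saved program counter */
    uint32_t eear;      /* offset 132: saved effective address */
    uint32_t esr;       /* offset 136: saved CPU status */
} or1200_context_t;

/* Compile-time checks against the offsets used in the assembly. */
static_assert(sizeof(or1200_context_t) == 140, "frame must be 140 bytes");
static_assert(offsetof(or1200_context_t, epcr) == 128, "EPCR at 128");
static_assert(offsetof(or1200_context_t, eear) == 132, "EEAR at 132");
static_assert(offsetof(or1200_context_t, esr) == 136, "ESR at 136");

/* Byte offset of GPR n inside the frame. */
static size_t gpr_offset(unsigned n)
{
    return offsetof(or1200_context_t, gpr) + 4u * n;
}
```

Describing the frame as a structure lets the compiler check the offsets, so a change to the frame size cannot silently disagree with the assembly.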
145. is project was executed in the company ENEA, we believe that an analysis of open cores from a commercial perspective would be useful for them. So some general discussion about utilizing open cores on behalf of a company is also included in the chapter. We hope the introductions to and evaluations of the open cores and the open source licenses in this chapter can answer 2 questions: As a company project manager, should we develop new open cores? Or should we improve existing open cores and integrate them into the next products?

2.1 Basic Concepts

In this section several basic concepts regarding open cores are introduced.

2.1.1 What is Open Core

Open core is short for Open Source Intellectual Property (IP) Core, which is a combination of the concepts of open source and IP core.

There is no need to spend many words on introducing IP cores, because they are well known and already discussed in plenty of articles such as [1][2]. In short, the increasing complexity of electronic systems and the time-to-market pressure force engineers to reuse already-made blocks, i.e. IP cores, as much as possible in the next system, which is the so-called design reuse methodology. Semiconductor IP cores actually have a broad definition and can be any reusable unit of logic, cell or chip layout design [3], but in this thesis the discussion of the IP core is narrowed down to digital logic blocks desig
146. ise the OSTickISR will return and the context of the previously running task will be resumed, which is done in the file reset.S.

Resuming a context is very similar to storing a context, but in the opposite direction: all buffered registers are copied back from the stack to the CPU.

At the end, the OR1200 instruction l.rfe is used to return from the interrupt. The instruction restores the program counter (PC) from the EPCR and the status register from the ESR, so the CPU continues with the new PC and status if a context switch has happened.

4.7.4 Summary

We have now introduced how a context switch is performed in the OR1200 timer interrupt. The other type of context switch, triggered by l.sys, is done in practically the same way with another function, OSCtxSw.

In summary, supporting context switching is the core of porting the uC/OS-II to the OR1200 processor. Once this part is carefully taken care of, the rest of the uC/OS-II port is not difficult to understand.

References
[1] Website: OpenCores.org, http://www.opencores.org. Last visit: 2011-01-31.
[2] Webpage: OpenRISC Project Overview, from OpenCores.org, http opencores org project or1k. Last visit: 2011-01-31.
[3] Webpage: Damjan Lampret's homepage, http://www.lampret.com. Last visit: 2011-01-31.
[4] Website: ORSOC, http://www.orsoc.se. Last visit: 2011-01-31.
[5] Marcus Erlandsson, OpenRISC project update, http opencores org forum
147. istics. For the details of how to use the IP core, please refer to the UART16550 specification [15].

To conclude this section, the UART16550 is an open source IP core designed with a WISHBONE interface. This feature makes it suitable for platforms that internally use the WISHBONE interconnection, for example OpenRISC-based systems. Our experience also shows that a serial connection can be built quickly with the UART16550 IP core, which saves development time and improves productivity.

6.4 GPIO IP Core

The General Purpose Input/Output (GPIO) IP core is used to drive four 7-segment LEDs and monitor 4 buttons on the DE2-70 board. The IP core is available at opencores.org [16]. It is designed by Damjan Lampret and Goran Djakovic.

The GPIO IP core might sound a little too simple to be called an IP core, but it is really helpful to have such a ready-to-use IP core at hand. If we had to start the I/O design from scratch, it would most likely take longer than expected.

The GPIO IP core is tiny yet powerful and multifunctional. Several features are highlighted below:
1. Supports up to 32 pairs of general inputs and outputs.
2. Supports an external clock input, so the input signals can be sampled on the rising edge of that clock.
3. Supports bidirectional ports, so the I/Os can be set to tri-state or open-drain for external buses, provided the FPGA has suc
148. its. So generally, if someone is using source code covered by the GPL, he/she has the rights to run, copy, distribute, study, change and improve the software [16]. On the other hand, if someone decides to publish programs under the GPL, he/she grants these rights to future users and customers.

To make sure the freedom is guaranteed for all current GPL users and all potential users in the future, the license designers also defined obligations that GPL users must follow while enjoying the freedom. Four main obligations can be summarized from the long text of the GPL:
1. Keep using the GPL forever.
2. Provide the source code in any case.
3. Publish the modified parts.
4. Make the GPL cover the entire new project.

First and foremost, the GPL keeps covering the source code forever; once it is applied, there is no way to remove or bypass it. If people edit GPLed source code and redistribute it, the GPL is the only possible license for the revised work. They cannot use any other license to cover the new work instead of the GPL, because "This License gives no permission to license the work in any other way" (GPL section 5).

There are many licenses that in principle do not conflict with the GPL, like the BSD license. Such a license is called GPL-compatible. People can create a new project by combining 2 separate works where one is c
149. ization: ram0.mif. File Generation: only the .vhd file would be enough.

Table 6.1: Parameters of 1-port RAM IP core

Please note that we were using Quartus 8.0 sp1 Web Edition, which determines the version of the on-chip RAM IP core.

The more interesting part is how to decide the values of the parameters. In the later sections we are going to discuss several of them.

6.1.3 Interface Logic to the WISHBONE bus

Because the signals of the ALTERA 1-port RAM IP core are not exactly the same as the WISHBONE signals, some interface conversion logic is needed. The file ram0_top.vhd describes the hardware design of this part. Figure 6.2 gives an overview of the structure.

The interface logic works as a wrapper around the 1-port RAM core. The inner blue block is the 1-port RAM generated by Quartus. The grey part is the wrapper that converts the signals of the 1-port RAM to the standard WISHBONE signals.

[Figure 6.2: On-chip RAM module internal structure. Signals shown: clk, rst, stb, cyc, we, sel[3:0], addr[31:0] (with addr[14:2] going to the RAM), data_i[31:0], data_o[31:0], enable, and ack produced by the Acknowledge Logic block.]

One of the WISHBONE signals is the acknowledgement, which the 1-port RAM does not have. As you may have noticed, this signal is generated by separate logic (the big white block) outside the 1-port RAM core. So when receiving a RAM
150. k side. This resolves the data width mismatch. For example, CPU accesses to the addresses 0x0, 0x1, 0x2 and 0x3 all go to the same physical address 0x0 in the memory block.

The side effect of grouping 4 bytes into one 32-bit memory block entry is that whichever of the 4 contiguous bytes the CPU tries to access, the memory block always returns the same 32-bit data. To identify which byte or bytes among the 4 are wanted, the SEL signal has to be used.

According to the definition in the WISHBONE specification [3], the Active Select Line (SEL) signal indicates where valid data is expected on the DATA_I signal during read cycles, and where it is placed on the DATA_O signal during write cycles. The SEL signal always travels from the WISHBONE master to the slave, i.e. from the OpenRISC CPU to the memory block. The width of the SEL signal equals the width of the data bus divided by the data granularity, so it is 32 / 8 = 4 in our case.

As its name suggests, the SEL signal selects the valid portion of the data signals. Each line of the SEL signal matches one byte on the data bus. If one of the 4 SEL bits is high, the corresponding byte on the data bus is valid and will be accepted and processed. The other bytes are ignored regardless of their values. Table 6.2 also shows how the bytes on the data bus are matched with the SEL lines in big-endian and little-endian systems respectively. Figure 6.3
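The grouping of 4 byte addresses into one 32-bit entry and the role of the SEL lines can be sketched in C. This is only an illustration under stated assumptions: the function names `word_index` and `sel_mask` are hypothetical, and the byte-lane mapping (bit i of the mask stands for the SEL line covering data bits 8i+7 down to 8i, with the lowest byte address on the most significant lane in a big-endian system) is assumed here because Table 6.2 is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the 32-bit memory entry that holds a given byte address:
 * byte addresses 0x0..0x3 all map to entry 0. */
static uint32_t word_index(uint32_t byte_addr)
{
    return byte_addr >> 2;
}

/* SEL mask for an access of `size` bytes (1, 2 or 4) at byte_addr.
 * Returns 0 if the access would not fit inside one 32-bit entry. */
static unsigned sel_mask(uint32_t byte_addr, unsigned size, bool big_endian)
{
    unsigned first = byte_addr & 3u;      /* byte offset inside the entry */
    unsigned mask = 0;

    if (size == 0 || first + size > 4u)
        return 0;

    for (unsigned k = 0; k < size; k++) {
        unsigned offset = first + k;
        /* Assumed lane mapping: big endian puts offset 0 on lane 3. */
        unsigned lane = big_endian ? 3u - offset : offset;
        mask |= 1u << lane;
    }
    return mask;
}
```

Under these assumptions a byte read at offset 0 in a big-endian system selects only the top lane (mask 0x8), while a full 32-bit access selects all four lanes (mask 0xF) regardless of endianness.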
151. le it will ruin the data communication.

3.4 Now start the music player that we just downloaded through the bootloader. It is a similar command but with other parameters:

    proloader_client.exe -d /dev/com5 -r

The -d option specifies the RS232 port. The -r option tells the bootloader to jump to the entry point of the external SSRAM, so the program stored there starts working and the job of the bootloader is done. If the program starts without any problem, you will see that the LAN is connected, as shown in Figures 9 and 10.

3.5 Now I want to explain the reset switch and the bootloader. In the project, Switch 17 is used to reset the system. When a reset is needed, push Switch 17 down first and then up. Because the default starting address points to the on-chip RAM, it is always the bootloader that starts after a reset. To restart the software application, run the command from step 3.4 again to let the bootloader jump to the external RAM. If the power of the DE2-70

[Screenshot: Cygwin terminal showing a hexadecimal memory dump; contents not recoverable.]
152. lowing the OR1K architecture. So when talking about a processor, OR1200 should be used as the exact name. But because the OR1200 is currently the only active implementation of the OR1K family, and the OR1K is the only architecture under the OpenRISC project, the name OpenRISC is sometimes used loosely in place of OR1200 to refer to the processor. There is another term called OpenRISC Reference Platform (ORP). It stands for computing platforms built around the OR1200 as the processor. Our thesis project was in fact to work out an ORP.

The OpenRISC has a long history: the project was created in September 2001 [2]. It was initiated by Damjan Lampret, who had also founded opencores.org a little earlier, in October 1999 [3]. It is not hard to imagine that promoting and supporting the OpenRISC was a big reason for giving birth to opencores.org. As the OpenRISC is an open source project, many other people later became involved and contributed. Their names are listed under Past Contributors and Project Maintainers [2].

From 2005, Damjan started his own company and gradually became less active in the OpenRISC community. In November 2007, a Swedish company, ORSoC [4], took over the maintenance of opencores.org as well as the OpenRISC project, and it has maintained them until today. It is good to have the strong backing of a commercial company to push the project forward. In a posted message, Marcus Erlandsson, the CTO of ORSoC, menti
153. ly generates an interrupt. The context switch is done in the ISR. For more information about l.sys, see also Section 4.4.

The uC/OS-II requires a CPU timer to provide interrupts, also called ticks, at a fixed time interval. When a timer interrupt comes, the uC/OS-II interrupts the running task and checks whether another higher-priority task has become ready. If so, the uC/OS-II performs a context switch in the timer ISR. So after any timer ISR, it is always the highest-priority task in the running or ready state that gets the CPU. This is why the uC/OS-II is called a preemptive kernel.

4.7.3 Context Switching in OR1200 Timer Interrupt

No matter how a context switch is initiated, it is always done in a similar way at the ISR level. Here we only describe the context switch that happens in the OR1200 timer ISR. Generally, the context switch is done in the following steps:
1. The OR1200 jumps to 0x500 when a timer interrupt comes.
2. Back up the context of the current task, including 3 OR1200 SPRs (EPCR, EEAR, ESR) and all GPRs.
3. Store the SP of the current task.
4. Call the function OSTickISR.
5. If a higher-priority task is ready, call OSIntCtxSw to perform a context switch. Otherwise, resume the context of the current task.

As mentioned in Section 4.4, when a timer interrupt comes, the OR1200 CPU jumps to the interrupt vector address 0x500 and executes the interrupt handler program from there. Th
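The decision taken in step 5 can be modelled in a few lines of C. This is a simplified sketch, not the kernel's code: the table `task_ready`, the constant `NUM_TASKS` and the functions `highest_ready` and `tick_isr_decide` are hypothetical names, and a plain array is assumed here in place of the kernel's own ready list. As in the uC/OS-II, a lower number means a higher priority.

```c
#include <assert.h>

#define NUM_TASKS 4            /* hypothetical task count for this sketch */

/* Hypothetical ready table: index = priority, 0 is the highest.
 * The running task counts as ready too. */
static int task_ready[NUM_TASKS];

/* Return the highest-priority ready task, or -1 if none is ready. */
static int highest_ready(void)
{
    for (int prio = 0; prio < NUM_TASKS; prio++) {
        if (task_ready[prio])
            return prio;
    }
    return -1;
}

/* Step 5 of the timer ISR: return the task that gets the CPU next.
 * A result different from `current` means a context switch is needed;
 * the same value means the current context is simply resumed. */
static int tick_isr_decide(int current)
{
    int next = highest_ready();

    if (next >= 0 && next < current)
        return next;           /* a higher-priority task became ready */
    return current;            /* resume the interrupted task */
}
```

The model makes the preemptive behaviour visible: the interrupted task keeps the CPU only while no task with a smaller priority number is ready.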
154. mber 2008.
[3] Specification: Specification for the WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision B.3, released September 7, 2002.
[4] User manual: OpenRISC 1000 Architecture Manual, OpenCores Organization, Revision 1.3, April 5, 2006.
[5] Webpage: Memory Controller IP Core, OpenCores Organization, http www opencores org project mem_ctrl. Last visit: 2011-01-31.
[6] Rudolf Usselmann, Memory Controller IP Core, Rev. 1.7, January 21, 2002.
[7] Website: ASICS.ws, http://asics.ws. Last visit: 2011-01-31. This is the website of a specialized FPGA/ASIC design team. They produced many free IP cores with very good quality and documents.
[8] Datasheet: 256K x 72, 512K x 36, 1024K x 18 18Mb Synchronous Pipelined, Single Cycle Deselect Static RAM, Rev. I, Integrated Silicon Solution Inc. (ISSI), October 14, 2008.
[9] Datasheet: 32Meg x 8, 16Meg x16 256-MBit Synchronous DRAM, Rev. D, Integrated Silicon Solution Inc. (ISSI), July 28, 2008.
[10] User manual: Nios II Processor Reference Handbook, ALTERA Corporation, November 2009. Available at http www altera com literature hb nios2 n2cpu_niibdvi pdf. Last visit: 2011-01-31.
[11] Webpage: UART 16550 Core, OpenCores Organization, http www opencores org project uart16550. Last visit: 2011-01-31.
[12] Webpage: PC16550D Universal Asynchronous Receiver/Transmitter with FIFOs, National Semicond
155. named A Company that is going to sell a newly developed, brilliant product. Deeply impressed by the free software movement, we decide to use the GPL to cover our source code and make the product free software. For each copy of the software sold, we receive 1000 dollars. That sounds perfect.

Everything is fine in the first several days, because the product is excellent and has a good market, until another company, B Company, appears. B Company was A Company's customer at the beginning, so they got a copy of the source code together with the product. Soon, however, they start selling a similar product, easily developed from our source code, which costs only 500 dollars.

This is immoral, but surprisingly we cannot prevent it under the GPL, because: 1) the GPL asks us to provide the source code when selling products; 2) the licensees are allowed to redistribute (convey) the source code under the GPL; 3) "You may charge any price or no price for each copy that you convey" (GPL section 4). All of the above shows that B Company's behavior is legal according to the GPL.

Now assume you are the next customer coming to buy the product. There are 2 choices, A and B. Even if you know that B is the immoral one, what do you do?

The story above includes only 2 competitors, A and B. Actually, if a price of 500 dollars is still high enough for a proper profit, more companies will continue joinin
156. nd hold all other signals unchanged for one more clock cycle.
  ELSE (the ACK = 1):
    IF the current CTI is not an EOB, set the STB to 1; meanwhile,
      IF it is a write burst, output the next data to the slaves.
      ELSE, if it is a read burst, latch the current data and output the next address.
    ELSE, if the CTI = EOB, set the STB to 0; meanwhile,
      IF it is a write burst, de-assert all signals to finish the burst.
      ELSE, if it is a read burst, latch the last data and de-assert all signals to finish the burst.
  IF no wait state is needed, then that's it. Go to the next clock rising edge.
  ELSE, if the masters currently are unable to handle more data, a wait state is inserted by resetting the STB to 0. All other signals can output as X. However, the masters should still go through the previous IF-ELSE block and somehow remember what the output signals should be. In case of read bursts and the ACK is 1, the masters should latch the current data before they turn into the wait states. When the masters come back from the wait states, they should resume all signals remembered before they fell into the wait states. Note that at the edges when masters return from wait states, the only thing they do is to resume the remembered signals. The first IF-ELSE block is skipped at that clock edge.

For slaves: At every clock rising edge, check the value of the STB.
  IF the STB = 0:
    IF no burst is started
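The visible part of the master rule (the ACK branch and the CTI/EOB decision) can be written as a small decision function. This is a sketch only: the names `master_step`, `master_out` and the action labels are hypothetical, wait states are left out, and because the beginning of the ACK = 0 branch is cut off in this excerpt, the sketch assumes that the STB simply keeps its current value in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* What the master does with its data and address outputs on this edge. */
typedef enum {
    HOLD_SIGNALS,               /* ACK = 0: keep everything for one more cycle */
    WRITE_NEXT_DATA,            /* write burst continues */
    READ_LATCH_AND_NEXT_ADDR,   /* read burst continues */
    WRITE_FINISH,               /* write burst ends, de-assert all signals */
    READ_LATCH_LAST_AND_FINISH  /* read burst ends, latch last data */
} master_action;

typedef struct {
    bool stb;                   /* value driven on STB after this edge */
    master_action action;
} master_out;

/* One rising clock edge of a burst, without wait states. */
static master_out master_step(bool ack, bool cti_is_eob, bool is_write,
                              bool stb_now)
{
    master_out out;

    if (!ack) {
        out.stb = stb_now;      /* assumption: STB is held as well */
        out.action = HOLD_SIGNALS;
    } else if (!cti_is_eob) {
        out.stb = true;
        out.action = is_write ? WRITE_NEXT_DATA : READ_LATCH_AND_NEXT_ADDR;
    } else {
        out.stb = false;
        out.action = is_write ? WRITE_FINISH : READ_LATCH_LAST_AND_FINISH;
    }
    return out;
}
```

Keeping the decision in a pure function of the sampled inputs mirrors the "check at every rising edge" style of the rule and makes each branch easy to test on its own.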
157. ned by HDL and targeted at FPGA/ASIC. This is because all open cores used in the thesis project are of this type.

2.1.2 Formal Definition of Open Source

When coming to the other concept behind the open core, open source, most people simply take it literally as "you can look at or get a copy of the source code". However, this is a common misunderstanding.

The Open Source Initiative (OSI) [4], an official organization of the open source community, has a formal definition of the term open source, which can be found on the Internet at the link [5]. It is too long to cite the full text here. If you have not read the definition before, you will find that open source is a more complicated concept than you might imagine. It does not only refer to access to the source code, but also defines many criteria that must be followed. It is important to be aware of the actual effects of those criteria before publishing products as open source or utilizing anything that is open source, which naturally includes open cores.

Because the OSI definition of open source is too long, in this thesis I would like to simplify the definition to "the source code that is covered by certain open source licenses", so that we can concentrate on studying the open source licenses only. A list of approved open source licenses can be found on the OSI website [6]. Different licenses in the list may have different rules and regulations, but if t
158. ns other signals like the tag signal TGC or the LOCK, which should be involved in block transactions. The TGC could be used to identify which type of transaction is ongoing, so the slave knows whether the current transaction is a single or a block transaction. However, this is in fact not necessary: if a slave can process read/write operations correctly, it does not have to recognize what the current transaction is. And in systems without preemption, the LOCK is useless too.

5.2.3.3 Read-Modify-Write (RMW) Transaction

The WISHBONE specification also defines a kind of transaction named Read-Modify-Write (RMW). It is said that the RMW transactions are used for indivisible semaphore operations [2].

Despite the complicated name, the RMW transactions are fairly simple. In fact, an RMW can be seen as a block transaction with 2 different operations: the first one is a read operation and the second is a write operation. So with an RMW transaction we can read data, modify it, and then write it back to the same address. The RMW waveform is shown in Figure 5.8.

[Figure 5.8: An example of a WISHBONE RMW transaction]

In my opinion, the RMW and the block transactions can be merged together. The block transactions are essentially a batch of bus operations which have to be either all reads or all writes. The RMW transactions contain 2 bus operations th
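The semaphore use of an RMW transaction can be illustrated with a small software model. It is a sketch only: `memory`, `bus_rmw`, `set_to_one` and `semaphore_try_take` are hypothetical names, the bus is modelled as a plain array, and the indivisibility that the real bus provides by keeping the cycle open across the read and the write is represented here simply by doing both inside one function call.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16                    /* hypothetical memory size */

static uint32_t memory[MEM_WORDS];      /* stands in for the slave's storage */

typedef uint32_t (*modify_fn)(uint32_t old_value);

/* Model of one RMW transaction: read, modify and write back to the
 * same address as a single unit. Returns the value that was read. */
static uint32_t bus_rmw(uint32_t word_addr, modify_fn modify)
{
    uint32_t old_value = memory[word_addr];   /* first phase: read */
    memory[word_addr] = modify(old_value);    /* second phase: write */
    return old_value;
}

static uint32_t set_to_one(uint32_t old_value)
{
    (void)old_value;
    return 1u;
}

/* Test-and-set semaphore: returns 1 if the caller obtained it. */
static int semaphore_try_take(uint32_t word_addr)
{
    return bus_rmw(word_addr, set_to_one) == 0u;
}
```

Because the read and the write belong to one transaction, no other master can change the semaphore between the test and the set, which is the property the specification refers to as indivisible.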
159. o the memory alignment, which could help to make better software.

One tip is that using 32-bit variables is the most efficient choice for performance in OpenRISC OR1200 systems. Bus transactions that access char (8-bit), short (16-bit) and int (32-bit) variables take the same time to complete. When accessing 8-bit data, 75% of the bandwidth of the 32-bit bus is not utilized, which is a big waste. So if several adjacent bytes are accessed frequently in a program, like a part of an array, it is better to define an integer pointer to access them all together. Besides, accessing a 64-bit long type takes twice the time of an int type. So if possible, using int instead of long helps to shorten the program execution time.

Another tip helps to save memory space when defining structures. When there is a struct variable in the source code, the compiler allocates memory for each member of the structure from top to bottom, but the memory alignment rule still has to be followed. So where necessary, certain bytes of memory are skipped and left unused. This is called memory padding. A carefully chosen order of the members in a structure helps to eliminate the padding and achieve the maximum memory utilization.

[Figure 6.5: Example of memory padding. Two structure definitions, struct padding and struct no_padding; the member listing is not recoverable.]

Figure 6.5 gives
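The effect of member order on padding can be tried with a small C example. The two structures below are hypothetical and are not the ones from Figure 6.5; the sizes in the comments assume a typical ABI with a 4-byte, 4-byte-aligned int and a 2-byte, 2-byte-aligned short, and other targets may give other numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Members in an unfavourable order. With the assumed ABI:
 * a (1) + 3 bytes padding + b (4) + c (1) + 1 byte padding + d (2) = 12 */
struct padded {
    char  a;
    int   b;
    char  c;
    short d;
};

/* Same members, largest first. With the assumed ABI:
 * b (4) + d (2) + a (1) + c (1) = 8, no padding at all */
struct reordered {
    int   b;
    short d;
    char  a;
    char  c;
};

/* Bytes in struct padded that carry no data. */
static size_t wasted_bytes_padded(void)
{
    size_t payload = 2 * sizeof(char) + sizeof(int) + sizeof(short);
    return sizeof(struct padded) - payload;
}

/* Bytes in struct reordered that carry no data. */
static size_t wasted_bytes_reordered(void)
{
    size_t payload = 2 * sizeof(char) + sizeof(int) + sizeof(short);
    return sizeof(struct reordered) - payload;
}
```

Sorting members from the largest to the smallest alignment requirement is a common rule of thumb that removes most padding without any compiler-specific packing attributes.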
160. ollowing:
1. Evaluate quality, difficulty of use and the feasibility of open source IP.
2. Take the design through synthesis and place & route in order to evaluate the highest possible clock frequency that can be achieved when running a system in a low-cost FPGA. This phase should also include FPGA utilization (LUTs required). Apart from these two metrics, the students should identify other metrics that can be used for performance evaluation and also define and measure a quality metric.
3. Investigate license issues and their impact on commercial use of open source IP.
4. Test and run the system on an evaluation platform using embedded Linux.

We expect the thesis to be carried out by two students at our office in Malmo. We further expect you to be an SoC major. As a master thesis student at ENEA you will be offered a great deal of flexibility and freedom. We expect you to be self-motivated and able to work independently.

A.2 Further information

For further information regarding this thesis project, please contact Johan J rgensen.

Appendix B: A Step-by-Step Instruction to Repeat the Thesis Project

This is a step-by-step instruction that shows how to repeat the thesis project. For people who are interested, it should be easier to reproduce the same MP3 player as we did by following this instruction.

B.1 Hardware/Software Developing Environment

Below is a list of hardware devices and software
161. om the addresses that do not end with 0x0, 0x4, 0x8 or 0xC. For example, the following lines are illegal:

    unsigned int i = *(unsigned int *)0xXXXXXXX1;   /* illegal */
    unsigned int i = *(unsigned int *)0xXXXXXXX2;   /* illegal */
    unsigned int i = *(unsigned int *)0xXXXXXXX3;   /* illegal */

Those lines are illegal because the 4 bytes being accessed are not stored in the same physical location of the memory block. Figure 6.4 below shows the case. In the figure, each line is a physical address that contains 32-bit data. The memory block can only read or write a whole line in one operation. So data at addresses with address mod 4 = 0 can be accessed in one operation. If we try to access 32-bit data stored, for example, at address 0x2, the 4 bytes of the data, i.e. 0x2 to 0x5, cross the line. So the access cannot be done in 1 operation but needs 2 instead.

[Figure 6.4: Legal and illegal memory accesses. Accessing 32-bit data at address 0xC is legal; accessing 32-bit data at address 0x2 is illegal because the 4 bytes 0x2 to 0x5 cross the line.]

To avoid illegal accesses, the OpenRISC CPU sets rules for data storage, which are called the Memory Model or the Memory Alignment. According to the OpenRISC Architecture Manual [4], Section 7.1 (also mentioned in 3.2.2 and 16.1.1): Memory is byte-addressed, with halfword accesses aligned on 2-byte boundaries, singleword accesses aligned on 4-byte boundaries, and doubleword accesses aligned on 8-by
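The alignment rule and the "crossing the line" argument can be checked with two small helper functions. They are a sketch with hypothetical names (`is_aligned`, `lines_touched`), assuming 32-bit memory lines as in Figure 6.4 and access sizes of 1, 2, 4 or 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes is aligned when its address is a
 * multiple of its size (2, 4 or 8 byte boundaries). */
static bool is_aligned(uint32_t addr, unsigned size)
{
    return (addr % size) == 0;
}

/* Number of 32-bit memory lines covered by an access of `size` bytes
 * starting at `addr`. More than 1 means the access cannot be served
 * by a single memory operation. */
static unsigned lines_touched(uint32_t addr, unsigned size)
{
    uint32_t first_line = addr >> 2;
    uint32_t last_line = (addr + size - 1u) >> 2;
    return (unsigned)(last_line - first_line) + 1u;
}
```

For the example in the text, a 32-bit access at 0xC stays inside one line, while the same access at 0x2 covers bytes 0x2 to 0x5 and therefore two lines.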
162. oned: "Our mission goal is to make the OpenRISC a world-class open source 32-bit RISC processor, aimed both for commercial companies as well as for non-commercial products" [5]. So it looks as if the OpenRISC will have a promising future.

There are several companies that provide OpenRISC-related products or services. Beyond Semiconductor [6], the company started by Damjan Lampret, provides enhanced commercial versions of the OpenRISC, which are renamed as the BA processor family. ORSoC [4] has developed an OpenRISC development kit, which includes an FPGA CPU board, an I/O board, a debugger, etc. Because ORSoC is currently the maintainer of opencores.org, it is possible that the kit will be promoted as the standard OpenRISC development platform. Another company called Embecosm [7] mainly focuses on software development. They have worked a lot on porting the GNU toolchain and on the OpenRISC simulator. And there is a Korean company, Dynalith Systems [8], which has made a complete SW/HW environment called OpenIDEA.

As for OpenRISC documents and tutorials, honestly only a few are useful for beginners. On the hardware side, the OpenRISC 1000 Architecture Manual [9] is always the standard reference. Also the recently updated OpenRISC 1200 IP Core Specification [10] and the OpenRISC 1200 Supplementary

Footnote: The price for the FPGA board plus a debugger, excluding the I/O board, was about 260 EUR. Strangely, ORSoC chose an Actel FPGA fo
163. ontroller IP Core

As described in the previous section, the FPGA on-chip RAM gives the best performance compared to the other types of memory, but it usually has very limited storage capacity. To satisfy the need for memory, external RAMs have to be used. There are a 2 MB SSRAM chip and a 64 MB SDRAM chip on the DE2-70 FPGA board. They are wired directly to the FPGA chip. With the help of the Memory Controller IP core, we easily managed to use these external RAMs in the system.

The Memory Controller IP core works like a bridge between the OpenRISC CPU and the external RAM ICs. It has a WISHBONE interface, so it can simply be connected to the WISHBONE network or directly to the OpenRISC CPU. It also has an external memory interface to drive the memory ICs. When the CPU or other WISHBONE masters try to access the external memories, the WISHBONE bus transactions are sent to the Memory Controller, where the read/write requests are translated into the expected logic signals with correct timing, according to the type of the external memory chips and the Memory Controller configuration.

[Figure 6.6: Memory Controller in the OpenRISC system. Inside the FPGA, the OpenRISC processor connects through the WISHBONE network to the Memory Controller IP core, which has a WISHBONE interface on one side and an external memory interface on the other; the latter drives the external memory device.]

The text of this section does not intend to analyze the internal architecture
164. or copy an exact Cygwin environment to another PC. To fix this trouble I spent some time looking for help, and now there is a solution. The file installed.db under the folder cygwin/etc/setup is actually a list of the names of the packages that have been installed. If we back up installed.db and store it in the same path when running setup.exe, the installer will display those packages as installed. Then we can simply choose to reinstall all these packages, which creates a Cygwin environment exactly as the installed.db file specifies. See this webpage for more information [4]. My installed.db is included in the thesis archive.

B.1.7 OpenRISC Toolchain

The OpenRISC toolchain converts a high-level programming language like C into binary instructions for the OpenRISC processor. When I was a beginner, the most difficult part of the thesis project was to get a working toolchain. Most commercial CPUs, for example ALTERA's NIOS, come with an IDE that includes everything. With just several clicks, all the compiling, downloading and debugging is done. But in the OpenRISC world, without IDEs, we have to go through all these difficulties ourselves.

The OpenRISC toolchain is a modified version of the GNU toolchain, which is free software under the GPL. The OpenRISC toolchain developers fulfilled their duties as the GPL asks: the source code on the SVN of opencores.org can be downloaded fre
165. ource codes, i.e. it is a must to show ALL source code that generates the final products. For example, if someone started with a GPLed work and spent a lot of time improving it, then when selling the improved version he/she cannot just provide the original source code without the details of the modifications. Under the GPL, nothing may be kept secret. All details of the modifications and improvements have to be public, although not everyone might be happy to do so.

Most disagreements and arguments come from the last obligation, that the GPL must cover the entire project, which has already drawn a lot of criticism [22][23]. In section 0, the GPL defines the term modify: "To modify a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy." And under "Conveying Modified Source Versions" in section 5, term c), the GPL sets the rule: "You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way."

These clauses make the GPL spread to parts which previously might not have belonged to a GPL-covered work. This is why some people call the GPL infectious. When we are going
166. overed by the BSD license and the other by the GPL without any problem. However, only the GPL can be used to cover the new project, not the BSD license. So once the GPL is there, it will always be there. The license must be kept forever for all improved versions in the future.

The 2nd obligation expresses well the idea of the GPL as an open source license: the source code has to be provided in any case. When a modified GPLed work is redistributed in source form, it is easy to understand that the code should be open. But even if the final product is in a non-source form, according to section 6 of the GPL the source code that generates the final product still has to be provided with the product. For example, if a company produces and sells GPLed software in a non-source form, like binary or executable files, it normally has to provide either an extra CD containing the source code or a web server from which the users can download those files freely. This obligation makes the GPL stronger than many other open source licenses, like the BSD license, which do not ask for a copy of the source code. In the hardware world, for open cores, this means that when silicon chips are sold, certain HDL source code and/or design details have to be made public.

The 3rd obligation makes the GPL even stronger. It forces its users to also publish the modified parts when providing the s
167. pen software will finally be compiled into a binary or executable format, which is still software; however, the HDL codes of open cores will be synthesized by EDA tools and transformed into hardware in the end. As Figure 2.2 shows, in a hardware system every open core will be connected with the others via a bus or other interconnection structures. As a result, the definition of aggregate is no longer satisfied in this case, and the GPL will be applied to the whole system. [Figure 2.2: Difference between open software and open core. (a) Software aren't linked in computers; (b) IP cores link with others in chips.] So to speak, GPLed open cores are not recommended, because they will force other parts of the system to become open source, which is generally something that companies would not like to see. There are indeed some really good open cores under the GPL, like the LEON processor with SPARC architecture [25], but unless they are used solely to form a system, i.e. to make a silicon chip that includes only one open core so that no other cores will be infected, open cores covered by the LGPL will always be a better choice than those covered by the GPL. To sum up this section, both the GPL and the LGPL are very strict open source licenses. They grant users the freedom to use source codes, but meanwhile stipulate obligations that must be followed. Gen
168. plementation of the protocol, i.e. the CONMAX IP core. The bus protocol organizes the whole system. It is so important that we will use a separate chapter for it. In Chapter 6, the memory blocks and the peripherals of the system are introduced, including the Memory Controller IP core, the UART 16550 IP core, the GPIO IP core, and the interfaces for the WM8731 and the DM9000A, etc. Chapter 7 finally concludes the thesis and also provides a to-do list of topics that can be interesting to research in the future. At the end, 2 appendixes are attached. Appendix A is a copy of the thesis announcement made by Johan Jorgensen. Appendix B is a step-by-step instruction on how to reproduce the project on a DE2-70 board with the files downloaded from [7].
References
[1] Website, ENEA, http://www.enea.com, last visit 2011-01-31. Enea is a global software and services company focused on solutions for communication-driven products.
[2] Website, OpenCores.org, http://www.opencores.org, last visit 2011-01-31. OpenCores is a community that enables engineers to develop open source hardware with a similar ethos to the free software movement.
[3] Webpage, Advertising at OpenCores, from OpenCores.org, http opencores org opencores advertise, last visit 2011-01-31.
[4] Website, Terasic Technologies, http://www.terasic.com.tw, last visit 2011-01-31.
[5] Webpage, Altera DE2-70 Board, from Terasic Technologies, http www terasic com tw cgi bin page arc
169. prove performance with Harvard architecture. In the system of Figure 4.2 (a), the CPU can access the instructions and the data stored in 2 memory devices in parallel. But in Figure 4.2 (b), the instruction port and the data port of the OR1200 have to share the only memory device, i.e. only 1 of them is allowed to access the memory at a time. Furthermore, because the CPU usually needs to access the instructions and the data alternately, the WISHBONE network, i.e. the CONMAX IP core, has to arbitrate between the ports to decide which one is granted the WISHBONE bus. This is another negative factor that affects the system performance. So the CPU performance is much more limited in Figure 4.2 (b) than in (a). The OR1200 uses the WISHBONE as the bus standard for both the instruction and the data buses. The WISHBONE bus as well as the CONMAX IP core will be discussed in Chapter 5. As mentioned before, all OR1200 HDL source files can be downloaded from the opencores.org website [2]. In most cases there is no need for the users to modify the files, except for or1200_defines.v. In or1200_defines.v the users can easily configure the OR1200 implementation, for example to enable or disable functional blocks, or to select the target FPGA memory types, etc. It is necessary to go through this file before compiling the OR1200 FPGA project. Of all 8 functional blocks shown in Figure 4.1, we disabled the instruction d
170. r everyone to modify the IP core, integrate it into another project and redistribute the project (or sell it, in other words), as long as the header information is kept in all source files related to the Memory Controller. The restriction of the license is really nothing compared to what we gain from the core. For more information about the BSD license, please refer to Chapter 2. The Memory Controller is another IP core used in this project that was produced by Rudolf Usselmann and ASICS.ws [7]. Like many other IP cores coming from ASICS.ws, for example the CONMAX IP core, the Memory Controller also gave us a very good impression because of its good quality, useful documentation and a lot more. (Footnote: See Figure 27 on page 43 of the Memory Controller IP core user manual [6].) We hereby thank the author again for this great contribution. Below are several summarized highlights of the Memory Controller IP core. They may become the reasons that convince you to use the core in your next project.
• Improve the productivity. The top reason to use an IP core is always to achieve design reuse and therefore accelerate the development of new projects. But sometimes making a decision to use an IP core can be tricky, because if documents are lacking or the IP core is troublesome to get working, debugging it can cost much more time than expected. However, this is not the case for the Memory Controller. The b
171. r the board but not using products from Xilinx or ALTERA. Footnote 2: OpenIDEA looks very attractive on Dynalith's website because it is highly integrated and might be the easiest way to start implementing some real OpenRISC-based projects. But the price was about 800 USD when we asked. That was too high for a thesis project, I think. Programmer's Reference Manual [11] could be useful. On the software side, people can just follow the instructions on the OpenRISC official toolchain webpage [12] to get the OpenRISC compiler and debugging toolset. When trying to build the toolchain manually, an application note [13] from Jeremy Bennett is recommended as a reference. So far we have introduced the OpenRISC processor from many aspects. In the following sections, the OR1200 architecture will be described first, followed by several topics related to the OR1200, including the registers, the exceptions, the Tick Timer (TT) and the Programmable Interrupt Controller (PIC). After that, a section is reserved for the details of porting the uC/OS-II Real-Time Operating System (RTOS) to the OR1200.
4.2 OR1200 Features and Architecture
To give an overview of the CPU performance, some of the OR1200 features [10] are listed below:
• 32-bit RISC processor
• Harvard architecture
• 5-stage pipeline
• Cache and MMU supported
• WISHBONE bus interface
• 300 Dhrystone 2.1 MIPS at 300 MHz using 0.18u process
• Target medium and high performance network
172. rding to the WISHBONE specification, an interface should respond only when the logic AND of the CYC and the STB is 1. So the delay of the CYC sacrifices the performance of the whole system to achieve a complicated arbitration scheme.
3. The CONMAX broadcasts bus transactions to idle slave ports, except for the CYC and the STB signals. As in the example, s15_addr_o is the same as s0_addr_o although no one is accessing 0x8 and 0xC in the s15. The designers have to be very careful in dealing with the broadcasting. In some cases even the STB may be delivered incorrectly for about 1 to 2 cycles, because the arbiter is judging the CYC and hasn't made the decision yet. The only safe way is to always strictly check the CYC and the STB at every rising clock edge, and not to respond to the bus transaction unless the logic AND of the 2 signals is 1.
5.3.6 Arbitration
In the CONMAX specification, the only thing that is not so clear is the arbitration. For example, it says that if all priorities are equal, the arbiters work in a round-robin way. But how is the round robin performed? After studying the source codes, this section tries to summarize the rules of the CONMAX arbitration.
1. The CONMAX is reset by the RST signal of the WISHBONE. All RST signals in a WISHBONE network are normally connected together. If any RST input becomes 1, the CONMAX will reset.
2. After resetting all registers of the Register File of th
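As a side note to the broadcasting issue in item 3 above, the rule "do not respond unless the logic AND of the CYC and the STB is 1" can be sketched as a small behavioural model in C. This is only an illustration under assumptions: the struct, its field names and the one-register slave are hypothetical and are not taken from the CONMAX or from any IP core used in the project, where this check is of course written in HDL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inputs sampled by a slave at one rising clock edge (illustrative names). */
struct wb_slave_in {
    bool     cyc;   /* CYC_I */
    bool     stb;   /* STB_I */
    bool     we;    /* WE_I  */
    uint32_t adr;   /* ADR_I, may carry a broadcast value meant for another slave */
    uint32_t dat;   /* DAT_I */
};

/* Hypothetical one-register slave. Returns the value of ACK_O for this edge.
   'reg' is changed only by a write that is qualified by CYC and STB. */
static bool wb_slave_edge(const struct wb_slave_in *in, uint32_t *reg)
{
    assert(in != NULL && reg != NULL);
    if (!(in->cyc && in->stb))
        return false;        /* not selected: ignore ADR, DAT and WE entirely */
    if (in->we)
        *reg = in->dat;      /* qualified write */
    return true;             /* acknowledge the qualified access */
}
```

With this check in place, an address or a write enable that is merely broadcast to an idle port has no effect, because the register is never touched while CYC AND STB is 0.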
173. re everyone can take the codes, modify, improve and redistribute them. The BSD license requires nothing regarding which license to use for the modified versions. So it is entirely possible to put another, commercial license on top of the BSD license. This is very good news for vendors, because they can take a BSD-licensed IP core, integrate it into a new product and then sell it the way they want with their own commercial licenses. For the open cores, perhaps the most attractive part of the BSD license is that it doesn't ask you to publish the modified source codes when redistributing or selling the products in a non-source form. On the contrary, the GNU licenses do. Think about the open cores, which are in the form of HDL source codes: after the products are finally built and ready to sell, the open cores will no longer be HDL codes but will have become silicon chips. Because the BSD license doesn't force anyone to provide source codes, companies can safely make modifications to the BSD-licensed open cores, combine them with their own patents, and happily sell the new chips without telling their competitors the design details. To conclude, the BSD license is a quite loosely restricted open source license. Modified source codes that were originally covered by the BSD license do not even have to be open source any more. This makes the license completely compatible with commercial purposes. Companies are welcome to utilize open cores under
174. read/write request, we always give a 1-cycle ACK after a certain time, i.e. the ACK signal does not rely on the 1-port RAM core outputs. This may seem unreasonable, because we are acknowledging whatever the outputs of the 1-port RAM core are. But because the 1-port RAM actually never goes wrong and has a fixed delay, we can assume that the outputs will always be ready at a certain point, and therefore set the ACK signal at that moment. Also note that each ACK should be set high for only 1 clock cycle, as mentioned in the WISHBONE chapter. Otherwise the CPU will be confused, because it will interpret this as multiple ACKs returned from the RAM core.
6.1.4 Data Organization and Address Line Connection
If you take a close look at the source codes in ram0_top.vhd, you may be curious about the line ram_address <= wb_addr_i(14 downto 2); It means that only the WISHBONE address inputs [14:2] of the 32 WISHBONE address lines are connected to the 1-port RAM core address lines [12:0]. This is also illustrated in Figure 6.2. The way the addresses are connected was not decided arbitrarily. On the contrary, it is clearly specified in the WISHBONE bus standard. Table 6.2 below is copied from the WISHBONE specification [3], section 3.5, page 66. It tells how to organize data in systems with a 32-bit bus width and 8-bit granularity. Note that this is a WISHBONE RULE, i.e. all WISHBONE implementations must obey it. 32 bit Data Bus Wit
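The address connection of section 6.1.4 can also be read as plain arithmetic: dropping the two lowest address bits turns a byte address into a 32-bit word index, and keeping 13 bits selects one of 8192 words. The following C sketch only restates that mapping; the function and macro names are made up for illustration and do not appear in ram0_top.vhd.

```c
#include <stdint.h>

/* wb_addr_i(14 downto 2) feeds 13 RAM address lines [12:0]. */
#define RAM_ADDR_BITS 13u

/* Word index seen by the 1-port RAM for a given WISHBONE byte address.
   Address bits above bit 14 are not connected and therefore have no effect. */
static uint32_t ram_word_index(uint32_t wb_byte_addr)
{
    return (wb_byte_addr >> 2) & ((1u << RAM_ADDR_BITS) - 1u);
}

/* Position of a byte inside its 32-bit word (0..3). On the bus this
   information is carried by the SEL lines, not by the address lines. */
static uint32_t byte_offset_in_word(uint32_t wb_byte_addr)
{
    return wb_byte_addr & 0x3u;
}
```

For instance, the byte addresses 0x4 to 0x7 all map to word index 1, which is why the two lowest address lines can be left unconnected on a 32-bit data port.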
175. resources available. The users have to implement the tri-state buffers for the Memory Controller on their own. In the thesis project we made 2 tri-state buffer files, for the SSRAM and the SDRAM respectively. The files are stored in the folder hardware/components/memif.
6.2.2.4 Miscellaneous SDRAM Configurations
In mc_defines.v there are 2 hardware configurations for the SDRAM only: the refresh cycles and the power-on operation delay. Those parameters can be found in the SDRAM datasheet.
6.2.2.5 Use Same Type of Devices on One Memory Controller
One important lesson we learnt from the thesis project is to always attach the same type of memory devices to the same Memory Controller. So, for example, if a Memory Controller is set up to support an external SSRAM chip, do not attach any SDRAM or FLASH memory device to this controller. In fact, we tried to put the SSRAM and the SDRAM on the same Memory Controller but with different CS signals. In this way we could better utilize the 8 CS signals and also save FPGA resources, because only 1 Memory Controller is needed for both the SSRAM and the SDRAM. But the trial was not successful. The SSRAM became slower together with the SDRAM, because the Memory Controller is always refreshing the SDRAM. During that time, accesses to the SSRAM are forbidden. Also, wrong data were sometimes read back. In the end we decided to separate the SSRAM and the SDRAM with 2 different Memory Controllers
176. rposes is also an attractive topic.
3. The fast system prototyping via FPGA development board. Compared to the traditional ASIC design flow, which takes a long time and is also more costly and risky, fast system prototyping on an FPGA-based development board has proven to be a good methodology to accelerate functional verification. Many vendors have designed a large number of high-performance FPGA boards for this purpose. One example is Terasic's DE2-70 board [4, 5] that we used for the thesis project. The DE2-70 is equipped with a high-density ALTERA Cyclone II FPGA, large-volume RAM/ROM components and plenty of peripherals, including audio devices and an Ethernet interface. It provided us with an ideal hardware platform to try out open cores, and allowed us to quickly build up an open-core-based digital system with demonstrations.
Because of the 3 factors described above, it becomes feasible to execute the project, which implements an open-core-based computing system on a DE2-70 FPGA board that is low cost but powerful and versatile. If such a system can be made, it would be interesting for both commercial and academic purposes. With the platform, many other possibilities can be further explored. As mentioned before, the idea was initially proposed by Johan Jorgensen from ENEA, who is our industrial supervisor. The thesis is also coached by Ingo Sander, our supervisor from KTH. The thesis was performed by Xi
177. rst and when it is a write. If it is a write, the slaves don't do anything special other than latching the written data at every rising edge. However, if it is a read, the slaves need to pre-calculate the next address based on the current address and the value of the BTE, and then use the calculated address to access the next data for the masters.
4. The WISHBONE specification does not mention that the slaves have the ability to predict the next address automatically. But this is true. I found a message on the opencores.org forum which said so. And in this way all waveforms in the specification are well explained.
5. When the masters assert the STB for one clock cycle, they are saying that there is more data to read or write. When the slaves assert the ACK for one clock cycle, it implies that the last operation has been taken care of and that they are ready to handle the next read or write.
6. Both the masters and the slaves are allowed to pause the current burst, i.e. to insert wait states (WSM or WSS), at any time during the burst. This will be described later.
A general algorithm for the WISHBONE burst transactions is given here, which lists in detail everything that the masters and the slaves should do at every rising clock edge to perform bursts.
For masters: Firstly, at a rising clock edge, initiate a burst by asserting the STB, CTI, BTE and other signals. After that, at every rising clock edge, check the value of the ACK. IF the ACK = 0, set the STB to 1 a
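The address pre-calculation that the slaves perform during a read burst can be sketched in C as follows. It is a minimal sketch under stated assumptions: a 32-bit data port with byte addressing (so the address advances by 4 per beat), and the BTE encodings of the WISHBONE B3 registered-feedback cycles (00 linear, 01 4-beat wrap, 10 8-beat wrap, 11 16-beat wrap). The function and enum names are hypothetical; real slaves do this in HDL.

```c
#include <assert.h>
#include <stdint.h>

/* Burst Type Extension values as encoded on BTE_I. */
enum wb_bte {
    WB_BTE_LINEAR = 0,
    WB_BTE_WRAP4  = 1,
    WB_BTE_WRAP8  = 2,
    WB_BTE_WRAP16 = 3
};

/* Next byte address of an incrementing burst on a 32-bit data port. */
static uint32_t wb_next_burst_addr(uint32_t byte_addr, enum wb_bte bte)
{
    uint32_t word = byte_addr >> 2;   /* work on 32-bit word addresses */
    uint32_t mask;

    assert((byte_addr & 0x3u) == 0u); /* burst beats are word aligned */

    switch (bte) {
    case WB_BTE_WRAP4:  mask = 0x3u; break;
    case WB_BTE_WRAP8:  mask = 0x7u; break;
    case WB_BTE_WRAP16: mask = 0xFu; break;
    default:            return (word + 1u) << 2;   /* linear: no wrapping */
    }
    /* Only the low bits selected by the mask count up; the rest is held. */
    word = (word & ~mask) | ((word + 1u) & mask);
    return word << 2;
}
```

As a usage example, a 4-beat wrap burst starting at 0x38 visits 0x38, 0x3C, 0x30 and 0x34, while a linear burst simply continues with 0x40 after 0x3C.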
178. s
Pipelined Memory Access    Yes        Yes
Branch Prediction          Static     Dynamic
Tightly Coupled Memory     Optional   Optional
Table 6.5: NIOS II processor comparison (part)
To conclude this section, based on the tables and the analysis above, we can make a reasonable assumption: it is likely that our open cores system will have a dramatic performance improvement after enabling the WISHBONE burst transaction between the OpenRISC CPU and the Memory Controller.
6.3 UART16550 IP Core
The UART16550 IP core is another open source IP core coming from opencores.org. The source codes can be found at the link [11]. Jacob Gorban is the author of the IP core. The UART16550 IP core gets its name because it is designed to be maximally compatible with the industry standard National Semiconductor 16550A device. More information about the 16550 can be found at National's website [12, 13]. Note that at this moment the 16550D, not the A, is the latest version. The UART16550 IP core is not fully identical to National's 16550. For example, its FIFOs cannot be disabled. But in fact most people, like us, do not care about the difference between the UART16550 and National's 16550, because they are more interested in the UART part than in the 16550 part. The Universal Asynchronous Receiver Transmitter (UART) is a piece of hardware that helps to create a serial connection for data exchang
179. s Free Software Foundation (FSF) that takes the role. The FSF is a corporation founded to support the free software movement as well as to sponsor the GNU project. So don't be confused when seeing "Copyright (c) <year> Free Software Foundation, Inc." in the GNU licenses. Both the GNU project and the FSF were started by Richard Stallman, a legend because of his contributions to the free software movement. The thesis won't go further into his story but lists some references instead [7-14].
2.1.5 Free Software Free of Charge
The free software movement mentioned above is a social movement aiming to promote people's freedom to access and improve the source codes of software. If a piece of software truly assures this freedom for its users, it can be called free software. Free software is a concept close to open source, but puts more emphasis on the right to freedom [15]. Similarly, it is also often misunderstood literally as software that is free of charge. We will come back to this concept in a later section; for now, please just remember the official explanation of the concept here: "Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer." [16] The explanation is meaningful. It implies that people can sell free software for a price, provided that the freedom is guaranteed at the
180. same time. It is allowed to make profits by utilizing free software, and the same goes for the open cores. So far, all related concepts have been introduced. From the next section on, we will discuss the open source licenses and then analyze the open cores from a commercial perspective.
2.2 BSD License
The Berkeley Software Distribution (BSD) license gets its name because it was first designed to cover a Unix-like operating system developed by the University of California, Berkeley [17]. Later it was revised by removing a clause which was too impractical and kept the license from being widely accepted. This story can be found in [17, 18]. Now the BSD license is also called the new BSD license (or the modified, simplified, or 3-clause BSD license). The BSD license is the 3rd most popular open source license according to the article [19], which also mentions that the first two ahead of it are the GPL and the LGPL, and that those two account for almost 80% of all open source licenses in use. The article doesn't tell how the statistics were made, but it shows that the 3 licenses are quite widely used in the open source world. The BSD license is popular probably because it has very few restrictions, both for authors and especially for users. The full text of the BSD license can be found at [20]. Let's take a look at the 3 clauses of the license: Redistribution and use in source and binary forms with or
181. slaves. All other signals are ignored in this example. The scenario demonstrated in the example is fairly simple. Firstly, master 0 accesses slave 0 and gets the grant from the CONMAX. Later, master 1 wants to access slave 0 too, but is blocked because the grant is still held by master 0. After a while, master 0 turns to slave 15, so the grant of slave 0 is released to master 1 at that time. There are several things to notice in the example:
1. The CONMAX uses the CYC for its arbiters to determine which masters should be given grants at the moment. But to activate the arbiter, both the CYC and the STB have to be 1 at the beginning. In Figure 5.15, the CYC of master 0 becomes 1 quite early, several cycles before the STB. But the s0_cyc_o is output only after a long time. This is because the arbiter is not activated until both the CYC and the STB are 1. However, between the 2 accesses to the addresses 0x8 and 0xC there is a short gap on the STB. But master 0 keeps holding the grant although master 1 is ready at that time. This is because the arbiter is already activated, and it judges only the CYC to make decisions.
2. The CONMAX uses a finite state machine (FSM) to judge the CYC inputs in order to perform complicated arbitration, i.e. round robin. As a result, the CYC is usually delayed by several cycles relative to the other signals. This is shown in the figure. As mentioned before, acco
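The behaviour described in item 1 above (the arbiter is activated only by CYC and STB together, but afterwards judges only the CYC) can be condensed into a very small C model. It is a deliberately simplified sketch and not the CONMAX implementation: it looks at a single master on an otherwise free slave port, and it leaves out priorities, the round-robin order and the FSM delay of several cycles. The function name is made up for illustration.

```c
#include <stdbool.h>

/* 'granted' is the grant state before a rising clock edge;
   the return value is the grant state after that edge. */
static bool grant_after_edge(bool granted, bool cyc, bool stb)
{
    if (!granted)
        return cyc && stb;   /* activation needs both CYC and STB */
    return cyc;              /* once activated, only the CYC is judged */
}
```

This reproduces the two observations from the example: an early CYC without STB does not obtain the grant, and a short gap on the STB between two accesses does not release it as long as the CYC stays at 1.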
182. sponsibility to implement such signals and protocols. To be WISHBONE compatible, the cores have to include specialized logic to provide the interface signals, as well as to be able to send and recognize bus transactions correctly. Because the implementation of the WISHBONE interface logic is tightly coupled with IP functions that may vary from one core to another, there isn't a universal solution for how to design a WISHBONE interface. So this chapter will not introduce the details of the interface implementation. However, all IP cores used in the thesis project have WISHBONE interfaces. Their source codes can be taken as examples to study the interface logic. When more than one WISHBONE-compliant IP core is used to form a larger system, a WISHBONE network is constructed. In Appendix A.2 of the specification (pages 96-99) [2], 4 types of interconnection are introduced. They are the most common ways to compose a WISHBONE network. But of course there are more solutions than these four. Users can design interconnections with new structures, as long as the solutions guarantee that all bus transactions are transferred correctly and efficiently in the interconnection. How to organize a network is still a good WISHBONE research topic. There are many IP cores already built to help construct WISHBONE networks. In our thesis project we didn't spend much time on designing an interconnection. Instead we used the CONMAX IP core, which
183. standard.
[5] Website, ASICS.ws, http://asics.ws, last visit 2011-01-31. This is the website of a specialized FPGA/ASIC design team. They produced many free IP cores of very good quality and with good documents.
[6] Rudolf Usselmann, OpenCores SoC Bus Review, Rev. 1.0, January 9, 2001. This article compares several bus standards. It is a little out of date but worth taking a look at.
Chapter 6 Memory Blocks and Peripherals
In the previous chapters, the 2 important modules of the hardware system, the OpenRISC processor and the WISHBONE interconnection IP core, have been introduced separately. Now this chapter continues with all the other hardware modules of the platform. They are the memory blocks and peripherals. First, let's review the hardware system architecture, which was shown before in Chapter 3.
[Figure 6.1: Hardware platform architecture. Blocks shown: OpenRISC OR1200, on-chip RAM and its interface, Memory Controllers for the SSRAM and the SDRAM, DM9000A interface, WM8731 interface, GPIO.]
As we can see, the CONMAX IP core constructs the WISHBONE network for the whole system. All other blocks are connected to the WISHBONE network through the CONMAX. The CONMAX has 8 master interfaces and 16 slave interfaces. 2 of the master interfaces are taken by the OpenRISC processor on the left
184. standard, which does not have to be paid for right now, but ARM has never promised that it will be free forever. So it is possible that after the next upgrade you may have to pay for a license for each copy of the products that apply the AMBA standard. The public WISHBONE standard also gives a meaningful push to the open core community. Now developers can happily design new open cores by following the WISHBONE standard, with no trouble with interface compatibility and with less worry about legal problems. The above has introduced the importance of the WISHBONE. But it is a shame that all we actually did in the thesis was to include a CONMAX core in the system, with no further investigations. In the future, it would be a really good starting point to research the interconnection architecture with the WISHBONE, e.g. how to adapt the standard to a Network-on-Chip (NoC) system while keeping a certain throughput, or to benchmark the WISHBONE against other interconnection standards, etc.
5.2 WISHBONE in a Nutshell
In this section, the WISHBONE standard will be introduced. It can be seen as an explanation of, or a supplement to, the official specification where the descriptions in the standard are not that easy to understand. So it is recommended to read the WISHBONE specification first. Here are several suggestions for reading the official specification:
1. Start with the tutorial in the appendix. It is very good at giving a quick overview of the WISHBONE. But don
185. ster: The master checks the ACK and finds that it is 0. So the master holds all signals.
Slave: The slave sees all signals and knows this is an incrementing read burst. The slave is capable of handling the burst. So the slave 1) returns the 1st data according to the address B+0, 2) sets the ACK to 1 to inform the master to keep transferring, 3) calculates the next address based on the value of the current address and the BTE.
Edge 3, Master: The master checks the ACK and finds that it is 1, so it knows the slave is capable of handling more data. The master puts the next address B+1 and other signals onto the bus.
Edge 3, Slave: The slave checks the STB, CTI and BTE, and knows the burst is still happening. The slave is capable of handling more data. So the slave 1) returns the 2nd data according to the address B+1, which is the address calculated by the slave itself at the previous edge, 2) continues setting the ACK to 1 to inform the master to keep transferring, 3) calculates the next address based on the current address B+1 and the BTE.
Edge 4: is similar to the Edge 3.
Edge 5, Master: The master checks the ACK and finds that it is 1, so it knows the slave is capable of handling more data. The master puts the next address B+3 and other signals onto the bus. The master knows this will be the last one of the burst, so it changes the CTI from INC to E
186. stops normal operations, automatically branches to a certain address and fetches instructions from there to handle the exception. The address that the CPU jumps to depends on the type of the exception. Table 4.2 is copied from the Architectural Manual [9], Section 6.2. It gives the entry addresses of the supported exceptions.
Exception Type           Vector Offset   Causal Conditions
Reset                    0x100           Caused by software or hardware reset.
Bus Error                0x200           The causes are implementation specific, but typically they are related to bus errors and attempts to access invalid physical address.
Data Page Fault          0x300           No matching PTE found in page tables or page protection violation for load/store operations.
Instruction Page Fault   0x400           No matching PTE found in page tables or page protection violation for instruction fetch.
Tick Timer               0x500           Tick timer interrupt asserted.
Alignment                0x600           Load/store access to naturally not aligned location.
Illegal Instruction      0x700           Illegal instruction in the instruction stream.
D-TLB Miss               0x900           No matching entry in DTLB (DTLB miss).
I-TLB Miss               0xA00           No matching entry in ITLB (ITLB miss).
Range                    0xB00           If programmed in the SR, the setting of certain flags, like SR[OV], causes a range exception. On OpenRISC implementations with less than 32 GPRs, when accessing unimplemented architectural GPRs. On all implementations, if SR[CID] had to go out of range in order to process next exception.
System Call              0xC00
187. t guaranteed to come back in an expected time. Another disadvantage is that the open cores are often less verified than the commercial ones. Because verifying hardware cores needs quite a lot of complicated procedures and equipment, only powerful companies can do these things well. They can even build a real chip to test the quality of the cores, but this seems too hard for individual developers. Besides, please notice that using open cores could carry high risks regarding legal or patent issues. These risks may come from:
1. Charges from other business competitors, big companies and open source opponents.
2. The open source licenses were initially designed to cover software. When it comes to hardware, they may not be that reliable in protecting open cores.
3. The interpretation and enforcement of the law protecting the licenses may vary from place to place.
In summary, the question of whether a company should utilize existing open cores and integrate them into its next products has no absolute answer. It depends on the judgment of the smartest project managers. It is definitely not an easy decision for a company to use an open core that it is not familiar with. And it could be affected by factors like budget, time, product volume and design team, as well as many others. However, there is one definite conclusion we can draw here: every company should keep an eye on the open cores. Some big companies have specialized depar
188. t necessarily do so. This makes it impossible for the developers to hide harmful designs in the products without telling the customers. This advantage makes the open cores well suited for security-critical products that are used for national or military purposes. Meanwhile, open cores also have disadvantages that need to be taken into consideration. The most striking one for me is that the open cores are often less supported than the commercial products. So how fast the open cores can be adapted into new products depends on the capability of the development team in a company. Here "less supported" may show in many aspects, like fewer documents, no telephone support, etc. Most open cores are designed by individual engineers or enthusiasts out of interest. After tough design work, only a few of them still have the passion to work on services like writing good documents. For example, in our project we used an open core named OpenRISC. We spent a long time doing exactly what one of its documents said, but there were always problems. Finally it turned out that the document was dated 2001, while the latest revision of the source codes was from 2006. Apart from the documents, when you have questions regarding the open cores, there is no way to get quick technical support. Although there may be a forum on the Internet where you can post a message for discussion, most likely the feedback is no
189. te boundaries. In other words, 32-bit data (int type) must always be placed at addresses ending in 0x0, 0x4, 0x8 or 0xC. Long type or double type data that takes 8 bytes should be placed at addresses ending in 0x0 or 0x8. Short type data (2 bytes) should be placed at even addresses. And 1-byte char type data can be placed anywhere. After compiling a software project, all variables are converted into memory objects. Normally it is the linker that takes care of the memory alignment and gives the objects correct absolute addresses in the physical memory. So normally the programmers do not have to think about the memory alignment. But they should be careful when addresses are explicitly referenced in source codes, like:
    unsigned int   i = *(unsigned int *)  0xXXXXXXXA;   /* illegal */
    unsigned short i = *(unsigned short *)0xXXXXXXX8;   /* legal */
    unsigned short i = *(unsigned short *)0xXXXXXXXF;   /* illegal */
    unsigned char  i = *(unsigned char *) 0xXXXXXXXF;   /* legal */
The OpenRISC OR1200 itself has an exception mechanism to handle illegal accesses. When it detects that the next read/write operation is not going to an aligned location, the CPU internally throws an alignment exception. The next access will be discarded and the CPU program counter (PC) will jump to the address 0x600. Please refer to the OpenRISC Architectural Manual [4], Chapter 6, for more information. For programmers, some tips are good to know regarding t
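The alignment rule above boils down to one test: an object of 1, 2, 4 or 8 bytes is naturally aligned when its address is a multiple of its size. A minimal C helper for checking an address before casting it to a pointer could look like the sketch below; the function name is hypothetical and the helper is not part of the thesis software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an object of 'size' bytes may start at 'addr' without causing an
   alignment exception. 'size' must be a power of two (1, 2, 4 or 8). */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1u)) == 0);
    return (addr & (size - 1u)) == 0;
}
```

Applied to the four accesses listed above, the check rejects the int access at an address ending in 0xA and the short access at an address ending in 0xF, and accepts the other two.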
190. tent the slaves can always behave correctly if they are able to deal with single read or write operations, without thinking about whether the current block transaction is a block read or a block write at all. By the way, please don't mix up block read/write in WISHBONE with blocking read/write. They are quite confusing sometimes. Block read/write means reading or writing a batch of data at a time, while blocking read/write means the system is stuck until the last read or write operation has finished. As a simple example of a blocking read, when a C program calls scanf to read from the keyboard, the program won't continue until some keys are pressed. Some people may wonder why we need block transactions, because they look pretty much like single transactions and essentially do not increase the bus throughput. The following example is designed to give an answer. In Figure 5.6 there are 2 masters and 1 slave. Both masters want to access the slave, but only one of them is allowed to do so at a time, so there is an arbiter that makes the decisions. The arbiter gives grants to the masters according to their CYC signals, as Figure 5.7 shows.

[Figure 5.6: Block transactions are helpful in multi-master systems. Blocks shown: WISHBONE Master A, WISHBONE Master B, Arbiter, WISHBONE Slave.]
191. ter is reserved for them. WISHBONE is the specification of a System-on-Chip (SoC) interconnection architecture. It is a bus standard like ARM's AMBA. It defines the interfaces of IP cores and therefore specifies how the cores should communicate with each other. This in turn influences what the IP core network looks like. WISHBONE was adopted as the interconnection standard in our thesis project because it is the official standard suggested by opencores.org [1], and most open cores support, and only support, the WISHBONE standard. If WISHBONE is a blueprint, CONMAX is a building. The WISHBONE interCONnect MAtriX IP core (CONMAX) is a real IP core that implements a matrix interconnection (also called a crossbar switch structure) complying with the WISHBONE standard. In the thesis project, all we did was connect the other IP cores to the CONMAX. It helped us control the data traffic and handle bus transactions in the system. Everything about WISHBONE and CONMAX is fully documented in their specifications [2][3]. To avoid just copying text from the specifications, this chapter adds more explanation to help understand the WISHBONE standard and the CONMAX IP core.

5.1 Importance of Interconnection Standard

Before going into detail, it is necessary to stress the importance of interconnection standards, which means WISHBONE in our case. No
192. the BSD license in their business, using their own commercial licenses instead, as long as they keep a reference with the products showing that certain BSD-licensed open cores are contained, for example by including a copy of the license in the user manuals.

2.3 GNU Licenses

In this section we introduce the GNU licenses, including the GPL and the LGPL. The GNU General Public License (GPL), published by the FSF, is one of the most popular open source licenses and perhaps the strictest in the world. The GNU Lesser General Public License (LGPL) is a set of additional permissions on top of the GPL that removes some of its limitations. So to speak, the LGPL is based on the GPL rather than being an independent license. We will start with the GPL and then look at how the LGPL is lesser than the GPL.

2.3.1 GPL

The GPL is a very strict open source license designed to make source code free software and keep it free software forever. The full text of the GPL can be found at [21]. According to the formal definition of free software [16], a program is free software if its users have all 4 freedoms:
• The freedom to run the program for any purpose.
• The freedom to study how the program works and adapt it to your needs.
• The freedom to redistribute copies so you can help your neighbor.
• The freedom to improve the program and release your improvements to the public so that the whole community benef
193. ther parts of the system and force them to become open source as well. After that we discussed a little about the free software philosophy, which paints an attractive picture in which everyone shares knowledge. However, this is against business rules, because it makes knowledge free to get and thus unprofitable. So this philosophy will not be appreciated by big companies that have earned a lot of money selling proprietary products. For the same reason, it is not advisable for a company to develop a new open core for sale, because it will not earn enough money back. However, this is not absolute; selling open cores could also bring some good effects. As to the question of whether a company should use existing open cores in its next commercial products, my answer is that there is no definite answer. This is the responsibility of the smartest project managers, because using open cores has advantages but also carries risks. So it is a practical question that depends on the situation. We also talked about the future of open cores. I guess open cores will keep growing, yet not too fast, because there is currently no strong power (big companies, huge investment) that pays enough attention to pushing open cores. Perhaps the only definite conclusion in this chapter is that everyone should keep an eye on open cores, because the next person who benefits from them may be you. Ref
194. tions also give the Memory Controller the possibility to utilize the burst capabilities of the external memory devices. For example, SDRAMs can read out data stored contiguously in a bank once the bank and the row are open. For single transactions the bank has to be re-selected on every read or write operation. Apparently a lot of time is wasted on that. My partner Lin Zuo has given a performance comparison between the OpenRISC system and the ALTERA NIOS II system in his thesis [1]. Table 8.8 of his thesis is cited below.

Platform            Open Cores   NIOS II/e   NIOS II/s   NIOS II/f
Clock (MHz)         20           20          20          20
Writing Time (s)    3.696        2.045       0.682       0.476
Reading Time (s)    3.880        2.123       0.760       0.527

Table 6.4: System performance test results

The table gives the time the CPU takes to completely read or write a 2MB external SSRAM. "Open Cores" means the OpenRISC CPU accessing the external SSRAM through the CONMAX IP core and the Memory Controller IP core, while the ALTERA systems use the NIOS II processor, the Avalon bus and ALTERA's SSRAM controller. It is obvious that the ALTERA systems have better performance, but the interesting part is that, comparing the 3 types of NIOS II processors (economy, standard, fast), the NIOS II/e takes remarkably longer than the other 2. Lin concludes it is the cache (the NIOS II/e doesn't have any cache but the others do) that takes
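To put the times in the table into perspective, they can be converted into clock cycles per transferred word. The sketch below is only a back-of-the-envelope aid and is not taken from either thesis: the function name cycles_per_word is hypothetical, and the 4-byte word size used in the note that follows is an assumption, since the excerpt does not state the access width of the test.

```c
/* Average number of CPU clock cycles spent per transferred word.
 * Hypothetical helper for illustration; all inputs are supplied by
 * the caller. */
double cycles_per_word(double clock_hz, double seconds,
                       double total_bytes, double word_bytes)
{
    double total_cycles = clock_hz * seconds;    /* cycles elapsed    */
    double words = total_bytes / word_bytes;     /* words transferred */
    return total_cycles / words;
}
```

With the 20 MHz clock and the 2MB SSRAM from the table, and assuming 32-bit accesses, cycles_per_word(20e6, 3.696, 2.0 * 1024 * 1024, 4.0) comes to roughly 141 cycles per word for the Open Cores write test, against roughly 18 for the fastest NIOS II variant (0.476 s). These figures cover the whole test, so they include whatever software overhead the test program had.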
195. tments and engineers that continuously collect information on IP cores and evaluate them, so that they can find the right cores in a short time and use them to accelerate development. If we constantly spend effort on studying and testing open cores, maybe one of them will be exactly what we need next time.

2.7 The Future of Open Cores

As part of the discussion of open cores, it is interesting to talk about their future. In this section I'd like to give my prediction. Basically, it is certain that open cores will keep growing. First, people need to reuse cores to accelerate system development. Second, because the source code of open cores is fully open, everyone can get in touch with the cores and work together to improve them. So although some open cores may not look good enough today, they will become better and better. If we take the free software community, which is growing all the time, as a reference, we can foresee that open cores will follow the same track, because both are open source. But at the same time, open source is against business rules, as discussed in the previous section. So open cores will not be easily accepted by big companies, nor welcomed by investors, because in most cases investors only like things or industries that make more money back. This means there will be no strong stimulus that forces open cores to grow in a rapi
196. to create a computing platform. 5 open cores were used in the platform. Once again they were verified to be workable, so we can conclude that it is feasible to use open cores. Regarding quality, it can vary from one open core to another because they are made by different designers and teams. We think the 5 open cores involved in the thesis project are of good quality. After the system was built, they worked stably. This is enough to show that the 5 open cores are good enough for academic and research purposes. For commercial use, more professional verification might still be needed. Compared to commercial IP cores, open cores generally lack documentation and reliable technical support. Therefore, how difficult it is to study an open core and integrate it into a new system depends on the qualifications of the design team. Another motivation of the thesis was to investigate the impact of open source licenses. In Chapter 2 we introduced 3 widely used licenses, the GPL, the LGPL and the BSD license, and discussed their influence. For commercial use, we conclude that it is not a problem to use open cores covered by the LGPL or the BSD license, but the requirements of the licenses must be met. For example, for the LGPL, a copy of the license text and the source code of the IP core need to be attached to the distributed prod
197. to manually clear this bit in the timer ISR.

4.6 Programmable Interrupt Controller (PIC)

The OR1200 supports a maximum of 32 external interrupt inputs. These input signals come from other hardware modules. For example, in the thesis project we have 3 interrupts, from the UART16550, GPIO and DM9000A IP cores. The number of supported interrupts can be configured in or1200_defines.v. The Programmable Interrupt Controller (PIC) block of the OR1200 manages all external interrupts. The PIC structure is shown in Figure 4.5.

[Figure 4.5: PIC structure]

The OR1200 PIC has 2 registers: a mask register (PICMR) and a status register (PICSR). The PICMR enables or disables the external interrupts. The interrupt input signals are first logically ANDed with the value stored in the PICMR, so only unmasked interrupts can set flags in the PICSR. Note that bits 0 and 1 of the PICMR are fixed to 1, so INT0 and INT1 are always enabled. All 32 flags in the PICSR are logically ORed together. If the result is not 0, an interrupt is triggered in the OR1200, which stops normal CPU operation and jumps to the interrupt vector 0x800 to execute the ISR. All external interrupts to the PIC have the same priority. This implies that interrupt nesting cannot happen in the OR1200. If more than 1 interrupt is pending at the same t
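The AND/OR behaviour of the PIC described in this excerpt can be mimicked with a small software model. This is only a sketch for reasoning about the masking logic: it does not touch the real special-purpose registers, the function and macro names are hypothetical, and it ignores how and when PICSR flags are cleared.

```c
#include <stdint.h>

/* Bits 0 and 1 of the PICMR are fixed to 1, so INT0 and INT1 are
 * always enabled (as stated in the text). */
#define PICMR_FIXED_BITS 0x00000003u

/* Software model only: which PICSR flags the interrupt inputs can set
 * for a given value written to the PICMR. */
uint32_t pic_status(uint32_t int_inputs, uint32_t picmr)
{
    uint32_t effective_mask = picmr | PICMR_FIXED_BITS;
    return int_inputs & effective_mask;   /* the logic AND stage */
}

/* All 32 PICSR flags ORed together: a non-zero result means the CPU
 * takes the external interrupt and jumps to vector 0x800. */
int pic_irq_pending(uint32_t picsr)
{
    return picsr != 0u;
}
```

In this model an input on line 2 with PICMR = 0 sets no flag, whereas an input on line 0 always does, because the two lowest mask bits cannot be cleared.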
198. to reuse some GPLed source code in a new product, this will surely fall under the definition of "modify". And when the improved version is redistributed, section 5 takes effect and forces the GPL to be applied to the whole new work, which means that unfortunately we have to open the source of the entire work, including the parts that were not initially covered by the GPL. As a simple example, if a program has 4,950 lines of code and a 50-line function copied from GPLed software is added, all 5,000 final lines have to become open source and be public to future users, even though the GPLed code accounts for only 1% of the total.

2.3.2 LGPL

The last obligation of the GPL described in the previous section is obviously stronger than many users who are not that enthusiastic about the free software movement can accept. Indeed, many companies are afraid that their patents and proprietary assets could be violated when combining them with GPLed material, so they keep away from the GPL. Therefore the FSF developed a light version of the GPL by removing its infectious attribute. The LGPL was previously named the GNU Library General Public License, which shows that it was primarily developed for libraries. Because the GPL is so strict and infectious, if it is applied to a library, all programs that link to the library have to become open source. This is not what many developers, especially commercial companies, would like to see. An
199. tools for the thesis project. If you can manage to get the same development environment, it will be helpful for repeating the project.
• DE2-70 Board
• An RS232-to-USB Cable
• An Ethernet Cable
• A Speaker or an Earphone
• A PC
• Cygwin
• OpenRISC Toolchain
• Quartus II
• Our Thesis Archive File

B.1.1 DE2-70 Board

The most important hardware needed for the thesis project is a DE2-70 FPGA board. The DE2-70 is a Development and Education board based on ALTERA's Cyclone II FPGA EP2C70. It is produced by Terasic. Information about the DE2-70 can be found on their website [1]. The board costs 599 USD, or 329 USD for academic users, which is not cheap, but luckily many universities already have lectures or labs using the board. So if you are a student, maybe try to borrow one from your professor or laboratory, just like we did for the thesis. Some people might have Terasic's DE2 board instead. It is possible to port the thesis project from the DE2-70 to the DE2 because the 2 boards are very similar. The main difference is that the DE2 uses the Cyclone II EP2C35 as its FPGA, which has fewer on-chip resources than the EP2C70. Besides, the DE2 has only 512KB SSRAM and 8MB SDRAM, while the DE2-70 has 2MB SSRAM and 64MB SDRAM. But the resources on the DE2 are already enough for the thesis project. Some other people may have different FPGA boards, like Terasic's DE1 or maybe even
200. tp www olivercamel com post master_thesis html Last visit 2011 01 31 Website Cygwin http www cygwin com Last visit 2011 01 31 Webpage GNU Toolchain from Wikipedia http en wikipedia org wiki GNU_toolchain Last visit 2011 01 31 Webpage GNU Toolchain for OpenRISC http opencores org openrisc gnu toolchain Last visit 2011 01 31 Website GCC the GNU Compiler Collection http gcc gnu org Last visit 2011 01 31 Website GNU Binutils http www gnu org software binutils Last visit 2011 01 31 Webpage OpenRISC 1000 Architectural simulator http opencores org openrisc orlksim Last visit 2011 01 31 Website GDB The GNU Project Debugger http www gnu org software gdb Last visit 2011 01 31 Website GNU Make http www gnu org software make Last visit 2011 01 31 46 CHAPTER 3 14 Website ORSoC http www orsoc se Last visit 2011 01 31 15 Webpage Operating System from Wikipedia http en wikipedia org wiki Operating system Last visit 2011 01 31 16 Website Micrium http www micrium com Last visit 2011 01 31 17 Webpage uC OS IT License from Micrium http micrium com page downloads os ii_evaluation_ download Last visit 2011 01 31 18 Lin Zuo System on Chip design with Open Cores Master Thesis Royal Institue of Technology KTH ENEA Sweden 2008 Document Number KTH ICT ECS 2008 112 19 Website underbit technologies http
201. tually no difference than a single processor system. Figure 5.1(b) shows an improved version. The bus is replaced by a matrix interconnection. Now two processors can access different peripherals in parallel, but still need arbitration when accessing the same peripheral. In fact, some systems have more than two, even dozens of, processors, and the number is still increasing. As a result, the traditional shared bus is evolving to look more and more like a network. So, as we can see, how to communicate efficiently in a multiprocessor system has emerged as a critical problem. Carefully designing the interconnection architecture to get maximum throughput for the IP core network has become an attractive issue that engineers care about now.

[Figure 5.1: Interconnection is important in multiprocessor systems. (a) Traditional bus limits performance. (b) Matrix structure is better.]

2. Second, IP cores need standards for their interfaces. This is easy to understand. Lots of companies produce IP cores. If there is a standard that everyone follows, all of the cores can be connected easily. This will definitely accelerate the development of new products a lot. So people need a unified interconnection architecture.
3. Third, a standard for the interconnection has great market potential. Think about it: if a company owns a standard that takes the dominant role in the market, all oth
202. uartus II is Alt R U then Alt R A A
1.4 Now it is time to set up the DE2-70 and connect all cables: the power cable, the USB cable for FPGA programming, the USB-to-serial cable and the Ethernet cable. Don't connect the speaker to the board for now. Because the DE2-70's built-in default demonstration program plays a high-frequency sine wave at power-on, there will be noise if the speaker is connected. Switch 17 of the board can be used to switch off the sine wave. To switch it off, turn the switch up. By "up" I mean pushing the switch closer to LEDR17.
1.5 Now the board is ready. Please program the FPGA with the orpXL_top.sof file in Quartus, as Figure B.2 shows.

[Figure B.2: Program ALTERA FPGA]

1.6 Programming the FPGA downloads the OpenRISC hardware platform to the board and places the bootloader into the on-chip RAM. Meanwhile the default DE2-70 demonstration project is overwritten, so the sine wave noise is gone. From now on, don't be afraid to connect the speaker.
1.7 In the project we use Switch 17 as the reset key of the hardware system. When Switch 17 is pulled down, the reset is on. And when Switch 17 is pushed
203. uctor Corporation http www national com mpf PC PC16550D html Last visit 2011 01 31 13 Datasheet PC16550D Universal Asynchronous Receiver Transmitter with FIFO s National Semiconductor Corporation June 1995 Available at http www national com ds PC PC16550D pdf Last visit 2011 01 31 14 Webpage 16550 UART from Wikipedia http en wikipedia org wiki 16550 UART Last visit 2011 01 31 15 Jacob Gorban UART IP Core Specification Rev 0 6 August 11 2002 16 Webpage General Purpose I O GPIO Core OpenCores Organiza tion http www opencores org project gpio Last visit 2011 01 31 17 Damjan Lampret Goran Djakovic GPIO IP Core Specification Rev 1 1 December 17 2003 18 Datasheet WM8731 WM8731L Portable Internet Audio CODEC with Headphone Driver and Programmable Sample Rates Wolfson Microelectronics Rev 4 7 August 2008 19 Webpage 12C controller core OpenCores Organization http www opencores org project i2c Last visit 2011 01 31 20 Webpage 12C Licensing Information NXP Semiconductors http www nxp com products interface_control i2c licensing Last visit 2011 01 31 21 Datasheet DM9000A Ethernet Controller with General Processor In terface Final DAVICOM Semiconductor Inc Version DM9000A DS F01 May 10 2006 Chapter 7 Conclusion and Future Work Finally the chapter will conclude the thesis Also some interesting topics that we didn t have enoug
204. ucts. When it comes to the GPL, we suggest that users think carefully, because the GPL will force the design details of the other parts of the system to be opened. For academic use, all 3 licenses are possible, provided that in the case of the GPL it is not an issue to open the design details. One definite conclusion we reached in the thesis is that everyone should keep an eye on open cores. The open core community might grow slowly because investment is lower than in the commercial world, but it will never go backwards. The existing open cores will become better and better, and new open cores will appear sooner or later. If more people join in and each makes even a small contribution, the community will grow much faster. Meanwhile, people will be able to find the proper IP core the first time they need one.

7.2 Future Works

Due to time limitations, there were some tasks we wanted to do better but couldn't. Those tasks are described as future work in this section.

7.2.1 Improve and Optimize the Existing System

The open core computing platform we contributed works, but it is far from perfect. Many improvements are possible to increase system performance and make the system more stable and easier to use. On the hardware side, there is still a lot of room to improve CPU efficiency by reducing CPU waiting time. As already analyzed in Section 4.2 and Section 6.2.4, if we can enable the OR1200 cache and MMU
205. up the system starts working. After the FPGA is programmed, please reset the hardware system by putting Switch 17 down and then up. After that the bootloader starts running. It stays in an endless loop, waiting while reading the RS232 port.

B.2.2 Download Software Project by Bootloader

2.1 Start Cygwin and enter the folder of the software project, i.e. software. Figure B.3 shows the folder structure.

[Figure B.3: Software project. Cygwin listing of the orpXL_release_20081116 folder (documents, hardware, others, readme.txt, software, source, thesis, tools) and of its software folder (Application, Lib_orpXL, software.cbp, uC-TCPIP, Build, readme_software.txt, software.layout, uCOS-II).]

2.2 The software/build folder is where the software project is compiled. To save the time of typing all commands every time, a makefile script is written for the Make tool. Please double check in the makefile that the OpenRISC toolchain is placed under the correct path; otherwise the commands will not work. The makefile script is shown in Figure B.4.
2.3 Now let's recompile the software project. This is not necessary but interesting to try. In the build folder, run the command make all clean, as shown in Figure B.5. You will get a myPrj.ihex at the end of compilation. It is an Intel HEX format file which will be downloaded to the boar
206. very rising edge, the slaves examine the value of the CTI to see whether preparations are needed to handle burst transactions. If the CTI is 000, the current transaction is a classic transaction and no special preparation is needed. If the CTI is 001, the current transaction is a constant burst; if it is 010, it is an incrementing burst. When a burst is about to terminate, the master gives a CTI of 111 to tell the slaves to go back to the normal state. There are 2 kinds of burst. The constant burst always reads or writes the same address. This is useful for accessing FIFOs or certain I/Os that have volatile data. The incrementing burst, in contrast, contains operations targeting adjacent addresses. It is designed particularly for reading or writing a block of data from or to memories. When the incrementing burst is used, one more signal, the BTE, is needed to indicate how the address grows. The definition of the BTE is clearly described in Tables 4-2 and 4-3 of the WISHBONE specification. Now it is time to go through the details of how burst transactions work. To avoid describing them with just boring text, 3 examples with waveforms have been designed, which can be seen as supplements to the WISHBONE specification. The first example is a constant write burst, shown in Figure 5.11. The first line of the waveform marks the number of each rising clock edge. Below that, only the related signals are drawn. As
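The CTI encodings listed in this excerpt can be summarised as a small decoder. This is an illustrative C model rather than hardware description code from the thesis; the enum and function names are made up for the example, and every CTI value not mentioned in the text is simply treated as reserved here.

```c
/* Transaction kinds named in the text; the identifiers are illustrative. */
enum wb_cycle_type {
    WB_CLASSIC,             /* CTI = 000 */
    WB_CONSTANT_BURST,      /* CTI = 001 */
    WB_INCREMENTING_BURST,  /* CTI = 010 */
    WB_END_OF_BURST,        /* CTI = 111 */
    WB_RESERVED             /* any value not mentioned in the text */
};

/* Decode the 3-bit CTI value as a slave would on a rising clock edge. */
enum wb_cycle_type decode_cti(unsigned cti)
{
    switch (cti & 0x7u) {
    case 0x0: return WB_CLASSIC;
    case 0x1: return WB_CONSTANT_BURST;
    case 0x2: return WB_INCREMENTING_BURST;
    case 0x7: return WB_END_OF_BURST;
    default:  return WB_RESERVED;
    }
}

/* The BTE only matters for incrementing bursts; otherwise it is don't-care. */
int bte_is_relevant(enum wb_cycle_type type)
{
    return type == WB_INCREMENTING_BURST;
}
```

A slave model could call decode_cti once per clock edge and look at the BTE only when bte_is_relevant returns 1.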
207. w the WISHBONE bus specification.

6.1.1 On-chip RAM Pros and Cons

2 advantages gave us the reasons to make an on-chip RAM block for the project:
1. Access to the on-chip RAM is much faster than to external RAMs.
2. The contents of the on-chip RAM can be easily programmed, modified and monitored with Quartus.
The first advantage is well known and needs no detailed explanation. For FPGA on-chip RAMs, because both the processor and the memory block reside in the same FPGA chip, the interfacing logic is simpler and the access time shorter. With the ALTERA 1-port RAM core, a read or write takes only 1 clock cycle. This makes the on-chip RAM especially suitable for working as a cache or stack, or for storing global variables. The second advantage provides great convenience for software development. ALTERA has comprehensive tools to control the on-chip RAM. First, designers can create a MIF file and link it to the 1-port RAM core. When the FPGA is programmed, Quartus downloads the data in the MIF file into the on-chip RAM. Second, there is a tool in Quartus called the In-System Memory Content Editor. It can be used to examine or modify the data stored in the on-chip RAM dynamically. With these features the designers practically get a programmer and a basic debugger, and can already build some simple software applications with the tools. The limitation of the on-chip RAM is as obvious as its advant
208. wadays people have gradually realized that the interconnection architecture may be the most significant part of an electronic system, even more so than the processors. Several reasons are listed below.
1. First, the bus or the interconnection is becoming the bottleneck of system performance. In the past it was the processor that limited system performance. One of the remedies was to increase the system frequency. For example, the home PC frequency went from the 100MHz level to the GHz level during the last decades. But it turned out that increasing the CPU frequency was not always helpful. There are at least two serious drawbacks:
• Higher frequency consumes more power.
• Most peripherals cannot keep up with such high speed. As a result, processors spend a lot of time idle, doing nothing while waiting for the peripherals to complete an operation.
To solve the problem, multiprocessor systems appeared. By using more than one processor, peripherals can be driven in parallel. And in theory the frequency can be lowered because the work is now shared by more processors. So the trend is that the multiprocessor structure will become popular. However, another problem arose: the traditional shared bus limits performance hugely in multiprocessor systems, as Figure 5.1 shows. In Figure 5.1(a), only one processor is able to access the peripherals at a time. The others have to wait until the first one releases the bus. This has ac
209. we can see, the CTI and the BTE are now included for burst transactions. The value CON of the CTI shows that this is a constant burst, and EOB stands for End Of Burst. The BTE is not needed for constant burst transactions, so its value is don't-care (X) during the whole period. Besides, the signal WE is always 1, which indicates that the current burst is a write burst.

[Figure 5.11: An example of a constant writing burst transaction]

Edge 1
Master: The master is ready to initiate a burst transaction. It sets the STB to 1 to start the transaction. Meanwhile it gives the 1st valid address, outputs the 1st data to be written and sets WE to 1. More importantly, it asserts the CTI as CON to inform the slave that this is a constant burst.
Slave: The slave does nothing at edge 1 because it cannot see anything yet from its perspective.

Edge 2
Master: The master checks the value of the ACK to see whether the slave has replied. Since the ACK is still 0, the master holds all signals unchanged for one more clock cycle.
Slave: The slave now receives the information about the burst from the master. The slave checks the CTI and knows it is a constant burst. Because the slave is idle and can handle the 1st data of the burst, it asserts the ACK to inform the
210. y block's 32-bit data width, the address connections are shifted by 2. So 4 consecutive addresses on the CPU side map to the same physical location of the memory block. Besides, the SEL signals are introduced to identify the valid portion of the data bus. The example shown in Figure 6.3 also gives an overview of the data organization scheme.

6.1.5 Memory Alignment and Programming Tips

As discussed in the last section, any 4 consecutive bytes, from the CPU's perspective, stored at addresses starting from 0x0, 0x4, 0x8 or 0xC (e.g. bytes 0x0-0x3, bytes 0x4-0x7, etc.) are in fact stored at the same 32-bit physical location of the memory block. If the CPU needs all 4 bytes, instead of accessing the memory block 4 times it can simply fetch 32 bits of data at once. This can be done with, for example, the following C code:

    unsigned int i = *(unsigned int *)0xXXXXXXX0;  // legal

Here X can be any hex digit from 0 to F. When the line above is executed in hardware, the CPU sends the address 0xXXXXXXX0 onto the WISHBONE bus while turning on all 4 SEL signals. As soon as the memory block finds that the 4 SEL lines are high, it knows that all of the 32-bit data stored at 0x0 is wanted by the CPU. It then feeds back 32-bit data with 4 valid bytes, and the CPU considers that it has received the 4 bytes stored at the consecutive addresses 0x0-0x3. However, it is not possible to access an integer type data fr
211. yet, do nothing
  ELSE, if a burst has already started, set the ACK to 1 and hold all other signals unchanged for one more clock cycle
ELSE (the STB = 1), check the CTI, BTE, WE and other signals
  IF the current CTI is not an EOB, set the ACK to 1; meanwhile
    IF it is a write burst, accept and write the data to the address currently transferred through the bus
      Exception: if this is the first write operation of the burst, don't process the data
    ELSE, if it is a read burst, do:
      1. return the data to the master based on the previously calculated address
         Exception: if this is the first read operation of the burst (i.e. there's no pre-calculated address), return the data based on the current address sent by the master
      2. calculate the next address to read according to the value of the current address, the CTI and the BTE
  ELSE, if the CTI = EOB, set the ACK to 0; meanwhile
    IF it is a write burst, latch the last data and de-assert all signals
    ELSE, if it is a read burst, just de-assert all signals
IF no wait state is needed, then that's it. Go to the next clock rising edge
ELSE, if the slaves currently are unable to handle more data, a wait state is inserted by resetting the ACK to 0. All other signals can output X. However, the slaves should still go through the previous IF ELSE block and somehow remember what the next output signals should be. In case of write bursts and th