
Xcell Journal: The authoritative journal for programmable

1. [Spartan family product selection table (devices, system gates, I/O, and package options); contents not recoverable from the extraction.]
2. We picked nine Xilinx Virtex-II Pro devices, as it meant a significant savings in device count, from 105 to 9. One such experiment is called the Compact Muon Solenoid (CMS), which is based on a large superconducting magnet system. The CMS will have a number of sub-detectors, including an Electromagnetic Calorimeter (ECAL). The ECAL will use about 80,000 crystals to capture the energy of the photons and electrons. The data collected from these crystals will be captured, processed, and transmitted by the DCCs (about 60 of them) for further analysis. Design Overview: The DCC includes 70 high-speed optical receiver channels (6 blocks of 12 channels each) implemented on a 9U VME board (36 cm x 40 cm), working at 800 Mbps using a 2-byte 8b/10b protocol. For the implementation of the transceivers, we had two choices: (1) As many as 70 discrete deserializers along with 35 FPGAs for the required control (this number was based on cost considerations), for a total device count of 105. This would have given us more granularity and a lower cost, but more components and hence higher debug and testing times. (2) Only nine Xilinx Virtex-II Pro devices with eight embedded RocketIO transceivers on each (only the XC2VP7-FG456 part was available at the time). We would lose some granularity, but the PCB would be much less dense and easier to test. We picked the second choice, as it meant a significant savings in device count.
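The device-count comparison between the two options can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `device_counts` and its parameter names are hypothetical, and the numbers are the ones quoted in the excerpt (70 channels, 35 control FPGAs, eight transceivers per Virtex-II Pro device).

```python
import math


def device_counts(channels=70, control_fpgas=35, transceivers_per_device=8):
    """Compare the two transceiver options by total device count."""
    # Option 1: one discrete deserializer per channel, plus the control FPGAs.
    discrete_total = channels + control_fpgas
    # Option 2: enough multi-transceiver FPGAs to cover every channel.
    integrated_total = math.ceil(channels / transceivers_per_device)
    # Transceivers left unused in option 2.
    spare_transceivers = integrated_total * transceivers_per_device - channels
    return discrete_total, integrated_total, spare_transceivers
```

With the default values, `device_counts()` returns `(105, 9, 2)`: 105 devices for the discrete option, 9 for the integrated option, and two transceivers to spare.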
3. How VirtuaLab Works: TechOnLine has served the educational needs of the engineering community since 1995. Their solutions are designed to build awareness, educate and train audiences, and enable the real-time evaluation of hardware and software products over the Internet. VirtuaLab goes beyond the software simulation usually associated with web-based design: you can access hardware and software with just a browser. From virtually anywhere in the world, design engineers with an Internet connection can experience the advantages of designing Xilinx DSP solutions with the Xilinx/Nu Horizons VirtuaLab. When you begin a VirtuaLab session, you can be assured of a consistent experience and a controlled environment. You can schedule time on VirtuaLab 24/7 according to your local time zone, and the scheduled date can be integrated into your Microsoft Outlook calendar. The first user interface is very familiar: a Windows or Linux desktop on which locally stored files can be moved onto an LDAP (lightweight directory access protocol) environment on TechOnLine's servers. In this way, benchmarks, applications under development, and other proprietary software can be run on the target board within a personalized file structure. Security is always a top priority. From the Windows desktop, you move to the remote target control interface, which provides remote control of the hardware
4. [Package reference table; contents not recoverable from the extraction.] POWER MANAGEMENT: TI Power Solutions, Power Behind Your Designs. Highly integrated triple supply from TI powers Spartan-3 core, I/O, and Vccaux rails (product preview). Applications: DSL modems, set-top boxes, plasma TV display panels.
5. architecture. 2. Xilinx-aware user: You have designed with previous Xilinx products but may not know about the latest solutions. 3. Expert Xilinx user: You are proficient in both standard and advanced products architecture and are licensed in most or all of the EDA design tools available for development. The opportunity to return and continue your studies or design work is always possible with VirtuaLab's LDAP environment, which provides secure storage; however, you can always transfer your files if you prefer. Once in the VirtuaLab environment, you can compile code, measure board performance, and set breakpoints: anything that you could do if the evaluation board were being controlled by your own PC. What Can I Evaluate? Working with Xilinx, the Nu Horizons engineering teams offer a full portfolio of educational aids, as well as direct access to Xilinx design flow tools and high-speed test equipment. Our first laboratory consists of the Nu Horizons Spartan-3 2000 evaluation platform (Figure 1), which is a very flexible testing platform that allows you to evaluate the Xilinx XC3S2000 FPGA and develop a multitude of applications. One of the Nu Horizons/Xilinx VirtuaLab applications is focused on high-performance FPGA DSP functionality. You can develop advanced algorithms and perform complex measurements through a full complement of test equipment connected to the Spartan-3 2000 environment
6. many customer training needs, we realized that we could make improvements. Xilinx formed a team of experts to analyze the website and, as a result, determined that a self-service portal was the best solution. The extreme makeover included features to reduce your time-to-knowledge more than ever. The new training catalog on the Education Services website gives you a speedy, personalized, low-cost, flexible, and quality training solution at your fingertips, anytime, anywhere. Improved Navigation: The makeover team, which included web developers and designers, programmers, system administrators, usability experts, instructional designers, and technical writers, put their heads together to increase the usability of the aging and content-heavy Education Services website. Since its last redesign three years ago, the site was beginning to show wrinkles and the navigation had become complicated. The team wanted to create a quicker route to course descriptions and registration. They also wanted to augment training services and course offerings to help decrease knowledge gaps among Xilinx designers. In addition to streamlining the navigation, the team also set out to redesign the online enrollment process, changing keywords in the search menus to provide a more intuitive registration experience. Now you can find instructor-led, live, and recorded e-Learning courses more easily by curriculum paths based o
7. to manage the test data using a streaming programming model. Given the performance differences between a processor-based test bench and the potential performance of an all-hardware system, it should be clear that software-based testing of such applications cannot replace true hardware simulation, in which you can observe (using post-route simulation models) the design running at any simulated clock speed. Conclusion: In-system testing using embedded processors is an excellent complement to simulation-based testing methods, allowing you to test hardware elements at lower clock rates efficiently, using actual hardware interfaces and potentially more accurate real-world input stimulus. This helps to augment simulation because, even at reduced clock rates, the hardware under test will operate substantially faster than is possible in RTL simulation. By combining this approach with C-to-hardware compilation tools, you can model large parts of the system (including the hardware test bench) in C language. The system can then be iteratively ported to hand-coded HDL or optimized from the level of C code to create increasingly high-performance system components and their accompanying unit- and system-level tests. For more information, visit www.impulsec.com, e-mail info@impulsec.com, or call 425-576-4066. EMBEDDED SYSTEMS: Nohau Shortens Debugging Time for MicroBlaze and Virtex-II
8. suppliers are loath to start designing devices before the specification is frozen. To take advantage of new busing networks in advance of fixed specifications, designers are turning to soft IP cores embedded within programmable logic devices. This allows designers to try out new ideas risk-free and add in customized solutions within the bounds of the protocol. This approach also allows cut-down versions of the full interface if not all of the features are required, thus saving even more silicon area. Now that programmable logic prices have dramatically dropped, they can even be considered a viable way of designing production solutions as well as prototype builds. A key benefit of having a LIN interface embedded within a PLD in the form of an IP core is that it can be reconfigured remotely to be either a master or a slave node, thus greatly aiding the test and design phases. Even in in-field fault diagnosis and vehicle maintenance, the ability to make nodes either master or slave may be beneficial. In the case of a non-volatile CPLD, reconfiguring the node is simply a matter of erasing the device and reprogramming it with a new personality. The ability to switch between master and slave in the same device means that inventory and stocking costs are reduced; in addition, only one device rather than two needs to be qualified, which saves the lengthy device qualification time and the costs associated with it. PLDs fr
9. absolute difference (SAD) engine in motion estimation, and video scaling. By mapping these modules onto the FPGA fabric, the host processor or the programmable DSP has the extra cycles for other algorithms. Furthermore, FPGAs can have multiple clock domains in the fabric, so selected hardware blocks can have separate clock speeds based on their computational requirements. • Theoretic optimality in quality: Any theoretic optimal solution based on the rate-distortion curve can be achieved if and only if the complexity is unbounded. In a programmable DSP or general-purpose processor, the computational complexity is always bounded by the clock cycles available. FPGAs, on the other hand, offer much more flexibility by exploiting data and algorithm parallelism by means of multiple instantiations of the hardware engines or increased use of block RAM and register banks in the fabric. A programmable DSP or general-purpose processor is often limited by the number of instruction issues per cycle, the level of pipeline in the execution unit, or the maximum data width to fully feed the execution units. Video quality is often compromised as a result of the limited cycles available per task in a programmable DSP, whereas hardware resources are fully allocated in FPGA fabric (three-step vs. full-search motion estimation). Implementing Functional Modules onto FPGAs: Figure 1 shows the overall H.264/AVC macroblock-level enc
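To make the SAD engine and the full-search option mentioned above concrete, here is a minimal software sketch. It is not the article's FPGA implementation: the function names `sad` and `full_search`, the frame layout (lists of rows of integer pixel values), and the parameters are all assumptions chosen for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_sad, dx, dy), or None if no candidate fits inside `ref`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

A full search evaluates up to (2R + 1)^2 candidates for a search range of R, and each SAD is independent of the others, which is why the computation lends itself to parallel hardware engines. Reduced searches such as the three-step search test far fewer candidates at some cost in match quality.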
10. by Wilson C. Chung, Senior Staff Video and Image Processing Engineer, Xilinx, Inc., wilson.chung@xilinx.com. H.264/AVC is the latest international video coding standard in a series of such standards: H.261, MPEG-1, MPEG-2, H.263, and MPEG-4 visual or part 2. It was approved by the ITU-T (International Telecommunications Union, Telecommunication Standardization Sector) as recommendation H.264 and by ISO/IEC as International Standard 14496-10 (MPEG-4 part 10) Advanced Video Coding (AVC) in May 2003. Despite H.264/AVC's promises of improved coding efficiency over existing video coding standards, it still presents tremendous engineering challenges to system architects, DSP engineers, and hardware designers. The H.264/AVC standard brought in the most significant changes and algorithmic discontinuities in the evolution of video coding standards since the introduction of H.261 in 1990. The algorithmic computational complexity, data locality, and algorithm and data parallelism required to implement the H.264/AVC coding standard often directly influence the overall architectural decision at the system level. In turn, this determines the ultimate cost of developing any commercially viable H.264/AVC system solution in the broadcasting, video editing, teleconferencing, and consumer electronics fields. Complexity Analysis: To achieve a real-time H.264/AVC standard-definition (SD) or high-defini
11. EL7536: High Efficiency 1A Integrated FET Regulator. EL753X evaluation board with 300 mils x 600 mils footprint. Tiny 0.97cm total BOM footprint. Extremely small 0.6A to 2A synchronous buck regulators. Features: Vin = 2.5V [...]; 95% maximum efficiency; 100% duty cycle (output voltage close to input voltage); 1.4MHz fixed PWM (small passives); PFM/PWM auto-switchable (available in EL7530 and EL7531); 120uA quiescent current; power-good signal for EL7530 and EL7531; power-on reset for EL7532, EL7534, and EL[...]; external frequency synchronizable (EL7532, EL7534, and EL7536); internal soft start. Applications: core power supply, communications equipment, storage systems, WLAN, Pocket PC, wireless web browsers, GPS navigators, digital cameras, barcode scanners, portable instruments, language translators. Get more technical info on Intersil's complete portfolio of High Performance Analog Solutions at www.intersil.com/info. Product categories: Amplifiers, Converters, DataComm, DCPs, DCCs, Display, Interface, Power Management, Switch/MUX, Timing, Video. 2004 Intersil Americas Inc. All rights reserved.
12. by Warren Miller, VP of Marketing, Avnet Design Services, Avnet, warren.miller@avnet.com. Traditionally, designs for a variety of applications used dedicated digital signal processing (DSP) chips or application-specific standard products (ASSPs) to process digital information using signal processing algorithms. Filtering, video processing, and audio processing were just a few of the many applications using digital signal processors. Now, with performance and capacity improvements to FPGAs, as well as the improved efficiency of common arithmetic operations usually found in most DSP applications, FPGAs doing DSP functions are becoming more common. In many cases, both processors and FPGAs are used in the same application in a co-processing architecture, where the FPGA does pre- or post-processing to accelerate processing speed. DSP applications are usually difficult to verify via software simulation because of the enormous number of cycles required to process a meaningful data stream; thus, it is usually better to use a hardware development platform to prove out the key parts of a new design. The new DSP design kits from Avnet provide a powerful, flexible, and expandable platform to validate even the most complex signal processing designs that use both FPGAs and DSPs. Avnet DSP Design Kits: Avnet Design Services has created a variety of DSP-oriented design kits for use w
13. Synopsys has developed Design Compiler FPGA (DC FPGA). DC FPGA combines the ASIC-strength synthesis technology of Design Compiler with new Adaptive Optimization (AO) technology to achieve excellent timing in fast run times. DC FPGA is part of a family of products that work in conjunction with Xilinx ISE to streamline the prototyping process, enabling you to design once. Why Prototype? The complexity associated with ASIC development has led to a significant increase in the number of design teams choosing to prototype their designs using an FPGA. According to Gary Smith of Gartner Dataquest (User Wants and Needs, 2003), as well as our own surveys as part of the Galaxy Technical Seminar, more than 40% of all ASIC designs have been prototyped in an FPGA. This trend is increasing over time. Prototyping provides several benefits. Primarily, it offers a way to prove the design before undertaking an expensive ASIC manufacture. A physical prototype also enables the design to be rigorously verified using real data. Industry analyst data gathered from the 2003 Synopsys Seminar indicated that a majority (70%) of design re-spins still occur because of functional errors. Rapid verification of the programmable prototype can go a long way toward ensuring that the ASIC design is right the first time. Additionally, a prototype enables earlier integration of the
14. The DVD Forum, with its HD-DVD initiatives, has selected H.264/AVC together with WMV-9 and MPEG-2 as the standard video coding formats. The European DVB consortium has also selected H.264/AVC as the next format after MPEG-2. These announcements, plus endorsements from Hollywood studios, content distributors, and broadcast infrastructures, have further validated the importance of the H.264/AVC video coding standard for the next few years. For more comprehensive studies and technical details of the H.264/AVC video coding standard, please see the references. References: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), in Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, 2003. A. Luthra, G.J. Sullivan, and T. Wiegand (July 2003), Special Issue on The H.264/AVC Video Coding Standard, IEEE Trans. Circuits System Video Technology 13(7): 557-725. DIGITAL SIGNAL PROCESSING: Developing a GSM Modem on a DSP/FPGA Architecture. Using System Generator and Simulink, you can create a seamless simulation and implementation. by Louis Belanger, Product Development Manager, Lyrtech Signal Processing Inc., louis.belanger@lyrtech.com. GSM (Global System for Mobile) is the most widely used cellular phone technology. Having begun mostly as a European standard, it has spread through
15. The SMT148. Meeting the Challenge: SDR is characterized by a decoupling between the heterogeneous execution platform (based on hardware functions together with DSP and MCU processors) and the applications, through abstraction layers. One of the approaches for meeting the complexity demands of third-generation systems is to use multi-core systems, where a DSP and a microcontroller work together with an FPGA-based hardware co-processor. With its embedded IBM PowerPC, the Xilinx Virtex-II Pro FPGA is rapidly becoming a solution that embeds and tightly couples reconfigurable logic and a processor in the same device. Sundance, a developer of advanced system architectures for high-performance signal processing applications, has focused on designing FPGA-based development platforms that address an SDR OEM's wish list. Our challenge was to design a system that would provide the scalability SDR systems require. The Embedded System Controller: The SMT148 (Figure 1) is one of many development systems Sundance has launched recently. Aimed specifically at SDR, the SMT148 is a fully configurable and expandable waveform development environment that meets the many requirements of SDR developers. This entry-level, stand-alone system enables radio designers to investigate and experiment with the many configurations of multi-channel, software-programmable, and hardware-configurable digital radio T
16. many die per wafer compared to building an equivalent chip with 130 nm process on 200 mm (8-inch) wafers. This lowers cost per die significantly. Multiple platforms deliver cost-optimized feature sets: With each generation of Virtex FPGAs, Xilinx has taken advantage of the latest process node to fabricate devices that offer greater capacity, higher performance, and lower price. For the Virtex-4 family, we went even further to achieve cost reduction. As we strive to expand the use of Virtex FPGAs into new markets and geographies, we see that our customers have different requirements that vary with the complexity and target price for the systems they are creating. [Figure 1: ASMBL architecture.] [Figure 2: One family, multiple platforms. The Virtex-4 family offers three platforms with a total of 17 devices tailored to the requirements of different application domains. The Virtex-4 LX, SX, and FX platforms each provide a unique mix of core capabilities, such as logic, memory, parallel and serial I/O, embedded processors, DSP functionality, and other functions, suited to specific system requirements.] Using our proprietary ASMBL (pronounced "assemble") architecture (see Figure 1 and sidebar, "ASMBL Architecture Enables Cost-Optimized Platforms"), we have assembled three different platforms (Figure 2) with an initial offering of 17 devices that
17. such as generating VHDL code to emulate the state machine and implementation through a Xilinx black box configuration. Xilinx StateCAD Configuration: A simple state machine is given with the algorithm shown in Figure 3. The requirements of the signal filter_finish are described as follows: if the block is enabled, when the counter reaches the value of BBRX_end, filter_finish should go high and stay high until enable goes low. [Figure 3: Pseudo-code for the switching algorithm: old_filter_finish = filter_finish; if enable == false, filter_finish = false; elseif BBRX_end, filter_finish = true; else filter_finish = old_filter_finish; end.] The state machine shown in Figure 4 was generated with StateCAD to emulate this logic. [Figure 4: StateCAD implementation for the BBRX filter, with states Start and Filter_Complete and signals filter_reset, filter_enable, and filter_finish.] Note that the default state in the VHDL code must be changed manually to "start". This is because StateCAD's default state in terms of VHDL code is the one whose name is first alphabetically. Thus, to avoid complications, you should always create a default state in which to start called "aaa". You can then generate VHDL code (bbrx.vhd) for this state machine. The VHDL code can then be modified for configuration of a Xilinx black b
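The filter_finish requirement described above (go high when the counter reaches BBRX_end while enabled, hold until enable goes low) can be modeled in a few lines of software. The sketch below is a behavioral illustration only, not the StateCAD-generated VHDL: the function names `next_filter_finish` and `simulate` are hypothetical, and it assumes that the "BBRX_end" condition means the counter equals the BBRX_end value, as the prose suggests.

```python
def next_filter_finish(enable, counter, bbrx_end, filter_finish):
    """Return the next value of filter_finish for one clock cycle."""
    if not enable:
        # Disabling the block always clears the flag.
        return False
    if counter == bbrx_end:
        # Assumption: the flag is set when the counter reaches BBRX_end.
        return True
    # Otherwise hold the previous value.
    return filter_finish


def simulate(enables, counters, bbrx_end):
    """Step the model over per-cycle enable and counter values."""
    finish = False
    trace = []
    for enable, counter in zip(enables, counters):
        finish = next_filter_finish(enable, counter, bbrx_end, finish)
        trace.append(finish)
    return trace
```

For example, with `bbrx_end = 2`, enable held high for five cycles and then dropped, and a counter that increments each cycle, the flag goes high on the third cycle, stays high for the next two, and clears when enable goes low.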
18. using the Xilinx EDK toolset and a direct memory interface approach. A co-processing-oriented application can use the hardware platform, demonstration designs, and included tools as a great starting point for prototype design and algorithm development. DSP applications are often very difficult to simulate in software, so the ability to quickly create a hardware/firmware/software platform can cut development time significantly. Using the co-simulation tools available in the Xilinx tool suite through The MathWorks Simulink and the target hardware is one technique that can dramatically reduce design time. Additionally, deciding what portions of the algorithm to process in the DSP and which portion to process in the FPGA can often best be done with a trial-and-error approach, using real hardware to quickly evaluate the performance of various options. For example, the number of data streams that can be pre-processed by an FPGA before post-processing by a DSP will depend on many factors: the burstiness of the incoming data, the accept/response rate of the DSP, the size of the buffer memories, the bandwidth of the system bus, and the amount of pre-processing allocated to the FPGA. These are all difficult decisions to make without doing some detailed hardware prototype-based analysis. The DSP Co-Processing Design Kit also includes the following software
19. [Figure 2: J.83 Annex B functional block diagram.] [Figure 3: J.83 Annex A/C functional block diagram.] J.83 in System Generator for DSP: The J.83 specification defines the forward error correction (FEC) and baseband modulation with pulse-shaping characteristics. The J.83 Annex B FEC section (Figure 2) uses a concatenated coding technique with four processing layers, comprising an RS encoder, convolutional interleaver, and randomizer, followed by a frame sync insertion block and trellis-coded modulation (TCM). The J.83 Annex A and Annex C (Figure 3) have identical FEC processing stages, comprising an RS encoder, convolutional interleaver, and a byte-to-symbol differential encoder, followed by a symbol mapper. The System Generator Xilinx library, or block set, is abundantly populated with IP that enables rapid design and simulation of such a system. The tokens required to construct the J.83 FEC section, as well as the filter blocks required to construct pulse-shaping filters, are available within the library browser. The underlying circuit of each of these tokens is optimized in area and speed to suit the Xilinx family of devices. Each of these elements is conveniently customizable to be compatible with the precise specification of the J.83 standard. It is then a simple matter of using these customized library elements to build out the circuit required. For example, you can obtain the (204,188) RS encoder r
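As a quick aid for reading the (204,188) notation, the basic parameters of a Reed-Solomon code follow directly from its block length n and message length k. The helper below is a hypothetical illustration (the name `rs_parameters` is not part of System Generator or any Xilinx library); it relies only on the standard property that an RS code with n - k parity symbols can correct up to (n - k) / 2 symbol errors, rounded down.

```python
def rs_parameters(n, k):
    """Basic parameters of an (n, k) Reed-Solomon code."""
    if not 0 < k < n:
        raise ValueError("expected 0 < k < n")
    parity_symbols = n - k
    return {
        "parity_symbols": parity_symbols,
        # Up to floor((n - k) / 2) symbol errors can be corrected.
        "correctable_symbols": parity_symbols // 2,
        "code_rate": k / n,
    }
```

For the (204,188) code, this gives 16 parity symbols and up to 8 correctable symbol errors per codeword.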
20. 6,759,852 (I can't believe you didn't guess this). It was issued to Maheen A. Samad in our General Products Division Engineering department. So what's the big deal, you might ask? Although our corporate pride may runneth over, our patent count doesn't hold a candle to some corporate giants. That may be true, but Xilinx was founded with innovation at its core, beginning with Ross Freeman's invention of the FPGA and continuing with innovative practices and ideas, many of which are commemorated in the patent hallway at our corporate headquarters. Using our R&D dollars as a metric to measure our efficiency in converting innovation into patents, Xilinx, as a high-tech company, ranks second only to IBM. Xilinx also ranks 131st in the number of patents held, making it one of the most innovative companies worldwide. This milestone, while not hugely significant in terms of the raw number, is more about celebrating the continued innovation from Xilinx, both in the form of technology patents and in business acumen. This issue of the Xcell Journal features articles in two key technology areas: digital signal processing (DSP) and embedded processors. This issue also includes an article on the new Virtex-4 family of FPGAs, which offers three
21. [Figure 2: ASIC and FPGA formal verification flows with Formality.] the best timing in fast run times. For designers who want even more flexibility, DC FPGA allows you to fully control the synthesis process on a block level. This level of control is very useful, particularly when you are trying to gain the last bit of performance from the design or want to carefully control the implementation. Formal verification is a key part of a unified design flow because it mathemati [...] in synthesis will help you meet your most difficult design challenges: getting to prototype quickly. DC FPGA is just part of the complete ASIC-strength prototyping solution from Synopsys. Other tools supported in the Xilinx flow are Formality for formal verification, DesignWare Library IP, Leda for RTL design and code checking, PrimeTime for static timing analysis, VCS for simulation, Module Compiler for datapath synthesis, and HSPICE for analysis of multi-gigabit serial I/Os. Although it is a new product, DC FPGA has a rapidly growing base of more than 80 customers. For more information about Design Compiler FPGA, visit www.synopsys.com/products/dcfpga/dcfpga.html. As a customer-centric designer and manufacturer of microprocessors, flash memory devices, and system-on-chip solutions for the computer and communications industry, AMD is pushing the speed limits of today
22. DSP circuit may not seem possible within the abstract world of Xilinx System Generator, but on the contrary, you can easily realize custom logic by configuring a Xilinx MCode block. A MATLAB M-file from The MathWorks configures the block to emulate the algorithm realized in the file. You can attain custom control and, more specifically, state machines with System Generator through configuration of a Xilinx black box with code generated from Xilinx StateCAD. The VHDL code generated by StateCAD is emulated within the Xilinx black box block. With advanced design and control logic, synchronization in DSP circuits also becomes an issue. You can realize handshaking (the exchange of control and status information between two blocks) in System Generator through delays and enable signals. I would like to see Xilinx System Generator offer more flexibility, with the addition of output enables and other input parameters to offer dynamic configuration of blocks. This flexibility comes at the expense of maintaining abstraction, if you would prefer not to immerse yourself in the details of digital VLSI design. Using a Xilinx MCode Block: If you are implementing a straightforward logic algorithm, configuring the MCode block is an easier solution than building the logic together through Xilinx blockset logic
23. [Virtex-4 family product selection table (LX, SX, and FX devices and package options); contents not recoverable from the extraction.]
24. CoDeveloper generates FPGA hardware from the C-language software processes and automatically generates software-to-hardware and hardware-to-software interfaces. You can optimize these generated interfaces for the MicroBlaze processor and its FSL interface or the PowerPC and its PLB interface. Other approaches to data movement, including shared memories, are also supported. [Figure 1: Hardware and software test components are mapped to the FPGA target and communicate across the OPB or FSL bus to the MicroBlaze or PowerPC processor.] It may be impossible to test all permutations from the system perspective, so the unit test lets you build a suite to test specific areas of interest or test only the boundary/corner cases. Performing these tests with actual hardware (which may, for testing purposes, be running at slower than usual clock rates) obtains real, quantifiable performance numbers for specific application components. Introducing C-to-RTL compilation into the testing strategy can be an effective way to increase testing productivity. For example, to quickly generate mixed software/hardware test routines that run on both the embedded pr
25. an IBM PPC440GP, a Xilinx XC2V3000 FPGA, two Cirrus Logic MPEG-2 codecs, and a PCI-PCI bus bridge. The PPC provides general-purpose processing. It has a dual-issue superscalar RISC core with 64-way associative I and D caches. It also manages PCI, RS-232, IIC, and Ethernet I/O. Chain-controlled DMA units are available in the PPC for moving data between the PCI bus, the external peripheral bus (EPB), the PPC DRAM, and I/O registers. The FPGA handles raw uncompressed audio, raw video, and raw and compressed I/O for the MPEG codecs. Part of the FPGA fabric is dedicated to video generators and mixers, I/O multiplexers, standard video processing such as scaling, and RAM interfaces. About 10% of the XC2V3000 is dedicated to a basic 2-D graphics engine. You can use the remainder of the FPGA fabric for custom processing functions. The default FPGA internal clock is 100 MHz, which matches the clock used for the DRAMs. Each of the two MPEG-2 codecs is capable of encoding or decoding elementary streams. They are independent of each other. For example, in a video application where the raw video is enhanced by the FPGA, you can compress both the original and the enhanced video. In a communications scenario, one codec may be compressing local video for transmission while the other is decompressing remote video. Or you can use the two codecs to decompress video from two disti
26. an interactive graphical block diagram environment with a customizable set of block libraries for signal processing, communications, and control. You can create comprehensive system specifications and model channels and other environmental effects. These tools also simplify system analysis using quantitative measures such as signal-to-noise ratio and bit error rate. Simulink is integrated with MATLAB, providing access to an extensive range of tools for algorithm development and data analysis. Simulink models are hierarchical; you can partition them easily into subsystems or components. This simplifies comprehension of the design and interaction of Figure 3. Simulink model of UWB system (top) with fixed-point OFDM re
27. and zero-cost solution to evaluating technology. In addition to its primary use by engineers, the VirtuaLab can help other departments as well. Online product evaluation, sales demonstrations, and internal training are example applications of VirtuaLab technology. Each VirtuaLab is delivered with a range of tools, including a product tutorial or quick-start script that enables novices to exercise the hardware without writing any code, or a complete IDE for the expert expecting a full-featured experience. For more information, visit the Nu Horizons Xilinx VirtuaLab at VirtualLab.TechOnLine.com or contact your local Nu Horizons sales representative. What would you do with advanced access to new technology? Evaluate the new Xilinx Spartan-3 1500 FPGA today. Today's engineering environments consist of limited resources, tight schedules, and dramatic learning curves. Waiting for high volumes of silicon production can add to the frustration of completing a design and being first to market with your solution. That's why Nu Horizons and Xilinx have partnered to provide engineers with a complete online evaluation and development environment using TechOnLine's VirtuaLab technology. Save time. Be innovative with a zero-cost solution. Check it out. TechOnLine's VirtuaLab goes beyond the software simulation usually associated with Web-based design. Hardware and software
28. been an ASIC. But an ASIC was not feasible for this project for several reasons. First, this project had a very tight schedule. An ASIC could not have been completed in the time allotted. Second, the component volumes for this sign are not high enough to absorb the NREs of an ASIC. Third, an ASIC lacks the development opportunities of an FPGA. To me, as an engineer, this reason is the most important. No matter how much simulation you perform, there can always be unexpected bugs. In an ASIC these bugs are expensive; in an FPGA they can be fixed easily. Another FPGA advantage is that it can meet future needs through feature upgrades; an ASIC cannot. The reconfigurable nature of Xilinx FPGAs allows us to provide feature upgrades and bug fixes to the customer via e-mail, making it easy for them to apply to the sign. Through an Ethernet interface, the FPGA reprograms the Platform Flash configuration PROM and automatically reboots. Video processing requires a large number of multiply operations. The video processor must perform color space conversion and apply calibration coefficients in real time. Building multipliers from FPGA logic would require a large portion of the logic resources. Instead, this can be done very efficiently by utilizing the embedded multipliers. Building pipelined processing structures with the embedded multipliers allowed us to easily meet the processing requirements.
29. both fit into a single Virtex-II block RAM configured as 1K deep x 16 bits wide. We used some extra FPGA fabric for shift registers to handle escape codes for run-level values not included in the Huffman code tables. The ISDSM block handles the functions of inverse zigzag scanning, dequantization, and scaling. The iDCT was the easiest block to design; it is included as a standard core in the Xilinx ISE CORE Generator package. The format converter assembles the Y, Cb, and Cr sample blocks into slices in a slice assembly RAM buffer comprising 16 block RAMs. The slices are then scanned out line by line, and the lines are wrapped in CCIR 656 start and end of active video (SAV/EAV) marker codes. We used an address rotation technique so new blocks can be assembled in the buffer as soon as a single line is removed, allowing the pipeline to run continuously without having to double-buffer the slice assembly RAM. Pull quote: If we had to develop the board, all of the associated software, and all of the IP that went into the low-latency decoder and display system, it would have taken years instead of months. Results: The original unoptimized MPEG-2 codec chip external to the FPGA had a latency of 1800 ms. Working with the codec chip manufacturer, we reduced their latency to 45 ms. The I-frame decoder we developed using the Xilinx FPGA and PPC has a latency o
30. clock speed. We need a framework to receive digitally sampled analog input signals from an ADC as stimuli into the Simulink model, either as real-time streaming data or as captured data for repeatable playback. We also need our framework to deliver processed data from the Simulink model to a DAC (digital-to-analog converter) to produce an analog output signal. Figure 1. Memec P160 Analog Module. The Memec P160 Analog Module is a daughtercard that interfaces external analog signals to Memec's wide assortment of Xilinx FPGA development boards. In this article, we'll present design techniques using the Memec P160 Analog Module in Simulink DSP models, making creative use of several new blocks from Xilinx System Generator version 6.3. If you are an FPGA designer, these techniques offer practical starting points, giving you a head start on your DSP development using external analog signals. Memec Simulink Library: The Memec P160 Analog Module is shown in Figure 1. It provides two channels of analog I/O through 12-bit data converters from Texas Instruments: two 165 megasamples-per-second DAC902 DACs driving single-ended analog outputs, and two 53 megasamples-per-second ADS807 ADCs. The digital data out of the ADCs is latched into external buffers and then passed to the FPGA through the P160 interface. The Memec P160 Analog Module DAC and ADC blocks are delivered as a Si
31. completely formed. These features of the OBS networks are similar to an optical circuit-switched network. Like an optical packet-switching network, an OBS network can dynamically control system resources, assigning wavelengths of optical fiber to individual data bursts only when that user needs to transmit data. Unlike some optical packet-switched networks, an OBS network does not require optical buffers. The MCNC RDI has developed a NASA-funded OBS protocol implementation called JIT (Just-In-Time), which recently achieved successful testing in an ATDnet (Advanced Technology Demonstration network) testbed. Established by the Defense Advanced Research Projects Agency (DARPA) for demonstrating advanced networking technology, the all-optical ATDnet runs at 2.5 Gbps through six sites using eight wavelengths and wavelength division multiplexing (WDM) switches. The testbed included applications in multiple areas, such as optical networking, network security, and networked information systems. Technology Overview: WDM is a method of transmitting data from different sources over the same fiber-optic link at the same time; each data channel is carried on its own unique wavelength. The
32. data will be output in raster-scan format, you can create the destination address with two counters representing the two dimensions. These values must be center-adjusted around zero before being transformed. At this point, any adjustment changes the center of rotation for the output image. The following two examples demonstrate manipulating the output image using minimal extra logic. First, two adders are all that is required to add pan control for the output image. Second, using two multipliers, you can scale the destination addresses to zoom in or out from the center point. Because interpolation occurs at a later stage, zooming and panning at this point results in an output image with each pixel uniquely interpolated, rather than a simple copy or an additional interpolation procedure later. Figure 2 demonstrates a 45-degree counterclockwise rotation with 8x zoom. Once the destination address is corrected for center, pan, and zoom, the transform is applied. The resulting addresses are then reverse-corrected for the centering function. Each resulting number has an integer portion and a decimal (fractional) portion. The integer portions are the required source address locations to read from the frame buffer. The decimal por Figure 2. Zoom and pan incorporated into the algorithm. Figure 3. Original picture for subjective comparison. Figure 4. Image rotated counterclockwise 7.5 degrees followed by rotation clockwi
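The addressing steps in the snippet above (center adjustment, pan, zoom, rotation, reverse centering, and the integer/fractional split) can be sketched in Python. This is only an illustration under stated assumptions: the function name, parameter names, floating-point arithmetic, and the sign convention of the rotation are hypothetical, and the article's hardware uses counters, adders, and multipliers with fixed-point formats that are not given in this excerpt.

```python
import math

def source_address(x_dst, y_dst, width, height, angle_deg, zoom=1.0, pan=(0.0, 0.0)):
    """Map one destination (raster) pixel to a source address.

    Returns ((xi, yi), (xf, yf)): the integer frame-buffer address and the
    fractional parts that a later interpolation stage would use as weights.
    All names and the float arithmetic are illustrative assumptions.
    """
    cx, cy = width / 2.0, height / 2.0
    # Center-adjust the two raster counters around zero, then add pan (two adders).
    u = x_dst - cx + pan[0]
    v = y_dst - cy + pan[1]
    # Scale the destination address to zoom (two multipliers). Multiplying by
    # 1/zoom samples a smaller source area, which magnifies the output.
    u *= 1.0 / zoom
    v *= 1.0 / zoom
    # Apply the rotation transform (sign convention is an assumption).
    a = math.radians(angle_deg)
    xs = u * math.cos(a) - v * math.sin(a)
    ys = u * math.sin(a) + v * math.cos(a)
    # Reverse the centering correction.
    xs += cx
    ys += cy
    # Split into integer (address) and fractional (interpolation) portions.
    xi, yi = math.floor(xs), math.floor(ys)
    return (xi, yi), (xs - xi, ys - yi)
```

For example, `source_address(5, 4, 8, 8, 0, zoom=4)` lands a quarter of a pixel to the right of the center pixel, so the integer part addresses pixel (4, 4) and the fractional part carries the 0.25 offset for interpolation.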
33. deliver cost-optimized solutions for the widest range of high-performance electronic systems. Integrated IP reduces the customer's bill of materials and saves FPGA resources: Virtex-4 FPGAs reduce system cost with abundant integrated IP. By incorporating many functions that find use in a broad range of applications, Virtex-4 FPGAs replace a number of discrete components commonly found on system boards. Designers can take advantage of embedded PowerPC processors, up to 10 Mb of embedded dual-port RAM/FIFO, integrated Ethernet MACs, sophisticated DSP circuitry, and on-board serial transceivers, among other features. This helps our customers lower system cost in several ways: by reducing component count and streamlining logistics with a smaller bill of materials; by simplifying the design and manufacturing of system hardware; by easing PCB design and manufacturing; and by improving system reliability through the reduction of solder joints. In addition, building dedicated circuits on the FPGA provides required functionality efficiently while preserving the programmable logic fabric for customers to add the value of their proprietary designs. The result is more capability within a single package at a given price point. Up to 80% Additional Cost Reduction with EasyPath: The EasyPath program further lowers system cost for customers who are ready to take their finished design to
34. delivered as an Embedded Development Kit (EDK) reference system. The reference system, as described in Xilinx Application Note XAPP536, leverages a multi-port DDR SDRAM memory controller to allocate memory bandwidth between the PowerPC processor local bus (PLB) interfaces and two data ports. Each data port is attached to a direct memory access (DMA) controller, allowing hardware peripherals high-bandwidth access to memory. A MontaVista Linux port is available for applications requiring an embedded operating system, while a commercial standalone TCP/IP stack from Treck is also available to satisfy applications with the highest bandwidth requirements. System Architecture: Memory bandwidth is an important consideration for high-performance network-attached applications. Typically, external DDR memory is shared between the processor and one or more high-bandwidth peripherals, such as Gigabit Ethernet. The four-port multi-port memory controller (MPMC) efficiently divides the available memory bandwidth between the PowerPC's instruction and data PLB interfaces and a communications direct memory access controller (CDMAC). The CDMAC provides two bidirectional channels of DMA that connect to peripherals through a Xilinx-standard LocalLink streaming interface. The CDMAC implements data realignment to support arbitrary alignment of packet buffers in memory. A block diagram of the system is shown in Figure 1. The Loc
35. design solutions in both PCB and FPGA, so it stands to reason that we'd create the only truly integrated system flow. Our unique class of tools empowers designers to concurrently design FPGAs and PCBs. Increase your system performance and design team productivity. Get the systems integration white paper at www.mentor.com/techpapers or call 800-547-3000. 2004 Mentor Graphics Corporation. All Rights Reserved. Mentor Graphics is a registered trademark of Mentor Graphics Corporation. DSP: DIGITAL SIGNAL PROCESSING. Taking Digital Signal Processing to the Extreme. In this series on digital signal processing, the Xcell Journal spotlights the challenges of and solutions to developing extremely high-performance DSP applications. By Omid Tahernia, Vice President and General Manager, DSP Division, Xilinx, Inc., omid.tahernia@xilinx.com. Programmable DSP processors are growing tremendously and are being implemented in a variety of applications. To keep pace with this explosive growth, Xilinx is expanding and enhancing its XtremeDSP processing solution. We've added dedicated DSP elements into our FPGAs to make it easier and more cost- and power-efficient to achieve performance levels previously only possible in custom ASICs. The Xilinx solution is the perfect match with programmable DSP processors. The Xilinx tool suite allows you to very easily develop massively parallel digital signal processing engines t
36. for the MicroBlaze processor. Evaluating Nucleus PLUS in EDK: The Accelerated Technology Nucleus PLUS evaluation software provided in the EDK Platform Studio 6.3i shipment includes a limited version (LV) of Nucleus PLUS. This is a fully functional version of the RTOS, compiled into a library format rather than the normal source code distribution, with the single restriction that it will stop working after 60 minutes. This facilitates evaluation of its full functionality. When you purchase a full license of Nucleus PLUS from Accelerated Technology, you receive the full source code, and the 60-minute run-time restriction is lifted. The LV version of Nucleus PLUS is configured to execute from the off-chip SRAM or SDRAM module. Once you have a full license to the RTOS, you can configure it to run from any memory in your system. Nucleus PLUS is a scalable RTOS: only the software you use in your design is included in the downloaded code. This may be contrasted with other larger, more static systems, which consume far more system resources. In some circumstances, the whole RTOS and application can fit in the on-chip memory, thus achieving high performance and low power consumption. Even with larger applications, which may utilize extensive middleware, the efficient use of the relatively small amount of on-chip memory means that the size of the kernel footprint is an important consideration. You can configure oth
37. higher level system. Between these gateways, you must use blocks from the Xilinx library blockset or import your own code through the black box interface. The Xilinx blockset library comprises basic elements, math functions, DSP functions, communications blocks, control logic, and other useful elements. Each block is fully parameterizable, and a tight integration with the MATLAB workspace allows you to enter parameters based on complex equations or variables defined in the workspace. Equations such as this one: acc_nbits = ceil(log2(sum(abs(coef * 2^coef_width_bp)))) + data_width + 1 define the precision required for a filter as a function of the filter tap coefficients, the number of taps, and the coefficient width. Because these are fixed at design time, it's possible to tailor the hardware resources to the filter specification. As you change coefficients, no modification will be required to the design because the output precision will automatically be recalculated. This capability is only available with tools tightly integrated with MATLAB. The Xilinx blockset library will not always meet your needs, and some functions are better suited for HDL implementation, such as complex state machines or complex legacy code. The System Generator black box allows you to bring VHDL, Verilog, and EDIF netlists into a design. In the QAM design example, you can imagine repl
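As a worked illustration of the accumulator-width expression quoted in the snippet above, here is a small Python sketch. The expression appears flattened in the source text, so the operator placement used here (scale each coefficient by 2^coef_width_bp, sum the magnitudes, take the ceiling of log2, then add the data width plus one bit) is an assumption, as are the function name and the example coefficient values.

```python
import math

def accumulator_bits(coef, coef_width_bp, data_width):
    """Estimate the accumulator width for a fixed-point FIR filter.

    coef          -- filter coefficients as real numbers
    coef_width_bp -- number of fractional bits (binary point) of the coefficients
    data_width    -- width of the input data in bits
    The operator placement is reconstructed and should be treated as an assumption.
    """
    # Worst-case magnitude of the coefficient sum once scaled to integers.
    scaled_sum = sum(abs(c) * 2 ** coef_width_bp for c in coef)
    # Bits needed for the coefficient growth, plus the data width, plus one bit.
    return math.ceil(math.log2(scaled_sum)) + data_width + 1
```

With the hypothetical taps `[0.25, 0.5, 0.25]`, 8 fractional coefficient bits, and 12-bit data, the scaled sum is 256, so the sketch returns 8 + 12 + 1 = 21 bits. Changing the coefficients changes the result without touching the rest of the design, which is the point the text makes about recalculated precision.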
38. inter-process communications networks within and between FPGAs at a conceptual level and automatically generate synthesizable FPGA code to represent them. This significantly reduces the time spent designing the communications element of an application, enabling you to concentrate your efforts on the parts of an application where your expertise lies, delivering solutions to customers faster. System Communications: Developing applications to run in FPGAs has become easier, in part because of the advances made in design flows, tools, and general awareness of how to program FPGAs. Tools such as Xilinx System Generator and other high-level implementation methodologies enable developers to quickly translate their algorithms from math-level functions into working FPGA algorithm blocks. Once developed, connecting these algorithm blocks together is a complex and error-prone task. Even more complex is the connection of algorithms in multiple-FPGA applications and the communication with external interfaces and backplanes. This interconnectivity inside, outside, and between FPGAs is termed system communications and can consume the vast majority of design time in many applications, distracting developers from their key expertise in the application being implemented. High-level algorithm design tools do not generally make provisions for implementation of system communic
39. is invaluable if your engineering staff wishes to preserve any existing investment in proven, reusable HDL code. System Generator blocks don't expose the system clock directly. Nevertheless, as digital designers we sometimes prefer to see digital waveforms referenced to the clock, especially for logic that drives signals onto FPGA pins and off chip. For this reason, the Memec P160 analog DAC block is built using a black box. Figure 3 shows a model in which we drive the DAC block with a sinusoid signal stored in a ROM look-up table. The ModelSim waveform window opens automatically during simulation, displaying all inputs and outputs of the black boxes and all clock and clock-enable signals supplied by System Generator. The signal display can be customized with an auxiliary Tcl script. Figure 2. Memec Xilinx DSP library in the Simulink library browser. Figure 3. Memec P160 Analog DAC in HDL co-simulation. Hardware Co-Simulation with the ADC: You can synchronize a System Generator hardware co-simulation block with its associated FPGA hardware in one of two clock modes. In single-step mode, the FPGA is clocked from Simulink; in free-running mode, the FPGA runs off an internal clock and is sampled a
40. live in hardware. Equalized 16-QAM demodulator, including the adaptive filter: The receiver architecture provides subsystems that demonstrate adaptive channel equalization and carrier tracking on a random QAM data source. The Spartan-3 2000 platform delivers acquisition/conversion capability through two high-performance plug-in modules and two mid-range performance platform solutions. Figure 1. Nu Horizons Spartan-3 2000 evaluation platform. Figure 2. TechOnLine VirtuaLab block diagram. ADC Platform Solution: The Nu Horizons Spartan-3 2000 evaluation platform includes a mid-range ADC on the main platform. You can use the Linear Technology LTC1865L 16-bit, 150 ksps ADC in ratiometric applications or with external references. The high-impedance analog inputs and the ability to operate with reduced spans (down to 1V full scale) allow direct connection to signal
41. of SRL16 as a series concatenation of 16 flip-flops with a programmable tap point. This unique aspect of Xilinx FPGAs is extremely powerful for building very efficient time-division multiplexed (TDM) hardware that you can use, for example, to process multiple channels of data. Because they run the design at a faster rate, TDM processing structures save resources. This has been notably exploited during the design of an optimized multi-channel group of modulators. For example, in the design of the optimized four-channel granularity of a group, all channels share a common control structure in the MPEG framer, RS encoder, interleaver, randomizer, and TCM. As the interleaver controls are shared, the data path into and out of the interleaver effectively becomes wider. Number of Channels Four Channel One Channel Granularity 3372 8 1 6764 16 2 10049 24 2 Granularity 1866 2 1 3644 4 1 5405 6 1 Resource Utilization: Using the resource-sharing techniques we've described thus far, you can realize significant savings in the implementation of modulators constructed out of optimized four-channel granularity designs compared to the equivalent constructed out of single-channel granularity designs. Table 1 and Table 2 show a comparison of the resources used in the design of various sizes of J.83 modulators using single- and four-channel granularity footprints. They also show the resources used to implement 4 8
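The description of the SRL16 as 16 flip-flops in series with a programmable tap point can be modeled behaviorally. The Python sketch below is an illustration only: the class and function names are hypothetical, it is not Xilinx code, and it ignores hardware details such as clock enables. It shows why the structure suits TDM processing: reading the tap whose index equals the number of interleaved channels returns the previous sample of the same channel.

```python
class SRL16:
    """Behavioral model of a 16-stage shift register with an addressable tap."""

    DEPTH = 16

    def __init__(self):
        self.stages = [0] * self.DEPTH

    def clock(self, bit_in):
        # Shift every stage by one position; the newest bit enters stage 0.
        self.stages = [bit_in] + self.stages[:-1]

    def read(self, tap):
        # The tap address selects which stage drives the output.
        if not 0 <= tap < self.DEPTH:
            raise ValueError("tap must be in the range 0..15")
        return self.stages[tap]


def previous_sample_per_channel(stream, channels):
    """For an interleaved (TDM) bit stream, return each bit's same-channel predecessor.

    After clocking in the bit at time t, tap `channels` holds the bit from
    time t - channels, which belongs to the same channel. Positions without a
    predecessor read back the reset value 0.
    """
    srl = SRL16()
    delayed = []
    for bit in stream:
        srl.clock(bit)
        delayed.append(srl.read(channels))
    return delayed
```

One register chain therefore serves all interleaved channels, instead of one delay element per channel, which is the resource-sharing idea behind the TDM structures described in the text.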
42. other tool offers features such as HDL co-simulation, hardware co-simulation, and integration with the ChipScope Pro tool and EDK, which are invaluable and only available in Xilinx System Generator for DSP. Complete Software Radio Solution on a single 6U card (Quixote). Features: TMS320C6416 DSP; 105 MHz, 14-bit, 2-channel analog I/O; 32 MB RAM; 6-million-gate Virtex-II FPGA; 32/64-bit cPCI bus; PMC expansion site; StarFabric interface. Get your data sheets now: www.innovative-dsp.com/quixote. Innovative Integration, real time solutions. sales@innovative-dsp.com, 805-520-3300 phone, 805-579-1730 fax. Virtex-II is a trademark of Xilinx, Inc. TRAQUAIR: The High Performance DSP & FPGA Solution. Modular HERON Systems using Xilinx FPGAs. HERON Systems offer the ultimate in high-performance signal processing hardware capabilities for use in PCI, CompactPCI, and embedded real-time environments. A modular and extensible hardware architecture allows developers to realize their ideas and address complex system requirements using high-density Xilinx FPGA technology. Virtex-II Pro and Virtex-II FPGAs: single and multiple FPGA configurations. Optional DSP processors: Texas Instruments TMS320C6000 DSPs. Multi-channel analog and digital I/O: 12-bit A/D up to 210 MSPS per channel. Xilinx Virtex-II Pro and Virtex-II FPGAs, powerful Texas Instrum
43. platforms are capable of supporting cutting-edge services for a reasonable number of subscribers. But they lack the robustness required of a true carrier-class solution. For scalability and robustness, therefore, Newport Networks decided to implement a significant proportion of the functionality in custom hardware. But the new gateway also had to retain that crucial flexibility to remain protocol-agile and easy to manage, key attributes in delivering a low overall cost of ownership for IP carriers. For a network to be easily managed, operators must be able to perform routine maintenance and apply periodic upgrades without visiting the site to change cards, introduce additional logic, or implement hardware links to support test functions. In the IP world, new services emerge and evolve quickly, calling for frequent functional upgrades. The imminent widespread adoption of Internet Protocol version 6 (IPv6) will also bring great implications for IP system flexibility. Adoption of IPv6 has already begun, predominantly by carriers in the Far East. Standard Processing Hardware: Newport Networks has introduced the 1460 session controller to enable network operators to capitalize on the opportunities presented by the IP services revolution. At its heart are three distinct functional cards that perform line interfaces, application processing, and switching/management functions, respectively. Interestingly a s
44. platforms optimized for logic, DSP, and embedded processor applications. And speaking of innovation, the Virtex-4 family includes more than 120 new (and, of course, patented) features, many of which are specific to supporting high-performance signal processing and embedded processors. With the launch of the Virtex-4 multi-platform FPGA family, the Xilinx vision expands to encompass programmable systems, which include logic, embedded processing, and very high-performance digital signal processing. As illustrated in the many articles in this issue, programmable technologies provide customers further flexibility and performance benefits to inspire innovation. Forrest Couch, Managing Editor. Xilinx, Inc., 2100 Logic Drive. 2004 Xilinx, Inc. All rights reserved. Xilinx, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. PowerPC is a trademark of IBM, Inc. All other trademarks are the property of their respective owners. The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or e
45. processors and show the name given to each processor in the top banner. In the case shown in Figure 7, the execution sites are named MB1 and MB2. When you select a command from a pull-down list by clicking on it, the command is directed to the processor assigned to the window in focus. You control the set of windows open for each processor through a pull-down menu. The choice of open windows is controlled by your selection of new windows to view. The result is an easy-to-use, intuitive user interface. Conclusion: Getting the right tool set and development environment set up for a new FPGA project is critical to the success of the product development cycle. A highly productive development and debug environment based around Nohau tools supports these new multiprocessor systems with an extension of the same powerful debug and test tools the company has offered for the last 20 years. For more information, please visit www.nohau.com or www.iq-service.com, or e-mail darrell@iq-service.com or darrellw@nohau.com. IQ Services is under contract to support and market platform FPGA tools for Nohau Corporation and performs custom start-up engineering for platform FPGA embedded system designs. A Scalable Software-Defined Radio Development System. Sundance enters the SDR fray with a Xilinx-based platform. By Flemming Christensen, Managing Director, Su
46. requirements and test cases and identifies untested portions of models. Ability to include a subset of the MATLAB language in Simulink models and automatically generate embeddable C code. New products for Model-Based Design of signal processing and communications systems, including: Filter Design HDL Coder, for generation of VHDL and Verilog code for fixed-point filters; Fixed-Point Toolbox, for design and verification of fixed-point algorithms and analysis of fixed-point data in MATLAB; RF Toolbox, for design and analysis of networks of RF components; RF Blockset, for design and simulation of RF system and component behavior in end-to-end Simulink wireless system models; Video and Image Processing Blockset, for design and simulation of embedded video and image processing systems; Link for ModelSim, for co-simulation and verification of VHDL and Verilog using Mentor Graphics ModelSim. Conclusion: There is growing acceptance of Model-Based Design as the way to handle complexity in embedded hardware and software systems. The MathWorks-Xilinx alliance has enabled the design and implementation of high-performance DSP systems within the Simulink environment, reducing design and schedule risk while capitalizing on the potential of FPGAs for advanced signal processing applications. For more information: Visit www.mathworks.com/products/dsp_comm for technical litera
47. result is a link with an aggregate bandwidth that increases with the number of wavelengths employed. In this way, WDM technology can maximize the use of the available fiber-optic infrastructure: what would normally require two or more fiber links will now require only one. WDM technologies primarily differ in the number of available channels. Coarse wave division multiplexing (CWDM) combines as many as 16 wavelengths onto a single fiber; dense wave division multiplexing (DWDM) combines as many as 64 wavelengths onto a single fiber. With DWDM technology, the wavelengths are closer together than with CWDM, meaning that transponders are generally more complex and expensive than those for CWDM. However, the advantage of DWDM is a much higher density of wavelengths and also longer distance. DWDM is emerging as a preferred solution for providing scalable and efficient optical networking technologies of the future. The key objective of the hardware-based OBS protocol implementation is to dynamically manage commercially available WDM switches. An OBS network comprises OBS network controllers and clients with OBS network interface cards (NICs). Figure 1. JIT signaling scheme. OBS network controllers direct th
48. separate FPGA boards and brought back into Simulink using System Generator's hardware co-simulation feature. You are now simulating an entire system from Simulink with 10 to 100 times faster simulation time, instantaneously validating the functionality in the FPGA. Interactively Control Hardware: You can also go one step further by accessing the FPGA while running the simulation. In this QAM example, the amount of Doppler content introduced by the channel can be controlled interactively during simulation; a slider bar shifts the carrier phase of the modulated signal (Figure 5). A significant adjustment to the slider invariably causes the receiver to lose lock. Interactive control over Doppler provides a simple yet powerful way to test the functionality of the receiver's control systems. Designers now have a simple way to simulate complex designs that require millions of samples and that, without hardware in the loop, would take months to simulate. This, among other features, is something that only System Generator can offer. With other DSP design methodologies, you are required to verify designs in multiple design environments, a complicated process resulting in significantly slower simulation times. Conclusion: System Generator is a mature tool that allows algorithm development, implementation, simulation, and verification in an environment understood by most designers. Although there are other design flows no
49. [Product selection table for Virtex-4 FPGAs, printed rotated on the page: Virtex-4 LX (Logic), Virtex-4 SX (Signal Processing), and Virtex-4 FX (Embedded Processing and Serial Connectivity). Table values are not recoverable; see www.xilinx.com/devices.] Winter 2004 Xcell Journal 113
50. start with a block diagram. DSP architects and FPGA designers have two completely different backgrounds, yet they must work together to create an optimum product. Their focus and expertise do not overlap, and as a result they often have difficulty communicating. The team must verify that the FPGA implementation does indeed match the original specification given by the DSP architect, and usually they must modify the DSP algorithm to obtain the best possible implementation in the FPGA. This requires a constant exchange of information about simulation results, design size, design performance, DSP algorithm changes, and implementation results throughout the design process. The tool provides high-level abstractions that are automatically compiled into an FPGA at the push of a button. Deciding on a single tool and a language that meets the requirements of the whole design team can be difficult, especially when budgets are low and turnaround times are short. What could be better than an environment understood by the entire team, using a single source code, without writing a single line of HDL code or even looking at the Xilinx ISE tools? To highlight the System Generator design flow, we will use a Quadrature Amplitude Modulator (QAM) system design example (Figure 1), implemented according to the specifications provided by A
51. tens of millions of dollars in up-front NRE investment, and ASIC designs are risky because you not only have to do the logic design, you must also do the physical design. This can only be justified for high-volume, low-cost applications. With Xilinx, you can do a system-on-chip design with no NRE. And because the chip itself is already designed and debugged, you don't need to worry about physical design issues such as crosstalk and power distribution. All you need to do is develop the logic design, which can be quick and easy using our growing family of IP and development tools that solve many complex design problems for you. Basically, we can now offer a system-on-chip for the masses, because we now have the advantages of an ASIC in a flexible and programmable device. Now you can create a single chip that includes DSP and embedded processors, along with IP and custom logic, for much less cost and no risk. All these programmable technologies available on a single device give you a significant advantage. FREE TECHNICAL WORKSHOPS: REGISTER NOW AT WWW.XILINX.COM/PW2004. Don't miss the industry's hottest new programmable technology, with in-depth workshops focused on today's most critical design challenges. See the agenda below and register today. Connectivity: Solving high-speed serial design challenges. Designing Serial Backplanes with Xilinx Solutions M
52. that speed, so the frequency must be reduced by converting serial data on each channel to parallel data as it enters the device. Conversely, transmission requires converting parallel data to serial format. Traditionally, this process involves multiple stages of dividing down or multiplying up the speed. The steps required to meet the setup and hold requirements are laborious and time-consuming. ChipSync technology simplifies design and boosts performance with an embedded SERDES that serializes and deserializes parallel bus interfaces to match the data rate to the speed of the internal FPGA circuits. ChipSync technology enables data rates greater than 1 Gbps for differential I/O and over 600 Mbps for single-ended I/O. This ability simplifies the design of interfaces such as SPI-4.2, XSBI, and SFI-4, as well as RapidIO and HyperTransport. Each channel and clock follows a slightly different route through the printed circuit board. Ensuring reliable data capture requires satisfying the setup and hold times of each channel. With communication interfaces of eight channels and higher, and with memory buses up to 144 bits wide, this can be an extremely challenging task. ChipSync technology simplifies the implementation of communication and high-speed memory interfaces, including DDR-2 SDRAM, QDR-II SRAM, FCRAM-II, and RLDRAM-II, by compensating for routing issues that produce skew between data and clock signals. Built-in circuitry enables the
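As a rough illustration of the serial-to-parallel conversion described in this excerpt, the Python sketch below groups a serial bit stream into parallel words and shows how the word rate seen by the FPGA fabric drops by the deserialization factor. The function names and the 8:1 ratio in the demo are illustrative assumptions; the article does not state the ratios that ChipSync uses.

```python
def deserialize(bits, width):
    """Group a serial bit stream (list of 0/1, first bit is the MSB) into words."""
    if len(bits) % width:
        raise ValueError("bit count must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit  # shift in one serial bit at a time
        words.append(word)
    return words


def serialize(words, width):
    """Inverse operation: emit each word MSB first as one flat bit list."""
    return [(word >> shift) & 1
            for word in words
            for shift in range(width - 1, -1, -1)]


def parallel_clock_mhz(serial_rate_mbps, width):
    """Word rate the internal logic must sustain after 1:width deserialization."""
    return serial_rate_mbps / width


if __name__ == "__main__":
    stream = [1, 0, 1, 0, 0, 0, 0, 1]
    print(deserialize(stream, 4))
    # Assumed 8:1 ratio on a 1 Gbps channel (hypothetical, for illustration only).
    print(parallel_clock_mhz(1000, 8))
```

The point of the sketch is only the rate trade: a wider parallel word lets the internal logic run proportionally slower than the pin.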
53. the processing FSM is responsible for taking a message and processing that message. Several processing sub-modules can be activated by the processing FSM as needed, such as a hashing module or a state machine module. Figure 3 diagrams the processing of messages in the JIT engine. Conclusion: We believe that communications will be bi-modal within the next 25 years. All land lines will be optically based, with optical access to the user or device that is a client of the network. All backbone connections will be across optical trunks. Networking will be predominantly implemented in the optical layer, with little or no additional layering above it. Optical networks will be mostly a transparent transport medium for applications. To meet the increasing demands of bandwidth and cost reduction, several technologies in the optical communications paradigm have been under intensive research. Just-In-Time signaling applied to the optical burst switching paradigm has the promise of being able to provide either circuit-switched or packet-switched services. JIT OBS implements the best of optical circuit switching and optical packet switching but avoids their shortcomings. JIT signaling aims to better utilize the variable parameters that can exist within both an optical and a wireless network, such as frequency availability and data rate differences. For more information on the research conducted by MCNC-RDI in the field of optical net
54. the same model. System Generator for DSP: The Xilinx System Generator tool suite was employed to implement a majority of the J.83 modulator design. System Generator is a visual dataflow design environment based on The MathWorks Simulink visual modeling tool set. This programming interface allows you to work at a suitable level of abstraction from the target hardware platform and use the same model not only for simulation and verification, but also for FPGA implementation. System Generator blocks are bit-true and cycle-true behavioral models of FPGA intellectual property components or library elements. A library-based approach results in design cycle compression, in addition to generating area-efficient, high-performance circuits. Together with model features such as data type propagation and the extensive virtual instruments that are part of the Simulink libraries, the environment facilitates rapid design space exploration together with powerful mechanisms for model debugging. MATLAB from The MathWorks programmatically generate scripts, custom VHDL, and project files based on user-defined parameters. [Figure 1: Cable network; the J.83 modulator fits in the cable head-end modulator/transmitter block. Diagram labels: Cable Head End; Web & Application Servers; Internet; Backplane (e.g., Fibre Channel); MPEG Encoders; Remultiplexers/StatMuxes; Video Servers; Modulators/Transmitters; Customer Premise.] Figure
55. tools as evaluation versions from the Xilinx XtremeDSP Software Evaluation CD Kit: Xilinx ISE 6.2 Foundation; ChipScope Pro; Xilinx System Generator for ISE 6.2; The MathWorks MATLAB and Simulink. Video DSP Design Kit: The Video DSP Design Kit targets simple DSP-oriented video applications in the industrial, security, consumer, and automotive markets. Algorithms for video processing, like image recognition, video encode, video decode, and video image enhancement, are all very difficult to prototype and evaluate without actual hardware on which to run the software or firmware. Using a DSP Design Kit with some simple video capabilities can make it much easier and quicker to prototype and evaluate various algorithms and architecture alternatives. The Video DSP Design Kit features a Xilinx Spartan-3 XC3S400-FG456 or XC3S1500-FG456 FPGA, Platform Flash configuration PROM, expansion connectors, 32-bit PCI edge connector, 10/100 Ethernet port, video DAC, RS-232 console, PS/2 keyboard and mouse ports, simple analog I/O, 1 MB SRAM, 256 Kb serial EEPROM, and a variety of user switches and LEDs. [Figure 2: DSP processor adaptor module.] The kit also includes example designs and user documentation to make it easy to get started on a new video DSP design. Several Xilinx application notes and reference designs, some using Xilinx IP cores available from the DSP
56. up and running quickly with your first Nucleus-based system, the installation also includes a sample pre-built reference design with a compiled Nucleus PLUS demonstration. The pre-built reference designs currently support the Memec Design-based DS-KIT-2VP7FG456 and DS-KIT-V2MB1000 FPGAs. This is the fastest method to employ for a sample Nucleus-based, MLD-enabled Xilinx system. Use of Xilinx's Base System Builder is also well documented inside the application notes accompanying the installation. With the Base System Builder, you can build a variety of system core configurations to work with the Nucleus PLUS RTOS (see Figure 2). If you have received your EDK 6.3i update recently or have purchased a seat, please check the contents for this evaluation. After running the Nucleus PLUS evaluation disk installer, the necessary files will be placed into the Xilinx EDK 6.3i, and the support of Nucleus PLUS will be automatically added. The elements of Nucleus PLUS modified by the data generation file (Tcl) for specific hardware configuration are: • The number and type of devices used by the hardware designer • Memory map information • Locations of memory-mapped device registers • Timer configuration • Interrupt controller configuration. Once you have installed these, you can use Xilinx Platform Studio (EDK) with Nucleus now visible in the RTOS pull-down selection menu. See Figure 3a for the PPC405 and Figure 3b
57. volume production. Xilinx creates customized test programs for EasyPath customers that exercise only the device resources used in the specific design. This approach shortens test time and increases yield to reduce FPGA unit price by up to 80%. Source-Synchronous Interfacing: To ensure reliable data transfer between a new generation of high-speed devices, hardware designers are turning to source-synchronous design techniques, in which the component sending the data generates and issues its own clock signal along with the data that it transmits. This technique eliminates one set of problems associated with parallel interfaces but introduces its own circuit design challenges. ChipSync technology significantly simplifies component interface design with critical built-in circuitry that is available in every Virtex-4 I/O (see sidebar, "Virtex-4 Solves Source-Synchronous Design Challenges"). Embedded Processing: Embedded developers have already used Xilinx processor solutions to create thousands of designs. As we talked to these developers about the requirements for their next-generation systems, several common themes emerged. A Full Range of Processing Solutions: Engineers need a range of processing solutions to match the requirements of different tasks, ranging from simple control functions to advanced algorithms and high-speed calculation. In addition, they want the different solutions to share a common desi
58. wheel, we chose Ethernet as our data distribution medium. Our video processor has multiple Gigabit Ethernet ports that interface to the sign. Gigabit Ethernet can be transferred over fiber-optic cable, allowing great distances between the controller and the sign itself. We were able to use off-the-shelf switches to distribute the data within the sign and put inexpensive 10/100 Ethernet ports on the individual distribution boards. The availability of Ethernet protocol analyzers such as the open-source project Ethereal allowed us to easily analyze and debug the system. Sign Control: In advertising, time is money; thus it is crucial to monitor the sign at all times. The control system monitors temperatures throughout the sign to ensure that adequate cooling is present. Voltages are monitored to detect malfunctioning power supplies. The control system maintains error and resend counts to detect faulty data links. It also provides an interface to upgrade the FPGAs remotely for enhancements and bug fixes. The reconfigurable nature of Xilinx FPGAs allows us to provide feature upgrades and bug fixes to the customer via e-mail. The Benefits of Xilinx Devices: Xilinx devices include a large number of features that are ideal for our sign project. • The reconfigurable nature of Xilinx devices is necessary for a project like this. Without FPGAs, the only alternative would have
59. [Table 1: Linear Technology high-speed acquisition modules; the only recoverable entry is the LTC1743. Table 2: Intersil high-speed conversion modules: ISL5629EVAL1, ISL5729EVAL1, ISL5829EVAL1, ISL5929EVAL1. Remaining table values are not recoverable.] sources in many applications, eliminating the need for external gain stages. High-Performance A/D Acquisition Module: The high-performance ADC acquisition modules interface directly to the Nu Horizons Spartan-3 2000 evaluation platform also provided by Linear Technology. These ADC development boards offer 12 to 14 bits of resolution and sampling rates from 25 to 40 MHz. The Linear Technology ADC development boards (see Table 1) are designed to digitize high-frequency, wide-dynamic-range signals. These boards target applications such as telecommunications, digital imaging, spectrum analysis, and cellular base stations. DAC Platform Solution: The Spartan-3 2000 evaluation platform includes Linear Technology's LTC1654L 14-bit 8 conversion rate DAC. The LTC1654 is a dual rail-to-rail voltage output 14-bit DAC that includes output buffer amplifiers and a flexible serial interface. The LTC1654L has two programmable speeds: a fas
60. [Figure 6: RMS input current comparison; input capacitor RMS current plotted against input voltage (V). Figure 5: Example waveforms for a single-phase dual converter vs. the 2-phase LTC3736.] frequency at 550kHz but can be externally synchronized from 300kHz to 750kHz. The LTC3736 provides output tracking for controlled ramp-up of two supply rails, programmable current limit, output overvoltage protection, power-good output, and selectable Burst Mode operation for high-efficiency light-load operation. The device is available in the tiny 4mm x 4mm thermally enhanced QFN package or the 24-lead SSOP package. LTC3708: 2-Phase Dual Synchronous DC/DC Controller for 15A Loads. The core supply voltages of the latest Xilinx FPGAs have decreased towards 1V. The Virtex-II Pro family requires 1.5V VCCINT, and the Spartan-3 family needs only 1.2V. In the meantime, these FPGAs demand more current from the power supplies. Some systems use more than ten FPGAs per board, so the resulting total supply current can easily exceed 10A. The LTC3708-based dual-output supply is an ideal choice for such applications. The LTC3708 is a 2-phase dual DC/DC controller with a wide input voltage sync
61. 4 FPGAs contain XtremeDSP slices, the Virtex-4 SX platform provides the highest ratio of XtremeDSP slices to other resources. The largest SX device, the XC4VSX55, has 512 slices. Using these 500 MHz XtremeDSP slices (each with an 18 x 18-bit multiplier and 48-bit accumulator) exclusively, this device can achieve 256 GMAC/s performance at a very aggressive price point, providing the most powerful DSP capabilities of any FPGA in the industry. Demonstrating the revolutionary flexibility of the multi-platform approach enabled by the ASMBL architecture, the DSP-optimized SX55 offers ten times the DSP value, as measured in GMACs/dollar, compared with previous-generation FPGAs. Xilinx is helping DSP developers close the gap between the performance of programmable single-MAC DSPs and the requirements of advanced algorithms with Virtex-4 SX platform FPGAs. Virtex-4 FPGAs can serve alongside programmable DSPs as pre-processors or co-processors to offload compute-intensive tasks. Conclusion: To learn more about how you can take advantage of the breakthrough capabilities and performance of Virtex-4 FPGAs in your next system, please visit our website at www.xilinx.com/virtex4. Virtex-4 Solves Source-Synchronous Design Challenges: Source-synchronous interfaces typically send signals at bandwidths of up to 1 Gbps or higher on each channel. FPGA logic circuitry has difficulty processing incoming signals at
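The 256 GMAC/s figure quoted in this excerpt for the XC4VSX55 follows directly from the slice count and clock rate, assuming every XtremeDSP slice completes one multiply-accumulate per clock cycle. The short Python check below makes that arithmetic explicit; the function name is illustrative and not from the article.

```python
def peak_gmacs(num_slices, clock_mhz):
    """Peak throughput in GMAC/s, assuming one MAC per slice per clock cycle."""
    # slices * MHz gives millions of MACs per second; divide by 1000 for GMAC/s.
    return num_slices * clock_mhz / 1000


if __name__ == "__main__":
    # XC4VSX55: 512 XtremeDSP slices at 500 MHz, as stated in the article.
    print(peak_gmacs(512, 500))
```

This is a peak number: it holds only when all 512 slices are kept busy every cycle, which is what "exclusively" implies in the text.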
62. [Table 2: Xilinx devices achieve impressive specifications. Recoverable row labels: multiply operations per second; DDR333 SDRAM banks; Gigabit Ethernet MACs; Fast Ethernet MACs; > 1,000 PicoBlaze processors. Values are not recoverable.] especially fast or don't happen very often. One example of this would be a serial transfer to read a temperature sensor. For this application, the sensor only needs to be read every ten seconds. It would be a waste to have a state machine for temperature sensor reading that ran once every ten seconds but only took a few milliseconds to complete. The logic would be unused 99.9% of the time. These types of functions can be efficiently combined into a single PicoBlaze processor, which in the previous example [Figure 3: The video data distribution board is based on an XC3S200 FPGA. It also includes SRAM, a 10/100 Ethernet port, a] the PicoBlaze processor is a quick and easy way to define many functions. The PicoBlaze processor is also a great tool for accelerating the testing and debugging process. The PicoBlaze program code is stored in block RAM. To make a change to the program, we only need to change the block RAM contents. It is possible to do this without re-implementing the FPGA, saving a lot of time. Our favorite method of PicoBlaze processor development, which is slightly unique, is to use a PC serial port and a sim
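The "unused 99.9% of the time" claim in this excerpt can be checked with simple duty-cycle arithmetic. The Python sketch below assumes the temperature read takes 10 ms (the article says only "a few milliseconds") and also shows how several slow, infrequent tasks add up to a small load on one shared processor. The second task in the demo is hypothetical.

```python
def idle_fraction(busy_seconds, period_seconds):
    """Fraction of each period in which dedicated logic would sit unused."""
    return 1.0 - busy_seconds / period_seconds


def combined_utilization(tasks):
    """Total processor load for (busy_seconds, period_seconds) task pairs."""
    return sum(busy / period for busy, period in tasks)


if __name__ == "__main__":
    # Assumed: a 10 ms sensor read once every 10 s.
    print(idle_fraction(0.010, 10.0))
    # Hypothetical mix: the sensor read plus a 5 ms housekeeping task every second.
    print(combined_utilization([(0.010, 10.0), (0.005, 1.0)]))
```

With loads this small, one soft processor can absorb many such functions, which is the trade the article describes.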
63. A logic fabric. This ultra-low-latency architecture enhances performance by reducing by a factor of ten the number of bus cycles needed to access the accelerator hardware. The net result is a 20-fold increase in processor/accelerator efficiency. High-Speed Connectivity: When we asked system developers to describe their connectivity requirements, they highlighted the need for performance to support emerging standards and flexibility to upgrade today's designs to meet future bandwidth requirements. They are looking for solutions that offer bandwidth greater than 3.125 Gbps, provide complete support for multiple communication standards, and maintain the highest possible signal integrity. Our third-generation RocketIO multi-gigabit transceiver satisfies these requirements with the industry's broadest operating range and other enhancements. Virtex-4 FX FPGAs enable bridging between just about any serial or parallel connectivity standard. For example, the third-generation RocketIO multi-gigabit transceivers provide compliance with the PCI Express standard, with support for out-of-band signaling (electrical idle and beaconing) and spread-spectrum clocking. To address the challenges of backplane and other high-speed connectivity designs, RocketIO multi-gigabit transceivers pro • Third-generation multi-gigabit transceivers • Operating range: 622 Mbps to 11.1 Gbps • Channels: up to 24 • Transmit pre-emphasis • Recei
64. [Figure 1: GSM FDM/TDM and burst structure. Legend: A = access burst, T = tail bits, G = guard bits; 1 burst = 156.25 symbols. Figure 2: Partitioning of the GSM processing between the FPGA, the DSP, and a RISC processor. Diagram labels: RF/analog section (RX 935-960 MHz, IF 70 MHz, 200 kHz bandwidth); FPGA section; DSP section; IF and TDM processing; baseband; protocol engine (RISC or DSP); switched voice network; IP network. Figure 3: DSP model of baseband GSM processing with comparative host/target blocksets.] DDS in the IF demodulator subsystem; that frequency is dictated by a value coming from the DSP. The baseband signal can now be GMSK demodulated and sent for further processing. This model runs in one of three different ways: • Normal mode (simulation) • Co-simulation mode (hardware in the loop) • Real-time mode (100% hardware). In normal mode, the data comes from the block located in the DSP section. This block contains the DSP functions shown in Figure 2 in order to provide frames to the transmitter; the same applies for the receiver side. Simulink performs all processing during simulation. The gateway blocks have no effect except for converting the signal from one format to another, such as from double to fixed point. The Signal I/O and Mixer subsystem then produces a loopback of the IF signal and
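The double-to-fixed-point conversion that this excerpt attributes to the gateway blocks can be sketched in a few lines of Python. The 16-bit word with 14 fractional bits, the rounding, and the saturation behavior below are all illustrative assumptions; the article does not give the formats used in the GSM model, and the function names are not System Generator APIs.

```python
def to_fixed(value, word_bits=16, frac_bits=14):
    """Quantize a float to a signed fixed-point integer with saturation.

    word_bits and frac_bits are assumed values chosen for illustration.
    """
    scale = 1 << frac_bits
    raw = int(round(value * scale))          # Python rounds exact halves to even
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, raw))    # clamp instead of wrapping around


def to_float(raw, frac_bits=14):
    """Convert the stored integer back to the real value it represents."""
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    for sample in (0.5, -0.25, 3.0):
        raw = to_fixed(sample)
        print(sample, raw, to_float(raw))
```

A value outside the representable range (such as 3.0 here) saturates at the largest code rather than wrapping, which is one common choice when modeling hardware data paths.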
65. Both a signal pattern generator and high-speed oscilloscope are connected to high-speed analog-to-digital converter (ADC) and digital-to-analog converter (DAC) modules, and each piece of Agilent test equipment is placed in the host mode so you can remotely manage the equipment's front panel controls. You can save settings and scripts easily within your I or H drive private folders to re-use for future sessions. Having real signal insertion and being able to measure real output means that you can validate your algorithms, transforms, and functions and simultaneously have confidence that all results are both precise and authentic. Reference DSP designs are provided within the VirtuaLab. These designs allow you to evaluate the Spartan-3 2000 FPGA in a pre-verified environment. Reference designs include: • Existing System Generator tutorial to introduce engineers to other features such as the ChipScope tool, HDL co-simulation, hardware co-simulation, and the PicoBlaze processor • Simple FFT with a 256-tap FIR filter and interpolation by three. Ease of use and reasonably high performance allow you to evaluate the tool interface as well as the hardware. The filter design is provided using the FDA tool to generate the coefficients, allowing you to modify the coefficients and view the results. Simulation can be run in System Generator as well as
66. D hardware does not suffer from complicated test issues, as testing is much simpler and more deterministic than for software-based designs. PLD LIN does not burden the software application with LIN protocol processing. It allows for accurate LIN timing control and does not require a crystal oscillator in slave mode, thus saving costs, board space, and power consumption. PLDs are generic devices. They do not incur non-recurring engineering charges and can be used across many projects. One of the key advantages is the ability of the devices to be programmed in-system, so changing the hardware from master to slave is a breeze. As with MCU designs, the PLD needs an external transceiver device to drive the line. The main downside to using PLDs is that you may not be conversant with the design flow, so this may not be your most natural design route, but it is certainly worth trying. In more integrated, higher-end designs, you will still need some sort of processor support, but this can be achieved by using an embedded soft-core processor such as MicroBlaze, a low-cost 32-bit RISC processor. LIN System Development: Automotive designers have a dilemma when adopting a new bus standard. Should they wait for standard silicon devices, or try to develop an ASIC with a semiconductor supplier in advance of a final agreed and verified protocol specification? Some specifications take years to be agreed upon, verified, and ratified, so many semiconductor
67. DSP allows for high-level mathematical verification and converts the heart of the algorithm into ready-to-use HDL. Simulation inside of The MathWorks Simulink tool enables you to easily verify the image algorithm qualitatively and subjectively when used with the Image Processing Toolbox, also from The MathWorks. Using System Generator to develop and implement image processing algorithms allows for a thoroughly verified and easily executed design; plus, you save time on subjective analysis of the HDL. The high-level block diagram allows for easy communication between team members, resulting in less time spent crossing skill boundaries when determining implementation trade-offs. The Basics: The Image Processing Toolbox is an excellent starting point with which to develop image processing algorithms for FPGAs. It allows you to easily load and view many image types. Although not directly usable within System Generator, it can also rotate, resize, filter, convert to and from the frequency domain, and operate on the image mathematically and morphologically. You can use these latter functions as a qualitative measure against the actual System Generator implementation. Exchanging Data: Images are often stored and manipulated as two-dimensional arrays that can be quite large. For example, a 1,024 x 1,024 x 8 bits-per-pixel image is 1 MB. Typically, a portion of the image is m
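The 1 MB figure quoted in this excerpt is straightforward storage arithmetic: pixels times bits per pixel, divided by eight. The Python sketch below reproduces it; the function name is illustrative, and "MB" is taken here to mean 2^20 bytes.

```python
def image_bytes(width, height, bits_per_pixel):
    """Storage needed for an uncompressed image, in bytes."""
    total_bits = width * height * bits_per_pixel
    return total_bits // 8


if __name__ == "__main__":
    size = image_bytes(1024, 1024, 8)
    print(size, size / 2**20)  # bytes, and the same size in units of 2**20 bytes
```

Sizes like this are why an FPGA design typically works on a portion of the image at a time rather than holding the whole frame on chip.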
68. HDL, commercial IP cores, and high-level languages and tools. An example DIMEtalk network for this system is shown conceptually and in the DIMEtalk software tool in Figures 5 and 6. The network spans across all five FPGAs in the system, with router and bridge elements in place as appropriate to enable the network to operate. Each FPGA has algorithm block(s) associated with it; these are connected to the network at user nodes. The user node type and location can be defined to fit your application requirements. In this example, the network is also connected through a DIMEtalk edge to the VMEbus host interface on the system, enabling direct data communications from the host to specific algorithm blocks inside the FPGAs. What is clear from this example is the value DIMEtalk adds to the system. You can take an off-the-shelf system, along with your algorithm blocks, and rapidly connect all of these together and to the VMEbus. [Figure 5: DIMEtalk network shown for example system. Brightly colored blocks represent algorithms.] Conclusion: Using DIMEtalk, you can efficiently implement the system's communications infrastructure required for FPGA computing applications. The generated network is flexible and provides a complete communications solution to connect together algorithm blocks, interfaces, backplane links, and host syste
69. IN network. The application note is available at www.xilinx.com. For more information, please e-mail the automotive team at automotiveteam@xilinx.com. Note: The LIN IP cores from Intelliga Integrated Design Ltd. and CAST Inc. are fully supported for use in automotive designs. The LIN implementation in XAPP432 is a reference design and should be used for evaluation purposes only. Education Services Trims Your Learning Curve: The training catalog's new interface helps reduce your time to knowledge. by Cindy Andruss, Instructional Designer, Xilinx, Inc., cindy.andruss@xilinx.com. Designers have a need for speed, not only in their designs but also in their learning curves. Keeping up with the latest information about Xilinx products and services is a critical part of getting your product out of design and into the market. "Knowing what and how you need to learn in order to increase performance is vital in today's global economy, where lack of knowledge can spoil your competitive edge," said Patrice Anderson, Xilinx Education Services manager. It's no secret that specialized training on Xilinx software tools can help you reduce your time to knowledge and gain an advantage over your competitors. Education Services understands that effective training is critical to a designer's improved productivity and performance. Although our website already addressed
70. Issue 51, Winter 2004. Xcell Journal: THE AUTHORITATIVE JOURNAL FOR PROGRAMMABLE LOGIC USERS. Times Square Outshines the World with the Biggest, Brightest LED Display Ever Created. EMBEDDED SYSTEMS: High-Bandwidth TCP/IP; Software-Defined Radio; Embedded Nucleus PLUS. H.264/AVC on FPGAs; GSM Modems. Virtex-4 FPGAs: Breakthrough Performance at Lowest Cost. The New SPARTAN-3: Make It Your ASIC. The world's lowest-cost FPGAs. Spartan-3 Platform FPGAs deliver everything you need at the price you want. Leading the way in 90nm process technology, the new Spartan-3 devices are driving down costs in a huge range of high-capability, cost-sensitive applications. With the industry's widest density range in its class, 50K to 5 million gates, the Spartan-3 family gives you unbeatable value and flexibility. Lots of features without compromising on price. Check it out. You get 18x18 embedded multipliers for XtremeDSP processing in a low-cost FPGA. Our unique staggered pad technology delivers a ton of I/Os for total connectivity solutions. Plus, our XCITE technology improves signal integrity while eliminating hundreds of resistors to simplify b
71. Let's describe an example implementation of custom logic, in this case a switching circuit used in a filter. The algorithm in MCode is shown in Figure 1. You must place the M-file in the same directory as The MathWorks Simulink model file, followed by selection and placement of the MCode block in the model file. In the parameter listing for the MCode block, a MATLAB function parameter exists; here you would type switch_cir, the name of the M-file. The block will then configure itself and emulate the logic of the file, as shown in Figure 2. Currently, the Xilinx MCode block cannot hold an internal state. But if you would like to implement a state machine capable of holding an internal state, there are other alternatives. Figure 1: MCode for a switching circuit (punctuation and some operators were lost in extraction and are left as found): function tone_q noise_q filter_finish switch_cir d BBRX_Start tone_bin BBRX_End enable BBRX_Finish if BBRX_Finish false if d > BBRX_Start & d < tone_bin 8 tone_q false noise_q true elseif d > tone_bin 7 & d < tone_bin 7 tone_q true noise_q false elseif d > tone_bin 8 & d < BBRX_End tone_q false noise_q true else tone_q false noise_q false end else tone_q false noise_q false end if d BBRX_End & enable true filter_finish true else filter_finish false end. Figure 2: Configured MCode switching circuit block (port labels include BBRX_End [15:0] and q [15:0]).
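The switching logic of Figure 1 can be modeled as a small stateless function, consistent with the note that the MCode block cannot hold internal state. The Python sketch below is a reconstruction under stated assumptions, not the authors' code: the extraction dropped the operators next to tone_bin, so the offsets (tone_bin - 8, tone_bin - 7, tone_bin + 7, tone_bin + 8), the strict comparisons, and the equality test on BBRX_End are all guesses made for illustration.

```python
def switch_cir(d, bbrx_start, tone_bin, bbrx_end, enable, bbrx_finish):
    """Stateless model of the filter switching circuit.

    Returns (tone_q, noise_q, filter_finish). The +/- offsets around
    tone_bin are ASSUMED; the source text lost those operators.
    """
    if not bbrx_finish:
        if bbrx_start < d < tone_bin - 8:        # assumed: noise bins below the tone
            tone_q, noise_q = False, True
        elif tone_bin - 7 < d < tone_bin + 7:    # assumed: bins around the tone
            tone_q, noise_q = True, False
        elif tone_bin + 8 < d < bbrx_end:        # assumed: noise bins above the tone
            tone_q, noise_q = False, True
        else:
            tone_q, noise_q = False, False
    else:
        tone_q, noise_q = False, False

    # Assumed: finish is flagged when the index reaches the end while enabled.
    filter_finish = (d == bbrx_end) and enable
    return tone_q, noise_q, filter_finish


if __name__ == "__main__":
    for index in (10, 100, 150, 200):
        print(index, switch_cir(index, 0, 100, 200, True, False))
```

Because every output depends only on the current inputs, the function maps directly onto combinational logic, which is the kind of behavior the article says the MCode block supports.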
72. PGA directly into the second FPGA. The ability to easily develop projects that process data between dual FPGAs adds an entirely new level of project capabilities. Power Supply: Because the XUP UNM board consists of dual Virtex FPGAs, three XC18V04 in-system programmable configuration PROMs, and a wide variety of other parts, we needed a power system that could provide several amperes of filtered power over an extended period of time. [Figure 1: XUP UNM prototype platform.] The heart of the power system is a Texas Instruments TPS54616 buck switching power supply. This supply provides a stable 3.3-volt output at up to 6 amps. Because everything else on the board is driven by this source, 6 amps was a good supply level. Students designed a large portion of the power filtering using Xilinx application note XAPP623, "Power Distribution System (PDS) Design: Using Bypass/Decoupling Capacitors." Both students and professors are currently evaluating prototypes of these platforms at the University of Texas at Austin, the University of Texas at El Paso, and West Point. According to Colonel Bryan Goda, an academy instructor at West Point, "This board has great potential within an undergraduate curriculum. We are looking forward to seeing what it can do." Conclusion: The XUP UNM prototype platform is a tremendous example of how academia and industry can work together to accomplish a common goal. E
73. Pro PowerPC Users

Nohau tools provide multiprocessor debug environments for embedded systems, resulting in increased design modification and debug efficiency.

by Darrell Wilburn, President, LU Services Inc., darrell@iq-service.com or darrellw@nohau.com

Platform FPGAs can implement a completely configurable system-on-chip by containing one or more microprocessors in a tightly coupled fabric. This delivers very flexible hardware and software, which can change continuously throughout the design and debug cycle. A powerful set of software debug tools that can properly support sophisticated FPGAs is critical for successful project completion.

Debugging and verifying a design from external pins is problematic at best. Reliably measuring 200 to 300 MHz signals (like Fast Simplex Links) over a 3-foot logic cable to an external trace facility with sub-nanosecond precision is very difficult and sometimes impossible. Furthermore, adding logic paths to provide for external probing is greatly intrusive, which may create new place and route problems as well as timing differences in the final design.

Simulation can still help you overcome the simpler roadblocks, but for real-time or intermittent problems, observing in real time through in-circuit methods quickly becomes a necessity. On-board instrumentation circuits can provide visibility to all system signals as well as executing programs
74. Figure 1: QAM system, System Generator model (diagram title: QAM System with Packet Framing and FEC for Telemetry Channels; blocks: Sine Wave, Transmitter with FEC and QAM Symbol Mapping, Channel Model, QAM Receiver)

System Generator
Xilinx System Generator is a system-level modeling tool that facilitates FPGA hardware design. It extends The MathWorks Simulink in many ways to provide a modeling environment well suited to hardware design.

The tool provides high-level abstractions that are automatically compiled into an FPGA at the push of a button. The tool also provides access to underlying FPGA resources through lower-level abstractions, allowing the construction of highly efficient FPGA designs. It is delivered along with a predefined Xilinx blockset library, but also allows access to other languages with which most FPGA designers are familiar. Finally, it offers the ability to design at a system level and allows simulation, implementation, and verification within the same environment, usually

the Consultative Committee for Space Data Systems for telemetry channel coding (specification CCSDS 101.0-B-5).

Introduction to the QAM System Design Example
In our example, the overall QAM system starts with the transmitter. This subsystem accepts data from an input source, where forward error correction (FEC) is applied and an attached synchronization marker (ASM) i
75. is now automatically configurable using XPS MLD technology

Studio (EDK) 6.3i, configuration of this leading royalty-free RTOS on your newly designed system is as easy as selecting from a pull-down menu. Instead of spending hours modifying your target software to work with your new hardware configuration, you can configure the target software automatically in minutes, without the error-prone possibilities of configuring by hand. This is especially valuable during the earlier design phases, when the hardware may be changing frequently. This process was enabled by one of the underlying technologies of Xilinx Platform Studio (EDK), called microprocessor library definition, or MLD.

MLD Technology
The Xilinx Platform Studio (EDK) development system is based on a data-driven code base that makes it extensible and open. MLD is one example of this underlying capability. It was created specifically to allow you to easily create and modify kernel configurations and associated board support packages (BSPs) for partner-supported RTOSs like Nucleus PLUS and its extensive middleware offering.

MLD has two required file types: the data definition file (MLD) and the data generation file (Tcl). The MLD contains the Nucleus user customization parameters, while the Tcl file is a Tcl script that defines a set of Nucleus-specific procedures for building the final software system (see Figure 1).
76. System Generator tool are available online to provide even more of a head start (see Table 1). The Xilinx DSP Central website (www.xilinx.com/products/design_resources/dsp_central/grouping/index.htm) has a complete list.

Audio DSP Design Kit
The Audio DSP Design Kit is similar to the Video DSP Kit, but is optimized for audio processing applications. The kit features a Spartan-IIE hardware board with an XC2S200E-6GFT256 FPGA, a TI TLV320AIC23 16-bit audio CODEC, RS-232 port, LEDs and switches, and several expansion connectors.

Customize Your Platform
If the Audio and Video DSP Design Kits are not quite what you need for your design, you can add more hardware, firmware, or software to create a custom platform. Avnet has a variety of hardware add-in modules that can serve as extensions to the basic platform.

Audio/Video Add-in Module
The Audio/Video Module provides additional functionality for DSP applications targeting audio and video processing. It interfaces to a host through a standard AvBus connector and provides multiple video interfaces to accommodate RGB monitors, LCD panels, and standard-definition television monitors. The module also captures composite video and includes a CODEC to facilitate audio processing. A PS/2 keyboard/mouse interface is included, as well as a touchscreen controller. Key elements of the module are:
• Philips SAA7113H video input processor
• Philips UCB1400 ste
77. Technologies

Xilinx clearly leads the programmable logic business; now we are expanding into other key programmable technologies.

by Wim Roelandts, CEO, Xilinx, Inc.

Throughout the history of semiconductors, when a technology becomes programmable, it dominates. That's why, in addition to programmable logic, we have defined two key programmable technologies on which we will focus: digital signal processing and embedded processing. We have the right technology and the right business model to be a major player in all of these important and fast-growing markets.

Digital Signal Processing
Xilinx FPGAs have long been the highest performance technology for building DSP designs. With our devices and software, you can build systems that are two to three orders of magnitude faster than what a dedicated DSP device can do on its own. Putting our extremely high-performance DSP functionality next to a programmable DSP allows DSP designers to develop systems with unprecedented performance and value. You also get many other advantages offered by FPGAs, including flexibility, fast time to market, and higher levels of system integration. There simply is no easier, faster, or better way to develop extreme-performance DSP designs.

For example, with our new Virtex-4 FPGA family, you can achieve 256 GigaMACs (billions of multiply-accumulates per second). We have achieved this amazing performance t
78. The IP protocol environment is unlikely to settle down for some time. Quite apart from competition among protocols supporting IP services that we know of today, new IP-based services are quickly emerging.

Virtex-II Benefits
When looking for a suitable FPGA to take on these intensive processing tasks for the 1460 session controller cards, Newport Networks chose the Virtex-II FPGA. Valuable features include high I/O count, greater than 100 MHz bus speed operation, plentiful on-board RAM, digitally controlled impedance (DCI), and I/O banking. In particular, the internal RAM-based FIFOs enable a convenient software interface. This allows for smooth interaction between the PowerPC and the hardware-accelerated functions executed in the FPGA.

The Virtex-II on-chip delay-locked loop (DLL) circuits also proved useful for generating low-skew internal and external clock domains. These can be referenced to an incoming clock signal, such as the common switch interface (CSIX), an open standard commonly used to interface a network processor with a physical switch fabric. The DLLs also allow output clocks to be phase-adjusted to meet setup and hold times for devices such as SDRAM. Virtex devices provide as many as eight fully digital dedicated DLLs on chip.

Alongside the Virtex-II devices that add raw processing power, Xilinx XC9500 CPLDs perform MAC-layer functions and other custom functions. These include proprietary data control and al
79. The challenges of verification and debug steps are:
• Instrumentation to provide correlated hardware and software measurements
• Needing a broad range of engineering skills
• Extreme flexibility with ever-changing needs for both

Nohau Corporation has developed a compact on-chip development system that enables you to efficiently address these debug issues. The Nohau solution includes on-chip debug IP called DebugTraceBlaze that is minimized for compact size, connects directly to the on-chip peripheral bus (OPB), and utilizes on-chip block RAM for trace storage.

The debug facilities are implemented two ways: through hardware or software. The software-based solution uses a small Xilinx program called XMD-STUB that resides in the first 1K block of memory. The hardware solution uses programmable logic in the hardware and is transparent to the software. You may choose the solution that is best for you.

Personally, I prefer the software solution because it has less impact on the hardware and is more flexible for customization. Also, the cost of 1K of memory is usually insignificant in systems that often have 1 MB or more.

Figure 1: Nohau Debug IP with trace shown in a simple five-chip Internet-aware system (diagram labels include JTAG, 256K x 32 SRAM, and PHY)

A block diagram for a typical small system is shown in Figure 1, illustrating placement of the Nohau DebugTrac
80. The recorded modules are typically less than an hour in length and help to familiarize you with Xilinx technologies and short topics. In some cases, the modules offer you a preview of the content covered in multi-day instructor-led courses. Some recorded e-Learning modules cover topics that simply did not fit within the time constraints of an instructor-led course provided in the classroom.

Aibing Zhou, a design engineer at Juniper Networks in Sunnyvale, California, recently took the free ChipScope Pro recorded e-Learning module during a training break at work. "The ChipScope Pro recorded e-Learning was very helpful to me to catch up with the tool," Zhou said. "I was able to apply the learning immediately to gain field experience. When I have a problem, I look for e-Learning courses for more in-depth training on that tool."

Empowerment
Managing your own training is like driving a car. You have the freedom to go whenever and wherever you wish. You get that kind of power every time you visit the training catalog. You can take control of your training with the self-directed features in the training catalog, such as self-registration, self-paced recorded e-Learning, and self-managed training plans. Feel vested in your training by completing online course evaluations to let Xilinx know how it can continue to improve its services and products.

Nowe said that he prefers
81. a tions, so although the algorithm implementation has been made easier, system communications remains complex, error-prone, and time-consuming.

DIMEtalk: The Concept
Looking at the needs of FPGA application developers, we established key requirements for a system communications tool:
• Scalability: catering to designs of all sizes and designs distributed across multiple FPGAs
• Flexibility: tailored to the needs of the application
• Easy algorithm interfacing: complementing algorithm implementation
• Easy implementation: ideally through a software tool
• Resource-friendly: minimizing hardware resource requirements

Looking beyond the FPGA world, the majority of data communications take place across some form of data network. Data networks are appealing because of the flexibility and scalability they provide. When we developed the early DIMEtalk tool at Nallatech, we intended it to be primarily network-based for these reasons. So in essence, DIMEtalk is network-based and meets identified needs by providing:
• A high-level software tool to enable users to develop communication networks
• An intelligent packet-based network (routing tables automatically defined by software)
• Easy user node interfaces (block RAM, FIFO, memory map)
• Automatic FPGA-synthesizable code to represent the network
• Small-footprint network elements for efficient use of resou
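To illustrate what "routing tables automatically defined by software" can mean in a packet-based network, here is a minimal Python sketch that derives a next-hop table for every node from a list of links using a breadth-first search. It is a generic illustration, not DIMEtalk's actual algorithm or file format; the node names and the `links` structure are hypothetical.

```python
from collections import deque


def build_routing_tables(links):
    """links maps each node to its directly connected neighbors.

    Returns {node: {destination: next_hop}} using shortest hop counts.
    """
    tables = {}
    for src in links:
        table = {}
        seen = {src}
        # Each queue entry carries the first hop taken out of src.
        queue = deque((nbr, nbr) for nbr in links[src])
        while queue:
            node, first_hop = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            table[node] = first_hop
            for nbr in links[node]:
                if nbr not in seen:
                    queue.append((nbr, first_hop))
        tables[src] = table
    return tables


# Hypothetical network: a host bridge, two routers, and two user nodes.
example_links = {
    "host": ["router0"],
    "router0": ["host", "fifo_node", "router1"],
    "router1": ["router0", "bram_node"],
    "fifo_node": ["router0"],
    "bram_node": ["router1"],
}

if __name__ == "__main__":
    for node, table in build_routing_tables(example_links).items():
        print(node, table)
```

A designer would only describe the links; the per-node tables fall out of the search, which is the kind of bookkeeping a software tool can take off the user's hands.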
82. able direct IP-IP interconnections must be extremely

and complementary protocols. We expect the many IP standards to consolidate in the foreseeable future. Flexibility is therefore paramount. Scalability must also be built into the infrastructure to support the subscriber growth that IP carriers are targeting. Easy management is also a prerequisite. This includes 99.999% availability, fully resilient operation, and the ability to modify key functions remotely and apply upgrades without powering down equipment. Further basic requirements of an IP-IP interconnect include features that preserve network security, ensure the availability of accounting information for accurate billing, and control Quality of Service (QoS) mapping and media translations.

Building the Next Generation
Choosing whether to implement the functions of a device like the 1460 session controller in software or hardware depends on the carrier's business model and future plans. For example, you can bring a software-centric solution to market quickly, and it is also very flexible. But drawbacks include a relative lack of scalability, and robustness also falls as subscriber numbers increase. These interconnects are also mission-critical: because session controller servers are located in line with the call parties, a software crash or other failure can result in dropped calls. On the other hand, a number of off-the-shelf computing
83. access to hardware tools that hardware and software engineers can also use to refine and implement the designs. A design environment like Simulink makes it possible to quickly and accurately simulate system behavior and provides a direct path to implementation using automatic code generation.

Xilinx recognized early on that Simulink was a platform that could make FPGA-based DSP design practical. Today, a complete design flow is available from The MathWorks and Xilinx that includes third-party development hardware for real-time prototyping and deployment. Many organizations now enjoy order-of-magnitude returns on their investment in Model-Based Design for Xilinx FPGAs.

Comprehensive system-level mathematical models form the basis of Model-Based Design. Such modeling was once only in the realm of technology researchers, not mainstream product developers. But facing the limitations of traditional software and hardware description languages for large-scale projects, many design leaders recognize that modeling and simulation is necessary to handle the complexity of today's systems, not only for system design but for hardware and software development as well.

Hardware description languages, even with system-level extensions, do not support the rapid modeling and design iteration needed for algorithmically intensive large
84. acing one of the FEC blocks (Reed-Solomon or Viterbi decoder) readily available from the Xilinx blockset library with your own HDL implementation through a black box. The black box behaves like other System Generator blocks: it is wired into designs, participates in simulations, and is compiled into hardware. When System Generator compiles a black box, it automatically wires the imported module and associated files into the surrounding netlist.

System Generator simulates black boxes by automatically launching an HDL simulator, generating additional HDL as needed (analogous to an HDL test bench), compiling HDL, scheduling simulation events, and handling the exchange of data between Simulink and the HDL simulator. This is called HDL co-simulation.

At this point, you can also envision bringing MATLAB or C code, previously compiled with the appropriate MATLAB-to-gate and C-to-gate tools, into the Xilinx black box.

Simulation and Verification
You can couple Simulink blocks with System Generator models to produce robust, flexible test bench environments. The convenience of this capability is that implementation and verification design phases can take place in the same environment. Such is the case with the QAM system design, where a mixture of Simulink and System Generator blocks was used to implement the design test bench.

This allows you to verify the functionality of your design at any critical probing
85. ackets (jumbo frames) of 9,000 bytes has a similar effect by reducing the number of frames transmitted and therefore the number of interrupts generated. This amortizes the per-packet overhead over a larger data payload. GSRD supports the use of Ethernet jumbo frames.

The components of GSRD use the device control register (DCR) bus for control and status. This provides a clean interface to software without interfering with the high-bandwidth data ports. The per-packet features of GSRD help make efficient use of the processor and improve system-level TCP/IP performance.

Table 1: GSRD TCP transmit bandwidth: 270 Mbps, 540 Mbps, 490 Mbps, 780 Mbps

Conclusion
The Xilinx GSRD is an EDK-based reference system geared toward high-performance bridging between TCP/IP-based protocols and user data interfaces like high-resolution image capture or Fibre Channel. The components of GSRD contain features to address the per-byte and per-packet overheads of a TCP/IP system.

Table 1 details the GSRD TCP transmit performance with varying levels of optimization for Linux and standalone Treck stacks. Future releases of GSRD will explore further opportunities for TCP acceleration using the FPGA fabric to offload functions such as TCP segmentation.

The GSRD Verilog source code is available as part of Xilinx Application Note XAPP536. It leverages the MPMC and CDMAC detailed in Xilinx Application Note XAPP535 to allocate memory bandwidth be
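The amortization argument can be made concrete with a little arithmetic. The Python sketch below counts how many frames are needed to move a given TCP payload at a standard 1,500-byte MTU versus a 9,000-byte jumbo MTU. The 1 MB transfer size and the 40 bytes of IP plus TCP headers (no options) are illustrative assumptions, not GSRD measurements.

```python
import math


def frames_needed(payload_bytes, mtu, header_bytes=40):
    """Frames required to carry payload_bytes of TCP data.

    header_bytes assumes a 20-byte IP header plus a 20-byte TCP header
    with no options; each frame then carries (mtu - header_bytes) of data.
    """
    per_frame = mtu - header_bytes
    return math.ceil(payload_bytes / per_frame)


if __name__ == "__main__":
    transfer = 1_000_000  # hypothetical 1 MB transfer
    standard = frames_needed(transfer, 1500)
    jumbo = frames_needed(transfer, 9000)
    print("standard frames:", standard)
    print("jumbo frames:   ", jumbo)
    print("reduction factor: %.1f" % (standard / jumbo))
```

If each frame costs roughly one interrupt and one pass through the per-packet protocol code, the frame count is a direct proxy for that overhead.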
86. address the sheer complexity of today's FPGA designs. DC FPGA supports top-down and bottom-up methodologies for team-based designs, enabling you to choose the appropriate methodology.

DC FPGA's new AO technology automatically selects the best synthesis algorithm for the design. The algorithms are dynamically controlled, given the nature of the design and the applied constraints. AO technology will also reorder the sequence in which the synthesis algorithms are run. The result is that AO technology provides

cally proves that the RTL matches the implementation. DC FPGA supports formal verification with our Formality solution. Both DC FPGA and Xilinx ISE output automatic setup files for Formality, which greatly simplifies the formal verification task. The formal verification flow with Formality is shown in Figure 2.

Figure 2 (flow diagram; labels: DC FPGA Synthesis, Formality, Auto Setup, Integrity Assured)

Conclusion
The goal of effective prototyping is to have your design up and running at the desired speed with the least possible effort, while maintaining design integrity with the ASIC implementation. DC FPGA will help you reach this goal. With DC FPGA, you can use common RTL for both the FPGA and the ASIC implementations to maintain design integrity, allowing you to design once. The timing performance of DC FPGA with AO technology, combined with the flexibility
87. adds white Gaussian noise to it. The co-simulation mode basically performs in the same way as the normal mode, except that the processing function between the gateways will be executed in the FPGA as a hardware-in-the-loop simulation. In real-time mode, all the blocks outside the gateways are ignored. These gateways establish the communication between the different hardware entities. In the case of our GSM model, the frames travel from DSP to FPGA and vice versa through the parallel 32-bit data bus. The IF signal now travels through the DAC and can either be in loopback mode (if the output of the DAC is connected to the input of the ADC) or fed to the front end to produce an RF signal.

Figure caption: signal between 7.7 and 22.3 MHz

Co-Simulation Benefits
We designed the first version using only standard communication and DSP blocksets from Simulink, running in double precision from start to end. As a second step, we gradually replaced Simulink blocks with ones from Xilinx and tested the model in normal mode, which resulted in hybrid models and simulations that were easier to debug. After producing the Xilinx model, we could test it in co-simulation to verify that hardware computation was as expected. As a final step, we targeted the whole process to hardware.

Demodulator Subsystem Complexities
The subsystem presented in Figure 5 is the IF-to-baseband demodulator, which is the most complex part o
88. after an input differential clock buffer. We observed improved stability and a more global, uniform distribution of the reference clock with the FPGA editor.

Also, though not directly related to the high-speed transceivers, we found that an independent post-configuration DCM reset logic (usually recommended if you have an external feedback clock) is useful even when using internal feedback. This solved a problem we were having with the DCMs, where they were sometimes not locking after reconfiguration. Xilinx Technical Support helped us find the solution (Xilinx Answer Record 14425).

As for programming and JTAG, we used the same group of EPROMs to configure eight of the nine FPGAs. One of the FPGAs is the master and provides the clock for all the devices in the chain. The ninth FPGA has a different pinout and a separate EPROM for itself.

All circuits are connected in the same JTAG chain, which improved reprogramming time, mainly during the test stages. We found that a need exists for a pull-up resistor on the TDO output of each Xilinx device, something that we hope Xilinx will add in future devices. The JTAG is also used to check the board interconnections after assembly.

Conclusion
In this article, we've shown the advantages of using embedded deserializers instead of discrete components on a large project. By using nine 456-pin FPGAs to do the same job as 105 TQFPs, we saved time both in the design and debuggi
89. al image or an algorithm developed with the Image Processing Toolbox.

Using M-Files
You can use m-files to model the algorithm before implementation with System Generator. Although this step is optional, it can be a highly effective aid in the slight paradigm shift from matrix operations in software to raster-scan operations in a highly parallel architecture. Some of the key benefits include:
• Deconstruction proofs of Image Processing Toolbox functions as an aid to understand the algorithm.
• Creation of intermediate variables to assist in debugging the System Generator version of the design. Try to set variables similar to how they will appear in the design. Two-dimensional matrices should be used as memory, while raster order works better between blocks.
• Making algorithm trade-offs. Ask yourself if you should place the bi-linear interpolation stage before or after the sending of data to memory.
• A fast proof that the algorithm is on the right track.
• Qualitative analysis from the deconstruction m-file to the corresponding Image Processing Toolbox function.

Design Considerations: An Example
Real-time image rotation has many challenges when going from the algorithm architect at a high level to the FPGA engineer at the HDL and board level. The algorithm-level choices that you will make about how to implement the design in the FPGA will a
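As an illustration of such a "deconstruction" model, the sketch below rotates a small image by walking the output in raster order, mapping each output pixel back into the source with the inverse rotation, and sampling with bi-linear interpolation. It is written in Python rather than as an m-file, and it is a generic reference model under stated assumptions (rotation about the image center, zero for out-of-range pixels), not the article's implementation.

```python
import math


def bilinear(img, x, y):
    """Sample img (a list of rows) at fractional (x, y); outside pixels read 0."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        return img[yy][xx] if 0 <= xx < w and 0 <= yy < h else 0

    top = px(x0, y0) * (1 - fx) + px(x0 + 1, y0) * fx
    bottom = px(x0, y0 + 1) * (1 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def rotate(img, angle_deg):
    """Rotate about the image center, producing output pixels in raster order."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = []
    for y in range(h):            # one output line at a time (raster scan)
        row = []
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: where in the source does this output pixel come from?
            sx = c * dx + s * dy + cx
            sy = -s * dx + c * dy + cy
            row.append(bilinear(img, sx, sy))
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
    for line in rotate(image, 90):
        print([round(v) for v in line])
```

Keeping the source image in a two-dimensional structure while emitting the result line by line mirrors the memory-versus-raster split described in the list above.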
90. alLink Gigabit Ethernet MAC (LLGMAC) peripheral incorporates the UNH-tested Xilinx LogiCORE 1-Gigabit Ethernet MAC to provide a 1 Gbps 1000BASE-X Ethernet interface to the reference system. The LLGMAC implements checksum offload on both the transmit and receive paths for optimal TCP performance. Figure 2 is a simplified block diagram of the peripheral.

Figure 1: GSRD system block diagram (blocks in the FPGA fabric: PPC405, DCR bus, DDR SDRAM, Multi-Port Memory Controller, LocalLink GMAC Peripheral with RX/TX checksum offload, User-Defined LocalLink Peripheral, UART Lite with DB9, LEDs and pushbuttons; figure note: "Can be customizable for other applications")

Figure 2: LocalLink Gigabit Ethernet MAC peripheral block diagram (blocks: CSUM Insert, LocalLink Data, TX Peripheral, RX Peripheral, DCR-to-Host Interface, Peripheral Registers, RX Client Interface; signals: TX_DATA[31:0], TX_REM[3:0], TX_SOF_N, TX_SOP_N, TX_EOP_N, TX_EOF_N, TX_SRC_RDY_N, TX_DST_RDY_N, RX_DATA[31:0], RX_REM[3:0], RX_SOF_N, RX_SOP_N, RX_EOP_N, RX_EOF_N, RX_SRC_RDY_N, RX_DST_RDY_N, DCR_READ, DCR_WRITE, DCR_WR_DBUS[0:31], DCR_ABUS[0:9], DCR_ACK, DCR_RD_DBUS[0:31])

TCP/IP Per-Byte Overhead
Per-byte overhead occurs when the processor touches p
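To show the work that checksum offload removes from the processor, here is a minimal Python sketch of the standard 16-bit one's-complement Internet checksum used by TCP (as described in RFC 1071). It assumes the LLGMAC offload computes this standard checksum; it says nothing about how the hardware is actually structured.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    csum = internet_checksum(payload)
    print(hex(csum))
    # A receiver summing the data together with the checksum gets zero.
    print(internet_checksum(payload + csum.to_bytes(2, "big")))
```

In software, this loop touches every byte of every segment, which is exactly the per-byte cost that moving the calculation into the data path avoids.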
91. alog or take advantage of new online services such as skills assessments, personalized learning plans, and more free recorded e-Learning modules. You can preview classroom courses from your desktop and meet the Xilinx instructors, all experts in their fields, via the "Presenter's Bio" link in the training catalog. Offered at no charge, these online services are perfect for every budget and require absolutely no travel at all.

Philip Nowe, a hardware designer and consultant in Canada, said he is a big fan of online services, especially recorded e-Learning modules, because they maximize his time. "The modules give me the ability, in one hour or less, to get most of the information I need about a particular tool or software," he said. "I usually get answers to any very specific questions I might have from my local FAE, or I register for a full course on the subject."

Figure 1: Xilinx Education Services redesigned its training catalog to reveal new online services and navigation features.

Management and Business Workflow Processes
You can maximize your productivity and time with a solution that lets you or your engineers get Xilinx training right when you need it, for immediate application on the job. The training catalog is open 24/7 and allows training to be an integral part of the design process. Education Services' recorded e-Learning fits especially well into the workflow learning model
92. and 12 channels of J.83 Annex B and J.83 Annex A/C solutions on a Spartan-3 device. Although Table 1 details the resources on an implementation that does not contain the optional root-raised cosine (RRC) filter, the details in Table 2 are specific to an implementation that contains the option. Using the 12-channel case as an example, the scales are favorably tipped towards a four-channel granularity implementation of the J.83 Annex B and J.83 Annex A/C, as the savings achieved are significant.

Table 1: Resource utilization comparison between one- and four-channel granularity J.83 Annex A/B/C designs without RRC (Spartan-3 FPGAs)
One-Channel Granularity:  1574 4 | 3130 8 | 4683 12
Four-Channel Granularity: 1049 3 | 2088 6 | 3304 9

Table 2: Resource utilization comparison between one- and four-channel granularity J.83 Annex A/B/C designs with RRC (Spartan-3 FPGAs), by number of channels
One-Channel Granularity:  8014 20 1 | 16024 40 2 | 23924 60 2
Four-Channel Granularity: 3748 7 1 | 7402 14 1 | 11057 21 1

Additional table fragment (One-Channel Granularity, Four-Channel Granularity): 4871 8 | 7483 12

Design Example Usage
The J.83 modulator design provides control configuration that you can control using a PowerPC or the MicroBlaze processor in Virtex-II Pro FPGAs. The processor can not only control the J.83 Annex B configurations, such as QAM, interleaver control word, and interleaver level, but also the reset se
93. are operated 180° out of phase. This technique interleaves the current pulses coming from the topside MOSFET switches, greatly reducing the total RMS input ripple current. This in turn allows the use of smaller and lower-cost input capacitors, reduces the EMI attenuation requirement, and improves operating efficiency.

Figure 2: LTC3407 Efficiency Curve (efficiency vs. load current in mA; Burst Mode operation, no load on other channel)
Figure 3: Efficiency vs. Load Current for the LTC3736 Converter
Figure 4: High-Efficiency 2.5V/5A and 1.8V/5A Dual-Output Converter with Output Tracking

www.linear.com
www.nuhorizons.com

Figure 5 compares the input waveforms for a representative single-phase dual switching converter to the 2-phase dual switching converter. Figure 6 shows how the RMS input current varies for single-phase and 2-phase dual controllers with 2.5V and 1.8V outputs over a wide input voltage range. For most applications, 2-phase operation will reduce the input capacitor RMS current requirement to that of just one channel operating at maximum current and 50% duty cycle.

The LTC3736 has a default switching
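The benefit of interleaving can be estimated numerically. The Python sketch below models each channel's top-switch current as an ideal rectangular pulse of amplitude equal to the load current, with duty cycle Vout/Vin, and computes the RMS of the AC component that the input capacitor must absorb. Inductor ripple and losses are ignored, and the 5V input is a hypothetical operating point; the 2.5V/5A and 1.8V/5A outputs follow Figure 4.

```python
import math


def input_cap_rms(vin, channels, phase_shift, steps=10_000):
    """RMS input capacitor current for ideal buck channels.

    channels: list of (vout, iout) pairs.
    phase_shift: turn-on offset between successive channels, as a
    fraction of the switching period (0.0 = in phase, 0.5 = 180 degrees).
    """
    samples = []
    for n in range(steps):
        t = n / steps
        current = 0.0
        for k, (vout, iout) in enumerate(channels):
            duty = vout / vin                    # ideal buck duty cycle
            start = (k * phase_shift) % 1.0
            if ((t - start) % 1.0) < duty:       # top switch is on
                current += iout
        samples.append(current)
    mean = sum(samples) / steps                  # DC part comes from the source
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / steps)


if __name__ == "__main__":
    outputs = [(2.5, 5.0), (1.8, 5.0)]
    print("single-phase: %.2f A RMS" % input_cap_rms(5.0, outputs, 0.0))
    print("2-phase:      %.2f A RMS" % input_cap_rms(5.0, outputs, 0.5))
```

With these idealized pulses, shifting the second channel by half a period keeps the two current pulses from stacking, which is why the capacitor ripple drops so sharply.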
94. arm interfaces to the backplane. The XC9500 CPLDs provide plenty of gates to implement these functions, with predictable routing and high I/O.

In the future, Newport Networks may move the 1460's computing platform into the Virtex-II Pro architecture, subject to CPU bandwidth requirements. Virtex-II Pro FPGAs integrate PowerPC processing blocks directly on the chip, enabling cost and real estate savings and easing manufacturing demands.

Conclusion
The IP protocol environment is unlikely to settle down for some time. Quite apart from competition among protocols supporting IP services that we know of today, new IP-based services are quickly emerging. These are supported by legions of new protocols. While the industry works toward greater standardization among the applicable protocols, equipment providers need to deliver solutions that have the power to meet today's challenges as well as flexibility for the future. The Newport Networks 1460 session controller exploits the high-performance Virtex-II FPGA architecture to achieve each of these goals.

For more information about the Newport Networks 1460 session controller, visit www.newportnetworks.com.

by Craig Sanderson, Systems Applications Engineer, Nallatech, c.sanderson@nallatech.com

Developing processing systems to imple

away from traditional processor-based systems to FPGA-based
95. ater than previous-generation Virtex-II Pro FPGAs.

DSP Product and Solutions Marketing

Figure 1: The Virtex-4 FPGA offers breakthrough DSP performance at new low cost points (panels: DSP Performance in GMACS, DSP Value, Power Consumption per MAC)

Even more revolutionary, Virtex-4 system designers need not employ the largest family member to achieve this performance, as has previously been the case. Virtex-4 FPGAs now deliver this signal processing capability in a medium-density

a three-input adder/subtracter with feedback for accumulation modes. The addition of a seven-bit op-mode multiplexer allows you to dynamically configure the XtremeDSP slice for one of more than 40 operating modes, such as addition, multi

for external logic slices, which can thus be allocated to other tasks. XtremeDSP slices can also be cascaded directly without accessing logic fabric or any loss in speed.

Xilinx's new ASMBL architecture enables us to alter the mix of XtremeDSP slices and logic slices. The SX platform within the Virtex-4 family offers the highest ratio of XtremeDSP slices to logic slices, at one XtremeDSP slice for every 108 logic slices. The SX platform is ideally suited for multiplier- or MAC-intensive tasks such as software radios. The LX platform offers the highest ratio of logic to other features and is suited for many tradi
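A small behavioral model can clarify what a multiply-accumulate slice with adder feedback does. The Python sketch below is a generic illustration, not a model of the actual XtremeDSP slice or its op-modes: the 18-bit signed operands, the 48-bit accumulator, and the `MacSlice` class itself are assumptions made for the example.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacSlice:
    """Hypothetical multiplier followed by a three-input adder with feedback."""

    def __init__(self, operand_bits=18, acc_bits=48):
        self.operand_bits = operand_bits   # assumed operand width
        self.acc_bits = acc_bits           # assumed accumulator width
        self.acc = 0

    def step(self, a, b, c=0, accumulate=True):
        """Compute a*b + c, optionally adding the previous result (feedback)."""
        product = wrap_signed(a, self.operand_bits) * wrap_signed(b, self.operand_bits)
        feedback = self.acc if accumulate else 0
        self.acc = wrap_signed(product + c + feedback, self.acc_bits)
        return self.acc


def fir_sample(coeffs, window):
    """One FIR output computed by a single time-shared MAC."""
    mac = MacSlice()
    for n, (h, x) in enumerate(zip(coeffs, window)):
        mac.step(h, x, accumulate=(n > 0))   # first tap loads, later taps accumulate
    return mac.acc


if __name__ == "__main__":
    print(fir_sample([1, 2, 3], [4, 5, 6]))
```

Switching the `accumulate` flag per cycle is a simplified stand-in for selecting between operating modes; a real slice offers many more combinations.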
96. ation of the quantized transform coefficients for a luminance 4 x 4 block.

The efficiency of entropy coding can be improved further if using CABAC. There are two parts in CABAC. The arithmetic coding core engine and its associated probability estimation are specified as multiplication-free, low-complexity methods, using only shifts and table look-ups. The use of adaptive codes allows it to adapt to non-stationary symbol statistics. By using context modeling based on switching between conditional probability models that are estimated from previously coded syntax elements, CABAC can achieve a reduction in bit rate between 5-15% compared to CAVLC.

Figure 2 depicts a typical system-level functional block partition of the H.264/AVC SD video codec. The solution is implemented based on the Spectrum Digital EVM DM642 evaluation module for the Texas Instruments TMS320DM642 DSP, together with the Xilinx XEVM642-2VP20 Virtex-II Pro or XEVM642-4VSX25 Virtex-4 daughtercard.

Conclusion
When used in an optimized fashion, the coding tools of the H.264/AVC standard increase coding efficiency by about 50% compared to previous video coding standards like MPEG-4 part 2 and MPEG-2 for a wide range of bit rates and resolutions. Currently, it is the most likely successor to the widely used MPEG-2. However, the algorithm is quite complex at a resolution greater than source input format (SIF)
97. ation offer viable solutions for implementing custom logic. System Generator is a very powerful and abstract tool, but we would like to see greater flexibility in terms of achieving synchronous design within System Generator. by Steven Erck, Director, Technical Marketing, Nu Horizons Electronics Corp. (serck@nuhorizons.com), and Brian Seymour, Chief Technical Officer, TechOnLine (brian.seymour@techonline.com). Today's engineering environments are fraught with limited resources, tight schedules, and dramatic learning curves. One of the most important customer responsibilities we have as suppliers or distributors is to help simplify or lessen these constrictions. Development boards, EDA software tools, and reference designs have long been the traditional tools for design engineers. But compiling these configurations does not relieve stress in critical areas, and more than likely can intensify the problem. Reducing time is the common critical element, and accelerating the development cycle should become an engineer's prime objective. Nu Horizons Electronics Corp. has formed an exclusive partnership with Xilinx to provide a complete online evaluation and development environment using TechOnLine VirtuaLab technology. Rather than waiting for parts and boards, you can now evaluate the latest Xilinx technology, learn new tools, and take real measurements at the click of a button.
98. ation to the embedded processor but C-to-hardware compilation as well, it is possible with minimal effort to create high-performance, mixed software/hardware test benches that closely model real-world conditions. Key to this approach are high-performance, standardized interfaces between test software (C-language test benches running on the embedded processor) and other components, including the hardware under test implemented in the FPGA fabric. These interfaces take advantage of communication channels available in the target platform. For example, the MicroBlaze soft-core processor has access to a high-speed serial interface called the Fast Simplex Link, or FSL. The FSL is an on-chip interconnect feature that provides a high-performance data channel between the MicroBlaze processor and the surrounding FPGA fabric. Similarly, the PowerPC hard processor core, as implemented in Virtex-II Pro and Virtex-4 FPGAs, provides high-performance communication channels through the processor local bus (PLB) and on-chip memory (OCM) interfaces, as illustrated in Figure 1. Using these Xilinx-provided interfaces to define an in-system unit test allows you to quickly verify critical components of a larger application. Unlike system tests, which model real-world conditions of the entire application, a unit test allows you to focus on potential trouble spots for a given component, such as boundary con
99. ayload data. The two most common operations of this type are buffer copies and TCP checksum calculation. Buffer copies represent a significant overhead for two reasons: (1) most of the copies are unnecessary, and (2) the processor is not an efficient data mover. TCP checksum calculation is also expensive, as it is calculated over each payload data byte. Embedded TCP/IP-enabled applications such as medical imaging require near-wire-speed TCP bandwidth to reliably transfer image data over a Gigabit Ethernet network. The data is generated from a high-resolution image source, not the processor. In this case, introducing a zero-copy software API and offloading the checksum calculation into FPGA fabric completely removes the per-byte overheads. Zero-copy is a term that describes a TCP software interface where no buffer copies occur. Linux and other operating systems have introduced software interfaces like sendfile that serve this purpose, and commercial standalone TCP/IP stack vendors like Treck offer similar zero-copy features. These software features allow the removal of buffer copies between the user application and the TCP/IP stack or operating system. The data re-alignment and the checksum offload features of GSRD provide the hardware support necessary for zero-copy functionality. The data re-alignment feature is a flexibility of the CDMAC that allows software buffers to be located at any byte offset. This rem
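To make the per-byte cost of the checksum concrete, here is a minimal Python sketch of the 16-bit ones' complement Internet checksum (the algorithm described in RFC 1071) that TCP uses. It is only an illustration of the arithmetic that a hardware offload would take over; it is not the GSRD implementation, the function name is hypothetical, and a real TCP checksum also covers a pseudo-header that is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement checksum over 16-bit big-endian words (RFC 1071 style).

    Illustrative helper only: the TCP pseudo-header is not included.
    """
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # every payload byte is touched
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(payload)
```

The loop visits every byte of the payload, which is why the cost grows linearly with the data moved and why moving it into logic that sees the data anyway is attractive.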
100. bed: place the rotation engine after the frame buffer. The net effect will be an increase in memory bandwidth of 4 or 16 times for bi-linear and bi-cubic interpolation, respectively. The choice of how to deal with the increase in bandwidth requirements is design- and memory-dependent. You can increase the number of read accesses from the memory by a factor of four to minimize the total amount of memory for bi-linear interpolation. This will decrease the overall incoming pixel rate by a factor of four unless the memory access speed is sufficient to handle the offset. Alternatively, you can store the four pixels necessary for bi-linear interpolation at the same address location, thus increasing the total amount of memory by a factor of four. This cost can be mitigated based on pixel bit width; that is, 8-bit-wide pixels can use this approach to fit into a 32-bit-wide memory footprint. Design Implementation with System Generator: Our example will implement image rotation with the following assumptions: input image size and pixel widths are known at the time of implementation; the Hotelling transform is used, of the form Sx = Dx cos(t) + Dy sin(t), Sy = Dy cos(t) - Dx sin(t), where t is the rotation angle; bi-linear interpolation is used; and the increased memory bandwidth is accounted for by reading the data out of the frame buffer four times faster than the incoming pixel rate. Frame Buffer Read Address Generation: Because the
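The read-side approach can be sketched in software as follows. This is a plain Python model, not the System Generator design: for every destination pixel it computes a source coordinate, taking the transform as Sx = Dx cos(t) + Dy sin(t), Sy = Dy cos(t) - Dx sin(t) (this sign convention, the rotation about the image center, and the function name are assumptions), and then blends the four neighboring source pixels. Those four reads per output pixel are the 4x memory bandwidth mentioned above.

```python
import math


def rotate_bilinear(src, angle_deg):
    """Rotate a grayscale image (list of rows) about its center.

    Hypothetical software model: inverse mapping plus bi-linear interpolation.
    Destination pixels whose source falls outside the image are left at 0.
    """
    h, w = len(src), len(src[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    dst = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = dx * cos_t + dy * sin_t + cx   # source column for this output pixel
            sy = dy * cos_t - dx * sin_t + cy   # source row for this output pixel
            if not (0 <= sx <= w - 1 and 0 <= sy <= h - 1):
                continue
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Four frame-buffer reads per output pixel
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            dst[y][x] = top * (1 - fy) + bottom * fy
    return dst


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unrotated = rotate_bilinear(image, 0)
tilted = rotate_bilinear(image, 30)
```

Because the output is produced in raster order, every destination pixel receives a value, which is the property that avoids void artifacts in this arrangement.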
101. c for building the custom hardware, unassigned address space for the new instruction, and a low-latency path between the processor and the acceleration hardware. Xilinx provides the most efficient integration of microprocessor and FPGA fabric, with dedicated interfaces that save clock cycles by eliminating bus overhead, are decoupled from the CPU to enable implementation of multiple accelerators, and do not stall the pipeline, which is crucial to RISC performance. All Virtex FPGAs have abundant programmable logic resources suitable for building acceleration hardware. Xilinx enables efficient accelerator integration for the MicroBlaze soft processor core with the Fast Simplex Link (FSL). The MicroBlaze processor supports up to 32 input and 32 output FSL, and code development is easy, with simple programming for blocking and non-blocking instructions. Virtex-4 FX devices include up to two PowerPC hard processor cores. Xilinx first introduced the immersed PowerPC 405 core in the Virtex-II Pro family. For the Virtex-4 family, Xilinx has increased processor performance to 680 DMIPS at 450 MHz and reduced power consumption to 0.44 mW/MHz, while maintaining compatibility with all software and IP created for the first-generation core. A new auxiliary processor unit (APU) controller simplifies the integration of acceleration hardware for the PowerPC core by providing a direct interface between the CPU pipeline and the FPG
102. can be accessed over the Web with only a browser. From virtually anywhere in the world with an Internet connection, you will experience the advantages of designing with the newest products from Xilinx. Our first laboratory allows you to evaluate the Xilinx XC3S2000 FPGA and develop a multitude of applications. One of several Nu Horizons Xilinx Virtual Lab applications is centered on high-performance FPGA DSP functionality. Others include embedded processing and high-speed serial I/O. Why wait? For a limited time, evaluate the latest Xilinx technology, learn new tools, and take real measurements without any initial investment. What would you do with advanced access to new technology? Discover the possibilities of online design. Nu Horizons Electronics Corp. For more information, visit www.nuhorizons.com/VirtuaLab. TechOnLine. FPGA Power Solutions: High Performance Analog Solutions from Linear Technology. Dual Output DC/DC Converter Solutions for Xilinx FPGA-Based Systems. Xilinx FPGAs require at least two power supplies: VCCINT for core circuitry and VCCO for the I/O interface. For the latest Xilinx FPGAs, including Virtex-II Pro, Virtex-II, and Spartan-3, a third auxiliary supply, VCCAUX, may be needed. In most cases, VCCAUX can share a power supply with VCCO. The core voltages (VCCINT) for most Xilinx FPGAs range from 1.2V to 2.5V. Some mature products have 3V 3 35V or 5V core voltages. Table 1 shows the core vo
103. captured data can be saved as stimuli for subsequent simulation. ChipScope Validation of DAC-to-ADC Loopback: Our third design technique is useful to validate P160 analog DAC and ADC functionality in a loopback configuration (Figure 5: ChipScope validation of P160 analog DAC-to-ADC loopback). As shown in Figure 5, we generate a waveform that drives the P160 analog DAC to produce a continuous analog output signal. The waveform, stored in FPGA block RAM, is defined as an expression from The MathWorks MATLAB: either a periodic function such as a sinusoid, or arbitrary data from a MATLAB array. When compiling the model to a bitstream, the DAC and ADC output ports are mapped to FPGA I/O that connects to the P160 Analog Module according to the selected target FPGA. In operation, a loopback cable connects the analog output signal from the P160 Analog Module DAC back into the ADC input. The Xilinx ChipScope tool, available as a block in System Generator, captures both the generated waveform and the sampled loopback version. The design runs in the FPGA at the system clock rate (100 MHz on most Memec development boards). The data sampling rate is set in the model to a convenient sub-multiple of the system clock rate, as the ADC can operate at a maximum of 53 megasamples per second. Conclusion: The Memec P160 Analog Mod
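The two numeric choices in this loopback setup, the sample-rate divider and the block RAM waveform contents, can be sketched in Python. The 100 MHz clock and 53 MSPS limit come from the text; the 12-bit sample width, the table length, and both function names are assumptions made only for illustration.

```python
import math


def smallest_clock_divider(system_clock_hz: int, adc_max_sps: int) -> int:
    """Smallest integer divider that keeps the sample rate within the ADC limit."""
    return -(-system_clock_hz // adc_max_sps)   # ceiling division on integers


def sine_table(length: int, bits: int) -> list:
    """One period of a signed sinusoid, scaled to a 'bits'-wide two's complement range."""
    amplitude = (1 << (bits - 1)) - 1
    return [round(amplitude * math.sin(2 * math.pi * i / length))
            for i in range(length)]


divider = smallest_clock_divider(100_000_000, 53_000_000)
sample_rate_hz = 100_000_000 // divider
table = sine_table(8, 12)   # hypothetical 12-bit samples, 8 entries per period
```

With these inputs the divider is 2, giving a 50 MSPS sampling rate, which is the fastest sub-multiple of the 100 MHz system clock that stays under the 53 MSPS limit.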
104. cation of the PicoBlaze processor in this project is the Ethernet controller. As mentioned earlier, we selected Ethernet to distribute data throughout the sign. At each Ethernet connection, we have an Ethernet physical layer transceiver (PHY) device connected directly to an FPGA. We developed a very simple and tiny media access controller (MAC) module, which we use inside the FPGA to connect the PHY to an instantiation of the PicoBlaze processor. This Ethernet unit is small, requiring less than a quarter of the logic resources in the XC3S200 FPGA. It handles the basic Ethernet layers and protocols, including ARP (address resolution protocol). It also supports the IP (Internet protocol) layer with ICMP (Internet control message protocol), UDP (user datagram protocol), and DHCP (dynamic host configuration protocol). With this Ethernet controller, we can plug an FPGA into our network and it negotiates an IP address. Then we can transfer files and data to and from it. Conclusion: Xilinx devices made the challenge of developing the world's highest definition LED display achievable. These devices are a perfect fit for a complex design because of their flexible nature and powerful feature set. Valuable design components such as the PicoBlaze processor further increase their ease of use, and thus their value. The reconfigurable and flexible nature of the devices allowed us to ship the sign with all first-revision circuit board
105. ce, and by taking advantage of IP included in the Xilinx development toolset, we were able to get the job done in four months. MPEG-2: MPEG-2 is a widely used video compression standard, rich with diverse encoding methods. Its diversity includes three distinct techniques for coding individual video frames as either intra (I) frames, predicted (P) frames, or bi-directionally interpolated (B) frames. P-frames and B-frames introduce additional latency, both encoding and decoding. To cut latency to the absolute minimum, we used only I-frame encoding and decoding. Intra-frame encoding consists of a pipelined set of functions. Basics: The three MPEG-2 coding methods are: I-frame: Intra-frame encoding is based solely on information within a single frame. Furthermore, the I-frame encoding and decoding process may begin as soon as the first 16 lines of a frame are received. P-frame: Predictive encoding uses a previous frame and encodes only the differences between that frame and the current frame to be encoded. B-frame: Bi-directionally encoded frames use both a previous I- or P-frame and a future I- or P-frame, forming a best-match interpolation between those two frames and the current frame, and encoding the resulting differences. B-Frames Impede Low Latency, and P-Frames Don't Help: Because B-frames use a future frame to encode the current frame, B-frame encoding and decoding impose a
106. ceiver (bottom). Fixed-point computations are shown in yellow. Figure 4: Optimal fixed-point scaling of OFDM transceiver obtained through iterative simulation and analysis. ... subsystems, including software, digital hardware, and RF/analog hardware. You can rapidly simulate and iterate to identify flaws and refine the model to validate behavior against the requirements. The Simulink model defines all the information needed to implement the software or hardware, including bit-true fixed-point processing and cycle-accurate timing and synchronization of multi-rate systems. Simulation is used to show that the executable specification defined by the model is complete and works correctly. You can use model components to create well-defined subsystem interfaces, which simplify reuse in subsequent design efforts, even when those projects employ different target hardware or hardware/software partitioning. Implementation with Automatic Code Generation: Once you have refined and validated the design, you can automatically generate code from the model, eliminating the need for hand coding and the errors that manual coding can introduce. You can then use the code for real
107. complete system, providing a platform for software development that can continue in parallel with ASIC development and manufacturing. The benefits of FPGA prototyping are clear: you will have more confidence in your design, which ultimately enables the development of a right-first-time ASIC in less time. Prototyping Challenges: Ideally, the source register transfer language (RTL) for the design would be identical for both ASIC and FPGA. But in practice, you must make modifications to the RTL to get the best results from FPGA synthesis, or in some cases to even synthesize a design. FPGA synthesis tools typically require you to write code in a certain style, following recommended coding guidelines, and each synthesis tool will have its own subset and variation of language support. Unless the ASIC and FPGA synthesis tools use the same compilers and directives, the RTL for the FPGA and ASIC implementations will likely be different. Designers often use the Synopsys DesignWare Library building blocks in the ASIC implementation of the design. Using a synthesis tool that does not support DesignWare requires you to write the specific elements yourself, which can potentially introduce errors in the design. Meeting timing is often one of the most challenging issues in prototyping the design. Often, designers are forced to use a fixed optimization strategy in traditional ASIC St
108. de control in the LTC3708 allows fast transient response, minimizing the number of output capacitors. An internal phase-locked loop allows the IC to be synchronized to an external clock for applications with more than two output rails. The LTC3708 also features programmable current limit, output overvoltage protection, and power good output. The device is available in the 5mm x 5mm QFN package. An optimal power solution for multirail supply systems incorporating the latest Xilinx FPGAs should provide multiple outputs with supply tracking/sequencing. As board real estate becomes more expensive, the power supply must be more efficient and smaller while supplying higher current in high-end applications. Linear Technology's latest dual-output power management ICs (LTC3407, LTC3736, and LTC3708) successfully address these challenges. For data sheets and additional information on other power solutions for Xilinx FPGAs, visit Linear Technology's website at www.linear.com. Nu Horizons Electronics Corp., 1-888-747-NUHO. Figure 9: Up/Down Coincident Tracking (500mV/DIV, 2ms/DIV). Figure 10: Up/Down Ratiometric Tracking (2ms/DIV). Note: LT, LTC, and Burst Mode are registered trademarks of Linear Technology Corp. All other trademarks are the property of their respective owners. Implementing 70 High Speed Channels with FPGAs. Using nine Xi
109. delay: the encoder or decoder must wait for the future frame to arrive before coding the current frame. (Figure 3: VigraWATCH block diagram, showing audio and video inputs and outputs, digital I/O, serial I/O, the IIC bus, DDR SDRAM, QDR SRAM, SDRAM, the local PCI bus, and the PCI-PCI bridge.) Thus, B-frames must be tossed in the quest for minimum latency. The P-frame's principal contribution to MPEG-2 is in improving compression ratios, as they are smaller than I-frames. A greater compression ratio means reduced transmission bandwidth. However, because low latency is the primary concern, the bandwidth needs to be enough to accommodate I-frames without buffering delays. We also had another latency issue: development time. Thus, we developed an I-frame-only decoder. Without P- and B-frames, MPEG-2 video becomes essentially the same as motion JPEG. In this case, we were constrained to MPEG because that was the source format. The VigraWATCH Video Processor: The VW platform allows you to rapidly develop high-performance audio, video, and image processing functions using the XC2V3000, the microprocessor, or both. Figure 3 illustrates the VW's primary components, peripherals, and available I/O. Primary Components: The VW contains five large ICs
110. delay of each data and clock channel within the SelectIO block in 78 ps increments to meet the setup and hold requirements for reliable data capture. For extreme levels of skew, the misalignment might be greater than a bit interval. Aligning bits helps read the data reliably, but some channels might be out of step with others. To address extreme levels of skew greater than a bit interval, ChipSync technology provides a bitslip capability. An optional training pattern simplifies the task of aligning data words across all channels. With source-synchronous design, each interface has its own clock. As multiple interfaces and memories are connected to the same FPGA, the need for numerous flexible clock resources grows. With clock-aware I/Os, ChipSync technology enables simultaneous implementation of multiple source-synchronous interfaces. Xesium clocking makes this possible with up to 24 clock regions per device. Each region can have up to six I/Os acting as clock sources for data capture. Up to 95 I/Os can be clocked by a single I/O clock, providing great clock flexibility and a large number of clocks. What will happen when your FPGA and PCB designs meet? Integrated Systems: The potential for disaster is enormous if your designers work independently of one another. Mentor Graphics eliminates that potential with the only integrated systems design solution in EDA. We have superior
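The two alignment steps described here, fine delay in 78 ps increments and word alignment by bitslip against a training pattern, can be modeled conceptually in Python. This is only a behavioral sketch: the function names, the 8-bit word width, and the modeling of a bitslip as a one-bit left rotation are assumptions, not a description of the actual ChipSync hardware.

```python
TAP_PS = 78  # delay increment stated in the text


def taps_for_skew(skew_ps: float) -> int:
    """Number of 78 ps delay increments closest to the measured skew."""
    return round(skew_ps / TAP_PS)


def rotl(word: int, n: int, width: int) -> int:
    """Rotate a 'width'-bit word left by n bits."""
    n %= width
    mask = (1 << width) - 1
    return ((word << n) | (word >> (width - n))) & mask


def bitslips_needed(received: int, training: int, width: int):
    """Count one-bit slips until the received word matches the training pattern.

    Returns None if no rotation of the received word matches.
    """
    for slips in range(width):
        if rotl(received, slips, width) == training:
            return slips
    return None


taps = taps_for_skew(390)                       # 390 ps of skew
slips = bitslips_needed(0b11100001, 0b00001111, 8)
```

In this model a channel that is 390 ps late needs five delay increments, and a word that arrives rotated by three bit positions is realigned after three slips.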
111. designers, the traditional customers for our devices. However, DSP and embedded processing designers are very different from logic designers: they use different tools, they have different needs and expectations, and they approach their designs in different ways. For example, DSP designers usually work with algorithms such as Fast Fourier Transforms and FIR filters; embedded processing designers work with high-level languages such as C or C++; while logic designers usually work in VHDL or Verilog. Although the final implementation is in an FPGA, the design approach is very different for each of these customers, and thus we must support these customers in different ways. Therefore, to ensure that we are addressing the needs of each market segment, I decided to create two new divisions within Xilinx: one to focus on the DSP market and one to focus on the embedded processing market. Each new division, headed by a vice president, will focus on providing the development tools, devices, IP cores, support services, and marketing functions to fully capitalize on these growing technologies. We intend to be the leader in all of these key programmable technologies. Conclusion: Xilinx is the only company that can bring all these programmable technologies together in a single device, giving you a tremendous advantage in performance, cost, and time to market. If you do a system-on-chip design in an ASIC, it will require
112. dinates in the 2-D image are Sx = Dx cos(t) + Dy sin(t), Sy = Dy cos(t) - Dx sin(t), where t is the rotation angle. 2. Place the rotation engine before the frame buffer. In effect, this method predetermines and weights the values stored through non-sequential addressing into the frame buffer. The output data is read in raster-scan format from the frame buffer. The Hotelling transforms, with similar representation as above, are Dx = Sx cos(t) - Sy sin(t), Dy = Sy cos(t) + Sx sin(t). In their paper "Real-Time Image Rotation and Resizing, Algorithms and Implementations" (www.xilinx.com/products/logicore/dsp/rotation_resize.pdf), my Xilinx colleagues Robert Turney and Chris Dick showed that this approach is more economical in terms of memory bandwidth. Because the data is stored to the frame buffer in non-sequential order, you must take care to ensure that all valid pixel locations are written to. Also, if the rotation angle is changed, you must clear artifacts in the corners from the previous frame from the frame buffer. A nearest-neighbor rotation of 45 degrees clockwise demonstrates the void artifacts, as shown in Figure 1, that occur due to multiple input pixels being written to the same frame buffer address (Figure 1: Void artifacts when rotation engine occurs before the frame buffer). To avoid a complex control path and concentrate mostly on the data path, our example goes with the first choice previously descri
113. ditions and corner cases that might be difficult or impossible to test from a system-level perspective. Such unit testing improves the quality and robustness of the application as a whole. Unit Testing: A comprehensive hardware/software testing strategy includes many types of tests, including the previously described unit tests, for all critical modules in an application. Traditionally, system designers and FPGA application developers have used HDL simulators for this purpose. Using simulators, the FPGA designer creates test benches that will exercise specific modules by providing stimulus (test vectors or their equivalents) and verifying the resulting outputs. For algorithms that process large quantities of data, such testing methods can result in very long simulation times, or may not adequately emulate real-world conditions. Adding an in-system prototype test environment bolsters simulation-based verification and inserts more complex real-world testing scenarios. Unit testing is most effective when it focuses on unexpected or boundary conditions that might be difficult to generate when testing at the system level. For example, in an image processing application that performs multiple convolutions in sequence, you may want to focus your efforts on one specific filter by testing pixel combinations that are outside the scope of what the filter would normally encounter in a typical image. H W S W
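A boundary-focused unit test of a single filter might look like the following Python sketch. The article's own test benches are written in C and run on the embedded processor; this model only illustrates the idea of feeding one filter pixel combinations that a typical image would rarely contain. The 3 x 3 sharpening kernel, the 8-bit saturation, and the function name are all assumptions.

```python
def convolve3x3(image, kernel):
    """3 x 3 convolution with results saturated to the 8-bit range 0..255.

    Hypothetical filter under test; border pixels are left at 0.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = max(0, min(255, acc))   # saturate instead of wrapping
    return out


SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# Stimulus chosen for boundary conditions rather than realism
all_white = [[255] * 3 for _ in range(3)]
bright_dot = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]      # drives the accumulator far above 255
dark_dot = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]  # drives it below 0

white_result = convolve3x3(all_white, SHARPEN)[1][1]
overflow_result = convolve3x3(bright_dot, SHARPEN)[1][1]
underflow_result = convolve3x3(dark_dot, SHARPEN)[1][1]
```

The isolated bright and dark pixels push the accumulator to 1275 and -1020, so the test checks that the filter saturates rather than wrapping around, a failure that ordinary images might never expose.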
114. dware, some of the channels were getting misaligned. A colleague of ours referred us to the design note about 32-bit word comma alignment in the RocketIO transceiver user guide. Although this is usually needed only for a 4-byte data path, we implemented a similar scheme for our 2-byte data path, and this fixed our misalignment problem. Clock Programming and JTAG: We cannot over-emphasize the need for a high-quality reference clock. Besides satisfying all of the criteria specified in the RocketIO user manual, we made sure that our reference clock was as clean as we could possibly get (see Figure 2). We used a quartz-based phase-locked loop (QPLL) circuit developed at CERN for our system to provide the best jitter-free clock source (100 ps peak-to-peak). We found that a lot of problems in the performance of the RocketIO devices could be traced to a noisy, jittery reference clock. If you are using RocketIO transceivers on both halves of the chip, then it's much better to have two reference clocks (Figure 2: Clock jitter measurement). We believe this helps even if you are running the RocketIO transceivers in half-rate mode, which is our case. Another aspect of the clocking scheme that we used was to pass the reference clock through a global clock buffer. ... this is a flexible approach, as the FPGAs are reprogrammable, and a more economical solution in the long term
115. e DVD players. Key Features: two 95%-efficient 3 A buck controllers and one 300 mA LDO; adjustable output voltages (1.20 V to 6.5 V) on all channels; input voltage range of 2.2 V to 6.5 V; independent soft-start for all three power supplies; LDO stable with small ceramic output capacitor; independent enable for each supply for flexible sequencing; 4.5 mm x 3.5 mm x 0.9 mm 20-pin QFN package; 1 ku price 1 90. The TPS75003 power management IC, tested for Xilinx's Spartan-II/IIE/3 series of FPGAs, integrates multiple functions to significantly reduce the number of external components required and simplify design. Combining increased design flexibility with cost-effective voltage conversion, the device includes programmable soft-start for in-rush current control and independent enables for sequencing the 3 channels. The TPS75003 meets all Xilinx startup profile requirements, including monotonic ramp and minimum ramp times. For information on TI's complete line of power management solutions for Xilinx FPGAs, visit www.ti.com/xilinxfpga. You'll find a library of reference designs tested and endorsed by Xilinx, along with schematics and BOMs for all designs. Questions? Need samples or an Evaluation Module? Contact us at fpgasupport@list.ti.com. Texas Instruments. Real World Signal Processing and the black
116. e highly optimized network protocol and low-overhead implementation used in DIMEtalk, the efficiency of data transfers is as high as 97%. Figure 1: DIMEtalk network element resource usage comparison (FPGA slice resource requirements, on an axis from 0 to 4,500 slices, for the DIMEtalk Router, DIMEtalk PCI Edge, DIMEtalk Memory Node, PCI Express Endpoint, 10G Ethernet MAC, 1G Ethernet MAC, RapidIO Logical and Transport, and RapidIO PHY). A design flow, shown in Figure 2, enables easy network implementation, and you can use the design entry tool of your choice for algorithm blocks. Figure 2: Design flow with DIMEtalk. Network Definition: define the DIMEtalk network for the FPGA computing system using the DIMEtalk software tool. Develop Algorithms: develop the rest of the application using standard design entry flows (HDL, System Generator, AccelFPGA, Handel-C, etc.). Connect Algorithms: import algorithms as HDL code or Xilinx-compatible netlist and connect into the DIMEtalk network. Assign to Devices: assign the network to FPGAs and use Device Editor to drag and drop port and user signals to the FPGA I/O signals. Generate Code: DIMEtalk automatically generates HDL code and constraints (UCF) files for the complete design. The stages of the design flow are: Network Definition: conceptually design the network to provide communications links and interface points to algorithms as required ac
117. e optical data bursts received from a source client OBS NIC to a destination client OBS NIC. Advances in Xilinx FPGA technology have made it possible for the MCNC RDI to build a NIC that implements the JIT signaling protocol for an OBS network. The OBS NIC uses DWDM technology to transmit and receive data optically on specific wavelengths, and is capable of handling data rates as high as 1.25 Gbps. The NIC card can be tuned dynamically to as many as eight different DWDM wavelengths. In the JIT protocol, a control packet reserves a wavelength channel in the network for a period of time L, equal to the burst length, starting at the expected arrival time R; this can be adjusted by the number of hops that a burst needs to travel and the processing time at each intermediate node. If the reservation is successful, the control packet adjusts the offset time for the next hop and forwards it on. If the reservation is not successful, the burst will be blocked and the packet will be discarded. Because JIT is a one-way reservation protocol, buffering does not occur at the node level, thus reducing any latency. Implementation of JIT with an efficient scheduling algorithm can further decrease the probability of burst loss. The JIT protocol uses a SETUP message to announce a burst in the OBS network. Each optical burst of data, comprising some number of contiguous packets destined for a specific destination, is sent immediately aft
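The reservation step can be illustrated with a small Python model. It captures only the rule stated above, that a control packet asks for one wavelength channel for a period L starting at arrival time R and the burst is blocked if that interval is not free. The class and method names are hypothetical, and this is not the scheduling algorithm used in the MCNC RDI NIC.

```python
class WavelengthChannel:
    """Bookkeeping for one wavelength at one node (hypothetical model)."""

    def __init__(self):
        self.reservations = []   # list of (start, end) intervals, end exclusive

    def reserve(self, arrival_time, burst_length):
        """Try to reserve [R, R + L). Returns True on success, False if blocked."""
        end_time = arrival_time + burst_length
        for start, end in self.reservations:
            if arrival_time < end and start < end_time:
                return False     # overlaps an existing burst: this burst is blocked
        self.reservations.append((arrival_time, end_time))
        return True


channel = WavelengthChannel()
first = channel.reserve(arrival_time=10, burst_length=5)    # occupies 10..15
clash = channel.reserve(arrival_time=12, burst_length=2)    # overlaps the first burst
after = channel.reserve(arrival_time=15, burst_length=3)    # starts as the first ends
```

Since nothing is buffered at the node, a failed reservation has no fallback in this model: the caller simply discards the burst, mirroring the one-way behavior described in the text.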
118. e the on-chip serial port for other tasks. This implementation may also burden the application code with LIN protocol requirements and will complicate the design and testability of the code. This method also needs to be complemented with GPIO functionality for error checking and synchronization purposes, and requires CPU activity throughout LIN message exchange. Therefore, it is not the most power-efficient solution ... as these fixed-function parts allow little or no flexibility for customization. The devices still require an external bus transceiver chip and a degree of real-time processing in the MCU. Distributed MCU solutions can also result in complex design and test issues associated with software-based designs; designers may need to explore all potential fault and interrupt loop states so that no strange indeterminate states occur. Exhaustive testing is costly, however, and the test vectors can take longer to write than the design code itself. The ability to switch between master and slave in the same device means that inventory and stocking costs are reduced. Hardware: Programmable Logic Device (PLD): LIN implemented in PLDs offers similar benefits to LIN implemented in an MCU dedicated hardware peripheral. They do benefit from being implemented in generic devices that are off-the-shelf, low cost, and low power. This means that time to market is extremely quick and easy. The LIN implemented in PL
119. eBlaze module. Please note that the Nohau solution requires no external signal pins; all access is through the JTAG port. Furthermore, it does not impact timing, because it only interfaces through the OPB bus. The resource utilization in the FPGA for the Nohau IP is very small. Actual requirements are shown in Figure 2. Figure 2: Nohau resource usage with and without trace (debug with no trace: 84 slices; debug with trace: 425 slices; other values in the figure: LUTs 86, flops 148, multipliers 0, BRAMs 0). Design/Debug Flow: The design flow with Nohau tools present is illustrated in Figure 3. You may build an initial system from scratch or use a platform generator like Nohau BlazeGen or the Xilinx Platform Studio (XPS) Base System Builder (BSB). Simply specify your system with the MHS, MSS, UCF, and project options files, which are generated by the platform builder or are user-generated text files. To add the Nohau Debug/TraceBlaze IP to a project, you first build it with BSB or BlazeGen and add the Debug/TraceBlaze IP with a pass through BlazeGen. The output of the XPS build is a bit file that contains the bitstream required to program the target FPGA with your system. The Nohau Seehau debugger is a convenient and easy-to-use GUI that allows fast and easy updating of system hardware and software, as well as test and checkout of software execution. Seehau loads the bit file and programs the FPGA in just a fe
120. ective way to determine cycle counts and explore hardware interface issues. HDL simulators can also help alleviate the typically long compile/synthesize/map times required before testing a given hardware module in-system. Hardware simulators provide much more visibility into a design under test and allow single-stepping and other methods to be used to zero in on errors. Figure 3: Hardware and software C code is compiled and debugged in a standard IDE; the interfaces are automatically generated and the generated HDL synthesized in Xilinx tools before downloading into the Virtex-II, Virtex-II Pro, or Virtex-4 device. (Diagram labels: Impulse C design files; Visual Studio, CodeWarrior; generate software interfaces; generate hardware interfaces; generate RTL; software libraries.) If tests require very specific timing, using an embedded processor to create test data will most likely result in data rates that are only a fraction of what is needed to obtain timing closure. In fact, if the test routine is implemented as a state machine on the processor, the speed at which the state machine can be made to operate will be slower than the clock frequency of the test logic in hardware. Hence, for most cases the hardware portion would need to slow down so the CPU can keep pace providing test stimulus and measuring expected responses. Alternatively, you can create a buffered interface, a software-to-hardware bridge
121. een capturing video at the sensor and displaying it at the remote viewer. With training, people can get used to as much as a half second of delay, but often the result is vehicle oscillation as the operator over-corrects controls without having intuitive, immediate feedback. Titan Corporation's Advanced Products and Design Division works with aerospace defense primary contractors who provide unmanned aerial vehicles (UAVs) to the DoD. Figure 1 shows a General Atomics Predator UAV, and Figure 2 shows a ground station console used for UAV remote control. In surveillance missions, MPEG-2 encoded video from a pan-tilt-zoom (PTZ) camera mounted on the UAV is transmitted to a ground station. There the imagery is presented on a console to the operators. For the most effective control of the camera and vehicle, we had to reduce delay through the MPEG-2 decoder to less than 75 ms. To accomplish this task, we used our commercial off-the-shelf multimedia video processing board, the VigraWATCH. VigraWATCH (VW) is equipped with a Xilinx Virtex-II FPGA and an IBM PowerPC 440GP (PPC) processor. This provides more than enough processing power to easily implement a customized MPEG-2 I-frame decoder, which far surpasses the minimum latency requirement. With the overhead of circuit board development and a basic software framework in pla
122. een plug-in modules. The board provides a multitude of output connectors:
• Standard 40-pin 0.1-inch connectors on Digilent I/O boards, which make it possible for students to obtain low-cost push pins that allow interfaces to logic analyzers and other test equipment
• Hirose 140-pin connectors that mate to the new high-speed bus series of Digilent boards featuring memory and analog-to-digital conversion
• Standard 96-pin connectors that allow basic interfacing to a 6U VME mounting platform
• Standard 9-pin serial and 25-pin parallel connectors and their associated interfacing devices
A nice feature is the ability to switch the 25-pin connector between enhanced parallel port and JTAG modes. By flipping a switch, students can JTAG-program the board using a standard parallel printer cable. This is a very useful feature in a student environment because it allows professors to mount the boards in a stationary position; students can program the board without vertically mounting a programming cable to the header pins, reducing the possibility of the header pins breaking off. "This board has great potential within an undergraduate curriculum. We are looking forward to seeing what it can do." (Colonel Bryan Goda, West Point) From an academic standpoint, another great benefit is the placement of dual FPGAs on a single platform. The design team connected 100 pins from one F
123. efficiency, eliminating the need for external MOSFETs and Schottky diodes. Figure 1 is an application example for 2.5V/600mA and 1.8V/600mA supplies. Figure 1: High-efficiency 2.5V/600mA and 1.8V/600mA regulators. Table 1: Core voltage requirement for Xilinx FPGA families (surviving entry: Virtex-E Extended Memory). ADVERTISEMENT. FPGA Power Solutions. Figure 2 shows the efficiency curves of the circuit vs. load current. The LTC3407 has a constant 1.5MHz switching frequency, allowing the use of tiny inductors and capacitors. Selectable Burst Mode operation provides high efficiency at light loads. The IC has short-circuit protection and a power-on reset (power good) output. It is available in small, thermally enhanced 10-lead MSOP and 3mm x 3mm DFN packages. LTC3736: 2-Phase Dual Synchronous DC/DC Controller for 5A Loads. The LTC3736 is a 2-phase dual synchronous step-down DC/DC controller. Power supplies using the LTC3736 can provide 5A at both outputs with a 5V input, meeting the load current requirements for most FPGA applications. The LTC3736 receives input from 2.7V to 9.8V and produces output voltages ranging from 0.6V to 9.5V. Figure 3 shows that up to 95% efficiency is achieved. An application example is shown in Figure 4. In contrast to single-phase operation, the two channels of a 2-phase switching converter
124. enerators
Nohau Shortens Embedded Processor Debug Time
Scalable Software-Defined Radio Development
High-Speed Optical Burst Switching
Virtex-4: Breakthrough Performance, Lowest Cost
Implementing H.264/AVC Video Coding Standard
GSM Modem on a DSP/FPGA Architecture
Design Kits Turbo-Charge DSP Applications
System Generator to Create J.83 Cable Modulator
Implementing DSP Algorithms in FPGAs
XtremeDSP Slices Deliver More GMACs
Image Processing Algorithms with System Generator
Simulink for Embedded Signal Processing
Interfacing Simulink to the Analog World
Build Custom Real-Time Video Applications
Let System Generator Do the Handshaking
Early Access: The Designer's Edge
High-Speed Channels with 9 FPGAs
LIN Bus: Cost-Effective Alternative to CAN
Education Services Trims Learning Curve
Board ... Education ...
Flexible and Adaptable IP ...
Simplify FPGA Application Design with DIMEtalk
Design Once for ASIC ...
To subscribe to the Xcell Journal or to view the web-based Xcell Online, visit www.xilinx.com/xcell. from the top Focusing on Programmable
125. ents TMS320C6000 DSP processors, ultra-fast A/D, 16-bit D/A up to 160MSPS per channel, D/A, and user-configurable digital I/O interfaces can be combined in single- and multi-board systems to address application requirements bounded only by the imagination. Extensible configurations: PCI, CompactPCI, embedded. Learn more at www.traquair.com/ads/xcell/heron.html. Traquair Data Systems, Inc. Tel: 607 266 6000. Email: sales@traquair.com. Web: www.traquair.com. Virtex-II and Virtex-II Pro are trademarks of Xilinx, Inc. New XtremeDSP Slices Deliver As Much As 10X More GMACs Per Dollar. Virtex-4 FPGAs offer breakthrough DSP performance at the lowest cost. by Norinder toll. Xilinx Virtex Series FPGAs have been the preferred choice for high-performance signal processing and DSP co-processing in many digital communications and video/imaging applications. With algorithmic complexity on the rise, the pervasive need for flexibility, and increasing downward pressures on price/power per channel, designers often face tough tradeoffs and difficult choices to employ either FPGA- or ASIC-centric systems for challenging signal processing applications. New XtremeDSP slices on the Virtex-4 device family extend FPGA signal processing capability beyond 256 GMACs, which represents DSP performance two times gre
126. equired for J.83 Annex A/C by using the Xilinx Reed-Solomon encoder block with the Code Specification parameter set to DVB. Similarly, the Xilinx interleaver/deinterleaver block is directly used in the design with the mode set to Interleaver and the Number of Branches and Length of Branches set to 12 and 17, respectively. This results in an exact match to the requirements of the interleaver in the J.83 A/C specification. Using the visual, graphic means of design entry in System Generator, these blocks are easily connected to each other and to the control circuitry that is part of the design. Figure 4: J.83 Annex A/C modulator with scatter plot. IP Simulation in Simulink. It takes a lot of time to simulate and test the functionality of a complex system. You can use the same J.83 circuit built in System Generator for simulation and verification as well as the FPGA implementation. Within the same environment, using Simulink for simulation, the design is stimulated with MPEG transport packets and the appropriate QAM, reset, synchronization, and other control inputs. As shown in Figure 4, this stimulus is shown in the block labeled Stimuli. The source inter-packet gap and burst nature of the MPEG transport packet may be chosen at random at the top level, allowing a test of the full suite of possibilities. Figure 4 also shows a discrete time
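The interleaver settings quoted in fragment 126 (12 branches, branch length 17) can be illustrated with a small software model. This is a minimal sketch, assuming the block behaves as a Forney-style convolutional interleaver in which branch j is a FIFO of j x 17 symbols and a commutator visits one branch per symbol. The class name, the zero-filled initial state, and the `deinterleave` flag are illustrative choices, not the Xilinx block's interface.

```python
from collections import deque


class ConvolutionalInterleaver:
    """Software model of a convolutional (Forney) interleaver.

    Assumption: branch j holds j * depth symbols in interleave mode and
    (branches - 1 - j) * depth symbols in deinterleave mode.
    """

    def __init__(self, branches=12, depth=17, deinterleave=False):
        self.branches = branches
        delays = [
            (branches - 1 - j if deinterleave else j) * depth
            for j in range(branches)
        ]
        # Each FIFO starts filled with zeros (an arbitrary initial state).
        self.fifos = [deque([0] * d) for d in delays]
        self.index = 0  # commutator position

    def push(self, symbol):
        """Feed one symbol in and return the symbol leaving the current branch."""
        fifo = self.fifos[self.index]
        self.index = (self.index + 1) % self.branches
        if not fifo:  # zero-delay branch passes the symbol straight through
            return symbol
        fifo.append(symbol)
        return fifo.popleft()


if __name__ == "__main__":
    inter = ConvolutionalInterleaver()
    deinter = ConvolutionalInterleaver(deinterleave=True)
    data = list(range(1, 5001))
    restored = [deinter.push(inter.push(x)) for x in data]
    delay = 12 * (12 - 1) * 17  # end-to-end delay in symbols
    print(delay, restored[delay:delay + 5])
```

Under these assumptions an interleaver followed by a matching deinterleaver returns the original stream delayed by 12 x 11 x 17 symbols, which is a convenient self-check when simulating such a chain.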
127. er components of the Nucleus system by hand to work in this environment, such as networking, web server, graphics, file management, USB, WiFi, and CAN bus. Future releases of Nucleus will move these products into full integration with Xilinx Platform Studio EDK and MLD technology. Nucleus PLUS and Xilinx Devices. As we have said, the process of creating a working BSP for Nucleus PLUS begins with configuring the hardware platform using Xilinx Base System Builder or the supplied sample reference designs. Then you can go to the Project > Software Platform Settings menu item and select the operating system you want to use from the list. Choosing the Nucleus option (Figure 3a, b) will make available the specific software settings for the RTOS under each of the tabs on the Software Platform Settings menu. Figure 4 shows the user enabling the cache on the PowerPC. Note that in the LV version most of the software options are disabled, but can be changed in the full version. Once you are satisfied with the software settings, you can use the Generate Netlist and Generate Bitstream commands and download the hardware configuration onto the FPGA using Xilinx XPS or ISE tools. You can now execute the Tools > Generate Libraries and BSP commands to configure Nucleus PLUS. The application software can be linked with the RTOS. Now you are ready to switch over to the GDB debugger and download the combined RTOS and ap
128. er the node receives a SETUP ACK from the ingress OBS node. An out-of-band SETUP message is sent across all switches before this step to prepare all path switches for the burst data. OBS does not use any optical buffering or packet parsing. For a long burst, a KEEPALIVE message may be required to keep all switches in active state. The JIT signaling scheme is shown in Figure 1. The Role of the FPGA. The development of the OBS NIC was enabled by the availability of integrated high-speed multi-gigabit RocketIO transceivers in the Virtex-II Pro FPGA, allowing high-speed data streams (1-10 Gbps) to directly reach the core of the FPGA for processing. Figure 2: Architecture of OBS NIC. Dense FPGA logic available in the Virtex-II Pro FPGA facilitates implementation of complex state machines of the JIT protocol. The availability of embedded IBM PowerPC 405 processors in the Virtex-II Pro FPGA allows implementation of complex scheduling algorithms and timers associated with the JIT protocol. Figure 3: FSM system flow diagram. The OBS NIC contains a Virtex-II Pro XC2VP20 FPGA. Three Gigabit Ethernet channels are used
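The signaling order described in fragment 128 (SETUP sent first, burst released only after SETUP ACK, KEEPALIVE during long bursts) can be pictured as a small sender-side state machine. The sketch below is illustrative only: the state names, the chunk-based notion of burst length, the `keepalive_interval` parameter, and the way a burst ends are all assumptions, since the fragment does not describe the actual FSM of Figure 3.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_SETUP_ACK = auto()
    SENDING_BURST = auto()


class BurstSender:
    """Hypothetical sender-side model of the SETUP / SETUP ACK / KEEPALIVE order."""

    def __init__(self, keepalive_interval=4):
        self.state = State.IDLE
        self.keepalive_interval = keepalive_interval  # assumed, in chunks
        self.sent = []  # control messages emitted so far
        self._since_keepalive = 0

    def request_burst(self):
        """Send the out-of-band SETUP message and wait for the acknowledgment."""
        if self.state is not State.IDLE:
            raise RuntimeError("a burst is already in progress")
        self.sent.append("SETUP")
        self.state = State.WAIT_SETUP_ACK

    def on_setup_ack(self):
        """The SETUP ACK arrived, so the burst may now be transmitted."""
        if self.state is not State.WAIT_SETUP_ACK:
            raise RuntimeError("unexpected SETUP ACK")
        self.state = State.SENDING_BURST
        self._since_keepalive = 0

    def send_chunk(self, last=False):
        """Send one piece of burst data; emit KEEPALIVE on long bursts."""
        if self.state is not State.SENDING_BURST:
            raise RuntimeError("burst data sent before SETUP ACK")
        self._since_keepalive += 1
        if last:
            self.state = State.IDLE
        elif self._since_keepalive >= self.keepalive_interval:
            self.sent.append("KEEPALIVE")
            self._since_keepalive = 0


if __name__ == "__main__":
    sender = BurstSender(keepalive_interval=2)
    sender.request_burst()
    sender.on_setup_ack()
    for _ in range(4):
        sender.send_chunk()
    sender.send_chunk(last=True)
    print(sender.sent, sender.state)
```

Modeling the protocol this way makes the ordering constraint explicit: any attempt to send burst data before the acknowledgment raises an error instead of silently proceeding.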
129. esign flow that supports development of both ASIC and FPGA without the manual intervention typical in the current approach to prototyping would provide major advantages in ensuring design integrity and minimizing development time. Design Compiler FPGA is an FPGA synthesis product intended for design teams who prototype ASICs using high-end FPGAs. DC FPGA is built on Design Compiler's industry-leading ASIC synthesis technology and is then customized to include FPGA-specific features. DC FPGA inherits DC's reliability, proven through the development of more than 125,000 ASIC designs. As shown in Figure 1, DC FPGA shares the same compilers, scripting language, and SDC (Synopsys Design Constraints) as Design Compiler. Because they use the same compilers, DC and DC FPGA will interpret the RTL the same way. This eliminates the manual changes in the RTL required when using different synthesis tools for FPGA and ASIC design. Manually transforming gated clocks to the FPGA equivalent is not only very time-consuming but also error-prone. DC FPGA can automatically transform gated clocks in the ASIC design to the FPGA equivalent. This capability preserves clock-gating functionality while improving timing, and eliminates manual design modification. These features, along with full DesignWare Library building blocks support, allow you to easily migrate designs between ASIC and FPGA implementations. To
130. events throughout the year. This is a perfect opportunity to meet our silicon and software experts, ask questions, see demonstrations of new products and technologies, and hear other customers' success stories with Xilinx products.
North America
Oct 31-Nov 3: MILCOM, Monterey, CA
Nov 3-4: System on Chip Conference, Milpitas, CA
Nov 3: Programmable World 2004, Plano, TX
Nov 4: Programmable World 2004, Raleigh, NC
Nov 5: Programmable World 2004, Orlando, FL
Nov 9-10: Denali MemCon 2004, Santa Clara, CA
Nov 10: Programmable World 2004, Boston, MA
Nov 11: Programmable World 2004, Ottawa, Ontario
Nov 12: Programmable World 2004, Baltimore, MD
Nov 15-17: Software Defined Radio Forum, Phoenix, AZ
Nov 16: Programmable World 2004, Chicago, IL
Nov 17: Mentor Graphics EDA Tech Forum, Dallas, TX
Europe
Nov 9-12: Electronica, Munich, Germany
Nov 9: TI Developers Conference, Paris, France
Nov 11: Developer Conference, Munich, Germany
Nov 16: TI Developers Conference, Birmingham, UK
Nov 18: TI Developers Conference, Moscow, Russia
Decal: Programmable World 2004, Tel Aviv, Israel
Dec 8: Boundary-scan Design For Test, Nantes, France
Dec 8-9: IP SOC 2004, France
Dec 9-10: EDA Forum, Dresden, Germany
Asia Pacific
Nov 9: Programmable World 2004, Hsinchu, Taiwan
Nov 12: Programmable World 2004, Seoul, South Korea
Nov 16: Programmable World 2004, Shenzhen, China
For more information and the most up-to-date schedule
Nov 18: Programmable World 2004, Shanghai, C
131. f development time. The FPGA dynamic probe, when combined with an Agilent 16900 Series logic analysis system, allows you to access different groups of signals to debug inside your FPGA without requiring design changes. You'll increase visibility into internal FPGA activity by gaining access to up to 64 internal signals with each debug pin. You'll also be able to speed up system analysis with the 16900's hosted power mode, which enables you and your team to remotely access and operate the 16900 over the network from your fastest PCs. The intuitive user interface makes the 16900 easy to get up and running. The touch screen or mouse makes it simple to use, with prices to fit your budget. Optional soft touch connectorless probing solutions provide unprecedented reliability, convenience, and the smallest probing footprint available. Contact Agilent Direct today to learn more. Agilent Technologies: dreams made real. Build Custom Real-Time Video Applications Quickly and Easily. You can use a board equipped with all required video I/O and a Virtex-II FPGA to rapidly develop custom video processing functions. by John L. Smith, Principal Engineer, Titan Corp. AP&D Division, john.l.smith@titan.com. Visually guided tele-operation is becoming ubiquitous in a variety of fields, including medicine, defense, and industry. A key requirement is low latency: there should be minimum delay betw
132. f less than 2 ms. Conclusion. We saved a lot of time and effort using pre-built boards and IP in the development process. If we had to develop the board, all of the associated software, and all of the IP that went into the low-latency decoder and display system, it would have taken years instead of months. Figure 5: I-frame decoder block diagram (block labels: Controls; Inverse Scan, Dequantization, Saturation, Mismatch (ISDSM); Table Address; Inverse DCT; CCIR 656 Input; Format Converter). You can rapidly develop other video processing functions, including:
• Other codecs: H.264, MPEG-4, Motion JPEG2000
• Enhancement: linear and non-linear filters, super resolution, histogram equalization/specification, de-convolution, warping
• Stabilization and mosaicing
For more information on MPEG-2, read the book MPEG Video Compression Standard, edited by Joan L. Mitchell et al. And for more information on the VigraWATCH system, visit www.titan.com (keyword search: VigraWATCH). Let System Generator Do the Handshaking. You can use the state control capabilities of Xilinx System Generator for synchronous digital DSP realization. by T. Justin Campbell, FPGA Programmer, UVM, tcampbel@uvm.edu. If you are a DSP circuit designer, you should not feel restricted by the basic Xilinx logic blocks when building your design. Custom logic in a
133. f the design. Figure 4: FPGA model, IF processing of GSM. Figure 5: IF FPGA model, demodulator subsystem. Figure 6: Spectrum scope displaying the two GSM FDMA channels. Figure 7: Time scope displaying I/Q signals before and after the magnitude/phase correction operation. Before the GMSK demodulator receives the signal, the receiver has more to do than just blindly shift and down-sample the signal from IF to baseband. It must compensate for the effects of the channel on the signal, which means it has to perform the following:
• Carrier frequency recovery
• Carrier phase recovery
• Amplitude adjustment
• Timing recovery
We created low-pass filters using the digital filter design block from the MATLAB DSP blockset, which were later replaced by the Xilinx FIR filters that use the same taps generated by the Simulink block. Once the demodulator was functional for ideal signals, we added correction blocks to cope with non-ideal signals. You can see in Figure 5 that a squaring loop deals with carrier recove
134. fers must be addressed correspondingly during the motion estimation and compensation stages in the encoder.
• Weighted prediction: The JVT recognizes that in encoding certain video scenes that involve fades, having a weighted motion-compensated prediction dramatically improves the coding efficiency.
Improved Coding Efficiency. In addition to improved prediction methods, other parts of the standard design were also enhanced for improved coding efficiency. Two additional features are most likely to impact the overall system architecture based on our design criteria for software and hardware partitioning:
• Small block-size, hierarchical, exact-match inverse, and short word-length transform: The H.264/AVC, like other standards, also applies transform coding to the motion-compensated prediction residual. But unlike previous standards that use an 8 x 8 discrete cosine transform (DCT), this transform is applied to 4 x 4 blocks and is exactly invertible in a 16-bit integer format. The small block helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any mismatch (Diagram labels: Texas Instruments DM642; video in; video out; VLC; rate control; reference index) ... is the baseline entropy coding method of H.264/AVC. Its basic coding tool consists of a single VLC of structured Exp-Golomb codes, which by means of individually customized mappings are ap
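The structured Exp-Golomb codes mentioned at the end of fragment 134 are simple enough to write out. The sketch below shows the unsigned code (a run of leading zeros followed by the binary form of the value plus one) and, as one example of a mapping onto it, the signed mapping used for signed syntax elements in H.264. The function names are illustrative, and bit strings are used instead of a real bitstream writer to keep the example short.

```python
def ue_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb code word for code_num, returned as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]           # binary of code_num + 1
    return "0" * (len(binary) - 1) + binary  # prefix of (length - 1) zeros


def ue_decode(bits: str) -> tuple[int, str]:
    """Decode one code word from the front of bits; return (value, remainder)."""
    zeros = 0
    while bits[zeros] == "0":                # count the leading zeros
        zeros += 1
    end = 2 * zeros + 1                      # total code word length
    return int(bits[zeros:end], 2) - 1, bits[end:]


def se_to_code_num(value: int) -> int:
    """Signed mapping: positive v maps to 2v - 1, zero or negative v to -2v."""
    return 2 * value - 1 if value > 0 else -2 * value


if __name__ == "__main__":
    for n in range(5):
        print(n, ue_encode(n))
    print(ue_decode(ue_encode(3) + ue_encode(0)))
```

Short code words go to small values, so a mapping is chosen per syntax element so that the most likely values land on the smallest code numbers.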
135. ffect device utilization. Conversely, board and system requirements may place limitations on your design, such as number, width, and type of frame buffer memories. Design Choices. A key decision about the architectural style of the rotation algorithm is quality. For example, when moving the pixels of the original image grid to fractional locations on the rotated image grid, is a nearest-neighbor selection adequate? Should you perform bilinear interpolation on the image to reduce shear (the tendency to see discontinuities along object edges)? Is bicubic interpolation (determining the new pixel value based on weighting and summing its closest 16 neighbors) the preferred method? Your choice of quality will impact resources in both the FPGA and the frame buffer. Our example is based upon the Hotelling transform to determine the original image versus the rotated image pixel addresses. You have two choices as to where the rotation engine occurs with respect to frame buffering: 1. Place the rotation engine after the frame buffer. This has the effect of sequential raster-scan address writes into the frame buffer and allows for raster-scan format of the output data by reading from the frame buffer in non-sequential address form. The Hotelling transform, with basis vectors cos t and sin t for rotation angle t, Sx, Sy representing source x and y coordinates, and Dx, Dy representing destination x and y coor
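The first placement option in fragment 135 (write the frame buffer sequentially, read it back at computed addresses) corresponds to destination-driven address generation. The sketch below is a software illustration only, assuming the transform reduces to the standard 2-D rotation built from cos t and sin t and using nearest-neighbor selection; the rotation center, the sign convention, and the function name are assumptions, because the fragment breaks off before giving the equations.

```python
import math


def rotate_nearest(image, angle_rad):
    """Rotate a 2-D list of pixels about its center with nearest-neighbor lookup.

    For every destination address (dx, dy) in raster order, compute the
    source address (sx, sy) and read that pixel; out-of-range reads yield 0.
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2  # assumed rotation center
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    out = [[0] * width for _ in range(height)]
    for dy in range(height):
        for dx in range(width):
            x, y = dx - cx, dy - cy
            sx = cos_t * x + sin_t * y + cx     # source x (fractional)
            sy = -sin_t * x + cos_t * y + cy    # source y (fractional)
            ix, iy = round(sx), round(sy)       # nearest-neighbor selection
            if 0 <= ix < width and 0 <= iy < height:
                out[dy][dx] = image[iy][ix]
    return out


if __name__ == "__main__":
    tile = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in rotate_nearest(tile, math.pi / 2):
        print(row)
```

Replacing the `round` calls with a weighted sum of the four surrounding source pixels would give the bilinear variant discussed in the text, at the cost of four frame buffer reads per output pixel instead of one.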
136. filtered with an individual LC filter and a separate power plane for the analog supply, also with specific filters. No transceiver power supply was left unconnected, regardless of whether it was used or not. We used the same type of LC filters on the optical receivers.
• Approximately 350 power supply decoupling capacitors of three different values (to match the main clock frequencies in use on the board) were placed as close as possible to the central power pins of the Xilinx FPGAs. Other capacitors were placed nearby each FPGA.
• Each FPGA received one high-quality reference clock (low jitter, 100 ps peak to peak, differential pair) from an individual buffer. We recommend using two independent reference clock sources to ease the internal usage of this clock on the FPGA if using all of the RocketIO transceivers.
Figure 1: The DCC board fully assembled, with the nine Virtex-II Pro FPGAs on the left. RocketIO Implementation and Issues. Virtex-II Pro devices provide the first stage of processing for the front-end data received from the on-detector electronics on the DCC board. Each device receives 800 Mbps of serial data on each of its eight channels from the optical receivers, for a total of 6.4 Gbps per device. In a nutshell, the purpose of the Xilinx FPGAs is to process this data and prepare it for readout. RocketIO transceivers are used to deserialize the received data and perform 8b/10b decod
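The bandwidth figures in fragment 136 follow from simple multiplication, worked through below. The sketch assumes that 800 Mbps is the serial line rate, so that 8b/10b coding leaves 8/10 of it as payload; if 800 Mbps were instead the payload rate, the last figure would not apply. The count of 70 channels in use comes from elsewhere in the article, and the constant and function names are illustrative.

```python
LINE_RATE_MBPS = 800      # per channel, assumed to be the serial line rate
CHANNELS_PER_DEVICE = 8   # RocketIO transceivers per Virtex-II Pro device
DEVICES = 9               # FPGAs on the DCC board
CHANNELS_USED = 70        # optical receiver channels actually connected


def payload_rate_mbps(line_rate_mbps: float) -> float:
    """8b/10b sends 10 line bits for every 8 data bits."""
    return line_rate_mbps * 8 / 10


per_device_gbps = LINE_RATE_MBPS * CHANNELS_PER_DEVICE / 1000
transceivers_available = CHANNELS_PER_DEVICE * DEVICES
board_line_rate_gbps = LINE_RATE_MBPS * CHANNELS_USED / 1000

if __name__ == "__main__":
    print(f"per device:        {per_device_gbps} Gbps")
    print(f"transceivers:      {transceivers_available} available, "
          f"{transceivers_available - CHANNELS_USED} spare")
    print(f"board line rate:   {board_line_rate_gbps} Gbps")
    print(f"payload / channel: {payload_rate_mbps(LINE_RATE_MBPS)} Mbps")
```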
137. from 105 to 9. And because the DCCs will be in operation for four to five years, it will have a huge impact on overall PCB design and the final cost of production and maintenance from a long-term perspective. Also, after deserialization we will need to verify the integrity of received data and reformat it for downstream processing and analysis. We found that the remaining resources in the selected device were enough for most purposes. Of the 72 transceivers available, we use 70 and leave the other two unconnected. The use of 800 Mbps per channel is a system choice, but the design could work at 1.6 Gbps or higher. PCB Design Issues. The DCC PCB is a 12-layer board with four power planes and eight routing layers. We have mostly followed the main rules for high-speed design and analog considerations from Chapter 4 of the Xilinx RocketIO Transceiver User Guide, such as:
• All high-speed traces are impedance-controlled and routed manually in microstrip edge-coupled differential pairs, with impedance matched to 50 Ohms and as close as possible to the source, respecting the crosstalk rules. No other lines were designed in the same area as the high-speed layout, where the immediate layer was the ground/power plane. All high-speed differential pair signals were AC-coupled with 100 nF capacitors and internally terminated to 50 Ohms. All of the transceivers' power supply pins were
138. g a three-tap filter. The filter coefficients, or strength, are governed by a content-adaptive non-linear filtering scheme.
• Directional spatial prediction for intra coding: In cases where motion estimation cannot be exploited, intra directional spatial prediction is used to eliminate spatial redundancies. This technique attempts to predict the current block by extrapolating the neighboring pixels from adjacent blocks in a defined set of directions. The difference between the predicted block and the actual block is then coded. This approach is particularly useful in flat backgrounds where spatial redundancies exist. There are a total of nine prediction directions for Intra_4x4 prediction and four prediction directions for Intra_16x16 prediction. Note that data causality requires quick memory access to the 13 neighboring pixel values above and to the left of the current block in the case of Intra_4x4. For Intra_16x16, 16 neighboring pixels on each side are used to predict a 16 x 16 block.
• Multiple reference picture motion compensation: The H.264/AVC standard offers the option for multiple reference frames in inter-frame coding. Unless the number of referenced pictures is one, the index at which the reference picture is located inside the multi-picture buffer has to be signaled. The multi-picture buffer size determines the memory usage in the encoder and decoder. These reference frame buf
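The directional intra prediction described in fragment 138 can be made concrete with three of the nine Intra_4x4 modes. The sketch below implements vertical, horizontal, and DC prediction from the four pixels above and the four pixels to the left of the block, then forms the residual that would be transform coded. The remaining six diagonal modes, which also use the above-right and corner pixels (making up the 13 neighbors), are omitted, and the function names are illustrative.

```python
def intra4x4_predict(above, left, mode):
    """Predict a 4x4 block from reconstructed neighbors.

    above: the 4 pixels directly above the block, left to right.
    left:  the 4 pixels directly left of the block, top to bottom.
    """
    if mode == "vertical":      # each column repeats the pixel above it
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # each row repeats the pixel to its left
        return [[left[row]] * 4 for row in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def residual(block, prediction):
    """Difference between the actual block and its prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


if __name__ == "__main__":
    above, left = [50, 50, 50, 50], [50, 50, 50, 50]
    flat_block = [[50] * 4 for _ in range(4)]
    print(residual(flat_block, intra4x4_predict(above, left, "dc")))
```

For a flat background the DC prediction matches the block exactly and the residual is all zeros, which is the case the text singles out as particularly favorable.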
139. gn tailored to those exact specifications. Usage. The Xilinx J.83 modulator implementation is available as a module that plugs into Xilinx System Generator for DSP or as a netlist that may be directly referenced by another design. The design of the J.83 core in System Generator allows for generation with a simple push-button solution. Through a GUI constructed in the familiar Simulink environment, the core provides you with a convenient means of supplying design specifics such as the granularity desired, the number of channels required, and clock rates, as shown in Figure 7. Figure 7: J.83 Annex B generator GUI screenshot. During parameterization and generation, the core is automatically configured to the specifications and deposited into the target directory. Along with the netlist, the core also includes behavioral and timing simulation script files (.do) for Mentor Graphics ModelSim and an ISE Project Navigator project file (.npl). From this point on, you can bring the core into the ISE Project Navigator environment for synthesis, place and route, and bitstream generation. Resource Sharing. The Xilinx FPGA implementation of the J.83 modulator specification capitalizes on a particular architectural feature to construct efficient multi-channel implementations: the shift register logic 16 (SRL16) primitive found in Virtex-II, Virtex-II Pro, and Spartan-3 devices. You can think
140. gn environment. Xilinx satisfies these requirements with a range of processors that includes the PicoBlaze eight-bit microcontroller soft core, the MicroBlaze 32-bit general-purpose processor soft core, and the industry-standard PowerPC architecture in the form of a performance-optimized hard core. Virtex-4 FX devices include built-in Ethernet connectivity, enabling seamless chip-to-chip connections without consuming programmable logic resources. Efficient Hardware Acceleration. Using an FPGA with an embedded processor as a platform for programmable system design enables flexible partitioning of functionality into hardware and software. Immersing the processor in the FPGA logic fabric opens the door to the additional flexibility of creating custom hardware to accelerate the execution of critical software. Hardware acceleration enables designers to apply logic resources to achieve performance exactly where needed. Creating hardware tightly coupled to the CPU to act on a set of operands can accelerate the execution of key software by performing in a single cycle calculations that take many cycles on a processor. This performance boost is achieved by tuning the hardware design to provide the degree of parallelism required by the algorithm. High-Performance Flexible Hardware Acceleration. Creating accelerators for FPGA-based processors requires three elements: programmable logic fabri
141. grated functionality. Virtex-4 FPGAs provide 2x more density and boost performance as much as 2x while reducing power consumption by as much as 50% compared with previous-generation FPGAs (see sidebar, "Features at a Glance"). At the same time, Virtex-4 FPGAs cut the cost of programmable system platforms by more than 50%, enabling developers to adopt high-performance FPGAs in an extraordinary range of products. Higher Performance. Virtex-4 FPGAs attack the requirements for higher performance on several fronts. First, designers can improve system performance thanks to the advanced 90 nm process and optimized FPGA fabric. The second approach is to include dedicated performance-tuned circuitry for implementing key system functions, such as integrated processors, DSP slices, Ethernet MACs, and serial transceivers. For example, the embedded Virtex-4 XtremeDSP slice delivers up to 500 MHz performance, and the RocketIO serial transceiver ranges from 0.6 to 11.1 Gbps, unprecedented in the industry. The third approach is the incorporation of powerful clock management capability, enabling engineers to extract the maximum performance from the programmable logic fabric. Xesium clocking technology addresses designers' demands for more flexible clocking with abundant resources: up to 32 global clocks in each device and up to 20 digital clock manager (DCM) circuits. Xesi
142. h manufactures the majority of the spectacular signs in Times Square. When MultiMedia asked our company, Advanced Electronic Designs Inc. (AED), to design an LED sign for JPMorgan Chase in Times Square, we needed a huge amount of signal processing, data distribution, and interfacing. We also needed to design the sign very quickly. We met this challenge by utilizing the advantages of Xilinx components. We used Virtex-II XC2V1000 FPGAs for video processing, and for control and distribution we chose low-cost Spartan-3 XC3S200 FPGAs. To configure the FPGAs, we chose the Platform Flash XCF00 configuration PROM family. And for final distribution of the data on the 3800 LED blocks, we used XC9572XL PLDs. The Design. An LED sign is like a large computer monitor: video data goes in and is displayed on the sign. The sign comprises red, green, and blue LEDs that turn on and off (pulse-width modulation) to generate more than four trillion colors. What made this particular design a challenge was the scale, both in terms of physical size as well as the amount of data and the transfer rates involved. The sign is 135 feet long and 26 feet tall. With nearly two million pixels, it is the highest-definition LED display in the world. This is ten times the resolution of the average television screen and twice the resolution of top-of-the-line HDTV sets. After considering our options, designing with Xilinx p
143. hat can do the heavy lifting in a complementary fashion to programmable DSPs. Providing extremely high-performance DSP solutions has become so important at Xilinx that we have recently created a DSP division. We are consolidating our DSP resources and creating focused development platforms and reference designs to help designers of high-performance DSPs get up to speed on our solution quickly and cost-effectively. In the following series, you will find articles on prototyping, developing, implementing, and analyzing in the context of real-world high-performance DSP applications using tools from Xilinx and our partners.
Table of Contents
Implementing H.264/AVC Video Coding Standard
GSM Modem on a DSP/FPGA Architecture
Design Kits Turbo-Charge DSP Applications
System Generator to Create J.83 Cable Modulator
Implementing DSP Algorithms in FPGAs
XtremeDSP Slices Deliver More GMACs
Image Processing Algorithms with System Generator
Simulink Brings Model-Based Design to Embedded Signal Processing
Interfacing Simulink to the Analog World
Build Custom Real-Time Video Applications
Let System Generator Do the Handshaking
Early Access: The Designer's Edge
Implementing the H.264/AVC Video Coding Standard on FPGAs. Xilinx Virtex FPGAs provide excellent co-, pre-, and postprocessing hardware acceleration solutions
144. he SMT148 has at its heart a powerful embedded system controller that leverages the Xilinx Virtex-II Pro FPGA with its embedded PowerPC 405 processor (Figure 2). As an embedded system controller, the role of the Virtex-II Pro device is to manage the reconfiguration of the add-on modules, especially when downloading. Reconfiguration means switching between modes or updating a hardware/software component. In the global functioning of the SMT148, you can download many kinds of software (high-level applications, protocol stacks, low-level signal processing algorithms) and employ several methods to download software. [Figure 2 block labels: UART, USB, and FireWire interface; power management; 100 MHz crystal; ADC circuitry, 8 channels, 14-bit, 400 kSps; DAC circuitry, 8 channels, 16-bit, 200 kSps; ch0 to ch7.] EMBEDDED SYSTEMS. The eight RocketIO transceivers on the Virtex-II Pro device enable high-speed data transfer to additional SMT148 or Virtex-II Pro add-on modules. Downloads can leverage a powerful I/O architecture that includes the popular FireWire and USB interfaces, LVDS interfaces, and JTAG for debugging and downloads. Data flows into the FPGA and is managed by a Sundance program written for the embedded PowerPC before processing. Figure 3 is the C code comprising the data flow and RocketIO PowerPC program. Scalable Reconfigurable Embedded Processors: Scalability is addressed through four add
145. hina, visit www.xilinx.com/events. Japan: Nov. 18-19, Programmable World 2004, Yokohama, Japan; Nov. 29, Programmable World 2004, Osaka, Japan. FREE Online Seminar: Verification of Your Embedded FPGA Design, Seamless FPGA for Xilinx Virtex-II Pro. During this session you will learn how to: • Leverage Platform FPGAs for embedded systems • Utilize the tightly integrated solution of Seamless with Xilinx Platform Studio (XPS) • Easily debug complex hardware/software interactions • Measure software and hardware performance of the FPGA system. Learn more today: http://www.mentor.com/fv/events/seminars/xilinxonline. For additional details about Seamless FPGA, visit us at www.seamlessfpga.com. Implementing DSP Algorithms in FPGAs. A detailed overview of Xilinx System Generator for DSP using a QAM system design example. by Sabine Lam, DSP Technical Marketing Engineer, Xilinx, Inc., sabine.lam@xilinx.com. Today the role of FPGAs in the DSP market is well understood: there simply is no better way to create ultra-fast DSP applications. Yet combining these two technologies can be a challenge. DSP designers primarily use The MathWorks MATLAB or C/C++ to specify systems, whereas FPGA designers use VHDL or Verilog. The only common approach between these two is that they often
146. hronous step-down range from 3.3V up to 36V. Its output voltage can be programmed down to 0.6V. Figure 7 shows the schematic of a dual-output 2.5V/15A and 1.8V/15A converter. Figure 7: High Efficiency 2.5V/15A and 1.8V/15A Dual Output Converter with Output Tracking. ADVERTISEMENT. FPGA Power Solutions. As shown in Figure 8, up to 95% efficiency can be achieved. The LTC3708 has output voltage up/down tracking capability. It allows both coincident and ratiometric tracking, as shown in Figures 9 and 10. [Figure 8: Efficiency vs. Load Current for the LTC3708 Converter; axes: efficiency vs. load current (A); curves labeled by input and output voltage, including 20VIN to 2.5VOUT and a 1.8VOUT curve.] The ramp rate can be selected by a soft-start capacitor from the RUN/SS pin to ground. Multiple LTC3708s can easily be daisy-chained in applications requiring more than two voltages to be tracked. The 2-phase operation of the LTC3708 reduces power loss and noise and lowers the input filtering requirement. FREE Development Tool Offering (for qualified individuals): To help facilitate and expedite your FPGA power supply design, Linear Technology offers development boards. For additional information and to see if you qualify for this free offer, visit www.nuhorizons.com/linear. The constant on-time valley current mo
147. hrough both advanced architecture and silicon fabrication technologies. Applications for our high-performance DSP capabilities are growing. Broadcasting or video conferencing for high-definition television, for example, is rapidly being converted to the H.264 format. This standard requires a lot of processing power, as the target is to have the quality of MPEG-2 at half the bit rate. Sophisticated motion compensation schemes are being used to achieve this goal. Standard video processors can perform this function at smaller screen sizes, up to common intermediate format (CIF) resolutions, but going beyond this to standard definition (SD) or high definition (HD) requires the performance of a Xilinx FPGA to perform some of the more math-intensive functions, such as motion estimation, in conjunction with a programmable video processor. Our DSP capability makes Xilinx the technology of choice for these new, demanding applications. For years, the only other solution for these very high-performance DSP applications was custom devices (ASICs). Yet ASICs take far longer to design, cost much more to develop, cannot easily be modified to meet changing requirements, and are risky because of their complexity. Xilinx programmable devices and development tools provide a far better solution with less overall cost. Today the high-performance FPGA-based DSP market alone is worth more than $200 million, and we have o
148. idges this size gap. It can be used for a limitless number of things, from FIFOs for processing engines to loadable tables for data conversions. The flexible port widths of block RAM allow you to use them individually or in efficient combinations. The dual-port capability makes them easy to use for transferring data between clock domains or sharing data. • While very powerful and convenient, the PLDs and Spartan-3 FPGAs are also very inexpensive. When combined with the development advantages, the low device price makes Xilinx devices unbeatable when developing high-performance embedded systems. PicoBlaze Processors: Device hardware capabilities are essential for any design, but development tools and tricks are also very important. The favorite toy in our Xilinx bag of tricks is the PicoBlaze processor. We could not have completed the project in the time allowed without extensive use of the PicoBlaze processor. The sign contains an impressive count of more than 1,000 of these embedded processors, with nine different designs. PicoBlaze processors provide efficient logic resource utilization by time-multiplexing logic circuits. Many functions, especially control functions, do not need to be [Table 1: This sign includes nearly 4,500 Xilinx devices. Quantity and device: 10, XC2V1000 Virtex-II FPGA; 323, XC3S200 Spartan-3 FPGA; 333, Platform Flash PROM; 3,800, XC9572XL 72-macrocell PLD.] Gbps video processing Billion 1
149. ile had problems in the past with execution in Simulink. These problems arose from the fact that System Generator did not appear to allow for multiple sampling rates, for instance a clock and its respective down-sampled version. This problem has since been alleviated with the addition of a clock enable probe System Generator block. This block lets you effectively up- and down-sample a clock rate such that multiple clock rates are allowed within the same model. Figures 6 and 7 illustrate an example of the clock enable probe. DSP Circuitry Synchronization: Synchronous design goes hand in hand with the development of DSP circuitry. Therefore it is important to be able to realize synchronous design in the high-level abstraction that System Generator provides. Note that Xilinx delay blocks are used for delaying enable signals for a duration that matches the computation time. You can use the output of this delay as an effective output enable, as shown in Figure 8. These delays matter because, before enabling the second block in a chain, you want to make sure the first block has completed its computation. Exploring FFT_power.mdl demonstrates that latency requirements increase when the precision of the inputs and output of the multiplier block increases. Thus the delays need to be modified when greater computational effort, and thus greater time requirements in terms of Xilinx block late
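The enable-delay idea described above can be sketched in a small cycle-based model: a computation with a fixed latency sits next to a delay line of the same length carrying the enable signal, so the delayed enable marks exactly the cycles on which valid results emerge. This is a minimal sketch, not System Generator code; the class name, the squaring operation, and the latency of 3 cycles are all illustrative assumptions.

```python
from collections import deque


class DelayLine:
    """Fixed-latency register chain, advanced one step per clock cycle."""

    def __init__(self, latency, fill=0):
        self._regs = deque([fill] * latency, maxlen=latency) if latency else None

    def step(self, value):
        if self._regs is None:          # zero latency: pass straight through
            return value
        oldest = self._regs[0]
        self._regs.append(value)        # maxlen drops the oldest entry
        return oldest


def run(samples, data_latency, enable_latency=None):
    """Square each sample in a pipelined unit; keep outputs flagged by the enable."""
    if enable_latency is None:
        enable_latency = data_latency   # matched delays, as the text recommends
    data, enable = DelayLine(data_latency), DelayLine(enable_latency)
    stream = [(x, 1) for x in samples] + [(0, 0)] * data_latency  # flush cycles
    captured = []
    for x, en in stream:
        result, valid = data.step(x * x), enable.step(en)
        if valid:
            captured.append(result)
    return captured


if __name__ == "__main__":
    print(run([2, 3, 4], data_latency=3))                    # matched delay
    print(run([2, 3, 4], data_latency=3, enable_latency=2))  # enable too early
```

With matched latencies the captured values are the intended results; when the enable arrives one cycle early, the downstream block captures stale data, which is why the delay has to grow whenever the block latency grows.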
150. ility of software and hardware partitioning of the H.264/AVC coding standard implementation on a platform that consists of a mixture of FPGAs, programmable DSPs, or general-purpose host processors, we need to look at a number of architectural issues that influence the overall design decision. • Data locality: In a synchronous design, the ability to access memory in a particular order and granularity while minimizing the number of clock cycles due to latency, bus contention, alignment, DMA transfer rate, and the types of memory used (such as ZBT memory, SDRAM, and SRAM) is very important. The data locality issue is primarily dictated by the physical interfaces between the data unit and the arithmetic unit (or the processing engine). • Data parallelism: Most signal processing algorithms operate on data that is highly parallelizable, such as FIR filtering. Single instruction multiple data (SIMD) and vector processors are particularly efficient for data that can be parallelized or made into a vector format (or long data width). FPGA fabric exploits this by providing a large amount of block RAM to support numerous very high aggregate bandwidth requirements. In the new Xilinx Virtex-4 SX device family, the amount of block RAM matches closely with the number of XtremeDSP slices: SX25, 128 block RAM and 128 DSP slices; SX35, 192 block RAM and 192 DSP slices; SX55, 320 block RAM and 512 DSP s
151. ill then have to be synthesized and placed and routed, or you can decide to go straight to a bitstream or even target a hardware demonstration board. Selecting hardware co-simulation makes it possible to incorporate a design running in an FPGA directly into a Simulink simulation. Hardware co-simulation compilation targets automatically create bitstreams and associate them with blocks. When the design is simulated in Simulink, results for the compiled portion are calculated in hardware. This allows the compiled portion to be tested in actual hardware and can speed up simulation dramatically. In addition to supporting specialized interfaces such as the XtremeDSP kit, System Generator provides a generic interface that uses JTAG and the Parallel Cable IV to communicate with FPGA hardware. This takes advantage of the ubiquity of JTAG to extend System Generator's hardware-in-the-simulation-loop capability to numerous other FPGA platforms. System Generator also allows you to run hardware co-simulation on your own development board. You can make design decisions and changes earlier in the design process and accelerate the design cycle directly from the Simulink environment. Hardware Acceleration: The complexity and size of the QAM system resulted in lengthy simulation times. To accelerate the simulation and confirm the functionality of this system, the transmitter and the receiver were downloaded to two
152. in this implementation of the OBS NIC. The first channel on the OBS NIC connects to an off-the-shelf Gigabit Ethernet card plugged into the host. This channel carries data and host messages between the OBS NIC and the host. The second channel is for signaling and connects to the OBS network controller; it carries the JIT OBS signaling messages. The third channel is used as the data channel and is connected to the optical front-end card. The optical front-end card consists of an optical tunable transmitter and receiver. The OBS NIC generates the tuning commands for the laser and optical receivers on the optical front-end card. Figure 2 illustrates the architecture of the OBS NIC. The Virtex-II Pro FPGA on the OBS NIC uses a PCS/PMA core and a MAC layer to connect the external gigabit channels to the JIT engine. The JIT engine implements the JIT OBS protocol in the OBS NIC. Functionalities for both the source and destination state machines of the JIT OBS client are implemented in the JIT engine. The JIT engine processes three kinds of messages: messages from the host, signaling messages from the network, and internally generated timing messages. The JIT engine uses two functional state machines (FSMs): the scheduling FSM, using a round-robin scheme, picks up a message from one of the three message queues (for different types of messages) and dispatches them for further processing, while
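The round-robin scheduling described above can be sketched in software as follows. This is only an illustration of the scheme, not the JIT engine's actual logic: the queue labels, class name, and message strings are hypothetical, and the real scheduler is a hardware state machine.

```python
from collections import deque

# Hypothetical labels for the three message sources named in the text.
QUEUE_NAMES = ("host", "signaling", "timing")


class RoundRobinScheduler:
    """Pick one message at a time, rotating fairly over the three queues."""

    def __init__(self):
        self.queues = {name: deque() for name in QUEUE_NAMES}
        self._next = 0  # index of the queue to inspect first on the next call

    def enqueue(self, kind, message):
        self.queues[kind].append(message)

    def dispatch(self):
        """Return (kind, message) from the next non-empty queue, or None."""
        for offset in range(len(QUEUE_NAMES)):
            index = (self._next + offset) % len(QUEUE_NAMES)
            name = QUEUE_NAMES[index]
            if self.queues[name]:
                # Start the next search just after the queue served now.
                self._next = (index + 1) % len(QUEUE_NAMES)
                return name, self.queues[name].popleft()
        return None


if __name__ == "__main__":
    sched = RoundRobinScheduler()
    sched.enqueue("host", "h1")
    sched.enqueue("host", "h2")
    sched.enqueue("signaling", "s1")
    sched.enqueue("timing", "t1")
    while (item := sched.dispatch()) is not None:
        print(item)
```

Skipping empty queues keeps a busy source from being starved by idle ones, while the rotating start index prevents any one queue from monopolizing the dispatcher.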
153. ing. The 16-bit data is then written in a programmable latency buffer to match the trigger latency. A number of data verification checks are carried out. The data is finally formatted into 64-bit words and written into FIFOs. From there it is read out by the event builder on the board. Without going into the details of the functionality, we will focus on the various issues we faced and solved in making the real hardware churn out correct data, with a focus on the use of RocketIO transceivers. Much of what we learned was on a trial-and-error basis. The main issue was related to the reference clock, which we'll describe in detail in the next section. The other significant issue that we faced was the alignment of the K character within the 2-byte data path of the received data. We were initially using the Gigabit_Ethernet primitive in half-rate mode for a 2-byte data path. But we observed that not all of the channels were putting the K character in the same place within the 2-byte word, and there was no way to force this alignment in the Gigabit_Ethernet primitive (the ALIGN_COMMA_MSB parameter of this primitive is set to FALSE by default). Because our protocol expected the K to always appear on the LSB of the word, we switched to the GT_CUSTOM primitive, where we could force the alignment and subsequently swap the position of K to the LSB of the data. The simulations showed perfect alignment, but in real har
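The byte swap mentioned above can be illustrated with a short sketch. It assumes the transceiver has been configured so that the K character always lands in the upper byte of the 16-bit word, and that a 2-bit flag accompanies each word with bit 1 marking the upper byte as a K character and bit 0 the lower byte. The function name, the flag layout, and the use of K28.5 (0xBC) as the comma are assumptions for illustration; the article does not give these details.

```python
K28_5 = 0xBC  # a comma character commonly used on 8b/10b links (assumed here)


def swap_k_to_lsb(word, charisk):
    """Swap the two bytes of a 16-bit word and its per-byte K flags.

    With the comma forced into the upper byte, swapping moves the K
    character to the lower byte, where the protocol expects it.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must be 16 bits")
    if not 0 <= charisk <= 0b11:
        raise ValueError("charisk must be 2 bits")
    swapped_word = ((word & 0xFF) << 8) | (word >> 8)
    swapped_flags = ((charisk & 0b01) << 1) | (charisk >> 1)
    return swapped_word, swapped_flags


if __name__ == "__main__":
    word, flags = swap_k_to_lsb((K28_5 << 8) | 0x50, 0b10)
    print(hex(word), bin(flags))
```

In hardware this is a pure rewiring of the byte lanes and costs no logic; the point of forcing the alignment first is that the swap can then be unconditional.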
154. ion and late detection that are at the root of these problems. [Figure 1: Typical patterns of design defects using conventional flows, which lead to rising costs and late product deliveries. Curves: defects introduced and defects detected; design stages: Spec, Design, Implement, Test.] Tacking new verification techniques or language extensions onto traditional design tools and flows is not enough to effectively improve the development process. These incremental improvements do not eliminate the aspects of traditional flows (ambiguous text-based specifications, manual implementation, and after-the-fact testing) that produce expensive errors and jeopardize delivery timelines. In contrast, The MathWorks has demonstrated that Model-Based Design with Simulink produces dramatic reductions in development time, cost, and risk. These benefits have been documented in the aerospace, automotive, communications, and semiconductor industries, wherever the application requires real-time signal processing, communications, and control logic. Reinventing Development: The use of FPGAs for high-performance DSP is a natural application for Model-Based Design. Getting the most out of FPGA hardware requires insight into algorithmic and architectural complexities at the same time. To do this, architects need tools that offer direct
155. ired. Typical applications are door control (window lift, lock, and mirror control), seats, climate regulation, lighting, and rain sensors. In these units, the cost-sensitive nature of LIN enables the introduction of mechatronic elements such as smart sensors, actuators, or illumination. They can be easily connected to the car network and become accessible to all types of diagnostics and services. Outside the automotive sector, LIN is used for machine control as a sub-bus for CAN. [Figure 1: Relative cost per node of automotive networks. Labels: SMARTwireX, copper twisted pair, D2B, MOST, token ring, optical bus, ByteFlight, LIN, time-triggered, J1850, master-slave, single wire, no crystal, Bluetooth, wireless bus.] tion. This communication concept enables the exchange of data in various ways: from the master node (using its slave task) to one or more slave nodes, and from one slave node to the master node and/or other slave nodes. It is possible to communicate signals directly from slave to slave without the need for routing through the master node, or [...] have developed robust and fully verified IP cores aimed at FPGA and CPLD architectures. One example is their LIN core, which occupies a fraction of a low-cost FPGA (for example, 13% of a 200,000 system gate device), thus leaving space for additional LIN nodes, CAN nodes, UARTs, soft-core processors, or simply glue logic. Programmable logic has long been accepted as an effecti
156. ith Xilinx FPGAs and Texas Instruments (TI) DSPs. The Spartan-3-based design kit is optimized for simple video applications, while the Spartan-IIE-based kit is targeted at audio applications. The Virtex-II Pro kit features an adaptor card that interfaces to TI DSPs and is meant for co-processing applications where the FPGA is offloading significant processing and control functions from the digital signal processor. Each of these design kits also includes a variety of software tools from Xilinx, The MathWorks, and TI. Let's describe these kit components in more detail. Figure 1: DSP Co-Processing Design Kit hardware platform. DSP Co-Processing Design Kit: The DSP Co-Processing Design Kit features a Virtex-II Pro evaluation board, shown in Figure 1. This board contains a Xilinx XC2VP7 FF896 FPGA, eight SMA connectors for high-speed I/O, on-board DDR SDRAM (64 MB), up to 30 LVDS pairs, user I/O (switches, LEDs), and several expansion connectors. Two of the expansion connectors are compatible with the TI adaptor daughter card shown in Figure 2 and can connect to TI DSPs. Example designs show how to interface directly with the TI processor
157. ity. Although this example shows an image rotation implementation, these same methodologies can help you develop any image processing algorithm. If you have any questions or suggestions, call me, Daniel Michek, at (858) 431-5901, or send me an e-mail at daniel.michek@xilinx.com. Simulink Brings Model-Based Design to Embedded Signal Processing. The complexity of FPGA-based signal processing systems drives the need for new development approaches. ken.karnofsky@mathworks.com. Many of today's highly integrated embedded hardware and software systems rely on sophisticated signal processing and communications. Dramatic increases in silicon and algorithmic complexity in these systems have triggered a corresponding rise in design and verification costs. Several studies have noted the impact of the complexity challenge. A Collett International Research study reported by Jack Horgan stated that only 39% of IC designs were bug-free at first silicon in 2002. Embedded Market Forecasters found that more than 50% of embedded projects are behind schedule, and one third failed to achieve 50% of performance and functional expectations. Figure 1 shows the typical patterns of early defect introduct
158. ix-tap FIR filter (1, -5, 20, 20, -5, 1)/32 horizontally and vertically. Prediction values at quarter-pixel positions are generated by averaging samples at the full- and half-pixel positions. These sub-sampling interpolation operations can be efficiently implemented in hardware inside the FPGA fabric. • Variable block-sized motion compensation with small block size: The standard provides more flexibility for the tiling structure in a macroblock size of 16 x 16 pixels. It allows the use of 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, and 4 x 4 sub-macroblock sizes. Because of the increasing combinations of tiling geometry within a given 16 x 16 macroblock, finding a rate-distortion optimal tiling solution is extremely computationally intensive. This additional feature places an enormous burden on the computational engines used in the motion estimation, refinement, and mode decision process. • In-the-loop adaptive deblocking filtering: The deblocking filter has been successfully applied in H.263+ and MPEG-4 part 2 implementations as a post-processing filter. In H.264/AVC, the deblocking filter is moved inside the motion-compensated loop to filter block edges resulting from the prediction and residual difference coding stages of the decoding process. The filtering is applied on both 4 x 4 block and 16 x 16 macroblock boundaries, in which two pixels on either side of the boundary may be updated usin
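The half- and quarter-pixel interpolation described above can be sketched for a single row of luma samples. The tap weights come from the text; the rounding offsets (adding 16 before the divide by 32, and rounding the quarter-pixel average up) and the clipping to the 8-bit range follow the usual H.264 luma interpolation as recalled here, and should be treated as assumptions rather than a quotation of the standard. The function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # six-tap filter; the weights sum to 32


def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(samples):
    """Half-pixel value between samples[2] and samples[3] of six neighbors."""
    if len(samples) != 6:
        raise ValueError("six neighboring full-pixel samples are required")
    acc = sum(t * s for t, s in zip(TAPS, samples))
    return clip8((acc + 16) >> 5)   # +16 rounds, >>5 divides by 32


def quarter_pel(a, b):
    """Quarter-pixel value: rounded average of two neighboring positions."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [90, 100, 110, 120, 130, 140]
    h = half_pel(row)
    print("half-pel:", h, "quarter-pel:", quarter_pel(row[2], h))
```

Because every tap weight is a small integer, the filter reduces to shifts and adds, which is one reason this stage maps well onto FPGA fabric.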
159. lices. • Signal processing algorithm parallelism: In a typical programmable DSP or a general-purpose processor, signal processing algorithm parallelism is often referred to as instruction-level parallelism (ILP). A very long instruction word (VLIW) processor is an example of such a machine that exploits ILP by grouping multiple instructions (ADD, MULT, and BRA) to be executed in a single cycle. A heavily pipelined execution unit in the processor is also an excellent example of hardware that exploits the parallelism. Modern programmable DSPs have adopted this architecture, including the Texas Instruments TMS320C64x. However, not all algorithms can exploit such parallelism. Recursive algorithms like IIR filtering, variable length coding (VLC) in MPEG-1/2/4, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC) in H.264/AVC are particularly sub-optimal and inefficient when mapped to these programmable DSPs. This is because data recursion prevents ILP from being used effectively. Instead, dedicated hardware engines can be built efficiently in the FPGA fabric. • Computational complexity: Programmable DSP is bounded in computational complexity, as measured by the clock rate of the processor. Signal processing algorithms implemented in the FPGA fabric are typically computationally intensive. Some examples of these are the sum of
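The contrast between parallelizable and recursive algorithms can be made concrete with a small sketch. Each FIR output depends only on input samples, so the outputs can be computed in any order (or all at once); each output of a recursive filter depends on the previous output, which forces sequential evaluation. The function names and the first-order recursion are illustrative choices, not taken from the article.

```python
def fir_sample(x, taps, n):
    """One FIR output: depends only on inputs, never on other outputs."""
    return sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)


def fir(x, taps, order=None):
    """Compute all FIR outputs, visiting the output indices in any order."""
    order = range(len(x)) if order is None else order
    y = [None] * len(x)
    for n in order:
        y[n] = fir_sample(x, taps, n)
    return y


def iir_first_order(x, a):
    """y[n] = x[n] + a * y[n-1]: each output needs the one before it."""
    y, previous = [], 0
    for sample in x:
        previous = sample + a * previous
        y.append(previous)
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(fir(x, [1, 1]))                      # natural order
    print(fir(x, [1, 1], order=[3, 1, 0, 2]))  # scrambled order, same result
    print(iir_first_order([1, 1, 1], 2))       # must run in sequence
```

The scrambled-order FIR run giving the same answer is the software analogue of issuing the operations in parallel; no such reordering is possible for the recursive loop without restructuring the algorithm.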
160. linx XC2VP7 circuits on a data concentrator card greatly reduced costs and PCB design time and increased board reliability. by Jose C. Da Silva, Design Engineer, LIP (Laboratorio Instrumentacao e Particulas), Lisbon, jc.silva@cern.ch; Adarsh Jain, Design Engineer, LIP (Laboratorio Instrumentacao e Particulas), Lisbon, adarsh.jain@cern.ch. Implementing 70 high-speed differential pairs on a 9U PCB using regular off-the-shelf deserializers can be a nightmare: high-speed PCB design, noise, clock jitter, and signal integrity are the main challenges. Even the smallest deserializer packages would occupy roughly two-thirds of a 9U board, on which you would still need space for the logic, configuration memories, access interfaces, and local control. Our design concerns a data concentrator card (DCC), part of a large high-energy physics experiment at the European Organization for Nuclear Research (CERN) in Geneva. A very large particle accelerator called the Large Hadron Collider (LHC) is being constructed near the Franco-Swiss border west of Geneva. A number of experiments will be conducted to observe and measure the various properties of several existing and possibly new fundamental particles.
161. Figure 1: Outline of an MLD-enabled system design. Installing MLD files in XPS: The installation CD of Nucleus PLUS installs the RTOS, associated drivers, and the two Nucleus-configured MLD files that enable you to use MLD technology within the Xilinx Platform Studio (EDK). The default install path for the MLD files is the nucleus bsp subdirectory located in edk_user_repository. Figure 2: The hardware design is complete and ready to configure the software. Accelerated Technology supplies these files and the associated installer as an evaluation disk included in the latest release of Xilinx EDK 6.3i. Accelerated Technology has also established a website to support and distribute this evaluation. This site contains updates, evaluations, reference designs, and documentation for all of the Accelerated Technology Xilinx offerings and will be updated regularly with new middleware implementations that you can add to the automatic configuration of your application. The website is located at www.acceleratedtechnology.com/xilinx. To get
162. log netlists. • The Xilinx DSP library now supports Virtex-4 FPGAs, allowing you to develop designs faster. • A range of services are now available as you implement your DSP design onto Virtex-4 FPGAs. These include DSP design services, education classes, and platinum technical support. Conclusion: FPGA-based DSP has always been associated with high performance, when hundreds of GMACs/s rates are needed. Virtex-4 FPGAs bring a new revolutionary era in the XtremeDSP initiative that provides you with economic incentives to use FPGAs and get your design to market faster than ever before. To understand how to use the new XtremeDSP slices in your next design, attend the Virtex-4 session in the DSP track at Programmable World 2004, or watch the demo on demand that will follow the event at www.xilinx.com/dsp. Developing Image Processing Algorithms with System Generator. by Daniel E. Michek, Staff Field Applications Engineer, Xilinx, Inc., daniel.michek@xilinx.com. Converting image processing algorithms to FPGA implementations can be tedious. The algorithm may be proven in software, but with no direct link to actual implementation. Additionally, it can be difficult to subjectively verify the implementation. Using a mathematical simulator to verify and create HDL implementation files bridges the gap from the algorithm architect to the FPGA engineer. Xilinx System Generator for
163. ltage requirement for most of the FPGA device families. Typical I/O voltages (VCCO) vary from 1.2V to 3.3V. For Virtex-II Pro and Spartan-3, the auxiliary voltage (VCCAUX) is 2.5V. It is 3.3V for Virtex-II. Each FPGA family has a specific quiescent supply current, ranging from under 100mA to about 2A. For applications with multiple FPGAs, the core supply current can be higher than 10A. With multiple voltage rails in today's systems (FPGA, DDR memory, data converter ICs, etc.), supply sequencing and tracking are quite important for proper start-up and shutdown. The ramp time requirement should also be satisfied. For example, the recommended ramp time (TCCPO) for the core voltage (VCCINT) is less than 50ms during power-on. Some Xilinx FPGA families also have minimum VCCINT ramp time requirements. New dual-output DC/DC regulators from Linear Technology, the LTC3407, LTC3736, and LTC3708, greatly simplify the design of an optimal power supply solution for systems using Xilinx FPGAs. LTC3407 Dual Synchronous 600mA DC/DC Regulator: The LTC3407 is a dual synchronous step-down DC/DC converter with integrated power switches. It provides a compact and high-efficiency power solution for FPGAs with supply currents up to 600mA. The switching regulator operates from a 2.5V to 5.5V input voltage range and has an adjustable output range from 0.6V to 5V. Its internal 1A switches provide up to 96
164. m. This type of infrastructure would have taken significantly longer to implement using traditional methods. Forthcoming developments in future releases of DIMEtalk will include additional interface support and links directly into algorithm development tools, making application development even easier. For further information about DIMEtalk, visit www.nallatech.com/dimetalk. Figure 6: Screenshot of DIMEtalk network portion for example system. Designing Once [...] ASIC Prototypes. Design Compiler FPGA offers an industry-standard, ASIC-strength solution and optimal circuit timing results through a common ASIC and FPGA flow. by Mark Patton, Product Manager, FPGA Synthesis, Synopsys, Inc., mpatton@synopsys.com. Today's ASIC designers face a host of prototyping challenges. Most ASIC prototypes require the largest, most advanced FPGAs available, such as Xilinx Virtex-4 devices. Many are required to run at full speed, particularly for wireless designs. Therefore, timing quality of results (QoR) is critical. Plus, using incompatible synthesis solutions involves a time-consuming and error-prone manual effort to move designs between the ASIC and the prototype. To address these challenges
165. mated C-to-RTL compilation quickly generates hardware representing test producer or consumer functions. These functions interact with the unit under test using FIFO or other interfaces to implement data streams and supply other test inputs. The CoDeveloper C-to-RTL compiler analyzes C processes (individual functions that communicate via streams, signals, and shared memories) and generates synthesizable HDL compatible with Xilinx Platform Studio (EDK), Xilinx ISE, and third-party synthesis tools including Synplicity (Figure 3). The generated RTL is automatically parallelized at the level of inner code loops to reduce process latencies and increase data rates for output data streams. Automated compilation capability, with the ability to express system-level parallelism (creating multiple pipelined processes, for example), makes it possible to generate hardware directly from C language that runs orders of magnitude faster than the equivalent algorithm as implemented in software on the embedded microprocessor. This creates hardware test generators that generate outputs at a high rate. Does C-Based Testing Eliminate the Need for HDL Simulators? C-based test methods such as those described in this article are a useful addition to a designer's bag of tricks, but they are certainly not replacements for a comprehensive hardware simulation. HDL simulation can be an eff
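The producer, unit-under-test, and consumer arrangement described above can be sketched as three stages connected by bounded FIFOs. This is a software-only illustration of the test structure, written in Python rather than in the C dialect the article discusses; the stimulus pattern, the stand-in unit under test, and the reference model are all hypothetical.

```python
from collections import deque


def producer(count):
    """Test producer: generate a deterministic stream of 8-bit stimuli."""
    for i in range(count):
        yield (i * 37) & 0xFF


def unit_under_test(x):
    """Stand-in for the hardware module being tested (hypothetical)."""
    return (3 * x + 1) & 0xFF


def reference_model(x):
    """Independent software model the consumer checks results against."""
    return (x + x + x + 1) % 256


def run_test(count, unit=unit_under_test, fifo_depth=4):
    """Stream stimuli through `unit` via bounded FIFOs; return mismatches."""
    in_fifo, out_fifo, pending = deque(), deque(), deque()
    source, exhausted, mismatches = producer(count), False, []
    while not exhausted or in_fifo or out_fifo:
        # Producer side: fill the input FIFO while there is room.
        while not exhausted and len(in_fifo) < fifo_depth:
            try:
                value = next(source)
            except StopIteration:
                exhausted = True
                break
            in_fifo.append(value)
            pending.append(value)       # remember stimulus order for checking
        # Unit under test: consume one input, produce one output.
        if in_fifo and len(out_fifo) < fifo_depth:
            out_fifo.append(unit(in_fifo.popleft()))
        # Consumer side: compare one result against the reference model.
        if out_fifo:
            stimulus, result = pending.popleft(), out_fifo.popleft()
            if result != reference_model(stimulus):
                mismatches.append((stimulus, result))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", run_test(20))
```

Passing a deliberately broken `unit` (for example, one that omits the `+ 1`) makes every comparison fail, which is a quick way to confirm that the checker itself is doing its job.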
166. memory to minimize power dissipation and deliver increased performance, while its wealth of middleware makes it ideal for products targeted at the networking, telecommunications, data communication, and consumer markets. Making this solution easy to configure within Xilinx EDK allows you to easily exploit the benefits of this powerful product. For more information, visit www.acceleratedtechnology.com or www.mentor.com. MicroBlaze and PowerPC Cores as Hardware Test Generators. Combining FPGA embedded processors with C-to-RTL compilation can accelerate the testing of complex hardware modules. by David Pellerin, CTO, Impulse Accelerated Technologies, david.pellerin@impulsec.com; Milan Saini, Technical Marketing Manager, Xilinx, Inc., milan.saini@xilinx.com. Regardless of whether you are using a processor core in your FPGA design, using a Xilinx MicroBlaze or IBM PowerPC embedded processor can accelerate unit testing and debugging of many types of FPGA-based application components. C code running on an embedded processor can act as an in-system software/hardware test bench, providing test inputs to the FPGA, validating the results, and obtaining performance numbers. In this role, the embedded processor acts as a vehicle for in-system FPGA verification and as a complement to hardware simulation. By extending this approach to include not only C compil
167. mplemented using Virtex-II Pro FPGAs. Although not the ultimate goal, this reduction in power goes some way in addressing the power concerns of infrastructure equipment providers. Another way to reduce system power consumption in such applications is to use the embedded processor capabilities available on the FX platform. You have the option to trade gates for processor cycles for sequential control tasks using FX platform devices. Examples of such implementations include software communication architectures or real-time operating systems. High Compute Density Using SRL16s: Shift Register Logic (SRL16) is a unique feature in Xilinx FPGAs. A popular feature for increasing compute density in multi-channel implementations, SRL16s are included in all Virtex-4 platforms. To demonstrate SRL16 usage, let's take a look at a simple Reed-Solomon encoder example. Implementing a single-channel Reed-Solomon encoder in a Virtex-4 device consumes 56 logic slices. For a 16-channel implementation, one approach would be to replicate this 16 times, resulting in a consumption of 16 x 56 slices. Figure 3 shows another implementation of the 16-channel solution using SRL16s. [Figure 3 labels: GF multiplier, one SRL16, GF adder.] This consumes only 86 logic slices, representing only 10% of the 16X replicated version. SRL16s can substantially pack more signal processing into a smaller area, allowing you to potentially target a much smaller device than i
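The SRL16 approach can be illustrated with a toy model: instead of 16 copies of a circuit, each with its own state register, one copy of the arithmetic is shared and the state register is replaced by a 16-deep shift register, with the channels' samples interleaved one per clock. The sketch below uses a single Galois-field multiply-and-add stage as the shared logic; this is a simplified stand-in for a Reed-Solomon encoder stage, and the field polynomial (0x11D), the constant, and all names are assumptions for illustration.

```python
from collections import deque

CHANNELS = 16
ALPHA = 0x02  # hypothetical constant multiplier in GF(2^8)


def gf_mul(a, b):
    """Multiply in GF(2^8) with the polynomial x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def run_replicated(samples):
    """16 independent units, one state register each (the 16x56 approach)."""
    outputs = []
    for channel in samples:
        state, trace = 0, []
        for x in channel:
            state = gf_mul(state, ALPHA) ^ x
            trace.append(state)
        outputs.append(trace)
    return outputs


def run_time_multiplexed(samples):
    """One shared unit; a 16-deep shift register holds the per-channel state."""
    srl = deque([0] * CHANNELS)
    outputs = [[] for _ in range(CHANNELS)]
    for t in range(len(samples[0])):
        for c in range(CHANNELS):           # one channel per clock cycle
            state = gf_mul(srl.popleft(), ALPHA) ^ samples[c][t]
            srl.append(state)               # comes back around 16 clocks later
            outputs[c].append(state)
    return outputs


if __name__ == "__main__":
    data = [[(7 * c + 3 * t + 1) & 0xFF for t in range(5)] for c in range(CHANNELS)]
    print(run_replicated(data) == run_time_multiplexed(data))
    print(f"{86 / (16 * 56):.1%} of the replicated slice count")
```

The arithmetic from the text checks out the same way: 16 x 56 = 896 slices for the replicated version, and 86 / 896 is about 9.6%, which the article rounds to 10%. The trade-off is that the shared unit must run 16 times faster than each channel's sample rate.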
168. mulink library, shown in Figure 2. It offers the following features:
• Drag and drop P160 analog components from the Simulink library browser
• Supports HDL co-simulation from Simulink
• Supports common compilation types in Xilinx System Generator for DSP 6.3. The hardware co-simulation type generates board-specific I/O ports to FPGA pins connected to the P160 Analog Module. The HDL netlist type generates top-level FPGA I/O pins connected to the P160 Analog Module
• Automatically detects target part/package from the System Generator token
• Supports all Memec FPGA development boards with P160 expansion connector
• Installer for automatic Simulink library creation

Interfacing to External Analog Signals

Let's describe three design techniques using various features of the P160 Analog Module to interface to analog signals during development of a DSP design in Simulink.

Memec P160 Analog DAC in HDL Co-Simulation

Our first design technique uses HDL co-simulation, a Xilinx System Generator feature that lets you incorporate your HDL code into Simulink through a black box. Simulink doesn't interpret HDL directly; rather, it invokes the Mentor Graphics ModelSim HDL simulator, with which it exchanges I/O data during simulation. When the design is compiled to hardware, the HDL code is included for synthesis. This technique
169. n your design specialty. From course description to enrollment, fewer clicks get you where you need to go faster (see Figure 1).

"Our usability studies showed us where we could make a significant impact on our customers' training experience," said Rohan Thompson, web redesign project manager. "The subsequent overhaul to the Education Services website reflects what our customers told us they wanted."

Easy Payment Methods

The training catalog now offers more payment methods to provide flexibility and quicken the enrollment process. Simply use your credit card when registering for courses in the training catalog, or apply your company's training credits or purchase order number toward training services. Be sure to check with your company representative to find out whether you have low-cost payment options through a Xilinx Productivity Advantage (XPA) agreement. The XPA offers all Xilinx software, education, support services, and IP cores in one package customized to meet your needs. Quite a number of training modules in the training catalog require no payment at all.

New Services

Education Services currently offers more than 14 instructor-led courses, five live online courses that together include 17 hands-on labs, and 14 recorded e-Learning modules to help you maximize your productivity and get your designs to market faster. Register for any of these learning opportunities in the training cat
170. nct remote sources. The PCI-PCI bridge allows you to install VW in either 3.3V PCI or 5V PCI systems (the PPC is not 5V I/O tolerant).

Peripherals

Inputs to the VW FPGA include:
• A stereo audio digitizer
• A video digitizer/decoder

The video decoder accepts standard-definition NTSC and PAL format analog video from one of four composite sources or one of two S-video sources.

Outputs from the VW FPGA include:
• Two SVGA DACs capable of driving independent displays
• An audio DAC producing standard line-level stereo audio output

The two DRAM banks attached to the FPGA are independent; each is capable of 1.6 Gbps peak bandwidth. One is associated with the graphics engine in the FPGA; the other is typically used by video processing functions.

Digital I/O connectors 1 and 2 each support 22 bi-directional LVTTL signals. You can use the digital I/O to connect another board directly to the FPGA or to connect two VW boards together. Digital I/O connector 3 has 16 LVTTL pins and can be used for a video interface port or as a convenient place to bring out debugging test points.

[Figure 4: Section of FPGA internal structure, including I-frame decoder, as well as a few auxiliary connections. Labels: Video ADC/Decoder, Video DRAM, 10-bit CCIR 656, CODEC A (8-bit CCIR 656), CODEC B (8-bit CCIR 656), I-Frame Decoder, 8-bit CCIR 656 to external TV out]

Software

The PPC runs Mon
171. ncy result from overall design changes.

You could add greater flexibility to the Xilinx blockset with the addition of extra input parameters to some of the blocks in the set. For instance, the count-limited counter does not offer a "count to" value as a possible input parameter. Therefore, dynamic configuration of the counter threshold is unrealizable.

Output enables allow one stage to signify that it is complete, and thus for the next stage to start. Currently, most of the blocks in the blockset do not provide signals telling you when the block has finished its computation. This would be helpful in the handshaking for several Xilinx blocks.

However, you can realize output enable signals and counter threshold input signals by generating a VHDL file using the Xilinx Core Generator tool and then configuring a Xilinx black box. But this requires more time from the designer and greater engineering effort as well.

We have to weigh flexibility (the addition of output enables and other input parameters to offer dynamic configuration of blocks) against maintaining the abstraction desired by DSP designers lacking strong digital design skills. It is therefore important to consider these ideas in future versions of System Generator.

Conclusion

A custom logic design may seem like a daunting task, but with the flexibility offered by Xilinx System Generator, it is quite achievable. Xilinx MCode and black box block configur
172. ndance, Flemming.C@sundance.com

There has been a strong push in the past few years to replace analog radio systems with digital radio systems. The Department of Defense Joint Tactical Radio System program shifted the emphasis on the development of software-defined radio (SDR) to the forefront of research and development efforts in the defense, civilian, and commercial fields.

Although it has existed for many years, SDR technology continues to evolve through newly funded ventures. The holy grail of SDR is its promise to solve incompatible wireless network issues by implementing radio functionalities as software modules running on generic hardware platforms. Future SDR platforms will comprise hardware and software technologies that enable reconfigurable system architectures for wireless networks and user terminals.

Although new SDR-based systems are purported to be highly reconfigurable and reprogrammable, the truth is that SDR hardware platforms are still in their early development stages. Many issues must still be resolved, including reconfigurable signal processing algorithms, hardware and software co-design methodologies, and dynamically reconfigurable hardware. Overall, the key issues for SDR embedded system platforms are flexibility, expandability, scalability, reconfigurability, and reprogrammability.

Figure 1
173. nerator for DSP allow you to simply apply the desired function. The addition of new XtremeDSP slices allows you to implement many such functions within the slice and without the need

• High Performance: 500 MHz, fully pipelined
• High Integration: 40 DSP arithmetic operation modes; directly cascadeable
• Easy to Implement: software configuration wizards

[Figure 2: New XtremeDSP slices feature 18 x 18 bit multiply, 48 bit accumulator. Labels: A (18 bit), B (18 bit), C (48 bit), P (48 bit), Multiplier, Optional Register, Optional Pipeline Register, Routing Logic, BCIN, BCOUT, PCIN, PCOUT]

bodies. Power consumption is also becoming a key concern for some military applications, such as Joint Tactical Radio Systems radios. The integrated XtremeDSP slices on Virtex-4 FPGAs eliminate the need to use logic slices for many signal processing and arithmetic tasks, reducing the need for power-consuming routing resources. Initial power estimates show that XtremeDSP slices consume only 57 µW/MMAC, representing one-seventh the power for an equivalent function i

"Virtex-4 FPGAs bring a new revolutionary era in the XtremeDSP initiative that provides you with economic incentives to use FPGAs and get your design to market faster than ever before."
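To make the "18 x 18 bit multiply, 48 bit accumulator" widths from the Figure 2 caption concrete, here is a minimal behavioral sketch of a multiply-accumulate at those widths. It is not a model of the actual XtremeDSP slice: pipeline registers, cascade paths, and operation modes are left out, and the two's-complement wraparound on overflow is an assumption made for illustration. The function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: 18-bit signed operands, 48-bit result.

    Assumption: the accumulator wraps around on overflow.
    """
    product = to_signed(a, 18) * to_signed(b, 18)
    return to_signed(acc + product, 48)


def dot_product(xs, ys) -> int:
    """Accumulate a sequence of products, as a FIR tap sum would."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))
```

A 48-bit accumulator leaves 12 bits of headroom above a full 36-bit product, so several thousand worst-case products can be summed before wraparound becomes possible.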
174. ng phases. Plus, this is a flexible approach, as the FPGAs are reprogrammable, and a more economical solution in the long term. We are currently considering migrating to a bigger Xilinx device as our processing requirements from the FPGAs increase. Therefore, we are studying the new devices available and how such a migration will affect our PCB design in terms of the routing of the high-speed lines. We believe that by following the design rules concerning high-speed design, like clean clock distribution, power supply filtering, and good routing of the internal reference clocks, it is possible to obtain a successful design in good time.

For more information, please write to us at je.silva@cern.ch or adarsh.jain@cern.ch.

A Cost Effective

PLDs are ideal for implementing LIN buses, offering fast time to market, flexible design options, low cost, and low power consumption.

by Karen Parnell, Automotive Product Marketing Manager, Xilinx, Inc., Karen.parnell@xilinx.com

The automotive industry is constantly striving to reduce costs but at the same time introduce new and innovative comfort and convenience features to meet customer demand. Almost all automotive companies have adopted various busing systems to reduce wiring complexity and weight, and hence overall costs. This also results in increased fuel efficiency. Although fle
175. nior Design Engineer, Xilinx, Inc., hemang.parekh@xilinx.com

The increased capability and capacity of video, audio, data, and interactive services through cable distribution has spurred much interest. Applications such as video on demand and cable telephony are natural extensions of these services.

The ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) has established the J.83 specification to standardize the physical layer transmission of audio, video, and data services over cable networks. These cable transmission networks, as they apply to Europe, North America, and Japan, are detailed in Annex A, B, and C of this standard, respectively.

Xilinx addresses this interest with the J.83 Cable Modulator IP, a flexible, scalable, and cost-effective solution. In this article, we'll discuss the use of Xilinx J.83 cores in the downstream modulator at the head end (Figure 1), while focusing on the physical layer implementation.

The Xilinx J.83 IP solution provides flexibility to parameterize the modulator, scalability to allow you to select any number of channels on a single FPGA, and ease of use in the System Generator for DSP visual programming environment as the design and delivery mechanism. This programming interface allows you to work at a suitable level of abstraction from the target hardware platform and use
176. o solve this inefficiency, Xilinx introduced a radical new architecture that enables us to offer a new generation of Virtex FPGAs, providing the broadest range of capabilities in three unique platforms, with feature mixes optimized to meet the requirements of different application domains. The ASMBL (Advanced Silicon Modular Block) architecture enables Xilinx to scale the capabilities and capacity of Virtex FPGAs independently of one another and rapidly assemble multiple platforms

leakage current inherent in the migration to finer geometry nodes, and is exclusive to Xilinx in the FPGA industry. In addition, dynamic power consumption decreases by 50% because of lower supply voltage and lower capacitance in the 90 nm process. Finally, extensive use of abundant embedded IP provides valuable functionality in circuits optimized to consume as little as one-tenth the power of an equivalent implementation in programmable logic fabric.

Lower System Cost

Xilinx addressed the requirements for lower system cost on three fronts:
• 90 nm, 300 mm process leadership produces the lowest FPGA price. Xilinx manufactures Virtex-4 FPGAs using the same 90 nm, 300 mm processing technology we use to build the world's lowest cost FPGAs, Spartan-3 devices. The combination of finer geometries and larger 12-inch wafers produces approximately five times as

[Figure labels: Virtex-4 LX; Logic, Memory, DSP Blocks, Transceivers, Processors]
177. oard layout and reduce your bill of materials. With the lowest cost per I/O and lowest cost per logic cell, Spartan-3 Platform FPGAs are the perfect fit for any design and any budget. For more information, visit www.xilinx.com/spartan3. Pb-free devices available now.

LETTER FROM THE EDITOR

And the Number, Please

What does the number 6,759,852 represent? Well, I guess it could represent a lot of different things. For example, it could be the current population of Chennai, India. It could be the phone number of Training Academy Ireland (they're nice folks, but please don't call them to verify). Or it could be the student ID number of a computer science major attending the University of Manitoba, Canada.

Had you chosen any one of these, you would have been correct, but you would not have guessed the answer I was looking for.

MANAGING EDITOR: Forrest Couch

On July 6, 2004, Xilinx reached the 1,000th patent landmark. The patent, "VDD Detection Path in Power-Up Circuit," was U.S. Patent number
178. ocesses that must access external or static data such as coefficients.

Data Throughput and Processor Selection

When evaluating processors for in-system testing, you must first consider the fact that the MicroBlaze processor, or any other soft processor, requires a certain amount of area in the target FPGA device. If you are only using the MicroBlaze processor as a test generator for a relatively small element of your complete application, this added resource usage may be of no concern. If, however, the unit under test already pushes the limits in the FPGA, you may want to target a bigger device during the testing phase or consider the PowerPC core provided in the Virtex-II Pro and Virtex-4 platforms as an alternative.

Synthesis time can also be a factor. Depending on the synthesis tool you use, adding a MicroBlaze core to your complete application may add substantially to the time required to synthesize and map the application to the FPGA, which can be a factor if you are performing iterative compile, test, and debug operations. Again, the PowerPC core, being a hard core that does not require synthesis, has an advantage over the MicroBlaze core when design iteration times are a concern.

The 16 KB of data cache and 16 KB of instruction cache available in the PowerPC 405 processor also make it possible to run small test programs entirely within cache memory, thereby increasing the performance of the test application. If a high
179. ocessor and in dedicated hardware, you can use tools such as CoDeveloper (available from Impulse Accelerated Technologies) to create prototype hardware and custom test generation hardware that operates within the FPGA to generate sample inputs and validate test outputs.

Desktop Simulation and Modeling Using C

Using C language for hardware unit testing lets you create software/hardware models for the purpose of algorithm debugging in software, using Microsoft Visual Studio, GCC/GDB, or similar C development and debugging environments. For the purpose of desktop simulation, the complete application (the unit under test, the producer and consumer test functions, and any other needed test bench elements) is described using C, compiled under a standard desktop compiler, and executed.

Although you can do this using SystemC, the complexity of SystemC libraries (in particular, their support for data flow abstractions through channels) makes the process of creating such test benches somewhat complex. CoDeveloper's Impulse C libraries take a simpler approach, providing a set of functions that allow multiple C processes, representing parallel software or hardware modules, to be described and interconnected using buffered communication channels called streams.

Impulse C also supports communication through signals and shared memories, which are useful for testing hardware pr
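The producer, unit-under-test, and consumer arrangement described above can be sketched as three concurrent processes connected by bounded queues. Impulse C is a C library and this sketch does not use or imitate its API; it is a Python analogy in which `queue.Queue` stands in for a buffered stream, and every name (`producer`, `unit_under_test`, `consumer`, `run_test_bench`, `END`) is hypothetical. The unit under test simply doubles each sample so the result is easy to check.

```python
import queue
import threading

END = object()  # sentinel that marks the end of a stream


def producer(out_stream, samples):
    """Test-bench process that feeds input samples to the unit under test."""
    for sample in samples:
        out_stream.put(sample)
    out_stream.put(END)


def unit_under_test(in_stream, out_stream):
    """Stand-in for the hardware module: doubles every sample it receives."""
    while (item := in_stream.get()) is not END:
        out_stream.put(item * 2)
    out_stream.put(END)


def consumer(in_stream, results):
    """Test-bench process that collects outputs for later validation."""
    while (item := in_stream.get()) is not END:
        results.append(item)


def run_test_bench(samples, depth=4):
    """Connect the three processes with bounded queues and run them."""
    to_uut = queue.Queue(maxsize=depth)
    from_uut = queue.Queue(maxsize=depth)
    results = []
    threads = [
        threading.Thread(target=producer, args=(to_uut, samples)),
        threading.Thread(target=unit_under_test, args=(to_uut, from_uut)),
        threading.Thread(target=consumer, args=(from_uut, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_test_bench([1, 2, 3]))
```

Because the queues are bounded, a fast producer blocks when the buffer is full, which is the same back-pressure behavior a fixed-depth hardware FIFO would impose.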
180. oder with major functional blocks and data flows defined. One of the primary successes of the H.264/AVC standard is its ability to predict the values of the content of a picture to be encoded by exploiting the pixel redundancy in different ways and directions not exploited previously in other standards. Unfortunately, when comparing to previous standards, this increases the complexity and memory access bandwidth approximately four-fold.

[Figure 1: H.264/AVC macroblock encoder with functional blocks and data flows. Labels: Coder Control; Transform/Scal./Quant.; Split into Macroblocks (16x16 pixels); Motion Compensation; Motion Estimation; Entropy Coding; Control; Quant. Transf. Coeff.; Motion Data; Output Video Signal]

Improved Prediction Methods

Let's highlight some of the main features of the H.264/AVC video coding standard design that enable its enhanced coding efficiency, evaluating these functional modules based on the design criteria discussed in the previous section.

• Quarter-pixel-accurate motion compensation. Prior standards use half-pixel motion vector accuracy. The new design improves on this by providing quarter-pixel motion vector accuracy. The prediction values at half-pixel positions are calculated by applying a one-dimensional s
181. ogic design.

Multiprocessor System Support

Recently, Nohau completed a joint project with Xilinx, expanding the Seehau system to include support for the hard-core PowerPC processor found in Virtex-II Pro devices. As Seehau is a robust source-level debugger, the user interface and source-level feature set are nearly identical. The only major changes from an embedded system engineer's point of view are the processor-level language on disassembled screens and the register set associated with the PowerPC architecture. Figure 6 shows a source-level debug screen of a typical PowerPC debug session.

Seehau has also been expanded to include support for multiple processors in the same fabric. The processors are run independently. Figure 7 shows a set of screens for a two-processor system.

The Nohau GUI provides a simple, easy-to-use interface that assigns a complete set of control and status windows to each processor. All Seehau windows are available for both

[Figure 4: Full source with breakpoint at context switch in µC/OS-II]
[Figure 5: Trace, 40-bit mixed-mode display in binary, logic signals from FPGA fabric]
[Figure 7: Multi-core display with two MicroBlaze processors]
182. ologies

Embedded with Xilinx

In this series on embedded processing, the Xcell Journal

by Mark Aaldering, Vice President, Embedded Processing Division, Xilinx, Inc., mark.aaldering@xilinx.com

In today's world, just about every system incorporates some form of embedded processing in an amazing array of markets and applications. During the last few years, Xilinx, our partners, and our customers have developed and shared a vision to build and assemble all of the elements required for a complete and robust range of embedded processing solutions adapted for FPGA technologies.

In this edition of the Xcell Journal, we have assembled articles representing a wide range of embedded processing applications. These include articles on state-of-the-art commercial applications, real-time operating systems, multi-processor debugging environments, testing of complex hardware modules, and high-speed Internet communication protocols.

With our accelerated success in the embedded processing arena, it is appropriate that this series of articles coincides with the recently announced formation of the Embedded Processing Division. This division brings talent and technology together in an organization to accelerate development of an even wider range of embedded system solutions
183. om Xilinx are offered in the extended temperature range of -40°C to +125°C for automotive applications. PLDs come in two main types: the larger FPGAs and simpler, low-power CPLDs.

Conclusion

The LIN bus can be used as a cost-effective alternative to CAN in low-speed automotive and industrial networks. To add even more flexibility to the network, the LIN interface can be implemented in reconfigurable logic, which is not only low power but can be reconfigured remotely to be either a master or slave in the device. The ability to reconfigure the device to either node can help with fault diagnosis in the field and test in development, and also cut down on inventory by only stocking one device. This also reduces device qualification time and costs.

For more information, visit these websites:
• CAN: www.can.bosch.com
• LIN: www.lin-subbus.org
• LIN IP core: www.intelliga.co.uk
• Xilinx automotive devices: www.xilinx.com/automotive

LIN IP Cores and LIN Application Note

Xilinx currently has two AllianceCORE partners that offer fully verified LIN IP cores: Intelliga Integrated Design Ltd. and CAST, Inc. Further details of these IP cores can be found at www.xilinx.com/ipcenter.

You can download Xilinx application note XAPP432, "Implementing a LIN Controller on a CoolRunner-II CPLD," to use in an existing CoolRunner-II design or simply to understand how to design your own L
184. on module sites, and you can partly resolve the requirements of dynamic reconfiguration by adding additional Xilinx-based FPGA modules. All add-on modules communicate through the Virtex-II Pro device, which also manages two 32-bit microcontrollers that enable communications with most widely used standards.

With the RocketIO transceivers connected to differential pair connectors, you can connect FPGA systems directly through simple cable connections supporting more than 2 Gbps data rates.

[Figure 2: SMT148 systems architecture. Labels: ComPorts, Global Buses, LVDS, SHB, LED, RSL (full duplex); DAC IF, ADC IF, LED IF, RSL IF, SHB IF, RS485 IF]

The SMT148 leverages the Xilinx Virtex-II Pro block RAM configuration to generate FIFOs for the RocketIO transceivers, add-on modules, communication ports, and a high-speed bus, as well as the embedded PowerPC code. The single high-speed bus allows parallel data transfer to and from a wide range of high-speed ADC/DAC modules.

When we designed the SMT148, we understood that one of the many challenges OEMs would face in developing JTRS-compliant platforms was the availability of interchangeable and networked processing nodes. The availability of processing nodes aims to meet the expandability and scalability requirements in complex waveform applications. The SMT148 mee
185. optimizing the full capabilities of our silicon architectures at multiple performance and price points. This new division reinforces our commitment to the increasingly diverse and changing embedded systems market and represents the evolution of three years of embedded processing experience.

e a broad base of embedded processing applications

Table of Contents
• Sign of the Times
• High-Bandwidth TCP/IP PowerPC Applications
• Embedded Nucleus PLUS RTOS Using Xilinx EDK
• MicroBlaze and PowerPC as Test Generators
• Nohau Shortens Embedded Processor Debug Time
• Scalable Software-Defined Radio Development
• High-Speed Optical Burst Switching

Sign of the Times

Xilinx makes high-tech outdoor advertising in Times Square possible.

by Jason Daughenbaugh, Sr. Design Engineer, Advanced Electronic Designs, Inc., jason.daughenbaugh@aedmt.com

New York City's Times Square is known as the "Crossroads of the World." Approximately 1.5 million people pass through the intersection of Broadway and 42nd Street every day, and millions more see the area daily on television broadcasts. No better place for outdoor advertising exists. As a result, dazzling signs have become a Times Square trademark. Every advertiser wants to have the best advertising medium possible, so new signs must use the latest technology.

Times Square tenants rely on MultiMedia, whic
186. or because, except for cell phones and video games, it is the one most used in our industry, and it is well supported by both IBM and Motorola. Capturing even a relatively small percentage of this $15 billion market would mean significant revenue for Xilinx. Many embedded processing customers are beginning to realize the benefits of our technology, and we've only started to focus on this market segment.

Because our MicroBlaze and PicoBlaze processors are created as soft cores, they are very flexible and extensible. Plus, they are fast enough to meet the needs of many applications very inexpensively. Combined with our high-performance PowerPC processor, they form an unbeatable alliance that can handle the most demanding applications with ease, all on a single programmable device.

Our processor strategy is to provide a range of embedded processors, all using the same peripherals and IP, all working together seamlessly on a single chip, and working seamlessly with our DSP and logic functions. Thus, you can build and simulate very complex systems and produce production-ready designs faster than ever before. Then, as your requirements change or as design errors are uncovered, you can quickly modify your design and resume production without losing customers. That's the power of programmability; that's what Xilinx does best. The advantages are enormous.

Focusing on the Future

Our original focus was on supporting logic
187. ore, giving you complete freedom of choice.

Easiest to Use Software

A SOLUTION FOR EVERY SYSTEM DESIGN CHALLENGE

The three Virtex-4 platforms (LX, SX, and FX) offer you up to 200,000 logic cells and 500 MHz tuned performance. Our new ChipSync technology simplifies source-synchronous interfaces. You can implement serial protocols at any speed from 600 Mbps to 11.1 Gbps with RocketIO multi-gigabit transceivers. Hardware acceleration for the embedded PowerPC is easy with our auxiliary processing unit. And with XtremeDSP delivering 256 GMACS, you can solve those ultra-high-performance DSP challenges. All of your design possibilities just became realities. See for yourself at www.xilinx.com/virtex4.

LOWEST SYSTEM COST. HIGHEST SYSTEM PERFORMANCE.
188. out of block RAM. The implementation of an MLSE comprises a modified version of the Viterbi algorithm and demands considerable resources. This full-feature demodulator can be optimized, simplified, or targeted at an ASIC, but as a first-pass iteration it provides a good estimation of the resources needed for its implementation.

[Table 1: Resources used in the FPGA. Column: Total Slices; device: Virtex-II XC2V3000]

Conclusion

A Model-Based Design approach in the development of a complex wireless application for a DSP/FPGA architecture is very effective for thoroughly testing a design while implementing it for target hardware. Nonetheless, the use of FPGA cores and DSP libraries also allows the implementation to be quite efficient, even with a high-level design approach. Combined with flexible platforms such as the SignalMaster, these tools and approaches really help today's designers tackle the difficult challenges of designing state-of-the-art wireless systems.

For additional information on this project and our SignalMaster line of DSP/FPGA development platforms, visit www.lyrtech.com.

Hardware Design Kits Turbo-Charge DSP Co-Processing Applications

Xilinx and Avnet have released new design kits that reduce time to market for a wide range of DSP applications.
189. out the world to become ubiquitous. GSM was one of the first digital cellular systems and as such represented another level of magnitude in terms of complexity, with support for growth features such as GPRS (General Packet Radio Service), which provides data capabilities to GSM phones.

In this context, designing a GSM system is a complex task and could benefit from advanced design flow techniques, where initial system simulation phases can be seamlessly carried over to the implementation phases.

In this article, we'll describe such a design flow for GSM development, starting from research and model simulation implementation in The MathWorks MATLAB to FPGA implementation, through simulation phases in The MathWorks Simulink and Xilinx System Generator.

Project Context

Our implementation is in the general context of wireless application development examples to showcase our DSP/FPGA platforms. A previous project, the implementation of an SSB (single side band) AM radio, also with The MathWorks, featured a simpler analog radio and was showcased at Xilinx Programmable World 2003. We wanted to implement a much more complex radio, and so we selected the GSM digital cellular standard. In doing so, we are getting closer to our goal to design ever more complex wireless systems for our customers.

We wanted to have a simplified system that still operates as a GSM sy
190. oved into the FPGA one pixel at a time and aggregated into a larger unit (often a line), with manipulations performed on these points.

An example pre-processing MATLAB m-file script might contain:
• Reading in a source test image using the IMREAD function in the Image Processing Toolbox
• Analyzing variables such as width, height, and color depth of the image to pass as arguments to Xilinx block set tokens. This enables easy parameterization and scaling of the application
• Storage and creation of other variables necessary to the application. Examples include rotation angle, resizing percentages, and bit precision within the algorithm
• Converting the matrix data from an m x n array to a 1 x (m*n) array with for loops and concatenation. This allows The MathWorks DSP block set Signal From Workspace token to pass elements as samples to the Xilinx block set Gateway In token
• Viewing the source test image for later subjective analysis using the IMSHOW function in the Image Processing Toolbox

An example post-processing m-file might contain:
• Conversion of ToWorkspace variables from a 1 x (p*q) array to a p x q array for easy manipulation by using for loops
• Displaying the resulting matrices using IMSHOW for subjective analysis
• Computing qualitative analysis of the results versus the origin
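The two array conversions in the lists above (m x n to 1 x (m*n) before simulation, and 1 x (p*q) back to p x q afterward) can be sketched as follows. This is a plain-Python illustration rather than the MATLAB script the article describes; the function names are hypothetical, and row-major ordering is an assumption, since the text does not state the order in which pixels are concatenated.

```python
def flatten(image):
    """Convert an m x n image (list of rows) into a 1 x (m*n) sample list.

    Assumption: pixels are emitted row by row (row-major order).
    """
    return [pixel for row in image for pixel in row]


def reshape(samples, rows, cols):
    """Rebuild a rows x cols image from a flat sample list (row-major)."""
    if len(samples) != rows * cols:
        raise ValueError("sample count does not match rows * cols")
    return [samples[r * cols:(r + 1) * cols] for r in range(rows)]


source = [[10, 20, 30],
          [40, 50, 60]]          # a 2 x 3 test image
stream = flatten(source)        # what would be fed in as a sample stream
restored = reshape(stream, 2, 3)

print(stream)
print(restored == source)
```

Whatever ordering the pre-processing step uses, the post-processing step has to use the same one, otherwise the reconstructed image comes out transposed or scrambled.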
191. oves the need for the processor to copy unaligned buffers.

Table 1: TCP transmit benchmark results

TCP/IP Stack        Ethernet Frame Size     Optimization
MontaVista Linux    9000 bytes (jumbo)      None
MontaVista Linux    9000 bytes (jumbo)      Zero copy, checksum offload
Treck, Inc.         9000 bytes (jumbo)      Zero copy, checksum offload
Treck, Inc.

Checksum offload is a feature of the LocalLink Gigabit Ethernet (LLGMAC) peripheral. It allows the TCP payload checksum to be calculated in FPGA fabric as Ethernet frames are transferred between main memory and the peripheral's hardware FIFOs. GSRD removes the need for costly buffer copies and processor checksum operations, leaving the PowerPC 405 to process only protocol headers.

TCP/IP Per-Packet Overhead

Per-packet overhead is associated with operations surrounding the transmission or reception of packets. Packet interrupts, hardware interfacing, and header processing are examples of per-packet overheads.

Interrupt overhead represents a considerable burden on the processor and memory subsystem, especially when small packets are transferred. Interrupt moderation (coalescing) is a technique used in GSRD to alleviate some of this pressure by amortizing the interrupt overhead across multiple packets. The DMA engine waits until there are n frames to process before interrupting the processor, where n is a software-tunable value.

Transferring larger sized p
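The effect of interrupt moderation described above can be illustrated with a small calculation: if the DMA engine interrupts once per n frames, the number of interrupts for a given burst drops by roughly a factor of n. This is a back-of-the-envelope sketch, not GSRD code; the function name is hypothetical, and the handling of a final partial batch (assumed here to be flushed with one extra interrupt, for example by a timeout) is an assumption the text does not cover.

```python
import math


def interrupts_needed(frames: int, n: int) -> int:
    """Interrupts raised for `frames` frames when coalescing n frames each.

    Assumption: a trailing batch smaller than n still causes one interrupt.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(frames / n)


frames = 1000
for n in (1, 4, 8, 32):
    print(f"n={n:2d}: {interrupts_needed(frames, n)} interrupts")
```

The trade-off is latency: a larger n means fewer interrupts per frame, but the first frame in each batch waits longer before the processor sees it.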
192. ox If a blank Xilinx black box is in a model file and the VHDL code to configure it resides in the folder in which the model file is saved, then the configuration wizard for the black box will automatically generate an m-file to describe the functionality of the black box. With System Generator 3.1, you can configure the black box manually.
Figure 5 - Implementation of state machine algorithm through Xilinx black box configuration
Figure 6 - Example model demonstrating clock enable probe use
Figure 7 - Scope output from clock enable probe example
A problem exists when generating the m-file through the wizard. The configuration wizard for the black box cannot realize multiple entities in the .vhd file. The VHDL file bbrx.vhd contained multiple entities because of inefficient VHDL code generation through Xilinx StateCAD. Thus, you must manually manipulate the code to reduce it to one entity. You can then use the modified VHDL code in conjunction with the Xilinx black box wizard, creating a block like that shown in Figure 5.
Figure 8 - Propagation of enable signal through delay blocks
System Generator 6.1 Features
The Fast Fourier Transform (FFT) implementation through configuration of a Xilinx black box block with M-file and VHDL wrapper f
193. ping filter, multi-phase cascaded integrator comb (CIC) interpolation filter with a fixed interpolation rate, multi-phase numerically controlled oscillator (NCO), and multi-phase digital mixer.
Conclusion
Developing, testing, and implementing SDR IP cores is simplified with the Sundance SMT148 platform. You can now focus on developing additional IPs without worrying about peripheral processing or I/O devices, as these are simply off-the-shelf add-on IP blocks. For more information, please visit www.sundance.com.
Winter 2004
Implementing High-Speed Optical Burst Switching with Virtex-II Pro FPGAs
by Sam Sanyal, Solutions Marketing Manager, Xilinx, Inc., sam.sanyal@xilinx.com
Mrugendra Singhai, Research Engineer, MCNC Research and Development Institute, msinghai@anr.mcnc.org
Imagine a telecom network where an optical network can be set up and torn down in an instant without any human intervention. An optical burst switching (OBS) protocol at work at the Microsystems Computer North Carolina Research Development Institute (MCNC RDI) in Research Triangle Park, North Carolina, does just that. OBS combines the best features of optical circuit switching and optical packet switching. An OBS network can switch variable-sized data bursts instead of individual data packets. In an OBS network, transmission of data bursts can begin even before those bursts are
194. plication image to the FPGA.
Advanced Software Tools
Up to this point, we have bypassed many aspects of application software design, assuming that you have code ready to compile, link, and download to the FPGA. In fact, as systems become ever more complex, both hardware and software designers require advanced state-of-the-art tools to help them complete their projects within budget and on time. The Xilinx EDK configurable version of Nucleus PLUS uses the standard GNU suite of tools supplied with the Xilinx EDK package. This is more than adequate for getting systems up and running on many projects, but advanced application development often needs more. Accelerated Technology can provide a complete range of tools that encompass all phases of the software design process.
Figure 3a - After installing Nucleus PLUS in EDK, Nucleus appears as an option in the drop-down menu for choosing which operating system to use with the PowerPC 405 processor.
If code footprint or performance is important, then consider the highly optimizing Microtec compiler for PowerPC Virtex-II Pro devices. This ensures that the code that is shipped is the same as the code that is debugged, a goal not achieved by many compilers. Application debugging often needs RTOS awareness, advanced breakpoints, and debugging of fully optimized code. These feat
195. plied to all syntax elements except those related to the quantized transform coefficients. For the CABAC
Figure 2 - Typical H.264/AVC hardware/software functional block partition (Xilinx FPGA V2P50). Diagram courtesy of W&W Communications.
issues between the encoder and decoder in the inverse transform. Furthermore, an additional transform based on the Hadamard matrix is also used to exploit the redundancy of 16 DC coefficients of the already transformed blocks. Compared to a DCT, all applied integer transforms have only integer numbers ranging from -2 to 2 in the transform matrix. This allows you to compute the transform and the inverse transform in 16-bit arithmetic using only low-complexity shifters and adders.
Arithmetic and context-adaptive entropy coding
Two methods of entropy coding exist: a low-complexity technique based on the use of context-adaptively switched sets of variable length codes (CAVLC), and the computationally more demanding algorithm of context-based adaptive binary arithmetic coding (CABAC). CAVLC a more sophisticated coding scheme is applied. The transform coefficients are first mapped into a 1-D array based on a predefined scan pattern. After quantization, a block contains only a few significant non-zero coefficients. Based on this statistical behavior, five data elements are used to convey infor m
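The claim that the integer transform needs only shifters and adders can be made concrete with a short sketch. The article does not print the transform matrix; the code below assumes the commonly cited H.264 4x4 forward core transform, whose rows are (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), and (1, -2, 2, -1). The function names are illustrative, and scaling and quantization are left out.

```python
def core_1d(a, b, c, d):
    """One 4-point pass using only adds, subtracts, and 1-bit left shifts.

    Assumed matrix rows: (1,1,1,1), (2,1,-1,-2), (1,-1,-1,1), (1,-2,2,-1).
    """
    s0, s1 = a + d, b + c      # sums of outer and inner samples
    d0, d1 = a - d, b - c      # differences of outer and inner samples
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))


def forward_4x4(block):
    """Apply the 1-D pass to every row, then to every column of the result."""
    rows = [core_1d(*row) for row in block]
    cols = [core_1d(*(rows[r][c] for r in range(4))) for c in range(4)]
    # cols[c][k] holds output row k, column c, so transpose back.
    return [[cols[c][k] for c in range(4)] for k in range(4)]
```

A constant block ends up with all of its energy in the top-left (DC) coefficient, which is the redundancy the additional Hadamard-based transform mentioned above then exploits across the 16 DC values.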
196. quence of the design. It may be shared to control other user logic, such as the MAC layer implementation for cable communication at the head end and baseband-to-IF digital upconversion. The functional block diagram in Figure 8 depicts how you can leverage the capabilities of the Virtex-II Pro architecture for the J.83 design.
Figure 8 - J.83 single-chip system design (blocks shown: J.83 core, DUC, D/A, PPC or MicroBlaze, MPEG rate management, J.83 configuration)
Conclusion
Xilinx System Generator enables the rapid development and simulation of high-performance systems on Xilinx FPGAs. SRL16s allow you to design a 16-channel granularity modulator without using 16 times the resources of a 1-channel granularity modulator, or 4 times the resources of a 4-channel granularity modulator. You can build various standard-compliant modulators for video broadcast for transmission over terrestrial links (DVB-T), via satellite (DVB-S2), or to handheld devices (DVB-H) quickly and efficiently using System Generator for DSP and various library blocks available from Xilinx. SRL16s in Xilinx FPGAs allow efficient time-multiplexed dataflow structures, offering significant resource savings. For more information about the Xilinx J.83 Modulator IP, visit www.xilinx.com/ipcenter/j83_modl.
Xilinx Events and Tradeshows
Xilinx participates in numerous trade shows and
197. rame is processed sequentially, we can use internal block RAM to assemble macroblocks; a port to connect to external RAM is not required.
Decoding Modules
The pipeline layout of the decoder is shown in Figure 5. Input on the left is fed by the PPC. Output on the right is CCIR 656 format 4:2:2 YCbCr 8-bit video. This format matches the output from the VW peripheral analog video decoders. The layout was designed to allow progressive, incremental design, integration, and testing of the modules.
The input buffer uses a 512-deep x 32-bit-wide FIFO to receive all data from the PPC. This FIFO allows the relatively slow 66 MHz EPB bus to operate at full speed without having to implement low-level hardware handshakes. A high-level handshake is implemented by making the FIFO's fill level available for read-back by the PPC. The PPC core can keep track internally of the FIFO fill level and make decisions as to whether to work on filling the FIFO or perform other useful functions. The input buffer also contains an auto-incrementing register used to generate indirect addresses for rapidly filling tables in other modules, to keep the decoder's I/O address range on the EPB bus small.
The variable length (VL) decoder decodes the Huffman-encoded block coefficients according to MPEG-2 tables B-14 and B-15. State machines to traverse the Huffman code trees and a look-up table to extract run-level value pairs from the leafs
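The high-level handshake on the input buffer can be sketched in software. The model below is a hypothetical illustration, not the authors' driver code: the class and function names are invented, and only the 512-word depth and 32-bit width come from the text. The producer reads the fill level once and then writes a burst that is guaranteed to fit, with no per-word handshake.

```python
from collections import deque

FIFO_DEPTH = 512  # words, as stated in the article


class InputFifo:
    """Software model of the decoder's 32-bit input FIFO (illustrative)."""

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self._words = deque()

    def fill_level(self):
        """Number of words currently queued (the read-back value)."""
        return len(self._words)

    def write(self, word):
        if len(self._words) >= self.depth:
            raise OverflowError("FIFO full")
        self._words.append(word & 0xFFFFFFFF)  # keep 32 bits

    def read(self):
        return self._words.popleft()


def feed(fifo, pending):
    """Write as many pending words as currently fit; return the count."""
    free = fifo.depth - fifo.fill_level()
    count = min(free, len(pending))
    for _ in range(count):
        fifo.write(pending.popleft())
    return count
```

When `feed` returns 0 the producer knows the FIFO is full and can do other work before polling the fill level again, which is the decision the text attributes to the PPC core.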
198. rces within Xilinx FPGAs.
Physical Communications Infrastructure
We had to define the physical infrastructure of the tool: the network elements that would exist within the FPGA. We analyzed a number of data networking standards to assess their viability for use within FPGAs, but existing standards lacked the required flexibility and resource efficiency. For this reason, we developed a dedicated, simplified network protocol and network infrastructure. The network elements used in DIMEtalk are as follows:
• Routers direct data around the network
• Nodes are the user interface to the network and can be connected to user application designs
• Bridges move data between physical devices across a defined physical media, for example, between FPGAs
• Edges are used for protocol conversion to another data transfer standard such as PCI, VME, or USB on Nallatech systems
From a user's perspective, nodes within the network are the most important; these are the points within the network where you connect your algorithm blocks. The available interfaces are block RAM, FIFO, and memory map based, which makes developing compatible interfaces within algorithm blocks and connecting these to the network easy. In runtime, packet-based transfers are used across the network, enabling the transfer of data between nodes within FPGAs and also to backplane interfaces and host systems. Because of th
199. red banner are trademarks of Texas Instruments. All other trademarks are the property of their respective owners. 2004 TI
Optimizing your FPGA power requirements at the beginning of your design cycle eliminates last-minute surprises and delays. Xilinx industry-leading partners Texas Instruments, National Semiconductor, Linear Technology, and Intersil make it fast and easy with simple, comprehensive reference guides and support. It's a powerful first step in developing more robust and reliable Xilinx FPGA designs. www.xilinx.com/powercentral
Introducing the world's first multi-platform, domain-optimized FPGA family, delivering breakthrough capabilities and performance at every price point.
THE FREEDOM TO CHOOSE
For the first time ever, you can select from multiple FPGA platforms optimized for application domains. You choose the exact capabilities you want. You pay only for what you need. Virtex-4 FPGAs are built upon our unique ASMBL (Advanced Silicon Modular Block) architecture, enabling Xilinx to desertie logic, memory, I/O, DSP, processors, and m
200. red with pure simulation blocks. At target compilation time, the associated DSP library of these last blocks can build the DSP code, providing very efficient code generation and performance.
FPGA Processing Model-Based Design
Figure 4 illustrates the main FPGA-based Simulink model for the base station. On the transmit side, the FPGA receives 148-bit GSM frames from the DSP as input, modulates them to get a GMSK burst, and then frequency shifts the resulting signal in the 70 MHz IF band through SSB modulation using the Xilinx direct digital synthesizer (DDS) block. Two FDM transmit channels are simulated in this model. These two signals are then mixed together on the same physical channel and sent to the digital-to-analog converter (DAC); that block is located in the Signal I/O and Mixer subsystem.
On the receiver side, signals feeding the FPGA come from the analog-to-digital converter (ADC). Because the digitized signal is 25 MHz wide and can contain as many as 124 GSM channels, channel selection is required. This is performed by a
[Figure: GSM band structure. 25 MHz bandwidth, 124 FDMA RF channels; each RF channel 200 kHz bandwidth, 8 TDM; types of burst in GSM: normal burst, frequency correction burst, synchronization burst]
201. rength
Figure 1 - Design Compiler FPGA offers a fast path to prototype (prototyping flow and ASIC flow)
FPGA synthesis tools to try and meet timing. If the fixed method does not provide the required results, your only option is to make manual modifications to the RTL and try again, often with the same poor result. These manual modifications are time-consuming and error-prone. They can lead to RTL drift, where the two descriptions become so diverse that the functional equivalence is jeopardized. Even small differences, such as a single signal being tied high or low in the FPGA, can spell disaster if carried through to ASIC manufacture.
The complexity of a typical design implemented in devices such as Virtex-4 FPGAs will almost certainly demand a team effort. Many existing tools restrict the design process to a top-down flow guided by a single user. FPGA designers want the flexibility to choose the design flow, just like their ASIC counterparts.
Although none of these flow differences in isolation represents an insurmountable challenge, collectively they can add up to a major overhead in the time and effort required to develop the prototype, affecting the design integrity between the FPGA and ASIC implementations. Unless you can easily migrate your design between FPGA and ASIC implementations, the benefits of prototyping are lost.
A Unified ASIC/FPGA Design Flow
Clearly a common d
202. reo 20-bit audio CODEC
• Philips SAA7121H digital video encoder
• Analog Devices ADV7123 140 MHz triple video DAC
• Interface for OmniVision OV6630AA CMOS color digital camera
• Interface for Fujitsu MB86S02A CMOS color digital camera
• AvBus expansion connector interface for Sharp LQ057Q3DC02 color TFT LCD module
• X-Y touchscreen controller
• PS/2 keyboard and mouse interfaces
Communications/Memory Add-in Module
The Communications/Memory Module is an expansion daughtercard for use with Avnet Avenue Solutions offerings. The daughtercard interfaces through AvBus connectors and provides general-purpose resources to complement Avnet Avenue Solutions-based modules. The daughtercard provides all necessary resources for implementation of Xilinx MicroBlaze processor core designs. Key elements of the module include:
• 64 MB SDRAM
• 16 MB Flash
• 1 MB SRAM
• IrDA
• 10/100/1,000 Ethernet PHY
• USB 2.0
• PC card interface
Conclusion
For a wide variety of DSP applications, it makes sense to start your design with a hardware-based development platform. You can pick and choose from three main platforms and customize by mixing and matching a variety of IP cores, daughtercards, firmware, and software. Visit www.em.avnet.com/dspstartingline for current information on all Xilinx DSP-related tools from Avnet. You can order any of the kits described in this ar
203. rnal
• This design required a large variety of signaling standards. The flexible Xilinx I/O blocks allowed us to connect directly to a large number of different interfaces. Voltages ranged from standard 3.3V CMOS down to 1.5V HSTL. We required single-ended and differential interfaces. In some cases, we could have used external driver and receiver parts, but that would have added complexity and cost to the product. Other high-speed I/O interfaces, such as to the DDR-333 memory, would not have been possible without direct FPGA support. The digitally controlled impedance (DCI) modes were necessary on the high-speed single-ended traces.
• With the high data rates involved and the many data interfaces, we had a large number of clock domains. The quantity of global clock nets available and the ability of the digital clock managers (DCMs) to synthesize clock frequencies made this easy. We also used the phase-shift ability of the DCM to adjust sample times on various interfaces.
• Block RAM is my favorite resource in an FPGA. Without block RAM, there are two memory options. The first option is the logic slices, using flip-flops or distributed RAM, but this is expensive and slow for anything more than 16 to 32 bit addresses. The second option is external memory such as SDRAM. SDRAM storage is generally in the range of tens to hundreds of megabytes, leaving a huge size gap between these two memory options. Block RAM br
204. rogrammable logic was the obvious choice. The high-performance, low-cost FPGAs are well suited for all three main components of this design: video processing, data distribution, and sign control.
Video Processing
The video processor accepts a variety of video inputs. It captures these video streams as 36-bit RGB (12 bits per color). It then crops and places these inputs onto a master sign image for display. Color space conversion adjusts image characteristics such as color temperature and balance. Additional processing corrects for individual LED differences. We also use proprietary image processing algorithms to operate the LEDs efficiently while maintaining optimal image quality.
Data Distribution
Video data starts in a control room and ends at the LEDs. The first step is the video processor, which is located in the control room. The video processor breaks the images into manageable chunks to send to the many modules of the sign, so that each LED displays the data for the corresponding pixel. More than 3 Gbps of video data alone is required to operate the LEDs. In addition to video data, we also transfer a variety of control and status functions.
Figure 1 - The world's highest resolution LED display is based on Xilinx devices.
Figure 2 - The sign is built out of 3,800 display blocks.
Not wanting to re-invent the
205. ross the FPGAs
• Develop Algorithms: use HDL or other tools to develop algorithm blocks
Figure 2 - DIMEtalk design flow. Stages shown include Develop Algorithms (using HDL or other design flows, connected to the interface nodes), Connect Algorithms (connect the completed algorithm blocks to the network; at this stage the network and application are functionally comple), Synthesis (synthesize whole design using standard synthesis tools), Implementation (implement design using standard implementation tools), and Runtime (application operating in runtime using DIMEtalk API functions to communicate across the DIMEtalk network).
Figure 3 - Hardware for example system
We are not suggesting that DIMEtalk networks should completely replace other types of data networks. However, within FPGAs and going between FPGAs on the same card, a low-resource, easy-to-implement network such as DIMEtalk makes sense, as demonstrated by the resource usage shown in Figure 1. DIMEtalk is intended to be used alongside other data network and backplane types; that's why the edge components are so important. The edges enable you to use low-resource DIMEtalk networks where it is right to do so and interface directly to other protocols off-card.
Using DIMEtalk
DIMEtalk is designed to make life easier for developers to deploy applications on an FPGA computing platform. The intuitive
206. roup com
Virtex-4 FPGAs deliver what you've been looking for.
by Greg Lara, Product Marketing Manager, Virtex Solutions, Xilinx, Inc., greg.lara@xilinx.com
As Xilinx began to define the capabilities of the fourth generation of Virtex devices, we set out to address the performance, functionality, and cost requirements of next-generation electronic systems, and to increase our customers' productivity by easing system design challenges. We interviewed more than 800 customers, including system architects and experts in logic design, embedded processing, high-performance DSP, and high-speed connectivity. Despite the differences in their end products, these high-end FPGA users had a number of common key requirements. They asked for higher system performance to meet the demands of their leading-edge products; lower power consumption to meet stringent power budgets driven by system cost and reliability requirements; help to thrive in a competitive marketplace; and solutions to simplify complex design challenges, such as building source-synchronous interfaces to the latest high-speed memories and advanced components.
We achieved these goals by enhancing the features proven popular in earlier Virtex devices and developing new capabilities never before available in FPGAs. Combining advanced processing technology with greater inte
207. ry while the magnitude/phase correction block takes care of the remaining amplitude and phase errors with trigonometric properties of a quadrature signal. We performed a cross-correlation with the expected training sequence to recover the timing and send a complete frame to the GMSK demodulator. Hybrid FPGA/Simulink modeling was very useful in the development of this subsystem because it was possible to visualize the signals at every step of the processing, both in time and frequency. Figure 6 displays the spectra of the two FDMA channels before demodulation and channel selection, while Figure 7 shows the waveform obtained before and after the magnitude/phase correction block.
FPGA Resource Estimation
Table 1 shows the resources used in the Virtex-II FPGA. The IF baseband demodulation is based mainly on a three-stage decimation and filtering applied to both I and Q signals, each using three 10-tap FIR filters. Basically, the implementation of the GMSK demodulator brings together three major components: the phase recovery module, the timing recovery module, and the MLSE (maximum likelihood sequence estimation). The phase recovery module uses some dividing and square root operators, which are costly to implement. The timing recovery is based mostly on the large correlation of data sequences that demand many embedded multipliers and also some large data buffers made
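The timing recovery step described above, a cross-correlation against the expected training sequence, can be sketched as follows. This is a simplified illustration, not the authors' FPGA design: it assumes real-valued samples and a plain sliding dot product, whereas a GMSK receiver works on complex I/Q data, and the function name is hypothetical.

```python
def best_alignment(received, training):
    """Slide `training` over `received`; return (offset, score) of the peak.

    Each candidate offset costs len(training) multiplications, which is
    why the article notes that correlation consumes many multipliers.
    """
    span = len(received) - len(training) + 1
    if not training or span < 1:
        raise ValueError("received must be at least as long as training")
    best_offset, best_score = 0, None
    for offset in range(span):
        score = sum(received[offset + k] * training[k]
                    for k in range(len(training)))
        if best_score is None or score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

The offset of the correlation peak marks where the training sequence sits in the received burst, and the frame boundaries follow from that position.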
208. s enabling us to develop a very complex system in very little time. For more information about MultiMedia LED signs, visit www.multimediaLED.com. For more information about the engineering provided by AED, visit www.aedmt.com.
Considerations for High Bandwidth
Xilinx, Inc., chris.borrelli@xilinx.com
The TCP/IP protocol suite is the de facto worldwide standard for communications over the Internet and almost all intranets. Interconnecting embedded devices is becoming standard practice even in device classes that were previously stand-alone entities. By its very definition, an embedded architecture has constrained resources, which is often at odds with rising application requirements. Achieving wire-speed TCP/IP performance continues to be a significant engineering challenge, even for high-powered Intel Pentium class PCs. In this article, we'll discuss the per-byte and per-packet overheads limiting TCP/IP performance and present the techniques utilized in the Xilinx Gigabit System Reference Design (GSRD) to maximize TCP/IP over Gigabit Ethernet performance in embedded PowerPC-based applications.
GSRD Overview
The GSRD terminates IP-based transport protocols such as TCP or UDP. It incorporates the embedded PowerPC and RocketIO blocks of the Virtex-II Pro device family and is
209. s FPGA device technology.
"Using DC FPGA from Synopsys, we were able to meet the 40 MHz wireless LAN 802.11g ASIC prototyping chip performance target, a significant speed increase over what we were able to achieve with other FPGA synthesis tools. DC FPGA's compatibility with Design Compiler and the flexibility to run on a Linux-based platform significantly accelerates our design flow process by giving us access to a common design environment for both ASIC and FPGA design."
- Dirk Haentzschel, Sr. Design Engineer, AMD Dresden Design Center
"Design Compiler FPGA impressed us because it was the only FPGA synthesis solution that had a working formal verification flow. In addition, DC FPGA was able to handle gated clock transformations that are critical for our low-power mobile products, as well as a 23% timing improvement over our existing FPGA synthesis solution."
- Dr. Michiel Lotter, Co-Founder and VP of Engineering, Zyray Wireless
Intersil Power Management Solutions
The EL7530, EL7531, EL7532, EL7534, and EL7536 family of DC-DC buck regulators with integrated MOSFETs are simple to use, compact, and full featured. Their high efficiency makes them especially well suited for battery-operated products. The EL7530/EL7531 devices include pulse frequency mode (PFM) and pulse width modulation (PWM) for high efficiency in standby or at full load.
[Figure: efficiency versus load current (mA)]
210. s inserted into the data before modulation. The modulated data is then driven to the channel model, where inter-symbol interference, Doppler content, and additive white Gaussian noise are introduced into the signal. Finally, the receiver employs a 16-QAM demodulator that performs adaptive channel equalization and carrier recovery. The ASM is stripped from the demodulated data before applying error correction. A PicoBlaze microcontroller controls the Reed-Solomon decoder, maintains frame alignment of the received packets, and performs periodic adjustments of the de-mapping QAM-16 quadrant reference. Both the transmitter and the receiver are targeted to the FPGA, whereas the channel is a Simulink model used for simulation and verification.
Although not all System Generator features are used in this design, it is a good example showing a combination of powerful features (Xilinx blockset elements, legacy HDL code, a PicoBlaze processor, and hardware verification), resulting in a very elegant, efficient, and quick way to implement a complex design and qualify it in a single environment.
Design Implementation with System Generator
A System Generator design always starts and finishes with gateways to convert the Simulink double-precision data into a Xilinx fixed-point format. These gateways define the boundaries of your design; you can convert them into I/O ports for top-level designs or an I/O interface to import into a
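What a gateway does at the design boundary, converting double-precision values into a fixed-point format and back, can be sketched in a few lines. This is a hedged illustration, not the System Generator implementation: the function names are hypothetical, and round-half-up with saturation is an assumed choice of quantization and overflow behavior.

```python
import math


def to_fixed(value, total_bits, frac_bits):
    """Quantize a float to a signed fixed-point integer.

    Assumptions: two's-complement range, round half up, saturate on
    overflow. `frac_bits` is the number of bits after the binary point.
    """
    raw = math.floor(value * (1 << frac_bits) + 0.5)
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, raw))


def to_float(raw, frac_bits):
    """Convert the stored integer back to its real value."""
    return raw / (1 << frac_bits)
```

For an 8-bit word with 6 fractional bits, 0.75 is stored as 48 and read back exactly, while 3.0 does not fit and saturates to 127, which reads back as 1.984375. Choosing the word length and binary point at each gateway is therefore part of the design, not a detail.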
211. s possible with other FPGA architectures.
Serial/Parallel Connectivity
In addition to embedded processors, the FX platform also includes 3.125 Gbps multi-gigabit transceivers that are particularly suited for interfacing to other DSP processors. One such example is high-speed serial connectivity using the serial RapidIO interface, which is gaining momentum with DSP vendors. With 1 Mbps LVDS interfaces for interfacing to high-speed A/D converters and a host of DRAM and SRAM memory interfaces for hooking up to frame buffers, the Virtex-4 family is an ideal platform for interfacing to other DSP devices that will form part of the system data flow.
Figure 3 - Efficient 16-channel Reed-Solomon encoder using SRL16
Virtex-4 DSP Design Solutions
The Virtex-4 family includes a beefed-up set of DSP design resources:
• System Generator for DSP allows you to model your design in The MathWorks Simulink and, through powerful capabilities like hardware in the loop, verify and debug that design from the same environment. System Generator also includes a new block that allows you to instantiate an XtremeDSP slice and configure it for one of its many operating modes. Hardware in the loop is supported for any Virtex-4 development environment with a JTAG header. Other new capabilities introduced in System Generator 6.3 include the ability to generate VHDL or Veri
212. scale embedded hardware systems, that is, virtually all of today's communications and multimedia systems. Similarly, C-based tools and design flows will not address the software explosion in these systems. In fact, automotive companies and others facing the rapid growth of software-intensive embedded systems have turned to Model-Based Design. Manually developing code in C is no longer an option because companies cannot hire enough programmers or test engineers to develop and verify the code.
Figure 2 - The elements of Model-Based Design (panels include Executable Specifications from Models, Design with Simulation, and Implementation with Automatic Code Generation)
The Elements of Model-Based Design
With Model-Based Design, specification, design, implementation, and verification can be accomplished and accelerated using a single Simulink model. Figure 2 depicts these elements, which are described below.
Executable Specifications from Models
Simulink models serve as executable specifications for system and component behavior, replacing ambiguous text documents. These models can span digital and analog hardware as well as software, and they facilitate clear, unambiguous communication between engineering teams.
Design with Simulation
Simulink is a platform for multi-domain simulation of dynamic systems. The Simulink product family provides
213. scatter plot of the output of the baseband section of the modulator. Simulation of this complex system in an HDL simulator for a meaningful number of clock cycles, such that several frames of data may be processed, imposes a huge penalty in the time taken to complete a simulation. This makes it an impractical choice, but sometimes it is the only option when the design source is in an HDL format. This simulation time is drastically reduced when simulating the model in Simulink. What might take days to simulate in a gate-level simulator could be accomplished in a matter of hours. This savings in time is highly valuable: not only do you benefit from superior simulation speed in Simulink, but you also reap the benefits of a shortened design cycle, allowing for overall rapid IP delivery.
Single- and Multi-Channel Designs
The modulator is constructed out of two primary footprints, or granularities: a single-channel implementation and a four-channel implementation. Block diagrams of the four-channel granularity for Annex B and Annex A/C are shown in Figure 5 and Figure 6, respectively. Each instance of the single-channel footprint provides for exactly one independent channel; the four-channel footprint, however, is optimized to efficiently support four channels at a time using resource-sharing techniques. You select the granularity and, with that selection, make a trade-off between resource utili
214. se 7.5 degrees
tions are used to create weighting factors used for the bi-linear interpolation.
Weighting the Data
Four pixels are fetched from the frame buffer to create a 2 x 2 matrix containing the source address pixel and its original neighbors to the right, below, and below-right, which we will identify as XY, XpY, XYp, and XpYp, respectively. The source address transform uses the decimal portions of Sx, Sy (represented as Rx, Ry) to create the following weighting equations:
• Weighting XY: Wxy = (1 - Rx)(1 - Ry)
• Weighting XpY: Wxpy = Rx(1 - Ry)
• Weighting XYp: Wxyp = (1 - Rx)Ry
• Weighting XpYp: Wxpyp = Rx Ry
These equations can be shown to be equal to the following equations, which reduce the number of multipliers necessary from four to one:
• Wxy = 1 - Rx - Wxyp
• Wxpy = Rx - Wxpyp
• Wxyp = Ry - Wxpyp
• Wxpyp = Rx Ry
The interpolated pixel value is the summation of the 2 x 2 matrix pixels multiplied by their respective weighting functions. Compare the original image in Figure 3 with an image that was rotated through 7.5 degrees followed by a 7.5-degree rotation in Figure 4.
Conclusion
In this article, I've shown how to use System Generator to explore and implement image processing algorithms. The use of a mathematical simulation tool with image handling capabilities allows you to easily investigate various options in an intuitive capac
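The single-multiplier form of the weighting equations can be checked with a short sketch. The code follows the reduced equations from the text; the function names are illustrative, and floating point stands in for the fixed-point arithmetic an FPGA implementation would use.

```python
def bilinear_weights(rx, ry):
    """Return (Wxy, Wxpy, Wxyp, Wxpyp) using a single multiplication."""
    wxpyp = rx * ry            # the only multiply
    wxyp = ry - wxpyp          # equals (1 - rx) * ry
    wxpy = rx - wxpyp          # equals rx * (1 - ry)
    wxy = 1.0 - rx - wxyp      # equals (1 - rx) * (1 - ry)
    return wxy, wxpy, wxyp, wxpyp


def interpolate(xy, xpy, xyp, xpyp, rx, ry):
    """Weighted sum of the 2 x 2 neighborhood around the source address."""
    wxy, wxpy, wxyp, wxpyp = bilinear_weights(rx, ry)
    return wxy * xy + wxpy * xpy + wxyp * xyp + wxpyp * xpyp
```

The saving applies to the weight computation only; the final weighted sum still multiplies each of the four pixels by its weight. The four weights always sum to 1, so a flat region of the image keeps its value after interpolation.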
215. se services to open new revenue streams, secure profitability, and differentiate their offerings. PSTN gateways are also quite expensive for making IP interconnections. The Newport Networks 1460 session controller solves this challenge. It sits at the edge of the carrier network to enable service providers to interconnect at the IP level. Figure 1 shows how the 1460 supports an IP-IP interconnect, controlling signaling and media streams as they enter and exit the network. Benefits include broadband multimedia interconnection and lower peering costs. Any solution designed to en
flexible. This flexibility will ensure maximum interoperability between network owners while IP standards and protocols continue to change quickly. Some protocols, such as the SIP (Session Initiation Protocol) family, are now quite well defined. Others are more esoteric and continue to evolve. And as some standards achieve de facto status, each new IP service seems to precipitate a flood of competing
Figure 1 - The 1460 session controller supports direct IP-IP network interconnection.
216. self registering for courses online. "I like to search for courses on my own without the help of a registrar because it's faster," he said. "The training catalog also makes it easier for me to keep track of courses I've already taken and find new ones that interest me."

Conclusion

The training catalog is an empowering solution that meets your needs for a personalized, speedy, flexible, low-cost, and quality experience. Find out how Xilinx training can trim your learning curve. Contact Education Services at 877-XLX-CLAS or registrar@xilinx.com, or visit www.xilinx.com/education.

• The training catalog is available to all Xilinx customers; however, registration for instructor-led courses is currently available in the North American region only.

Xcell Journal 97

Board of Education

The UNM FPGA prototype project shows how the Xilinx University Program helps students learn about programmable logic.

by Craig J. Kiet, Graduate Student, University of New Mexico, kiefc@ece.unm.edu

Both universities and corporations share a desire for students and engineers who can easily integrate into high-technology professions after graduation. Collaborative efforts are the key to developing the necessary skill sets. One such collaboration is the recent Xilinx University Program University of New Me
217. services by David Vant VP of Marketing Newport Networks david vant newport networks com IP transport has the potential to unlock an enormous variety of communication opportunities Voice over Internet Protocol VoIP is just the first in an avalanche of powerful IP based services These will include sophisticated messaging storefront and customer relationship management applications and complex and personal ized services for mobile workers home workers and hot deskers e To secure the critical mass of subscribers that will allow this powerful new age to take off IP network owners need a cost effective and flexible interconnect that will support the full diversity of IP services both now and in the future Carrier class robust ness is also mandatory Newport Networks chose the Xilinx Virtex II FPGA architecture to ensure those qualities in its next generation IP IP interconnect solution the Newport Networks 1460 session controller Xcell Journal 101 For scalability and robustness Newport Networks decided to implement a significant proportion of the tunctionality in custom hardware Interconnecting IP Networks The earliest all IP networks relied on con ventional public switched telephone net work PSTN gateways to interconnect with other networks even those with simi lar IP infrastructures But a PSTN gateway cannot support cutting edge IP services Carriers will depend on the
218. sitive applications such as mobile phones, PDAs, and automotive infotainment systems. To enable designs to be brought to market quickly, some Xilinx AllianceCORE third-party IP providers

The LIN interface, whether implemented in programmable logic, ASIC, or ASSP, is approximately half the cost of a CAN node.

LIN Bus Benefits

The reliability of LIN is high, but it does not have to meet the same levels as CAN. A LIN bus is designed to be a logical extension to CAN. It is scalable and lowers the cost of satellite nodes. No crystal oscillator or resonator is required. It is easy to implement, has a low reaction time (100 ms max), and has predictable worst-case timing.

The LIN bus can be implemented using just a single wire, while CAN needs two wires. This means that a LIN network can also be lower in cost through simpler connectors and wiring, thus also reducing the weight of wiring, increasing fuel efficiency, and reducing handling time and manufacturing costs. CAN also needs a 5V supply for the bus, whereas LIN only requires 2V. Table 1 shows the relative merits of LIN versus CAN.

Table 1: CAN versus LIN
Network: CAN. Up to 1 Mbps. Per Node: 2. Requirements: crystal oscillator, two wires, 5V bus supply. Size in Programmable Logic: 348 slices (FPGA).

In summary, LIN offers these benefits:
• Complementary to CAN as an ultra-low-cost sub-network
• Self
219. solutions This trend has led to significant advances in design flows tools and awareness of how to program FPGAs which in turn has made developing the algorithmic ment high performance applications is an extremely demanding task for engineers today Increasingly the demands of space weight and power have led designers portions of a design easier Remove the difficulties of system integration Designers must then begin to integrar the various elements of the overall system f P FPG A d k with one another and interface them to the or SING E or MU Np E SIGNS outside world In a microprocessor system this is generally simple utilizing system level libraries and operating system features In an FPGA design flow it is generally much more complicated especially if you are using more than one device Evidence suggests that developing this inter process communications structure can consume as much as 80 of the development time on a typical project This element of the design is generally not addressed by algorithmic design tools Having experienced first hand how time consuming implementing commu nications in FPGA applications can be at Nallatech we looked for a way to make the process easier developing design tools that we used internally for a few years These early tools and principles formed the basis of DIMEtalk which is now available commercially DIMEtalk allows you to design cus tom
220. ssors Software Edit Compile Loop Figure 3 Development flow with Seehau source level debugger in place Winter 2004 You can look back in time from any execution to follow the path backward or you can use the Seehau event configuration system to specify pre and post triggering complex breakpoints triggers on register reads and writes and triggers on data from the fabric Figure 4 shows a typical source level debug display with processor registers memory data program data in source form and trace and breakpoint status Nohau tools are sold as a system which includes the Seehau debugger an interface pod to the appropriate JTAG connector and the IP Debug TraceBlaze configured with trace memory As a sys tem it may be ordered as EMUL MICROBLAZE PC The Nohau EMUL MICROB LAZE PC provides a 512 frame or 2K frame deep trace with a trigger post trigger count and break control Probe pins may be either 8 or 40 bits wide It will display data connected to it as specified in the XPS MHS file Figure 5 illustrates a trace display in mixed mode with C source and assem bly source intermixed On this single display you can correlate the frame at capture time the execution address the opcode of the instruction execut ed the disassembled MicroBlaze instruction the C source line that gen erated that instruction and 40 bits of data from any logic in the system Data from logic can include signals from your own l
221. station BS Some fixed training bit sequences allow synchronization between the MS and the BS Using the GSM as a design example corresponds very well to our platform s seg mented architecture The main uplink physical layer elements of a GSM speech and data transmission chain are shown in Figure 2 Figure 2 also illustrates how to partition the processing We performed IF processing on the FPGA with functions such as polyphase DDC digital down converter DDS direct digital synthesis and GMSK Gaussian Minimum Shift Keying modu lation DSP based baseband processing can tackle the tasks of encoding encrypting and interleaving as well as burst building functions Finally communication proto col handling occurs at the RISC processor level or in the DSP for simplified proto cols such as in our case Simulink DSP Model The DSP section of the GSM model con tains the following elements e All higher level protocol layers e Baseband processing modules e Data transfer protocol for mapping data onto 32 bit frames before sending to the FPGA through the parallel bus interface Winter 2004 Figure 3 displays the DSP model which contains the baseband processing block and Stateflow diagrams This figure shows a combination of The MathWorks MAT LAB off the shelf functions and target spe cific library blocks This is very typical of a Model Based Design flow in which target specific block performance is compa
222. status display and numerous connections to display blocks can not only read the temperature every ten seconds but perform other similar tasks in the meantime The PicoBlaze processor also provides a quick and easy way to develop control functions The alternative would be to build a custom state machine for each function The PicoBlaze processor is a pro grammable state machine meaning that the state machine is already built one just has to program it It has an intuitive and powerful instruction set and a large code space of 1 024 instructions Programming Winter 2004 ple PC application to download the pro gram code into the block RAM of a configured FPGA We have developed an interface board that connects to the FPGA and has the serial port as well as several seven segment displays to which the PicoBlaze processor can write for debug ging We also allow the selection of different processors so that we can work on multiple processors through the same interface This interface is not only useful for debugging PicoBlaze programs but also for debugging the logic connected to the EMBEDDED SYSTEMS Je processor Because it is so quick and easy to write programs for PicoBlaze proces sors it is very straightforward to write programs to test the various logic circuits attached to the processor We can test each function individually greatly simplifying and accelerating any debugging that becomes necessary A key appli
223. stem while demonstrating a Model Based also known as system level Design flow The target plat form is our SignalMaster DSP FPGA with high speed sampling boards to sample the intermediate frequency IF coming from a special purpose radio frequency RF front end We performed IF processing in the Xilinx Virtex II FPGA and developed the design using System Generator In a complementary manner baseband processing implementation occurs in the DSP using a similar Simulink design flow with the The MathWorks Real Time Workshop C code generator Embedded Target for TI DSP toolbox and LYRtech s GSM DSP libraries We designed protocol oriented transac tions using The MathWorks Stateflow a Winter 2004 tool that allows the graphical design of states and transitions as well as the pro duction of associated code This event driven code can run on the DSP itself or on a companion RISC processor GSM Processing Figure 1 depicts a GSM physical channel There are 124 200 kHz channels that are frequency multiplexed in a 25 MHz wide RF spectrum one for each downlink and uplink path Figure 1 also shows in more detail how the bursts of each GSM chan nel are constructed Basically each burst is part of an 8 slot time division multiplex frame forming a 200 kHz wide spectrum Each one has tail bits and an extended guard interval to avoid interference as long as the mobile station MS is within 35 km of the base
224. synchronization mechanism means no quartz oscillator required
• Low-cost silicon implementation using on-chip UART or SCI
• Single wire, low baud rate, reduced harness cost
• No protocol license fee

Microcontroller Implementation

There are many ways to implement LIN in semiconductors:
• Software bit bashing
• Software UART implementation
• Hardware MCU with dedicated LIN port
• Hardware PLD

Let's look at each way and explore the benefits and pitfalls of each.

Software Bit Bashing

A LIN node can be implemented in many microcontrollers (MCUs) with no additional hardware except for a physical layer driver device. It can be implemented using existing on-chip MCU resources such as timers, GPIO, and interrupts, effectively bit bashing. This type of implementation does have restrictions: designers must adhere to strict real-time programming constraints to meet the full LIN specification. This is expensive with respect to MCU timing and on-chip resources, and leaves very little bandwidth for other application code.

LIN nodes based purely on bit bashing may also be complicated to test, particularly when integrated with existing RTOSs. With this type of implementation, it would be very difficult to achieve accurate bit timing measurement and control, and it may not be power efficient or practical.

Software UART Implementation

LIN was originally conceived to make
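To make the real-time burden of the bit-bashing approach concrete, here is a minimal sketch of what the software has to produce for every byte. It assumes standard UART framing (one start bit, eight data bits sent LSB first, one stop bit), which is the byte format LIN borrows from the UART/SCI. The function names and the 19,200 baud figure are illustrative assumptions, not values from the article.

```python
def uart_frame_bits(byte):
    """Line levels for one byte: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB goes out first
    return [0] + data_bits + [1]


def bit_time_us(baud):
    """Duration of one bit in microseconds at the given baud rate."""
    return 1_000_000 / baud


# Example: the alternating pattern 0x55 at an assumed 19,200 baud.
# A bit-bashed node must toggle or sample the GPIO pin once per bit time.
levels = uart_frame_bits(0x55)
period = bit_time_us(19200)
```

At the assumed 19,200 baud, each of the ten levels in `levels` must be driven or sampled on a grid of roughly 52 microseconds. That is the timing budget a timer interrupt has to meet for every bit, which is why this method leaves little bandwidth for other application code.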
225. synchronously when Simulink awakens the hardware co-simulation block. Let's examine a design that demonstrates switching between free-running and single-step modes.

The Memec P160 analog ADC controller from the Simulink library is built with gateway-ins defined as board-specific I/O ports. When used in a hardware co-simulation block, the Xilinx implementation tools bring these gateways to FPGA input pins, which connect to the P160 analog daughtercard according to location constraints for the target device on the board. Consequently, sampled data from the P160 analog ADC can reach the FPGA

[Figure: ChipScope validation of P160 Analog DAC to ADC loop-back]

As Figure 4 illustrates, we first compile the model on the left for hardware co-simulation and then connect it as shown on the right. We use the Xilinx pause simulation block to switch clock modes of the co-simulation block.

Hardware co-simulation starts in free-running mode as we sample an analog input signal through the P160 ADC at a sampling rate derived from the system clock on the Memec development board. Sampled data fills a FIFO while the Simulink model polls the FIFO's full flag asynchronously. When the full flag goes high, simulation pauses, the clock mode is switched to single-step, and captured data samples are read out from the FIFO to Simulink. The
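The capture sequence described here (fill a FIFO in free-running mode, poll the full flag, then read the samples out one step at a time) can be modeled in a few lines. This is a behavioral sketch in plain Python, not System Generator or Simulink code. `CaptureFifo`, `capture`, and `clocks_per_poll` are hypothetical names, and the assumption that writes to a full FIFO are simply dropped is mine.

```python
from collections import deque


class CaptureFifo:
    """Fixed-depth FIFO with a full flag; writes are ignored once full."""

    def __init__(self, depth):
        self.depth = depth
        self.data = deque()

    @property
    def full(self):
        return len(self.data) >= self.depth

    def write(self, sample):
        if not self.full:
            self.data.append(sample)

    def read(self):
        return self.data.popleft()


def capture(adc_samples, depth, clocks_per_poll):
    """Return (captured samples, number of polls needed to see the full flag)."""
    fifo = CaptureFifo(depth)
    source = iter(adc_samples)
    polls = 0
    # Free-running mode: the hardware clock runs clocks_per_poll cycles
    # between two polls of the full flag by the (slower) simulation.
    while not fifo.full:
        for _ in range(clocks_per_poll):
            fifo.write(next(source))
        polls += 1
    # Single-step mode: one sample is read out per simulation step.
    captured = [fifo.read() for _ in range(depth)]
    return captured, polls
```

Because the poll is asynchronous to the hardware clock, the hardware may run a few extra cycles after the FIFO fills. The model drops those writes, so the captured block is still contiguous.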
226. t and slow mode with 1 LSB settling times of 3.5 µs or 8 µs, respectively, and supply currents of 750 µA and 450 µA in the two modes. The LTC1654 also has shutdown capability, power-on reset, and a clear function to 0V.

High-Performance DAC Module

The Spartan-3 2000 evaluation platform includes an interface to Intersil high-speed DAC (ISL5x29EVAL1) evaluation modules (see Table 2). You can develop high-performance DSP applications such as quadrature transmit with an IF range of 0-80 MHz and medical test instrumentation and equipment. You can evaluate the Intersil technology in conjunction with the Xilinx Spartan-3 FPGA in high-performance DSP designs. Figure 2 illustrates how you can load a DSP application, provide signal insertion, and measure the output waveform.

Conclusion

Today's semiconductor industry is largely driven by new technologies that are barely available and aggressive development schedules that utilize these new technologies. Suppliers and distributors typically deliver boxes of building blocks, and if the design engineer works many extra hours assembling the pieces, they may wonder if this new technology can even perform the task at hand. Nu Horizons and Xilinx have looked closely at the challenge of limited resources, tight schedules, and dramatic learning curves and built an approach with TechOnLine's VirtuaLab that provides you with a time-saving, innovative
227. taVista Linux an embedded Linux supporting real time functionality multi processes and multi threading You can operate VW stand alone independent of any host computer or as an add in board driven by a host sys tem On Sun Solaris Microsoft Windows Wind River Systems VxWorks or Linux based host systems graphic drivers allow VW to function as primary or secondary display An API pro vides control of basic VW functions Building the l Frame Decoder We had a clean room software decoder developed from the MPEG specification available in house at the start of the project We partitioned the I frame decoding functions into mod ules and did software profiling and hardware simulation to determine how to distribute the modules across the FPGA hardware and PPC software Integration with VW FPGA Internals vcs We connected the I frame decoder Scaling inside the FPGA as a standard video and 7 a Display Put Figure 4 shows a portion of the VW FPGA internals and how the decoder s two ports connect to the pre existing circuitry The EPB port carries encoded data tables and con trol register setup data from the PPC The CCIR 656 video out port con nects to a video multiplexer that selects between all of the video inputs This allows us to re use the existing design s video storage circuitry to move frame data into video memory and ultimately to the display Because the I f
228. tandard processing block is implemented on each card type. Around this hardware block, we can quickly configure a line interface card (LIC) by simply adding network processing blocks. The LIC processor performs header and packet stripping, packet analysis, traffic classification, and other processing. The session controller accommodates as many as 12 LICs, allowing easy scaling to support rapid subscriber growth. The Newport Networks 1460 is capable of supporting as many as 100,000 simultaneous toll-quality VoIP calls.

Alternatively, by combining the processing core with switching blocks instead of network processing blocks, we can quickly configure a switching card. These are dual redundant cards that also include the switching fabric on board.

The processing hardware is implemented in an array of four PowerPC processors, each accompanied by high-performance FPGAs that deliver hardware acceleration and provide the flexibility to react to future changes in IP protocols, services, and business models. Hardware-accelerated functions implemented in the FPGAs include:
• Data plane integrity checking and statistical gathering
• Packet segmentation and reassembly on either side of the switch fabric
• Checksum assist
• Time-critical functions such as packet analysis are unloaded to the FPGAs

The power of this configuration means that hardware assist such as payload string search is also an option.
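As an illustration of the kind of work a "checksum assist" takes off the processor, the sketch below computes the 16-bit one's complement Internet checksum used by IPv4, UDP, and TCP headers (RFC 1071). The article does not say which checksums the 1460 accelerates, so treating it as the Internet checksum is an assumption, and `internet_checksum` is an illustrative name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit word
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example IPv4 header with its checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
```

The loop is a per-word add with carry folding, which maps naturally onto a small adder in FPGA fabric running at line rate. In software the same loop touches every byte of every packet. Recomputing the checksum over a header that already contains a valid checksum yields zero, which is how a receiver verifies it.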
229. te e Assign to Devices assign the whole design to FPGAs using a drag and drop feature e Code Generation automatic code generation for design e Synthesis using standard synthesis tools e Implementation using the Xilinx ISE software tool flow Figure 4 Hardware block diagram for example system Winter 2004 Forthcoming developments in future releases of DIMEtalk will include additional interface support and links directly into algorithm development tools Application Architecture Example Lers look at DIMEtalk in an application context The easiest approach is a very high level one to avoid getting caught up in the details of a potential system the focus being on the overall architecture rather than low level functionality A typical application might include the following e VME form factor e Multiple high density platform FPGAs e High speed external analog interfaces e High speed synchronous SRAM memory e Gigabit Ethernet interface This relatively complex hardware config uration is shown in Figures 3 and 4 In this case the system comprises commercial off the shelf hardware products From a func tional perspective the algorithm processing blocks that would perform the function of this application reside within the FPGAs VME Interface Edge Co Router Bridge Node You can develop these algorithm blocks using the design entry flow of your choice including
230. tem Generator for DSP to design a reconfigurable video encryption system in less than two weeks. The system enables their customers to re-verify an entire system when changing components and interfaces without any knowledge of VHDL.

"With this design flow, we efficiently implemented our system and algorithms with a significant improvement on traditional design times, without sacrificing performance," says Daniel Denning, a research engineer with Nallatech. "Coding in VHDL would have taken us three times as long."

Simulink 6

In June 2004, The MathWorks introduced Simulink 6, which increases performance, responsiveness, modeling fidelity, and workflow efficiency when modeling large systems. Simulink 6 also extends the scope of Model-Based Design to new domains and applications. These enhancements include:
• Component-based modeling for large-scale systems, including the ability to simulate, test, and implement each design component independently
• Unified data management for model and signal parameters across component models, including a graphical model explorer tool

Figure 5: Frequency specification (top) and Simulink model of a fixed-point CIC filter (bottom) for a digital down converter (left) in a software-defined radio receiver front end

Simulink Verification and Validation, which links models to
231. test data rate the throughput from the processor to the FPGA is your primary concern, using the MicroBlaze core with the FSL bus or the PowerPC with its on-chip memory (OCM) interface will provide the highest possible performance for streaming data between software and hardware components.

By using CoDeveloper and the Impulse C libraries, you can make use of multiple streaming software/hardware interfaces using a common set of stream read and write functions. These stream read and write functions provide an abstract programming model for streaming communications. Figure 2 shows how the Impulse C library functions support streams-based communication on the software side of a typical streaming interface.

Figure 2: A software-based unit test operating on the embedded processor communicates with the hardware unit under test through data streams.

Moving Test Generators to Hardware

To maximize the performance of test generation software routines, you can migrate critical test functions such as stimulus generators into hardware. Rather than re-implementing such functions in VHDL or Verilog, auto
232. ticle from your local Avnet sales office, or obtain additional information from the Avnet DSP Startingline website at www.em.avnet.com/dspstartingline/

Table 1: DSP demos in Xilinx System Generator

• 16-QAM demodulator for software-defined radio
• A QAM system with packet framing and FEC for telemetry channels
• Concatenated FEC codec for DVB standard
• Costas loop carrier recovery
• Digital down converter for GSM applications

Signal Processing
• A/D and delta-sigma D/A conversion
• FFT/IFFT in streaming mode
• LMS-based adaptive equalization
• Custom FIR filter reference library
• Polyphase 1:8 MAC-based FIR using SRL16Es
• IR filtering: multi-channel folded implementation
• IR filtering: 2nd-order Direct Form implementation

Image Processing
• 2D DWT filter
• 2D filtering using a 5 x 5 operator
• Color space converter

Mathematical Operators
• CORDIC-based rectangular-to-polar coordinate converter
• CORDIC-based divider circuit
• CORDIC-based sine and cosine function

Control Logic
• Debugging a PicoBlaze microcontroller design

Using System Generator for DSP to Create the J.83 Cable Modulator

System Generator enables the rapid development of multi-channel cable head-end modulators to provide a true low-cost solution.

by Veena Kumar, Staff Design Engineer, Xilinx, Inc., veena.kumar@xilinx.com
Hemang Parekh, Se
233. time prototyping and deployment in the target system. The strategic partnership between Xilinx and The MathWorks has brought automatic hardware code generation capabilities to Xilinx FPGAs.

Continuous Test and Verification

You can ensure quality throughout the development process by integrating tests into the models at any stage and quantifying test coverage of the model. This continuous verification and simulation helps identify errors early, when they are easier and less expensive to fix, and streamlines the final verification stage.

The system model, or "golden reference," can serve as the test bench for the hardware or software implementation, which you can verify through software- or hardware-in-the-loop co-simulation.

Applications of Model-Based Design

Model-Based Design can accelerate and simplify the development of many technologies. These examples are a small subset of the many applications available on The MathWorks website.

UWB Wireless

The range of ultra-wideband (UWB) links is limited by the requirements for low-power, high-speed, and low-cost implementation. Fixed-point word length and scaling have a direct impact on hardware size, cost, and signal-to-noise ratio (SNR) degradation.

Using Simulink, the 10-bit orthogonal frequency division multiplexing (OFDM) transceiver for UWB shown in Figure 3 was designed in a few days. The receiver operates with a 0.5 dB degradation in signal-to-noise ra
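The link between fixed-point word length and SNR can be explored with a small experiment. The sketch below quantizes a test sine wave to a signed fixed-point grid and measures the resulting signal-to-quantization-noise ratio. It is a plain Python illustration of the trade-off, not the article's Simulink model. The test signal, its amplitude, and the function names are assumptions.

```python
import math


def quantize(x, word_length, full_scale=1.0):
    """Round x to a signed fixed-point grid of word_length bits, with saturation."""
    levels = 2 ** (word_length - 1)
    step = full_scale / levels
    q = round(x / step)
    q = max(-levels, min(levels - 1, q))     # saturate to the representable range
    return q * step


def quantization_snr_db(signal, word_length):
    """Ratio of signal power to quantization error power, in dB."""
    signal_power = sum(s * s for s in signal)
    error_power = sum((s - quantize(s, word_length)) ** 2 for s in signal)
    return 10.0 * math.log10(signal_power / error_power)


# Assumed test signal: a sine at 90% of full scale, non-integer frequency.
signal = [0.9 * math.sin(2 * math.pi * 0.01237 * n) for n in range(4096)]
snr_by_word_length = {bits: quantization_snr_db(signal, bits) for bits in (8, 10, 12)}
```

Each extra bit halves the quantization step and so buys roughly 6 dB of SNR, while widening every multiplier and register in the datapath. Sweeping the word length in simulation, as described for the UWB transceiver, is a way to find the shortest word length that still meets the degradation budget.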
234. tio relative to a floating-point reference model.

The optimal word length was determined through simulation over a range of word lengths and channel conditions to evaluate trade-offs between chip size and wireless range. The results are shown in Figure 4. The transceiver operates within a complete end-to-end system model that serves as both an executable specification and a test harness for verifying downstream implementation.

Digital Down Converter for Software-Defined Radio

FPGAs are being used to perform high-data-rate signal processing in many emerging software-defined radio applications. A typical application is the digital down converter (DDC), which is a sequence of multi-rate filters that decimate the RF signal down to the baseband rate. The design challenge is to design an architecture for each filter stage that optimizes the trade-offs among word length, computational delays, and accuracy of the overall filter response to avoid aliasing and other unwanted numeric effects.

The Simulink model of the cascaded integrator comb (CIC) filter used in the DDC, shown in Figure 5, was automatically generated from the MATLAB filter specification. Models such as this can provide a reference design for developing optimized Xilinx FPGA implementations with Xilinx System Generator.

Reconfigurable Encryption System

Nallatech, a provider of high-performance FPGA systems, used Simulink and Xilinx Sys
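For readers unfamiliar with the CIC structure mentioned above, here is a minimal behavioral sketch of a CIC decimator: a chain of integrators at the input rate, a rate reduction, and a chain of combs at the output rate. It is plain Python with unbounded integers, not the generated Simulink model. The function name, the differential delay of 1, and the example parameters are assumptions.

```python
def cic_decimate(samples, rate, stages):
    """N-stage CIC decimator (differential delay 1) on integer samples."""
    integrators = [0] * stages      # running sums, updated at the input rate
    comb_delays = [0] * stages      # previous comb inputs, updated at the output rate
    out = []
    for n, x in enumerate(samples):
        acc = x
        for i in range(stages):     # cascade of integrators
            integrators[i] += acc
            acc = integrators[i]
        if (n + 1) % rate == 0:     # keep every rate-th sample
            y = acc
            for i in range(stages): # cascade of combs: y[m] - y[m-1]
                y, comb_delays[i] = y - comb_delays[i], y
            out.append(y)
    return out


# A constant input settles to the DC gain rate**stages (4**3 = 64 here).
result = cic_decimate([1] * 40, rate=4, stages=3)
```

The DC gain of `rate**stages` is what drives the word-length trade-off described in the text: each register must grow by about `stages * log2(rate)` bits over the input width to hold it. In hardware the integrators use wrapping two's complement arithmetic of that width rather than the unbounded integers used here.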
235. tion HD-resolution encoding solution, system architects often employ multiple FPGAs and programmable DSPs. To illustrate the enormous computational complexity required, let's explore the typical run-time cycle requirements of the H.264/AVC encoder based on the software model provided by the Joint Video Team (JVT), comprising experts from ITU-T's Video Coding Experts Group (VCEG) and ISO/IEC's Moving Picture Experts Group (MPEG).

Using Intel VTune software running on an Intel Pentium III 1.0 GHz general-purpose CPU with 512 MB of memory, achieving H.264/AVC SD with a main profile encoding solution would require approximately 1,600 BOPS (billions of operations per second).

Table 1 illustrates a typical profile of the H.264/AVC encoder complexity based on the Pentium III general-purpose processor architecture. Notice that in Table 1, motion estimation, macroblock/block processing (including mode decision), and motion compensation modules are the primary candidates for hardware acceleration.

Table 1: H.264/AVC encoder complexity profile by files
Columns: Functional Blocks; % of Total Run-Time Cycles
Functional blocks listed: mv_search.c, block.c, macroblock.c, biariencode.c, memcpy.asm, abs, rdopt_coding_state.c, loopFilter.c
Run-time values that survive: 67.31, 0.03

However, computation complexity alone does not determine if a functional module should be mapped to hardware or remain in software. To evaluate the viab
236. tional FPGA applications that may also require some DSP capability.

Reduced Power Consumption

In today's infrastructure applications, driving down cost per channel is not the only goal diligently pursued. Wireless infrastructure manufacturers are under increasing pressure to stay within power limits imposed by governing telecom standards. The integrated XtremeDSP slices on Virtex-4 FPGAs minimize the need to use logic slices for many signal processing and arithmetic tasks

device, providing you with as much as a staggering 10X increase in the available GMACs per dollar (see Figure 1). This dramatically extends production volumes where it makes economic sense to use an FPGA for performance-centric signal processing applications. The new DSP slices also dramatically reduce power consumption, allowing you to drive down both cost and power per channel.

DSP to Logic Resources

At the heart of the Virtex-4 FPGA's signal processing resources are new, highly integrated XtremeDSP slices, sometimes referred to as DSP48s (Figure 2). Depending on the family member, you can utilize as many as 512 XtremeDSP slices, each capable of providing 500 MHz throughput. Each slice contains a dedicated 2's complement signed 18 x 18 bit multiplier and

plication accumulation (MACC) functions, MACC cascading, wide 48-bit addition, and wide multiplexing. Configuration wizards in Xilinx ISE or System Ge
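The arithmetic of a multiply-accumulate built from an 18 x 18 signed multiplier and a 48-bit adder can be modeled behaviorally. The sketch below is plain Python, not HDL and not a model of the slice's actual ports or pipeline. `to_signed` and `macc` are illustrative names, and the wrap-around behavior on accumulator overflow is an assumption about how a 48-bit two's complement register behaves.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def macc(coeffs, samples, operand_bits=18, acc_bits=48):
    """Sum of products with 18-bit signed operands and a wrapping 48-bit accumulator."""
    lo = -(1 << (operand_bits - 1))
    hi = (1 << (operand_bits - 1)) - 1
    acc = 0
    for c, x in zip(coeffs, samples):
        if not (lo <= c <= hi and lo <= x <= hi):
            raise ValueError("operand does not fit in signed 18 bits")
        acc = to_signed(acc + c * x, acc_bits)   # 36-bit product into 48-bit sum
    return acc


# Example: one output of a 3-tap FIR filter.
y = macc([1, 2, 3], [4, 5, 6])
```

A full-scale 18 x 18 product needs at most 36 bits, so a 48-bit accumulator leaves 12 guard bits. That is room for thousands of worst-case products to be summed before the accumulator can wrap, which is why long FIR filters can be folded onto one multiplier without intermediate rounding.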
237. ts this availability chal lenge with a network of daughter sites that i aca Global Bus IF 1 Global Bus IF2 im Bus IF4 Watchdog Global Timer 4 McBSP SHB RS485 To From LVDS Drivers Receivers Figure 3 Interconnections diagram of the digital modules inside the SMT 148 Data rates on this port are in excess of 100 MHz 400 Mbps and are useful for transferring sampled 16 bit I and Q Processing data streams can take place either in the embedded PowerPC in the Virtex II Pro device or throughout an array of other add on Virtex II Pro FPGA based modules with embedded PowerPC The FPGA on the SMT148 carrier card is connected to many different devices and therefore has many internal interfaces that allow it to exchange data or commands with the external world All interfaces are reset at power on when applying a manual reset Figure 4 shows the interconnections between the digital modules inside the FPGA 28 Xcell Journal you can use for additional resources such as signal processors reconfigurable computing modules and Sundance s large family of add on modules These add on modules include a variety of embedded system options such as reconfigurable modules with tightly coupled Virtex II Pro FPGAs and DSPs digital and analog converters data conversions trans ceivers and I Os of all types Designed for Developers Powered by an external supply the SMT 148 platform has an impressive topol og
238. ture webinars and demonstrations References 1 Horgan J March 29 2004 Hardware Software Co verification EDA Caf Weekly 2 Krasner J January 2004 Model Based Design and Beyond Solutions for Today s Embedded Systems Requirements Embedded Market Forecasters American Technology International Framingham Mass Xcell Journal 69 DIGITAL SIGNAL PROCESSING DSP Intertacing Simulink to the Analog World The P160 Analog Module adds analog 1 0 to Memec Xilinx development boards 0 Xcell Journal by Luc Langlois DSP Specialist Memec Insight luc_langlois ins memec com If your product performs digital signal pro cessing it probably must interface to real world analog signals To avoid surprises it s best to introduce analog I O early in the design process the ideal point is in the modeling phase with Xilinx System Generator for DSP under Simulink from The MathWorks Consider the development of a digital QPSK demodulator in a software defined radio The FPGA performs several signal processing tasks including carrier recovery DPLL digital phase locked loop down conversion to baseband down sampling pulse shaping and symbol timing recovery You may wish to compare simulated demodulator performance against the real thing by injecting a heterodyned analog signal with noise through an ADC analog to digital converter to the FPGA running at full
239. tween the processor and the LocalLink Gigabit Ethernet MAC peripheral The MPMC and CDMAC can be leveraged for PowerPC based embedded applications where high bandwidth access to DDR SDRAM memory is required For more information about XAPP536 and XAPP535 visit www xilinx com gsrdl wi Associated Links Xilinx XAPP536 Gigabit System Reference Design http direct xilinx com bvdocs appnotes xapp 536 pdf Xilinx XAPP535 High Performance Multi Port Memory Controller http direct xilinx com bvdocs appnotes xapp535 paf Treck Inc www treck com MontaVista Software www mvista com End System Optimizations for High Speed TCP www cs duke edu ari publications end system pdf Use sendfile to optimize data transfer Attp builder com com 5100 6372 10441 12 html Winter 2004 EMBEDDED SYSTEMS M SS Peis by Gordon Cameron Business Development Manager Accelerated Technology Mentor Graphics gordon _cameron mentor com Accelerated Technology s Nucleus E Q U E US PLUS real time operating system RTOS is already available for both the Xilinx MicroBlaze 32 bit soft Mk processor core and the IBM d n Inx PowerPC 405 core integrated into Virtex II Pro devices This determin istic fast small footprint RTOS is ideal for hard real time applications Nucleus PLUS RTOS for MicroBlaze and PowerPC 405 processors With the release of the Xilinx Platform
240. ule is ideal for interfacing FPGAs to external analog signals. We have shown example techniques to quickly get you started to capture, process, and produce analog signals in your DSP applications under Simulink. As your needs evolve to meet customer demand for ever-increasing analog I/O performance, expect Memec's next generation of its analog module to deliver faster sampling rates and higher resolution within a consistent support framework in System Generator for DSP and Simulink. The P160 Analog Module specs, Memec Xilinx DSP Simulink library, and reference designs are available to all current P160 analog customers. You can obtain the free download through the Memec Reference Design Center at http://legacy.memec.com/solutions/reference/xilinx. Increased visibility with FPGA dynamic probe. Intuitive Windows XP Pro user interface. Accurate and reliable probing with soft-touch connectorless probes. 16900 Series logic analysis system prices starting at $21,000. Agilent Direct: Get a quick quote and/or FREE CD-ROM with video demos showing how you can reduce your development time. U.S. 1-800-829-4444, Ad 7909; Canada 1-877-894-4414, Ad 7910. www.agilent.com/find/new16900, www.agilent.com/find/new16903quickquote. Agilent Technologies, Inc. 2004. Windows is a U.S. registered trademark of Microsoft Corporation. Now you can see inside your FPGA designs in a way that will save days o
241. ulti-Gigabit Serial Interconnect Requirements; Next-Generation SONET/SDH Networks; Successful 10 Gbps Backplane Design using Xilinx FPGAs (Ansoft, Xilinx); Enabling the Jump from Megabit to Multi-Gigabit I/O Design using Mentor Graphics Signal Integrity Analyses Solutions (Mentor Graphics). Designing high-performance DSP systems: The FPGA Platform Radio: The Enabler for B2G and 4G Communication Systems; Accelerating Productivity through New DSP Design Techniques for FPGAs; Accelerate High-Performance Real-Time Video & Imaging Applications with an FPGA and a Programmable DSP; Harness the Power of the Virtex-4 FPGA XtremeDSP Slice and Get the Highest Performance DSP Functionality. Logic: New tools, techniques, and architectural features: Spartan-3 Low-Cost Design Techniques; HDL Coding Techniques to Exploit the New Capabilities of Virtex-4 FPGAs; Physical Synthesis (Synplicity); Memory Interface Solutions with Virtex-4 FPGAs; Dramatically Accelerate In-Circuit Debug of FPGA-based Systems (Agilent). Processors: A comprehensive embedded solution: Implementation and Debug of a Dual PowerPC System with Floating-Point Co-Processor; FSL SW Acceleration with MicroBlaze or Multi-Channel Data Acquisition; Accelerating Development of RTOS-based Embedded Systems with Integrated Embedded Tools; Ethernet Solutions for Xilinx Processors. Presented by: Mentor; Texas; Reed Electroni; Xilinx; Agilent Techn
242. um DCM circuits enable flexible generation of multiple clock domains with differential signaling, supporting frequencies of up to 500 MHz and 40% less jitter than previous circuitry. In addition, Virtex-4 devices are the only FPGAs to provide differential clocking networks, a key advantage in implementing precision clocks with minimal skew and jitter. Virtex-4 FPGAs further enhance clock management with phase-matched clock dividers (PMCD) that provide improved handling of multiple synchronous clock domains. These circuits, together with enhanced software support, give designers precise edge control and frequency synthesis capabilities, enabling the generation of high-quality clock networks. Power Advantage: Virtex-4 FPGAs reduce power with a combination of techniques. By using triple-oxide technology, Xilinx can make trade-offs between speed and leakage that reduce static power consumption by 40%, as we build transistors with different gate oxide thicknesses for configuration, interconnect, and I/O. This technology enables us to offset and even reverse the increase in ... FEATURES. ASMBL Architecture Enables Cost-Optimized Platforms: With traditional FPGA architectures, increasing the size of the devices to meet the demands for greater logic capacity and more memory typically results in parallel scaling of all the advanced features on the die, rapidly increasing cost. T
243. ures are available on PowerPC Virtex-II Pro devices with the industry-standard XRAY debugger. To bring software development forward in time, so that it can be started before the hardware is complete, software teams can use our advanced prototyping products, Nucleus SIM or Nucleus SIMdx. These tools allow the development of the complete application software in a host-based environment. UML enables software teams to raise their level of abstraction and produce models of their software. Nucleus BridgePoint enables full code generation by using the xtUML subset of UML 2.0. You can verify software-hardware interaction in the Mentor Graphics Seamless co-verification environment, which allows combined hardware and software simulation for PowerPC Virtex-II Pro devices. Figure 3b: Nucleus appears as an option in the drop-down menu choosing which operating system to use with the MicroBlaze soft-core processor. Figure 4: Enabling the cache in the RTOS configuration parameters. These tools, when combined with the Nucleus PLUS RTOS, are ideal for helping you maximize the functionality and efficiency of your designs. Conclusion: The latest EDK-configurable Nucleus PLUS RTOS brings a new dimension to systems incorporating high-performance embedded processors from Xilinx. Its small size means that it can use available on-chip
244. Figure 2: Input distorted channel of the QAM demodulator. Figure 3: Output constellation of the QAM demodulator. Figure 4: System Generator GUI interface. Figure 5: Phase offset slider bar. points in a format well understood by the DSP architect, such as sinewave or constellation. You can also automatically generate this exact same test bench and export it to an HDL simulation tool, in this case in a format well understood by the FPGA designer: binary waveform. Also automatically generated is golden data confirming similar functionality in the Simulink and HDL environments. In the QAM example, we chose to display the distorted channel on the output of the channel model (Figure 2) and the 16-point constellation on the output of the QAM demodulator (Figure 3). Note that the automatic generation of the test bench and golden data saves an enormous amount of time over other design flows. Proceeding with the design implementation, you can now generate the netlist through the System Generator token (Figure 4). At the push of a button, you can decide to generate a VHDL netlist, which w
245. use of existing UARTs within standard MCUs, along with on-chip timers, GPIO, interrupts, and serial ports. Hardware: MCU with Dedicated LIN Port. An MCU with a dedicated LIN port may appeal to more designers, as it uses off-the-shelf, verified silicon. Thus it will not burden the software application with LIN protocol processing, as shown in the previous examples. This type of micro is well suited for CAN-to-LIN bus bridging applications, where a need exists to pass data between the two networks. This implementation also tends to be less power-hungry than the equivalent software solution. As with most emerging networks, however, the availability of silicon and relatively high cost may be an issue and create long lead times, so forward planning is a must with respect to ordering devices. One of the potential downfalls of using these devices is when more than one LIN is needed. For example, in an ECU gateway you may need to use more than one MCU, which will impact part costs, manufacturing costs, stocking costs, and PCB complexity. If your design requires something outside of the specification provided by the silicon vendor, this may also cause issues. A LIN node can be implemented in many microcontrollers (MCUs) with no additional hardware except for a physical-layer driver device. This is a better way of implementing than simply bit-bashing, but may have certain limitations in designs that already us
246. ve linear and decision feedback equalization (DFE); 8b/10b and 64b/66b encode/decode; SONET jitter compliant at OC-12 and OC-48 line rates. Table 1: RocketIO features at a glance. vide comprehensive equalization techniques to ensure signal integrity in a wide variety of applications (Table 1). These advanced equalization techniques enable engineers to give new life to old systems by upgrading legacy backplanes. In addition, Virtex-4 FX devices include built-in Ethernet connectivity, enabling seamless chip-to-chip connections without consuming programmable logic resources. The Ethernet MAC core supports 10/100/1000 Mbps data rates with UNH-verified standards compliance and interoperability. High-Performance DSP: Developers told us they need to achieve higher DSP performance targets to implement next-generation applications such as MPEG-4 video compression/decompression and multi-channel mobile communications. Scaling existing DSP implementations to meet these targets with multiple programmable DSPs or dedicated ASIC hardware can be prohibitively expensive. Designers also need to control system power consumption as they squeeze more functionality into smaller form factors. To address new DSP performance requirements, Xilinx crafted the versatile XtremeDSP slice, providing twice the DSP performance of previous implementations while drawing less than 1/7th of the power. Although all Virtex
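The two line codes named in the RocketIO feature list differ mainly in how much of the serial line rate is left for payload. The sketch below works that arithmetic out; the code ratios (8 payload bits per 10 line bits, 64 per 66) follow from the code names, while the function names and the example line rates are assumptions chosen for illustration and are not taken from the article.

```python
from fractions import Fraction

# Payload bits carried per line bit for each code (ratio implied by the name).
LINE_CODES = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
}


def payload_rate(line_rate_bps: int, code: str) -> Fraction:
    """Payload bit rate left after line coding, as an exact fraction."""
    return line_rate_bps * LINE_CODES[code]


def overhead_percent(code: str) -> float:
    """Extra line bits sent per payload bit, in percent."""
    return float((1 / LINE_CODES[code] - 1) * 100)


if __name__ == "__main__":
    # Hypothetical line rates, used only to exercise the functions.
    print(payload_rate(3_125_000_000, "8b/10b"))
    print(payload_rate(10_312_500_000, "64b/66b"))
    print(overhead_percent("8b/10b"), overhead_percent("64b/66b"))
```

Exact fractions are used instead of floats so that the payload rates come out as whole numbers where they should.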
247. ve way to bring designs to market quickly and also allow design flexibility. A LIN network comprises one master node and one or more slave nodes. All nodes include a slave communication task that is split into a transmit and a receive task, while the master node includes an additional master transmit task. The communication in an active LIN network is always initiated by the master task: the master sends out a message header that comprises the synchronization break, synchronization byte, and message identifier. Exactly one slave task is activated upon reception and filtering of the identifier, which starts the transmission of the message response. The response comprises two, four, or eight data bytes and one checksum byte. The header and the response part form one message frame. The identifier of a message denotes the content of a message but not the destina ... broadcasting messages from the master to all nodes in a network. The sequence of message frames is controlled by the master and may form cycles, including branches. Flexible LIN Solution: Programmable logic has long been accepted as an effective way to bring designs to market quickly and also allow design flexibility right up to production and beyond. Historically, this time-to-market advantage and flexibility had to be balanced with higher component costs. But times have changed: PLDs cost much less and can now be used in high-volume, cost-sen
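The frame layout described above (header with synchronization byte and identifier, then a response of two, four, or eight data bytes plus one checksum byte) can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the identifier parity bits and the "classic" inverted-sum checksum are taken from the LIN specification rather than from this text, the function names are hypothetical, and the synchronization break is left out because it is a bus condition rather than a data byte.

```python
SYNC_BYTE = 0x55  # fixed synchronization byte that follows the break


def protected_id(frame_id: int) -> int:
    """Return the 6-bit identifier with two parity bits in bits 6 and 7."""
    if not 0 <= frame_id <= 0x3F:
        raise ValueError("LIN identifiers are 6 bits wide")

    def bit(n: int) -> int:
        return (frame_id >> n) & 1

    p0 = bit(0) ^ bit(1) ^ bit(2) ^ bit(4)
    p1 = 1 ^ (bit(1) ^ bit(3) ^ bit(4) ^ bit(5))  # inverted parity
    return frame_id | (p0 << 6) | (p1 << 7)


def classic_checksum(data: bytes) -> int:
    """Inverted 8-bit sum of the data bytes, adding each carry back in."""
    total = 0
    for byte in data:
        total += byte
        if total > 0xFF:
            total -= 0xFF  # drop the carry bit and add 1
    return (~total) & 0xFF


def build_frame(frame_id: int, data: bytes) -> bytes:
    """Header (sync byte, identifier) followed by response (data, checksum)."""
    if len(data) not in (2, 4, 8):
        raise ValueError("response carries two, four, or eight data bytes")
    header = bytes([SYNC_BYTE, protected_id(frame_id)])
    response = bytes(data) + bytes([classic_checksum(data)])
    return header + response


if __name__ == "__main__":
    print(build_frame(0x01, bytes([0x4A, 0x55, 0x93, 0xE5])).hex())
```

A slave task would run the same two helper functions in reverse: filter on the received identifier, then recompute the checksum over the data bytes and compare it with the last byte of the response.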
248. ver 80% of that market. According to market estimates, the DSP market addressable by FPGAs is expected to grow to more than $3 billion by 2007. So as you can see, the future looks very bright for Xilinx as the demand for very high-performance DSP continues to grow. We are well positioned to provide the devices, the development tools, and the support services to meet this growing demand. Embedded Processing: We are relatively new to the embedded processing market; three years ago we introduced our Virtex-II Pro family, which includes an embedded hard-core IBM PowerPC processor. Although it took a while for the idea to catch on, we now have thousands of design wins using our embedded processors. And in addition to the PowerPC processor, we now offer our 32-bit MicroBlaze and the 8-bit PicoBlaze soft-core processors. All of these embedded processors work together, using the same peripherals and IP, so you can easily create complete, high-performance, multi-processor systems on a single low-cost chip. The total embedded processor market is very fragmented because there are multiple architectures and multiple operating systems. Customers tend to stay with a known architecture because of their long-term software investment; no one wants to re-code and re-port their designs to a new architecture. That's one reason why we chose the PowerPC as our high-performance proces s
249. w seconds. Figure 3 (flow diagram labels): BlazeGen or User Source; User Code; BlazeGen or BSB; Retargeting; GNU GCC; Hardware Edit-Compile Loop. A second path for code development is shown in Figure 3. BlazeGen generates small, pre-tested code snippets that fit entirely in one on-board block RAM to provide you with a solid starting place for initial power-up check-out. These snippets are treated just like user code for input to the GNU compiler. You can enter and compile C/C++ code from inside XPS or from an external editor and compiler using its own make files. For large programs, I recommend using an external GNU make facility. The output from the compile process is an .elf file that contains all code and symbolic information to be loaded directly by Seehau. As shown in Figure 3, the classic edit-compile-debug loop familiar to embedded system engineers centers around the Seehau debugger. Additionally, a hardware edit-compile-debug loop is now included that loops back through new builds in XPS. Debugging with Seehau: Seehau provides an intuitive source-level debugger that can be made aware of logic signals in the fabric, RTOS state and variables, correlation of hardware signals to code execution, and Ethernet performance characteristics in Internet-aware applications. Seehau is a full-featured source or assembly debugger with an integral real-time trace facility. It supports either PowerPC hard-core or MicroBlaze soft-core proce
250. with the same capabilities you would have as if the board was connected to your local PC. Additional functionalities include the ability to: reset the platform; power-cycle the platform; view a Web cam display of the VirtuaLab; control voltage; view a component description; observe LEDs; utilize a virtual touch screen. The remote target control gives you the ability to stimulate and observe the hardware, and can best be described as providing access to all the features of the hardware by having it on the Windows desktop. During a VirtuaLab session, you will have exclusive use of the evaluation platform and access to a complete Integrated Development Environment (IDE) that includes sample Xilinx high-performance DSP application software, as well as the ability to compile and upload code or any other application. An innovative support feature called shadowing allows Nu Horizons field applications engineers to log into your session, providing you with technical assistance if needed. The Nu Horizons Xilinx VirtuaLab: Architects and designers can reserve exclusive blocks of time by simply logging onto VirtuaLab.TechOnLine.com to register. You can select one of three types of interactive VirtuaLab experiences: 1. New Xilinx user: Although technically astute, you have not been exposed to either the development tool flow or the Xilinx FPGA
251. works, visit www.mcnc-rdi.org. The DN6000k10 supports up to 9 2vp100 Virtex-II Pro FPGAs ... amount of FPGA-to-FPGA interconnect for easy logic partitioning. FPGAs are interconnected with RocketIOs, enabling the movement of data between them at 100s of GB/s. In addition to 6M gates, the DN6000k10 also packs on board: 2 PowerPC cores per FPGA (400 MHz); up to 8 MB embedded RAM; 444 18x18 multipliers per FPGA; 12 external 133 MHz 32M x 16 DDR SDRAMs; 5 4Mx16 FLASH; 480 connections for daughter card and logic analyzer interfaces. Configuration is fast, easy, and robust using a SmartMedia-based FLASH card or via the USB interface. Every tool, utility, driver, and support application that The Dini Group could imagine you might need is included. Please contact us for complete specifications; we are eager to show you how our hard work can make your job easier. 1010 Pearl Street, Suite 6, La Jolla, CA 92037; 858-454-3419; Email: sales dinig
252. xible topologies are ideal, the need exists for global standards to offer better business cases to suppliers, which would ultimately lead to greater competition and lower prices. J1850 in the U.S. and the ubiquitous Bosch-defined Controller Area Network (CAN) in Europe are the most popular standards to date, but in some applications they can be considered overkill. In such applications, you could consider using LIN as an alternative. The Local Interconnect Network (LIN) is a single-wire, UART-based networking architecture originally developed for automotive sensor and actuator networking applications. The LIN master node connects the LIN network to higher-level networks like CAN, extending the benefits of networking all the way to the individual sensors and actuators. In addition to CAN, LIN also complements Media Oriented Systems Transport (MOST) for high-speed data rates and FlexRay for safety-critical applications such as steer- and brake-by-wire. Figure 1 shows the relative cost per node and speed of various automotive networks. Conceived in 1998, the LIN consortium comprises car manufacturers Audi, BMW, DaimlerChrysler, Volvo, and Volkswagen. LIN is an inexpensive serial bus used for distributed body control electronic systems in vehicles. It enables effective communication for smart sensors and actuators where the bandwidth and versatility of CAN is not requ
253. xico XUP/UNM prototype board project. When the XUP needed someone to develop a prototype board for donated Virtex 1000 devices, UNM jumped at the chance. The quality of this project clearly shows the high level of interaction between the local Xilinx facility, the XUP, and the university. UNM students had to meet several key design criteria before beginning the project. The most important was that they had to design the board using donated Virtex 1000 BGA560 FPGAs. These million-system-gate devices are ideal for university projects. Their functionality and size make them suitable for a wide range of projects with the Xilinx Integrated Software Environment (ISE), System Generator, or Embedded Development Kit (EDK). Project Requirements: As shown in Figure 1, the primary goal was a platform on which students could complete entire projects, as well as one that would allow easy interfacing to multiple external options for increased capability. Another requirement for the prototype board was to allow a maximum number of inputs and outputs; students need to be able to get signals into and out of the board. Where possible, it was beneficial to interface with Digilent series circuit cards already available. Because Digilent input/output boards are available to schools through the XUP donation program, this allows students a wide range of input and output options by switching betw
254. xpense caused thereby. DIGITAL SIGNAL PROCESSING: Taking Digital Signal Processing to the Extreme. In this series on digital signal processing, the Xcell Journal spotlights the challenges and solutions to developing extremely high-performance DSP applications. Embedded with Xilinx: In this series on embedded processing, the Xcell Journal samples a broad base of embedded processing applications. COVER STORY: Sign of the Times. Xilinx makes high-tech outdoor advertising in Times Square possible. WINTER 2004. Considerations for High-Bandwidth TCP/IP PowerPC Applications: The Xilinx Gigabit System Reference Design maximizes TCP/IP performance. Virtex-4: Breakthrough Performance at the Lowest Cost: Virtex-4 FPGAs deliver what you've been looking for. Implementing the H.264/AVC Video Coding Standard on FPGAs: Xilinx Virtex FPGAs provide excellent co-, pre-, and post-processing hardware acceleration solutions. Simulink Brings Model-Based Design to Embedded Signal Processing: The complexity of FPGA-based signal processing systems drives the need for new development approaches. Xcell Journal contents: High-Bandwidth TCP/IP PowerPC Applications; Embedded Nucleus PLUS RTOS Using Xilinx; MicroBlaze and PowerPC as Test G
255. xperiences learned in these types of endeavors pay great benefits by allowing students to learn from both real-world practice and theoretical classroom experiences. UNM has been a long-time supporter of Xilinx training, and the school has developed a number of online tutorials on many software and hardware topics. One series includes ISE, VHDL, Floorplanner, System Generator, and XPower (www.eece.unm.edu/vhdl/). Another series includes EDK and System Generator (www.eece.unm.edu/xup and www.eece.unm.edu/signals). And yet another indication of the interactive efforts between academia and industry are the annual professors' workshops (www.eece.unm.edu/xup/workshops.htm). Dr. Howard Pollard, Dr. Marios Pattichis, Alonzo Vera, and Jorge Parra of UNM were essential to the success of this project. Frank Wirtz and Reno Sanchez of Xilinx Albuquerque, Rick Ballantyne of Xilinx Canada, and Jeff Weintraub of XUP were also a tremendous help. For more information on any of these topics, e-mail Craig Kief at kiefc@ece.unm.edu or Alonzo Vera at alonzo@ece.unm.edu. FPGAs Ensure Flexible and Adaptable IP Interconnection: Newport Networks' 1460 session controller enables direct IP-IP interconnection to support the full potential of IP
256. y that accepts input signals from various sources through a network of multi-pin connectors. High-speed I/O channels support the additional nodes in a network topology neatly harmonized to the Xilinx architecture. Fully interconnected and configurable through their communication ports, these add-on sites are also connected to the embedded Virtex-II Pro FPGA. The availability of a network of add-on sites removes the main expandability restrictions often associated with other platforms and offers OEMs a highly compact design and development tool. More importantly, this scalable system architecture makes the SMT148 a perfect development platform with which to resolve the many issues related to the implementation of multiple radio functionalities in a single environment. These can be addressed as multiple software modules running on Sundance's reconfigurable hardware platform. IP Cores: Sundance takes advantage of the high-performance DSP acceleration capabilities and flexible connectivity that the Virtex-II Pro FPGA provides by supporting developers with a family of software tools and IP cores. The SMT148 I/O flexibility enables you to rapidly investigate and experiment with features of the Virtex-II Pro FPGA, as well as those of developed IP cores from Sundance. These include multi-tap complex filters, Viterbi decoders/encoders, a complete transmitter QAM mapper, multi-phase pulse sha
257. zation and individual channel control. The trade-off is essentially in the area resource utilization: the optimized four-channel group solution results in a very efficient and compact design requiring fewer FPGA resources. However, it imposes the restriction that the four channels must share the same controls. The single-channel solution imposes no such restriction; the trade-off here is the linearly increasing FPGA resources used, which is directly proportional to the number of channels required. Multi-channel modulators are automatically constructed through the use of multiple copies (also referred to as groups) of the single- or four-channel implementation. For example, a four-channel modulator may be constructed with four copies of single-channel granularity or a single copy of the optimized four-channel granularity design. Similarly, a 12-channel modulator may comprise 12 copies of the single-channel granularity design or 3 copies of the optimized four-channel design. Figure 5: J.83 Annex B four-channel granularity design. Figure 6: J.83 Annex A/C four-channel granularity design. The ease of use is evident in that the only requirement is for you to specify the parameters; the multiple instantiations of the basic footprints and the required connections between them are automatically generated, leaving you with a core desi
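The copy counts quoted above (four channels as four single-channel copies or one four-channel copy; twelve channels as twelve or three) reduce to a simple division. The sketch below reproduces that arithmetic. The function name is hypothetical, and rounding up for channel counts that are not a multiple of four is an assumption; the text only gives examples that divide evenly.

```python
def modulator_copies(channels: int, granularity: int) -> int:
    """Number of basic modulator copies (groups) needed for `channels`.

    `granularity` is 1 for the single-channel design or 4 for the
    optimized four-channel design, whose channels share one set of controls.
    """
    if granularity not in (1, 4):
        raise ValueError("granularity must be 1 or 4")
    if channels <= 0:
        raise ValueError("channels must be positive")
    # Ceiling division: a partly used four-channel group still costs one copy
    # (assumed behavior for counts that are not a multiple of four).
    return -(-channels // granularity)


if __name__ == "__main__":
    for n in (4, 12):
        print(n, modulator_copies(n, 1), modulator_copies(n, 4))
```

With single-channel granularity every copy keeps its own controls, so the count also equals the number of independently controllable channels; with four-channel granularity it equals the number of shared control sets.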
