"TMS320C6000 Imaging Developer's Kit IDK User's Guide"

Contents

1. [Figure: word layout of the Y, Cr, and Cb buffers, marking the first pixel captured in each]

Read accesses to the frame memory are throttled as appropriate using the DSP EMIF ARDY signal. Since the SDRAM memory is faster than the ASRAM interface, this is generally only necessary at the beginning of a burst of reads and possibly when refreshes of the SDRAM bank are required. The FPGA includes a small read FIFO to minimize the effect of this. It should be noted, however, that the frame memory management is most efficient when accessed linearly. It is suggested that the application software access the memory in a linear fashion to minimize SDRAM page misses, which slow the memory transactions.

The ARDY signal is also asserted when bank conflicts occur, resulting from arbitration effects with the capture line FIFOs. The effect is minimized by the existence of the FIFOs plus a priority scheme implemented in the FPGA controller.

All video input timing is provided by the TVP5022. This includes a vertical synchronization pulse plus a composite blanking signal, which indicates the presence of active data on the pixel bus. A pixel clock is also provided, which is used by the FPGA to latch data into the aforementioned line FIFOs. Data is routed t
2. Figures:
    2-7  RGB16 Display Buffer Format
    3-1  IDK Demo Block Diagram
    3-2  Channel Task Layouts for JPEG Loop Back Demo and Image Processing Demo
    3-3  JPEG Loop Back Channel
    3-4  JPEG Loop Back Demo Channels and I/O Buffers
    3-5  Split Cache/SRAM Mode with QDMA Data Transfer
    4-1  Software Architecture for ImageLIB Functions Based Standard Algorithms
    4-2  2D Wavelet Transform
    5-1  JPEG Loop Back Demonstration
    5-2  Multichannel H.263 Decode Demonstration
    5-3  Image Processing Demonstration
    5-4  Image Processing Demonstration Display
    5-5  H.263 Loop Back Demonstration
    5-6  2D Wavelet Transform Demonstration
    5-7  2D Wavelet Transform Components
    6-1  JPEG Encoder
    6-2  Raster Scanned Image Data
    6-3  Reformatted Image Data
    6-4  Zig-Zag Reordering of Transformed Coefficients: Input and Output
    6-5  JPEG Decoder
    6-6  Decoded Image Da
3. Create an external section called ext_sect for external memory. Three arrays are declared in external memory, with the actual sizes to be expected in the IDK scenario. These are as follows:

    IMAGE DATA
        input_ch_data:   640 by 480 character array with 8-bit pixels
        output_ch_data:  640 by 480 character array with 8-bit pixels
    SCRATCH PAD
        external scratch pad for storing temporary results

The external scratch pad is twice the image size plus 12 lines for context. The intermediate array is an array of shorts. Therefore space is needed to store up to 2 arrays of shorts, each of the image size plus 6 lines of context. Hence external memory has IMG_COLS * (IMG_ROWS + 6) * 4 bytes.

External memory usage:

    input_ch_image:   640 by 480 char array => 307200 => 300 K bytes
    output_ch_image:  640 by 480 char array => 307200 => 300 K bytes
    ext_mem:          2 arrays of 646 by 480 shorts => 1.24 M bytes

Align the external image arrays and scratch pad on double-word boundaries and declare sections. Also declare the arrays with the right sizes.

    #pragma DATA_ALIGN(input_ch_data, 8);
    #pragma DATA_ALIGN(output_ch_data, 8);
    #pragma DATA_ALIGN(ext_scratch_pad, 8);
    #pragma DATA_SECTION(input_ch_data, image ext sec
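The external memory budget in this excerpt can be checked with a few lines of arithmetic. The following is only a sketch: IMG_COLS and IMG_ROWS follow the excerpt, while CONTEXT_LINES, image_bytes, and scratch_bytes are hypothetical names introduced here, and a 2-byte short is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define IMG_COLS      640
#define IMG_ROWS      480
#define CONTEXT_LINES 6   /* hypothetical name: context lines per intermediate array */

/* Bytes needed by one image of 8-bit pixels (input or output array). */
static size_t image_bytes(size_t cols, size_t rows)
{
    assert(cols > 0 && rows > 0);
    return cols * rows;                      /* one byte per pixel */
}

/* Bytes needed by the external scratch pad: two arrays of 16-bit values,
 * each holding one image plus CONTEXT_LINES extra lines. */
static size_t scratch_bytes(size_t cols, size_t rows)
{
    assert(cols > 0 && rows > 0);
    return 2u * cols * (rows + CONTEXT_LINES) * 2u;   /* 2 arrays * 2 bytes */
}
```

For a 640 by 480 image this gives 307,200 bytes per image array and 1,244,160 bytes for the scratch pad, in line with the roughly 1.24 M bytes quoted in the excerpt.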
4. Display output may be in the form of an 8-bit gray scale or a 16-bit RGB 565 signal. The daughtercard hardware includes the following:

- One set of TMS320C6000 daughtercard connectors (male, solder side)
- Female RCA connector for composite video input (NTSC/PAL)
- Female 15-pin VGA connector for RGB monitor output

Figure 2-1. IDK Daughtercard Block Diagram
[Figure: peripheral daughtercard connectors, events (TINPn, EINTn), FPGA with video control, control registers, EMIF logic, line FIFOs and display line FIFO, DSP EMIF interface, RGB out, composite in]

2.2 Video Capture

The IDK daughtercard includes one video input port for NTSC/PAL video. The NTSC/PAL input consists of an industry-standard RCA jack for composite video input. The input is routed to the TVP5022 video decoder and may be configured for square-pixel or ITU standard resolutions. The TVP5022 performs digitization and minimal filtering of the video inputs. All video input data is digitized in the 4:2:2 format to produce a standard YCrCb pixel stream. Since most DSP algorithms operate on input data as separate Y, Cr, and Cb blocks, the FPGA interface performs separation of the digital stream before writ
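The separation of a 4:2:2 stream into Y, Cr, and Cb blocks, which the excerpt assigns to the FPGA, can be illustrated in software. This is a sketch only: split_422 is a hypothetical helper, and the byte order Cb, Y0, Cr, Y1 per pixel pair is an assumption, since the excerpt does not state the order on the pixel bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved 4:2:2 stream into separate Y, Cr, and Cb planes.
 * ASSUMPTION: each pixel pair arrives as Cb, Y0, Cr, Y1.
 * pixels must be even; y receives `pixels` bytes, cr and cb `pixels / 2` each. */
static void split_422(const uint8_t *stream, size_t pixels,
                      uint8_t *y, uint8_t *cr, uint8_t *cb)
{
    size_t pair;

    assert(pixels % 2 == 0);
    for (pair = 0; pair < pixels / 2; pair++) {
        const uint8_t *p = stream + 4 * pair;   /* four bytes per pixel pair */
        cb[pair]        = p[0];
        y[2 * pair]     = p[1];
        cr[pair]        = p[2];
        y[2 * pair + 1] = p[3];
    }
}
```

With three contiguous planes, an algorithm can walk each component linearly instead of striding through interleaved samples.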
5. The ability to develop new libraries that use CSL as their foundation to allow for easy data transfer. An example of this is the Image Data Manager described below, which uses CSL to abstract the details of double-buffered DMAs.

CSL software and associated documentation is available by accessing http://www.ti.com and navigating to the appropriate site.

Image Data Manager: Image Data Manager is a set of library routines that offer abstraction for double buffering of DMA requests, to efficiently move data in the background during processing. They have been developed to relieve the user of the burden of performing pointer updates and managing buffers in the code. Image Data Manager uses CSL calls to move data between external and internal memory during the course of processing. Image Data Manager offers the following advantages to software developers:

- The ability to separate and compartmentalize data transfers from the algorithm, leading to software that is easy to understand and simple to maintain
- The ability to re-use the data transfer routines where applicable

1.2.2 Rapid Prototyping Hardware

The IDK hardware consists of a C6711 DSK with 16MB SDRAM and a daughtercard that provides the following capabilities:

- Video capture of NTSC/PAL signals (composite video)
- Display of RGB signals, 640x480 or 800x600 re
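The pointer bookkeeping that double buffering involves can be sketched in plain C. This is not the Image Data Manager API: sum_double_buffered and CHUNK are hypothetical, and memcpy() stands in for a background DMA transfer, so the copy here is synchronous and only the buffer swapping is illustrated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4   /* hypothetical chunk size in bytes */

/* Sum all bytes of ext[0..len) while alternating between two CHUNK-sized
 * internal buffers. memcpy() stands in for a DMA that would run in the
 * background on real hardware. len must be a multiple of CHUNK. */
static unsigned long sum_double_buffered(const unsigned char *ext, size_t len)
{
    unsigned char buf[2][CHUNK];
    unsigned long total = 0;
    size_t chunks = len / CHUNK;
    size_t c, k;
    int cur = 0;

    assert(len % CHUNK == 0);
    if (chunks == 0)
        return 0;

    memcpy(buf[cur], ext, CHUNK);                 /* prime the first buffer */
    for (c = 0; c < chunks; c++) {
        if (c + 1 < chunks)                       /* fetch the next chunk   */
            memcpy(buf[cur ^ 1], ext + (c + 1) * CHUNK, CHUNK);
        for (k = 0; k < CHUNK; k++)               /* process current chunk  */
            total += buf[cur][k];
        cur ^= 1;                                 /* swap the two buffers   */
    }
    return total;
}
```

The processing loop never touches external memory directly, which is the separation of data transfer from algorithm that the excerpt describes.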
6.  /* get pointer to output bitstream buffer => out */

    /* execute H.263 encoder */
    H263ENC_TI_IH263ENC.encode((IH263ENC_Handle)encHandle0, (uchar *)&in, out);

    /* get encoder status */
    H263ENC_TI_IH263ENC.control((IH263ENC_Handle)encHandle0, IH263ENC_GET_STATUS, &es);

6.4.4 H.263 Encoder Performance

H.263 Encoder performance has been measured on live video. The following performance is based on measurements from code operational on the C6711 DSK.

Table 6-3. H.263 Encoder Performance

    Bit Rate (kbps)  Format  % INTRA  % INTER  % Not Coded  Cycles/Frame  Frame Rate
    512              CIF     3        72       25           7,887,000     20
    128              QCIF    3        68       29           1,971,000     76

    Note: TMS320C6711 CPU frequency: 150 MHz. CIF: 352x288, 4:2:0. QCIF: 176x144, 4:2:0.

6.4.5 Further Information on H.263 Encoder

For further information on the C6000 DSP H.263 Encoder implementation, see H.263 Encoder: TMS320C6000 Implementation (literature number SPRA721).

6.5 H.263 Decoder

The H.263 video compression standard was originally developed for video conferencing. However, it is also finding use in other areas such as streaming video. The fundamental coding techniques involved in H.263 are Motion Compensated prediction, Discrete Cosine Transform (DCT), Quantization, and Entropy Coding. In the baseline H.263 standard, video frames are coded in either intr
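The relation between the cycles-per-frame and frame-rate columns can be approximated by dividing the CPU clock by the per-frame cycle count. The helper below is a hypothetical sketch of that arithmetic, not part of the encoder API.

```c
#include <assert.h>

/* Whole frames per second achievable when the CPU runs at cpu_hz and one
 * frame costs cycles_per_frame cycles (hypothetical helper). */
static unsigned long frames_per_second(unsigned long cpu_hz,
                                       unsigned long cycles_per_frame)
{
    assert(cycles_per_frame > 0);
    return cpu_hz / cycles_per_frame;
}
```

At 150 MHz, the QCIF figure of 1,971,000 cycles per frame yields 76 frames per second, matching the table. The tabulated rates are measured values, so they need not agree exactly with this estimate.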
7. of pixels and perform pixel expand, followed by num_lines calls to the horizontal wavelet to process each line, one at a time.

Get the char buffer and perform pixel expand by calling the ImageLIB pixel_expand_asm routine, writing the array of expanded shorts into pix_expand.

    in_ch_data = (unsigned char *) dstr_get_2D(&i_dstr);
    pix_expand_asm(cols * num_lines, in_ch_data, ptr_pix_expand);

Call the horizontal wavelet once per line (num_lines times) to perform the horizontal wavelet and write the output into the output array. Increment the input and output pointers by cols after every iteration of the loop.

    for (j = 0; j < num_lines; j++)
    {
        ptr_wave = ptr_pix_expand + (j * cols);
        ptr_out  = out_data + (j * cols);
        wave_horz_asm(ptr_wave, qmf, mqmf, ptr_out, cols);
    }

If half the iterations of this loop have been completed, then perform a rewind using Image Data Manager and start fetching from the new location. This fetches the odd field in the odd/even field case, or starts from the odd line for progressive images.

    ... ((rows / num_lines) >> 1) - 1 ...

Commit the last chunk that was written, and rewind the input and output streams to their respective rewind addresses.

    dstr_put_2D(&o_dstr);
    dstr_rewind(&i_dstr, in_rewind, DSTR_INPUT, 1);
    d
8. Each of the Image Processing functions, such as those listed above, is a wrapper function around one or more core ImageLIB kernels. These wrapper functions are responsible for managing image data input and output for the ImageLIB function, enabling it to process an entire image or part of an image. The actual data movement is done by the Image Data Manager. As an example of the structure of Image Processing functions, the wave_horz_image function is shown below. A fuller explanation of the Image Data Manager invoked in the example is provided in section 4.6.

    void wave_horz_image(IMAGE *in_image_ev, IMAGE *in_image_od,
                         short *qmf, short *mqmf,
                         SCRATCH_PAD *scratch_pad, int scale,
                         img_type img_type_val)
    {
        /* initialization and control code */
        if (scale ...)

        /* Input stream: i_dstr
         *   Start address:     in_image_ev->img_data
         *   Size:              external size
         *   Internal address:  int_mem
         *   Size:              pix_char_offset
         *   Quantum:           cols
         *   Multiple:          num_lines
         *   Stride:            stride * cols
         *   Window size:       1 (double buffering)
         *   Direction:         DSTR_INPUT
         */
        err_code = dstr_init(&i_dstr,
                             (void *) in_image_ev->img_data,
                             2 * in_image_ev->img_rows * in_image_ev->img_cols,
                             (void *) int_mem,
                             pix_char_offset,
                             cols, num_lines, stride * cols,
                             1, DSTR_INPUT);
        if (err_code)
        {
            fprintf(stderr, "error initializing input stream pix_expand\n");
            exit(3);
        }
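The Quantum, Multiple, and Stride entries in the stream setup above can be read as describing a strided block fetch. The sketch below illustrates that reading only: fetch_block is a hypothetical helper, memcpy() stands in for the DMA, and the interpretation (quantum = bytes per line, multiple = lines per fetch, stride = distance between source lines) is an assumption rather than the documented behavior of dstr_init.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `multiple` lines of `quantum` bytes each from external memory into a
 * contiguous internal buffer. Consecutive source lines are `stride` bytes
 * apart. Hypothetical helper; memcpy() stands in for the DMA engine. */
static void fetch_block(const unsigned char *ext, unsigned char *internal,
                        size_t quantum, size_t multiple, size_t stride)
{
    size_t line;

    assert(stride >= quantum);
    for (line = 0; line < multiple; line++)
        memcpy(internal + line * quantum, ext + line * stride, quantum);
}
```

Under this reading, a stride of twice the line width picks up every other line, which is how one field of an interlaced image could be gathered into contiguous internal memory.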
9.  /* create decoder parent instance */
    decParent = (H263PDEC_TI_Obj *) ALG_create((IALG_Fxns *)&H263PDEC_TI_IALG, NULL, (IALG_Params *)NULL);

    /* create decoder child instance */
    decHandle0 = (H263DEC_TI_Obj *) ALG_create((IALG_Fxns *)&H263DEC_TI_IH263DEC, decParent, (IALG_Params *)NULL);

    /* clear decoder status structure */
    H263DEC_TI_IH263DEC.control((IH263DEC_Handle)decHandle0, IH263DEC_CLR_STATUS, &ds);

    while (1)
    {
        /* get pointer to input bitstream => in */
        /* get pointer to output frame buffer => out */

        /* execute H.263 decoder */
        H263DEC_TI_IH263DEC.decode((IH263DEC_Handle)decHandle0, in, out);

        /* get decoder status */
        H263DEC_TI_IH263DEC.control((IH263DEC_Handle)decHandle0, IH263DEC_GET_STATUS, &ds);
    }

6.5.4 H.263 Decoder Performance

H.263 Decoder performance has been measured on a collection of bitstreams representing various types of scene content and commonly used resolutions and bitrates. The following performance is based on measurements from code operational on the C6201 EVM and C6211 DSK.

Table 6-4. H.263 Decoder Performance

    Bitstream    Format
    News         QCIF
    News         QCIF
    Foreman      QCIF
    Coastguard   QCIF
    Coastguard   QCIF
    Foreman      CIF
    Silent       CIF
    Silent       CIF

    TMS320C6201 TMS320C6211 Not Frame Frame INTRA INT
10. [Figure: block diagram layers including the capture/display driver, CSL, and DSP/BIOS]

In order for algorithms to work in a real-time system, there must be an application framework to connect algorithms with DSP hardware peripherals. In a typical DSP application, the framework is a software module, or a group of software modules, that resides on top of algorithms and peripheral I/O drivers. It is usually responsible for getting input data from peripheral devices, passing the data to algorithms for processing, and sending the processed data to peripheral devices for output. The framework is also responsible for the creation, deletion, configuration, and execution of algorithm instances.

In simple static applications, the framework is usually hard-wired and statically configured to run just a single algorithm or a fixed set of algorithms. However, in dynamic multichannel, multi-algorithm applications, the framework can be fairly complicated. It is usually divided into multiple layers so that its core is application independent. This allows the same framework core to be used for a range of different applications.

The framework layer in IDK applications includes all modules above the algorithms and the video capture and display driver, as shown in Figure 3-1. The framework structures are slightly different among different demo scenarios. For example in the
11. first time put gets called, since no output is ready, nothing gets committed. It merely initializes the output side. Since the last output buffer will not be ready till the end of the loop, one extra put is required outside the loop.

Use 2D stream routines for sliding window and 1D stream routines for plain double buffering.

All algorithms should get the current set of working buffers by calling get and put functions that return pointers to the current buffers to be processed and sent out. The first call to put merely gets the address of the first buffer to be written to by the algorithm.

    for (i = 0; i < 32; i++)
    {
        i_buf = dstr_get_2D(&i_dstr);
        o_buf = dstr_put(&o_dstr);
        printf("i=%2d ", i);
        for (j = 0; j < 4; j++)
        {
            o_buf[j] = i_buf[j] + i_buf[j + 4] + i_buf[j + 8] + i_buf[j + 12];
            printf("%3d %3d %3d %3d ",
                   i_buf[j], i_buf[j + 4], i_buf[j + 8], i_buf[j + 12]);
        }
        putchar('\n');
    }

Flush out the last buffer and close the output stream. Rewind and start operations from the 4th word instead of the 0th word.

    dstr_put(&o_dstr);
    dstr_rewind(&i_dstr, (void *)(array1 + 4), DSTR_INPUT, 2);

    for (i = 0; i < 32; i++)
    {
        i_buf = dstr_get_2D(&i_dstr);
        o_buf = dstr_put(&o_dstr);
        printf("i=%2d ", i);
        for (j = 0; j < 4; j++)
        {
            o_buf[j] = i_buf[j] + i_buf[j + 4] + i_buf[j + 8] + i
12. Contents (continued):
    5.4.1  Data I/O and User Input Specifics
    5.4.2  Signal Processing Operations Sequence
    5.4.3  eXpressDSP APIs for H.263 Loop Back Demonstration
    5.5    2D Wavelet Transform Demonstration
    5.5.1  Data I/O and User Input Specifics
    5.5.2  Signal Processing Operations Sequence
    5.5.3  eXpressDSP APIs for 2D Wavelet Transform Demonstration
    6      C6000 DSP Image/Video Processing Applications
           Describes C6000 DSPs used in image/video processing applications.
    6.2    JPEG Encoder
    6.2.1  JPEG Encoder Algorithm Level Description
    6.2.2  JPEG Encoder Capabilities and Restrictions
    6.2.3  JPEG Encoder API
    6.2.4  JPEG Encoder Performance
    6.2.5  Further Information on JPEG Encoder
    6.3    JPEG Decoder
    6.3.1  JPEG Decoder Algorithm Level Description
    6.3.2  JPEG Decoder Capabilities and Restrictions
    6.3.3  JPEG Decoder API
    6.3.4  JPEG Decoder Performance
    6.3.5  Fu
13. WAVE_PARAMS wave_params;
    int err;

    /* Set parameters for the even field, namely (a) start address,
       (b) columns, (c) rows. In this case half the rows are assumed to be
       in the even field and the other half in the odd field. */
    in_image_ev.img_data = input_ch_data;
    in_image_ev.img_cols = IMG_COLS;
    in_image_ev.img_rows = IMG_ROWS >> 1;

    /* Set parameters for the odd field, namely (a) start address,
       (b) columns, (c) rows. In this case, since we have a contiguous image
       to simulate fields, the odd field is set to point to the second line. */
    in_image_od.img_data = input_ch_data + IMG_COLS;
    in_image_od.img_cols = IMG_COLS;
    in_image_od.img_rows = IMG_ROWS >> 1;

    /* Set parameters for the output image: (a) start address, (b) columns,
       (c) rows. The rows of the output image will be the sum of the output
       rows of the input even and odd field rows. */
    out_image.img_data = output_ch_data;
    out_image.img_cols = IMG_COLS;
    out_image.img_rows = IMG_ROWS;

    /* Set parameters for the external and internal scratch pads, namely
       (a) external scratch pad, (b) size of external scratch pad,
       (c) internal scratch pad, (d) size of internal scratch pad. */
    scratch_pad.ext_data = ext_scratch_pad;
    scratch_pad.ext_size = sizeof(ext_scratch_pad);
    scratch_pad.int_data = int_scratch_pad;
    scratch_pad.int_size = sizeof(int_scratch_pad);
14. about the macroblock coding type and the coded block pattern of the two chrominance blocks in the current macroblock. CBPY is also a VLC, used to derive the coded block pattern of the four luminance blocks in the current macroblock. DQUANT is a 2-bit code which specifies the change in quantization scale with respect to the previously coded macroblock.

If the macroblock is of type INTER, then the motion vectors for luma and chroma are decoded. The vector predictor is obtained from the three neighborhood vectors by using a median filter, as specified in the H.263 standard. The differential vector derived from the bitstream is then added to the vector predictor to reconstruct the luminance vector. This vector is scaled by a factor of 2 to obtain the chrominance vector. According to the derived vectors, the addresses of reference blocks in the reference frame, located in external data memory, are computed and used by the function loadRefMB to load the data. The motion compensation type, (a) copy, (b) horizontal interpolation, (c) vertical interpolation, or (d) two-dimensional interpolation, is also determined in this function.

If at least one of the six CBP bits is set, the decoder must decode the IDCT coefficients and apply the IDCT. Functions for VLD, Inverse Quantization, Inverse Zigzag Scan, and IDCT are invoked. If the macroblock type is INTRA then
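The median prediction step described above can be sketched as follows. This is an illustration under stated assumptions, not the decoder's code: MotionVector, median3, and reconstruct_mv are hypothetical names, the median is taken per component, and the range handling and chrominance derivation required by the standard are left out.

```c
#include <assert.h>

typedef struct { int x; int y; } MotionVector;   /* hypothetical type */

/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b          */
    if (b > c) b = c;                         /* b = min(b, c)       */
    return (a > b) ? a : b;                   /* max(a, min(b, c))   */
}

/* Luminance vector = component-wise median of the three neighboring
 * vectors plus the differential vector decoded from the bitstream. */
static MotionVector reconstruct_mv(MotionVector n1, MotionVector n2,
                                   MotionVector n3, MotionVector diff)
{
    MotionVector mv;
    mv.x = median3(n1.x, n2.x, n3.x) + diff.x;
    mv.y = median3(n1.y, n2.y, n3.y) + diff.y;
    return mv;
}
```

Because the encoder forms the same predictor from already-coded neighbors, only the usually small differential needs to be transmitted.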
15. by decoding several bits during each table look-up, and (2) by effectively constraining the search range within the table for each look-up.

The lmbd instruction gives the bit position where the first bit reversal occurs in a register. Many intelligent decoding methods can be designed using this capability. For example, in this implementation, the value returned by the lmbd instruction is used to select a sub-table from the entire variable-length table for an exhaustive search.

VLD using the lmbd operation is shown below. Register A4 contains 32 valid bits from the JPEG bitstream. The lmbd operation on A4 returns the number of leading 1s in A4, which results in:

- Decoding of 5 code bits in a single cycle
- Unique identification of the sub-table for the exhaustive search
- Identification of the number of additional bits, after the five 1s, to be extracted from A4 for the exhaustive search

    A4 = 11111011001111111001100001011010
    lmbd(0, A4) = 5 => unique sub-table and number of additional bits to be extracted from A4 for further decoding

Such optimizations in the VLD mechanism restrict the use of the algorithm to a specific table, because the structure of the table is exploited during the decoding process. The baseline JPEG standard recommends separate DC and AC tables for luminance and chrominance components. Hence VLD decoding has to be done separately for the two components
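The effect of the lmbd step on the example word can be reproduced in portable C. The functions below are hypothetical stand-ins written for illustration: leading_ones emulates the count that the excerpt obtains in a single instruction, and bits_after shows how the bits following that prefix could be isolated. How many bits to extract and which sub-table to use are properties of the decoder's tables and are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading 1 bits of a 32-bit word (portable stand-in for the
 * count the excerpt obtains with the lmbd instruction). */
static unsigned leading_ones(uint32_t word)
{
    unsigned n = 0;

    while (n < 32 && (word & (UINT32_C(1) << (31 - n))))
        n++;
    return n;
}

/* Return the `count` bits that follow the first `skip` bits of the word,
 * most significant bit first. */
static uint32_t bits_after(uint32_t word, unsigned skip, unsigned count)
{
    assert(count >= 1 && skip + count <= 32);
    return (uint32_t)(word << skip) >> (32 - count);
}
```

For the word in the excerpt, 0xFB3F985A, leading_ones returns 5, and bits_after(word, 5, 3) returns the three bits 011 that follow the run of 1s.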
16. cycs = 7204

    Function: corr_3x3
    Description: 3x3 Correlation with Rounding
    Cycles: (cols - 2) * 4.5 + 21
        cols is the number of image columns
        For cols = 256, cycs = 1164
        For cols = 720, cycs = 3252
    Code size: 1120 bytes

    Function: corr_gen
    Description: Generalized Correlation
    Cycles:
        Case 1, even number of filter taps: m * [15 + (cols - m) / 2]
            m is the number of filter taps, cols is the number of image columns
            For m = 8, cols = 720, cycs = 2968
        Case 2, odd number of filter taps: k * [15 + (cols - k) / 2] + 10 + cols * 3/4, with k = m - 1
            m is the number of filter taps, cols is the number of image columns
            For m = 9, cols = 720, cycs = 3518
    Code size: 768 bytes

    Function: dilate_bin
    Description: 3x3 Binary Dilation
    Cycles: (cols / 4) * 6 + 34
        cols is the number of image columns in bytes
        For cols = 128 * 8, cycs = 226
        For cols = 720 * 8, cycs = 1114
    Code size: 480 bytes

    Function: erode_bin
    Description: 3x3 Binary Erosion
    Cycles: (cols / 4) * 6 + 34
        cols is the number of image columns in bytes
        For cols = 128 * 8, cycs = 226
        For cols = 720 * 8, cycs = 1114
    Code size: 480 bytes

Table 6-6. ImageLIB Kernels Performance (Continued)

    Function     Description
    errdif_bin   Error Diffusion, Binary Output
    fdct_8x8     Forward Discrete Cosine Transform (FDCT)
    histogram    Histogram Computation
    idct_8x8     Inverse Discrete Cosine Transform (IDCT)
    mad_8x8      8x8 Minimum Absolute Difference
    mad_16x16    16x16 Minimum Absolute Difference
    median_3x3   3x3 Median Filter
    perimeter
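The cycle counts quoted for these kernels can be reproduced with integer arithmetic. The functions below are a hypothetical sketch: the formulas are a reading of the flattened table, chosen because they reproduce every worked value in it, and the argument to morph_bin_cycles is the line width in bytes (128 and 720 for the two worked cases).

```c
#include <assert.h>

/* corr_3x3: (cols - 2) * 4.5 + 21, written with integers. */
static long corr_3x3_cycles(long cols)
{
    assert(cols >= 2);
    return 9 * (cols - 2) / 2 + 21;
}

/* corr_gen: m filter taps, cols image columns. */
static long corr_gen_cycles(long m, long cols)
{
    long k;

    assert(m > 0 && cols >= m);
    if (m % 2 == 0)                            /* even number of taps */
        return m * (15 + (cols - m) / 2);
    k = m - 1;                                 /* odd number of taps  */
    return k * (15 + (cols - k) / 2) + 10 + cols * 3 / 4;
}

/* dilate_bin and erode_bin: line width given in bytes. */
static long morph_bin_cycles(long cols_bytes)
{
    assert(cols_bytes >= 0);
    return cols_bytes / 4 * 6 + 34;
}
```

These estimates cover only the kernel itself; data movement between external and internal memory is not included.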
17. gorithms. The maximum size is updated if the current algorithm requested a bigger buffer.

Call CM_SetAlg to set the algorithm in a channel. Inside Channel Manager, a new instance of that algorithm is then created and attached to that particular channel. In this step, the Channel Manager calls the algAlloc function of the algorithm again and checks each entry in the memTab array. It then allocates persistent memory blocks and external scratch memory blocks on either the internal or the external heap, according to the memory requests of the algorithm. It also allocates a scratch buffer on the internal heap, using the maximum scratch size information collected in CM_RegAlg earlier, if the scratch buffer has not been allocated yet. If all memory allocations succeed, the Channel Manager calls the algInit function of the algorithm to initialize the allocated memory blocks and completes the creation of a new instance of that algorithm.

The deletion of algorithm instances also happens in the CM_SetAlg function. Before new algorithm instances are set to a channel, old instances must be deleted. Channel Manager calls the algorithm's algFree function to get the base addresses of all allocated memory blocks in that instance. It then frees all blocks that are either persistent blocks or external scratch blocks. The parent instance and the internal scratch buffer are not deleted because they are shared resources.

3.5.5 Parent Instance Support

Note
18. the picture layer. Based on the information, it then sets up several variables in the main parameter structure H263DecParam, including frame buffer pointers, dimensions of the image, etc. The function calls h263DecGOB an appropriate number of times.

The function h263DecGOB extracts GOB-layer-specific information from the bitstream and calls h263DecMB an appropriate number of times.

The function h263DecMB performs the actual decoding on a macroblock (MB) of data. This function first determines whether the current macroblock has been coded. If it has not been coded, then the corresponding macroblock in the reference frame buffer must be copied to the output frame buffer so that the decoder can properly reconstruct the next frame. Not coded is a 1-bit flag in the H.263 macroblock layer syntax that indicates whether the corresponding macroblock has been coded or not. If it is not coded, the function copyMB is invoked to copy the corresponding MB in the reference frame buffer to the output frame buffer.

Figure 6-11. h263DecMB Overview
[Figure: flow with the steps Decode MCBPC & CBPY (deccbp), Decode IDCT coefficients (dect_coef), and Motion Compensation (h263DecMC), joined by Y/N decisions]

If the macroblock has been coded, then further processing is required, starting with decoding of the following information from the H.263 macroblock layer syntax. MCBPC is a Variable Length Code (VLC) that contains the information
19. /* Since the output of the high-pass filter is decimated by eliminating
       odd output samples, the loop counter i increments by 2 for every
       iteration of the loop. Let the input data be d0, d1, ..., d(N-1),
       where N = cols and M = 8. Let the high-pass filter be g0 ... g7.
       Outputs y0, y1, ... are generated as:

           Yo Y97AN m 2 J6dn m 1 God
           yi G7An m 2 G6U M 1 godi

       If the input array access d goes past the end of the array, the
       pointer is wrapped around. Since the filter is in floating point, it
       is implemented in Q15 math. filt_ptr points to the end of the
       high-pass filter array and moves in the reverse direction. */

    for (i = 0; i < iters; i += 2)
    {
        sum      = Qr;
        filt_ptr = mqmf + (M - 1);
        xptr     = xstart;

        xstart += 2;
        if (xstart > x_end) xstart = in_data;

        for (j = 0; j < M; j++)
        {
            xdata = *xptr++;
            hdata = *filt_ptr--;
            prod  = xdata * hdata;
            if (xptr > x_end) xptr = in_data;
            sum += prod;
        }
        *out_data++ = (sum >> Qpt);
    }

4.6 Image Data Manager

Image Data Manager (IDM) is a set of library routines that offer abstraction for double buffering DMA requests to effi
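The loop in this excerpt can be restated as a self-contained function using array indices instead of pointers. This is a sketch under assumptions: highpass_decimated is a hypothetical name, the start index of the first output is passed in as a parameter because its initial value is not visible in the excerpt, the rounding constant is taken to be half of one Q15 unit, and an int of at least 32 bits is assumed.

```c
#include <assert.h>

#define QPT 15                  /* number of fractional bits (Q15)        */
#define QR  (1 << (QPT - 1))    /* rounding constant; assumed value       */

/* Decimated high-pass filter with circular input addressing.
 * d: n input samples (n even); g: m filter taps in Q15;
 * start: index of the first input sample used for out[0] (hypothetical
 * parameter). Writes n / 2 output samples. */
static void highpass_decimated(const short *d, int n, const short *g, int m,
                               int start, short *out)
{
    int i, j;

    assert(n > 0 && n % 2 == 0 && m > 0 && start >= 0 && start < n);
    for (i = 0; i < n; i += 2) {               /* keep every other output */
        int sum = QR;
        int idx = (start + i) % n;

        for (j = 0; j < m; j++) {
            sum += d[idx] * g[m - 1 - j];      /* taps read back to front */
            idx = (idx + 1) % n;               /* wrap past the array end */
        }
        /* Relies on an arithmetic right shift when sum is negative. */
        out[i / 2] = (short)(sum >> QPT);
    }
}
```

Advancing the start index by 2 per output is what implements the decimation, and the modulo takes the place of the explicit pointer wrap in the original loop.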
20. Consider the JPEG Loop Back demo, consisting of two channel tasks as shown in Figure 3-2. Each channel task contains one channel object. The loop back channel object consists of three algorithm instances: a JPEG encoder instance, a JPEG decoder instance, and a color space conversion instance, as shown in Figure 3-3.

Figure 3-3. JPEG Loop Back Channel

In the loop back channel, the output of the encoder instance feeds directly into the decoder instance, and the output of the decoder instance feeds directly into the color space conversion instance. This is the reason they can be grouped into a single channel. The Channel Manager is then responsible for executing these instances and controlling the data flow between them.

In other cases, it is better to have algorithm instances in separate channels even when the output of one algorithm instance feeds into another. This can happen when the output data of one instance is shared by multiple instances. Again considering the JPEG Loop Back demo as an example, refer to Figure 3-4, which shows three channel objects. One of the channel objects is the loop back channel discussed above; another is the pass-through channel, consisting of an instance of the color space conversion algorithm. The third channel is the preprocessing channel, which consists of an instance of the pre-scale algorithm to convert the in
21. Cr, Buffer 2 of 3, field 0
    0xM00E8080 - 0xM00FD1FF    Capture Frame Memory Cb, Buffer 2 of 3, field 0

Table A-2. IDK Memory Map, 2MB Capture Memory Option (Continued)

    Address Range              Interface
    0xM0100000 - 0xM012A2FF    Capture Frame Memory Y
    0xM012A300 - 0xM013F3FF    Capture Frame Memory Cr
    0xM013F400 - 0xM01545FF    Capture Frame Memory Cb
    0xM0154600 - 0xM017E8FF    Capture Frame Memory Y
    0xM017E900 - 0xM0193A7F    Capture Frame Memory Cr
    0xM0193A80 - 0xM01A8BFF    Capture Frame Memory Cb
    0xM01A8C00 - 0xM01D2EFF    Capture Frame Memory Y
    0xM01D2F00 - 0xM01E807F    Capture Frame Memory Cr
    0xM01E8080 - 0xM01FD1FF    Capture Frame Memory Cb
    0xM01FD200 - 0xM01FFFFF    Reserved
    0xM0300000 - 0xM037FFFF    FPGA control registers
    0xM0380000 - 0xM03FFFFF    TVP3026 Registers

Table A-3. IDK Memory Map, 8MB Capture Memory Option

    Address Range              Interface
    0xM0000000 - 0xM003FFFF    Capture Frame Memory Y
    0xM0040000 - 0xM005FFFF    Capture Frame Memory Cr
    0xM0060000 - 0xM007FFFF    Capture Frame Memory Cb
    0xM0080000 - 0xM00BFFFF    Capture Frame Memory Y
    0xM00C0000 - 0xM00DFFFF    Capture Frame Memory Cr
    0xM00E0000 - 0xM00FFFFF    Capture Frame Memory Cb
    0xM0100000 - 0xM013FFFF    Capture Frame Memory Y
    0xM0140000 - 0xM015FFFF    Capture Frame Memory Cr
    0xM0160000 - 0xM017FFFF    Capture Frame Memory Cb
    0xM0180000 - 0xM01BFFFF    Captur
    0xM01C0000 - 0xM01DFFFF
    0xM01E0000 - 0xM01FFFFF
22. DAT_PRI_LOW, 0);

    wavelet_codec(&in_image_ev, &in_image_od, &out_image,
                  &scratch_pad, &wave_params, FLDS);

    DAT_Close(0, DAT_PRI_LOW, 0);

See Appendix D for a full driver code example.

The Algorithm in turn invokes the multiple Image Processing Functions that compose the overall algorithm, as shown below for the example of the Wavelet Transform.

    void wavelet_codec(IMAGE *in_image_ev, IMAGE *in_image_od,
                       IMAGE *out_image, SCRATCH_PAD *scratch_pad,
                       WAVE_PARAMS *wave_params, img_type img_type_val)
    {
        /* internal memory initialization */

        /* Perform the horizontal wavelet transform on the whole image by
           calling wave_horz_image to perform 1 scale of analysis. */
        wave_horz_image(in_image_ev, in_image_od, qmf_int, mqmf_int,
                        scratch_pad, 0, img_type_val);

        /* Perform the vertical wavelet transform on the whole image by
           calling wave_vert_image to perform 1 scale of analysis. */
        wave_vert_image(in_image_ev, in_image_od, qmf_int, mqmf_int,
                        scratch_pad, 0, img_type_val);

        /* Perform the wavelet display of the resulting wavelet transform by
           determining the maximum and minimum of the sub-images and
           re-normalizing to scale pixels to the range 0-255. */
        wavelet_display(out_image, scratch_pad, wave_params);
    }

4.4 Image Processing Functions
23. Table A-4. IDK FPGA Control Register Bit Descriptions (Continued)

    Register  Field    Function                          Comments
    DISPEVT   HEVENT   Display horizontal timing event   000 = TINP0, 001 = TINP1, 010 = None, 011 = None,
                                                         100 = EINT4, 101 = EINT5, 110 = EINT6, 111 = EINT7
    DISPEVT   VEVENT   Display vertical timing event     000 = None, 001 = None, 010 = None, 011 = None,
                                                         100 = EINT4, 101 = EINT5, 110 = EINT6, 111 = EINT7
    DCOMP     ADDRESS  Display frame buffer address      Address compare for display FIFO
    DDRAM     COL      Display memory column bits        00 = 8 bits, 01 = 9 bits, 10 = 10 bits, 11 = Reserved
    CAPTCTL   FLIP     Flip page request                 Write to request, read status
    CAPTCTL   OWN      Application buffer ownership      00 = Own Buffer 1 of 3, 01 = Own Buffer 2 of 3,
                                                         10 = Own Buffer 3 of 3, 11 = Reserved
    CAPTEVT   EVENT    Capture horizontal timing event   000 = None, 001 = None, 010 = None, 011 = None,
                                                         100 = EINT4, 101 = EINT5, 110 = EINT6, 111 = EINT7
    CAPTEVT   MEM      Capture memory size select        0 = 2MB, 1 = 8MB
    CAPTEVT   SQP      Capture sample rate               0 = Square pixel, 1 = ITU601

Appendix B: Scaling Filters Algorithm

Pre-Scale Filters

These filters are used to pre-scale captured field data from 640x240 resolution to 320x240 resolution for input to the following demonstrations:

- JPEG Loop Back
- Image Processing

The pre-scale filters horizontally scale 640 samples per line to 320 samples per line using averaging filters, as shown below. If consecutive input samples on a row are A, B, C, D as show
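Horizontal 2:1 scaling by averaging can be sketched as below. This is an illustration only: prescale_line is a hypothetical helper, and a plain two-tap average with rounding is an assumption, because the actual filter taps are cut off in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve the length of one line by averaging adjacent sample pairs.
 * ASSUMPTION: two-tap average with rounding; the guide's actual filter is
 * not visible in the excerpt. in_len must be even; out receives in_len / 2
 * samples. */
static void prescale_line(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t i;

    assert(in_len % 2 == 0);
    for (i = 0; i < in_len / 2; i++)
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1] + 1) / 2);
}
```

Applied to each of the 240 lines of a 640x240 field, a filter of this kind produces the 320x240 input that the demonstrations expect.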
24. Image Processing demo, there are four Channel Tasks, one for each processing channel, while in the JPEG Loop Back demo there are only two Channel Tasks. Figure 3-2 shows the channel task layout of these two demos. Also, the I/O tasks and Message Handling tasks differ depending on whether the demo needs capture data input or whether it handles a particular type of message.

Figure 3-2. Channel Task Layouts for JPEG Loop Back Demo and Image Processing Demo
[Figure: two panels, JPEG loop back demo and Image processing demo]

To make the framework more general and scalable, and to make modules reusable, the framework modules are divided into two layers. The upper layer is application specific, while the lower layer is application independent.

The upper layer includes all DSP/BIOS tasks and the system initialization module, which is the main function. This layer is responsible for starting the application, processing host messages, and getting I/O buffers from the capture/display drivers and passing them to channels for processing. This layer makes use of DSP/BIOS task and semaphore objects for task scheduling and synchronization.

The lower layer of the framework is the Channel Manager (CM) module, which directly interfaces with algorithms. Channel Manager is a generic algorithm framework and is responsible for the creation, deletion, execution, and configuration of algorithm instances. An algorithm that is compliant with the eXpressDSP Algorithm Standard and has a pro
25.     Int (*algNumAlloc)(Void);
    } IALG_Fxns;

The algorithm implements the algAlloc function to inform the framework of its memory requirements by filling the memTab structure. It also informs the framework whether there is a parent object for this algorithm. Based on the information it obtains by calling algAlloc, the framework then allocates the requested memory.

algInit initializes the instance persistent memory requested in algAlloc. After the framework has called algInit, the instance of the algorithm pointed to by handle is ready to be used.

To delete an instance of the algorithm pointed to by handle, the framework needs to call algFree. It is the responsibility of the algorithm to set the addresses and the size of each memory block requested in algAlloc, such that the application can delete the instance object without creating memory leaks.

The parent object that implements the IALG interface is an important and useful feature of the eXpressDSP API. It was created primarily to allow the sharing of global data between all instances of the same algorithm.

3.3 Integrating an Algorithm into the Channel Manager

The Channel Manager supports all required features of the eXpressDSP Standard and is fairly generic. Most eXpressDSP-compliant algorithms can work with Channel Manager
26.     /* Set parameters for wavelet codec, namely:              */
    /*   a) address of low-pass filter                        */
    /*   b) address of high-pass filter                       */
    /*   c) number of scales of decomposition                 */
    /*      (currently only 1 scale is supported)             */
    wave_params.qmf_ext  = qmf_ext;
    wave_params.mqmf_ext = mqmf_ext;
    wave_params.scale    = 1;

    /* Initialize CSL and set L2 mode to be half cache, half SRAM. */
    /* Enable caching over this region. Perform a cache clean to   */
    /* remove any dirty tags that are previously cached.           */
    CSL_Init();
    CACHE_SetL2Mode(CACHE_32KCACHE);
    CACHE_EnableCaching(CACHE_CE00);
    CACHE_Clean(CACHE_L2, 0x80020000, 0xF2000);

    /* Open channel for DMA to be performed and get a handle to be */
    /* passed to algorithm.                                         */

    /* Call wavelet algorithm wavelet_codec */
    wavelet_codec(&in_image_ev, &in_image_od, &out_image, &scratch_pad, &wave_params

    in_image_ev   pointer to structure for even field
    in_image_od   pointer to structure for odd field
    out_image     pointer to structure for output image
    scratch_pad   pointer to structure for scratch pad
    wave_params   pointer to structure for wavelet codec
    img_type      FLDS for odd/even fields and PROG for progressive

If img_type is PROG, then in_image_od is ignored and the im
27.     /* TYPE IWavelet_Handle                                         */
    /* This handle is used to reference all Wavelet instance objects. */
    typedef struct IWavelet_Obj *IWavelet_Handle;

    /* IWavelet_Obj                                                  */
    /* This structure must be the first field of all Wavelet instance */
    /* objects.                                                       */
    typedef struct IWavelet_Obj {
        struct IWavelet_Fxns *fxns;
    } IWavelet_Obj;

    /* IWavelet_Status                                               */
    /* Status structure defines the parameters that can be changed or */
    /* read during real-time operation of the algorithm.              */
    typedef struct IWavelet_Status {
        Int      size;      /* must be first field of all status structures */
        int      img_cols;
        int      img_rows;
        short   *qmf_ext;
        short   *mqmf_ext;
        int      scale;
        IMG_TYPE img_val;
    } IWavelet_Status;

    /* IWavelet_Cmd                                                  */
    /* The Cmd enumeration defines the control commands for the       */
    /* Wavelet control method.                                        */
    typedef enum IWavelet_Cmd {
        IWavelet_GETSTATUS,
        IWavelet_SETSTATUS
    } IWavelet_Cmd;

    /* IWavelet_Params                                               */
    /* This structure defines the creation parameters for all Wavelet */
    /* objects.                                                       */
    typedef struct IWavelet_Params {
        Int          size;  /* must be first field of all params structures */
        int          img_cols;
        int          img_rows;
        const short *qmf_ext;
        const short *mqmf_ext;
        int          scale;
        IMG_TYPE     img_val;
    } IWavelet_Params;

    /* IWavelet_PARAMS                                               */
    /* Default parameter values for Wavelet instance objects.         */
    extern IWavelet_Params IWavelet_PARAMS;
28. buf[j + 12]);
        printf("%3d %3d %3d %3d ", i_buf[j], i_buf[j + 4], i_buf[j + 8], i_buf[j + 12]);
        putchar('\n');

    dstr_close(&o_dstr);
    dstr_close(&i_dstr);
    DAT_Close();

    /* Clean out the cache and commit any part of external memory */
    /* that is cached.                                             */
    CACHE_Clean(CACHE_L2, 0x80020000, 0xF2000);

    /* Check for correctness of results */
    j 0 p 0 k 4 for i120 lt 1247 i printf 3d c array4 i i lt 248 amp amp array4 i 48 4 p 64 3 r ie QI POE if i amp 15 15 putchar n k ptt if Ik j if k p 0 if k k 4 putchar n

    return 0;

Appendix D
2D Wavelet Transform Algorithm Example

    #include <stdio.h>
    #include <stdlib.h>
    #include <c6x.h>

    /* Header files that use ImageLIB components */
    #include "filters.h"
    #include "csl.h"
    #include "cache.h"
    #include "dat.h"
    #include "wavelet.h"

    /* Normal images on the IDK capture board are 640 by 480. The data */
    /* set used for testing is 256 by 256. These are defined by        */
    /* IMG_ROWS and IMG_COLS; TMG_ROWS and TMG_COLS are set to 256x256 */
    /* for the test data being used.                                   */
    #define IMG_COLS 640
    #define IMG_ROWS 480
    #define TMG_ROWS 256
    #define TMG_COLS 256
29. bytes 544 bytes 608 bytes 576 bytes

Table 6-6. ImageLIB Kernels Performance (Continued)

    Function   Description                    Cycles           Code Size
    wave_horz  Horizontal Wavelet Transform   4 * cols + 5     640 bytes
               cols is number of image columns
               For cols = 256, cycs = 1029
               For cols = 512, cycs = 2053
    wave_vert  Vertical Wavelet Transform     8 * cols + 48    736 bytes
               cols is number of image columns
               For cols = 256, cycs = 2096
               For cols = 512, cycs = 4144

6.6.1 Further Information on ImageLIB

The ImageLIB package, including source code and documentation, may be downloaded from http://www.ti.com; then navigate to the appropriate site.

Chapter 7
Testing and Compliance

Initial versions of the IDK meet the following testing and compliance requirements:

- IDK software is capable of operating on Dell Latitude laptop computers under Windows 98.
- Every demonstration scenario described in this document has been tested for continuous operation for at least 24 hours.
- Individual algorithm-level software (e.g., applications such as JPEG and H.263, and functions such as Wavelet Transform and Sobel Edge Detection) has been tested for all known corner cases at the individual algorithm level.

Appendix A
Field Programmable Gate Array (FPGA) Interfaces

The field programmable gate array (FPGA) provides several interf
30. consist of 256 bins, corresponding to the 256 possible pixel intensities. Each bin will contain a count of the number of pixels in the image that have that particular intensity value. Histogram processing (such as histogram equalization or modification) is used in areas such as machine vision systems and image/video content generation systems.

- Boundary and Perimeter computation functions (boundary and perimeter, respectively) are provided. These are commonly used structural operators in machine vision applications.
- Morphological operators for performing Dilation and Erosion operations on binary images are provided (dilate_bin and erode_bin, respectively). Dilation and erosion are the fundamental building blocks of various morphological operations, such as opening and closing, that can be created from combinations of dilation and erosion. These functions are useful in machine vision and medical imaging applications.

Table 6-6 provides a listing of the routines provided in this software package, as well as C62x performance data for each.

Table 6-6. ImageLIB Kernels Performance

    Function   Description                    Cycles                     Code Size
    boundary   Boundary Structural Operator   1.25 * (cols * rows) + 4   352 bytes
               cols is number of image columns
               rows is number of image rows
               For cols = 128, rows = 3: cycs = 484
               For cols = 720, rows = 8
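The 256-bin histogram described above can be sketched in portable C as follows. This is an illustrative reference only: histogram256 is a hypothetical name chosen for this example, and it does not reproduce the signature or the optimized implementation of the ImageLIB histogram kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many pixels take each of the 256 possible 8-bit intensities. */
void histogram256(const uint8_t *img, size_t n, uint32_t hist[256])
{
    assert(img != NULL && hist != NULL);
    memset(hist, 0, 256 * sizeof hist[0]);   /* start with empty bins */
    for (size_t i = 0; i < n; i++)
        hist[img[i]]++;                      /* one bin per intensity value */
}
```

The bin counts always sum to the number of pixels, which makes a convenient sanity check before further processing such as equalization.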
31. in num_comps;
        unsigned int num_qtables;
        unsigned int interleaved;
        unsigned int format;
        unsigned int quality;
        unsigned int num_lines[3];
        unsigned int num_samples[3];
        unsigned int output_size;
    } IJPEGENC_Params;

    typedef IJPEGENC_Params IJPEGENC_Status;

    /* IJPEGENC_PARAMS                                               */
    /* Default parameter values for JPEGENC instance objects.         */
    extern IJPEGENC_Params IJPEGENC_PARAMS;

    /* IJPEGENC_Fxns                                                 */
    /* This structure defines all of the operations on JPEGENC        */
    /* objects.                                                       */
    typedef struct IJPEGENC_Fxns {
        IALG_Fxns  ialg;    /* IJPEGENC extends IALG */
        XDAS_Bool  (*control)(IJPEGENC_Handle handle, IJPEG_Cmd cmd, IJPEGENC_Status *status);
        XDAS_Int32 (*encode)(IJPEGENC_Handle handle, XDAS_Int8 *in, XDAS_Int8 *out);
    } IJPEGENC_Fxns;

    #endif /* IJPEGENC_ */

6.2.4 JPEG Encoder Performance

JPEG Encoder performance has been measured on a wide range of test images. The following performance is based on measurements on the C6201 EVM and the C6211 DSK.

Table 6-1. JPEG Encoder Performance

    Image Resolution                   Frames/sec with    Frames/sec with
                                       200MHz C6201       150MHz C6211
    128x128 4:2:0                      569                382
    256x256 4:2:0                      156                106
    352x288 4:2:0 (CIF resolution)     104                69
    640x480 4:2:0 (VGA resolution)     36                 24
    720x480 4:2:0 (SDTV resolution)    32                 21
32. interleaved
- PAL Capture: 768x576x25fps, 4:2:2, interlace, interleaved
- NTSC: Progressive Display Driver, 640x480, 8bpp, 60Hz mode
- PAL: Progressive Display Driver, 800x600, 8bpp, 60Hz mode
- GUI-Based User Inputs
  - Frame Rate Selection: select input frame rate from choice of 5, 10, 30 frames/sec

5.5.2 Signal Processing Operations Sequence

- The I/O task calls the capture driver using the VCAP_getFrame function with the SYS_FOREVER argument, which blocks until a new frame is available. It then signals the channel task to begin processing.
- Daughtercard FPGA planarizes captured YC data.
- Only Y channel data is used.
- Each frame (odd and even fields) is processed as one frame.
- Output is written in GRAY8 form (see section 2.3 for further details). Display rate of 60fps is achieved by repeating display of any given frame from the display buffer as suitable.
- Upon completion of processing, the channel task signals the I/O task. The I/O task calls the display driver using the VCAP_toggleBuffs function with argument 0.

5.5.3 eXpressDSP APIs for 2D Wavelet Transform Demonstration

See Appendix E for eXpressDSP APIs of functions used in this demonstration.

Chapter 6
C6000 DSP Image/Video Processing Applications

This chapter describes C6000 DSPs used in image/video processing applications.

Topic
6.1 Overview
6.2 JPEG Encoder
6.3 JPE
33. is written in GRAY8 form (see section 2.3 for further details). Display rate of 60fps is achieved by repeating display of any given frame from the display buffer as suitable.
- Upon completion of processing, the channel task signals the I/O task. The I/O task calls the display driver using the VCAP_toggleBuffs function with argument 0.

5.3.3 eXpressDSP APIs for Image Processing Demonstration

See Appendix E for eXpressDSP APIs of functions used in this demonstration.

5.4 H.263 Loop-Back Demonstration

This demonstration includes H.263 Encode and Decode. Image data is captured and H.263 encoded. The encoded bit stream is then subjected to H.263 Decode and sent to display after Color Space Conversion. Figure 5-5 shows the sequence of standard algorithms connected by the Channel Manager to create this demonstration.

In this demonstration, two channels are utilized: Channel 1, where the input data (after pre-scale) is subjected to Color Space Conversion, and Channel 2, where the same data is subjected to H.263 Encode, H.263 Decode, and Color Space Conversion. Both channels may be run simultaneously by the Channel Manager to provide a demonstration of "before" and "after" H.263 Encode/Decode.

[Figure 5-5. H.263 Loop-Back Demonstration. Blocks shown include Color space conversion, H.263 encoder, H.263 decoder, Color space conversion. Conditione
34. E.2 eXpressDSP API for Pre-Scale Filter
E.3 eXpressDSP API for Color Space Conversion
E.4 eXpressDSP API for Image Processing Functions
E.5 eXpressDSP API for Wavelet Transform

E.1 eXpressDSP API Overview

The eXpressDSP API wrapper is derived from template material provided in the algorithm standard documentation. Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper. See the algorithm standard documentation for details on the algorithm standard. A complete discussion of how to make an algorithm eXpressDSP compliant is beyond the scope of this document; however, the algorithm interface to eXpressDSP is discussed, because knowledge of it ensures interoperability of algorithms. The algorithm standard provides a framework for this to be achieved.

An algorithm is said to be eXpressDSP compliant if it implements the IALG interface and observes all the programming rules in the algorithm standard. The core of the IALG interface is the IALG_Fxns structure type, in which a number of function pointers are defined. Each eXpressDSP-compliant algorithm must define and initialize a variable of type IALG_Fxns. In IALG_Fxns, algAlloc, algInit, and algFree are required, while the other functions are optional.

    typedef struct IALG_Fxns {
        Void *implementationId;
        V
35. or EDMA (and timer module, if appropriate) to service display events. The intended operation is that one DMA channel will be dedicated to servicing line events (once per horizontal sync pulse), and a separate DMA or CPU event per vertical sync pulse will be used for synchronization. The horizontal event forces the DMA to transfer a line of data to the FPGA display FIFO, via the aforementioned read of the motherboard SDRAM. The FPGA latches this data into the FIFO autonomously, which feeds the output display devices in real time.

Display events are scheduled such that data is ready for the display devices before it is needed. Specifically, this is achieved by scheduling the first event at the end of the vertical synchronization period. At this point, several lines of blanked display (for which no data is needed) must still be timed, so the DMA has time to perform the required accesses. When an interrupt is used for the horizontal line events, generation of this event is straightforward. When a timer is used, however, generation is slightly more complicated, because the FPGA does not always source the horizontal video timing. In this case, special hardware inside the FPGA inserts additional TINPn pulses to fake a first line of video display, forcing a DMA of data to the FPGA line FIFO. The following diagram outlines the operation in both cases. Since the FPGA is always one line ahead of the display, the last line event reads data
36. other line of C data into DSP during pre-scale processing (not accurate, strictly speaking, because the horizontal location of the center of gravity of 4:2:2 and 4:2:0 C data is different). The data thus created is referred to as Conditioned Input Data in Figure 5-5.
- The H.263 Encoder is set up to process one frame of data at a time, followed by decode of the encoded data stream.
- The Color Space Conversion function converts H.263 decoded data from 4:2:0 to RGB. Initial demos use a 16-bit RGB output. Display rate of 60fps is achieved by repeating display of any given frame from the display buffer as suitable.
- The Color Space Conversion function also provides a pitch parameter to control the positioning of the output frame within the frame buffer.
  - NTSC mode display: Decoder output picture resolution is 352x288. The 320x240 upper-left region is extracted for display. This data is written in the lower-right corner of the 640x480 region in the frame buffer. The uncompressed pass-through image is written in the upper-left corner of the same 640x480 region of the frame buffer. The application only has to write the picture in the appropriate location of the frame buffer; the entire frame buffer is initialized with zeros by the system at the start of the application.
  - PAL mode display: Decoder output picture resolution is 352x288. This data is written in the central lower right corner of 800x600 region in frame buffer. The uncompressed pass-through i
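The use of a pitch to position a picture inside a larger frame buffer, as in the NTSC case above, can be sketched as follows. This is a generic illustration, assuming 16-bit pixels and a pitch expressed in pixels per frame-buffer line; blit_with_pitch and its parameters are hypothetical names for this example and are not the IDK Color Space Conversion API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a w x h picture into a frame buffer whose lines are `pitch` pixels
 * apart, with the picture's top-left corner at (x0, y0).
 */
void blit_with_pitch(const uint16_t *src, int w, int h,
                     uint16_t *frame, int pitch, int x0, int y0)
{
    assert(x0 >= 0 && y0 >= 0 && x0 + w <= pitch);
    for (int y = 0; y < h; y++) {
        /* Each source line lands one pitch further down the frame buffer. */
        memcpy(&frame[(size_t)(y0 + y) * pitch + x0],
               &src[(size_t)y * w],
               (size_t)w * sizeof *src);
    }
}
```

Under these assumptions, a 320x240 picture placed in the lower-right corner of a 640x480 region would use pitch = 640, x0 = 320 and y0 = 240, while the pass-through image in the upper-left corner would use x0 = 0 and y0 = 0.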
37. rates, data formats, functions partitioning among various elements of the overall system. However, there do tend to be some commonly used DSP functions across the range of products and applications. Texas Instruments has identified the following functions and developed optimized C6000 DSP code for them:

- JPEG Encoder
- JPEG Decoder
- H.263 Encoder
- H.263 Decoder
- ImageLIB (library of optimized functions)

6.2 JPEG Encoder

The JPEG (Joint Photographic Experts Group) image compression standard finds application in a wide range of end products, including printers, digital cameras, network cameras, security systems, video conferencing, document archival, and many others.

6.2.1 JPEG Encoder Algorithm-Level Description

Figure 6-1 provides an overview of the processing involved in the JPEG Encoder.

[Figure 6-1. JPEG Encoder. Blocks shown include Data reformat, DC encode, and Quantization and RLE.]

Data Reformat: This operation converts raster-scanned image component data into a contiguous set of 8x8 image blocks. Figure 6-2 shows the raw image data as stored in memory. All image samples belonging to the same row in the image frame are represented by a single letter. Figure 6-3 shows the reformatted data, as required for any block-based compression scheme. This operation also converts the dynamic range of the pixel intensity values from 0 to 255 to the range -128 to 127, thus eliminating the DC bias of the signal.

Figure 6-2. Raster Scann
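The Data Reformat step described above (raster order to contiguous 8x8 blocks, plus the shift from 0..255 to -128..127) can be sketched as follows. This is a plain reference version written for illustration; reformat_to_blocks is a hypothetical name, the block order (left to right, then top to bottom) is an assumption, and the image dimensions are assumed to be multiples of 8.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raster-scanned 8-bit component into consecutive 8x8 blocks,
 * subtracting 128 from every sample to remove the DC bias.
 * `blocks` must hold width * height entries.
 */
void reformat_to_blocks(const uint8_t *raster, int width, int height,
                        int16_t *blocks)
{
    int out = 0;

    assert(width % 8 == 0 && height % 8 == 0);
    for (int by = 0; by < height; by += 8)          /* block rows    */
        for (int bx = 0; bx < width; bx += 8)       /* block columns */
            for (int y = 0; y < 8; y++)             /* rows in block */
                for (int x = 0; x < 8; x++)         /* cols in block */
                    blocks[out++] =
                        (int16_t)(raster[(by + y) * width + (bx + x)] - 128);
}
```

After this step, each group of 64 consecutive output values is one block, ready for a block-based transform.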
38. said to be eXpressDSP compliant if it implements the IALG interface and observes all the programming rules in the algorithm standard. The core of the IALG interface is the IALG_Fxns structure type, in which a number of function pointers are defined. Each eXpressDSP-compliant algorithm must define and initialize a variable of type IALG_Fxns. Shown below is the IALG functions structure.

IH263DEC_Fxns (H.263 Decoder)

    typedef struct IH263DEC_Fxns {
        IALG_Fxns ialg;    /* IH263DEC extends IALG */
        void (*control)(IH263DEC_Handle handle, IH263DEC_Cmd cmd, IH263DEC_Status *status);
        (*decode)(IH263DEC_Handle handle, uint *in, uchar *out);
    } IH263D

    ialg      This is the default IALG function.
    control   This function is used to obtain updated status from the decoder.
    decode    Execute the H.263 decoder.

Shown below is example code in which one parent instance and one child instance are created. Note that since the decoder extracts whatever information it needs from the bitstream, parameters are not required at creation time. Refer to TMS320 Algorithm Standard Rules and Guidelines (SPRU352) for more information on eXpressDSP-specific function APIs.

    void main()
    {
        H263PDEC_TI_Obj *decParent;     /* decoder parent handle  */
        H263DEC_TI_Obj  *decHandle0;    /* decoder child handle   */
        IH263DEC_Status  ds;            /* decoder status         */
        unsigned char   *in;            /* input bitstream        */
        unsigned char   *out[3];        /* output frame Y, Cb, Cr */
39. stored on C6711 DSK board memory, read in and decoded; the resulting data is subjected to color space conversion and displayed.

Figure 5-2 shows the sequence of standard algorithms connected by the Channel Manager to create a channel. Multiple channels may be utilized in this demonstration, with a task corresponding to each channel. See Table 5-1 for a listing of the number of channels possible as a function of system capability. Each task reads in a bit stream (it need not be a unique bit stream per channel), performs H.263 decode and color space conversion, and writes the resulting data to the display buffer.

[Figure 5-2. Multichannel H.263 Decode Demonstration. Tasks 1 through n each take a bit stream (Bit stream 1 through Bit stream n) through H.263 decode and Color space convert, then write to the display buffer.]

5.2.1 Data I/O and User Input Specifics

- Input: Pre-Compressed H.263 Data
- Progressive Display Driver mode: 640x480, 16bpp, 60Hz. The same demonstration is used for NTSC or PAL based systems.
- GUI-Based User Inputs
  - A play list is provided with each task, enabling the user to select any of the available bitstreams for any of the tasks.
40. that is off the end of the display buffer. This does not have any adverse effects, as the line FIFO is automatically reset during the vertical synchronization period. The data read is discarded, and the first line event generation described above re-synchronizes the display properly.

[Figure 2-4. Display Event Generation. Signals shown: VSYNC, CBLNK, EINTn (if enabled), TINPn (if enabled), DMA activity, FIFO. Annotations: FPGA fakes first line's worth of pixel clocks on TINPn (if enabled); DMA activity covers last line (bogus data), first line, second line, third line; FIFO held in reset discards last data.]

Vertical synchronization is not explicitly necessary; however, it is added for the ease of software and to facilitate debugging in a clean environment. One of the challenges of the design is support for debugging, wherein the DMA will typically keep running but the CPU is halted. The TMS320C6000 DMA and EDMA controllers both have provisions to support auto-reloading (called linking in EDMA) of parameters to maintain synchronization while the DSP core is halted. However, when the DSP is restarted, it may be restarted at any point during an actively displayed frame. In order for the DSP to re-synchronize to the display, it must receive an interrupt from the daughtercard. The vertical event interrupt is provided via one of the DSP EINTn signals. The interrup
41. that the second parameter of the XXXAlloc function above is a pointer to a pointer of an IALG_Fxns structure. This IALG v-table represents the parent object of the algorithm, if it has one. The eXpressDSP Algorithm Standard allows an algorithm to optionally implement a second IALG interface, which can be used to create a parent instance of that algorithm. The parent instance of an algorithm usually contains global data sharable by all instances of that algorithm, such as a global look-up table. The Channel Manager fully supports the creation of parent instances.

In the CM_RegAlg function, the Channel Manager calls the XXXAlloc function of an algorithm and checks whether fxns points to a valid v-table. If so, the Channel Manager creates the parent instance for that algorithm in a manner similar to the creation of an ordinary algorithm instance. The handle of the parent instance is then attached to that algorithm object and later passed to all instances of that algorithm when they are created.

3.6 Channel Manager API Functions

- CM_Init: Channel Manager module initialization
- CM_Open: create a new channel object
- CM_Close: delete the channel object
- CM_SetAlgs: set algorithms in the channel. Old instances in the channel are deleted and new instances are created according to the new algorithm settings.
- CM_GetA
42. the packmb function is called to pack and adjust offsets of the IDCT output. Otherwise, the motion compensation function h263DecMC is called to add the IDCT output and the reference macroblock to reconstruct the current macroblock. The motion compensation function supports the four modes mentioned above. The mode used depends on the motion compensation type determined previously. Each reconstructed pixel value is clipped to the range 0 to 255. The final stage of h263DecMB involves writing the reconstructed macroblock to the output frame buffer by calling the function writeRecMB.

6.5.2 H.263 Decoder Capabilities and Restrictions

eXpressDSP-compliant H.263 Decoder code optimized for TMS320C620x and TMS320C6211 DSPs is currently available from Texas Instruments. Capabilities and restrictions relevant to the decoder are:

- Baseline H.263 decoder implementation only; does not support H.263 standard annexes
- Bitstream for a full frame is required per call
- Capable of decoding a single macroblock
- RTP ready: hooks for RTP support partially in place

6.5.3 H.263 Decoder API

The eXpressDSP API wrapper is derived from template material provided in the algorithm standard documentation. Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper. See the algorithm standard documentation for details on the algorithm standard. Also see Appendix E for an overview of eXpressDSP APIs.

An algorithm is
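The reconstruction step described at the start of this excerpt (IDCT output added to the reference macroblock, with each result clipped to 0..255) can be sketched as follows. This is a generic illustration, not the h263DecMC source: clip255 and reconstruct_block are hypothetical names, and the sketch ignores the four motion compensation modes the text mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate an intermediate value to the 8-bit pixel range. */
static uint8_t clip255(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Add the IDCT residual to the reference pixels and clip each result. */
void reconstruct_block(const uint8_t *ref, const int16_t *residual,
                       uint8_t *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = clip255((int)ref[i] + (int)residual[i]);
}
```

The sum is formed in a wider integer type before clipping, so a large residual cannot wrap around before the saturation is applied.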
43. the FPGA control registers. Table A-4 identifies the function of each control register bit and/or field.

[Figure A-1. FPGA Control Registers. Registers and addresses shown:]

    Register   Address
    GBLCTL     0xM0300000
    GPCTL      0xM0300004
    HTOTAL     0xM0320000
    HESYNC     0xM0320004
    HEBLNK     0xM0320008
    HSBLNK     0xM032000C
    VTOTAL     0xM0320020
    VESYNC     0xM0320024
    VEBLNK     0xM0320028
    VSBLNK     0xM032002C
    DISPCTL    0xM0340000
    DISPEVT    0xM0340004
    DCOMP      0xM0340010
    DDRAM      0xM0340014
    CAPTCTL    0xM0360000
    CAPTEVT    0xM03600
44. to a positive integer value. This integer represents the additional number of bits to be appended to the variable-length code itself. The value of the additional bits is calculated as part of the encoding process.

Byte Stuff: In the JPEG standard, control markers are flagged by a 0xFF byte. This flag is followed by one or more bytes of control code. A 0x00 byte following a 0xFF byte signifies that the 0xFF byte is part of the data and not of a control segment. This step inserts a 0x00 byte after every 0xFF byte within the entropy-coded (i.e., VLC) segments.

6.2.2 JPEG Encoder Capabilities and Restrictions

eXpressDSP-compliant JPEG Encoder code optimized for TMS320C620x and TMS320C6211 DSPs is currently available from Texas Instruments. Certain restrictions have been placed on the broad JPEG standard to produce code that provides optimal performance while addressing the features of JPEG useful in most common applications. The capabilities and restrictions of the encoder are listed below.

- Lossless JPEG encoding is not supported.
- Only JPEG standard VLC tables are supported.
- Arbitrary quantization tables are supported and may be changed per image during encoding.
- Progressive image transmission coding is not supported.
- Only non-interleaved data is supported. The following data forms are supported: 4:2:0, 4:1:1, 4:2:2, 4:4:4.
- 8 bits/component pixel only su
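The Byte Stuff step described above can be sketched as follows. This is a straightforward reference version for illustration; byte_stuff is a hypothetical name, and the caller is assumed to provide an output buffer of up to twice the input length (the worst case, when every input byte is 0xFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n entropy-coded bytes from src to dst, inserting 0x00 after each
 * 0xFF so that data bytes cannot be mistaken for a marker prefix.
 * Returns the number of bytes written to dst.
 */
size_t byte_stuff(const uint8_t *src, size_t n, uint8_t *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[out++] = src[i];
        if (src[i] == 0xFF)
            dst[out++] = 0x00;   /* stuffed byte marks 0xFF as data */
    }
    return out;
}
```

A decoder reverses the step by dropping any 0x00 that immediately follows a 0xFF inside an entropy-coded segment.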
45. 04

Table A-4. IDK FPGA Control Register Bit Descriptions

    Register.Field   Function                            Comments
    GBLCTL.EN        Endianness                          0: Big Endian; 1: Little Endian
    GBLCTL.SDEN      SDRAM controller enable             0: Disabled; 1: Enabled
    GBLCTL.5KRST     TVP5022 Reset                       0: Normal Operation; 1: Held in reset
    GBLCTL.RGBRST    TVP3026 Reset                       0: Normal Operation; 1: Held in reset
    GPCTL.GPIO0      GPIO bit 0                          Read/Write access
    GPCTL.GPIO1      GPIO bit 1                          Read/Write access
    GPCTL.GPIO0EN    GPIO bit 0 output enable            0: input; 1: output
    GPCTL.GPIO1EN    GPIO bit 1 output enable            0: input; 1: output
    HTOTAL.HTOTAL    RGB output horizontal total         Period of HSYNC in pixel clocks
    HESYNC.HESYNC    RGB output horizontal sync          Width of HSYNC in pixel clocks
    HEBLNK.HEBLNK    RGB output horizontal end blank     Width of horizontal back porch in pixel clocks
    HSBLNK.HSBLNK    RGB output horizontal start blank   HTOTAL - HSBLNK = width of horizontal front porch in pixel clocks
    VTOTAL.VTOTAL    RGB output vertical total           Period of VSYNC in lines
    VESYNC.VESYNC    RGB output vertical sync            Width of VSYNC in lines
    VEBLNK.VEBLNK    RGB output vertical end blank       Width of vertical back porch in lines
    VSBLNK.VSBLNK    RGB output vertical start blank     VTOTAL - VSBLNK = width of vertical front porch in lines
    DISPCTL.MODE     Display Mode                        000: GRAY8; 001: RGB8; 010: RGB16; 011: RGB32; 100: YC640; 101: YC720; 110: Reserved; 111: Reserved

Field Programmable Gate Array
46. 1.1 Overview

The IDK consists of:

- TMS320C6711 DSK board with 16Mbytes SDRAM

  Note: The image/video processing algorithms included in the IDK are fixed-point implementations, suitable for operation on fixed-point DSPs such as the TMS320C6211. The IDK is based on the TMS320C6711 floating-point DSK board only because TI is standardizing DSK boards on the C6711 DSP. The fact that the IDK is based on the floating-point C6711 DSP may also be useful to developers using this platform to develop other algorithms for image/video/graphics processing.

- Imaging Daughtercard for video capture, display, and data conversion support
  - Input signals are limited to NTSC/PAL composite video.
  - Display is limited to a 640x480 or 800x600 pixel RGB computer monitor, driven by drivers for 8 bits/pixel gray scale or 16 bits/pixel (565 format) RGB.
- Software toolkit consisting of Code Composer Studio v2 on the IDK software CD, which also includes a chip support library (CSL) used for the video drivers and demos
- Demonstration software showcasing C6000 DSP capabilities across a range of image/video processing applications
  - JPEG loop-back (encoder and decoder) demonstration
  - Multichannel H.263 decoder demonstration
  - H.263 loop-back (encoder and decoder) demonstration
  - 2D Wavelet transform demonstration
  - Image processing functions demonstration

The JPEG loop back H 263 decoder and
47. 1D DCTs, one for each column of the array resulting from the row 1D DCT computation (column computation).

DC Encode: This step quantizes and Huffman encodes (also called Variable Length Coding, VLC) the DC coefficients obtained from the DCT module. In JPEG, the DC coefficient is differentially encoded, i.e., a difference between the present and the preceding DC component is computed, and this difference is quantized and encoded. Quantization involves an inherent division operation with an element from the quantizer table. In this implementation, a reciprocal quantizer table, pre-computed from the quantizer table, is used, so the division becomes a multiplication.

Quantization and RLE: This step quantizes the AC coefficients, casts them in a zig-zag pattern, and run-level encodes the resulting coefficients. As in the case of the DC coefficient, quantization involves an inherent division operation with an element from the quantizer table, and a reciprocal quantizer table pre-computed from the quantizer table is used. The result of the zig-zag reordering of transformed coefficients is shown in Figure 6-4.

[Figure 6-4. Zig-Zag Reordering of Transformed Coefficients (Input and Output)]

AC VLC: This step performs Variable Length Coding (VLC) of the run-level pairs that are output by the quantization routine, to construct the entropy-coded segments of the image. The variable-length codes in JPEG do not map directly to quantized AC coefficients. Instead, they map
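The DC Encode and zig-zag steps described above can be sketched as follows. This is an illustration under stated assumptions, not the TI encoder source: the Q15 scaling of the reciprocal, the round-to-nearest rule, the initial DC predictor of zero, and all function names (recip_q15, quantize_recip, dc_encode_diffs, zigzag_order) are choices made for this example. The difference is taken before quantization, following the order given in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pre-compute a Q15 reciprocal of a quantizer step (assumed scaling). */
static int32_t recip_q15(int q)
{
    assert(q > 0);
    return (32768 + q / 2) / q;
}

/* Approximate coef / q with a multiply and a shift, rounding to nearest. */
static int quantize_recip(int coef, int32_t recip)
{
    int32_t mag = (int32_t)abs(coef) * recip;   /* work on the magnitude */
    int q = (int)((mag + 16384) >> 15);
    return coef < 0 ? -q : q;
}

/* Differential DC coding: quantize the change from the previous block. */
void dc_encode_diffs(const int *dc, int n, int q, int *out)
{
    int32_t recip = recip_q15(q);
    int prev = 0;                               /* assumed initial predictor */

    for (int i = 0; i < n; i++) {
        out[i] = quantize_recip(dc[i] - prev, recip);
        prev = dc[i];
    }
}

/* Fill order[] with the raster index visited at each zig-zag position. */
void zigzag_order(int order[64])
{
    int r = 0, c = 0;

    for (int i = 0; i < 64; i++) {
        order[i] = r * 8 + c;
        if ((r + c) % 2 == 0) {                 /* moving up and right */
            if (c == 7)      r++;
            else if (r == 0) c++;
            else { r--; c++; }
        } else {                                /* moving down and left */
            if (r == 7)      c++;
            else if (c == 0) r++;
            else { r++; c--; }
        }
    }
}
```

The reciprocal form trades a division per coefficient for a multiply and a shift, at the cost of a small rounding difference for quantizer steps that do not divide 32768 exactly.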
48. 20C6000 DSPs.
    1.1 Overview
    1.2 IDK as a Rapid Prototyping Platform
        1.2.1 Rapid Prototyping Software Suite
        1.2.2 Rapid Prototyping Hardware

2 Hardware Architecture
    Describes the IDK hardware architecture.
    2.1 Daughtercard Description
    2.2 Video Capture
    2.3 Video Display

3 Software Architecture: Applications Framework
    Describes the multiple software architecture levels of the IDK.
    3.1 Framework for Combining eXpressDSP-Compliant Algorithms
    3.2 The IALG Interface
    3.3 Integrating an Algorithm into the Channel Manager
    3.4 Channel Manager Object Types
    3.5 Channel Manager Memory Management
        3.5.1 6711 DSK Memory Architecture
        3.5.2 Data Memory Requirements of IDK Algorithms
        3.5.3 Internal and External Heaps
        3.5.4 Creation and Deletion of an Algorithm Instance
        3.5.5 Parent Instance Support
    3.6 Channel Manager API Functions
49.     3.6.1 API Reference

4 Software Architecture: Algorithms Creation
    Describes algorithm creation in the software architecture.
    4.1 Overview
    4.2 eXpressDSP API Wrapper
    4.3 Ammar
    4.4 Image Processing Functions
    4.5 ImageLIB or Custom Kernels
    4.6 Image Data Manager

5 Demonstration Scenarios
    Describes the demonstration scenarios currently included in the IDK.
    5.1 JPEG Loop-Back Demonstration
        5.1.1 Data I/O and User Input Specifics
        5.1.2 Signal Processing Operations Sequence
        5.1.3 eXpressDSP APIs for JPEG Loop-Back Demonstration
    5.2 H.263 Multichannel Decoder Demonstration
        5.2.1 Data I/O and User Input Specifics
        5.2.2 Signal Processing Operations Sequence
        5.2.3 eXpressDSP APIs for H.263 Multichannel Decoder Demonstration
    5.3 Image Processing Demonstration
        5.3.1 Data I/O and User Input Specifics
        5.3.2 Signal Processing Operations Sequence
        5.3.3 eXpressDSP APIs for Image Processing Demonstration
    5.4 H.263 Loop-Back Demonstration
50. 8)
    #pragma DATA_ALIGN(array4, 8)

    /* Declare two arrays in internal memory and align to dword      */
    /* boundary. array2 and array3 are arrays in on-chip or internal  */
    /* memory, in internal memory sections sl_window1 and sl_window2. */
    #pragma DATA_SECTION(array2, "chip_image_int_sect1")
    #pragma DATA_SECTION(array3, "chip_image_int_sect2")
    #pragma DATA_ALIGN(array2, 8)
    #pragma DATA_ALIGN(array3, 8)

    /* Internal array sizes should be twice as large as the sliding   */
    /* window to be supported. For example, an array of 32 ints can   */
    /* support a sliding window of size 4 lines, with each of the 4   */
    /* lines containing 4 integers.                                   */
    int array1[512];
    int array4[512];
    int array2[32];
    int array3[8];

    /* Declare two streams, i_dstr and o_dstr, for input and output   */
    /* double buffering.                                              */
    dstr_t i_dstr, o_dstr;

    main()
    {
        int i;
        int *i_buf, *o_buf;
        int err_code;

        /* Use CSL to set L2 mode to be 3/4 cache and enable caching  */
        /* over this region. Clean cache and invalidate any external  */
        /* memory that is cached in L2.                               */
        CACHE_SetL2Mode(CACHE_48KCACHE);
        CACHE_EnableCaching(CACHE_CE00);
        CACHE_Clean(CACHE_L2, 0x80020000, 0xF2000);

        /* Initialize external memory by CPU to contain values */
        for (i = 0; i < 512; i++) array1[i] = i;
        for (i = 0; i < 32; i++) a
51. Binary Erosion
    errdif_bin     Error Diffusion, Binary Output
    fdct_8x8       Forward Discrete Cosine Transform (FDCT)
    histogram      Histogram Computation
    idct_8x8       Inverse Discrete Cosine Transform (IDCT)
    mad_8x8        8x8 Minimum Absolute Difference
    mad_16x16      16x16 Minimum Absolute Difference
    median_3x3     3x3 Median Filter
    perimeter      Perimeter Structural Operator
    pix_expand     Pixel Expand
    pix_sat        Pixel Saturate
    quantize       Matrix Quantization with Rounding
    scale_horz     Horizontal Scaling

Table 6-5. ImageLIB Kernels (Continued)
    Function       Description
    scale_vert     Vertical Scaling
    sobel          Sobel Edge Detection
    threshold      Image Thresholding
    wave_horz      Horizontal Wavelet Transform
    wave_vert      Vertical Wavelet Transform

ImageLIB provides a collection of C-callable, high-performance routines that can serve as key enablers for a wide range of image/video processing applications. These functions are representative of the high-performance capabilities of the C62x DSP. Some of the functions provided and their areas of applicability are listed below. The areas of applicability are only provided as representative examples; users of this software will no doubt come up with many more creative uses.

Forward and Inverse DCT (Discrete Cosine Transform) functions, fdct_8x8 and idct_8x8 respectively, are provided. These functions have applicability in a wide range of compression standards such as JPEG Encode/Decode, MPEG Video Enco
52. C Using Image Data Manager: Demonstrates how to use the DMA streaming routines to implement a sliding window.
Appendix D, 2D Wavelet Transform Algorithm Example: Describes a 2D wavelet transform algorithm.
Appendix E, eXpressDSP APIs for IDK Demonstrations: Provides the APIs pertinent to IDK demonstrations.

Related Documentation From Texas Instruments
The following references are provided for further information.

IDK Documentation
    TMS320C6000 Imaging Developer's Kit (IDK) Video Device Driver User's Guide (Literature number SPRU499)
    TMS320C6000 Imaging Developer's Kit (IDK) Programmer's Guide (Literature number SPRU495)

IDK Software Architecture Information
    For ImageLIB information, go to http://www.ti.com and navigate to the appropriate site.

C6000 JPEG Information
    TMS320C6000 JPEG Implementation Application Report (Literature number SPRA704)
    Optimizing JPEG on the TMS320C6211 With 2-Level Cache Application Report (Literature number SPRA705)

C6000 H.263 Information
    H.263 Decoder: TMS320C6000 Implementation Application Report (Literature number SPRA703)
    H.263 Encoder: TMS320C6000 Implementation Application Report (Literature number SPRA721)

Contents
1 Introduction
  Describes how the Imaging Developer's Kit (IDK) has been developed as a platform for development and demonstration of image/video processing applications on TMS3
53. Cmd
    /* The Cmd enumeration defines the control commands for the Wavelet
       control method. */
    typedef enum IWavelet_Cmd {
        IWavelet_GETSTATUS,
        IWavelet_SETSTATUS
    } IWavelet_Cmd;

    /* IWavelet_Params
       This structure defines the creation parameters for all Wavelet
       objects. */
    typedef struct IWavelet_Params {
        Int size;               /* must be first field of all params structures */
        int img_cols;
        int img_rows;
        const short *qmf_ext;
        const short *mqmf_ext;
        int scale;
        IMG_TYPE img_val;
    } IWavelet_Params;

    /* IWavelet_PARAMS
       Default parameter values for Wavelet instance objects. */
    extern IWavelet_Params IWavelet_PARAMS;

    /* IWavelet_Fxns
       This structure defines all of the operations on Wavelet objects. */
    typedef struct IWavelet_Fxns {
        IALG_Fxns ialg;         /* IWavelet extends IALG */
        XDAS_Bool (*control)(IWavelet_Handle handle, IWavelet_Cmd cmd,
                             IWavelet_Status *status);
        XDAS_Int32 (*apply)(IWavelet_Handle handle, XDAS_Int8 *in,
                            XDAS_Int8 *out);
    } IWavelet_Fxns;

    #endif /* IWavelet_ */
54. DAT Module to transfer data between internal and external memory. If the application consists of multiple processing channels, then all channels share the same internal scratch memory buffer. Note that the algorithms themselves are responsible for managing their on-chip/off-chip data transfer.

Table 3-1 shows the L2 operation modes of the C6211/C6711 DSP for various IDK demos. Since the JPEG loop-back demo requires less than 16KB of on-chip scratch buffer (about 13KB), it operates in 48KB cache/16KB RAM mode to ensure high performance. The other scenarios operate in 32KB cache/32KB RAM mode because algorithms in those demos require more than 16KB of on-chip memory.

Table 3-1. C6211/C6711 L2 Operation Modes for IDK Demos
    Demo Scenario                 L2 Cache     L2 RAM
    JPEG Loop-Back                48 Kbytes    16 Kbytes
    H.263 Loop-Back               32 Kbytes    32 Kbytes
    Multichannel H.263 Decoder    32 Kbytes    32 Kbytes
    Image Processing              32 Kbytes    32 Kbytes
    Forward Wavelet Transform     32 Kbytes    32 Kbytes

3.5.3 Internal and External Heaps
As shown in the previous section, algorithms in the IDK require memory blocks in both on-chip and off-chip data memory space. To accommodate these requirements and to optimize the usage of the limited on-chip L2 RAM space, the Channel Manager usually maintains two memory heaps. The internal heap is located in on-chip L2 RAM and the external heap is located in off-chip SDRAM. The Channel Manager uses DSP/BIOS MEM module API function
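The reasoning behind Table 3-1 (use the largest cache setting whose RAM portion still holds the scratch buffer) can be sketched as a selection function. The enum and function names below are hypothetical, and the sketch ignores any other use of L2 RAM, such as the internal heap, so treat it as an illustration of the rule rather than IDK code.

```c
#include <stddef.h>

/* Hypothetical names for the two L2 configurations in Table 3-1. */
typedef enum {
    L2_48K_CACHE_16K_RAM,
    L2_32K_CACHE_32K_RAM
} l2_mode;

/* Pick the configuration with the most cache whose RAM portion can
 * still hold an on-chip scratch buffer of `scratch_bytes` bytes.
 * Returns 0 on success, -1 if neither configuration is large enough. */
int choose_l2_mode(size_t scratch_bytes, l2_mode *mode)
{
    if (scratch_bytes <= 16u * 1024u) {
        *mode = L2_48K_CACHE_16K_RAM;
        return 0;
    }
    if (scratch_bytes <= 32u * 1024u) {
        *mode = L2_32K_CACHE_32K_RAM;
        return 0;
    }
    return -1;
}
```

A scratch buffer of about 13KB, as quoted for the JPEG loop-back demo, selects the 48KB cache/16KB RAM mode.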
55. ER Coded                    Cycles/Frame   Frame Rate   Cycles/Frame   Frame Rate
    1.41    39.04    59.55        246,388         812          177,532         845
    0.92    36.75    62.33        236,648         845          168,648         889
    4.02    88.07     7.91        346,264         578          290,084         517
    1.09    92.47     6.43        341,892         585          286,860         523
    0.36    82.81    16.83        305,952         654          252,536         594
    6.74    82.08    11.18      1,324,240         151        1,089,296         138
    0.58    31.27    68.15        890,972         224          616,624         243
    1.56    35.24    63.2         943,480         212          668,564         224

Note: For TMS320C6201, CPU frequency = 200 MHz. For TMS320C6211, CPU frequency = 150 MHz. CIF: 352x288, 4:2:0. QCIF: 176x144, 4:2:0.

For every test bitstream, the TMS320C6211 showed superior performance over the TMS320C6201. This is due largely to the EDMA and its ability to execute external-to-external transfers without having to break them up into two separate requests, which forces the CPU to wait for the first request to complete. Note that the average number of cycles used by the CPU to decode one frame (Cycles/Frame) includes the core decoder codes, control codes, as well as any overhead associated with calling and exiting the entire decoder instance. For the TMS320C6211, the numbers also include stalls incurred by any cache misses (L1-I, L1-D, and L2). Note also that for bitstreams with a high percentage of MBs not coded (News and Silent), the TMS320C6211 is able to decode faster even at a lower clock frequency.

6.5.5 Further Information on H.263 Decoder
For further information on C6000 DSP H.263 Decoder implementation, see H.263 Decoder: TMS320C6000 Implementa
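The Frame Rate figures are consistent with dividing the CPU clock by Cycles/Frame: the first Cycles/Frame and Frame Rate pair matches a 200 MHz clock and the second pair a 150 MHz clock. The helper below is a minimal sketch of that arithmetic; its name is an illustration, not something defined by the IDK.

```c
/* Frames per second achievable when decoding one frame costs
 * `cycles_per_frame` CPU cycles on a CPU clocked at `cpu_hz`. */
double frames_per_second(double cpu_hz, double cycles_per_frame)
{
    return cpu_hz / cycles_per_frame;
}
```

For the first row, 200e6 / 246,388 is about 811.7 and 150e6 / 177,532 is about 844.9, which round to the tabulated 812 and 845.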
56. FPGA owns. This event is generically referred to as a flip-page function. Once the flip-page request has occurred (via a write to an FPGA control register bit), the IDK driver can read another FPGA register to extract the buffer number, which may be returned to the application. Because of the three-buffer architecture, this can occur immediately after the flip-page request has been posted, even though the capture stream may not be at a point where this could occur had a two-buffer scheme been used. The FPGA performs the page flip during the capture vertical blank interval. Special detection logic is included to avoid boundary conditions, which are specifically the end and start of vertical synchronization.

Note the following, specific to IDK demonstrations:
- While the daughtercard provides support for little-endian as well as big-endian data, all data is assumed to be little endian for the IDK.
- Some of the IDK demonstrations make use of only one of the odd or even fields of video data. Since the daughtercard assigns odd and even fields to separate memory locations, this is comprehended by only addressing one of the fields for data read for DSP processing.
- While capture is limited to 4:2:2 format, some of the IDK demonstrations require 4:2:0 data. 4:2:2 to 4:2:0 conversion is achieved by reading every other line of captured C data for DSP processing. While this is not an entirely accurate way to convert 4:2:2 data to 4:2:0, fro
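The line-dropping conversion in the last note can be sketched in portable C as follows. The function name is hypothetical, and two details are assumptions not stated in the text: the chroma data is taken to be one planar buffer (Cb or Cr), and the lines kept are 0, 2, 4 and so on.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of 4:2:2 -> 4:2:0 conversion by reading every other chroma line.
 * `src` holds one planar chroma buffer of `lines` rows, each `width`
 * bytes wide. Rows 0, 2, 4, ... are copied to `dst` (assumed choice).
 * Returns the number of rows written to `dst`. */
size_t chroma_422_to_420(const unsigned char *src, size_t width,
                         size_t lines, unsigned char *dst)
{
    size_t out = 0;
    for (size_t y = 0; y < lines; y += 2) {
        memcpy(dst + out * width, src + y * width, width);
        out++;
    }
    return out;
}
```

The luma (Y) buffer is left untouched by this conversion; only the vertical chroma resolution is halved.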
57. G Decoder
6.4 H.263 Encoder
6.5 H.263 Decoder
6.6 ImageLIB: Library of Optimized Kernels

6.1 Overview
C6000 DSPs are used today in a wide range of image/video processing applications. Texas Instruments has an ongoing effort to understand these various applications and provide reference DSP code for functions that can be useful across a wide range of applications. Reference DSP code is provided as a means to enable C6000 DSP users to develop rapid prototypes of applications, and also to enable use of highly optimized code as building blocks in the development of user applications. A representative listing of C6000 DSPs' use and potential use in image/video processing applications, segmented by end products, is shown below.

Video Processing: Set-Top Box, Digital TV, DVD Player, D-VCR, HDD-VCR, Digital Camera, Camcorder, Network Camera, Video Conferencing, Packet-Based Video, Video on Demand
Document Processing: Printers, Copiers, FAX Machines, Scanner Controllers, Rasterization Accelerators
Image Analysis: Security Monitoring, Factory Inspection, Medical Imaging, Defense Imaging, Machine Vision
Image Synthesis: 3D Graphics, Video Games, Flight Simulators
Networking Infrastructure: Transcoding, Multimedia Router/Switcher, Wireless Multimedia
Optical Character Recognition

Each end product typically has its own unique requirements in terms of algorithms, data
58. G Decoder Performance
    Image                              200MHz C6201    150MHz C6211†
    128x128 4:2:0                          528              374
    256x256 4:2:0                          159              108
    352x288 4:2:0 (CIF resolution)         107               72
    640x480 4:2:0 (VGA resolution)          39               26
    720x480 4:2:0 (SDTV resolution)         35               23
    † C6211 performance data based on 48K cache/16K SRAM configuration (recommended for JPEG).

6.3.5 Further Information on JPEG Decoder
Further information on C6000 DSP JPEG Decoder implementation is available from the following application reports:
- TMS320C6000 JPEG Implementation (Literature number SPRA704)
- Optimizing JPEG on the TMS320C6211 2-Level Cache DSP (Literature number SPRA705)

6.4 H.263 Encoder
The H.263 video compression standard was originally developed for video conferencing. However, it is also finding use in other areas such as streaming video. The fundamental coding techniques involved in H.263 are motion-compensated prediction, Discrete Cosine Transform (DCT), quantization, and entropy coding. In the baseline H.263 standard, video frames are coded in either intra-frame or inter-frame mode, and are called I frames or P frames, respectively. For I frames, the frame is independently coded without any relation to other frames, whereas for P frames, the current frame is predicted from the previous reference frame, and the difference between the current frame and the previous frame (i.e., the prediction error) is encoded. A frame to be encoded, either as intra or inter fra
59. H.263 loop-back demonstrations are built using licensable libraries. The other demonstrations are built using ImageLIB, an available library of optimized image/video processing kernels (see section 6.6 for details on ImageLIB). It is easy with the IDK platform to run these libraries in real time and make algorithm adjustments.
Device driver software for video capture, display, and demonstrations support

1.2 IDK as a Rapid Prototyping Platform
In addition to showcasing the demonstrations listed previously, the IDK also serves as a rapid prototyping platform for the development of image and video processing algorithms. Using the software and hardware components provided in the IDK, developers can quickly move from algorithm concept development to high-performance working implementations on a TMS320C6000 DSP board, with live video input and output to evaluate their algorithms. This rapid prototyping ability is based on the following developments included in the IDK.

1.2.1 Rapid Prototyping Software Suite
The Rapid Prototyping Software Suite consists of a software package that includes ImageLIB, Chip Support Library (CSL), and Image Data Manager.

ImageLIB: This is an optimized image/video processing functions library for C programmers on TMS320C6000 devices. It includes many C-callable, assembly-optimized, general-purpose image/video processing routines. These routines are typically used in co
60. IF resources. The FPGA monitors the display device and generates events to the DSP motherboard. The events supported by the FPGA for display are shown in Table 2-3.

Table 2-3. Display Events
    Event (Daughtercard Signal)                   May Be Mapped to Signal       Intended Use
    Pixel clock (active pixels only)              TOUT0 or TOUT1                Timer period set to pixels per line; TINT drives DMA line event
    Composite blank falling (end of active line)  EINT7, EINT6, EINT5, EINT4    EINTn drives DMA line event; EINTn drives CPU interrupt
    Vertical sync falling (end of frame)          EINT7, EINT6, EINT5, EINT4    EINTn drives DMA frame event; EINTn drives CPU interrupt

The preferred use of the above events is that the pixel clock be routed to one of the timer inputs, and a single interrupt is used on the vertical synchronization pulse to synchronize the DSP to the display. In this configuration, the selected timer must be configured in pulse mode with a period equal to the number of active pixels per line.

The FPGA is capable of driving all DSP event lines, which include the four processor edge-triggered interrupts (EINTn, n = 4 to 7) and the two timer inputs (TINPn, n = 0 or 1). Any DSP event line not selected for one of the above event sources is tri-stated by the FPGA, allowing it to be used by another daughtercard or motherboard interface.

Based on the above event selection, the IDK Display Driver configures the DSP DMA
61. JPEG Loop-Back demonstration and the Image Processing Demonstration. The eXpressDSP API for Pre-Scale Filter is:

    /* IPrescale Interface */
    #ifndef IPrescale_
    #define IPrescale_

    #include <ialg.h>
    #include <xdas.h>

    /* IPrescale_Handle
       This handle is used to reference all Prescale instance objects. */
    typedef struct IPrescale_Obj *IPrescale_Handle;

    /* IPrescale_Obj
       This structure must be the first field of all Prescale instance
       objects. */
    typedef struct IPrescale_Obj {
        struct IPrescale_Fxns *fxns;
    } IPrescale_Obj;

    /* IPrescale_Status
       Status structure defines the parameters that can be changed or
       read during real-time operation of the algorithm. */
    typedef struct IPrescale_Status {
        Int size;       /* must be first field of all status structures */
        int width;
        int height;
    } IPrescale_Status;

    /* IPrescale_Cmd
       The Cmd enumeration defines the control commands for the Prescale
       control method. */
    typedef enum IPrescale_Cmd {
        IPrescale_GETSTATUS,
        IPrescale_SETSTATUS
    } IPrescale_Cmd;

    /* IPrescale_Params
       This structure defines the creation parameters for all Prescale
       objects. */
    typedef struct IPrescale_Params {
        Int size;       /* must be first field of all params structures */
        int width;
        int height;
    } IPrescale_Params;

    /* IPrescale_PARAMS
       Default parameter values for Pr
62. NT32 OutputCt    Number of algorithm outputs
Return Value
    HANDLE    Returns a handle to the registered algorithm. Returns INV if the algorithm could not be registered.
Description
    Registers an algorithm with Channel Manager. Channel Manager gets to know this algorithm, and it collects all information it needs to create and execute an instance of the algorithm later. All CM_RegAlg calls must be prior to any CM_SetAlg call. In other words, all algorithms must register with Channel Manager before any of them can be assigned to a channel.

CM_Exec    Executes channel
    BOOL CM_Exec(HANDLE hChan, FRM_OBJ **In, FRM_OBJ **Buffs, FRM_OBJ **Out, UINT32 Post)
Arguments
    HANDLE hChan       Handle to an open channel
    FRM_OBJ In         An array of pointers to input frames
    FRM_OBJ Buffs      An array of pointers to intermediate buffers
    FRM_OBJ Out        An array of pointers to output frames
    UINT32 Post        Post value
Return Value
    BOOL    TRUE: function succeeded. FALSE: function failed.
Description
    Executes channel.

CM_InstCtr    Get or set status of algorithm instance in channel
    BOOL CM_InstCtr(HANDLE hCha, int InstNo, int Cmd, void *InstStatus)
Arguments
    HANDLE hCha        Handle to the channel
    int InstNo         Number to identify the algorithm instance in the channel
    int Cmd            Control command specific for that particular algorithm type
    void InstStatus    Pointer to the instance status structure
Return Value
    BOOL    TRUE: function succeeds. FALSE: function fails.
Description
    Get or set the sta
63. Perimeter Structural Operator

    Cycles                                                                         Code Size
    (4 * cols + 14) * rows + 21 cycs; cols is number of image columns,
      rows is number of image rows.
      For cols = 720, rows = 8: cycs = 23,173                                      480 bytes
    160 * num_fdcts + 48 cycs; num_fdcts is number of fdcts.
      For num_fdcts = 6: cycs = 1008. For num_fdcts = 24: cycs = 3888              1216 bytes
    9/8 * n + 582 cycs; n is number of points processed.
      For n = 512: cycs = 1158. For n = 1024: cycs = 1734                          960 bytes
    168 * num_idcts + 62 cycs; num_idcts is number of idcts.
      For num_idcts = 6: cycs = 1070. For num_idcts = 24: cycs = 4094              1344 bytes
    62 * H * V + 21 cycs; H = columns in search area, V = rows in search area.
      For H = 4, V = 4: cycs = 1013. For H = 64, V = 32: cycs = 126,997            768 bytes
    231 * H * V + 21 cycs; H = columns in search area, V = rows in search area.
      For H = 4, V = 4: cycs = 3717. For H = 64, V = 32: cycs = 473,109            768 bytes
    9 * cols + 55 cycs; cols is number of image columns.
      For cols = 128: cycs = 1207. For cols = 720: cycs = 6535                     544 bytes
    3 * (cols - 2) + 14 cycs; cols is number of image columns.
      For cols = 128: cycs = 392. For cols = 720: cycs = 2168                      358 bytes

Table 6-6. ImageLIB Kernels Performance (Continued)
    Function: pix_expand, pix_sat, quantize, scale_horz, scale_vert, sobel, thres
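The cycle formulas are linear in the amount of work, so they are easy to turn into budget estimators. The sketch below encodes the two formulas that name their own units (num_fdcts and num_idcts); the function names are illustrative and are not ImageLIB entry points.

```c
/* Estimated cycles for `num_fdcts` forward 8x8 DCTs, using the
 * table's formula 160 * num_fdcts + 48. */
long fdct_cycles(long num_fdcts)
{
    return 160 * num_fdcts + 48;
}

/* Estimated cycles for `num_idcts` inverse 8x8 DCTs, using the
 * table's formula 168 * num_idcts + 62. */
long idct_cycles(long num_idcts)
{
    return 168 * num_idcts + 62;
}
```

Both estimators reproduce the worked values quoted in the table, for example 1008 cycles for 6 forward DCTs and 4094 cycles for 24 inverse DCTs.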
64. TMS320C6000 Imaging Developer's Kit (IDK) User's Guide
Literature Number: SPRU494A
September 2001
Texas Instruments

IMPORTANT NOTICE
Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All products are sold subject to TI's terms and conditions of sale supplied at the time of order acknowledgment.

TI warrants performance of its hardware products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are used to the extent TI deems necessary to support this warranty. Except where mandated by government requirements, testing of all parameters of each product is not necessarily performed.

TI assumes no liability for applications assistance or customer product design. Customers are responsible for their products and applications using TI components. To minimize the risks associated with customer products and applications, customers should provide adequate design and operating safeguards.

TI does not warrant or represent that any license, either express or implied, is granted under any TI patent ri
65. Wrapper

    /* IWavelet_Fxns
       This structure defines all of the operations on Wavelet objects. */
    typedef struct IWavelet_Fxns {
        IALG_Fxns ialg;         /* IWavelet extends IALG */
        XDAS_Bool (*control)(IWavelet_Handle handle, IWavelet_Cmd cmd,
                             IWavelet_Status *status);
        XDAS_Int32 (*apply)(IWavelet_Handle handle, XDAS_Int8 *in,
                            XDAS_Int8 *out);
    } IWavelet_Fxns;

    #endif /* IWavelet_ */

4.3 Algorithm
The algorithm for the Wavelet Transform example has the form shown below:

    void wavelet_codec(IMAGE *in_image_ev, IMAGE *in_image_od,
                       IMAGE *out_image, SCRATCH_PAD *scratch_pad,
                       WAVE_PARAMS *wave_params, img_type img_val)

where:
    in_image_ev    pointer to structure for even field
    in_image_od    pointer to structure for odd field
    out_image      pointer to structure for output image
    scratch_pad    pointer to structure for scratch pad
    wave_params    pointer to structure for wavelet codec
    img_type       FLDS for odd/even fields and PROG for progressive

The structures referred to above are defined in Appendix D. If img_type is PROG, then in_image_od is ignored and the image is assumed to be contiguous, starting at the address in_image_ev. If img_type is FLDS, then half the rows are assumed to be in the even field and the other half in the odd field.

Shown below is an example of how a user may make use of this, including handling of DMA open and close:

    DAT_Open(0
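The difference between PROG and FLDS input can be illustrated with a helper that locates one row of the full frame. This is a sketch only: `frame_row` is a hypothetical function, the IMAGE fields follow the wavelet.h listing with `img_data` assumed to be a pointer and `img_cols` assumed to be the row stride in bytes, and the mapping of even frame rows to the even field is an assumption the text does not spell out.

```c
#include <stddef.h>

typedef struct image {
    unsigned char *img_data;   /* assumed: pointer to first pixel */
    int img_cols;              /* assumed: bytes per row */
    int img_rows;
} IMAGE;

typedef enum img_type { FLDS, PROG } img_type;

/* Hypothetical helper: return a pointer to row `r` of the full frame.
 * PROG: the frame is contiguous in `ev` and `od` is ignored.
 * FLDS: even rows are taken from `ev`, odd rows from `od` (assumed). */
const unsigned char *frame_row(const IMAGE *ev, const IMAGE *od,
                               img_type type, int r)
{
    if (type == PROG)
        return ev->img_data + (size_t)r * (size_t)ev->img_cols;

    const IMAGE *field = (r % 2 == 0) ? ev : od;
    return field->img_data + (size_t)(r / 2) * (size_t)field->img_cols;
}
```

With FLDS input, frame row 3 is row 1 of the odd field; with PROG input it is simply the fourth row of the even-field buffer.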
66. a-frame or inter-frame mode, and are called I frames or P frames, respectively. For I frames, the frame is independently coded without any relation to other frames, whereas for P frames, the current frame is predicted from the previous reference frame, and the difference between the current frame and the previous frame (i.e., the prediction error) is encoded. A frame to be encoded, either as intra or inter frame, is first decomposed into a set of macroblocks; then motion-compensated prediction is employed to reduce temporal redundancy. The prediction errors are compressed using DCT and quantization. Furthermore, the motion vectors are differentially coded. Finally, the differential motion vectors are combined with the quantized DCT information and encoded using entropy coding.

6.5.1 H.263 Decoder Algorithm Level Description
The H.263 decoder essentially reverses the process described above to recover video data from the compressed bitstream. Figure 6-10 shows a high-level view of the H.263 decoder operation on TMS320C6000 DSPs.

Figure 6-10. H.263 Decoder Overview
[Figure: h263Decode; decode picture layer; decode GOB layer; h263DecMB]

For each frame, the decoder is provided with an input H.263 bit stream. The function h263Decode starts parsing the bit stream and extracts information pertaining to the entire frame
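The final step of reversing the prediction, adding the decoded prediction error back onto the motion-compensated prediction, can be sketched as below. This is not taken from the TI decoder: the function names are hypothetical, and the 8-bit pixel and 16-bit error types are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Clamp an integer to the 8-bit pixel range 0..255. */
static unsigned char clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (unsigned char)v;
}

/* Hypothetical sketch: reconstruct `n` pixels of a P-frame block as
 * prediction plus decoded prediction error, clamped to 0..255. */
void reconstruct_block(const unsigned char *pred, const short *err,
                       unsigned char *recon, size_t n)
{
    for (size_t i = 0; i < n; i++)
        recon[i] = clamp_u8((int)pred[i] + (int)err[i]);
}
```

For an I frame there is no prediction, so the same routine applies with every `pred` value set to zero.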
67. aces to the DSP EMIF through an asynchronous SRAM interface. The following sections define each such interface.

Topic
    A.1 I2C Interface
    A.2 EMIF ASRAM Interface

A.1 I2C Interface
Programming of the TVP5022 is provided via an I2C interface. Although the opportunity exists to include a simple I2C controller such that the DSP can perform standard reads and writes of the interface, it is noted that code already exists for the TMS320C6000 processor to perform writes in a bit-banging fashion. The FPGA includes four control register bits that may be read/written by the DSP. These bits provide the data and output enable function for two general-purpose I/O pins that are tied to the SDA (data) and SCL (clock) pins of the TVP5022. The TVP5022 is addressable at the I2C addresses identified in Table A-1.

Table A-1. I2C Base Address
    I2C Address     Interface
    0x5C, 0x5D      TVP5022

A.2 EMIF ASRAM Interface
A.2.1 CE Selection
The FPGA provides the DSP an interface to the control registers, TVP3026 palette interface, I2C interface, and on-board capture frame memory. This interface is provided as a 32-bit-wide ASRAM interface and consumes one EMIF CE space. Due to timing constraints in the FPGA design, a modest setup of 1/5/0 EMIF cycles for setup/strobe/hold is used. The multi-strobe period allows use of the ARDY pin, which may be asserte
68. age is assumed to be contiguous, starting at the address in_image_ev. If img_type is FLDS, then half the rows are assumed to be in the even field and half in the odd field.

        DAT_Open(0, DAT_PRI_LOW, 0);
        wavelet_codec(&in_image_ev, &in_image_od, &out_image,
                      &scratch_pad, &wave_params, FLDS);
        DAT_Close(0, DAT_PRI_LOW, 0);
        return 1;

A listing of the file wavelet.h is provided below:

    #include "pixel_expand_h.h"
    #include "wave_horz_h.h"
    #include "wave_vert_h.h"

    #define IN_LINES_CH 42
    #define IN_LINES_SH 21

    typedef struct image {
        unsigned char *img_data;
        int img_cols;
        int img_rows;
    } IMAGE;

    typedef struct {
        char *ext_data;
        int ext_size;
        char *int_data;
        int int_size;
    } SCRATCH_PAD;

    typedef struct {
        short *qmf_ext;
        short *mqmf_ext;
        int scale;
    } WAVE_PARAMS;

    typedef enum img_type { FLDS, PROG } img_type;

    void wavelet_codec(IMAGE *in_image_ev, IMAGE *in_image_od,
                       IMAGE *out_image, SCRATCH_PAD *scratch_pad,
                       WAVE_PARAMS *wave_params, img_type img_val);

Appendix E
eXpressDSP APIs for IDK Demonstrations
eXpressDSP APIs for JPEG Encoder, JPEG Decoder, and H.263 Decoder are provided in Chapter 6. Other APIs pertinent to IDK demonstrations are provided here.

Topic
    E.1 eXpressDSP API Overview
69. an 8x8 data block and outputs a corresponding 8x8 block of image component samples. The input to this routine is an array of amplitude values corresponding to specific 2D frequencies. The output from it is an array containing a 2D array of amplitude values, which correspond to image samples.

$$ s_{yx} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v S_{vu} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} $$

where
    $C_u, C_v = 1/\sqrt{2}$ for $u, v = 0$
    $C_u, C_v = 1$ otherwise
    $S_{vu}$ is the DCT component at $(u, v)$
    $s_{yx}$ is the spatial sample value of the image pixel at $(x, y)$

The 2D IDCT is separated into two 1D operations to reduce the number of processing operations, as shown below:
- Perform eight 1D IDCTs, one for each row of the array (row computation).
- Perform eight 1D IDCTs, one for each column of the array resulting from the row IDCT computation (column computation).

Data Reformat
Data reformatting converts a contiguous set of 8x8 image blocks into a raster-scanned image frame. Figure 6-6 shows the decoded image data as stored in the memory before reformat.

Figure 6-6. Decoded Image Data Before Reformat
[Figure: memory layout in which the samples of each 8x8 block are stored contiguously, one block after another]

All image data belonging to a single 8x8 block occur contiguously, followed by the data for the next block. Successive groups of eig
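The row-column separation can be illustrated with a floating-point reference implementation. This is a sketch for checking the arithmetic, not the optimized ImageLIB idct_8x8 routine: it omits the rounding and saturation a production decoder applies, the name `idct_8x8_ref` is hypothetical, and the row-major layout (coefficient at index 8*v + u, sample at index 8*y + x) is an assumption. Cosines come from a small table so that the block needs no math library.

```c
/* cos(k * pi / 16) for k = 0..8 */
static const double COS_TAB[9] = {
    1.0,                0.9807852804032304, 0.9238795325112867,
    0.8314696123025452, 0.7071067811865476, 0.5555702330196022,
    0.3826834323650898, 0.1950903220161283, 0.0
};

/* cos(n * pi / 16) for any n >= 0, using the symmetry of the cosine. */
static double cos_pi16(int n)
{
    n %= 32;                    /* the period 2*pi equals 32 steps of pi/16 */
    if (n > 16) n = 32 - n;     /* cos(2*pi - a) = cos(a) */
    return (n > 8) ? -COS_TAB[16 - n] : COS_TAB[n];  /* cos(pi - a) = -cos(a) */
}

/* One 8-point 1D IDCT over elements spaced `stride` apart. */
static void idct_1d(const double *in, double *out, int stride)
{
    for (int x = 0; x < 8; x++) {
        double sum = 0.0;
        for (int u = 0; u < 8; u++) {
            double cu = (u == 0) ? COS_TAB[4] : 1.0;  /* 1/sqrt(2) when u = 0 */
            sum += cu * in[u * stride] * cos_pi16((2 * x + 1) * u);
        }
        out[x * stride] = 0.5 * sum;  /* two passes give the overall 1/4 */
    }
}

/* Reference 2D IDCT: eight row transforms, then eight column transforms. */
void idct_8x8_ref(const double coef[64], double pixel[64])
{
    double tmp[64];
    for (int r = 0; r < 8; r++)
        idct_1d(coef + 8 * r, tmp + 8 * r, 1);   /* row computation */
    for (int c = 0; c < 8; c++)
        idct_1d(tmp + c, pixel + c, 8);          /* column computation */
}
```

Each 1D pass costs 64 multiply-accumulates per row or column, so the separable form needs 1024 per block instead of the 4096 implied by evaluating the double sum directly for all 64 samples.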
70. cessing method that meets Channel Manager's criteria can be plugged into it. Channel Manager is independent of specific applications, algorithms, and essentially the DSP hardware. All IDK demo applications use the same Channel Manager module. This is basically the same Channel Manager that is used in the Multichannel Vocoder TDK (DSK version), with some minor API-level changes. Changes have been made to make it more general. New features include support for request of multiple memory blocks, on-chip scratch buffer, and multiple heaps. Also, Channel Manager is now transparent to DSP cache settings and essentially independent of hardware configurations, which makes it possible to reuse it even on different hardware platforms. Eventually, the Multichannel Vocoder TDK (DSK version) will be updated with the changes made in Channel Manager for the IDK applications.

The most important feature of Channel Manager is its built-in support for multichannel, multi-algorithm applications. It provides high-level APIs to register algorithms, to open/close channels, to create/delete a group of algorithm instances in a channel, and to execute those instances. To optimize DSP memory usage and to meet memory requirements of a wide range of DSP algorithms, Channel Manager manages two memory heaps: one located on-chip and one located off-chip. Channel Manager also supports parent instance to allow gl
71. ciently move data in the background during processing. They have been developed to help remove the burden from the user of having to perform pointer updates in the code. IDM functions use DAT calls from CSL to move data between external and internal memory. They can be extended in future to use EDMA/DMA calls as appropriate, based on the device. The following IDM functions are currently defined:

- dstr_open: Open an input/output image data stream to bring data from external to internal memory or vice versa.
- dstr_get: Bring data from external to internal memory, allowing for either one line at a time or multiple lines at a time without any offset between them. This function should only be used on an input stream. The behaviour of this function when used on an output stream cannot be guaranteed.
- dstr_get_2d: Bring data from external to internal memory, allowing for either one line at a time or multiple lines at a time with no fixed offset between the lines. This function should only be used on an input stream. The behaviour of this function when used on an output stream cannot be guaranteed.
- dstr_put: Commit data from internal memory to external memory, either one line at a time or multiple lines without any offset between them. This function should only be used on an output stream. The behaviour of this function when used on an input stream cannot be guaranteed.
- dstr_put_2d: Commit data from internal memory to external memory, either on
72. d input data; To display buffer; Channel 2

5.4.1 Data I/O and User Input Specifics
- NTSC Capture: 640x480x30fps, 4:2:2, interlace, interleaved
- PAL Capture: 768x576x25fps, 4:2:2, interlace, interleaved
- NTSC Progressive Display Driver: 640x480, 16bpp, 60Hz mode
- PAL Progressive Display Driver: 800x600, 16bpp, 60Hz mode
- GUI-Based User Inputs: Target bitrate in kbps

5.4.2 Signal Processing Operations Sequence
- The IO task calls the capture driver using the VCAP_getFrame function with the SYS_FOREVER argument, which blocks until a new frame is available to be processed. At that point, it signals the channel task, which can then begin processing.
- Daughtercard FPGA planarizes captured YC data.
- Only one set of fields (even fields) is used.
  - NTSC mode: Field data is converted from 640x240 to 320x240 by using pre-scale filters, based on the description in Appendix A. A 352x288 data array is created with the scaled input data in its upper left corner. This is the CIF-resolution image input to the H.263 encoder.
  - PAL mode: Input field data is 768x288 resolution. The first 64 samples per line are ignored, and the remaining 704 samples per line are used to create 352x288 data by using pre-scale filters, based on the description in Appendix A. This is the CIF-resolution image input to the H.263 encoder.
  - Input conversion from 4:2:2 to 4:2:0 is by reading every
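The NTSC step of placing the scaled 320x240 field in the upper left corner of a 352x288 array can be sketched as follows. The function name is hypothetical, and the value used for the area outside the scaled image is an assumption (passed in as `fill`), since the text does not say how that area is initialized.

```c
#include <stddef.h>
#include <string.h>

enum { CIF_COLS = 352, CIF_ROWS = 288 };

/* Hypothetical sketch: copy a `cols` x `rows` 8-bit image into the
 * upper left corner of a CIF-sized array. Everything outside the copied
 * region is set to `fill` (assumed). Returns 0 on success, -1 if the
 * source does not fit inside a CIF frame. */
int place_in_cif(const unsigned char *src, int cols, int rows,
                 unsigned char fill, unsigned char *cif)
{
    if (cols < 0 || rows < 0 || cols > CIF_COLS || rows > CIF_ROWS)
        return -1;

    memset(cif, fill, (size_t)CIF_COLS * CIF_ROWS);
    for (int y = 0; y < rows; y++)
        memcpy(cif + (size_t)y * CIF_COLS, src + (size_t)y * cols,
               (size_t)cols);
    return 0;
}
```

In PAL mode the scaled data is already 352x288, so the same routine degenerates to a straight copy.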
73. d by the FPGA when accessing the capture frame memory. All accesses to the FPGA registers, which include the I2C interface, occur within the specified timing and do not force an assertion of ARDY. It is noted that the ARDY output is tri-stated when accesses are not directed at the FPGA, allowing it to be used by other daughtercard and motherboard interfaces.

The CE spaces dedicated to the FPGA may be selected via resistors on the daughtercard. In the first implementation, two CE spaces are used. The first space is configured for asynchronous operation and provides access to the FPGA control registers, I2C interface, palette control registers, and capture memory. The second CE space is configured for SDRAM and is used to efficiently access the display FIFOs.

A.2.2 IDK Memory Map
Table A-2 outlines the IDK memory map.

Table A-2. IDK Memory Map (2MB Capture Memory Option)
    Address Range              Interface               Comments
    0xM0000000-0xM002A2FF      Capture Frame Memory    Y Buffer 1 of 3, field 0
    0xM002A300-0xM003F3FF      Capture Frame Memory    Cr Buffer 1 of 3, field 0
    0xM003F400-0xM00545FF      Capture Frame Memory    Cb Buffer 1 of 3, field 0
    0xM0054600-0xM007E8FF      Capture Frame Memory    Y Buffer 1 of 3, field 1
    0xM007E900-0xM0093A7F      Capture Frame Memory    Cr Buffer 1 of 3, field 1
    0xM0093A80-0xM00A8BFF      Capture Frame Memory    Cb Buffer 1 of 3, field 1
    0xM00A8C00-0xM00D2EFF      Capture Frame Memory    Y Buffer 2 of 3, field 0
    0xM00D2F00-0xM00E807F      Capture Frame Memory
74. de/Decode, H.26x Encode/Decode. These compression standards in turn are used in diverse end applications, such as:
  - JPEG is used in printing, photography, security systems, etc.
  - MPEG video standards are used in Digital TV, DVD Players, Set-Top Boxes, Video-on-Demand Systems, Video Disc Applications, Multimedia/Streaming Media Applications, etc.
  - H.26x standards are used in Video Telephony and some Streaming Media Applications.
  Note that the Inverse DCT function performs an IEEE 1180-1990 compliant inverse DCT, including rounding and saturation to signed 9-bit quantities. The forward DCT provides rounding of output values for improved accuracy. These factors can have a significant effect on the final result in terms of picture quality, and are important to consider when implementing DCT-based systems or comparing the performance of different DCT-based implementations.

- Quantization is an integral step in many image/video compression systems, including ones based on the widely used variations of DCT-based compression such as JPEG, MPEG, and H.26x. The routine quantize can be used in such systems to perform the quantization step.

- Functions 8x8 Minimum Absolute Difference (mad_8x8) and 16x16 Minimum Absolute Difference (mad_16x16) are provided to enable high-performance Motion Estimation algorithms used in applications s
75. dif bin

Median filtering is used in image restoration to minimize the effects of impulsive noise in imagery. Applications cover almost any area where impulsive noise may be a problem, including security, defense, machine vision, and video compression systems. An optimized implementation of the median filter for a 3x3 pixel neighborhood is provided in the routine median_3x3.

Edge detection is a commonly used operation in machine vision systems. Many algorithms exist for edge detection, and one of the most commonly used is Sobel edge detection. The routine sobel provides an optimized implementation of this edge detection algorithm.

Different forms of image thresholding are used for various reasons in image/video processing systems. For example, one form of thresholding may be used to convert gray-scale image data to binary image data for input to binary morphological processing; another form may be used to clamp image data levels into a desired range; and yet another form may be used to zero out low-level perturbations in image data due to sensor noise. This latter form of thresholding is addressed in the routine threshold.

The routine histogram provides the ability to generate an image histogram. An image histogram is basically a count of the intensity levels (or some other statistic) in an image. For example, for a gray-scale image with 8-bit pixel intensity values, the histogram will
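The histogram and the noise-floor form of thresholding described above can be sketched in plain C. This is only an illustration of the two operations, not the ImageLIB implementation: the function names histogram_u8 and threshold_low_u8, their argument lists, and the choice to zero pixels at or below the level (rather than strictly below) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many pixels take each of the 256 possible 8-bit intensities.
 * Hypothetical helper; not the ImageLIB histogram routine. */
void histogram_u8(const unsigned char *img, size_t n, unsigned int hist[256])
{
    size_t i;
    assert(img != NULL && hist != NULL);
    memset(hist, 0, 256 * sizeof hist[0]);
    for (i = 0; i < n; i++)
        hist[img[i]]++;
}

/* Zero every pixel at or below `level`; brighter pixels pass through.
 * The "at or below" comparison is an assumption of this sketch. */
void threshold_low_u8(const unsigned char *in, unsigned char *out,
                      size_t n, unsigned char level)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = (in[i] > level) ? in[i] : 0;
}
```

For an 8-bit image the histogram always has 256 bins, so the output array size is fixed regardless of image dimensions.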
76. dress or handle to the parent instance ALG_OBJ

The INST_OBJ object encapsulates an algorithm instance. It has a pointer to its base ALG_OBJ and contains the handles of that instance. It also has a pointer to the status parameters structure of that instance. The definition of INST_OBJ is shown below.

    typedef struct {
        ALG_OBJ *AlgPtr;       /* pointer to the base algorithm object */
        void    *ContextAddr;  /* context pointer or IALG handle to the algorithm instance */
        void    *algParams;    /* pointer to the structure of status parameters of that instance */
        UINT32   CopyMode;     /* data copy mode; not used in C6211/C6711 version */
        UINT32   DynamicID;    /* instance ID */
    } INST_OBJ;

The CHAN_OBJ object contains the algorithm instances in a particular channel. When a channel is executed, it runs all instances in that channel serially, so that the outputs of the previous instance become the inputs of the next one. The definition of CHAN_OBJ is shown below.

    typedef struct {
        char     *Name;                      /* name of the channel */
        SIG_OBJ   Sig;                       /* signal object */
        UINT32    CopyMode;                  /* not used in C6211/C6711 version */
        UINT32    AlgCt;                     /* number of instances in the channel */
        INST_OBJ *Algs[CM_MAX_CHA_ALGS];     /* instance handles */
        UINT32    InputCt;                   /* number of inputs */
        UINT32    OutputCt;                  /* number of outputs */
        UINT32    S;                         /* completion signal mode */
    } CHAN_OBJ;
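The serial execution rule described above (each instance consumes the output of the one before it) can be sketched with two ping-pong buffers. This is a minimal illustration under stated assumptions, not the Channel Manager source: demo_channel, stage_fn, MAX_STAGES, and the two toy stages are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STAGES 8  /* hypothetical per-channel limit for this sketch */

/* Hypothetical stage signature: read `in`, write `out`, return 0 on success. */
typedef int (*stage_fn)(const int *in, int *out, size_t n);

typedef struct {
    stage_fn stages[MAX_STAGES];
    size_t   count;
} demo_channel;

/* Run all stages in order. buf_a holds the input on entry; the two buffers
 * are swapped after every stage so that each output feeds the next stage.
 * Returns the buffer holding the final result, or NULL if a stage fails. */
int *demo_channel_exec(const demo_channel *ch, int *buf_a, int *buf_b, size_t n)
{
    int *src = buf_a, *dst = buf_b, *tmp;
    size_t k;
    assert(ch != NULL && ch->count <= MAX_STAGES);
    for (k = 0; k < ch->count; k++) {
        if (ch->stages[k](src, dst, n) != 0)
            return NULL;
        tmp = src; src = dst; dst = tmp;
    }
    return src;
}

/* Two toy stages used only to exercise the chain. */
int demo_add_one(const int *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) out[i] = in[i] + 1;
    return 0;
}

int demo_double(const int *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) out[i] = in[i] * 2;
    return 0;
}
```

Swapping two buffers keeps memory use constant however many instances the channel holds, at the cost of requiring every stage to produce an output no larger than the buffer.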
77. e

A.2.1 CE Selection
A.2.2 IDK Memory Map
A.2.3 FPGA Control Registers

Scaling Filters Algorithm
    Describes the scaling filters algorithm.

Using Image Data Manager
    Demonstrates how to use the DMA streaming routines to implement a sliding window.

2D Wavelet Transform Algorithm Example
    Describes a 2D wavelet transform algorithm.

eXpressDSP APIs for IDK Demonstrations
    Provides the APIs pertinent to IDK demonstrations.
    E.1 eXpressDSP API Overview
    E.2 eXpressDSP API for Pre-Scale Filter
    E.3 eXpressDSP API for Color Space Conversion
    E.4 eXpressDSP API for Image Processing Functions
    E.5 eXpressDSP API for Wavelet Transform

Figures

2-1 IDK Daughtercard Block Diagram
2-2 NTSC Capture (1 of 3 frames shown)
2-3 Capture Buffer Management
2-4 Display Event Generation
2-5 Display Interrupt Generation
2-6 GRAY8 Display Buffer Format
78. e Image Data Manager. There are several different buffering schemes supported by the Image Data Manager. Image processing functions that require the same buffering can easily be implemented using a common wrapper structure.

Topic
3.1 Framework for Combining eXpressDSP-Compliant Algorithms
3.2 The IALG Interface
3.3 Integrating an Algorithm into the Channel Manager
3.4 Channel Manager Object Types
3.5 Channel Manager Memory Management
3.6 Channel Manager API Functions

3.1 Framework for Combining eXpressDSP-Compliant Algorithms

Each of the IDK demo applications consists of two separate parts: the Host GUI and the target executable. Figure 3-1 shows the system block diagram of all the imaging demonstrations that have more than one processing channel. For the wavelet transform demo, the I/O task and the channel task are merged into one task because there is only one processing channel.

Target executables are built upon the DSP/BIOS kernel and the C6211/C6711 Chip Support Library (CSL). The tasks shown in Figure 3-1 are literally DSP/BIOS tasks. There are three types of tasks in the IDK demos. The Message Handling Task detects a command sent from the Host GUI, parses the command, and dispatches it to app
79. e Frame Memory Y, Capture Frame Memory Cr, Capture Frame Memory Cb

Comments (column only): Buffer 2 of 3, field 1 (3 rows); Buffer 3 of 3, field 0 (3 rows); Buffer 3 of 3, field 1 (3 rows); Unused; See below; See TVP3026 User's Guide

Comments (column only): Buffer 1 of 3, field 0 (3 rows); Buffer 1 of 3, field 1 (3 rows); Buffer 2 of 3, field 0 (3 rows); Buffer 2 of 3, field 1 (3 rows)

Table A-3. IDK Memory Map (8MB Capture Memory Option) (Continued)

Address Range              Interface                  Comments
0xM0200000 - 0xM023FFFF    Capture Frame Memory Y     Buffer 3 of 3, field 0
0xM0240000 - 0xM025FFFF    Capture Frame Memory Cr    Buffer 3 of 3, field 0
0xM0260000 - 0xM027FFFF    Capture Frame Memory Cb    Buffer 3 of 3, field 0
0xM0280000 - 0xM02BFFFF    Capture Frame Memory Y     Buffer 3 of 3, field 1
0xM02C0000 - 0xM02DFFFF    Capture Frame Memory Cr    Buffer 3 of 3, field 1
0xM02E0000 - 0xM02FFFFF    Capture Frame Memory Cb    Buffer 3 of 3, field 1
0xM01FD200 - 0xM01FFFFF    Reserved                   Unused
0xM0300000 - 0xM037FFFF    FPGA control registers     See below
0xM0380000 - 0xM03FFFFF    TVP3026 Registers          See TVP3026 User's Guide

A.2.3 FPGA Control Registers

Figure A-1 defines
80. e line at a time or multiple lines with no fixed offset between successive lines. This function should only be used on an output stream. The behaviour of this function when used on an input stream cannot be guaranteed.

dstr_rewind: This function performs a stream rewind by resetting the pointer to the external memory to the new location. The number of iterations that have been executed is not reset. Hence, when the stream is initialized, the size of the external memory should be the sum of all the regions in external memory from which data will be fetched.

dstr_close: This function closes the streams opened using dstr_open. It waits for any previous DMAs to complete and then closes the stream. It should only be called on a stream that has already been opened.

dstr_open    Initializes input/output stream

Prototype
    int dstr_open(dstr_t *dstr, void *x_data, int x_size, void *i_data,
                  unsigned short i_size, unsigned short quantum,
                  unsigned short multiple, unsigned short stride,
                  unsigned short w_size, dstr t dir t dir)

Arguments
    dstr_t *dstr    DMA stream structure
    void *x_data    External data buffer
    int x_size      Size of external data buffer
    void *i_data    Internal data buffer
    i_size          Size of internal data buffer
    quantum         Size of single transfer (get/put)
    multiple        Number of lines
    stride          Stride am
81. ects typedef struct IJPEGDEC Params Int size must be first field of all params structures IJPEGDEC Params FL EGDEC Status This structure defines the status parameters for all JPEG DEC objects typedef struct IJPEGDEC Status Ef Int size must be first field of all params structures unsigned int num lines 3 unsigned int num samples 3 unsigned int gray FLAG unsigned int outputSize EC Status EGDEC PARAMS Default parameter values for JPEG DEC instance objects extern IJPEGDEC Params IJPEGDEC PARAMS C6000 DSP Image Video Processing Applications 6 13 JPEG Decoder EGDEC This structure defines all of the operations on JPEG DEC objects 57 typedef struct IJPEGDEC Fxns IALG Fxns ialg IJPEGDEC extends IALG XDAS Bool control IJPEGDEC Handle handle IJPEG Cmd cmd IJPEGDEC Sta tus status XDAS Int32 decode IJPE EC Handle handle XDAS Int8 in XDAS Int8 out IJPEGDEC Fxns fendif IJPEGDE 6 3 4 JPEG Decoder Performance JPEG Decoder performance has been measured on a wide range of test imag es and compression factors The following performance is based on measure ments on C6201 EVM and C6211 DSK Table 6 2 JPE
82. ed Image Data

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
oooooooooooooooooooooooooooooooooooooooooooooooo
pppppppppppppppppppppppppppppppppppppppppppppppp
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

Figure 6-3. Reformatted Image Data

xxxxxxxxyyyyyyyyzzzzzzzzooooooooppppppppqqqqqqqq
kkkkkkkknnnnnnnnxxxxxxxxyyyyyyyyzzzzzzzzoooooooo
ppppppppqqqqqqqqkkkkkkkknnnnnnnnxxxxxxxxyyyyyyyy
zzzzzzzzooooooooppppppppqqqqqqqqkkkkkkkknnnnnnnn

DCT: This operation performs a 2-D Discrete Cosine Transform (DCT) on the reformatted 8x8 block of image samples and outputs a corresponding 8x8 block of 2-D frequency components. The mathematical expression for the DCT is given below:

\[
S_{vu} = \frac{1}{4}\, C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
\]

where
    C_u, C_v = 1/\sqrt{2} for u, v = 0
    C_u, C_v = 1 otherwise
    S_{vu} is the DCT component at (u, v)
    s_{yx} is the spatial sample value of the image pixel at (x, y)

The 2D DCT is separated into two 1D operations to reduce the number of processing operations, as shown below:

- Perform eight 1D DCTs, one for each row of the array (row computation).
- Perform eight
83. edian 3x3 finclude lt ialg h gt include xdas h Imedian 3x3 Handle This handle is used to reference all median 3x3 instance objects T typedef struct Imedian 3x3 Obj Imedian 3x3 Handle Imedian 3x3 Obj This structure must be the first field of all median 3x3 instance objects typedef struct Imedian 3x3 Obj struct Imedian 3x3 Fxns fxns Imedian 3x3 Obj Imedian 3x3 Status Status structure defines the parameters that can be changed or read during real time operation of the alogrithm E typedef struct Imedian 3x3 Status Int size must be first field of all status structures int pitch Imedian 3x3 Status Imedian 3x3 Cmd The Cmd enumeration defines the control commands for the median 3x3 control method eXpressDSP APIs for IDK Demonstrations E 7 eXpressDSP APIs for Image Processing Functions typedef enum Imedian 3x3 Cmd Imedian 3x3 GETSTATUS Imedian 3x3 SETSTATUS Imedian 3x3 Cmd Imedian 3x3 Params This structure defines the creation parameters for all median 3x3 objects Kile typedef struct Imedian 3x3 Params Int size must be first field of all params structures int pitch Imedian 3x3 Params Imedian 3x3 PARAMS Default parameter values for median 3x3 instance objects extern Imedian 3x3 Params Imedian 3x3 PARAMS Imedian 3x3 Fxns This structure defines all of the operations on median 3x3 objects tj typedef s
84. escale instance objects xtern IPrescale Params IPrescale PARAMS IPrescale Fxns This structure defines all of the operations on Prescale objects E typedef struct IPrescale_Fxns IALG_Fxns ialg IPrescale extends IALG XDAS Bool control IPrescale Handle handle IPrescale Cmd cmd IPres cale Status status XDAS Int32 apply IPrescale Handle handle XDAS Int8 in XDAS Int8 out IPrescale Fxns endif IPrescale eXpressDSP API for Color Space Conversion E 3 eXpressDSP API for Color Space Conversion Color Space Conversion is used to convert output data from YUV to RGB form in the JPEG Loop Back demonstration H 263 Decoder Demonstration and Scaling Demonstration The eXpressDSP API for Color Space conversion is iyuv2rgb h IYUV2RGB Interface Header AE ifndef IYUV2RGB define IYUV2RGB include ialg h include xdas h IYUV2RGB Handle This handle is used to reference all YUV2RGB instance objects y typedef struct IYUV2RGB Obj IYUV2RGB Handle IYUV2RGB Obj This structure must be the first field of all YUV2RGB instance objects xy typedef struct IYUV2RGB Obj struct IYUV2RGB Fxns fxns IYUV2RGB Obj IYUV2RGB Status Status structure defines the parameters that can be changed or read during real time operation of the alogrithm typedef struct IYUV2RGB Status Int size must be first field of all status structures int widt
85. essing functions, ImageLIB or custom kernels, Image Data Manager, CSL

- The topmost layer of this hierarchical architecture is the eXpressDSP API Wrapper. This is the interface available to other algorithms or users of the eXpressDSP-compliant algorithm.
- The next layer is the actual Algorithm. It typically invokes one or more Image Processing Functions. The ordering of the functions and the data passing between them is controlled by the standard algorithm.
- An Image Processing Function is a wrapper around one or more imaging kernels and is responsible for managing data I/O for the kernels.
- ImageLIB or Custom Kernels are the core processing operations. Typically they are DSP code that has been highly optimized for performance. Many of these kernels are contained in the TI ImageLIB software, while others are custom software for specific applications.
- Image Data Manager is a set of library routines that offer an abstraction for double buffering of DMA requests, to efficiently move data in the background during processing. They have been developed to relieve the user of having to perform pointer updates in the code. Image Data Manager uses CSL DAT calls to move data between external and internal memory during the course of processing.

To illustrate the use of the various layers of software shown above, we use the 2D Wavelet Transform IDK algorithm as an example. The sequence o
86. f operations performed is shown in Figure 4-2.

Figure 4-2. 2D Wavelet Transform (blocks: horizontal wavelet transform, vertical wavelet transform, enhance / pixel saturate)

4.2 eXpressDSP API Wrapper

The eXpressDSP API Wrapper is derived from template material provided in the algorithm standard documentation. Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper. See the algorithm standard documentation for details on the algorithm standard.

For the wavelet example, the eXpressDSP API Wrapper consists of the files wavelet_ti.h and iwavelet.h, shown below. Descriptions of the file elements are included.

    /*
     *  wavelet_ti.h
     *  Interface for the Wavelet_TI module; TI's implementation
     *  of the IWavelet interface
     */
    #ifndef Wavelet_TI_
    #define Wavelet_TI_

    #include <iwavelet.h>
    #include <ialg.h>

    /*
     *  Wavelet_TI_IALG
     *  TI's implementation of the IALG interface for Wavelet
     */
    extern IALG_Fxns Wavelet_TI_IALG;

    /*
     *  Wavelet_TI_IWavelet
     *  TI's implementation of the IWavelet interface
     */
    extern IWavelet_Fxns Wavelet_TI_IWavelet;

    #endif  /* Wavelet_TI_ */

    /*
     *  iwavelet.h
     *  IWavelet Interface
     */
    #ifndef IWavelet_
    #define IWavelet_

    #include <std.h>
    #include <xdas.h>
    #include <ialg.h>

    typedef enum img_type FLDS PROG IMG
87. ght copyright mask work right or other TI intellectual property right relating to any combination machine or process in which TI products or services are used Information published by TI regarding third party products or services does not constitute a license from TI to use such products or services or a warranty or endorsement thereof Use of such information may require a license from a third party under the patents or other intellectual property of that third party or a license from TI under the patents or other intellectual property of TI Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties conditions limitations and notices Reproduction of this information with alteration is an unfair and deceptive business practice TI is not responsible or liable for such altered documentation Resale of TI products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice TI is not responsible or liable for any such statements Mailing Address Texas Instruments Post Office Box 655303 Dallas Texas 75265 Copyright 2001 Texas Instruments Incorporated About This Manual Preface Read This First The Imaging Developer s Kit IDK has been de
88. h int height int pitch IYUV2RGB Status eXpressDSP APIs for IDK Demonstrations E 5 eXpressDSP API for Color Space Conversion IYUV2RGB Cmd The Cmd enumeration defines the control commands for the YUV2RGB control method EJ typedef enum IYUV2RGB Cmd IYUV2RGB GETSTATUS IYUV2RGB SETSTATUS IYUV2RGB Cmd IYUV2RGB Params This structure defines the creation parameters for all YUV2RGB objects ET typedef struct IYUV2RGB Params Int size must be first field of all params structures int width int height int pitch IYUV2RGB Params IYUV2RGB PARAMS Default parameter values for YUV2RGB instance objects ne extern IYUV2RGB Params IYUV2RGB PARAMS IYUV2RGB Fxns This structure defines all of the operations on YUV2RGB objects 4 typedef struct IYUV2RGB Fxns IALG Fxns ialg IYUV2RGB extends IALG XDAS Bool control IYUV2RGB Handle handle IYUV2RGB Cmd cmd IYUV2RGB Status status XDAS Int8 convert IYUV2RGB Handle handle XDAS Int8 in XDAS Int8 out IYUV2RGB Fxns endif IYUV2RGB eXpressDSP API for Image Processing Functions E 4 eXpressDSP API for Image Processing Functions eXpressDSP APIs are very similar for the different components of the Image Processing demonstration So the API for only one of the components Median Filter is described below imedian 3x3 h Imedian 3x3 Interface Header f fifndef Imedian 3x3 define Im
89. hold 6 34 Description Pixel Expand Pixel Saturate Matrix Quantization with Rounding Horizontal Scaling Vertical Scaling Sobel Edge Detection Image Thresholding Cycles 0 5 n 26 cycs n is number of data samples For n 256 cycs 154 For n 1024 cycs 538 n 37 cycs n is number of data samples For n 256 cycs 293 For n 1024 cycs 1061 blk size 16 4 num blks 12 26 cycs bIk size is block size num blks is number of blocks For blk size 64 num blks 8 cycs 426 For blk size 256 num blksz24 cycs 4696 I_hh 1 k sf n_x 15 cycs where k 1 4 _hh when hh 8 0 k 0 otherwise _hh is number of filter taps per output sf is scale factor n x is pixels per line in input For hhz8 n x 640 sf 0 1875 cycs 1005 For I hh 16 n x 1024 sf 1 3333 cycs 22 201 0 75 hh cols 6 l hh 37 cycs I hh is number of filter taps per output cols is number of image columns For cols 128 1 hh 4 cycs 445 For cols 720 I hh 16 cycs 8773 3 cols rows 2 34 cycs cols is number of image columns rows is number of image rows For cols 128 rows 8 cycs 2338 For cols 720 rows 8 cycs 12 994 cols rows 16 9 50 cycs cols is number of image columns rows is number of image rows For cols 128 rows 8 cycs 626 For cols 720 rows 8 cycs 3290 Code Size 288 bytes 448 bytes 1024 bytes 416
90. how the user chooses to create each instance For example suppose that one wishes the encoder to process three MBs at a time For QCIF the h263EncMB function will be called four times processing 3 3 3 and 2 MBs each time H 263 Encoder Figure 6 9 h263EncMB Overview Y Motion Estimation h263EncME rdRefC ion Compensation h263EncMC MB difference diffmb SA Unpack MB unpackmb F4 Unpack MB unpackmb ASM Forward DCT fact ASM Forward DCT fdct C Quantize amp Zigzag scan tazia C Quantize amp Zigzag scan taziq Y SA Encode MCBPC amp CBPY enccbp SA Encode MCBPC amp CBPY enccbp SA Encode VLC encvle1 ASM Inverse DCT idct1 N SA Encode MVD encmva TRAD Y 4 Y nM SA Encode VLC encvicI encvlcP N H ASM Inverse DCT iacti idctP 9g Y SA Pack MB packmb Y SA Pack MB packmb NTRA gt N N oon C Copy MB to output Y IASM Add IDCT output mcAi bc rcUpdateMB p Return 6 4 2 H 263 Encoder Capabilities and Restrictions eXpressDSP compliant H 263 Encoder code optimized for TMS320C620x and TMS320C6211 DSPs is currently avai
91. ht samples are depicted by a different alphabet Figure 6 7 shows the reformatted data as required for display of an image frame Reformatting also converts the dynamic range of the pixel intensity values from 128 127 to 0 255 as per the JPEG stan dard Figure 6 7 Reformatted Image Data in Raster Scan Format XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXxx YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYyYyY ZZ2Z22222222222222222222222222222222222222222222222 0000000000000000000000000000000000000000000000000 PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP qqgqgqqqqqqqgaqqgaqqaqqaqaqqaqqgaqqgqqqqqaaqaqqgaqqagqqaqqaaqqaag kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn 6 3 2 JPEG Decoder Capabilities and Restrictions eXpressDSP compliant JPEG Decoder code optimized for TMS320C620x and TMS320C6211 DSPs is currently available from Texas Instruments Cer tain restrictions have been placed on the broad JPEG standard to produce the code that provide optimal performance while addressing the features of JPEG useful in most common applications The capabilities and restrictions of the decoder are listed below C6000 DSP Image Video Processing Applications 6 11 JPEG Decoder Lossless JPEG Decoding is not supported Only JPEG standard VLD tables are supported Arbitrary quantization tables are supported and may be changed per image Progressi
92. in order to exploit individual table structures.

Variable-length decoding with partial JPEG bit streams is a non-trivial problem. The DMA packets used for transferring data to the DSP generally do not end at block boundaries. Complex structures would be required to track the number of run-level pairs decoded and to ensure that data is not read beyond the end of a DMA packet. To circumvent this problem, the number of bytes consumed from the DMA packet is monitored each time a complete 8x8 block is decoded. If this number exceeds a threshold value (smaller than the DMA packet size), the VLD is discontinued and the blocks decoded thus far are grouped into a set. This set of blocks is passed down the decoding chain in a single pass. The succeeding DMA packet is concatenated to the remaining bytes in the present packet, and the process is repeated.

Run-Level Decoding (RLD) and Dequantization: The quantized DC coefficient and the run-level pairs decoded by the variable-length decoder routines are input to this function. This function expands the run-level pairs with explicit zeroes and quantized AC coefficients in the same zig-zag pattern as at the encoder. It then performs inverse quantization, i.e., a multiplication of all non-zero coefficients with the corresponding element in the quantization tables.

Inverse Discrete Cosine Transform (IDCT): This routine performs the inverse DCT on the frequency components of
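The run-level expansion and dequantization step described above can be sketched as follows. This is an illustrative reading of the text, not the decoder's actual routine: the names run_level and rld_dequant, the raster-order indexing of the quantization table, and the error return are assumptions of this example. The zig-zag table is the standard JPEG scan order.

```c
#include <assert.h>
#include <string.h>

/* One (run, level) pair: `run` zero coefficients, then one non-zero `level`. */
typedef struct { unsigned char run; short level; } run_level;

/* Standard JPEG zig-zag order: zigzag[k] is the raster index of the
 * k-th coefficient in scan order. */
static const unsigned char zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Expand run-level pairs into an 8x8 block (raster order) and multiply each
 * non-zero coefficient by its quantizer step. `quant` is assumed to be in
 * raster order. Returns 0 on success, -1 if the pairs overrun the block. */
int rld_dequant(short dc, const run_level *pairs, int n_pairs,
                const unsigned short quant[64], short block[64])
{
    int k = 0, i;
    assert(quant != NULL && block != NULL);
    memset(block, 0, 64 * sizeof block[0]);   /* explicit zeroes everywhere */
    block[0] = (short)(dc * quant[0]);
    for (i = 0; i < n_pairs; i++) {
        k += pairs[i].run + 1;                /* skip the zeros, land on the level */
        if (k > 63)
            return -1;
        block[zigzag[k]] = (short)(pairs[i].level * quant[zigzag[k]]);
    }
    return 0;
}
```

Clearing the block first means only non-zero coefficients need a multiply, which matches the description of dequantizing non-zero coefficients only.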
93. in updated status from the encoder.

encode    Execute the H.263 encoder.

Shown below is example code in which one parent instance and one child instance are created. The creation parameter is set to NULL, which means that the default set of parameters defined in IH263ENC_PARAMS (defined in ih263enc.c) is used to create each child instance. One can also set one's own parameters prior to each creation and pass the address of the parameters structure to the ALG_create function. Refer to TMS320 DSP Algorithm Standard Rules and Guidelines (literature number SPRU352) for more information on eXpressDSP-specific function APIs.

    void main()
    {
        H263PENC_TI_Obj *encParent;    /* encoder parent handle */
        H263ENC_TI_Obj  *encHandle0;   /* encoder child handle */
        IH263ENC_Status  es;           /* encoder status */
        unsigned char   *in[3];        /* input frame (Y, Cb, Cr) */
        unsigned int    *out;          /* output bitstream */

        /* create encoder parent instance */
        encParent = (H263PENC_TI_Obj *)ALG_create(
            (IALG_Fxns *)&H263PENC_TI_IALG, NULL, (IALG_Params *)NULL);

        /* create encoder child instance */
        encHandle0 = (H263ENC_TI_Obj *)ALG_create(
            (IALG_Fxns *)&H263ENC_TI_IH263ENC, encParent, (IALG_Params *)NULL);

        /* clear encoder status structure */
        H263ENC_TI_IH263ENC.control((IH263ENC_Handle)encHandle0,
                                    IH263ENC_CLR_STATUS, &es);

        while (1) {
            /* get pointer to input video frame -> in
94. ing it to the capture frame buffer. Captured data is stored as two separate fields in three separate blocks in the frame buffer. Data is expected from the TVP5022 in the Cr0 Y0 Cb0 Y1 Cr2 Y2 Cb2 Y3 format. The FPGA internally adjusts the data stream for endianness and stores it into the capture frame memory as shown in Figure 2-2.

The FPGA manages a capture frame buffer in an on-board SDRAM memory bank. SDRAM was chosen due to its low cost for the required memory bank size; however, the DSP interface to this buffer is of the ASRAM type. The FPGA performs this translation autonomously. Note that the capture frame memory is read-only to the DSP interface. Any writes attempted to the frame memory by the DSP are discarded.

The FPGA SDRAM controller supports both 2MB and 8MB configurations of SDRAM and is controllable via software. Table 2-1 outlines the capture formats vs. memory requirements.

Table 2-1. Video Capture Memory Requirements

Format               Required Memory
NTSC square pixel    2MB
PAL square pixel     8MB
NTSC ITU601          2MB
PAL ITU601           8MB

Note: The TVP5022 chipset and FPGA support sampling of all versions of the PAL standard, though stuffing options of the TVP5022 crystal may be required.

Figure 2-2. NTSC Capture (1 of 3 frames shown): Y, Cr, and Cb buffer layouts, 32 bits wide, for little-endian and big-endian modes, with the first pixel captured marked.
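The planarization of the Cr0 Y0 Cb0 Y1 stream into separate Y, Cr, and Cb buffers is done in hardware by the FPGA. The software sketch below only illustrates the byte reordering involved; the function name planarize_crycby and its arguments are hypothetical, and the FPGA's endian adjustment is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Split an interleaved Cr Y Cb Y byte stream into planar Y, Cr and Cb
 * buffers. Each group of four bytes carries two luma samples and one
 * Cr/Cb pair, so n_pixels must be even. Illustration only. */
void planarize_crycby(const unsigned char *in, size_t n_pixels,
                      unsigned char *y, unsigned char *cr, unsigned char *cb)
{
    size_t p;
    assert(n_pixels % 2 == 0);
    for (p = 0; p < n_pixels / 2; p++) {
        cr[p]        = in[4 * p + 0];
        y[2 * p]     = in[4 * p + 1];
        cb[p]        = in[4 * p + 2];
        y[2 * p + 1] = in[4 * p + 3];
    }
}
```

The Cr and Cb outputs are each half the length of the Y output, which is why the chroma buffers in the memory map are half the size of the luma buffers.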
95. k A ke koe ke ke koe eoe eoe ee kx x f iters number of iterations half the width of the input line xstart starting point for the high pass filter input data X KK IK KK KK A A A A KKK KKK KKK KKK KKK Ck Ck KK Kk kk kk kk ke koe ke ke koe eoe eoe ee kx x f int iters cols short xstart in data cols M 2 KK KK KKK AK A A A A A A A A A A A A Ck KKK kk kk kk ke ko ke ke ke koe ke ke koe eoe ee x x f Since the output of the low pass filter is decimated by x eliminating odd output samples the loop counter i increments by A 2 for every iteration of the loop Let the input data be x do d7 and the low pass filter be hg hg Outputs yo yi are generated as A yo hgdg hid hod h3d3 h4d4 hs5ds hgdg h d Af y hod2 hjd3 h2d4 h3ds h4dg hsd hgdg h do xu If the input array access d goes past the end of the array the pointer is wrapped around Since the filter is in floating point it is implemented in Q15 math Or is the associated wu round value Hof KKKKKKKKKKKKK KK KK KK KK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KK KKK ImageLIB or Custom Kernels i iters i 2 sum Qr xptr in data i for j 0 j lt M j xdata xptr hdata qmf jl prod xdata hdata sum prod if xptr gt x end xptr out data sum gt gt Qpt Jf KKK KK KK kk kk kk kk kk kk kk kk Ck kk kk kk
96. king, because the horizontal location of the center of gravity of 4:2:2 and 4:2:0 C data is different. The data thus created is referred to as Conditioned Input Data in Figure 5-1.

- PAL Mode: Field data is converted from 768x288 to 384x288 by using pre-scale filters based on the description in Appendix B. JPEG encode and decode are performed on 384x288 resolution data. Input conversion from 4:2:2 to 4:2:0 is done by reading every other line of C data into the DSP during pre-scale processing. This is not strictly accurate, because the horizontal location of the center of gravity of 4:2:2 and 4:2:0 C data is different. The data thus created is referred to as Conditioned Input Data in Figure 5-1.
- The JPEG Encoder is set up to process one frame of data at a time, followed by decode of the encoded data stream.
- The Color Space Conversion function converts JPEG decoded data from 4:2:0 to RGB. Initial demos use a 16-bit RGB output. A display rate of 60 fps is achieved by repeating the display of any given frame from the display buffer as needed. The Color Space Conversion function also accepts a pitch, which controls the positioning of the output frame within the frame buffer.
- NTSC Mode Display: Decoded output picture resolution is 320x240. This data is written in the lower right corner of a 640x480 region in the frame buffer. The uncompressed pass-through image is written in the upper left corner of the same 640x480 region of the frame buffer. The application only has
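The role of the pitch in positioning a smaller image inside a larger frame buffer can be sketched as below. This is a generic copy routine written for illustration, not the IDK Color Space Conversion function: blit_rgb16 and all of its parameters are hypothetical names, and the pitch is taken to be the frame width in 16-bit pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a w x h image of 16-bit pixels into a larger frame at (x0, y0).
 * `pitch` is the frame width in pixels, so consecutive image rows land
 * `pitch` pixels apart in the frame. Hypothetical helper. */
void blit_rgb16(unsigned short *frame, size_t pitch, size_t frame_h,
                const unsigned short *img, size_t w, size_t h,
                size_t x0, size_t y0)
{
    size_t row;
    assert(x0 + w <= pitch && y0 + h <= frame_h);
    for (row = 0; row < h; row++)
        memcpy(frame + (y0 + row) * pitch + x0, img + row * w,
               w * sizeof *img);
}
```

Under these assumptions, placing a 320x240 picture in the lower right corner of a 640x480 region would be `blit_rgb16(frame, 640, 480, decoded, 320, 240, 320, 240)`, and the pass-through image in the upper left corner would use offsets (0, 0).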
97. l Decoder Demonstration

- Ability to start and stop each task independently.
- Frame Rate Selection: select decode frame rate from a choice of 5, 10, or 30 frames/sec.

5.2.2 Signal Processing Operations Sequence

- Input data is transferred from the host PC to DSK board RAM (DSP external memory) using the C6711 HPI. In the case of multichannel decode, the multiple bit streams are loaded into the DSK board RAM at the initialization of the demonstration and are available in different areas of DSK RAM for decode. The number and size of bit streams that can be used for input depend on C6711 DSK board memory availability. Budget allocations based on 16 Mbytes of board memory are shown below.

Table 5-1. DSK Board Memory Budget Allocations for Multichannel H.263 Decode

Allocation                                               16-bit display    32-bit display
DSK Board Memory                                         16 Mbytes         16 Mbytes
H.263 Decoder data + program                             400 Kbytes        400 Kbytes
Multichannel Framework                                   100 Kbytes        100 Kbytes
Buffers between decode and display (352x288x1.5x2)       304.13 Kbytes     304.13 Kbytes
Display (triple buffered)                                1.85 Mbytes       3.69 Mbytes
H.263 Bit streams (3 bitstreams, each 10 secs, 512kbps)  1.92 Mbytes       1.92 Mbytes
Memory Used                                              4.58 Mbytes       6.42 Mbytes

Note that the bit stream configurations shown in Table 5-1 are only meant to provide representative examples. Other multichannel decode variations, such as one CIF decode and one QCIF dec
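Two of the budget entries in Table 5-1 can be checked with simple arithmetic. The sketch below assumes that the 1.5 factor stands for YUV 4:2:0 storage (one full-size Y plane plus two quarter-size chroma planes) and that Kbytes and kbps are decimal units; the function names are hypothetical.

```c
#include <assert.h>

/* Bytes needed for `count` YUV 4:2:0 frames: 1.5 bytes per pixel.
 * The 4:2:0 interpretation of the 1.5 factor is an assumption. */
unsigned long yuv420_bytes(unsigned long w, unsigned long h,
                           unsigned long count)
{
    assert((w * h) % 2 == 0);
    return w * h * 3UL / 2UL * count;
}

/* Bytes needed for `n` bit streams of `seconds` each at `kbps`
 * kilobits per second, taking 1 kbit = 1000 bits. */
unsigned long bitstream_bytes(unsigned long n, unsigned long seconds,
                              unsigned long kbps)
{
    return n * seconds * kbps * 1000UL / 8UL;
}
```

With these assumptions, two 352x288 frames take 304,128 bytes (the 304.13 Kbytes entry), and three 10-second streams at 512 kbps take 1,920,000 bytes (the 1.92 Mbytes entry).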
98. lable from Texas Instruments Ca pabilities and restrictions relevant to the encoder are J Baseline H 263 encoder implementation only does not support H 263 standard annexes Capable of processing between one and maximum number of MBs per GOB to suit the user s system C6000 DSP Image Video Processing Applications 6 17 H 263 Encoder 6 4 3 H 263 Encoder API The eXpressDSP API Wrapper is derived from template material provided in the algorithm standard documentation Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper See the algorithm standard documentation for details on the algorithm standard Also see Appendix E for an overview of eXpressDSP APIs An algorithm is said to be eXpressDSP compliant if it implements the IALG Interface and observes all the programming rules in the algorithm standard The core of the ALG inter face is the IALG_Fxns structure type in which a number of function pointers are defined Each eXpressDSP compliant algorithm must define and initialize a variable of type IALG Fxns Shown below is the IALG functions structure IH263ENC Fxns typedef struct IH263ENC Fxns IALG_Fxns ialg IH263DEC extends IALG void control IH263ENC Handle handle IH263ENC Cmd cmd void input encode IH263ENC Handle handle uchar in 3 uint out s IH263ENC Fxns ialg This is the default IALG function control This function is used to obta
99. lgs         get algorithm settings in the channel
CM_RegAlg    register an algorithm to Channel Manager
CM_Exec      execute all algorithms in the channel object
CM_InstCtrl  set or get the status parameters of a specific instance in the channel
CM_Control   set or get Channel Manager global configuration data

3.6.1 API Reference

CM_Init    Initializes Channel Manager module

Prototype       BOOL CM_Init()
Arguments       none
Return Value    BOOL    TRUE: function succeeded; FALSE: function failed
Description     Initializes the Channel Manager module. Must be called at least once before any other CM API functions can be called.

CM_Open    Creates new channel object

Prototype       HANDLE CM_Open(char Name, UINT32 Flag, SIG_OBJ Signal)
Arguments       char Name         Name of the channel
                Flag              TBD
                SIG_OBJ Signal    Signal object used to post application upon each completion of running the channel
Return Value    HANDLE    Returns a handle to the open channel. INV is returned upon failure.
Description     Creates a new channel object.

CM_Close    Deletes channel object

Prototype       void CM_Close(HANDLE hCha)
Arguments       HANDLE hCha    Handle to the channel
Return Value    none
Description     Deletes the channel object.

CM_SetAlgs    Assigns set of algorithms to channel

Prototype       BOOL CM_SetAlgs(HANDLE hCha
Arguments
Return Value
Description

CM_GetAlgs

Prototype
Argument
100. lter demo includes the ability to select between Low Pass Filter, High Pass Filter, and Sharpness Filter.
  - Frame Rate Selection: select input frame rate from a choice of 5, 10, or 30 frames/sec for each demo task independently.
  - Ability to start and stop each task independently.

5.3.2 Signal Processing Operations Sequence

- The I/O task calls the capture driver using the VCAP_getFrame() function with the SYS_FOREVER argument, which blocks until a new frame is available to be processed. At that point, it signals the channel task, which can then begin processing.
- The daughtercard FPGA planarizes the captured YC data.
- Only Y channel data is used.
- Only even field data is used. For NTSC mode, even field data is converted from 640x240 to 320x240 by using pre-scale filters based on the algorithm described in Appendix B. For PAL mode, even field data is converted from 768x288 to 384x288 by using pre-scale filters based on the algorithm described in Appendix B.
- Each resulting array of 320x240 (NTSC) or 384x288 (PAL) Y channel data is used as input for the following processing operations: Binary Threshold, Low Pass Filter, Pass Through, Sobel Edge Detection.
- The four resulting output arrays are written to the output buffer such that they are tiled to create a single 640x480 frame. The code for the individual functions (Binary Threshold, Low Pass Filter, Pass Through, Sobel Edge Detection) is responsible for producing output offset to enable tiling. Output
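The tiling step described above, where each function writes its result at an offset inside a larger frame, can be sketched as follows. This is a minimal illustration and not IDK or ImageLIB code: the function name threshold_into_tile, its parameters, and the choice of a strict "greater than" comparison producing 0 or 255 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an IDK or ImageLIB function).
 * Thresholds a w x h luma tile and writes the result into a larger frame
 * at tile position (tx, ty). frame_pitch is the width of the full frame
 * in bytes, so each function only needs an offset and a pitch to tile. */
void threshold_into_tile(const uint8_t *src, int w, int h,
                         uint8_t *frame, int frame_pitch,
                         int tx, int ty, uint8_t thresh)
{
    assert(src != NULL && frame != NULL);

    /* Top-left corner of the destination tile inside the frame. */
    uint8_t *dst = frame + (size_t)ty * h * frame_pitch + (size_t)tx * w;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Assumed convention: strictly above threshold maps to 255. */
            dst[(size_t)y * frame_pitch + x] =
                (src[(size_t)y * w + x] > thresh) ? 255 : 0;
        }
    }
}
```

With 320x240 tiles and a frame pitch of 640, tile (1, 1) starts at byte offset 240 * 640 + 320, which is the lower right quadrant of the 640x480 frame.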
101. m a theoretical standpoint, it has been found to be adequate for simple demonstrations.

- While capture resolution is limited to 640x480 or 720x480 pixels, some of the IDK demonstrations require other resolutions, e.g., 320x240. Such a resolution conversion is achieved by using the scaling filters described in Appendix B.

Capture drivers supporting the video capture modes discussed here are included in the IDK. The drivers are written using DSP/BIOS and CSL. Refer to the TMS320C6000 Imaging Developer's Kit (IDK) Video Device Driver's User's Guide (literature number SPRU499) for further details.

2.3 Video Display

The IDK daughtercard includes an RGB output port for a standard computer monitor. The RGB output is driven by the TVP3026 and can drive any of the standard monitor resolutions. In the case of RGB output, the FPGA provides the video timing to the output. Consequently, the DSP display driver software must also program the FPGA integrated video controller, which drives the timing information to the TVP3026 RGB palette. Video data is built up in buffers in system memory on the C6711 DSK. Frame buffer memory is of the SDRAM type, with a read CAS latency of three. The imaging daughtercard does not include any addressable amount of video display memory. Video output data is transferred in real time from the frame buffer to the imaging daughtercard. This data service can be provided by the DSP EDMA controller and EM
102. mage is written in the upper left corner of the same 800x600 region of the frame buffer. The application only has to write the picture in the appropriate location of the frame buffer; the entire frame buffer is initialized with zeros by the system at the start of the application. A display rate of 60 fps is achieved by repeating display of any given frame from the display buffer as suitable.

- Upon completion of processing, the channel task signals the I/O task. The I/O task calls the display driver using the VCAP_toggleBuffs() function with argument 0.

5.4.3 eXpressDSP APIs for H.263 Loop Back Demonstration

See sections 6.4.3 and 6.5.3 for the H.263 Encoder and Decoder eXpressDSP APIs, respectively. Also see Appendix E for eXpressDSP APIs of other functions used in this demonstration.

5.5 2D Wavelet Transform Demonstration

Figure 5-6 shows the standard algorithm configuration for this demonstration, while Figure 5-7 shows the ImageLIB and custom kernel-level components of the block labeled Wavelet Transform in Figure 5-6.

Figure 5-6. 2D Wavelet Transform Demonstration [block labels: capture; wavelet transform]

Figure 5-7. 2D Wavelet Transform Components [block labels: horizontal wavelet transform; vertical wavelet transform; enhance pixel saturate]

5.5.1 Data I/O and User Input Specifics

- NTSC Capture: 640x480x30fps, 4:2:2, interlace
103. me is first decomposed into a set of macroblocks; then motion-compensated prediction is employed to reduce temporal redundancy. The prediction errors are compressed using DCT and quantization. Furthermore, the motion vectors are differentially coded. Finally, the differential motion vectors are combined with the quantized DCT information and encoded using entropy coding.

6.4.1 H.263 Encoder Algorithm Level Description

Figure 6-8 shows a high-level view of the H.263 encoder operation on TMS320C6000 DSPs.

Figure 6-8. H.263 Encoder Overview [flowchart steps: h263Encode; encode picture layer; read part of input (rdcurBuff); h263EncMB; write reconstructed MB (wrRecBuff); write encoded bits (wrBits); last MB set (N/Y); byte-align bits (bytealign); last GOB (N/Y); write last set of bits (wrBits); wait for all data transfers to complete]

For each frame, the encoder is provided with a frame in the YUV 4:2:0 format. The h263Encode() function begins by encoding the picture layer and a GOB layer, as appropriate. The encoder is capable of processing between one and the maximum number of MBs per GOB, provided that the system has sufficient heap to allocate the necessary scratch memory. The rdCurBuff() function brings in as much of the captured frame as required into on-chip memory. The h263EncMB() function is called the appropriate number of times, depending on
104. mputationally intensive real-time applications where optimal execution speed is critical. ImageLIB offers the following advantages to software developers:

- By using the routines provided in ImageLIB, an application can achieve execution speeds that are considerably faster than equivalent code written in standard ANSI C language.
- By providing ready-to-use DSP functions, ImageLIB can significantly shorten image/video processing application development time.

ImageLIB software and associated documentation is available by accessing http://www.ti.com and navigating to the appropriate site.

Chip Support Library (CSL): CSL is a set of Application Programming Interfaces (APIs) used to configure and control all on-chip peripherals. It is intended to make it easier to get algorithms operational in a system. The goals of this library are ease of peripheral use, some level of compatibility between devices, shortened development time, code portability, some standardization, and hardware abstraction. CSL offers the following advantages to software developers:

- Enables development of DSP application code without having to physically program the registers of peripherals. This makes the programming task easier and quicker, with less potential for mistakes.
- The availability of CSL for all C6000 devices allows an application to be developed once and run on any member of the TMS320C6000 DSP family
105. n below.

Inputs: A B C D E F
Outputs are: P = (A + B)/2, Q = (C + D)/2, R = (E + F)/2

Appendix C Using Image Data Manager

This example demonstrates how to use the DMA streaming routines to implement a sliding window that contains four lines, each of length four words (four ints). After each iteration, the input pointer jumps down by two lines. Therefore, the sliding window looks as follows after each iteration:

Iteration 0:
    Line0 -> word3  word2  word1  word0
    Line1 -> word7  word6  word5  word4
    Line2 -> word11 word10 word9  word8
    Line3 -> word15 word14 word13 word12

Iteration 1:
    Line0 -> word11 word10 word9  word8
    Line1 -> word15 word14 word13 word12
    Line2 -> word19 word18 word17 word16
    Line3 -> word23 word22 word21 word20

The stride argument lets the user specify an external memory stride to move by, and this lets the user implement strip lining. Consider the scenario where each line contains 16 pixels and you are processing the data using a sliding window, sliding two lines at a time:

    Line0:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    Line1: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    Line2: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
    Line3: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

    err_code = dstr_init(&i_dstr, (void *) array1, sizeof(array1), (void *) array2, sizeof(array2), 4 * sizeof(int), 2, 8 * sizeof(int), 2, DSTR_INPUT);

The above lines let the user initia
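The sliding-window behavior shown in the iteration listing above can be illustrated with a small host-side sketch. It does not use the IDM dstr routines: fetch_window is a hypothetical helper, and memcpy stands in for the DMA transfer, so only the indexing (four-line window, step of two lines) corresponds to the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMA fetch (not an IDM function).
 * Copies win_lines lines, starting at line iter * step_lines, from the
 * external array ext into the internal window win.
 * Returns 0 on success, -1 if the window would run past the end of ext. */
int fetch_window(const int *ext, int ext_lines, int words_per_line,
                 int *win, int win_lines, int step_lines, int iter)
{
    assert(ext != NULL && win != NULL);

    int first = iter * step_lines;
    if (iter < 0 || first + win_lines > ext_lines) {
        return -1;
    }
    memcpy(win, ext + (size_t)first * words_per_line,
           (size_t)win_lines * words_per_line * sizeof(int));
    return 0;
}
```

With four words per line, a four-line window, and a step of two lines, iteration 0 covers word0 through word15 and iteration 1 covers word8 through word23, matching the listing.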
106. narios

This section describes the demonstration scenarios currently included in the IDK. Each demonstration contains the components described in the following sections, as well as a G.723.1 speech decoder executing as a separate task. For each demonstration, the G.723.1 speech decoder plays a verbal narration of the demonstration.

Topic
5.1 JPEG Loop Back Demonstration
5.2 H.263 Multichannel Decoder Demonstration
5.3 Image Processing Demonstration
5.4 H.263 Loop Back Demonstration
5.5 2D Wavelet Transform Demonstration

5.1 JPEG Loop Back Demonstration

This demonstration includes JPEG Encode and Decode. Image data is captured and JPEG encoded. The encoded bit stream is then subjected to JPEG Decode and sent to the display after Color Space Conversion. Figure 5-1 shows the sequence of standard algorithms connected by Channel Manager to create this demonstration. In this demonstration, two tasks are utilized: Task 1, where the input data, after pre-scale, is subjected to Color Space Conversion; and Task 2, where the same data is subjected to JPEG Encode, JPEG Decode, and Color Space Conversion. Both tasks are run to provide a before-and-after view of JPEG Encode/Decode.

Figure 5-1. JPEG Loo
107. o applications. The C6711 DSK has 16 MBytes of external SDRAM operating at 100 MHz.

3.5.2 Data Memory Requirements of IDK Algorithms

Image processing algorithms typically work on very large quantities of data, with sizes far larger than the on-chip memory space on most typical processors. On the other hand, at any given time an algorithm is only processing a small portion of the entire image, such as an 8x8 block or a vertical/horizontal line. Data access is usually localized and predictable. This makes it possible for algorithms to bring data into fast internal data memory before processing it, and to send it back out to external memory after the processing is done. The fastest way to perform the data movement is Direct Memory Access (DMA). By using double buffering schemes, most or all overhead of data movement can be eliminated by doing the DMA transfer in the background. Figure 3-5 shows the system memory layout for a typical image processing algorithm.

Figure 3-5. Split Cache/SRAM Mode with QDMA Data Transfer [block labels: internal L2 memory, 64 Kbytes (SRAM, cache); QDMA transfer service; external SDRAM, 16 Mbytes]

As shown in Figure 3-5, the on-chip SRAM operates in split mode, with part of it configured as RAM and the rest as L2 cache for both program and data. The on-chip RAM is primarily used as internal scratch data buffers. At run time, algorithms call DMA data service functions (CSL
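The double buffering scheme mentioned above can be sketched on a host machine as follows. This is only an illustration of the control flow: fake_dma_copy is a hypothetical stand-in built on memcpy (a real DMA submit would return immediately and complete in the background), BLOCK is an arbitrary size, and the "processing" is a simple byte sum. None of these names come from the CSL or the IDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64 /* bytes per transfer; arbitrary for this sketch */

/* Hypothetical stand-in for a DMA submit. It copies synchronously here;
 * on the target the copy would proceed while the CPU keeps working. */
static void fake_dma_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Sums len bytes of "external" data using two internal buffers.
 * While buf[cur] is being processed, the next block is fetched
 * into buf[cur ^ 1]. */
uint32_t sum_double_buffered(const uint8_t *ext, size_t len)
{
    assert(ext != NULL || len == 0);

    uint8_t buf[2][BLOCK];
    uint32_t sum = 0;
    size_t nblocks = (len + BLOCK - 1) / BLOCK;
    int cur = 0;

    if (nblocks == 0) {
        return 0;
    }

    /* Prime the pipeline with the first block. */
    fake_dma_copy(buf[cur], ext, len < BLOCK ? len : BLOCK);

    for (size_t b = 0; b < nblocks; b++) {
        size_t n = (b == nblocks - 1) ? len - b * BLOCK : BLOCK;

        /* Start fetching the next block into the idle buffer. */
        if (b + 1 < nblocks) {
            size_t next_off = (b + 1) * BLOCK;
            size_t next_n = (len - next_off < BLOCK) ? len - next_off : BLOCK;
            fake_dma_copy(buf[cur ^ 1], ext + next_off, next_n);
        }

        /* Process the current block from internal memory. */
        for (size_t i = 0; i < n; i++) {
            sum += buf[cur][i];
        }
        cur ^= 1; /* swap ping and pong */
    }
    return sum;
}
```

On real hardware the loop would also wait for the outstanding transfer to finish before swapping buffers; that wait is implicit here because the copy is synchronous.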
108. o the FPGA over an 8-bit video input port. Data may be captured in either the square pixel (640x480 or 768x576) or ITU (720x480 or 720x576) format. The format is determined via a control register bit in the TVP5022, which must be programmed to denote line length divisibility by 64 or 72; all formats fit into one of these two categories. The setting of the input mode, as well as complete configuration of the TVP5022, is provided via an I2C interface. A complete list of the addressable registers and their functions in the TVP5022 is available by accessing http://www.ti.com and navigating to the appropriate site.

Captured data is stored as two separate fields (odd and even fields) in three separate blocks (Y, Cr, Cb) in the frame buffer memory on the daughtercard. Note that the memory locations of the fields, as well as the blocks within the fields, are not necessarily contiguous. Up to three frames of captured data may be stored in the daughtercard memory. At any given time, the FPGA controls two of the buffers, to which it writes captured video data in a ping-pong fashion. The application has access to the third buffer, which typically has the most recently captured data. If the application falls behind in processing, the two buffers that the FPGA controls can be toggled, and the application simply runs at a processing rate less than the captured 30 frames/sec. If the application can maintain the full processing rate, the buffers are physicall
109. obal data sharable by all instances of the same algorithm. In order to use the on-chip DSP memory more efficiently, Channel Manager overlays the on-chip scratch buffer for all algorithm instances.

3.2 The IALG Interface

Since all algorithms must implement the IALG interface in order to plug into Channel Manager, it is essential to have a good understanding of the standard IALG interface before further discussion of Channel Manager details. An algorithm is said to be eXpressDSP-compliant if it implements the IALG interface and observes all the programming rules in the algorithm standard. The core of the IALG interface is the IALG_Fxns structure type, in which a number of function pointers are defined. Each eXpressDSP-compliant algorithm must define and initialize a variable of type IALG_Fxns, as shown below. In IALG_Fxns, algAlloc(), algInit(), and algFree() are required, while the other functions are optional.

    typedef struct IALG_Fxns {
        Void *implementationId;
        Void (*algActivate)(IALG_Handle);
        Int  (*algAlloc)(const IALG_Params *, struct IALG_Fxns **, IALG_MemRec *);
        Int  (*algControl)(IALG_Handle, IALG_Cmd, IALG_Status *);
        Void (*algDeactivate)(IALG_Handle);
        Int  (*algFree)(IALG_Handle, IALG_MemRec *);
        Int  (*algInit)(IALG_Handle, const IALG_MemRec *, IALG_Handle, const IALG_Params *);
        Void (*algMoved)(IALG_Handle, const IALG_MemRec *, IALG_Handle, const IALG_Params *);
110. ode and/or different bit streams at different bit rates may be used in the demonstration, as suitable.

- The Color Space Conversion function converts H.263 decoded data from 4:2:0 to RGB. Initial demos use a 16-bit RGB output. A display rate of 60 fps is achieved by repeating display of any given frame from the display buffer as suitable. The Color Space Conversion function also provides the ability for a pitch to control the positioning of the output frame within the frame buffer.
- Decode output picture resolution is CIF (352x288) or QCIF (176x144). Multiple outputs will be positioned as suitable in the 640x480 display buffer area.
- Upon completion of processing, the channel task signals the I/O task. The I/O task calls the display driver using the VCAP_toggleBuffs() function with argument 0.
- In multichannel decode, for each channel of decode and color space conversion, a separate frame buffer is used between the decode and color space conversion operations.
- The multiple channels of H.263 decode are each processed as a separate channel, but all as one task, by the Channel Manager (see section 3.4 for details on the Channel Manager).

5.2.3 eXpressDSP APIs for H.263 Multichannel Decoder Demonstration

See section 6.5.3 for the H.263 Decoder eXpressDSP API description. Also see Appendix E fo
111. oid (*algActivate)(IALG_Handle);
        Int  (*algAlloc)(const IALG_Params *, struct IALG_Fxns **, IALG_MemRec *);
        Int  (*algControl)(IALG_Handle, IALG_Cmd, IALG_Status *);
        Void (*algDeactivate)(IALG_Handle);
        Int  (*algFree)(IALG_Handle, IALG_MemRec *);
        Int  (*algInit)(IALG_Handle, const IALG_MemRec *, IALG_Handle, const IALG_Params *);
        Void (*algMoved)(IALG_Handle, const IALG_MemRec *, IALG_Handle, const IALG_Params *);
        Int  (*algNumAlloc)(Void);
    } IALG_Fxns;

The algorithm implements the algAlloc() function to inform the framework of its memory requirements by filling the memTab structure. It also informs the framework whether there is a parent object for this algorithm. Based on the information it obtains by calling algAlloc(), the framework then allocates the requested memory. algInit() initializes the instance persistent memory requested in algAlloc(). After the framework has called algInit(), the instance of the algorithm pointed to by handle is ready to be used. To delete an instance of the algorithm pointed to by handle, the framework needs to call algFree(). It is the algorithm's responsibility to set the addresses and the size of each memory block requested in algAlloc() such that the application can delete the instance object without creating memory leaks.

E.2 eXpressDSP API for Pre-Scale Filter

Pre-scale filters are used to pre-condition input data for the JPEG Encoder in the
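The create and delete sequence described above (the algorithm reports its memory needs, the framework allocates, the algorithm initializes, and later reports its blocks back for freeing) can be sketched with simplified stand-in types. Everything prefixed Demo or framework below is hypothetical and deliberately much smaller than the real IALG types; the sketch uses malloc and free on a host rather than DSP heaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins, NOT the real IALG_MemRec or instance object. */
typedef struct { size_t size; void *base; } DemoMemRec;
typedef struct { int ready; } DemoObj;

#define DEMO_NUM_BLOCKS 1

/* algAlloc-style: the algorithm fills memTab with its requests and
 * returns the number of blocks. Block 0 holds the instance object. */
int demoAlloc(DemoMemRec *memTab)
{
    memTab[0].size = sizeof(DemoObj);
    memTab[0].base = NULL;
    return DEMO_NUM_BLOCKS;
}

/* algInit-style: the algorithm initializes its persistent memory. */
int demoInit(DemoObj *handle, const DemoMemRec *memTab)
{
    if (memTab[0].base != (void *)handle) {
        return -1;
    }
    handle->ready = 1;
    return 0;
}

/* algFree-style: the algorithm reports the address and size of every
 * block it owns so the framework can release them without leaks. */
int demoFree(DemoObj *handle, DemoMemRec *memTab)
{
    memTab[0].size = sizeof(DemoObj);
    memTab[0].base = handle;
    return DEMO_NUM_BLOCKS;
}

/* Framework side: query, allocate, initialize. */
DemoObj *frameworkCreate(void)
{
    DemoMemRec memTab[DEMO_NUM_BLOCKS];
    int n = demoAlloc(memTab);

    for (int i = 0; i < n; i++) {
        memTab[i].base = malloc(memTab[i].size);
        if (memTab[i].base == NULL) {
            while (i-- > 0) {
                free(memTab[i].base);
            }
            return NULL;
        }
    }

    DemoObj *handle = (DemoObj *)memTab[0].base;
    if (demoInit(handle, memTab) != 0) {
        for (int i = 0; i < n; i++) {
            free(memTab[i].base);
        }
        return NULL;
    }
    return handle;
}

/* Framework side: ask the algorithm for its blocks, then free them.
 * Returns the number of blocks released. */
int frameworkDelete(DemoObj *handle)
{
    DemoMemRec memTab[DEMO_NUM_BLOCKS];
    assert(handle != NULL);

    int n = demoFree(handle, memTab);
    for (int i = 0; i < n; i++) {
        free(memTab[i].base);
    }
    return n;
}
```

The point of the split is that the framework never needs to know the algorithm's internal layout; it only sees the sizes and addresses reported through the table.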
112. options 2 12 Figure 2 6 GRAY8 Display Buffer Format Little Endian M 32 bits p Figure 2 7 RGB16 Display Buffer Format Little Endian 4 32 bits Video Display Big Endian M 32 bits first pixel captured Word Word 000 000 001 001 010 010 011 011 Big Endian 4 32 bits p first pixel captured Word 000 001 010 on 3 283 R G or 3130 26252 1514 109 5 N R G Hardware Architecture 2 13 Chapter 3 Software Architecture Applications Framework The IDK has multiple software architecture levels Atthe highest level the IDK framework provides a way to pipeline eXpressDSP compliant algorithms easi ly Some of the standard algorithms used are exposed to the user only at the algorithm level such as the JPEG encoder JPEG decoder H 263 decoder H 263 encoder These algorithms are made available in source code form only under license The framework software provides a means for building demonstrations using combinations of such applications level code an example is the JPEG Loop Back demonstration that combines Pre Scale Filter JPEG Encode JPEG De code and Color Space Conversion Other standard algorithms for simpler image processing functions have been built using a common layering approach combining ImageLIB kernels with th
113. ount for external pointer Window size 1 for double buffering Direction Input Output unsigned short i size unsigned short quantum unsigned short multiple unsigned short stride unsigned short w size dstr t dir Int 0 function succeeded 1 2 3 function failed Initializes input output stream Must be used before dstr_put dstr_get or dstr_put_2d dstr_get_2d calls are used dstr_close should be used only ona stream that has been opened using dstr_open Returns pointer to current area in internal memory void dstr_get none void Returns a pointer to current input buffer Returns a pointer to the current area in internal memory that contains valid data Software Architecture Algorithm Creation 4 21 dstr get 2d dstr get 2d Prototype Arguments Return Value Description Prototype Arguments Return Value Description Prototype Arguments Return Value Description 4 22 Returns pointer to current area in internal memory void dstr get 2d none void Returns a pointer to current input buffer Returns a pointer to the current area in internal memory that contains valid data This function is called on an input stream when succesive lines in exter nal memory are seperated by a fixed offset Returns pointer to current buffer void dstr put none void Pointer to current buffer in which output results can be stored Returns a pointer to current buffer in which outpu
114. p Back Demonstration [block labels: conditioned input data; Task 1; color space conversion; JPEG encoder; JPEG decoder; color space conversion; to display buffer]

5.1.1 Data I/O and User Input Specifics

- NTSC Capture: 640x480x30fps, 4:2:2, interlace, interleaved
- PAL Capture: 768x576x25fps, 4:2:2, interlace, interleaved
- NTSC Progressive Display Driver: 640x480, 16bpp, 60Hz mode
- PAL Progressive Display Driver: 800x600, 16bpp, 60Hz mode
- GUI-Based User Inputs:
  - JPEG Encoder Quantization Factor Setting: integer values in the range 1 to 12
  - Frame Rate Selection: select input frame rate from a choice of 5, 10, or 30 frames/sec
  - Ability to start and stop each task independently

5.1.2 Signal Processing Operations Sequence

- The I/O task calls the capture driver using the VCAP_getFrame() function with the SYS_FOREVER argument, which blocks until a new frame is available to be processed. At that point, it signals the channel task, which can then begin processing.
- The daughtercard FPGA planarizes the captured YC data.
- Only one set of fields (even fields) is used.
- NTSC Mode: Field data is converted from 640x240 to 320x240 by using pre-scale filters based on the description in Appendix B. JPEG Encode and Decode are performed on 320x240 resolution data.
- Input conversion from 4:2:2 to 4:2:0 is by reading every other line of C data into the DSP during pre-scale processing (not accurate, strictly spea
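The two conditioning steps named above, halving the line width and keeping every other chroma line, can be sketched as below. This is an illustration under stated assumptions, not the IDK pre-scale filter: the pairwise average follows the P = (A + B)/2 form given for Appendix B, integer division truncates (the rounding used by the real filter is not stated here), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: halves one line horizontally by averaging
 * adjacent pixel pairs, out[i] = (in[2i] + in[2i+1]) / 2.
 * For a 640-pixel line this yields 320 pixels. */
void prescale_line_2to1(const uint8_t *in, int in_width, uint8_t *out)
{
    assert(in != NULL && out != NULL && in_width % 2 == 0);

    for (int i = 0; i < in_width / 2; i++) {
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1]) / 2);
    }
}

/* Hypothetical helper: keeps every other chroma line (lines 0, 2, 4, ...),
 * which is the simple 4:2:2 to 4:2:0 decimation the text describes. */
void chroma_keep_even_lines(const uint8_t *in, int width, int in_height,
                            uint8_t *out)
{
    assert(in != NULL && out != NULL);

    for (int y = 0; y < in_height / 2; y++) {
        for (int x = 0; x < width; x++) {
            out[(size_t)y * width + x] = in[(size_t)(2 * y) * width + x];
        }
    }
}
```

Dropping lines without filtering is cheap but can alias vertically, which is consistent with the text's own caveat that the conversion is not strictly accurate.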
115. pen Defines directions input output typedef enum dstr dir t DSTR_INPUT DSTR_OUTPUT dstr dir t none Software Architecture Algorithm Creation 4 23 DMA Stream Definition Return Value Description DMA Stream Definition Prototype Arguments Return Value Description 4 24 none Structure that defines directions input output User can use the above defined symbolic names to set direction of image stream Maintains state information typedef struct dstr t char x data int x ofs unsigned X Size char i data unsigned short i ofs unsigned short i size unsigned short w size unsigned short quantum unsigned short multiple unsigned short stride unsigned xfer id dstr t char x data int x ofs unsigned x size char i data unsigned short i ofs unsigned short i size unsigned short w size unsigned short quantum unsigned short stride unsigned xfer id none Pointer to external data Current offset to external data Length of external data buffer Pointer to internal buffer Offset to internal buffer Size of internal buffer Size of window Amount transferred by a single get put call Byte offset between succesive lines in external memory that need to be fetched Transfer id of the previous DMA Internal structure that IDM uses to maintain state information User declares input and output streams of type dstr t for using IDM Chapter 5 Demonstration Sce
116. pported Image component dimensions rows columns for every component must be multiples of 8 Simple compression ratio control capability is provided in the encoder 6 2 3 JPEG Encoder API The eXpressDSP API Wrapper is derived from template material provided in the algorithm standard documentation Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper See the algorithm standard documentation for details on the algorithm standard Also see Appendix E for an overview of eXpressDSP APIs The eXpressDSP API for the JPEG Encoder is ENC Interface Header ifndef IJPEGENC define IJPEGENC include std h include xdas h include lt ialg h gt include lt ijpeg h gt EGENC Handle This handle is used to reference all JPEGENC instance objects 7 typedef struct IJPEGENC_Obj IJPEGENC Handle EGE This structure must be the first field of all JPEGENC instance objects A typedef struct IJPEGENC Obj struct IJPEGENC Fxns fxns IJPEGENC Obj JPEG Encoder EGENC Params This structure defines the creation parameters for all JPEGENC objects f typedef struct IJPEGENC Params Int size must be first field of all params structures unsigned in sample prec unsigned
117. put image from 640x240 4:2:2 to 320x240 4:2:0 for NTSC data, or from 768x288 4:2:2 to 384x288 4:2:0 for PAL data. This channel is located in the I/O task, and it is a separate channel because both the loop back and the pass through channels share its output data.

Figure 3-4. JPEG Loop Back Demo Channels and I/O Buffers [block labels: capture buffer; pre-scale channel; intermediate buffer; JPEG loop back channel; pass through channel; display buffer]

3.5 Channel Manager Memory Management

This section describes various aspects of Channel Manager memory management, including the C6711 DSK memory architecture, data memory requirements of algorithms used in the IDK, memory heaps management by the Channel Manager, creation and deletion of algorithm instances by the Channel Manager, and parent instance support.

3.5.1 C6711 DSK Memory Architecture

The TMS320C6211/6711 DSP employs a two-level memory architecture for on-chip program and data access. The first level (L1) has dedicated 4-KByte program and data caches, L1P and L1D respectively. The second-level memory (L2) is a 64-KByte memory block that is sharable by both program and data. The L2 memory is divided into four 16-KByte blocks. Each of the four blocks can be independently configured as either cache or memory-mapped RAM. This feature is ideal for efficient implementation of imaging/video
118. r eXpressDSP APIs of other functions used in this demonstration.

5.3 Image Processing Demonstration

This demonstration highlights several commonly used image processing functions: Image Thresholding, Image Filter, and Sobel Edge Detection. Figure 5-3 shows the standard algorithms configured as four separate tasks.

Figure 5-3. Image Processing Demonstration [block labels: conditioned Y input data; Task 1: pass through; Task 2: binary threshold; Task 3: image filter; Task 4: Sobel edge detect; each task outputs to the display buffer]

The input image, as well as the results of the image processing functions, will be simultaneously displayed as shown in Figure 5-4.

Figure 5-4. Image Processing Demonstration Display [display 640x480 or 800x600, tiled: original; binary threshold; low pass filter; edge detect]

5.3.1 Data I/O and User Input Specifics

- NTSC Capture: 640x480x30fps, 4:2:2, interlace, interleaved
- PAL Capture: 768x576x25fps, 4:2:2, interlace, interleaved
- NTSC Progressive Display Driver: 640x480, 8bpp, 60Hz mode
- PAL Progressive Display Driver: 800x600, 8bpp, 60Hz mode
- GUI-Based User Inputs:
  - Binary Threshold demo includes the ability to select an integer value in the range 0 to 255 as the threshold value
  - Image Fi
119. re in_data is a pointer to one row of input pixels, qmf is a pointer to the qmf filter bank for low-pass filtering, mqmf is a pointer to the mirror qmf filter bank for high-pass filtering, out_data is a pointer to the row of detailed/reference decimated outputs, and cols is the number of columns in the input image.

The kernel wave_horz.asm performs a 1-D periodic orthogonal wavelet decomposition. It also performs the row decomposition component of a 2D wavelet transform. An input signal x(n) is low-pass and high-pass filtered, and the resulting signals are decimated by a factor of two. This results in a reference signal r1(n), which is the decimated output obtained by dropping the odd samples of the low-pass filter output, and a detail signal d(n), obtained by dropping the odd samples of the high-pass filter output. A circular convolution algorithm is implemented, and hence the wavelet transform is periodic. The reference signal and the detail signal are each half the size of the original signal. Behavioral C code for the kernel wave_horz.asm is provided below.

    #define Qpt 15
    #define Qr  16384

    void wave_horz(short *in_data, short *qmf, short *mqmf, short *out_data, int cols)
    {
        int    i;
        short *xptr  = in_data;
        short *x_end = &in_data[cols - 1];
        int    j, sum, prod;
        short  xdata, hdata;
        short *filt_ptr;
        int    M = 8;
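Because the behavioral listing above is cut off, the idea it implements can be illustrated with a separate, simplified sketch. This is not the ImageLIB kernel and makes its own assumptions: filter taps are applied starting at each even sample, indices wrap with a modulo (the circular convolution that makes the transform periodic), coefficients are Q15 with a rounding constant of 16384, and the reference half is stored before the detail half. The name wave_horz_sketch and the macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QPT  15     /* Q15 fixed-point shift */
#define QR   16384  /* rounding constant, 0.5 in Q15 */
#define TAPS 8      /* filter length, matching M = 8 in the text */

/* Simplified 1-D periodic decomposition (assumed conventions, see above).
 * out[0 .. cols/2-1]      : decimated low-pass (reference) samples
 * out[cols/2 .. cols-1]   : decimated high-pass (detail) samples
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
void wave_horz_sketch(const short *in, const short *lo_f, const short *hi_f,
                      short *out, int cols)
{
    assert(in != NULL && lo_f != NULL && hi_f != NULL && out != NULL);
    assert(cols >= 2 && cols % 2 == 0);

    int half = cols / 2;
    for (int k = 0; k < half; k++) {
        long long lo = QR;
        long long hi = QR;
        for (int j = 0; j < TAPS; j++) {
            /* Wrap around the end of the row: periodic extension. */
            int x = in[(2 * k + j) % cols];
            lo += (long long)lo_f[j] * x;
            hi += (long long)hi_f[j] * x;
        }
        out[k] = (short)(lo >> QPT);
        out[half + k] = (short)(hi >> QPT);
    }
}
```

Computing one output for every second input position is equivalent to filtering the whole row and then discarding alternate samples, but does half the work.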
120. ropriate tasks for actions. Examples are commands to change the frame rate for each channel task, and to suspend or resume a channel task.

- The I/O Task calls capture and display drivers to get input and output buffers for all channel tasks. It signals each channel task for the readiness of its I/O buffers, and waits for completion signals from channel tasks before releasing input and output buffers back to the drivers. Synchronization among tasks is achieved by using the DSP/BIOS semaphore objects. In some cases, such as in the JPEG Loop Back and the Image Processing demos, pre-processing is also performed in the I/O task.
- Each Channel Task consists of an instance of the channel object, created by calling CM_Open() and represented by the handle returned from that call. Each channel object encapsulates a group of algorithm instances, where the output of a given algorithm instance provides input to the next instance.
- Host GUIs are CCS plug-ins that can be launched from the Code Composer Studio Tools menu. The Host GUI sends commands to the target application using the Real-Time Data Exchange (RTDX) technology, which can transfer data through the JTAG emulation interface at run time without halting the DSP.

Figure 3-1. IDK Demo Block Diagram [block labels: Host GUI (CCS plug-in); RTDX; application framework; message handling task; channel task; Channel Manager; algorithms]
121. rray2 i OxDEADBEEF Perform Cache flush to commit writes to external memory before DMA starts j CACHE Flush CACHE L2ALL 0x00000000 0x00000000 Open data channelO with PRI LOW as priority Ky DAT Open 0 DAT PRI LOW 0 Initialize input stream i dstr with external array arrayl of size Kl sizeof arrayl internal array array2 of size sizeof array2 to sf fetch 8 ints every iteration quantum over a sliding buffer of size 4 lines jumping by 2 lines every time as an input stream Check for error codes to make sure that the input stream was initialized correctly ey err_code dstr_open amp i_dstr void arrayl sizeof arrayl void array2 sizeof array2 4 sizeof int 2 8 sizeof int 2 DSTR_INPUT if err_code printf error initializing i_dstr n exit 1 Initialize output stream o_dstr with external array array4 of size sizeof array4 internal array arrayl of size sizeof arrayl to yt fetch 4 ints every iteration quantum using double buffering for the output Check error codes to make sure that the output stream is initialized correctly Kf err code dstr init amp o dstr void array4 sizeof arrayd4 void array3 sizeof array3 4 sizeof int 1 4 sizeof int 1 DSTR OUTPUT if err code printf error initializing o_dstr n exit 1 Use stream get and put methods to get new and commit old buffers The
122. rther Information on JPEG Decoder rvvuurnnunrennunrennurenn 6 4 H 263 Encoder less eres Ra o e Eu RE Ru seek ctw ek Geek GT Are YEN 6 4 1 H 263 Encoder Algorithm Level Description 0 0 eee e eee 6 4 2 H 263 Encoder Capabilities and Restrictions 0 00 c eee eee 0 4 3 H 263 Encoder APl wie sci eeeeie ones ERR RE REOR HORE ad RU eR ai 6 4 H 263 Encoder Performance sssuslusseseleeelle eese 6 4 5 Further Information on H 263 Encoder cc cece eee een e eee vi Contents 65 H 263Decoder 0 00 c cece se hh 6 5 1 H 263 Decoder Algorithm Level Description 6 5 2 H 263 Decoder Capabilities and Restrictions 6 5 8 H 263 Decoder API uusuuuueseuseeeeseeeeseeee eene 6 5 4 H 263 Decoder Performance 0 cece eect cent eee 6 5 5 Further Information on H 263 Decoder cce cece eee 6 6 ImageLIB Library of Optimized Kernels 0 00 cece eee eee ees 6 6 1 Further Information on ImagellB 000 eee Testing and Compliance eeueeseeeeeeee nnn nnn 7 1 Describes how the initial versions of the IDK meet the testing and compliance requirements FPGA Interfaces 0 0 0 cece cece ee ee eee eee eee eee eee eee Describes the FPGA interfaces to the DSP EMIF through an asynchronous SRAM interface AM LG Interface 22 ctacieae or dea Basse wate bathe seder se iR oase sees A 2 EMIF ASRAM Interfac
123. s Return Value Description UINT32 Count HANDLE Algs HANDLE hCha Handle to an open channel UINT32 Count Number of algorithms to assign HANDLE Algs An array of algorithm handles to assign BOOL TRUE function succeeds FALSE function fails Assigns a set of algorithms to the channel Channel Manager creates algo rithm instances according to algorithms specified in the Algs Gets number of algorithms and handles to algorithms in channel UINT32 CM GetAIgs HANDLE hCha HANDLE Algs HANDLE hCha Handle to an open channel HANDLE Algs An array of handles to all algorithms in this channel UINT32 Number of algorithms in the channel Gets number of algorithms and handles to those algorithms in the channel Get or set the status of an algorithm instance in a channel Inside Channel Manager calls that instance s algControl function CM RegAlg Prototype Arguments Return Value Description Prototype Arguments Return Value Description CM InstCtrl Prototype CM InstCtril Registers algorithm with Channel Manager HANDLE CM RegAlg char Name void algFxns void process void algParams UINT32 InputCt UINT32 OutputCt char Name Name of the algorithm void algFxns Pointer to XDAIS IALG function pointer table Void process Function pointer to algorithm processing routine Void algParams Pointer to XDAIS algorithm parameter structure UINT32 InputCt Number of algorithm inputs UI
124. s to manage memory allocation and de-allocation on the two heaps. The heaps are created in the DSP/BIOS CDB file and passed to Channel Manager by calling the CM_Control function. By default, or if no heap IDs are passed into the Channel Manager, it uses the memalign and free functions in the run-time support library. These two functions make use of the traditional heap defined in that same library. The Channel Manager allocates memory blocks on these two heaps for algorithm instances according to their memory requirements. Each instance is then responsible for initializing its memory blocks and for managing data transfer between its on-chip and off-chip data memory blocks. All algorithms in the IDK use CSL DAT module API functions for data transfer services.

3.5.4 Creation and Deletion of an Algorithm Instance

Each eXpressDSP-compliant algorithm must implement the algAlloc function in its IALG interface implementation. To create an instance of that algorithm, the Channel Manager uses that function to find out the memory requirements of the algorithm. The prototype of the algAlloc function in an algorithm named XXX is shown below:

    Int XXXAlloc(const IALG_Params *params, struct IALG_Fxns **fxns, IALG_MemRec *memTab)

In the XXXAlloc function, the algorithm fills out the memTab array with its memory requests and returns with
125. solution 16 bits per pixel (565 format)
- Video data formatting by an on-board FPGA to convert captured interleaved 4:2:2 data to separate Y, Cr, Cb components that may be sent to the DSP for processing
- Video capture and display driver software written using DSP/BIOS and CSL. This enables users to quickly set up a development environment that includes video input and output capability.

Chapter 2 Hardware Architecture

The IDK hardware consists of a C6711 DSK with 16MB SDRAM and a daughtercard that provides video capture, display, and formatting capabilities.

Topic
2.1 Daughtercard Description
2.2 Video Capture
2.3 Video Display

2.1 Daughtercard Description

The daughtercard (Figure 2-1) includes:
- NTSC/PAL digital video decoder IC (TI TVP5022)
- Video Palette IC (TI TVP3026C)
- Xilinx field-programmable gate array (FPGA) that includes the following functions: card controller, FIFO buffer manager, front/back-end interfaces. Details of the interfaces served by the FPGA are provided in Appendix A.
- 16 Mbit SDRAM capture frame memory, with option to support 64 Mbit devices

The daughtercard provides the ability for the following types of video capture and display:
- Input video signal capture is limited to a single NTSC/PAL signal
- Input signal should be of composite video format
126. str_rewind(&o_dstr, out_rewind, DSTR_OUTPUT, 1);

    /* Commit last set of buffers and close output stream */
    dstr_put_2D(&o_dstr);
    dstr_close(&o_dstr);

4.5 ImageLIB or Custom Kernels

ImageLIB or Custom Kernels are the core image processing utilities. Many of these kernels are contained in the TI ImageLIB software, while others are custom software for specific applications. They typically rely on wrapper functions, such as the Image Processing Functions described above, to provide them input data and take their output data. Continuing with the Wavelet Transform example, the first ImageLIB kernel utilized is:

    void pix_expand_asm(int n, unsigned char *in_data, short *out_data)

where n is the number of samples processed, in_data is a pointer to the input array (unsigned chars), and out_data is a pointer to the output array (shorts). The kernel pix_expand_asm takes an array of unsigned chars (pixels) and zero-extends them to 16 bits to form shorts. Typical imaging kernels are implemented in optimized assembly code. Behavioral C code for the kernel pix_expand_asm is provided below:

    void pix_expand_asm(int n, unsigned char *in_data, short *out_data)
    {
        int j;
        for (j = 0; j < n; j++)
            out_data[j] = (short) in_data[j];
    }

Another key kernel used in the Wavelet Transform example is:

    void wave_horz_asm(short *in_data, short *qmf, short *mqmf, short *out_data, int cols)

whe
127. t
    #pragma DATA_SECTION(output_ch_data, ".image:ext_sect");
    #pragma DATA_SECTION(ext_scratch_pad, ".image:ext_sect");
    unsigned char input_ch_data[IMG_COLS * IMG_ROWS];
    unsigned char output_ch_data[IMG_COLS * IMG_ROWS];
    char ext_scratch_pad[IMG_COLS * IMG_ROWS];

    /* Create section in internal memory called chip_image that will contain  */
    /* various lines of the external image DMA'ed into internal working       */
    /* buffers. The size of the internal buffer is allocated for the worst    */
    /* case usage of all algorithms combined. This happens in the vertical    */
    /* wavelet algorithm, where 8 lines of input are required for producing 2 */
    /* lines of output. Thus the internal memory requirement is 42 lines of   */
    /* the input image: 42 * 640 = 26880 bytes = 26.25 KBytes.                */
    #pragma DATA_ALIGN(int_scratch_pad, 8);
    #pragma DATA_SECTION(int_scratch_pad, ".chip_image:int_sect");
    char int_scratch_pad[IMG_COLS * 21 * 2];

    /* Start of main code */
    main()
    {
        /* IMAGE structures for even and odd input images are in_image_ev and */
        /* in_image_od and are the inputs to the wavelet codec.               */
        /* IMAGE structure for output image is out_image.                     */
        /* SCRATCH_PAD details output and input scratch pad.                  */
        IMAGE in_image_ev, in_image_od;
        IMAGE out_image;
        SCRATCH_PAD scratch_pad;

        /* Wavelet parameters include customizable 8-tap filters. Currently   */
        /* only one scale of decomposition is performed.                      */

2D Wavelet Transform Algorithm Example
128. t C6211 performance data based on 48K cache / 16K SRAM configuration (recommended for JPEG).

6.2.5 Further Information on JPEG Encoder

Further information on C6000 DSP JPEG Encoder implementation is available from the following application reports:
- TMS320C6000 JPEG Implementation (Literature number SPRA704)
- Optimizing JPEG on the TMS320C6211 2-Level Cache DSP (Literature number SPRA705)

6.3 JPEG Decoder

6.3.1 JPEG Decoder Algorithm Level Description

Figure 6-5 provides an overview of the processing involved in JPEG Decoder.

Figure 6-5. JPEG Decoder (blocks recoverable from the figure: Byte unstuff, RLD and dequantization, Data reformat)

Byte Unstuff: In the JPEG standard, control markers are flagged by a preceding 0xFF followed by one or more bytes of control code. A 0x00 byte following a 0xFF byte signifies that the 0xFF byte is indeed part of the data and not control. Thus every 0xFF byte occurring in the entropy (VL) coded data is followed by a redundant 0x00 byte, which has to be stripped off.

Variable Length Decode (VLD): VLD decodes the JPEG bit stream and generates image data in the DCT domain. The decoding is done in two steps: (1) DC coefficient decoding, followed by (2) AC run-level decoding. The decoding is conceptually implemented as a series of exhaustive look-ups into a predefined table. The C6000 ISA has a single-cycle instruction, lmbd, that can reduce the decoding complexity. It facilitates a faster decoding 1
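The byte-unstuff step described above can be sketched in plain C. This is a minimal illustration only, not the IDK implementation: the function name jpeg_byte_unstuff and its signature are hypothetical, and the choice to stop at the first marker is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from the IDK sources).
 * Copies entropy-coded bytes from src to dst, dropping the 0x00 byte
 * that is stuffed after every 0xFF data byte. Processing stops at the
 * first 0xFF that is NOT followed by 0x00, since that is a marker.
 * Returns the number of bytes written to dst. */
size_t jpeg_byte_unstuff(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);

    while (in < len) {
        unsigned char b = src[in++];
        if (b == 0xFF) {
            if (in < len && src[in] == 0x00) {
                in++;        /* stuffed zero: 0xFF is data, skip the 0x00 */
            } else {
                break;       /* marker or truncated input: stop copying */
            }
        }
        dst[out++] = b;
    }
    return out;
}
```

For the input bytes 12 FF 00 34 FF D9 (hex), the sketch writes 12 FF 34 and stops at the FF D9 marker. The destination buffer must be at least as large as the source.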
129. t results can be stored. It also commits the results of the previous output buffer to external memory.

dstr_put_2D: Returns pointer to current buffer.
Prototype: void *dstr_put_2D
Arguments: none
Return Value: void *. Pointer to current buffer in which output results can be stored.
Description: Returns a pointer to current buffer in which output results can be stored. It also commits the results of the previous output buffer to external memory. This function should be used when the output lines need to be written to external memory either with zero fixed offset between successive lines

dstr_rewind: Rewinds input/output streams.
Prototype: int dstr_rewind(dstr_t *dstr, void *x_data, dstr_dir_t dir, unsigned short w_size)
Arguments:
  dstr_t *dstr            DMA stream structure
  void *x_data            Pointer to external buffer to which stream is reset
  dstr_dir_t dir          Direction of stream (input/output)
  unsigned short w_size   Window size (1 for double buffering)
Return Value: int. 0 for successful rewind.
Description: Rewinds input/output streams to start fetching data from a new location in external memory. The external offset is reset to 0. This resets the number of external transfers completed to 0.

dstr_close: Closes stream.
Prototype: void dstr_close(dstr_t *dstr)
Arguments:
  dstr_t *dstr            Pointer to DMA stream structure
Return Value: void. None.
Description: This function closes the stream that was opened using dstr_o
130. t signal may also be routed to the DMA controller within the DSP, which can be used as an added security measure against losing synchronization with the display. Alternatively, the DSP ISR may wish to reprogram the DMA parameters during the ISR as part of a page-flipping routine. The following diagram shows the interrupt point for the vertical synchronization event.

Figure 2-5. Display Interrupt Generation (signals shown: VSYNC, CBLNK; interrupt to CPU and/or DMA)

Video display data written to the FPGA FIFO is extracted from the FIFO by the IDK display device, the RGB palette (TVP3026). Table 2-4 outlines the support matrix for the various display modes.

Table 2-4. Display Modes

  Display Mode   Data Format          Output Selected
  GRAY8          8-bit grayscale      TVP3026
  RGB8           VGA 256 colors       TVP3026
  RGB16          5-6-5 or x-5-5-5     TVP3026
  RGB32          True color 24-bit    TVP3026

From the modes listed in Table 2-4, the IDK initially uses a 16-bit RGB display mode, and an 8-bit grayscale display mode is utilized for demonstrations with grayscale output. Display drivers supporting these video display modes are included in the IDK. The drivers are written using DSP/BIOS and CSL. Refer to the TMS320C6000 Imaging Developer's Kit (IDK) Video Device Driver's User's Guide (Literature number SPRU499) for details. Figure 2-6 and Figure 2-7 show the frame buffer format for these display
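To make the RGB16 5-6-5 data format from Table 2-4 concrete, here is a small C sketch that packs 24-bit pixels into 16-bit words. It is an illustration under stated assumptions: the function name pack_rgb565_line is hypothetical, and the bit order (red in bits 15 to 11, green in bits 10 to 5, blue in bits 4 to 0) is the common 565 convention, assumed here rather than taken from the guide. The frame buffer format figures in the guide remain the authority on the actual layout.

```c
#include <assert.h>

/* Hypothetical helper (not part of the IDK drivers).
 * Packs n pixels of interleaved 8-bit R,G,B into 16-bit 5-6-5 words.
 * Assumed layout: R in bits 15..11, G in bits 10..5, B in bits 4..0. */
void pack_rgb565_line(int n, const unsigned char *rgb, unsigned short *out)
{
    int j;

    assert(rgb && out);

    for (j = 0; j < n; j++) {
        unsigned r = rgb[3 * j]     >> 3;   /* keep top 5 bits of red   */
        unsigned g = rgb[3 * j + 1] >> 2;   /* keep top 6 bits of green */
        unsigned b = rgb[3 * j + 2] >> 3;   /* keep top 5 bits of blue  */
        out[j] = (unsigned short)((r << 11) | (g << 5) | b);
    }
}
```

Green keeps one more bit than red or blue, which is why pure green maps to 0x07E0 while pure red and pure blue map to 0xF800 and 0x001F under this assumed layout.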
131. ta Before Reformat
6-7 Reformatted Image Data in Raster Scan Format
6-8 H.263 Encoder Overview
6-9 h263EncMB Overview
6-10 H.263 Decoder Overview
6-11 h263DecMB Overview
A-1 FPGA Control Registers

Tables

Video Capture Memory Requirements
Capture Events
Display Events
Display Modes
C6211/C6711 L2 Operation Modes for IDK Demos
DSK Board Memory Budget Allocations for Multichannel H.263 Decode
JPEG Encoder Performance
JPEG Decoder Performance
H.263 Encoder Performance
H.263 Decoder Performance
ImageLIB Kernels
ImageLIB Kernels Performance
l2G Base Address
IDK Memory Map (2MB Capture Memory Option)
IDK Memory Map (8MB Capture Memory Option)
IDK FPGA Control Register Bit Descriptions

Chapter 1 Introduction

The Imaging Developer's Kit (IDK) has been developed as a platform for development and demonstration of image/video processing applications on TMS320C6000 DSPs.

Topic
1.2 IDK as a Rapid Prototyping Platform
132. te a sliding window of four lines, each line being four words wide, and have it slide down by two lines every time. When the window size is one, we get double-buffering capability. Therefore, on iteration 0:

    Line 0 -> 0 1 2 3
    Line 1 -> 8 9 10 11
    Line 2 -> 16 17 18 19
    Line 3 -> 24 25 26 27

On the next iteration we get:

    Line 0 -> 16 17 18 19
    Line 1 -> 24 25 26 27
    Line 2 -> 32 33 34 35
    Line 3 -> 40 41 42 43

    /* Header files for using CSL and Image Data Manager */
    #include <stdio.h>
    #include <csl.h>
    #include <cache.h>
    #include <dat.h>
    #include "dstr_2D.h"

    /* Declare two arrays in user-defined section in external memory and  */
    /* align them to a double word. array1 and array4 are arrays in       */
    /* external memory sections ext_sect and ext_sect1. If external       */
    /* arrays and internal arrays are aligned, the stream routines get    */
    /* and put will return an aligned pointer as long as the quantity     */
    /* transferred on any given iteration is an integral number of the    */
    /* alignment requested. Therefore, if an array is dword aligned, then */
    /* the stream routines get and put will return a dword-aligned        */
    /* pointer as long as an integral number of dwords is transferred on  */
    /* any given iteration.                                               */
    #pragma DATA_SECTION(array1, ".image:ext_sect1");
    #pragma DATA_SECTION(array4, ".image:ext_sect2");
    #pragma DATA_ALIGN(array1
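The sliding-window listing above can be reproduced with a few lines of host-side C. This is only a model of which external words are visible on each iteration, not a use of the Image Data Manager API. The function name window_indices is hypothetical, and the external line pitch of 8 words is an assumption inferred from the listing (line 1 starts at word 8, line 2 at word 16).

```c
#include <assert.h>

enum {
    EXT_STRIDE  = 8,   /* assumed external line pitch, in words        */
    LINE_WORDS  = 4,   /* words fetched per line                       */
    WIN_LINES   = 4,   /* lines held in the window                     */
    SLIDE_LINES = 2    /* lines the window moves down per iteration    */
};

/* Hypothetical model function: fills win with the external word
 * indices that the sliding window covers on the given iteration. */
void window_indices(int iteration, int win[WIN_LINES][LINE_WORDS])
{
    int first_line = iteration * SLIDE_LINES;
    int r, c;

    assert(iteration >= 0);

    for (r = 0; r < WIN_LINES; r++)
        for (c = 0; c < LINE_WORDS; c++)
            win[r][c] = (first_line + r) * EXT_STRIDE + c;
}
```

With these constants, iteration 0 yields rows starting at words 0, 8, 16, and 24, and iteration 1 yields rows starting at 16, 24, 32, and 40, so two of the four lines are reused between iterations.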
133. the number of memory blocks that the framework must allocate in order to create an instance of that algorithm. Each MemRec entry corresponds to a request for one memory block. It contains the size, alignment, space, and attributes information of that memory block. Four types of data memory requests are currently supported in the Channel Manager:
- Internal Persistent Memory is allocated directly on the internal heap.
- External Persistent Memory is allocated directly on the external heap.
- Internal Scratch Memory is overlaid on the internal scratch buffer, which is allocated on the internal heap according to the maximum requested size of internal scratch memory space among all registered algorithms.
- External Scratch Memory is allocated directly on the external heap.

Following are the steps to create a new algorithm instance:
- Register the algorithm with Channel Manager by calling the CM_RegAlg function; a handle to that algorithm is returned by Channel Manager. In this step, the Channel Manager calls the algAlloc function of the algorithm to find out whether the algorithm has a parent object. If so, the Channel Manager creates a parent instance for that algorithm. The Channel Manager also finds out if the algorithm requests an on-chip scratch buffer. If so, the Channel Manager gets the size of the requested buffer and compares it with the maximum size requested by previously registered al
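The four request types above differ in how they consume memory: three are allocated directly, while internal scratch requests share one buffer sized to the largest request. The following C sketch models only that bookkeeping. All type and function names here (mem_request_t, mem_budget_t, account_request) are hypothetical stand-ins and are not the Channel Manager or XDAIS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the space and attribute fields of a
 * memory request; not the XDAIS IALG_MemRec definitions. */
typedef enum { MEM_INTERNAL, MEM_EXTERNAL } mem_space_t;
typedef enum { MEM_PERSIST, MEM_SCRATCH } mem_attr_t;

typedef struct {
    size_t      size;
    mem_space_t space;
    mem_attr_t  attr;
} mem_request_t;

typedef struct {
    size_t internal_heap_bytes;   /* internal persistent blocks           */
    size_t external_heap_bytes;   /* external persistent and scratch      */
    size_t internal_scratch_max;  /* size of the shared scratch buffer    */
} mem_budget_t;

/* Adds one request to the running budget. Internal scratch requests
 * overlay one shared buffer, so only the maximum size is tracked. */
void account_request(mem_budget_t *b, const mem_request_t *r)
{
    assert(b != NULL && r != NULL);

    if (r->space == MEM_EXTERNAL) {
        b->external_heap_bytes += r->size;
    } else if (r->attr == MEM_PERSIST) {
        b->internal_heap_bytes += r->size;
    } else if (r->size > b->internal_scratch_max) {
        b->internal_scratch_max = r->size;
    }
}
```

Under this model, two algorithms asking for 1024 and 4096 bytes of internal scratch cost 4096 bytes of on-chip memory in total rather than 5120, which is the benefit of overlaying scratch requests.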
134. tion. Also provided are support functions for pixel expansion and saturation (see explanations below) that may be used with the scaling functions. Scaling functions are universally used in image/video processing applications wherever there is a need to convert one image size to another. Applications include systems for Displays, Printing, Photography, Security, Digital TV, Video Telephony, Defense, Streaming Media, etc.
- The routines pix_expand and pix_sat respectively expand 8-bit pixels to 16-bit quantities by zero extension and saturate 16-bit signed numbers to 8-bit unsigned numbers. They can be used to prepare input and output data for other routines such as the horizontal and vertical scaling routines.
- Correlation functions are provided to enable image matching. Image matching is useful in applications such as machine vision, medical imaging, security, and defense. Two versions of correlation functions are provided: corr_3x3 implements highly optimized correlation for commonly used 3x3 pixel neighborhoods. A more general version, corr_gen, can implement correlation for user-specified pixel neighborhood dimensions within documented constraints.
- Error Diffusion with binary-valued output is useful in printing applications. The most widely used error diffusion algorithm is the Floyd-Steinberg algorithm. An optimized implementation of this algorithm is provided in the function err
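The saturation behavior described for pix_sat can be sketched in behavioral C. This is not the ImageLIB source: the function name pix_sat_c is hypothetical, and the argument order (count, input, output) is assumed by analogy with the pix_expand prototype shown elsewhere in this guide.

```c
#include <assert.h>

/* Behavioral sketch only (hypothetical name and argument order).
 * Saturates n signed 16-bit values to the unsigned 8-bit range:
 * values below 0 become 0, values above 255 become 255. */
void pix_sat_c(int n, const short *in_data, unsigned char *out_data)
{
    int j;

    assert(in_data && out_data);

    for (j = 0; j < n; j++) {
        short v = in_data[j];
        if (v < 0)
            out_data[j] = 0;
        else if (v > 255)
            out_data[j] = 255;
        else
            out_data[j] = (unsigned char)v;
    }
}
```

Saturation matters after filtering or scaling, where intermediate 16-bit results can overshoot the 8-bit pixel range. Clamping avoids the wrap-around that a plain cast to unsigned char would produce.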
135. tion (Literature number SPRA703).

6.6 ImageLIB Library of Optimized Kernels

ImageLIB is an optimized Image/Video Processing Functions Library for C programmers on TMS320C6000 DSPs. It includes many C-callable, assembly-optimized, general-purpose image/video processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical.

By using the routines provided in ImageLIB, an application can achieve execution speeds that are considerably faster than equivalent code written in standard ANSI C language. In addition, by providing ready-to-use DSP functions, ImageLIB can significantly shorten image/video processing application development time.

ImageLIB contains highly optimized TMS320C62x DSP code for the functions listed below. These functions may be used along with the Image Data Manager and the software architecture described in Chapter 4 to quickly prototype high-performance image/video processing algorithms. ImageLIB kernels are also used in the various applications provided in the IDK, such as JPEG Encode, JPEG Decode, and H.263 Decode.

Table 6-5. ImageLIB Kernels

  Function      Description
  boundary      Boundary Structural Operator
  corr_3x3      3x3 Correlation with Rounding
  corr_gen      Generalized Correlation
  dilate_bin    3x3 Binary Dilation
  erode_bin     3x3
136. tions

    /* Output Stream (o_dstr):                                */
    /*   Start address:    ext_horz_start_ev                  */
    /*   Size:             external size / 2                  */
    /*   Internal address: ptr_wave_horz_start                */
    /*   Size:             4 x num_lines x cols               */
    /*   Quantum:          2 x cols                           */
    /*   Multiple:         num_lines                          */
    /*   Stride:           4 x cols                           */
    /*   Window size:      1 (Double buffering)               */
    /*   Direction:        DSTR_OUTPUT                        */
    err_code = dstr_init(&o_dstr, (void *) ext_horz_start_ev, (ext_size >> 1),
                         (void *) ptr_wave_horz_start, (4 * cols * num_lines),
                         (cols * 2), num_lines, (cols * 4), 1, DSTR_OUTPUT);
    if (err_code)
    {
        fprintf(stderr, "error initializing output stream horizontal wavelet\n");
        printf("err_code %d\n", err_code);
        exit(4);
    }

    /* Begin horizontal wavelet transform. When using Image Data Manager, */
    /* the first call to put merely returns the first available buffer to */
    /* write to. Here dstr_put_2D and dstr_get_2D are used to obtain the  */
    /* next available output/input buffers. For each input buffer, pixel  */
    /* expand is performed by issuing 1 call to process cols * num_lines  */
    /* pixels, while the horizontal wavelet is called num_lines times,    */
    /* incrementing the input and output pointers by cols after each      */
    /* iteration.                                                         */
    for (i = 0; i < rows / num_lines; i++)
    {
        /* Get output buffer to write to by calling put_2D routine */
        out_data = (short *) dstr_put_2D(&o_dstr);

        /* If it is the first scale of analysis, obtain input pointer to array
137. to write the decoded picture in the appropriate location of the frame buffer; the entire frame buffer is initialized with zeros by the system at the start of the application.
- PAL Mode Display: Decoded output picture resolution is 384x288. This data is written in the central part of the lower-right corner of the 800x600 region in the frame buffer. The uncompressed pass-through image is written in the central part of the upper-left corner of the same 800x600 region of the frame buffer. The application only has to write the decoded picture in the appropriate location of the frame buffer; the entire frame buffer is initialized with zeros by the system at the start of the application. Display rate of 60fps is achieved by repeating display of any given frame from the display buffer as suitable.

Upon completion of processing, the channel task signals the I/O task. The I/O task calls the display driver using the VCAP_toggleBuffs function with argument 0.

5.1.3 eXpressDSP APIs for JPEG Loop Back Demonstration

See Sections 6.2.3 and 6.3.3 for JPEG Encoder and Decoder eXpressDSP APIs, respectively. Also see Appendix E for eXpressDSP APIs of other functions used in this demonstration.

5.2 H.263 Multichannel Decoder Demonstration

This demonstration showcases C6000 DSP capability for multichannel H.263 decode. Pre-compressed bit streams are
138. truct Imedian_3x3_Fxns {
        IALG_Fxns ialg;    /* Imedian_3x3 extends IALG */
        XDAS_Bool (*control)(Imedian_3x3_Handle handle, Imedian_3x3_Cmd cmd, Imedian_3x3_Status *status);
        XDAS_Int32 (*apply)(Imedian_3x3_Handle handle, XDAS_Int8 *in, XDAS_Int8 *out);
    } Imedian_3x3_Fxns;
    #endif /* Imedian_3x3_ */

E.5 eXpressDSP API for Wavelet Transform

The eXpressDSP API for the Wavelet Transform used in the Wavelet Transform demonstration is:

    /* IWavelet Interface Header */
    #ifndef IWavelet_
    #define IWavelet_
    #include <std.h>
    #include <xdas.h>
    #include <ialg.h>

    typedef enum img_type { FLDS, PROG } IMG_TYPE;

    /* IWavelet_Handle                                              */
    /* This handle is used to reference all Wavelet instance        */
    /* objects.                                                     */
    typedef struct IWavelet_Obj *IWavelet_Handle;

    /* IWavelet_Obj                                                 */
    /* This structure must be the first field of all Wavelet        */
    /* instance objects.                                            */
    typedef struct IWavelet_Obj {
        struct IWavelet_Fxns *fxns;
    } IWavelet_Obj;

    /* IWavelet_Status                                              */
    /* Status structure defines the parameters that can be changed  */
    /* or read during real-time operation of the algorithm.         */
    typedef struct IWavelet_Status {
        Int size;    /* must be first field of all status structures */
        int img_cols;
        int img_rows;
        short *qmf_ext;
        short *mqmf_ext;
        int scale;
        IMG_TYPE img_val;
    } IWavelet_Status;

    /* IWavelet
139. tus of an algorithm instance in the channel. Internally, CM calls that algorithm's algControl function.

CM_Control: Executes CM control function.
Prototype: UINT32 CM_Control(CM_CTRL_ID Id)
Arguments:
  CM_CTRL_ID id   Control ID, may be one of the following:
                  - CM_RESET
                  - CM_GET_CHA_INFO
                  - CM_GET_ALG_INFO
                  - CM_SET_INTERNAL_HEAP
                  - CM_SET_EXTERNAL_HEAP
Additional arguments, depending on control ID:
  none                                      CM_RESET
  CHAN_OBJ ChaPtr, CM_CHA_INFO StatsPtr     CM_GET_CHA_INFO
  HANDLE AlgPtr, CM_ALG_INFO StatsPtr       CM_GET_ALG_INFO
  Int HeapID                                CM_SET_EXTERNAL_HEAP
  int HeapID                                CM_SET_INTERNAL_HEAP
Return Value: UINT32. Return value depends on control ID.
Description: Executes a CM control function.

Chapter 4 Software Architecture: Algorithms Creation

This chapter describes algorithm creation in the software architecture.

Topic
4.1 Overview
4.2 eXpressDSP API Wrapper
4.3 Algorithm
4.4 Image Processing Functions
4.5 ImageLIB or Custom Kernels
4.6 Image Data Manager

4.1 Overview

ImageLIB functions-based standard algorithms may be created using the software architecture shown in Figure 4-1.

Figure 4-1. Software Architecture for ImageLIB Functions Based Standard Algorithms (layers: eXpressDSP API wrapper, Algorithm, Image proc
140. uch as MPEG Video Encode or H.26x Encode. Video encoding is useful in video-on-demand systems, streaming media systems, video telephony, etc. Motion estimation is typically one of the most compute-intensive operations in video encoding systems, and the high performance enabled by the functions provided can enable significant improvements in such systems.
- Wavelet processing is finding increasing use in emerging standards such as JPEG2000 and MPEG-4, where it is typically used to provide highly efficient Still Picture Compression. Various proprietary image compression systems are also Wavelets-based. Included in this release are utilities wave_horz and wave_vert for computing horizontal and vertical wavelet transforms. Together they can be used to compute 2-D wavelet transforms for image data. The routines are flexible enough, within documented constraints, to be able to accommodate a wide range of specific wavelets and image dimensions.
- Horizontal and Vertical Scaling functions scale_horz and scale_vert, respectively, are provided. These functions implement Polyphase FIR Filtering for horizontal and vertical re-sizing of images. The functions are flexible enough, within documented constraints, to be able to accommodate a wide range of image dimensions, scale factors, and number of filter taps. These functions may be used in concert to implement 2-D image re-sizing, or individually for 1-D image re-sizing, depending on the applica
141. ve image transmission coding is not supported.
- Only non-interleaved data is supported.
- Following data forms supported: 4:2:0, 4:1:1, 4:2:2, 4:4:4.
- 8 bits/component pixel only supported.
- Image component dimensions (rows, columns) for every component must be multiples of 8.
- Decoder only supports bit stream structure identical to one created by the encoder.

6.3.3 JPEG Decoder API

The eXpressDSP API Wrapper is derived from template material provided in the algorithm standard documentation. Knowledge of the algorithm standard is essential to understand the eXpressDSP API wrapper. See the algorithm standard documentation for details on the algorithm standard. Also see Appendix E for an overview of eXpressDSP APIs.

The eXpressDSP API for the JPEG Decoder is:

    /* IJPEGDEC Interface Header */
    #ifndef IJPEGDEC_
    #define IJPEGDEC_
    #include <xdas.h>
    #include <ialg.h>
    #include <ijpeg.h>

    /* IJPEGDEC_Handle                                              */
    /* This handle is used to reference all JPEG_DEC instance       */
    /* objects.                                                     */
    typedef struct IJPEGDEC_Obj *IJPEGDEC_Handle;

    /* IJPEGDEC_Obj                                                 */
    /* This structure must be the first field of all JPEG_DEC       */
    /* instance objects.                                            */
    typedef struct IJPEGDEC_Obj {
        struct IJPEGDEC_Fxns *fxns;
    } IJPEGDEC_Obj;

    /* IJPEGDEC_Params                                              */
    /* This structure defines the creation parameters for all       */
    /* JPEG_DEC obj
142. veloped as a platform for development and demonstration of image/video processing applications on TMS320C6000 DSPs. The IDK is based on the floating-point C6711 DSP and may also be useful to developers using this platform to develop other algorithms for image/video/graphics processing.

How to Use This Manual

This document contains the following chapters:
- Chapter 1 Introduction provides information about the function and process of the Imaging Developer's Kit (IDK).
- Chapter 2 Hardware Architecture describes the IDK hardware architecture.
- Chapter 3 Software Architecture: Applications Framework describes the multiple software architecture levels of the IDK.
- Chapter 4 Software Architecture: Algorithms Creation describes algorithm creation in the software architecture.
- Chapter 5 Demonstration Scenarios describes the demonstration scenarios currently included in the IDK.
- Chapter 6 C6000 DSP Image/Video Processing Applications describes C6000 DSPs used in image/video processing applications.
- Chapter 7 Testing and Compliance describes how the initial versions of the IDK meet the testing and compliance requirements.
- Appendix A FPGA Interfaces describes the FPGA interfaces to the DSP EMIF through an asynchronous SRAM interface.
- Appendix B Scaling Filters Algorithm describes the scaling filters algorithm.

Related Documentation From Texas Instruments

- Appendix
143. without any changes. In general, algorithms must meet the following requirements in order to work with the Channel Manager:
- The algorithm works on the C6711 DSK.
- The algorithm is eXpressDSP-compliant, i.e., it must implement the IALG interface and observe all rules required by the eXpressDSP Algorithm Standard.
- The algorithm provides the Channel Manager with a function pointer that points to its processing function, which is in the form:

    void XXXApply(IALG_Handle handle, void *in, void *out)

3.4 Channel Manager Object Types

There are three basic object types in the Channel Manager: the algorithm object (ALG_OBJ), the instance object (INST_OBJ), and the channel object (CHAN_OBJ).

The ALG_OBJ object inherits the IALG interface. It has a process method and other information that the Channel Manager needs to create an algorithm instance. The definition of ALG_OBJ is shown below:

    typedef struct {
        char   *Name;        /* Name of the algorithm */
        void   *algFxns;     /* XDAIS IALG v-table */
        void   *process;     /* execution method */
        void   *algParams;   /* pointer to the structure of the
                                algorithm's creation parameters */
        UINT32 InputCt;      /* number of inputs for the algorithm */
        UINT32 OutputCt;     /* number of outputs */
        UINT32 ContextSz;    /* total persistent data size */
        UINT32 TableSz;      /* total constant table size */
        UINT32 InstCt;       /* number of instances currently
                                running in the system */
        Void   *TableAddr;   /* global table ad
144. y walked through by both the FPGA and the application in a circular fashion. See Figure 2-3 for an explanation of the capture buffer management.

Figure 2-3. Capture Buffer Management (recoverable labels: Application owns / FPGA owns pointers over three buffers)
  Before VSYNC:                                   Buffer A (t2 input), Buffer B (t1 input), Buffer C (t0 input)
  On VSYNC falling, if FLIP_PAGE requested:       Buffer A (t2 input), Buffer C (t3 input), Buffer B (t1 input)
  On VSYNC falling, else:                         Buffer A (t2 input), Buffer B (t3 input), Buffer C (t0 input)

The FPGA directly controls all the capture data management without any DSP resource, specifically a DMA channel. The FPGA provides a capture frame interrupt to the DSP, which is used to inform the driver that a new frame is available for processing. The capture event may be mapped to one of the DSP events as shown in Table 2-2.

Table 2-2. Capture Events

  DSP Event            Mapped to System Event                           Intended Use
  EINTR n (n = 4-7)    Vertical sync falling (end of captured frame)    Interrupt to CPU (driver)

Any DSP event line not tied to a capture or display event is tri-stated such that it may be used by another daughtercard or motherboard interface.

To maintain this buffer scheme, it is necessary for the IDK driver software to inform the FPGA when the application has completed use of its buffer and that it may be returned to the pool of capture buffers which the
