Home
as a PDF
Contents
1. PASS d1 1 PASS do do d1 dO d1 PASS do PASS d1 Figure 5 Internal graph structure The hardware description is parsed into an internal graph structure where nodes represent modules and edges represent interconnections We assume that mod ules with variable behaviors have a distinguished con trol input ctr The list of possible module behaviors together with the according control code is attached to each node Fig 5 shows a part of the graph for the example structure fig 4 Graph edges are directed opposite to the data flow direction The behavior tables are shown within the nodes e g the ALU alu performs four different oper ations on its data inputs dO and di depending on the control code ctr 00 01 10 11 addition subtraction and passing d0 resp di unchanged to the output outp The multiplexer alumux only performs PASS operations on its inputs The register accu stores a new value at its input when its control input is 0 otherwise pre serves its state Modules without a control input have a behavior table consisting of a single entry Busses are represented as dummy modules only supporting a PASS operation Every bus driving module has to have a TRISTATE operation All storage modules are assumed to change their state at the same clock edge if they are enabled A restriction is that modules may have only one output port except multiport memories This restriction can be circumvented by splitting the o
2. c p where t is the module to be disabled c is the according partial control word field setting p is the necessary precondition The purpose of PRISMA is to extract the possible mi crooperations and disabling operations for all storage modules within a programmable structure described in a HDL We assume a microcontrolled structure and monophase execution Instruction encoding and bit sharing are permitted Note that it is not useful to extract the set of microinstructions instead of micro operations from a hardware structure since this would result in a very large computation time for pairwise consistency check for all extracted micro and disabling operations Compacting microoperations into a mi croinstruction can quickly be done by the RC on de mand avoiding consideration of unnecessary microin structions 3 Circuit representation The HDL read by PRISMA allows description of hard ware structures comprising the following types of generic components multiplexers ALUs decoders logic gates comparators hardwired constants busses registers me mories tristate drivers and external pins RTL netlists are described by enumeration of all modules together with their interconnections One distinguished ROM within the specification has to be marked as memory for instructions The netlist contains the complete con troller structure so that instruction conflicts are de tected by PRISMA itself 1 12 11 alumux 0
3. lt 0 XxxXxXxxxxx01xxxXXxXXXXXXXXXX empty XXXXXXXXX1XXXXXXXXXXXXXX accu gt 0 yielding two alternatives for assigning each of the two expressions to register PC All register memory assignments see section 4 are transformed into microoperations Assuming Split com putes a list 1 ct1 n ctn for the condition tree ct of an assignment einp ct toa target register t the set of microoperations mo t e_inp ci cti is constructed for this assignment Memory assignments are transformed accordingly into microoperations with a non empty address field In order to avoid unnecessary restrictions all micro operations having the same target and the same ex pression address are heuristically compacted i e if the preconditions are compatible and the instruction words differ only in one bit 6 two microoperations can be replaced by one with 6 X After compaction the extraction of microoperations is finished For the example structure PRISMA finds 4 Operations for register PC 1 operation for register SM 10 operations for register accu 5 operations for memory MAIN The output including bit range information in case of register PC is MICROOPERATIONS FOR REGISTER PC 1 Expression PC 7 0 1 7 0 Preconditions lt empty gt Code 24 50 16 12 8 4 0 XXXXXXXXXOOXXXXXXXXXXXXX 2 Expression I PC 7 0 7 0 Preconditions lt empty gt Code 24 20 16
4. real time require ments by dedicated hardware modules yet offering pro grammability Therefore ASIPs can dramatically re duce design costs This trend was recently outlined in a survey by Paulin 1 On the other hand moving pieces of functionality from hardware to software increases software development costs Traditionally software development for DSP sys tems has been done at the assembly level due to the lack of high level language compilers for each different target processor Because of the obvious drawbacks of assembly level programming high level language pro gramming is strongly desirable for ASIPs but requires compilers capable of exploiting application specific in struction sets Therefore a new class of CAD tools is currently investigated by researchers retargetable compilers RCs RCs read both a behavioral descrip tion given at a high level of abstraction C PASCAL VHDL processes and a structural hardware descrip tion given in a HDL The behavior is mapped onto the programmable structure and binary code implementing the behavior is generated Since the target structure is part of the compiler input no compiler redesign is necessary when changing the target Only the HDL description has to be adapted Therefore using RCs instead of target specific ones obviously can keep the software development costs for DSP systems low Sev eral implementations of RCs have been reported The structural HW description PR
5. registers versions are extracted that force a register to store its previous value i e PRISMA looks for cyclic paths within the structure For instance the accu can preserve its state by setting instruction bit 15 to 1 MOLOAD see fig 4 and 6 or by setting instruc tion bit 15 to 0 LOAD and bits 9 8 to 10 PASS d0 via alu Generating these additional disabling op erations can provide more alternatives for a compiler to select from during code compaction 8 Experimental results PRISMA has been implemented with about 10 000 lines of C code on a Sun SPARCstation 10 It has been applied to several target structures among them the datapaths mentioned in 3 diff and 5 mssv and partial datapaths of the TMS320 DSP family 7 and of the ADSP2101 signal processor 8 Since no informa tion about the internal controller structure is provided for TMS and ADSP an arbitrary VLIW controller has been added to these structures in order to permit ex traction structure modules OPs CPU sec 5 0 5 Table 1 Experimental results The results shown in table 1 indicate that instruction set extraction is feasible even for larger realistic struc tures Runtime grows rapidly with the number of ex tracted microoperations due to the compaction phase Time for compaction could be reduced by pre sorting the extracted operations into several groups according to their expressions and compaction only within each gr
6. represents the term REG1 SHIFTL 2 ACCU The address a is an expression as well The control code c is a string over 0 1 X C where X denotes a don t care value and C denotes a bit be longing to an immediate constant The length of e is equal to the instruction word length Conditions are represented by the following data structure REG1 0011 and REG2 0110 Figure 3 Condition tree or Def A condition tree is either empty or a triple ct r ctand Clor where r is the root clang 18 a condition tree ctor 18 a condition tree The root r of a condition tree is a single equation or inequation A condition tree ct is true if one of the following conditions holds a ct empty b r true AND ctang true c ctor true Thus for example the condition tree shown in fig 3 represents the condition REG1 0011 AND REG2 0110 OR ACCU gt 0 The equations and inequations within a condition tree refer to storage contents module ports bitstrings or constants A precondition p of a microoperation mo is a condition tree in which no in equation refers to the instruction memory since all conditions concerning the instruction memory are already encoded in the control code c of mo The necessity of preconditions can be seen in the structure in fig 4 which will serve as a running example throughout this paper In this 8 bit processor the binary instructions are lo cated in the ins
7. 12 8 4 0 XXXXXXXxxX01xxxxxCCCCCCCC 3 Expression I PC 7 0 7 0 Preconditions accu 7 0 gt 0 Code 24 20 16 12 8 4 0 XXXXXXXxXX1xxxxxxCCCCCCCC 4 Expression PC 7 0 1 7 0 Preconditions accu 7 0 lt 0 Code 24 20 16 12 8 4 0 1 1 XXXXXXXXX XXXXXXXXXXXXXX There are two alternative operations 2 and 3 for load ing the program counter immediately from I Opera tion 3 only needs bit 14 of the instruction word to e set to 1 but requires accu to be greater zero from a previous instruction cycle A RC might regard this as a conditional jump operation whereas operation 2 could serve as an unconditional jump EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 6 When subranges of the instruction word occur in ex pressions the according bits in the control code are symbolically set to C to indicate that the instruction provides an immediate constant at these bits and thus a restriction exists for setting these bits in microoper ations to be executed concurrently 7 Storage disabling When several microoperations are packed into one mi croinstruction unused storage modules must remain unchanged Therefore PRISMA also extracts disabling operations for each register or memory port This task is very similar to extraction of microoperations All expression precondition pairs are computed that guar antee NOLOAD operations for each storage Additionally in case of
8. EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 1 Instruction Set Extraction From Programmable Structures Rainer Leupers Peter Marwedel University of Dortmund Dept of Computer Science XII 44221 Dortmund Germany Abstract Due to the demand for more design flexibility and design reuse ASIPs have emerged as a new im portant design style in the area of DSP systems In order to obtain efficient hardware software partition ings within ASIP based systems the designer has to be supported by CAD tools that allow frequent re mapping of algorithms onto variable programmable target struc tures This leads to a new class of design tools re targetable compilers Considering existing retargetable compilers based on pattern matching automatic instruc tion set extraction is identified as a profitable frontend for those compilers This paper presents concepts and an implementation of an instruction set extractor 1 Introduction Recent application areas for VLSI circuits in partic ular real time DSP systems have led to a new de sign style application specific instruction set proces sors ASIPs ASIPs offer a compromise between ASIC implementations and programmable off the shelf pro cessors Increasing circuit performance by technologi cal advances now enables system designers to trade off between speed and programmability ASIC develop ment is no longer obligatory on high throughput sys tem design since ASIPs can fulfill the
9. ExpandConditionTree is called for each condition tree produced during expression extraction Therefore fi nally all conditions only refer to the distinguished in struction memory or to other storages This allows splitting the conditions into control word field settings and preconditions 6 Microoperation generation In order to transform the previously extracted regis ter memory assignments into microoperations as de fined in section 2 the subroutine Split condition tree list of pairs c ct is used where is a partially initialised instruction word and ct is a condition tree only comprising pre conditions Split performs several subtasks a Check whether a condition refers to the instruction memory or to another storage If the instruction mem ory is referenced set the according bits in c and elim inate the condition from the tree Otherwise keep this condition as a precondition b Check for instruction conflicts of conditions sharing subranges of the instruction word c Check whether the remaining precondition tree is consistent 1 e all conditions to be fulfilled concurrently are pairwise compatible Calling Split for the previous example trees results in Since occurence of cyclic inconsistencies in practical struc tures is very unlikely only pairwise compatibility is checked o Expression partial instruction precondition XK XXKKOOXKRXXKNREXKRE empty XXXXXXXXX1XXXXXXXXXXXXXX accu
10. ISMA behavioral retargetable description compiler binary code Figure 1 Functionality of PRISMA approach taken by Mavaddat 2 is focussed on local mi crocode generation i e mapping pieces of straight line code onto a given datapath The datapath is regarded as a formal language and binary code generation is done by parsing Langevin and Cerny 3 propose an other theoretic method for local microcode generation which uses a state machine model of the target data path Binary code is derived by traversing state se quences Both methods do not support compilation of control structures and need large amounts of computation time The compilers MSSQ and MSSV by Marwedel and No wak 4 5 are based on pattern matching between reg ister transfers and hardware structures Compilation of data flow as well as control flow structures is sup ported Since the complete controller specification is part of the structural hardware input description nei ther the microinstruction format nor instruction re strictions due to encoding or sharing have to be for mulated by the user but are detected by the compiler itself Microcode compaction and maintenance of tem porary cells is restricted to basic blocks resulting in suboptimal code but pattern matching as a paradigm for code generation guarantees fast compilation times Thus the designer is allowed to play around with the target architecture in order to achieve
11. ady detected in this phase During the steps of binary code generation allocation and code compaction the com piler can rely on this pattern database rather than on the RTL structure permitting a higher level of abstrac tion Experiments with realistic structures proved fea sibility of instruction set extraction Future work will include extending the structure domain towards mod ules with multi cycle operations References 1 P G Paulin DSP Design Tool Requirements for the Nineties An Industrial Perspective 6th International High Level Synthesis Workshop 1992 2 M Mahmood F Mavaddat M I Elmasry Experiments with an Efficient Heuristic Algorithm for Local Mi crocode Generation Proc International Conference on Computer Design ICCD 1990 pp 319 323 3 M Langevin E Cerny An Automata Theoretic Ap proach to Local Microcode Generation Proc Euro pean Design and Test Conference EDAC 1993 pp 94 98 4 L Nowak P Marwedel Verification of Hardware Descriptions by Retargetable Code Generation Proc 26th Design Automation Conference 1989 pp 441 447 5 P Marwedel Tree Based Mapping of Algorithms to Predefined Structures Proc International Conference on Computer Aided Design ICCAD 1993 pp 586 593 6 C Liem T C May P G Paulin Instruction Set Match ing and Selection for DSP and ASIP Code Generation to appear in Proc European Design and Test Confer ence EDAC 1994 7 TMS320C2x User s Guid
12. an efficient hard ware software partitioning within an ASIP based sys tem The CodeSyn compiler by Paulin 6 uses pattern matching as well and is tailored towards DSP applica tions Pattern matching is done between data control flow patterns and instructions Therefore the target hardware description is not pure structural but is based on an instruction set specification This includes the instruction behaviors and formats as well as the inter instruction restrictions which all have to be manually specified by the user It is stated that full retargetabil ity of CodeSyn required automatic instruction set cap ture i e the compiler should be able to extract the possible instructions of a programmable target struc ture automatically EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 2 Figure 2 Expression tree The MSSQ compiler 4 generates binary instructions on demand for each statement within the input be havioral description which results in unnecessary rep etitions of operator allocation and data routing within the target structure Only branch instructions are al located in advance during a preallocation phase Thus automatic instruction set extraction from a tar get hardware description is well motivated In this pa per we present the tool PRISMA Predetermination of Instruction Sets of Microprogrammed ASIPs that performs this task PRISMA reads a flat RTL netlist of a programmable hardware structure
13. and extracts the set of possible microoperations the hardware can per form Additionally binary code for disabling unused storage devices is produced This information can di rectly be fed into a RC fig 1 Note that instruction set extraction has to be performed only once for each specific target structure The rest of this paper is organised as follows Sections 2 and 3 describe the tool functionality and the basic in ternal data structures The main extraction steps are presented in sections 4 to 7 Finally the practical appli cability of PRISMA is discussed using several example datapaths and some conclusions are drawn 2 Basic definitions Instruction set extraction from a programmable hard ware structure yields a list of microoperations the struc ture can perform In order to explain the functionality we informally define some basic notations Def A microoperation is a tuple mo t a c p where t is a target module register or memory port e is an expression which is assigned to t a is an address in case that t is a memory otherwise empty c is a partial control word field setting necessary to execute the assignment p is a precondition that has to be fulfilled to execute the assignment An expression e is a tree whose inner nodes represent operators arithmetic logic concatenation compari son and whose leaves represent storage modules con stants or external inputs For instance the tree in fig 2
14. e Rev B Texas Instruments 1990 8 ADSP2101 2102 User s Manual Architecture Analog Devices 2nd Edition 1991
15. indEz pressions the function Allocate recursively backtracks data flow through the graph structure until leaf mod ules are reached In general constants are allocated at storages HW constants or decoders Semantical infor mation about operations is taken into account as well e g an ALU can among other alternatives produce a constant c by computing c 0 or ce AND 11 11 Note that perfect constant allocation is practically impos sible due to the large number of alternatives that may arise consider for instance the number of versions to generate a certain 32 bit constant by addition Thus there cannot exist a tool for extracting all possible mi crooperations from any arbitrary target structure Ex panding a condition tree resulting from expression ex traction is done by the following procedure PROCEDURE ExpandConditionTree condition tree ct BEGIN FOR EACH node module port bitstring IN ct DO m module whose output is connected to module port ct Allocate m bitstring IF ct IMPOSSIBLE THEN REPLACE node BY ct FI OD END The structure of condition trees can be exploited e g if expansion of a node fails all nodes in its AND subtree can be skipped Calling ExpandCondition Tree for the two trees in the previous example see section 4 one obtains pe net SRT POT TS 1 T AND accu lt 0 TPC 14 13 01 OR I PC 14 13 1X AND accu gt 0 I PC 7 0
16. oup but this is only an implementation issue The number of extracted microoperations itself is mainly in fluenced by features of the structural model the num ber of RT level modules the number of different be haviors per module and the chaining degree within the structure The number of resulting microoperations is only reduced by instruction word restrictions such as bit sharing see e g alumux and alu in fig 4 9 Conclusions With ASIPs emerging as a new important means of DSP system implementation CAD tool support be comes necessary for software development for these sys tems Retargetable compilers that map behavioral de scriptions onto specific structures by exploiting com plex instruction sets with high degree of potential par allelism have been proposed to solve this problem Pat tern matching has been identified as one key technique in this context 5 6 Matching is performed between data control flow patterns in the behavioral descrip tion and RT patterns in the hardware structure The latter can be extracted from the hardware descriptions automatically resulting in full compiler retargetabil ity A prototype tool for instruction set extraction was presented in this paper which provides a convenient interface between compiler and target structure since the possible patterns of a given structure can be com puted in advance and be stored in a database Restric tions arising from the controller structure are alre
17. right neutral elements of operations like 0 for addition and disjunction or 1 for multiplication and conjunction 2The notation lt m gt lt p gt denotes port lt p gt of module lt m gt Thus an ALU performing subtraction on two inputs a and b implicitly has a virtual PASS operation presum ing the constant 0 can be allocated at input b Ex ploiting virtual operations leads to an extended set of microoperations which in turn provide higher degrees of freedom for a compiler Using FindEzpressions as a subroutine all expressions assignable to any storage module can be extracted in the following manner PROCEDURE ExtractAllExpressions target storage module t BEGIN m inp module connected to the input port of t L inp FindExpressions m _inp FOR EACH e inp ct_inp IN L inp DO IF t is an addressable memory THEN m adr module connected to the address port of t L_adr FindExpressions m _adr FOR EACH e adr ct_adr IN L_adr DO ct empty condition tree AND Insert ct control code for LOAD operation of t AND Insert ct ctinp AND Insert ct ct_adr ATTACH e inp e adr ct TO t OD ELSE m is a single register ct empty condition tree AND Insert ct control code for LOAD operation of t AND Insert ct ct_inp ATTACH e inp ct TO t FI OD END Procedure ErtractAllErpressions is called for each stor age module As a result of this phase each register has been attached a li
18. rnal input is reached The expression delivered by a leaf module is simply its contents resp value For non leaf modules FindErpressions backtracks the data flow and possible operations within the structure while building up the according condition trees on the fly Necessary conditions are AND inserted whereas alternatives are OR inserted in the current condition tree When Find Expressions is called for a bus it ensures that all bus drivers unused for a certain expression are in TRISTATE mode in order to avoid bus conflicts Recursion also must terminate when cyclic paths are detected e g due to bus exchanges We consider a call to FindEzpressions for module pcmux which determines the next state of the program counter PC The pemux has the behavior table 00 PASS do 01 PASS dl 1X IF cnd THEN PASS d1 ELSE PASS d0 i e control codes 00 resp 01 pass the inputs dO resp di to the output and control codes 10 and 11 pass one input depending on the condition input cnd FindEr pressions pemux calls itself recursively for the prede cessor modules peincr and I and yields the following results No Expression vondition pemux ctr 700 OR pcmux ctr 1X AND pcmux cnd 0 pemux ctr 01 OR pcmux ctr 1X AND pcemux cnd 1 FindErpressions also takes into account semantical in formation about the operations performed by the dif ferent modules This is e g done by exploiting left and
19. st of expression condition tree pairs that represent possible assignments to it For memory modules additionally all address expressions have been generated together with the according condition trees A triple e_inp e_adr ct attached to memory M means M e_adr e_inp if ct is true All single conditions are equations of the type mod ule input port bitstring In order to determine whether a condition can be fulfilled and thus results in a valid microoperation the bitstrings have to be al located This task is subject of the following section 5 Constant allocation The next step is to expand the condition trees found in the expression extraction phase A condition tree is called expanded when all its single conditions only refer to storage contents instead of module ports For instance the condition shifter ctr 10 can be ex panded to SM 10 in the example structure Con dition tree expansion requires constant allocation at module ports This is the purpose of the function Allocate module x bitstring condition tree computing a condition tree that forces a module to produce a given bitstring at its output port If the module cannot provide the required constant Allocate 3 Actually PRISMA also considers port subranges during ex traction This has been left out in the paper for sake of simplicity EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 5 returns an IMPOSSIBLE value Similar to F
20. truction memory I whose width is 24 bits All modules are directly controlled by the instruc tion word except the shifter which is residually controlled by the contents of the 2 bit shift mode reg ister SM Depending on the SM contents 00 01 10 11 the shifter performs a left shift of 0 2 4 or 5 bits Thus e g moving data from the accumulator accu to the main memory MAIN with a left shift of 4 re quires SM 10 besides the necessary control codes SM 10 has to be ensured in a machine cycle before the data transfer is executed therefore we call this a precondition Preconditions always refer to states be ing reached in an earlier machine cycle Two functions for inserting an additional in equation q into a condition tree are defined 1 OR Insert inserts a new in equation q into a condition tree ct so that q is an alternative to fulfill ct 2 AND Insert inserts a new in equation q into a condition tree ct so that q is necessary in any case to fulfill ct Both functions can be extended for inserting a complete condition tree ct into another tree ct In addition to microoperations we define special op erations for disabling storage modules i e operations 17 k 1 denotes an instruction word subrange EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 3 pins 7 0 Figure 4 Example structure that preserve the current state of a storage module Def A disabling operation is a triple do t
21. utput into several subranges The internal graph structure is similar to the one pro posed in 4 and allows quickly backtracking data sources within the structure This is essential for the tasks de scribed in the following two sections EURO DAC 1994 1994 ACM 0 89791 685 9 94 0011 1 50 4 4 Extraction of expressions As defined in section 2 every microoperation assigns an expression e to a target storage module t Each as signment in general requires several conditions to be fulfilled e g see fig 4 loading memory MAIN at a certain address with the accumulator contents shifted left by 4 requires shifter ctr 10 perform left shift of 4 accudrv ctr 1 accudrv drives the databus MAIN ctr 00 MAIN reads from databus idrv ctr 0 idrv is in TRISTATE mode Since these conditions can be represented by a condi tion tree we can think of every expression e to have an associated condition tree cte In order to extract all microoperations systematically at first all expres sions with the associated condition trees for each stor age module input are extracted This is the purpose of the function FindExpressions module gt list of pairs e cte yielding a list of expressions e a module can deliver at its output port assuming condition ct is fulfilled Due to the underlying graph structure FindErpressions can be implemented recursively Recursion stops when a leaf module i e storage HW constant or exte
Download Pdf Manuals
Related Search
Related Contents
Hardware User Manual - Falcon Technical Ltd User Manual - BME Shared Lab Resource Tyco Fire Protection Products - Fire Detection ALFRA Akku-Compact Flex™ Handstanze Elektrohydraulische Owner`s Manual Copyright © All rights reserved.
Failed to retrieve file