Home

Precise Register Allocation for Irregular Architectures

1. 1 Introduction A compiler initially generates code assuming an infinite number of symbolic registers In a later compiler phase the register allocator maps these symbolic registers to a ma chine s real registers Because the number of real regis ters is small not all symbolic registers may get real regis ter assignments and those symbolic registers must then re side in memory Spill code is generated to move those sym bolic registers to and from memory The register allocation problem is to create a register assignment that minimizes the amount of spill code Global register allocation analyzes the entire function at once and attempts to produce an as signment for all symbolic registers in the function Since global register allocation is NP complete 8 most alloca tors use heuristic techniques to produce a solution Traditional allocators perform global register allocation using a graph coloring approach originally developed by Chaitin et al l This approach builds an interfer ence graph whose vertices represent symbolic registers and edges connect vertices for symbolic registers that cannot be assigned the same real register The graph is then colored with the same number of colors as there are real registers For vertices that did not receive a color the corresponding symbolic register is spilled The allocator relies on heuris tics to decide on the order of symbolic registers to spill and the placement of the spill c
2. The paper discussed several irregular architecture fea tures but there are others that remain to be modeled For example instruction selection can be integrated into regis ter allocation to further reduce spill code The IP allocator can be extended to use other instructions such as the x86 s XCHG which exchanges the contents of two registers Fi nally the current paper does not address the issue of op timally inserting copy instructions which are important in processors with combined source destination operand spec ifiers Work is under way to port the IP allocator to a commer cial x86 compiler The resulting setup will allow a compari son between the IP allocator and an aggressive industrial al locator The result will be reported in a future paper Acknowledgements This research was sponsored by Microsoft Research and by the University of California under the UC MICRO pro gram References 1 G Chaitin M Auslander A Chandra J Cocke M Hopkins and P Markstein Register allocation via coloring Computer Languages 6 41 57 1981 2 K Dixit New CPU benchmark suites from SPEC In Digest of Papers Compcon Spring 1992 pages 305 310 IEEE 1992 3 D Goodwin and K Wilken Optimal and near optimal global register allocation using 0 1 integer programming Software Practice and Experience 26 8 929 965 1996 4 ILOG Inc CPLEX Division CPLEX 6 0 Documentation Supplement 1998 5 Intel Penti
3. A register architecture can become irregular due to ir regularities in the structure of the register file These irreg ularities limit the generality of the registers Examples of these irregularities include the partitioning of the register file based on data type and the sharing of bit fields among regis ters In partitioning based on data type registers are grouped into sets with each set capable of only holding data of cer tain types For example the Motorola 680x0 processor has 16 general purpose registers which are divided into sets of 8 data and 8 address registers 6 Although all are 32 bit registers values in the data registers are not considered memory addresses and cannot serve as the base address in effective address calculations In bit field sharing the register architecture specifies that certain registers must have certain bits in common mak ing the registers physically overlap Such registers are no longer independent Storing a value in one precludes storing a value in the overlapping registers An example of bit field sharing is found in the x86 The 8 bit AL register shares the same bits as the least significant half of the 16 bit AX regis ter which in turn shares the same bits as the least significant half of the 32 bit EAX register 5 3 2 Irregularities in Register Usage The second cause of irregularities in the register architec ture are irregularities in register usage found in the instru
4. AL AX EAX register to these instructions when ever these instructions appear with an immediate operand The shorter instruction is modeled in the IP allocator by a reduced cost on the decision variables corresponding to the allocation of symbolic registers to the AL AX EAX register Specifically for an instruction that uses symbolic register A if the instruction has a shorter size when the EAX register is allocated to A then the decision variable that describes the allocation of EAX to A would have a lower memory cost instruction size a 4 M where M is the memory cost of allocating A to any real reg ister besides EAX 5 4 2 Long Addressing Mode Specifications The x86 uses an additional one or two bytes after the instruc tion opcode to specify the addressing mode and the registers used in the effective address calculation Six of the eight x86 general purpose registers can participate in these addressing modes The other two ESP and EBP can participate with the following penalties The ESP register which is the architectural stack pointer register requires two bytes in the addressing mode spec ification when used as a base address register Specify ing ESP requires two bytes and dsp8 ESP requires 2 bytes plus an additional byte for the displacement dsp8 In contrast using any other general purpose register as the base register requires only one byte Hence there is a one byte penalty for using ESP as the base re
5. The coalescing of home memory locations has several benefits First the coalescing allows the removal of the load instruction that originally defines the symbolic register Be cause the symbolic register now has the same home mem ory location as the predefined memory value the symbolic register is considered to exist initially in memory and no instruction is required to define it Second since the sym bolic register and the predefined memory value share the same memory location the program s runtime memory re quirement is reduced Third the IP formulation is simpli fied For a predefined memory symbolic register the sym bolic register network from the original define instruction to the symbolic register s first use can be removed as shown in Fig 6 In this region the symbolic register s value exists only in memory making the symbolic register network un necessary The symbolic register network begins just before the symbolic register s first use where the symbolic register may be loaded into a real register A symbolic register becomes a predefined memory sym bolic register through association with a predefined memory value A symbolic register can be associated with a prede fined memory value if 1 the symbolic register is defined by a load instruction that loads the predefined memory value 2 the live ranges of the symbolic register and the prede fined memory value do not interfere and 3 the predefined memory va
6. of the con straints found in the RISC model The simplification is due to the fewer number of real registers available for register allocation the x86 has 6 whereas the RISC has 24 Since IP solution time is roughly O n relative to the number of constraints a four times decrease in the number of con straints translates to a solution time speedup of 32 Further more a newer version of the CPLEX solver is used yield ing a speedup of about 2 and a faster solver machine is used giving a speedup of about 3 Total IP solution time speedup is roughly 192 Furthermore the irregular cost nature of the x86 IP model helps to reduce solver time Irregular costs break up the symmetry of the integer program decreasing the time spent by the solver in searching through equivalent solutions 7 Conclusion and Future Work The paper described an IP based register allocator for ir regular architectures Previous work showed that an IP al locator is feasible for a RISC architecture where registers are uniform and can be treated identically The work pre sented here shows the IP approach for an irregular architec ture where registers have distinctive features and should be treated separately Furthermore the IP approach is shown to be particularly well suited to irregular architectures because individual costs can be modeled precisely and accurately in the IP model These costs on the other hand make alloca tion using heuristics difficult
7. register can be a mem ber of more than one set For example EAX is in the sets EAX AX AL and EAX AX AH EAX is in the first set because it shares the least significant 8 bits with AX and AL and it is in the second set because it shares the next least significant 8 bits with AX and AH To model bit field sharing between registers a general ized single symbolic constraint 3 is generated after each define that involves any register in a register set A single symbolic constraint limits to one the number of symbolic registers that a real register can hold A generalized single symbolic constraint encompasses all registers in a register set and limits the total number of symbolic registers that all registers in the set can hold together to one It is possible that not all registers in a set are needed for register allocation be cause there are no live symbolic registers of the correspond ing size At these program locations the unneeded registers in the set are excluded from the generalized single symbolic constraint As an example consider symbolic registers S1 S2 and S3 where S1 is 32 bit S2 is 16 bit and S3 is 8 bit The following is the generalized single symbolic constraint for real register EAX at a program location where S1 S2 and S3 are all live EAX AX AL zgi 855 2453 lt 1 esp Hasa Hesg lt l AX isthe 0 1 decision variable that indicates whether S1 is assigned to EAX and similarly for x4 an
8. Precise Register Allocation for Irregular Architectures Timothy Kong and Kent D Wilken Dept of Electrical amp Computer Engineering University of California Davis Davis CA 95616 5294 U S A kong wilken ece ucdavis edu Abstract This paper proposes a precise approach to register al location for irregular register architectures which is based on 0 1 integer programming IP Prior work shows that IP register allocation is feasible for RISC architectures which have uniform registers and register usage Extensions to the prior work are proposed that precisely model register irreg ularities including combined source destination specifiers memory operands and variations in the cost of register us age The x86 architecture is selected as a representative irregular register architecture for experimental study An IP register allocator is built for the x86 architecture within the Gnu C Compiler GCC and is compared experimen tally with GCC s graph coloring register allocator Exper imental results show that the IP allocator reduces regis ter allocation overhead by 61 compared with the graph coloring allocator The results also show that the x86 IP al locator is dramatically faster than the prior RISC IP alloca tor because of the smaller number of registers in the x86 ar chitecture and because of the register irregularities These results suggest that IP register allocation is well suited for irregular register architectures
9. ained through profiling The factor B gives the additional memory hierarchy delay caused by each 1 byte increase in code size at 2 and C gives a similar delay due to each 1 byte increase in data memory access at B can be obtained by profiling instruction cache misses and instruction page faults at 2 C can be obtained by profiling data cache misses and data page faults for data accesses originating from spill in structions near 7 Some decision variables do not have all three cost components and the corresponding cost term is zero in the cost model The main advantage of the model is the ability to account for diverse allocation costs easily and precisely As will be shown in Section 5 unique cycle and memory costs arising from an irregular register architecture can be calculated pre cisely using equation 1 Furthermore the factors A B and C provide a convenient way to adjust the relative effect of each cost component For example if the goal is to opti mize purely for program size the cycle and the data mem ory components of the cost can be excluded entirely from the cost model This type of optimization is useful for in stance in embedded applications where the main concern is to reduce hardware cost and a smaller program size reduces hardware cost by requiring less memory for storage 5 IP Model for Irregular Register Architec tures The IP model for register allocation in 3 can be ex tended so that register irregula
10. c tion set Regarding register usage an instruction set is regu lar if all instructions can specify any general purpose register as source and destination operands and there is no advan tage in selecting one register over another Although a reg ular instruction set is easier for the register allocator to han dle in practice sometimes an instruction set is made irregu lar to reduce instruction size and to allow for all the encoding needs of the instruction set This subsection describes these instruction set irregularities and their effects on register us age One way to reduce instruction size is to reduce the num ber of operand specifiers The reduction is often accom plished by overloading one operand specifier to specify both a source operand and a destination operand This requires the source and destination operands to be in the same real register thus restricting register usage An example of this combined source destination specifier is found in the x86 ar chitecture which uses the 2 specifier format for most of the instructions The instruction ADD EAX EBX adds the con tents of the EAX and EBX registers and places the result in EAX A second way to reduce instruction size is to specify reg ister operands implicitly rather than explicitly enumerating them in the instruction word This instruction format re stricts register usage because the operands must always be assigned to these implicit registers The x86 architectu
11. c register A for such an addressing mode the IP variable corresponding to the allocation of A to ESP is excluded from the must allocate constraint for A Thus A must reside in some real register besides ESP For the example in Fig 5 the must allocate constraint is as fol lows m 5 LoreA gt 1 r real regs r fESP The constraint forces the IP solver to put A in another real register besides ESP 5 5 Predefined Memory Symbolic Regis ters A predefined memory value is a value that exists in mem ory at function entry For symbolic registers that are de fined by the loading of a predefined memory value it may be possible to coalesce the home memory locations of the symbolic live range memX register XA network of A original define X of A can A load memX T be removed portion that A can be spill load of A A load memX deleted Aop A symbolic register X predefined memory value l l memX home location of X in memory Figure 6 Removing symbolic register net work segment for a predefined memory sym bolic register symbolic register and the predefined memory value If co alescing is performed the symbolic register becomes a pre defined memory symbolic register Fig 6 shows an example of a predefined memory value X and a predefined memory symbolic register A that is associated with X The symbolic register A shares the same memory location as X which is labeled memX
12. cator currently does not handle The solved column lists the number of function for which the IP solver is able to generate a feasible alloca tion The optimal column lists the number of functions for which the IP solver generated an optimal solution Although the IP solution is optimal relative to the model the alloca tion solution is not optimal because the present IP model does not insert copies optimally as discussed in Section 5 1 Of the functions that are attempted by the IP allocator the IP solver generated a feasible allocation for 98 1 of them The solver was able to generate optimal solutions for 97 6 of the functions it attempted within the time limit of 1024 seconds SPECint92 Functions Benchmark Total Attempted Solved Optimal compress eqntott xlisp sc espresso ccl Table 2 Number of functions solved with a solver time limit of 1024 seconds Fig 9 shows the size of the IP program against the num ber of GCC intermediate instructions Constraints growth rate is only slightly higher than linear relative to the number of intermediate instructions Fig 10 shows optimal solution time against the number of constraints The optimal solu tion time is the time the solver takes to produce an optimal 100000 10000 1000 solved optimally not solved Integer Program Constraints 1 10 100 1000 Intermediate Instructions Figure 9 Number of constraints vs number of intermediate instruc
13. ch register allocation decision is a binary decision at a specific point in the function a certain reg ister allocation action is either performed 1 or is not per formed 0 Register allocation decisions include whether a symbolic register should be defined into a specific real reg ister whether the assignment of a symbolic register to a spe cific real register should continue whether a symbolic reg ister should be stored to or loaded from memory etc The ORA analysis module produces a binary decision variable for each register allocation decision that must be made and records the decision variable and the corresponding register allocation action in the decision variable table as illustrated in Figure 1 The ORA solver module uses the information about de cision variables register allocation overheads and condi tions to construct a 0 1 integer program 7 a linear pro gram with the added requirement that each variable must be assigned an integer solution value that is either 0 or 1 Af ter constructing the 0 1 integer program an optimal solution is found using a commercial integer program solver The solver determines a value of either 0 or 1 for each decision variable so the conditions are satisfied and the total register allocation overhead is optimally reduced The ORA solver module then records in the decision variable table the solu tion value for each decision variable Finally the ORA rewrite module examines the deci
14. d x4 The first constraint applies to the register set EAX AX AL and the second applies to EAX AX AH Now assume at another program location only S1 and S3 are live The generalized single symbolic constraint becomes EAX AL gi 253 lt 1 aE EA The x4 term is missing because the AX register is not needed for register allocation 5 4 Instruction Encoding Irregularities In some architectures the instruction size can vary de pending on the register specified in the instruction For cer tain instructions using a specific register as an operand will result in smaller code size while others will cause code size to increase Furthermore not all registers may appear in all addressing modes These instruction encoding irregularities can be modeled precisely in the IP allocator leading to an allocation that takes into account these diverse effects This subsection presents cases in the x86 architecture 5 4 1 Short Opcodes with AL AX EAX Registers The x86 instruction set allows smaller code size for certain instructions if the register operand is AL AX or EAX In particular the instruction size is shorter by one byte if the instruction uses the AL AX EAX register and an immedi ate operand The shorter instruction size is available for these commonly used instructions ADC ADD AND CMP OR SUB TEST XCHG and XOR For this reason signifi cant code size reduction is possible if the register allocator assigns the
15. e Better register allocation for the x86 architecture will Presented at the 31st International Microarchitecture Conference December 1998 Copyright 1998 Association for Computing Machinery Inc i ORA Analysis ORA Solver ORA Rewrite Module Module Module H B Read i l Write Decision aj Variable Table I I 1 1 I 7 i l Write uo Read I ye 1 1 1 1 1 Figure 1 ORA top level modules and data structures potentially benefit a wide range of users The rest of this paper is organized as follows Section 2 gives a brief background on ORA Section 3 describes fea tures of an irregular register architecture and discusses their effect on register allocation Section 4 presents a cost model suitable for integer programming Section 5 presents an IP formulation for irregular register architectures Section 6 gives experimental results for the x86 architecture followed by a conclusion and discussion on future work in Section 7 2 ORA Background This section gives a brief overview of the Optimal Reg ister Allocation approach to global register allocation pro posed in 3 The ORA register allocator consists of three top level modules the analysis solver and rewrite mod ules as illustrated in Figure 1 The ORA analysis module analyzes a function to determine the points in the function where decisions must be made about various register allo cation actions Ea
16. e third condition requires that the predefined memory value is not aliased Aliasing allows the predefined mem ory value to be modified or used beyond the program loca tion where the predefined memory value is explicitly loaded Consider the following example where X is a predefined memory value residing in memory location memX A load memx call foo memxX The subroutine foo is invoked with the address of memxX Since the subroutine can potentially modify X X is aliased live range memX ce define of A A load memX E X A Aop spill store of A store A A X is modified memX last use of X memX of X spill load of A wrong value loaded A load memX last use of A A A symbolic register X predefined memory value memX home location of X in memory Figure 8 Overwriting of symbolic register by changing the predefined memory value and A cannot be assigned the same memory home location as X 6 Experimental Results An IP allocator for the x86 architecture has been built in side the Gnu C compiler 9 The x86 architecture was cho sen because it includes a large variety of register irregulari ties and because of its widespread use The integer program generated by the IP allocator is sent to a CPLEX 6 0 inte ger program solver 4 The solver runs on a HP 9000 780 workstation with a 160MHz PA 8000 processor and 256MB of main memory The SPEC92 2 integer benchmarks are used as test input
17. gister The EBP register which is the frame pointer register re quires two bytes in the addressing mode specification when used as an index register without an offset The addressing mode EBP requires two bytes In contrast this address ing mode requires only one byte if it involves any other gen eral purpose register Hence there is a one byte penalty for using EBP in this manner The additional cost of using ESP and EBP in address mode specification can be represented in the IP model as fol lows To model the one byte increase a decision variable is generated to represent the use of a symbolic register from each of these registers The variable has a higher relative cost and the variable can be set to 1 only if the real regis ter is allocated to the symbolic register at that program lo cation The variable is entered into the must allocate con straint which ensures that at least one real register is allo cated to the symbolic register at that program location Fig 4 shows an example The symbolic register Ais used to indirectly address memory If A resides in EBP and is used from EBP then the instruction would incur one extra byte in the address specification The variable PERE models this use with a higher memory cost EBP EBP Tusea gt TpreA instruction size x 8 M 1 M is the memory cost of using the symbolic register from any real register besides EBP Crean is set to 1 if EBP is al located to A j
18. h to memory using a combined specifier 5 3 Overlapping Registers Some architectures have registers that physically over lap Storing a value in one register necessarily means that a value cannot be stored in the overlapped register The x86 architecture is such an example The x86 architecture de fines certain registers as bit fields of larger sized registers In particular 16 bit registers are defined as the least signifi cant 16 bits of 32 bit general purpose registers and 8 bit reg isters are defined as the two least significant 8 bit fields of the EAX EBX ECX and EDX registers Fig 3 shows the mapping for the 32 bit EAX register and the registers con EAX AX AH AL Figure 3 Mapping between EAX AX AH and AL registers tained within EAX For register allocation bit field sharing implies that the registers involved can together hold at most one value at a time Register allocation for registers with common bit fields can be modeled as follows All real registers that share a common bit field are grouped into a set Each register in a set is allocated as a distinct individual register with the re striction that at most one register in the set can be allocated to any symbolic register at any given program location This restriction enforces the fact that there is only one underlying bit field in the register set and this bit field is in use when any register in the set is in use A
19. lue is not aliased The first condition asserts that live range memX ee 7 define of A A load memX a X A Aop spill store of A store A last use of X load memX A wrong value loaded A symbolic register X predefined memory value memX home location of X in memory Figure 7 Overwriting of predefined memory value by symbolic register the initial value of the symbolic register and the predefined memory value are identical The only effect of the defin ing instruction is to transfer the value from memory to a real register When the instruction is deleted the value of the symbolic register is unchanged but the value now resides in memory instead of in a real register The second condition prevents the overwriting of either s memory value by the other It is easy to show that this con dition is necessary using counter examples Fig 7 shows the symbolic register A and the predefined memory value X sharing the same home memory location but the live ranges of A and X intersect The figure shows A spilling within the live range of X overwriting the value of Xin memory When X is later used the wrong value of X is loaded Similarly changing the predefined memory value may alter the value of a symbolic register Fig 8 shows the same symbolic reg ister A and the same predefined memory value X X is mod ified within the live range of A When A is later loaded from memory the wrong value of A is returned Th
20. oad action intro duces a load instruction and the processor cycle cost of the load action is the number of processor cycles needed to exe cute the load instruction The second cost component mod els the effect of increased instruction size An allocation ac tion may change the code size of an instruction A larger code size increases program execution time because of de lays in the memory hierarchy in supplying instructions In particular there will be more instruction cache misses and more instruction page faults Finally the last cost compo nent models the effects of increased data memory access generated by an allocation action For example a store spill of a 32 bit value accesses the data cache to store the four bytes of data Increased data memory access causes more data cache misses and more data memory page faults The cost model can be expressed mathematically for each integer program decision variable As described in Section 2 each register allocation action is represented by such a variable The cost coefficient for a variable x is given by this equation cost a A cycle a B instruction size C x data size x 1 Each term corresponds to the cost component described above The factors A B and C are the relative weightings of the cost components These factors depend on the instruc tion at which the allocation action applies For an action at instruction 7 A is the execution count of 7 and can be ob t
21. ode Goodwin and Wilken proposed a fundamentally new reg ister allocation approach based on integer programming 3 The Optimal Register Allocation ORA approach builds an integer program IP representation of the allocation prob lem using integer program variables to represent possible register allocation actions at each point in the computer pro gram Each of these actions has an associated cost An in teger program solver solves the IP problem generating a solution that minimizes total allocation cost A set of con straints limits the solver to choose only actions that lead to a valid allocation Although global register allocation is NP complete in practice ORA consistently produces optimal al locations in O n time 3 The basic ORA model described in 3 focuses on proces sors with uniform register architectures This paper presents an IP formulation for irregular register architectures ar chitectures that place restrictions on register usage Effi cient register allocation for irregular architectures is diffi cult However efficient register allocation is important be cause many real world processors have irregular registers For many embedded and specialized applications low hard ware cost often drives design decisions forcing designers to trade uniformity in register architecture for reduced cost Furthermore the Intel x86 architecture the most widely used desktop processor has an irregular register architec tur
22. onuniform register usage cost As an ex ample in the x86 if the base register used in an indexed ad dressing mode specification is the ESP register the address specification requires an extra byte in the instruction This is because the encoding that should be used for ESP is used in stead to indicate the presence of more bytes in the addressing mode specification As a result irregular instruction encod ing causes register usage to appear irregular 4 Cost Model A precise register allocator must have an accurate model of the costs associated with register allocation An accu rate cost model is especially important for irregular architec tures where costs are nonuniform and can arise from many different sources In particular a precise treatment of code size cost is important because instructions can have variable size and the size can depend on the choice of operand reg isters Changes in program size in turn affect program exe cution time because of memory hierarchy effects A precise cost model for register allocation can be ob tained by dividing the cost of each register allocation action into its component costs For each action there are three types of costs if the action is taken 1 the processor cycle cost 2 the instruction memory cost and 3 the data mem ory cost The processor cycle cost models the time spent by the processor executing instructions resulting from the al location action For example a memory l
23. other ending in real register ry Because S1 is only de fined into one register either x7 ps1 oF re jsi will be zero which is less than the corresponding right hand side Although the proposed IP model will optimally in sert copies prior to instructions with commutative source operands the problem of optimal copy insertion at any pro gram location is beyond the scope of this paper this problem will be considered in a forthcoming paper Thus under the assumption that the allocator can insert copies the proposed IP model for irregular register architectures is precise but it is not optimal 5 2 Memory Operands In 3 Goodwin and Wilken consider one aspect of regis ter irregularity non load store architectures which directly use memory operands The IP model in 3 applies to in structions that allow a separate memory specifier for each memory source operand and or each memory destination However various non load store architectures include in structions that use combined source destination memory specifiers Here we describe an extension to the IP model for combined source destination memory specifiers This model makes optimal use of combined source destination memory specifiers under the traditional register allocation assumption that each symbolic register has a unique spill lo cation With this assumption the source destination mem ory operands that have a combined specifier must be the same symbolic register At an ins
24. prior to the instruction Also it is necessary to select at most one of the pys variables These two facts are captured in the following constraint produced by the copy insertion transformation X r 5 r T copy 8 lt ores r real regs r real regs Y r use of S X use endS r use contS y Figure 2 Potential copy before use of sym bolic register S where 5 is the variable produced by a preceding trans formation which represents whether S is allocated not allo cated to r just prior to the instruction 3 Also at an instruction with commutative source operands a condition is added that enforces the combined source destination specifier requirement For instruction S1 S2 op S3 S1 can be allocated to real register r only if S2 is allocated to r just prior to the instruction and the allocation of S2 to r ends at the instruction or S3 is allocated to r just prior to the instruction and the allocation of S3 to r ends at the instruction This fact is captured by applying a combined specifier transformation which for each r produces the constraint r r r ZdefsSi lt vuse endS2 of Vuse endS3 where the x7 enas 1S 1 if S is allocated to r at the instruc tion and the allocation ends at the instruction 3 Note that the left hand side is less than or equal rather than equal be cause itis possible for the allocation of both source operands to end at the instruction one ending in real register rx and the
25. re of fers many examples of this format the 32 bit multiply in struction uses the EDX register implicitly to store the most significant half of the 64 bit product shift rotate instructions use the CX register implicitly to store the shift rotate count and push pop instructions use the ESP register implicitly as the stack pointer A third way to reduce instruction size is to offer more compact forms of frequently used instructions A compact form may represent one opcode register combination elimi nating the need to specify the register operand separately In the compact instruction the register is specified implicitly Although the instruction can use all registers as operands the compact form reduces code size and the compiler should use this form whenever possible For example the x86 of fers compact versions of many frequently used instructions involving the AL AX EAX register These compact forms reduce the instruction size by one byte Finally an instruction set can be made irregular to sat isfy the encoding needs of the instruction set This situation happens when an encoding pattern that should be used for an opcode register combination is used for some other pur pose making the register unavailable for use with the op code To amend this deficiency sometimes a different en coding is given to the opcode register combination but the encoding is larger resulting in increased instruction size The end effect is n
26. rities are precisely modeled This section describes the IP model extensions for some common register irregularities 5 1 Combined Source Destination Speci fier Various architectures include instructions that use the same register specifier for one of the source operands and for the result s destination The register allocator is required to allocate the corresponding source and destination sym bolic registers to the same real register If the source operand is live after the instruction the register allocator must copy that symbolic register into another real register or spill the symbolic register to memory In contrast the three specifier format typically found in RISC architectures has no restric tion on the assignment of source and destination registers Combined source destination specifiers commonly occur in architectures that have small instruction sizes e g 16 bits or less such as the x86 architecture and the 68000 architecture to minimize the number of instruction bits that the register specifiers consume This register irregularity also occurs in some RISC architectures such as the PA RISC which com bine a branch and an arithmetic operation into the same in struction For combined source destination specifiers register allocation is problematic for instructions with two source operands that are commutative e g S1 S2 S3 In the traditional approach to handling the combined source destination specifiers for s
27. s The benchmarks consist of six programs compress eqntott xlisp sc espresso and ccl For each function in a benchmark a maximum solver time limit of 1024 seconds is allowed The experiment assumes a simplified version of the cost model described in Section 4 In the simplified model the factor A is obtained through instruction execution profiling as described The factors B and C are estimated B is set to 1000 to model the effect of increased code size Specifically it takes on the order of 1000 processor cycles to read in a byte of program code from disk storage C is set to zero The experiment assumes a Pentium implementation of the x86 architecture Each instruction requires the same number of processor cycles to execute as on the Pentium 5 Table 1 gives the cycle and memory cost for instruc tions used by the register allocator Cycle cost in the table is the number of processor cycles required to execution the instruction Memory cost is the instruction s size in bytes instruction cycle cost memory cost load 1 3 store 1 3 rematerialization 1 3 copy 1 2 Table 1 Spill code cost Table 2 shows the number of functions solved The col umn total shows the number of functions in the benchmark Of these functions attempted is the number of functions that are passed to the IP allocator for allocation Some functions are not passed to the IP allocator because they operate on 64 bit integer values which the IP allo
28. sion variable table to determine each decision variable that was set to 1 by the solver and to determine the corresponding register allocation action The intermediate instructions are then rewritten based on the register allocation actions deter mined by the solver with each symbolic register replaced by the assigned real register and with spill code inserted at the prescribed locations 3 Irregular Register Architectures A register architecture is regular in the programming model if all registers can be used interchangeably in each in struction without affecting the instruction s execution time or the instruction s size A register architecture is irregular if the choice of registers affects instruction execution time instruction size or both For a register architecture to be reg ular all registers must appear homogeneous in structure and usage A register structure is homogeneous if all registers can hold the same data types and the content of one register does not interfere with the content of another register Reg isters are homogeneous in usage if any register can be used as an operand whenever a register operand is allowed and there is no advantage in choosing one register over another Whenever the homogeneity in structure or usage is broken the register architecture becomes irregular This section de scribes these irregularities and their effect on register allo cation 3 1 Irregularities in Register Structure
29. tions 1000 00 oO O gt 100 00 E He E 10 00 S z 1 00 dp 0 10 raf o 0 01 amp lt 100 1000 10000 Integer Program Constraints Figure 10 Optimal solution time vs number of constraints solution for those functions that the solver was able to find optimal solutions The growth rate of the optimal solution time is roughly O n 5 with respect to the number of con straints The IP allocator produced significantly less spill code than the GCC allocator Table 3 shows the amount of dy namic spill intermediate instructions produced by both the IP allocator and GCC The IP allocator produced fewer load and stores and eliminated more copies However GCC deleted more rematerialization instructions than it inserted Overall the IP allocator produced only 36 of the total amount of spill instructions as produced by GCC Changes in execution time can be calculated by substituting values from Tables 1 and 3 into equation 1 The IP allocator produced 551M cycles of overhead while GCC produced 1410M cycles The IP allocator reduces execution time x solved non optimally overhead due to register allocation by 61 Overhead Type IP Spill Load Spill Store 317M M 331M 503M Table 3 Components of dynamic spill code overhead The IP solution times for the x86 architecture is signifi cant faster than the times for the RISC architecture presented in 3 The x86 IP model has only about a quarter
30. truction that allows a combined source destination memory specifier and that has symbolic register S as a source operand and as a destination a com bined memory specifier transformation is applied that pro duces a decision variable combined mem use defs Which represents the use of S from memory and the definition of S to memory The variable 2 omsined mem use def 18 given a cost that represents the overhead that occurs for reading from and writing to memory and the overhead for code size increase caused by the memory specifier The combined memory specifier can only be used if S is in memory just prior to the instruction Thus the transformation produces the condition 2 eombined mem use defs S Tires where pres 48 the variable produced by a preceding trans formation which represents whether S is allocated not allocated to memory just prior to the instruction The variable combined mem use adefs 18 included in the must allocate condition 3 for the use which ensures that S is either allocated to a register or to memory at the use The variable amp eombined mem use defs 1s also included in the must allocate condition for the definition The variable Leombined mem use def s Can be used in combination with the Pmemory useS ANd Lmemory defs Variables described in 3 so that the definition and use of S are optimally allocated both to registers to a register and to memory using a separate memory specifier or bot
31. uch instructions a com piler phase prior to register allocation uses a heuristic to select one of the two source operands to share the combined specifier The intermediate representation is then rewrit ten with a copy inserted that copies the selected source symbolic register say S2 to the instruction s destination symbolic register S1 The instruction is then rewritten by replacing S2 with S1 as illustrated below Copy SI lt S2 S1 S1 S3 The register allocator will then allocate the two defini tions and use of S1 to the same real register and may elimi nate the copy instruction if appropriate This traditional ap proach to handling combined specifiers is problematic be cause the choice of which source operand will be combined with the destination is made outside the context of register allocation and thus may often be a poor decision The IP allocator can precisely model combined source destination specifiers First the IP model is ex tended to allow copies to be inserted The IP model in 3 only allows copies to be deleted For an instruction with commutative source operands for each source operand S a copy insertion transformation is applied that produces a decision variable x7 for each real register r which allows S to be copied to r just prior to the instruction as shown in Fig 2 Each pys variable is given the cost of the corresponding copy instruction S can only be copied if S is live in a register just
32. um Processor Family Developer s Manual Volume 3 Architecture and Programming Manual Intel Corporation 1995 6 Motorola MC68020 32 Bit Microprocessor User s Manual Prentice Hall Inc 1985 7 G Nemhauserand L Wolsey Integer and Combinatorial Op timization John Wiley amp Sons 1988 8 R Sethi Complete register allocation problems SIAM Jour nal on Computing 4 3 226 248 September 1975 9 R Stallman Using and Porting GNU CC Free Software Foundation Inc 1995
33. ust before the instruction ao is set to 1 if A resides in EBP and is used from EBP The condition allows for the possibility of allocating EBP to A Ga A 1 with out using the value of A from EBP GENL 0 This can occur if the solver finds it beneficial to put multiple copies of A into different real registers In this case the use of A with the higher cost will not be chosen 1 e rEBP would be useA 0 The must allocate constraint for A is as follows EBP X l r TuseA F VoreA 2 1 r real regs r EBP EBP useA EBP preA is entered into the must allocate constraint instead of to correctly account for the higher memory cost 5 4 3 Exclusion from Addressing Mode The ESP register cannot be used in all addressing modes Specifically while it can serve as the index register it can symbolic register networks for A EBP EAX re ESP EBP EAX ESP preA preA x preA wee A Figure 4 Higher cost of addressing mode specification using EBP symbolic register networks for A ESP EAX soa EBP ESP EAX EBP preA preA x preA vee evel A Figure 5 ESP cannot be used in scaled index addressing mode not be scaled by the constants 2 4 or 8 e g cannot have 2 ESP Incontrast all other general purpose registers can be used as index registers with scaling by these constants The fact that ESP cannot be used in the scaled indexed addressing mode can be modeled as follows At an instruc tion that uses symboli

Precise Register Allocation for Irregular Architectures

Contents

Download Pdf Manuals

Related Search

Related Contents