Home

- Theses

1. Formal algorithm An example Error recovery Specific Actions An example of their use Error States General Action Summary of Error Actions EMANTIC PROCESSING 6 CODE GENERATION The Semantic Mechanisms of KHAR Semantic Actions Semantic Checking in PL O Pass 1 syntactic processing Pass 2 dealing with manifest constants Pass 3 checking variables Pass 4 checking procedures Attriobute Propagation in Expressions Code Generation THE INTERFACE TO KHAR Syntax Languages The General Syntax of SL Languages Graph of the General Syntax 7 Language Symbols of SL Separation of SL and Language Keywords Use of pointers in Transition Table Special Actons SA1 to SA12 Syntax Graph for SLO Transition Table for SLO Action Tabte for SLO Coding of SL1 in SLO The End State Symbol Description of the Statements Actions Syntax of SL4 Syntax Graphs of SL4 Coding of SL4 in SLO 7 IMPLEMENTATION Outline of prototype Transition Table TT and Action Table AT Structure of TT Structure of AT Files and Programs Used Files used in the system Programs used in the system Processes used in the system 8 DISCUSSION 8 CONCLUSIONS Size Simplicity and Extensibility Portability Clear and flexible Interface Definition of Language semantics Potential for development Applicibility to Programming Languages for Microprocessors Conclusions ADDITIONAL MATERIAL LISTING OF THE KHAR TRANSLATOR SYSTEM
2. ABSTRACT We present a translator system KHAR which is designed to use a minimum amount of read urite storage in environments where this is a scarce resource The system may be used for languages which are LL 1 Ve describe the system and use its application to the checking of the syntax of a machine oriented language AML 1 to illustrate KHAR S handling of syntax and error recovery and similarly use its application to the checking of the semantics of Wirth s mini language PL 0 to illustrate KHAR s handling of semantics We show too hou features not found in PL O can be handled The interface to the KHAR system provided for the designer implementer of a language is a set of semantic graphs after Cordy to which may be added error recovery and code emitting actions These graphs are encoded in a development of BNF called here Syntax Languages The Linearized graph with its actions is translated into two sets of tables one to drive a push down automaton to recognise the CFG of the language with cross linkage to the second which defines the action to be taken at that point in the syntax These actions operate on registers and a read only stack which handle integer numbers as the encoded form of language symbols The simplicity of this mechanism is due to the multipass nature of KHAR Ve compare this simple mechanism with those used by Cordy We report on the degree to which KHAR meets its desion objective of minim
3. 5 16 uo ssa Jdxs7 os a a lt gt lt gt 4 aa s a uo ssa Jdxs7 uosseudxe lt qqo 96 5 17 PASS 4 checking procedures As in previous passes mark and flush the stack before entering and upon the exit each time a Cblock is encountered in matrix 98 Push procedure identifier on the stack if it is not already there and if it is report an error that procedure identifier is redeclared We mark other identifiers as of no interest Report an error if any of the identifiers of interest on the stack appears in CONST part in matrix 98 in VAR part in 98 or before in 97 Report an error if the ident after CALL in 97 is not on the stack i e is not declared or is declared but marked as not of interest sz 5 18 Semantic actions Procedure Identifiers 1 mark 2 scope css error ln css declared nl push css push 0 3 scope css error ln css declared nl push css push 13 4 flush 5 search css 33 check index check index 1 error ln css is a procedure name 6 search check index check index 1 zerror in css is not a procedure ore error ln css undeclared kz 5 19 Matrices 99 amp 98 LIA 3704939373 ea quopi JANGIJONA e lt n 990 q lt 66 5 20 Matrix 97 218081035 lt gd
4. W REAL X REAL Y REAL Z REAL Repeating the two scans gives first W REAL lt Y REAL X REAL gt Z REAL and then b W REAL TEMP REAL Z REAL 5 25 A third pair of scans gives lt W REAL TEMP REAL gt Z REAL and TEMP REAL Z REAL which in turn gives lt Z REAL TEMP REAL gt and TEMP REAL which becomes on the second scan TEMP REAL and is no longer an expression since it is not bracketed as such Note that we rely on the syntactic processing to detect and report errors We can also place code emission actions in the error action part of the matrix to repair the error to allow further processing if we wish We tabulate the actions for the two passes and then present the graphs kaa 5 26 2 3 4 5 6 Semantic Actions first pass of attribute propagation prc carmem push css emit 2 1 pop emit 3 emit lt css 2 41 gt emit 4 13 lt css 2 1 gt emit lt css 4 43 2 1 gt emit 3 4 push css emit css Semantic Actions second pass of attribute propaqation push css emit 1 real o emit 1 int emit 1 css emit temp css emit css 5 27 DRE W Jojzouedo SIpoAp SENS er eda 4uep JoyoJsedo Dj esse r Matrices 89 amp 88 l l I m O tu i Beba P
5. e e e Li a e e a U 1 U U U 1 U 1 U U L U U I L 1 I U 1 1 I l 1 smi 1 t i ci NI 1 1 1 1 1 1 i 1 U U U U 1 1 l U t 1 U U oo I t 1 I ine l INI i t 1 1 tel 1 I r U 1 1 i U U U 1 U 1 U U 1 U O I 1 i i i t 1 1 CI imi 1 1 I 1 IF I 1 i 1 Iei 1 1 I 1 1 Q i I i 1 1 U 1 1 U U lt I t L L 1 1 1 1 1 to 1 1 isl i l ft 001 lt 1 1 1 1 t i o 1 1 U U I 1 1 U U I L i i 1 w i i 1 1 1 l I 1 I 1 1 1 4 1 U m i int 1 u 1 1 1 1 U 1 I U i u 1 1 t 1 1 nee ERES al U U 1 i 1 U i 1 U 1 t 1 y hQ M W y sh w me n ce lt 4 I ee ae poi i 1 1 bel i at 8 Idi Ici t tet 1 1 U U U U U U 1 U 1 1 U 1 1 U ot 1 m1 1 1 a m O A A A __ _ _ __ y __ yz_z_ __ z q gt x i ee emo tu l f 1 t 1 Q I tw 1 I sl t 1 1 int 8 1 a 1 U u d 1 i 1 1 Imi zi I CI 1 U i ui i H H 1 1 1 1 LIO l mo lt Imi 1 l 1 t ie 1 i 1 1 1 1 U 1 U U U 1 U 1 3 x Il I SI um er
6. implementations I The flexibility of the interface and the minimal nature of the semantic mechanisms available to the designer requires skill in the use of the system on the part of the language designer Once these are mastered the semantic graphs produced as claimed in CCORDY become a ready means of communicating the exact semantics of the language to its users Indeed Cordy claims the use of semantic graphs to be superior to other means of achieving this DEFINITICN OF LANGUAGE SEMANTICS The reliance on PASCAL to express the semantics of a Language defined using an LL 1 affix grammar in CBOCHMAN produces a textual definition which can only be read if one understands PASCAL Also semantics and code generation are considered together although the details of the latter are concealed by the use of a procedure generate which has to be supplied by the user of the compiler writing system The peste qa that it is hard to see what the semantic meaning is We consider that the essentially graphic nature of the presentation of semantics in KHAR and the individual refinement of the semantics imposed by the multipass approach make it possible for both designer and user to select an aspect of semantics and isolate its effects precisely Further KHAR implements a language once it has been expressed as a set of SL programs and translated so that a definitive implementation is immediately available The translator may well be
7. set of graphs This has been done for the graphs of language AML 1 in the previous chapter This table shows that the graph 92 is only called once in the graph 99 and it is thus more efficient to include it in that araph and have one graph fewer 4 1 The set of elements terminals and nonterminal symbols appearing in a syntax graph is the valid elements of that graph and consequently the union of all of these sets is the set of language symbols Valid elements of a syntax graph are connected by directed Lines For each point of these Lines there exists at least one destination Two points having the same destinations are called the same condition points A piece of Line consisting of such points is called a state line While in a state the set of elements which may appear as the next terminal or nonterminal ir the input stream to cause a change of state are valid elements of that state We have a function to take the state number and return these valid elements We will see this function Later For each syntax graph thereexists one initial Line and one or more exit Lines The initial Line is the Line which comes to the graph and terminates at one or more alternative elements within the graph le call these elements the initial symbols of that graph Ve allocate number 0 to the initial Line and call this the initial state and EXIT to exit Lines and call each of them an exit state Th
8. Ajuo BujgJodos Jod33 JO sSSDDD x Se JDUO 3Dip Jo30Jd JOJUJ Q 1d AVHA AVHA E AVHA H AVHA JVHA AVHN Buy posue 3 9 JO epo gt s ssod g d qope Joy se qo4 Bu A up J JIUIJSIP pssod L gssod gssod 1 ssod lt ___ PREPARATION OF INFORMATION TABLE TO BE USED IN CODING THE SOURCE LANGUAGE SYM2OLS For each Language to be used by our system we need to prepare two sets of information These are the language keywords and the language syntax We call them the Language Symbol Data LSD and the Transition Table Data TTD respectively The former LSD is made from the terminal symbols used in the language syntax and the latter TTD is the information derived from the syntax itself that is from the transition matrices of the language Both of them have their own special syntax in data preparation which we explain separately Language Symbol Data The set of elements on the top of columns in all transition matrices of a language is called the valid elements of that language or language symbols For each programming language we group terminals according to the classification reserved words triple character delimiters given above and call it language symbol data This data is a string of characters divided into groups starting with the associated fixed code for each group and one group following the other Each construct or symbol however is separated from the nex
9. Also this program creates file 4 on which code name table length base code and number of code names are written This information is needed in program N 7 10 program C This program using files 1 and 2 finds out the comment delimiters of the language It then reads source text deletes all comment and outputs the remaining text on file 5 programs D and E These two programs are very similiar They both read and write the same files Their main task is to find the constant string delimiters of the language find constant strings in the input text and replace them by their codes The actual constant string is put in a dictionary To distinguish between these codes which are numbers between 3000 and 3999 and the actual integers used in the text we also have to code integers simultaneously along with coding constant strings But there are files Like Language symbols or transition table data in which the integers used A codes themselves and we do not wish to code them again This is why we have two similar programs Program D codes all integers along with constant strings while program E leaves integers as they are and only codes the constant strings program F n This program reads file 7 replaces all standard names and user defined identifiers by their codes and outputs the result on file 11 Also it puts all user defined identifiers on file 12 7 11 program G N This program will check files 1 a
10. Currently for production PASCAL compilers the method of Amman AMMAN is adopted which has definite Limitations and involves the programming of the explicit addition of symbols to the follow set on entry to a procedure written to recognise a non terminal of the language The approach to error recovery in KHAR is to allow the designer to define error recovery productions No problems arise over semantic information as discussed in GRIES since KHAR rigidly separates syntax and semantics The system is constrained to accept languages with an LL 1 grammar LL 1 grammars and their application are discussed by Griffiths in CEAUERed Here it is enough to say that l a thev make table driven methods useable b error recovery is simpler since even a small degree of look ahead provides sufficient redundant information to enable good recoverv c automatic syntax improving techniques e g Foster s SID CFOSTER exist to improve grammers so that they are LL 1 d LL 1 languages are easier to read and use CHOARE 1 5 This last point is of great importance when the final design objective of KHAR its use in language design is considered As remarked by Watt CWATTb one of the most useful tools available to designers and implementors of programming languages has been the Context Free Grammar CFG A CFG is capable of defining a large part of the syntax of a typical programming Language and the existence of a wide variety of s
11. These actions will be added to the above example matrix after we have discussed these topics However the Linearised form of the graph can now be shown as 98 E O BYTES gt 1 BYTE gt 2 REF gt 5 1 CIDENT gt 2 STRING gt 2 CENDDATADEXIT CIDENT gt 4 W n BYTE gt 3 ENDDATADEXIT 5 IDENT gt 6 6 REF gt S ENDDATADEXIT J This syntax corresponds to code such as scanning fron DATA string BYTES A CHARACTER STRING ENDDATA DATA space BYTES size ENDDATA x size is a manifest constant DATA worki BYTE a BYTE b BYTE c BYTE d ENDDATA DATA refs REF index REF top of stack ENDDATA We also show where actions can be placed by showing a code emitting action and an error recovery action This latter is the worst possible it just stops the scan Semantic actions are discussed in chapter 5 while we consider Error Recovery in the second part of this chapter We show part of this linear form with error recovery sauce added as well as code emitting actions in the following diagram 98 L O BYTES gt 1 BYTE gt 2 REF gt 5 STOP 1CIDENT gt 2 emit rmb css STRING gt 2 emit fcc css nl feb O 2 STOP 4 5 STOP is the error recovery action placed after a at the end of the state The verb emit is one of the semantic and code generatina actions we have access to in KHAR through the SL language As we show in chapter 5 STOP is correct in state 0 which onl
12. relatively inefficient but it is available as a standard by which to Bs judge other compilers for the Language POTENTIAL FOR DEVELOPIIENT We feel that there is considerable potential for development based on KHAR Our experience shows that a minor change in KHAR makes a wide range of application possible The system was intended to deal with relatively simple and small languages approximately subsets of PASCAL but with low level operations on the machine architecture rather than the more abstract operations of PASCAL KHAR can be used to translate such languages Ve suggest that KHAR could be adapted to tackle the problem of strict checking of user defined scalar types which has been avoiced in PASCAL and to handle the evaluation of constant expressions at compile time A possible approach to the former is to use the special table building actions of KHAR which are available at all times to construct at compile time an ii tone checking pass or passes derived from the type declarations in the program This extension is well beyond the original objectives of the work but the possibility has been noted The evaluation of constant expressions at compile time would be a simple extension of KHAR if only integer arithmetic were allowed The stack mechanism would need to be extended by adding arithmetic operations to KHAR and the corresponding operators to SL A technique similar to that suggested for checking the type of
13. Integers apearing in the source text are put in the dictionary of integers and in their places the appropriate code Integers are coded from 10C0 to 1999 that is their index plus 1000 Users defined identifiers are coded from 2000 to 2999 and 3 15 constant strings from 3000 to 3999 ENCODING This encoding is done in six steps The input to step one is the source text and the output from step six would be the internal form of the source text Each step other than the first takes the output of the previous step as its own input The steps are in the following order 1 delete all commentary and code the newline character 2 code all constant strings and integers 3 code standard names and user s defined identifiers 4 code triple special characters 5 code double iene 6 code single special characters The output from step six is a sequence of integers FIXED CODES AND VARIAELE CODES The vocabulary T of terminal symbols of languages we consider can be categorized as follows a Reserved Words CRU b Standard Procedure Identifiers SP c Standard Function Identifiers SF d Standard Constant Identifiers SC e Single CHaracter Del imiters SCHD f Double CHaracter Delimiters DCHD i g Triple CHaracter Delimiters TCHD 3 16 We assume that in the class of languages we use there exists no symbol consisting of more than three special characters These can
14. SL i Language can be improved to produce SL i 1 to deal with the features of a new language to be implemented A program in any SL is the Linearized form of syntax information semantic information and code emitting actions for a language L according to the syntax of SL a linearized form of the transition matrices of the language L A program in SL has one or more blocks bounded in curly brackets and gt with the following structure t main block other blocks Main block is the information constructed from the main transition matrix and other blocks contain the information from the other matrices each of which has a similar layout to the main block Each block has one or more compound statements bounded by square brackets C and Jj Each compound statement has one or more simple statements and one syntax error action part bounded by parenthesis and Blocks and compound statements are all labeled The labels are numeric 6 2 THE GENERAL SYNTAX OF SL LANGUAGES lt program gt lt main block gt L lt block gt lt main block gt 23 lt block gt lt block gt 2 lt matrix label gt lt state gt lt state gt J lt state gt lt state label gt lt statement gt lt statement gt syntax error actions gt lt statement gt 2 lt valid element gt gt lt state Label gt L lt semantic actions gt lt code emitting actions gt J lt matrix Label gt in
15. SUO 300 2 puowes 87015 usus j ARE SS aerei ea SUO 350 2A kojuka l 9C 3 Jsqunu 9 DIS sq gt Matrix 98 O JOQUNU 97D3ST 09 C96 Jequnu 8 73 DIS quewa a PI DA 408433 LIN3 9 d001 3SN A3dd JINS LIXI dOLS 6 23 clyS i 5 EVS IYS BYS quewe e PI DA CS61 lt SSI HOAVAIS CE lt TLL61 lt lt TLL6 lt 94 AIIHI AAVW lt HSN 14 dOd U 475 JubJsUOD ES61 lt 13431 SSI 149vI J9B93U 3a RE HSNd x jeBoqu lt a LIS Nes L96313 LIW3 6 04 al a Matrices 96 amp 95 6 25 CODING OF SL4 IN SLI 99 C O C gt 1 a oo 18 1 21 gt 2 sa1 saz 9 go 16 2 C gt 3 go 18 3 22 gt 4 sa2 go 18 4 gt 5 go 18 5 2217 23 gt 6 sa3 go 18 6 gt gt 7 qo 18 7 22 gt 8 sa4 3 go 18 8 gt 9 sa11 gt 11 sa9 sa11 gt 13 sa9 sa9 2 gt 17 sa9 sa9 sa9 sa11 go 18 9 97 gt 10 sa10 co 18 100 gt 11 sa11 gt 13 sa9 gt 17 sa9 sa9 sa11 9 go 18 11 97 gt 12 sa10 9 go 18 12 gt 13 2 gt 17 sa9 sa11 a co 18 13 23 gt 6 sa3 go 18 140 gt 15 a qo 18 15 gt 3 1 gt 16 go 18 16 gt exit sal sa6 gt 1 ago 18 17098 gt 14 sa10 go 18 18 a find 22 go 4 or 23 gt 22 go 7 or go 16 J 98 L Olstop gt exit sa7 get gt 0 sa7 unget gt 0 sa7 Loop gt 1 sa7 go gt 1 sa7 use gt 1 sa7 find gt 2 sa7 a exit 1 22 gt exit
16. after CALL in matrix 97 Ve tabulate the actions for the pass and then present the graphs th the index number of the action placed to show where it would be carried out Se 6 9 mantic Actions Manifest Constants set O mark set css scope rgCerror ln rg declared push rg push css scope css error ln css declared push css push 0 flush search css check index 1 epror ln css const in LHS search css 33 check index 1 error ln css not a procedure set emit search check Xindex check index 1 emit index 1 2 10 emit css 11 a emit css set emit _ s 90 9019 T ciueus303s 717 1USPi lt aundaoo3d e Ne IE 349pi PEM LSNOS lt s 0 86 66 5 8 USW939IS gge U014 1PU00J lt JITHM 1ueueqoqsqe e rUC N3HL uojyipuosge r f 7ueW 703S INI 349W64045 lt NI93d Pins i lt JUDP g 1 P L6 5 9 uo ss udx TITTTI gt gt Litt tt puoisseudxe cuoisseudxe 7e qqo 5 18 95 94 factor factor x T 93 3 ident 2 E 1 expression gt 5 11 PASS 3 checking variables As in pass 1 in matrix 98 each time on entering block we MARK the stack and on leaving ve FLUSH For each variable identifier encountered we use SCOPE to check if it is already on the
17. nonterminal is only used once in the syntax graphs of other nonterminals we replace it 3 20 by its graph Also it is better to replace the nonterminals by their graphs if they have a small syntax graph even if they are used a few times This minimizes the amount of information we need to keep and also reduces the compilation time Figures 3a to 3c show the syntax graphs of this Language 3 21 FIGURE 3a Matrix 99 for ANL 1 PROGRAM gt i dent LS AT 97 DEFINE gt ident gt gt constant DATA gt 98 SENDDATA PROC gt ident gt 96 gt ENDPROC MAIN0 3 961 0 xENDPROGRAM gt Z 22 ms FIGURE 3b Matrices 98 93 and 97 for AML 1 HERE 98 ji dent AT gt C97 Ee i TT NE SM 97 number Q D H B 3 23 FIGURE 3c Matrices 96 95 and 94 for A L 1 gt 95 WHILE gt 1941 gt D0 96 REP IF gt 94 gt THEN gt 96 END ELSE 961 l CODE ENDCODE CALL gt i dent gt ENDCALL 94 0 CE S Emo AD L bi 3 24 The terminal symbols are represented by their denotations and non terminals by placing their name within square brckets LJ Graph number 99 shows the production of the distinguished symbol of the language To every graph we assign a code number starting from 99 downwards Code 99 is devoted to the main graph and the allocation of graph cod
18. of type In fact KHAR never requires to handle the representation of the attributes of objects in a program Comparision of types as implied above is achieved by seeking to match the symbol read from the input stream with the current expected token That these may both represent REAL or one REAL and the other INTEGER is of no consequence That REAL REAL is valid in ANSI FORTRAN and REAL INTEGER not is not included in KHAR It is expressed by including the branch REAL REAL in the syntax graph of the second of these tuo passes and omitting REAL INTEGER and INTEGER REAL We need only state what is valid The properties of KHAR as a recogniser will do the rest Further since each pass or group of passes deals with one aspect of a language semantic errors of the same type are reported together which should assist the user in program debugging We note also in passing that error recovery actions are also included in KHAR so that error recovery is under the control of the language designer and the majority of syntax errors are reported in 3 7 one pass We discuss error checking in chapter 4 Our picture of the KHAR system as set up to form a PL O compiler is now as in Diagram D on the following page The syntax graphs and actions for each of these passes are of course all defined using SL4 We discuss syntax graphs and our basic checking technique in chapter 4 See lt q vosBo ig gt
19. of the corresponding language and usually simultaneously a complete semantic check of the source program is performed Runtime storage and addresses are then allocated to variables and the internal representation of the program is used to produce assembly or machine language of the target computer by the code generator This code generator is the hardest task in a compiler the most systematic approach presented in the literature being that of CWILCOX A fuller discussion may be found in CGRIES or LEAUERed LEXICAL SCANNING IN KHAR Source text is read as a string of characters each basic symbol of the language is recognised and this will be passed to the output file in an internal form that is to say as an integer number 2 7 Language symbols i e reserved words standard names special symbols etc are represented by predefined codes User defined identifiers constants and integers are represented by an index into the appropriate dictionary plus an offset 2000 3000 which identifies their class This encoding minimises the overhead in transmitting information between processes as it allows the rest of the processes to operate with TORRE ATI symbols rather than variable length strings of characters It is good communications practice to use the shortest encoding for the most frequent bits of information encountered Programming language design properly takes no account of this since it has other wid
20. sa8 exit 2 23 gt 3 sa7 exit 3 gt 2 go gt 4 sa7 A exit 4 22 gt 5 sa2 9 exit 5 or gt 2 sa7 25 gt exit exit 97 L OCemit gt 1 sa7 error gt 1 sa7 push gt 2 sa7 set gt 2 sa7 pop gt 0 sa7 27 gt 0 sa7 flush gt 0 sa7 search gt 4 sa7 check gt 4 sa7 252exit a exit 1 96 gt 0 9 exit 2 gt 3 sa7 26 gt 0 sa7 exit 3 26 gt 0 sa a exit 4 x gt 5 sa7 26 gt 6 sa7 exit 5 26 gt 6 sa7 a exit 6 C gt 7 a exit 7 9728 a exit 8 gt 9 sa7 Q exit 9 97210 a exit 10 gt 11 sa7 exit 110 220 a exit J 96 L DC gt 1 sa7 a exit 10 gt 2 sa css gt 5 sa7 rg gt 5 sa7 20 gt 5 sa7 26 gt 6 sa7 exit 2 index gt 3 sa7 26 gt 5 sa7 9 exit 3 gt 4 sa7 gt 4 sa7 2525 exit 4 26 gt 5 sa7 exit 5 gt 6 sa7 25 gt 6 exit 6 gt 1 sa7 gt exit sa7 exit bad 6 26 fee CHAPTER 7 IMPLEMENTATION Two versions of the KHAR system were developed The first Nai on an ICL1904S in PASCAL and the second on a DEC PDP11 40 written in the C language a derivative of BCPL This second version relies on many of the features of the UNIX operating system in particular the file system and the ability to write command macros This enables the large number of files involved in the use of KHAR see below to be handled by providing commands for the user It must be emphasised that the present version is a prototype designed for use in the research and to exploit the UN
21. some very flexible tools which may be employed These tools are Like features of a programing language in that they are very flexible Because of the simplicity of KHAR and the flexible interface provided via the SL languages one can simply add his own new features to the system and use them These error actions are accessed by calling an action interpreter at the point in the formal algorithm where an error is detected Either one of the specific actions or the general action may be called Their action is described in the following sections 4 17 SPECIFIC ACTIONS Reserved words or verbs in the SL languages are used to call the error actions of the KHAR machine They are as shown in the following table Action Keaning GO go to error state see below STOP stop checking SUCC point to next source symbol LOOP go to another state stay in that state read and ignore source symbols until one of the valid elements of that state appears in the input stream then go to appropriate state s AA The usage of these error actions will become clear by an example In our previous example we made a transition matrix from syntax graph no 98 of AHL 1 This matrix is not complete and in case of error the program syntax checker would stop It needs some more information to be able to continue Therefore we add error actions to each state of that matrix We discuss the action added for each state sepera
22. the designer and will provide a translator System for such languages which has a low read write storage recuirement 8 8 REFERENCES CARMANT CBAUERed CBOCHNANNJ CEROWNa CBROWNbJ CCORDY Amman U The Hethod of Structured Programming Applied to the Development of a Compiler International Computing Eos i Syopos vuln 1973 A Gunther et al eds Amsterdam North Holland 1974 pp93 99 Bauer F L amp Eickel J eds Compiler Construction An Advanced Course 2nd ed Berlin Springer Verlag 1976 Bochmann G V amp Ward P Compiler Writing System for Attribute Grammers Comp J vol 21 no 2 pp144 148 Brown P J The KL 1 Macro Processor Comm ACM vol 10 Oct 1977 pp618 623 Brown Pode Macro Processors and Software Implementation Comp J vol 12 Nov 1969 pp327 331 Cordy J R A Diagrammatic Approach to Programming Language Semantics Technical Report CSRG 67 University of Toronto Computer Systems Research Group 1976 references 1 GLENNIE CGRIES CHOARE CJA ESJ CJENKINSJ CJENSEN CKERNIGHAN Glennie A On the Syntax Machine and the Construction of a Universal Compiler Technical Report No 2 AD 240512 Carnegie Mellon University 1960 Gries D Compiler Construction for Digital Computers New York Wiley 1971 Hoare C A R Hints on Programming Language Design ACH Symposium on principles of pro
23. which has been described and the action interpreter which uses the case statement of C to parse the Linear action code when an action is called for Each action is implemented as a separate segment of code Actions are easily added and can be made available to the user via the SL languages 7 2 Transition Table TT and Action Table CAT These tables of information are interpreted by KHAR to parse and emit code for the appropriate language This is an important part of ch system and all syntax checking semantic checking and code emitting revolves about it Once these tables are made for our smallest language in the SL series by hand we are able to create them automatically for the succeeding languages These two tables are arranged as a one dimensional array of elements In the following section we give the structure of these tables Structure of IT Suppose we have m matrices in the Language then TT consists of m parts and we have pointers to beginning of each ate Each part consists of a number of states all with the same structure In the figure on page ti 7 7 _ We have enlarged one state in which a number of valid element part s is followed by a 0 to indicate the end of a state and followed by a pointer to a syntax error recovery part in AT Each valid element part in TT has four elements 1 a language symbol valid at this point 2 a pointer to another state of the same table for use if this elemen
24. E lt identifier gt lt constant gt x DATA lt data brick gt ENDDATA PROC lt identifier gt BODY lt block gt ENDPROC MAIN lt bLock gt ENDMAIN lt data brick gt lt identifier gt lt Location gt lt declarations gt lt location gt HERE lt at clause gt lt at clause gt AT lt number gt lt suffix gt lt suffix gt B Q D H lt number gt lt digit gt lt digit gt lt digit gt 1 2 3 4 5 6 7 8 9 0 A B C D E F lt declarations gt BYTES lt size gt BYTE lt identifier gt REF lt identifier gt lt size gt lt integer number gt lt string gt lt string gt lt printing character gt lt printing character gt lt block gt lt statement gt lt statement gt lt simple statement gt lt while statement gt lt if statement gt lt simple statement gt lt code brick gt lt call clause gt lt code brick gt CODE lt taraet machine assembler gt ENDCODE lt call clause gt CALL lt identifier gt ENDCALL lt while statement gt WHILE lt cond gt DO lt block gt REP Kif statement gt IF lt cond gt THEN lt block gt ELSE lt bLlock gt END lt cond gt lt simple statement gt lt conditional test gt We first construct a set of syntax graphs from the grammar we assume that the reader is familiar with the rules of graph construction For each nonterminal we may have one syntax graph It is a good practice to reduce the number of graphs to as few as possible by merging them together That is if a
25. IX environment The main aim of the implementation is to enable the study of the KHAR passes and the development of th primitives recuired It is therefore different from the final form required for its intended working environment in several ways OUTLINE OF PROTOTYPE First all the information held in the file store about a program and the language is read from the serial files of the UNIX filestore into a simulated indexed random file organisation This information would be retained on backing store n a stand alone system using floppy discs Second all possible KHAR actions error recovery semantic etc are available in each pass The discussion concluding chapter 5 7 1 k showed that only a few passes principally code generation needed to access these files except when reporting errors Third the present KHAR machine contains a full set of trace statements which can be selectively enabled to give a full trace of the behaviour of the machine This has proved to be of great value in tracing errors The implementation has attempted to be as simple minded as possible day one recursive procedure call is made and no use is made of local variables Thus the basic code of the machine does not require that a procedural language with recursion be used This means that Little overhead is imposed and the machine is easily transportable The main components of the KHAR machine are the recogniser the algorithm of
26. The program execution begins with access to the first valid element of the first state of the main matrix and stops by exiting from this same matrix The other matrices may be called directly or indirectly from this main matrix We have two types of valid elements either they are terminal symbols in which case the source symbol will just be checked with that or they are nonterminals a matrix number In the latter case before entering this new atei the source symbol will be checked with the initial symbols the director symbols of that matrix and if it does not match with any of them the next valid element of the current state will be checked against phe current source symbol but if a match occurred checking continues by pointing to the beginning of the new matrix Upon the exit from this new matrix access is 4 7 regained to that point of the previous matrix from which access was transfered This recovery is achieved by using a stack for the accessing information Information is pushed onto the stack on entering a matrix and popped off on exit The process always expects the source symbol to match with one of the valid elements of the current state If a match occurs the next state is the state indicated by the entry after that valid element Another source symbol is read and access is transfered to the new state and the same process continues until the end of file occurs or the source symbol does not match with any of the
27. Univ of Glasgow 1974 CWATTbI Watt D A The Parsing Problem for Affix Grammars Acta Informatica vol 8 no 1 pp1 20 1977 references 3 WILCOX WIRTHa CWIRTHI Wilcox T R Generating Machine Code for High Level Programming Languages PhD Thesis Cornell University 1971 Wirth N The Pr gramming Language Pascal Acta Informatica vol 1 pp35 63 1971 Wirth N Algorithms Data Structures Programs Englewood Cliffs N J Prentice Hall 1976 references 4
28. aa Ou A oO O n e mn md nooo e a Tu Ha a o s by Majid Azarakhsh A thesis submitted to the University of Glasgow in accordance with the regulations for the award of the degree of Doctor of Philosophy ACKNOWLEDGEMENTS I would Like to thank Pahlavi University Professor for D C Gilles and giving me the apportunity to study for a higher degree I would Like to express my sincere thanks and appreciation to Mr D G Jenkins who supervised this work for his continued support and interest which were of great value in bringing this thesis to completion I also thank all members of the staff of the Computing Science Department of the University of Glasgow who have helped me in so many ways iajid Azarakhsh H askha CONTENTS CHAPTER 1 INTRODUCTION fotivation Philosophy 2 APPROACH Principles of the approach of KHAR Grammar and Formal Definitions Notational System Conventional Compilation Process Lexical Scanning in KHAR Syntax in KHAR Conventional Semantic Processing Semantics in KHAR Code Generation 3 DESCRIPTION OF KHAR The recogniser and the graph encoder Preparation of input to the KHAR system Code Name File and Code Name File Index Internal Form of the Source Program Encoding Fixed Codes and Variable Codes A practical example 4 SYNTAX CHECKING 8 ERROR RECOVERY Syntax Graph Transition Matrix Valid elements of a Transition Matrix Checking technique used in KHAR
29. are given as 1 enter name 2 enter address and so on one for each attribute This immediately brings us to the difference between the approach of KHAR and that of conventional compilation In KHAR we do not construct a symbol table We do have dictionaries of identifiers strings and constant denotations but the objects handled within KHAR are the integer indices into these dictionaries Thus KHAR has no symbol table in the sense of LCORDVI The mechanisms of KHAR are 1 A symbol stack this is a stack the elements of which are integer variables as we shall see items may be pushed onto or popped off the stack the structure is a read only stack rather than a push down store in which only the top element would be accessible 2 Current Source Symbol register CSS this is a register whose content is the last symbol read from the input stream by the recogniser it is used ina read only manner 3 A Register RG this is a working register whose contents may be set using the SET verb from a range of sources or may be incremented or decremented 5 2 5 A Label register LABEL this is used to provide a set of integer values which are used to generate unique Labels 5 A Level register LEVEL this is incremented or decremented to provide a level count within a block structured language used to generate level address pairs for the PL 0 machine 6 An Index register INDEX this register is u
30. arger Language with error recovery SL1 This successfully minimised the encoding which we had to do by hand to about 140 integer codes SL1 was used to define successive versions of SL as facilities were added to KHAR The present interface to the KHAR system is by using SL4 These SL languages are more fully described in chapter 6 Diagram C on the following page nou presents a picture of the KHAR system set up to translate AML 1 CJENKINSJ 3 4 AML 1 Language Descr iption In SL 4 Program in AML 1 Diagram C coding process coding KHAR SITI DICTIONARIES Diagram C 3 5 unix assembler code for motorola 6890 KHAR can deal with languages which have a context free grammar CFG such as AML 1 in which no semantic knowledge is needed to parse sentences successfully in one pass The system up to this point is only scene with establishing that the syntax of the program is correct and syntax here is used in its narrowest sense As remarked by Watt in the introduction to his thesis CWATTa practical languages do not have Context Free Grammars We see this easily enough iby inspecting the syntax of PASCAL as given in JENSEN or CWIRTHDI and find it true also of PL O CWIRTHb We see immediately productions Like lt typeidentifier gt lt identifier gt and lt callclause gt lt CALL gt lt procedureidentifier gt lt gt Me thus have to modify the syntax of PL 0 the
31. as been checked with all the valid elements of the current state and an error has been detected As we mentioned earlier when the expected symbol is a matrix number the program syntax checker before entry to this new matrix checks the current symbol with the valid element of atirei state of that matrix and enters if and only if a match occurs It is obvious here that when we enter a new matrix the current source symbol will definitely match with one of the valid elements In other words we can not have an error so that an error part is not needed The main matrix is an exception to this Conventionally we use STOP as a dummy action to satisfy the syntax of SL We give on the following page the formal algorithm 4 9 FORMAL ALGORITHM push entry point to matrix no 99 onto stack read first symbol end of program FALSE WHILE matrix on stack AND NOT end of program DO BEGIN WHILE NOT exit AND NOT end of program DO BEGIN IF entry in transition data under pointer indicates end of state THEN BEGIN IF NOT Looking for director symbol in first state of matrix THEN BEGIN report error and call error recovery action END ELSE e BEGIN return via stack to point of departure in transition data from which entry was made to this matrix and access next entry pop stack END END ELSE BEGIN IF symbol under pointer into transition data the expected symbol is a matrix code THEN BEGI IF this stack is not already o
32. ate is 1 The THEN es of IF state NOT EXIT is taken so that this next state is accessed The next symbol example is read and the inner WHILE loop repeated since both conditions are still TRUE The flow of control follows the same path as just described and example matches IDENT This moves access to state 2 and the same flow occurs until the IF current symbol matches symbol under pointer statement is reached when the ELSE part is taken and the next expected symbol in the current state is accessed The loop is repeated but now a match occurs state 4 is accessed and the next symbol DATA is read The loop is repeated with a match occurring which takes access to state 8 and the next symbol is work space the loop is AN work space matches IDENT stae 9 is accessed and HERE is read on looping HERE does not match AT the first expected symbol so the next symbol in the current state is accessed and on looping this does match so state 35 is accessed and the next symbol BYTE is read On looping the ELSE part of the first IF is taken again but the THEN part of the next IF matrix code statement is obeyed 4 13 The point of departure is pushed onto the stackand the first state of matrix 98 accessed via the the index table The register level is increased by one to indicate that a matrix which is a potential goal has just been placed on the stack The loop is repeated so that the usual ELSE ELSE ro
33. ause it is not preceded by a But in the following SL statement 5 2 gt 4 end gt 5 stop after we can have either valid element or end state folloved by svntax error actions So having read the first 2 there is an ambiguitv if this is end state that is if this state is an error state or a valid element Using N before 2 takes its special meaning end of state symbol So if is to be used as the first valid element of a state it should be preceded by a N ry 6 17 DESCRIPTION OF TRE STATENENTS ALL statements in an SL program have the same Layout Being in a particular state we expect the current source symbol to match with one of the valid elements of that state and the next state after that valid element is the address of another statement where we can find the next expected symbol The error action part at the end of each statement is for error recovery if none of the valid elements matched the current source symbol A valid element in statement can be a reserved word an integer an identifier a constant string a matrix Label a statement label any special action null svmbol anv svmbol A next state in statements is an existing statement Label 6 18 ACTIONS The actions are Listed here but the majority of them have already been explained elsewhere The chapter is indicated USE Described in chapter 4 SULL bescribed in chapter 4 STOP The whole process would stop se
34. be assumed to be classes or groups and a code can be asigned to each in order to identify them The assigned codes to the above classes of objects remain invariant for all the programming languages we consider Three types of coding arise as undernoted 1 Fixed codes as shown in the table on the next page Although the choice of these codes is arbitrarily associated to the language constructs we assume that for our purpose they remain invariant for all the programming languages to be considered in the work z 3 17 FIXED CODE MEANING 0 undefined in code files comment delimiter in Language Symbol Data reserved word standard procedure identifier standard function identifier SL reserved words single character delimeter double character delimeter triple character delimeter identifier in general constant identifier function identifier procedure identifier tvpe identifier variable identifier field identifier comment delimiters constant string delimiters unsigned integer constant string transition matrix state number valid element action number nil symbol NI N NJ J J N fJ m m m QCURUNSODONERUNHONVDO NO WN any symbol 100 end file 999 end Line 2 Semi fixed codes which change from one Language to another but are fixed within one language e g codes for language symbols 3 Variable codes which change from program to program e g user defined identifiers The code association is acc
35. cated using SEARCH when the load or store order is generated while scanning lt statement gt 5 30 CHAPTER 6 THE INTERFACE TO KHAR As was introduced in chapter 3 the interface to KHAR for the language designer is a Syntax Language which he will use to encode a Linearized form of the syntax graph and error recovery actions etc needed for a particular pass Ve discuss the Syntax Languages in general then present SLO in detail giving the hand coded tables needed to implement SLO having first discussed the Special Actions which are all that may be used in SLO Ve then present the Linearized graph for SL1 which is translated using KHAR set up for SLO to produce the more useable SL1 We then show the syntax of SL1 and coclude the chapter by presenting that of SL4 the current interface to KHAR SYNTAX LANGUAGES In this part we introduce a family of languages SLO SL1 designed for creating the transition table and the action table of any other language we may implement One of the key ideas in the design of SL series is to make possible the automatic creation of these tables The transition table and action table of at least one of SL languages must be made manually and we do this for our smallest language in the SL series SLO Using SLO we may define the SLI tables which may be created 6 1 automatically by SLO and so on Eventually SL i is suitable as the user s interface to KHAR That is to say that
36. cture of the system and briefly presenting its processing of syntax and semantics before presenting the designer s interface the information stuctures and process involved in converting the external representation of a graph or program to an internal one carrying this through to a practical example of the use of this interface for AML 1 KHAR consists of a pushdown automaton which traverses the syntax graph coupled via the obeying of verbs to a pushdown transducer Errors are dealt with by error productions Since KHAR deals separately with syntactic error recovery and the other tasks error recovery can be often achieved by using an existing production in the language This is done for PL O where the emphasis is on the semantic checking and code emission tasks The pushdown transducer is similar to that of Cordy CCORDY but is structurally simpler since only one aspect of the semantic task is dealt with in one pass For example at no time does KHAR access tables of information about identifiers ALL transmission of semantic or environmental information is via semantic tokens emitted by a previous pass Error situations do not have to be defined Semantic checking is achieved by defining acceptable syntax graphs for valid combinations of types and operators Transmission of information up and down the Abstract Parse Tree is achieved by successive alternating passes through two complementary phases to check the semantics of expressions T
37. cur In the following diagram we show matrix 92 together with its error action matrix in complete form to BYTES BYTE REF IDENT STRING ENDDATA sie pr nie cn ess Oo 1 3 51 STOP esse fesses e ESS 11 2 2 G0 gt 7 le ra lassi es fes 2 EXIT Go gt 8 es asini ran sn awn 3 4 Go gt 9 ces SS 4 3 EXIT 6059 o das jew jet a pere je 5 6 60 gt 10 ra s pass c peus res Pe 6 5 EXIT GO gt 10 JENDDATA BYTE REF DATA PROC INTEFRUPT MAIN 26 25 ele Jesse se tie 2 JExIT EXIT EXIT EXIT 7 sid lane Pale Guzi bazi Lo 8 EXIT anni sea Scat aa EE i 9 4 4 EXITJEXIT EXIT EXIT 9 7 esa e ai 10 6 6 EXIT EXIT EXIT EXIT 10 Note that the anything symbol 26 is always used to force looping in an error state until a symbol is read and matched To send the pointer to an error state after error detection ve 4 23 L have action GO For example at the end of state 2 we have G0 gt 8 which means go to state 8 for error recovery Se as attus error action parts from the states and putting them into error states is useful when the same action patrs are repeated for several states We can adopt a mixed policy If an action is used o
38. ded as entries into tables for interpretation by a single translator engine a donkey engine which laboriously translates source code into object code while consuming as little work space as possible This engine has been naned KHAR the farsi for donkey The desire to be able to develop stand alone good compilers was advanced as one reason for adopting a multipass approach A second 1 2 issue is that of portability a good electrical engineering language must be rapidly adaptable and transportable as new microprocessors appear on the market The general issue of portability and one specific approach to it the abstract machine is discussed by Poole in CBAUERed The PASCAL compiler produced by the Free University of Amsterdam generates optimised code CTANENBAUMb which is interpreted by a machine EN 1 designed to be portable across a range of microprocessors A similar approach to use pseudo code as a step in the production of machine code was presented by CPASKO These approaches just outlined are examples of the use of intermediate abstract or virtual machines to map the programming environment and accessible abstract machine of PASCAL onto different actual hardware The implementation of the AML sequence of Languages cannot use this approach since the virtual machine manipulated by the programmer must be the same as the actual underlying hardware Me require a high level language whose virtual machine manip
39. development KHAR however is table driven throughout that is not only in syntactic checking semantic and code generation actions but also in error recovery Notably the error recovery and reporting are driven by the syntax information Conventionally major attention has been paid to the syntax phases of compilation when considering table driven methods Wirth discusses this WIRTHD and this was the basis of Glennie s pioneering work CGLENNIEJ Cordy CCORDYJ has introduced semantic graphs as a means of defining the semantic actions to be taken by shade phase of a compiler given that the input stream is guaranteed to be free of syntactic errors This table driven approach has been included in the system described here although in KHAR the number of semantic primitive operations can be reduced due to the multi pass approach adopted 1 As discussed by Bauer in BAUERed and by Wirth in CWIRTHbI a 1 4 systematic method can be used to derive the coding of a translator from the grammar of its language into the code for a suitably defined abstract machine Both state that error recovery cannot be treated in such a systematic way and that ad hoc methods are required necessarily requiring an understanding of the errors most Likely to be made and of the syntax of the language so that sensible recovery can be attempted Error handling and recovery in an efficient manner has been attacked using table driven methods by James JAMES
40. e BYTES BYTE and REF and the corresponding next states are 1 3 and 5 respectively 4 3 To fill the elements of row i of a transition matrix we put the corresponding next states under the appropriate columns Then the set of elements corresponding to these columns are valid elements of that state The other elements of this row would be empty Ve do the same for all rows and we have a transition matrix for that graph Ve associate the graph number with its transition matrix and call it the matrix number The corresponding transition matrix of graph 98 would be the following matrix BYTES BYTE REF IDENT STRING ENDDATA CANNES 1 RE 2 EXIT 3 MEZO 4 3 EXIT 5 J e 1 6 5 ext fig 4 transition matrix number 98 This is not the final form of our transition matrices There is a syntax error action part for each state to be used for error recovery and also associated with each valid element of each state there would be some semantic and code generation actions
41. e compiler to a minimum and make it possible to use computers with Little available read write storage A process might itself be set up to read one or more input streams in a prelude or setting up phase Thus the general model of the work can be illustrated as shown in the following figure 2 2 l source program gt processi lt intermediate file 9 gt table gt process 2 lt of EN sie infor intermediate file lt mation intermediate file lt gt gt process n code lt lt This shows a number of subprocesses which transform a stream of symbols which can be understood by the programmer as a program to a stream of symbols which is a program for the actual machine The input consumed by one process is the output of a previous process and is the concrete Linear representation of a data structure 2 3 This data structure is transformed into output in some specified way by the process that is through the execution of some set of instructions a program or a procedure The table of information on tad the right of the figure given above is a set of files associated with processes l It should be noted that this base of information changes state as the result of the process The individual processes mav need access to
42. e 16 action rw file consists of all the reserved words used in syntax error semantic and code emittina actions and is made by hand That is these are reserved words used in our small language SL File 25 is also made by hand and it consists of all the symbols used in the language we wish to implement plus all the symbols used in our small language Apart from these two files 16 and 25 the others are made by executing programs in the system which are explained in the next section 7 9 PROGRAHS used in the system There are 14 programs used in the system In this section we explain their tasks the files they use and the files they create We refer to these programs as A B N program A This program reads the language symbols file 25 and makes three files 1 codename file 1 2 cnindex file 2 these two files are explained in chapter 3 3 define file1 3 On this third one we write the lengths of the first two files and the base code We append more information to this file in the other programs program B In chapter 6 we saw that the terminal symbols ef a Language can be categorized into different classes This program reads files 1 and 2 then appends some information to file 3 such as the number of items in each class the position of that class in file 2 and so on for those classes of objects which are not present in the language negative numbers are placed for the number of items in that class
43. e allocation of state numbers to state Lines is otherwise free Syntax graphs have to be translated into a data structure To do this in the KHAR system we use the transition matrix as an 4 2 intermediate step in preparing the input to the translator Transition Katrix In Sede lt 6 translate syntax graphs into their appropriate data structures we first interpret each graph to a transition matrix This is best explained through an example Therefore we translate one graph from AML 1 say syntax graph 98 to its corresponding transition matrix We refer to directed Lines as states and these are already numbered on the graphs so we know how many states we have say m Also we can count the number of valid elements of the graph say n First we draw anm X n matrix Assign valid elements to the columns and state numbers 0 to m 1 to rows In this particular example we have 7 states Q to 6 and 6 valid elements vis BYTES BYTE REF IDENT STRING and ENDDATA since size matrix 93 is IDENT or STRING A tine leaves each valid element of each state This Line is either an exit Line or a state Line In case of a state line it has a state number and according to the state we are in we call this the next state So for each state i ue Have some valid elements and corresponding to each valid element there is a next state or exit code In our example the valid elements of state O ar
44. e chapter 4 Loop n See chapter 4 n is any state number in the same transition matrix E on See chapter 4 n is any state number in the same transition matrix ia 6 19 POP PUSH amp SET These three actions are semantic actions described in chapter 5 EMIT and ERROR These two actions have the same arguments The difference is that EMIT outputs on output file and ERROR outputs on error file They can have any number of arguments seperated by commas in parantheses Arguments and meanings are as follows css outputs the current source symbol RG outputs the content of register constant string outputs the pointer to constant string xn u outputs the n th element down the stack XINDEX outputs the element in the stack accessed bv the current value of INDEX ZINDEX n Mi XINDEX n outputs the element n from that accessed by INDEX If the argument is followed by a full stop then the actual Symbol would be output 6 20 For example EMIT CSS outputs the internal code of current source symbol but EMIT CSS outputs the actual symbol NL outputs a new Line SP outputs a space LN outputs the Line number TAB outputs tab SYNTAX OF SL4 SL4 is the current interface to KHAR We present its syntax as a set of graphs and give its encoding in SL1 6 21 cn SYNTAX GRAPHS i Matrix 99 SUO 390 i Bulqqlwa sposo
45. e end of each block in an SL Language at the time of code 6 6 emitting SA2 This is normally called after each state number is read The action is to remember starting point of the current state in transition table SAS This puts the current symbol on the first avalable item of the array transition table EM This changes the sign of the next state symbol and puts it in the transition table array so that at the end of current matrix they are recognized as they are negative and changed to their actual address by action SA1 SAS This action remembers the beginning of each matrix and initializes all elements of state addresses array into a negative number 6 7 SA6 This action comes at the end of the Last block in an SL program counts the number of matrices records matrix codes with the address of the beginning of each of them in the transition matrix records the length of transition matrix writes transition table on the appropriate file and writes the action table on its file s Puts the current symbol on the action table SAB Puts the next state on the action table SA9 Puts a zero on transition table Puts a zero on action table SA11 Puts the current pointer of action table on transition table 6 8 SA12 Put rw stop on the transiton table This is the only syntax error action in SLO Syntax error action In this part for SLO we have no syntax error recove
46. e use symbol gt to Link symbols with their next state and to separate the alternative symbols So the error action part of state 1 can be written as ENDDATA gt 2 DATA gt 7 PROC gt 7 INTERRUPT gt 7 MAIN gt 7 and after putting the error actions into the matrix 98 it would be n 4 21 as follows BYTES BYTE REF IDENT STRING ENDDATA 25 ENDDATA gt 2 DATA gt 7 PROC gt 7 INTERRUPT gt 7 MAIN gt 7 SYTE gt 4 JENDDATA gt 4 DATA gt 7 PROC gt 7 INTERRUPT gt 7 MAIN gt 7 BYTE gt 4 ENDDATA gt 4 DATA gt 7 PROC gt 7 INTERRUPT gt 7 MAIN gt 7 REF gt 6 ENDDATA gt S DATA gt 7 PROC gt 7 INTERRUPT gt 7 FAIN gt 7 JREF gt 6 EMDDATAD6 DATA gt 7 PROC gt 7 INTERRLPT gt 7 MAIN gt 7 ERROR STATES An alternative implementation of adding error action parts to transition matrices is to add error states to the matrix and to add the set of follow symbols to the head of the matrix Since these matrices are an intermediate representation between graph and encoded form we used a compressed representation which shows these additional 4 22 states and symbols together with any of the ordinary symbols of the matrix used in error recovery in an error matrix Access is made to the indicated error state when an error occurs and the entries in these states direct the scanning process back to a normal state in which recovery will oc
47. ent symbol there is no match and the next expected symbol is accessed As this is 0 end of state on looping the IF end of state statement has its THEN clause executed for the first time and the IF NOT Looking for 4 15 director symbol statement obeyed Since we are seeking a director symbol the ELSE clause is executed and the point of departure into 95 recovered from the stack which is popped and level is decremented so that Eis one matrix remains as an unsatisfied goal and the next entry accessed as the expected symbol On Looping this IF NOT statement is obeyed again since the end of state marker 0 Lies under the pointer The departure point into 96 is recovered and the next entry in that state accessed the stack being popped and level decremented beconing O The accessed expected symbol is ENDPROGRAM so on Looping a match occurs The next state is EXIT so exit becomes TRUE and on Looping the inner WHILE Loop is left As exit caused the termination of the loop the exit from matrix part of the IF NOT end of program statement is obeyed The point of departure is recovered the stack is popped and becomes empty The accessed next state is EXIT so exit becomes TRUE On looping the outer WHILE loop is left since the stack is empty and execution concludes with reporting that the analysis is finished ERROR RECOVERY A possible implementation of error recovery could be made by writing the a
48. er concerns In KHAR a form of intermediate text is used which is chosen after balancing the requirements of readability to aid development and compactness to save space In principle further research could be undertaken to determine the most effective form of encoding for the system based on known or measurable statistical information about programs written in each language but we do not consider this further here Integer representation requires two bytes of storage at most for each code in read write storage and between 3 and 5 characters on backing storage including the space separating integers SYNTAX IN KHAR As remarked in CWATTa a CFG grammar serves as a means of 2 8 communicating both between the language designer and the programmer and between the language designer and the implementor Watt states that a well designed CFG can similtaneously satisfy all the requirements but that unfortunately typical programming languages are not strictly context free Examples of features of a typical language which defy description by CFGs are the correspondence between declarations and spa lacatione of identifiers and the compatibility of formal and actual parameters In a programming Language with generalised data types even type compatibility cannot be defined by a CFG Watt argues that syntax should be extended to cover all criteria for well formedness of a program which can be determined algorithmically Our vie
49. es to the other graphs is otherwise free The table below shows the codes associated with these seven nonterminals Nonterminal Graph code 1 Program 99 2 DbDeclaration 98 3 Location 97 4 Statement 96 S Simple statement 95 6 Conditions 94 7 Size 93 In each square box of the above graphs also is written the appropriate graph code next to their nonterminal names The next table is also useful which shows the number of occurrences of each 3 25 nonterminal in the other graphs Syntax Graph 99 98 97 96 95 94 93 Number of Occurrences once in 99 once in 99 and once in 98 twice in 99 and three times in 96 once in 96 and once in 94 twice in 96 once in 98 3 26 gt CHAPTER 4 SYNTAX amp ERROR RECOVERY In this chapter we discuss the use of syntax graphs and the intermediate step in their encoding the transition matrix We give the algorithm used to check syntax graphs supported by an example before discussing error recovery and its encodina SYNTAX GRAPH We always represent the given syntax of a language as a set of graphs the so called syntax graph as in CWIRTHbI We assign a label to each graph from 99 down to 50 This allows for 50 graphs Label 99 is devoted to the main graph Allocation of Labels to the other graphs is otherwise free but conventionally from 98 downwards It is useful to make a table of occurrence of nonterminals in the
50. expressions shold 8 6 be sufficient APPLICABILITY TO PROGRAMMING LANGUAGES FOR MICROPROCESSORS The problem of language design for microprocessors has been brit outlined in the introduction In summary we may say that two conflicting design goals have to be attained The language must provide the user with access to all the features of the machine yet provide him with all the protection which can be given by a modern high level Language Yet another aim of the designer is to make the language useful for the programming of more than one microprocessor If he succeeds in doing this then the user will have Lost the semantic clues given to him by the peculiarities of the assembler about the semantics of the machine for which he is programming Ve think that the ability to introduce separate definitions of the semantics appropriate to different machine architectures as separate passes associated closely with the code generation for that machine which exists in KHAR would allow the language designer to check the static semantics of the program by defining an appropriate pass 8 7 gt CONCLUSIONS This work on KHAR which began by reconsidering compiler technology to achieve a highly multipass translator system has resulted in a system which although not fully extended in this work both will be useful in research into the design of high level Languages for low level programming because of the clear interface provided for
51. first expected symbol WHILE of matrix 96 accessed via the index table On Looping there is no match and the expected symbol IF accessed again no match is found and matrix 95 is accessed as the next expected symbol Gn looping the matrix departure point is stacked level increased and the first state of 95 accessed giving CALL as the expected symbol On Looping CALL is not matched as the current symbol is CODE and access made to CODE A match occurs on the next loop and results in 3 becoming the next state The next symbol is read ENDCODE and on looping matches since this is the only expected symbol in that state The state code after ENDCODE is EXIT so exit becomes TRUE and the next Loop exits from the inner WHILE loop Level having been set to O to show that matrices 96 and 95 are now satisfied goals that is a member of a director set has been matched The point of departure into 95 is recovered and the next state entry for 95 is found to be EXIT exit etdes TRUE On looping the same flow of control occurs so that the next state entry after 96 is accessed the stack popped but the entry is 24 does not equal EXIT and the loop is obeyed again with only 99 left on the stack exit being FALSE The symbol to be found however is matrix 96 so the complete process described above from On Looping the expected symbol is found to be a matrix number is repeated until CODE becomes the expected symbol Since ENDPROGRAM is the curr
52. ge SL1 99 L J 98 LC J 97 0 C gt 1 1021 gt 2 sa1 sa2 2 C gt 3 3 22 gt 4 sa2 4 gt 5 5 a gt 12 sa9 sa11 23 gt 6 sa3 6 gt gt 7 7 22 gt 8 sa4 8 gt 9 sa9 sa11 gt 11 sa9 sa9 a gt 12 sa9 sa9 sa9 sail 9 97 gt 10 sa10 2 10 gt 11 A gt 12 sa9 sa11 11 23 gt 6 sa3 12 98 gt 13 sa10 130 gt 14 14 23 1215 15 gt exit sal sas 791 O exit gt exit sa7 stop gt exit sa7 go gt 1 sa find gt 2 sa7 1 22 gt exit sa8 203 sa7 2523 30 23 gt 4 sa7 4 go gt 5 sa7 gt 3 sa7 23 gt 4 sa7 5 22 gt 6 sa8 6 or gt 2 sa 25 gt exit 0 sa0 gt 0 sa7 sa1 gt 0 sa7 sa2 gt 0 sa7 sa3 gt 0 sa7 sa4 gt 0 sa7 sa5 gt 0 sa7 sa6 gt 0 sa7 sa7 gt 0 sa7 sa8 gt 0 sa7 sa9 gt 0 sa7 sa10 gt 0 sa7 sa11 gt 0 sa7 sa12 gt 0 sa7 25 gt exit 6 16 THE END STATE SYMBOL We choose a special character which is rarely used in programming languages and assign it as end state in the SL languages In this implementation we use as our end state symbol If it happened that was one of the valid elements of a language in use we may change that to another character if we Like The ambiguity arises only if we have as the first valid element of a state For example in the following SL statement 5 end gt 3 721 stop the first 2 is recognized as a valid element because after we expect valid element but the second one is end state bec
53. gramming languages Boston 1973 James L R A Syntax Directed Error Recovery Method Technical Report CSRG 13 University of Toronto Computer Systems Research Group 1976 Jenkins D G A Microprocessor Language version 1 private communication Dept of Computing Science Univ of Glasgow 1976 Jensen K amp Wirth N PASCAL User Manual and Report Lecture Notes in Computer Science vol 18 Berlin Springer Verlag 1974 Kernighan B W Plauger P J Software Tools Reading llass Addison Wiley 1976 references 2 CLECARMEJ Lecarme 0 Bochmann G V A Compiler Writing System User s Manual Departement d informatique Univ de Montreal Dec 1974 revised July 1975 CPASKO Pasko H J Pseudo Machine for Code Generation Tech Report CSRG 30 Univ of Toronto 1973 CPIERCE a Pierce R H Source language debugging on a small computer Comp J vol 17 no 4 pp313 317 1974 CPUG Pascal Users Group DEC PDP 11 ESI Implementation Notes Pascal News nos 9 amp 10 p83 Sept 1977 CTANENBAUMa Tanenbaum A S A General Purpose Macroprocessor as a Poor Man s Compiler Compiler IEEE Transactions on Software Engineering SE 2 vol 2 June 1976 pp121 125 TANENBAUMbI Tanenbaum A S Implications of Structured Programming for Machine Archtecture CACH vol 21 no 3 pp237 246 1978 CWATTa Watt D A Analysis Orientated Two Level Grammars PhD Thesis
54. he only 3 1 information placed on the stack of the transducer is in the form of the internal code for tokens This enables the handling of considerable programs within modest read write store requirements Diagram A shows a primitive KHAR system This structure requires that the language designer constructs the driving tables by hand Effectively he has to machine code KHAR This is an unacceptable interface for general use and undesirable for the development pork on KHAR We clearly require a translator from an external representation to an internal one as discussed in CWIRTHb This gives the revised diagram B These diagrams follow on the next page 3 2 Diagrams amp B encoding process 4 i dictionaries Diagram driving tables language description tables program code encoding process dictionaries Diacram B 3 3 Ve now have to supply the translator Wirth gives the PASCAL coding of a compiler which recognises a modified ENF to construct the data structure required to drive a syntax analyser for a language In the KHAR system we use the primitive version supplyina a hand coded pair of tables to implement a zeroth version of a Syntax Languace or Small Language SLO and add table building actions to KHAR SLO has the smallest possible syntax and allows access to the minimum features of KHAR needed to allow the definition of tables for a l
55. her with the INDEX register and its use in other actions These features were added to KHAR in about eight working hours requiring the addition of 60 or so Lines of code to the program of KHAR The syntax of SL was redefined and the SL translator or table builder recompiled within this time 8 3 PORTABILITY We have implemented KHAR using global variables so that the stack mechanism of the c Language is used for subroutine entry and return only The depth of nesting used is below eight Thus the coding of KHAR as it stands demands only a minimal support from the hardware for nesting of subroutines The data structures in KHAR are all one dimensional integer arrays Thus a simple machine architecture with Limited indexing capibility should be able to support the KHAR machine The only other hardware requirement would be inexpensive secondary storage capable of random access from KHAR The final requirement for portability would be a version of KHAP written in a language supported by itself The code generation passes would be redefined to generate code for the new machine and the system recompiled CLEAR B FLEXIELE INTERFACE The interface to KHAR is essentially graphical and its encoding only requires a knowledge of the syntax of SL and the semantics of the simple KHAR mechanisms t The interface itself is completely independent of the language used to implement KHAR and thus remains invariant across
56. ier 8 R i 88 First Pass lt lt 86 lt gt Rs I 9 V 32 lt 309P1 lt 1V33 lt HOP AIG TvJN FUOPI lt T 1V3N lt 449P U vI3I lt ZUOPI mL v3I lt 799P lt vga quopi mk vas JUDPpie 1 vale 799P lt XIH 1NI lt qUSpi lt LvO13 Sg 5 29 Matrices 89 amp 88 Second Pass CODE GENERATION We now discuss code generation in terms of generating code for a simple assembler for the PL O machine that is we assume a two pass process which will generate code addresses A simple assembler could be constructed as two KHAR passes but we take the discussion of assembly code as sufficient for the work at this stage The placing of the code emitting actions in the syntax is derived by Re of the compiler code in CWIRTHb The use of the KHAR semantic mechanism to build the information needed for code emission echoes CCORDY but again the separation of the semantics out from the emission simplifies the graphs and the understanding of what is happening The method uses the register LABEL to generate labels for use in transfer of control instructions and LEVEL and RG to construct the address information on the stack required to generate storage references A triple CSS LEVEL RG is pushed onto the stack for each identifier encountered in the VAR part The triple is lo
57. ins all keywords of that language So the system to be able to compile an SL program should know both sets of keywords These two sets of keywords are unchanged except when we add a feature to the language SL in which case we update the set The other set is the keywords of the language for which ve intend to set up our system We concatanate thesertii and call it keywords Sl keywords always come first This ensures that the codes for SL keywords are always the same 6 5 USE OF POINTERS IN TRANSITION TABLE In our transition table each valid element is followed by three potnters 1 pointer to the same table to indicate the next state 2 pointer to semantic actions 3 pointer to code emitting actions If any of the last two pointers is zero it means there will be no such actions for that element However if the valid element was a matrix number the case is a Litle different as follows 1 the same as above 2 pointer to either type of action before entering the matrix 3 pointer to eitheritype of action after entering the matrix SPECIAL ACTIONS SAO TO SA12 These are the only ones available in SLO They may Of course e be used in all other SLs and in any programming language if necessary N This is the null action used to satisfy syntax of SLO Changes the state numbers to their actual addresses of the beginning of the appropriate state n transition table This is done at th
58. izing work storage requirements We also note the applicability of KHAR to research in Language destan because of its clear and flexible interface We discuss the portability of the KHAR system and its implications for the FSE of compilers for microcomputers We also compare the features of KHAR with a compiler writing system CHAPTER 1 INTRODUCTION MOTIVATION At the start of this work the need for small multipass conpilers to work in small background partitions within on line tire critical process control systems was known to exist PIERCE As no clear candidate existed or exists for a standard real time Language any method adopted would have to be language independent and if possible portable Secondly the necessity to research language designs for use by electrical engineers moving from logic subsystems to microcomputer gii is urgent High Level Languages such as PL 1 and PASCAL are inappropriate unless severely sub setted Any language seeking to be of use to engineers must allow direct access to the byte and multiple length features typical of microprocessors while imposing the necessary conditions for reliability say type checking restriction of access to variables and code structures Such a language called A Nicroprocsss r Language is under development JENKINS The development process consists of the production of a series of refined languages AML 1 AML 2 and so on Refinement requires the use of each lang
59. language used to demonstrate the treatment of semantics in the KHAR system in this thesis chapter 5 to remove this semantic intrusion to produce the syntax of a CFG We use the version of the system outlined so far to check the syntax of a PL O program To deal with the semantics of PL O and any other language we focus our attention on one particular aspect of the semantics and draw a syntax diagram with actions placed to deal with that aspect This is then translated so as to drive KHAR to form a pass of the compiler that the KHAR system will become for PL O This approach allows a designer of a language to focus attention one aspect of it at a time and also allows a user to see the effect of a feature of the language specification by inspection of a diagram Where the internal structure of KHAR is too simple to permit full semantic checking as in the propagation of leaf attributes up the abstract parse tree of an expression we define two passes through KHAR A previous pass has appended every identifier with its type These two passes then run alternatively the first amending the presentation of the first operand operand operator triple encountered so that the second pass can use a simple syntax graph to check for the valid operand type type combinations The approach is discussed in more detail in chapter 5 We emphasise that this multi pass approach makes it unnecessary to include in the KHAR machine itself any concept
60. le 17 is made at the end of this process This process is used after any change in language symbols or our Small Language Process 4 Code transition table of SLC and then execute program K This process terminates with having transition table of SLO made with exactly the same structure as those made automatically by program N Process 5 Code action table data of SLO and then execute program L This process creates the action table for SLO with the final structure usable in program N 7 16 CHAPTER 8 DISCUSSION amp CONCLUSIONS We first discuss the extent to which KHAR meets the objectives of the research as regards the use of working storage and the overall size of the system We then consider how the multipass approach leads to a simple structure for KHAR which in turn gives it both extensibility and portability The need to have a clear and flexible interface for the language designer is discussed The multipass approach adopted together with the graphical Location of semantic actions within the syntax of a language defined as operations on a simple stack mechanism leads us to consider the potential of this approach for the definition of Language semantics ke reconsider the Limited range of languages for which KHAR was intended in the Light of its flexibility and suggest that the approach might be extended to languages such as PASCAL or into other fields of application We conclude by c
61. lt 01 PU00 lt JITHM JUBWe103S NIH lt 48013 pPUODOJ lt IT du 3 U2W2707S GNIS EAN ESRI JS oNISJd cd Pie ivo lt UOP lt S 5 21 LS 96 ODD expression Lexpressi on TITTTI lt gt lt d E N expression Matrix 96 PLO FG 5 22 i Matrices 95 94 amp 93 95 PLS 15 94 factor NEC EB WA 93 ident gt DUO e I gt Lexpression gt 5 23 ATTRIBUTE PROPAGATION IN EXPRESSIONS As outlined in the introduction to this chapter PL O does not require that attributes be propagated within the Abstract Parse Tree We therefore discuss this problem for a language with similarities to ANSI FORTRAN Ve assume that an earlier pass of KHAR has emitted reverse polish or post fix code and has appended type tokens to the identifiers in expressions As in CCORDY we assume that expressions are bracketed in some convenient way The attributes of the leaves identifiers have to be propagated up the tree so that semantic checking can take place This is done by repeatedly scanning the Linearized expression from left to right dealing with one operand operand operator triple at a time As language surveys have shown CTANENBAUNbI the majority of expressions are very simple so this labourious approach will only occasionally result in heavy overheads We presen
62. mation such as attributes addresses dimensionalities and so on will be collected and added to these tables which will be referenced at a later time in the same process to check for semantic errors For each identifier we have one entry in the table and the amount of information we need for that depends on the type of that element therefore we have variable length entries in the table For Algol like languages in which procedures may be nested the same identifier may be declared in different procedures arid each such declaration must have a unique table entry associated with it Each time an identifier appears in the input stream it carries some information This information will be checked against the information we have already in the table by our semantic operations SEMANTICS IN KHAR Four kinds of semantic operation are presented in the semantic charts of CORDY These operations which provide different kinds of actions are 2 11 gt 1 Normal Operation 2 Parametrized Semantic Operation 3 Emitting Semantic Operation and 4 Semantic Choice Operation Tables stacks queues and so on are called semantic data structures The operations are called semantic operations and together a semantic data structure and its associated operations are refered to as a semantic mechanism The semantic operations named above are meant to provide a complete set for accessing and managing a semantic data structu
63. mentor are the same person in the KHAR system since the definition of the graphs can be encoded translated by the system and used to carry out a compiling process for his Language Gries discounts some error recovery techniques since they involve semantic information being discarded which does not occur in this approach since no semantic content has been handled So a wide range of methods could be applied because of separation of the checking of syntax and semantics This has not been developed in KHAR but as was remarked above error correction can often be achieved by using an existing production or by adding a few error productions This allows two nere the designer to concentrate on Language definition at the separate levels ani to ignore the error problem although he can improve performance by adding error productions if he wishes SEMANTICS Semantic analysis is a part of compilation which comes between 2 10 syntax checking and code generation for the purpose of checking the semantic structure of the source program To check for semantic correctness we need information about attributes of all identifiers which are found in the declaration part of the source program In semantic processing we use information tables which we created in the process of encoding the symbols in internal form These tables contain information about identifiers integers and character strings In this process some semantic infor
64. n the stack as unsatisfied goal THEM BEGIN push value of pointer onto stack used to recover this point of departure into matrix access new goal matrix via index table to matrices record another matrix as an unsatisfied coal on the stack Level level 1 END ELSE access next expected symbol in this state END ELSE not matrix code could match current symbol BEGIN IF current symbol matches symbol under pointer the expected symbol MIR BEGIN IF next state NOT EXIT THEN access next state finding new symbol ELSE exit TRUE level 0 sets all matrices on stack as satisfied goals read next symbol END ELSE access next expected symbol in current state I END 27 END END 4 IF NOT end of program THEN exit from matrix has occured BEGIN recover point of departure from stack pop stack IF next state is EXIT THEN exit TRUE ELSE BEGIN access next state to find an expected symbol exit FALSE END END END AN EXAMPLE We describe the action of the algorithm for the small matrix described above Consider the following program which satisfies the syntax of ANL 1 but has no practical meaning and make reference to the transition matrix given on the next page which is part of the encoding of graph 99 for A L 1 PROGRAM example HERE DATA work space HERE BYTE work 1 ENDDATA MAIN CODE ENDCODE ENDPROGRAN 4 11 1 U U U 1 U i i 1 e e a a e e e a e e a L e s a e
65. nce we leave it in the state but if it is used several times we introduce an error state and use the GO verb GENERAL ACTION The use of error states and the GO verb allow the user of KHAR to introduce the kind of error recovery used in AMKANJ an improved form of panic action in which a set of symbols is kept in existence on vhich recovery may occur The technique of CAMMAN is such that recovery can only take place on a symbol within the current non terminal or on a symbol within a non terminal from which this one was called Thus substantial sections of valid text can be skipped in certain circumstances The general action uses an appropriate normal state within the matrix as an error recovery state This is possible because of KHAR s rigid distinction between syntax and semantics The verb LOOP is used to access a state for use in this way The use of the state is exactly as in the normal syntactic scan except that encountering the end of state is not the signal for taking error action but to read a new next symbol and repeat the scan from the beginning of the state 4 24 Since this state may contain non terminals and so on the follow set on which recovery may occur is augmented automatically by all the director symbols of these nonterminals SUMMARY OF ERROR ACTIONS USE n the state n should be an error state LOOP n n is any normal state within the matrix not an error state GO n this action forces acce
66. nd 2 to see if any triple character special symbols symbols made from three special characters exist in the language and if so it reads file 11 replacing all triple character special symbols by their codes and outputs the result on file 13 program H This program codes double character special symbols in the same way as program G did for triples proaram I This program codes all single characters left in the text and finally creates file 15 On this file we only have integers program J There is a file of all those reserved words which are used in syntax error recovery semantic and code emitting actions used in our small language This file is called action rw file and numbered 16 We arrange for this file to be coded using programs C through I Now we have action rw file and its appropriate code file file 15 This program reads these two files and writes some constant definitions to be included in program N As an example suppose we have our action file looking as follows Et 7 12 nl sp rg and Di code file as 132 135 139 the output of this program would be as follows tidefine nl 132 Hdefine sp 135 define rg 139 which are constant definitions valid in the programming language C program K The T tables of all the languages we implement are made automatically by program N except for the smallest language in the SL series which is made by hand Once this is done we have the transition table f
67. ng ends start constant string and end constant string symbols So any number of characters appearing between these tuo symbols are considered as one constant string and will be coded to a single internal code If the end const string symbol is itself a part of a constant string it must be typed two consecutive times inside the constant string For example if these symbols for a programming l anguage are START and END then write START neu is the symbol to end a const string END could be a write statement to output END is the symbol to end a const string In this data the fixed code 0 as shown in the table of fixed codes means comment So any text appearing in the data after 0 is treated as comment A typical example of this data follows which belongs to the programming language AML 1 Please note that the separator between symbols is at least one blank and the format is free 3 12 O from this Line until the first semicolon encountered text is treated as comment and is ignored by the program which uses this data The following are the keywords of the AML Language Numeric fixed codes are 1 for reserved words 2 for standard functions 5 for single character delimiters 17 for comment ends 18 for constant string ends 100 for end of file e 1 abdax at do if in end out ref rep body byte call code data else here main proc then bytes while define endcall endcode endproc program endpr
68. ogram 2 cc cs ge gt hi le ls Lt mi ne pL ra sr vc vs Au Ch 2127 U 17 18 100 There is a program to read this data and make two files the code name file CNF and the code name index file CNIF which are permanent files for their corresponding Language dato CODE NAME FILE amp CODE NAME INDEX FILE CNF is constructed so that it has only one blank as a separator between symbols and the group codes are not located along with the information The group codes are separately located in the first column of the two dimensional array CNIF which is simultaneously created to access the information in the Code Name File The second column of this table contains the pointer which Locates the start of TAR group of information according to the corresponding group code from column one of the table while the third column contains the starting code for each group The codes in column three of this table take into consideration the complete symbol within the grouped information These codes are based on an arbitrary starting code the base code This information on CNF accesed via CNIF regarding the various symbols found in programming languages is then used to code the symbols encountered in the source programs Similarly the symbols located in the transition matrices are replaced by the codes determined from this information and the same applies to the transition tables of the SL Languages To find
69. oice to care or do not care about the tvpe of object being handled in the pass Further the outcome only affects the 8 2 i semantic action taken not the path through the graph Thus our graphs are a reduced form of those in CCORDY The effect of this is to introduce complexity into the graphs rather than into the internal structure of KHAR This complexity can be handled successfully because of the multipass approach adopted Thus use of KHAR to act as a translator for a language with new features say an additional type COMPLEX does not require the introduction of new mechanisms or semantic actions into KHAR For example at one stage in the consideration of attribute propagation within expressions we considered the introduction of a new primitive to operate on the stack However careful reconsideration of the problem showed that syntax graphs and a set of semantic actions could be defined using the existing basic operations to handle this extension The most significant changes between ANL 1 and PL O are the introduction of scope and the need for type checking These changes required the change of KHAR from a machine capable of generating code for a language with a CFG to a machine capable of generating code for a typed block structured Languages The change required the addition of six new operations defined on the stack and the ability to index the stack that is NARK FLUSH SCOPE SEARCH and CHECK toget
70. omplished automatically by a program called codefile maker starting from a base code We reserve all one digit and 2 digit integers for use as fixed codes all 3 digit integers for semi fixed codes 4 digit integers for variable codes and have defined 100 as the base code It should he 3 18 be noted that this is arbitrary and can be readily changed to increase space efficiency A PRACTICAL EXAMPLE When we want the system for a particular language L 1 Switch the system to SL language 2 Write a program in SL for the language L 3 Compile and execute this program this builds up tables which control the recognition and parsing of statements in L 4 Replace previous tables by these new ones and the system is ready for the language L In this part we explain how to prepare the system for AML language ALL details are included and can be used as a pattern for using the system Finally we do some changes in the syntax of A L and see the consequent anc necessary changes in the system AML Programming Language AML is a high level assembler designedCJENKINSI to relieve the programmer from some of the tedium of knowing the exact syntax required to use the addressing modes of the microprocessor via the manufacturer s assembler 3 19 The following table gives the grammar of ANL 1 in BNF lt program gt PROGRAM lt identifier gt lt location gt lt program body gt ENDPROGRAM lt program body gt DEFIN
71. onlv a subset of the information in this table and access is controlled soia These processes numbered 1 n can either be the execution of distinct programs or several executions of the KHAR machine each controlled by a different syntax graph or under the control of one graph with KHAR operating successively in each of three of its four modes Modes 1 2 and 3 are used to translate programs and modes 1 and 4 used to encode transition and action matrices by translating syntax graphs encoded in one of the SL languages Ve do not show the syntax graphs in the figure since they are a separate set of structures determining the processes We show this aspect of the system more fully in chapter 3 GRAMMAR AND FORMAL DEFINITIONS ALL language is based on a vocabulary For programming Languages the elements of this vocabulary are called language symbols or language keywords For each of these Languages is defined a set of rules or formulas which define the set of well formed sentences 2 4 These rules are called productions because they determine how a formally correct sentence of the Language is produced A grammar G Z of any programming language of this class is defined to be a nonempty finite set of productions Z must appear as left part of at least one rule and is called the distinguished symbol Those symbols appearing as a left part of the rules are called nonterminals and the other symbols are called terminal
72. onsidering the application of KHAR to the compilation of languages for microprocessors 8 1 SIZE The present KHAR implementation occupies about 12000 bytes of code with 2000 bytes of constants and requires 16000 bytes of working storage Only 4000 bytes of this working storage are required for the KHAR machine itself The remainder is used for data which can be kept on secondary memory The encoded graphs for PL 0 including code generation require about 7000 bytes and may be regarded as read only constants Thus we estimate that a working compiler for PL O could be implemented using 24k bytes of ROM for the fixed tables and code of KHAR 8k bytes of RAM would Leave about 6000 bytes free for program text and dictionaries since KHAR uses less than half its work space in compiling PL O SIMPLICITV amp EXTENSIBILITV The semantic mechanisms proposed in LCORDVI for SP 6 a severe subset of PL 1 consist of a symbol table modified to behave as a stack and three other stack mechanisms one of which also used entries of the same class as those in the symbol table The mechanisms consist of four stack structures and over 50 semantic actions defined for the structures This is an order higher than KHAR Cordv has to take account of tvpe within his mechanisms and introduces semantic choice actions which choose which path to take We avoid semantic decisions based on knowing within KHAR the type being handled We reduce the ch
73. or this Language SLO in table in file 15 This Aros ca ilju this code file and outputs it on file 18 with the actual structure usable bv program N program L This program is used to make the action table of our smallest language in SL series as in program K 7 13 prog ram M This program reads file 6 and amends some constant definitions on file 19 program N l This a has 4 modes 1 syntax kai 2 semantic checking 3 code emitting 4 transition table making In mode 1 the program reads source text which is now converted into integer numbers from code file and using the transition table checks for syntax validity We use mode 2 when there are no syntax errors and mode 3 when there are neither syntax nor semantic errors 7 14 LE PROCESSES USED IN THE SYSTEM In this system we have the following processes each using a sequence of programs Process 1 Executing programs A and B This process is used each time a change is made to language symbols Process 2 Coding As we explained earlier there are two ways of coding a file 1 code all symbols in this case we execute programs C D F G H and I C reads the input text and I terminates with having file 15 all codes 2 Code all symbols but integers We do the same as above but we use program E instead of D Process 3 Code action rw file using process 2 do not code integers and then execute program J Fi
74. peated with the addition of semantic actions in the following sections The only actions placed in the linearized coding are the error correcting actions and may be seen by inspecting the appendix PASS 2 dealing with manifest constants In SEGA 98 each time on entering block we mark the stack and upon exit we flush the stack that is to remove all entries down to and including the mark For each identifier we use SCOPE to check if it is declared already If it is we use ERROR to output a message If not declared at this level we push the identifier actually its index onto the stack For each constant identifier encountered we push two entries onto the stack its code and its value For other identifiers we push the code and 0 as a don t care marker When we are in factor we SEARCH the stack for identifier entries If the matching identifier has zero as a value above its code then it is not a constant in that scope Remember that constants are represented by indices to their denotations If the value is not zero then it is used in EMIT to replace the identifier code otherwise we emit the current source symbol Note that we set RG to EMIT so that symbols are copied unless otherwise required by the semantic action Ve report an error if any of the constant identifiers on the stack appears 5 6 wi in VAR part as a procedure ident in matrix 98 at the left hand side of in matrix 97 or
75. ppropriate error message and accessing an indicated state in an error recovery table where the error recovery process reads source symbols one by one ignoring them until a specific symbol is read For example ignoring symbols until the end of statement symbol is found This is often called panic mode recovery In this case it fails to report further failures Su that statement if any and also may cause many other errors For example if an error occurs in VAR kz 4 16 z t statement in PASCAL program and we ignore to the next semicolon at the end of that statement we have ignored some identifiers and wherever they appear in the rest of program they are undefined sn will cause new errors Sometimes in many compilers it occurs that because of a single error several error messages will be generated which should be suppressed in the error recovery process After detecting an error a good compiler should try to determine what correct symbol had been intended initially For example if ina PASCAL program the reserved word VAR be misspelt usually all the identifiers will be undefined whereas it could be checked for a misspelling of one of the reserved words valid at that state and there is a good chance it would be corrected in the right manner In our error recovery we have one general action and some specific actions Each of these actions may be used for each of the different languages we implement These error actions are in effect
76. r cm C Cc A T ms O IG UI mm 1 1 1 t t 1 t i I 1 1 1 i i i i e cmo e uw mt I Ot O I 1 LI 1 1 1 1 f d 8 1 e i 1 I i i i U U i U U 1 U 1 H 1 lt U 1 LI U a t 1 1 Ole 1 i i f 1 O I t 1 LI t c I i 1 1 1 1 H a I 1 i 1 1 U i l t U i 1 1 U U U U U LI 1 1 i i LI i 1 I 1 Ole tI N I ml Lin O IN IO POLO SINIM t IMIN 1 Min Ol i t 1 1 t 1 Iw lt I i NV INI MIMIMI and set stack is PROGRAM is read The outer WHILE Loop is entered 4 12 The index to the matrix number 99 is pushed onto the symbol first since neither condition is FALSE note that end of program the TRUE by enccuntering the end of file sentinel code while expecting another Language symbol it is an error condition The inner WHILE loop is also entered The entry in the transition data first accessed corresponds to PROGRAM so the ELSE part of the IF statement is taken As the symbol under the pointer is PROGRAM the ELSE part of the IF statement in the ELSE part is taken The first symbol read matches PROGRAM so the next state entry is examined This is the same as the entry in the transition matrix under PROGRAM so that the next st
77. re and the operations on any of the data structures are restricted to these four classes This restriction makes possible a generalized automatic chart interpreter Semantic mechanisms which are commonty needed in producing a set of semantic charts for a programming language are discussed These are 1 symbol table mechanism 2 type stack mechanism si 3 count stack mechanism and 4 address fix mechanism The multipass structure of the KHAR system reduces the mechanisms to one plus a group of registers The operations on the single stack are fewer in number and are distinguished by having no knowlege of the attributes of identifiers This is discussed fully in chapter 5 2 12 CODE GENERATION In KHAR we are dealing with a language in which the programmer desires to control the code exactly this being a design objective of the system Thus code generation is simple as the high level Language must map one for one into the machine order code This mapping may be constrained by our semantic typing and some orders especially transfer of control orders hidden by the flow of control structures of the language At the other extreme say PASCAL we may generate code for any suitable intermediate pseudo machine CPASKO CTANENBAUMbI We have at present generated code for a high level assembler AKL 1 and PL O CHIRTHo 2 13 CHAPTER 3 DESCRIPTION OF KHAR We describe the KHAR system by presenting the stru
78. ry Any error causes the compilation to stop at that point 6 9 SYNTAX GRAPH FOR SLE w SUO 720 jojoeds queus gt p DA lt JOqunu DJO 35 TRANSITION TABLE FOR SLO 99 0 102 0 1 6 2 0 3 34 4 o 5 1 6 21 7 12 8 0 9 3 10 0 1 1 12 L 13 18 14 O 15 0 16 0 17 1 18 22 19 24 20 O 21 6 2 0 23 1 24 25 30 26 0 27 0 28 o 29 1 30 23 31 36 32 0 33 8 34 0 35 1 N OsSNSOV0ONV N o O N N O ru O co O iw O 0 0 O N U 27 68 18 27 68 18 30 20 6 11 EN o O ser OO O O DU O O DV O rs O c ND N 00 N O rJ O O w O o ACTION TABLE FOR SL 0 37 0 0 1 stop 2 0 3 sal 4 sa 5 0 6 sa 7 0 8 sa3 9 0 10 saw 1 0 12 sn 13 sal 14 0 15 sa9 16 sa9 17 0 18 sa 19 0 20 sa10 21 0 22 sa9 23 sa0 24 sa10 25 0 26 sal 27 sa6 28 0 29 sa9 sa9 sa9 sa0 sa12 sa10 u w LI LI LI LI vn EN o Ww LI NO ias 6 12 13 SUO 130 m mi ee Jequnu J x J 0u 3 6 SUO 390 1 B8uj33lwe pos queus 2 ST proa Jequnu Ss 230185 93015 4X3U SYNTAX GRAPHS FO Matrix 99 Matrix 98 za YO JaqgWNnu 9q DJS 09 JUSWa o PI DA 3J4equnu e qpoys n 9 Matrix 97 s PS 6 15 mam lt man The following is a complete program written in SLO for our present small langua
79. s 2 underlined square brackets C and J enclose optional statements or expresions such as Label J 3 three dots indicate repetition of preceding item one or more times CONVENTIONAL COMPILATION PROCESS A compiler or better translator is a program which accepts as its input a source program written in some high level language such as Algol 60 or FORTRAN and produces as its output the appropriate code for a specific computer This output is calted the object program This process is traditionally divided into analysis of the source program and then synthesis of the object code In the simpler analysis part the compiler accepts the source 2 6 program discards those parts of the source which are not to be compiled such as comments and transforms the source into tokens The source program is now a Linear string of symbols the input characters having been grouped into tokens of the language such as language symbols identifiers etc The compiler also builds several tables of information during the analysis part for the definitions of the new tokens for example identifiers These tables are used during both analysis and synthesis phases After the execution of these lexical tasks the source program has been converted into its basic tokens and is ready for syntax and semantic analysis in which the string of tokens is scanned using the syntax rules the tokens are grouped in order to make sentences
80. s Therefore the vocabulary V of a language is the union of its terminals and nonterminals ke will use underlined angular brackets lt and gt for nonterminals to distinguish them from terminals We only consider those languages whose grammars satisfy the following rules 1 The initial symbols of alternative right parts of productions the director symbols must be disjoint The initial symbol of A is the set of all terminals that can appear in the first position of sentences derived from A 2 For every A e n t gi symbol A which generates the empty sequence et the set of its initial symbols must be disjoint from the set of symbols that may follou any sequence generated from A This point is discussed by Griffiths in ch 2 b of CBAUERed and by Wirth in EWIRTHbJ NOTATIONAL SYSTEH To describe these productions we need a notational system or in other words a metalanguage a Language in which we can describe another language The notation we use is called Modified Backus Naur 2 5 Form MBNF It presents an exact explanation of the language construction These notations are as follows 1 underlined braces and are used to enclose multiples from which a choice must be made Expressions may be presented vertically Like this lt label part gt lt const part gt or horizontally Like this lt lebel part gt lt const part gt note the meta symbol vertical stroke between expression
81. s given in the additional material LISTINGS OF THE KHAR TRANSLATOR SYSTEM FILES used in the system 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 In this system we use 26 files as follows codename file code name file cnindex file code name index file define file define file xendefine file code name define file nocomment Call comment removed const file constant file nostring all strings encoded cs file constant string file csindex file constant string index file intgr file integer file noidentifier all identifiers encoded id file identifier file notriplechar all triple characters encoded nodoublechar all double characters encoded code file code file action rw file action reserved word file xdefine rw define reserved word 7 8 18 xttable file transition table file 19 xttdefine file transition table define file 20 ttaction file transition table action file 21 ttcsindex file transition table constant string index file 22 xttcs file transition table constant string file 23 output file 24 error file 25 lang sym language symbols 26 any text this file is any input text for which we wish to convert all its basic symbols to their corresponding codes We refer to the above files by their numbers 1 to 26 Only those files whose names are preceded by an are permanent to the system and the others are temporary files Fil
82. sed to index into the stack and is set by the use of the SCOPE or SEARCH verbs 7 Top of stack register this is used to access the top of the stack it is not explicitly available to the user of KHAR SEMANTIC ACTIONS We may now describe the semantic actions or verbs which operate upon the semantic structures of KHAR MARK This action increases LEVEL by one and puts MARK on the stack FLUSH This action removes all entries down to and including the MARK from the stack and LEVEL is decremented by one If 5 3 MARK was not found on the stack the stack pointer is set to 0 and LEVEL becomes 0 SCOPE SEARCH amp CHECK POP These three actions have similar syntax as shown in SL4 Each one has one argument and two sets of actions One of these two sets of actions will be carried out depending on the result of the SCOPE SEARCH or CHECK action SCOPE first searches if the argument is matched with one of the elements down to the Last mark put on the CORE and accordingly if found the first set of actions is obeyed otherwise the second set of actions would be done This verb is used to check for duplicated declarations SEARCH searches fhe complete stack for a match with its argument Otherwise it behaves just Like SCOPE and is used to locate declared items on the stack Both these actions change the value of INDEX so that values pushed onto the stack next to the matched symbol ma
83. ss to another state without recovery EXIT forces exit from the present matrix to the calling point with recovery Left to the higher Level STOP stops the process of syntax checking SUCC read the next symbol and resume scannina as indicated hn 4 25 zj CHAPTER 5 SEMANTIC PROCESSING amp CODE GENERATION Ve have already outlined the approach we take in the KHAR system In this chapter we discuss the actual mechanism used in KHAR to handle semantics or context sensitivities We then show how these are applied by su their use in the production of a compiler for PL 0 This covers most of the semantic problems encountered in a block structured language but not the propagation of attributes up the abstract parse tree of an arithmetic expression as PL O has only integer variables We illustrate this problem by considering the semantics of expressions in a language such as ANSI FORTRAN THE SEMANTIC MECHANISMS OF KHAR The expression semantic mechanism was used in CCORDY to cover a semantic data structure and the operations upon it His first example of such a mechanism is the symbol table mechanism He asserts that 1 it is universally used 2 contains name of object 3 contains its data type 4 indicates its structure variable array procedure etc 5 contains addressing information and 6 contains auxiliary information dimensionality or number of parameters 5 1 Typical operations upon the symbol table
84. stack If found on the stack there is an error variable identifier redeclared but if it is not found we push that _ identifier onithe stack with 1 above it to mark it as of interest Report an error if any of the variable identifiers on the stack appears in CONST part in matrix 98 as a procedure name in 98 or after CALL in 97 Report an error if ident before in 97 is not on the stack i e it is not declared and marked with 1 ALL other identifers will have been marked with O as in the first pass 5 12 7 Semantic actions Variable Identifiers mark scope css error ln css declared nl push css push 0 scope css error ln css declared nl push css push 13 flush search css check index check index 1 error ln css cannot appear in LHS Ipo hi s csse undeclared search check index check Xindex 1 error ln css not a procedure error ln css undeclared search css error tn css is not defined nl 7 5 13 i LTx3 lt pjuowe oqsj ene R i i 34nd390 4 v lt 390 q 5 14 86 66 Ivens qc STRASS In iP RAS NI Eel eee SD USWS 70 7S qN3 1d u i1o1s lt NIJJE g 199P lt TIVO Il JUSP g 179P albi 7 w ad ident gt L expression gt
85. t a simple example and then give the corresponding semantic graphs Consider the expression I en W X YrT 7Z We assume that it has been translated into an internal form which might be externally represented as W REAL X REAL Y REAL Z REAL Note the use of to represent unary minus and the start of 5 24 expression end of expression markers and We give the semantic graphs and actions for the two passes below while explaining the action of the passes here The first pass transforms our example to lt W REAL gt X REAL Y REAL Z REAL The action of the semantic graph is to stack identifiers with their types until an operator is encountered but emitting operands and their type further than two deep as it does so If the operator is monadic it is emitted folloved by its operand else it is followed by two operands In either case this prefix fragment is bracketed by lt and gt Then the rest of the expression is copied until is reached and emitted The action of the second pass is to match the bracketed prefix operator and its operand s with a syntax graph and semantic code emitting actions The syntax graph defines the valid combinations of operator and types The actual identifiers can be ignored The second graph gives the checking needed for a language which requires explicit type changing The effect of this on our example is to produce
86. t by at least one blank or newline and the same applies to the associated group code number encountered in the beginning of each group R 3 10 The layout of this data is as follows RZE gt Fixed code T endfile lt fixed code a the Language keywords lt associated to the a subset of the items of proceeding fixed code aa a WO ZE CT A Each item of the Language keywords is to appear in one and only one of the above subsets so that each has a unique code These subsets can appear in any order as long as they are preceded by their fixed codes The number of different parts is not fixed It is in this file that we introduce the symbols which may start comment and terminate Commenti They come under the fixed code 17 For example in the symbols of the language Algol W we may write 17 comment or in Pascal 17 These two symbols are separated by a blank and can be any character string The length of this string could be up to the maximum length of identifiers in the Language This means in process of encoding of the source text the system reads and copies all characters until it finds the conment start symbol the first symbol after fixed code 17 then reads and ignores until it finds comment end symbol the second symbol after fixed code 17 3 11 It is also in Language symbol data that we introduce constant stri
87. t matched the current source symbol 3 a pointer to AT for semantic actions 4 a pointer to AT for code emitting actions 1 Although the system includes these in the table whether they are 7 3 required in a sub pass or not the structure could be subdivided if further space reduction were needed Sometimes the error recovery part of a faites in a transition matrix is the same In that case instead of repeatina that part at the end of each state we put it in one state at the end of the matrix calling it the error state and refer to it from the other states Error state has no valid element and in TT it consists of two elements a 0 which can not match anything anda pointer to AT 7 4 A State of TT TT mate ix number matrix number matrix number 7 5 Structure of AT The structure of AT is simpler than TT It consists of a number of action parts Each has some actions followed by a 0 to indicate the end of that action part The pointers to this table from TT are to the start of these parts Execution of an action part terminates n EtcoWA EGiGG the O The diagram on the next page shows this The Structure of AT one state of TT AT element valid element 7 7 FILES amp PROGRAMS USED In this part we explain the programs and the files we use n the system A complete Listing of the source programs i
88. tegers between 99 and 50 lt state label gt integers between to number of states lt valid element gt all Language symbols lt syntax error actions gt depends on which SL i is being defined semantic actions as defined in chapter 5 ERROR EMIT code emitting actions Cspecial actions semantic actions special actions SA1 SA2 SA11 SA12 Different SL languages vary only in the three nonterminals lt syntax error actions gt lt semantic actions gt and lt code emitting actions gt used in the overall syntax of the SL series For SLO these three nonterminals are as follows error syntax action 2 STOP semantic action empty code emitting actions special actions GRAPH OF THE GENERAL SYNTAX A SUO 730 Bu we sposo F SUO 390 gt juDuas 27015 yx u queus o ST pijoa ea A SUO 390 ue T oi mili J a AD 94035 6 4 LANGUAGE SYMBOLS OF SL The present language symbols of SL are 4 tab nl Ln sp rg css set pop push emit error label sa0 sal sae sa3 sa4 sa5 sa sa sa8 sa9 sa10 sa11 sa12 use find go or stop get unget loop exit check search flush index Level mark scope 5 1 SZ 8 amp SEPERATION OF SL KEYWORDS amp LANGUAGE KEYWORDS In our small language we have a set of keywords used in its syntax As mentioned earlier a program in SL is the Linearized form of a language syntax and it conta
89. tely State 0 4 18 There will be no error in the first state of our transition matrices For these kinds of states which do not have errors we place the action STOP as a dummy error action to satisfy the syntax of SL State 1 In this state we expect either identifier or string If any error happens we check for ENDDATA or one of the elements valid after exiting from this matrix As table no 4 shows the graph of this matrix is called only once in graph 99 and by refering to that we see that the follow elements of this matrix are DATA PROC INTERRUPT MAIN So we have a set of symbols and corresponding to each of them there is a next state Vle read and ignore the source symbols until one of the elements of this set appears then we know our next state In the case of ENDDATA we access state 2 of this matrix ENDDATA is a valid element in that state and its corresponding state is EXIT which means exit from this matrix In the case of the other elements of the set we force exit from this matrix To this effect we introduce a new column for the nil symbol code 25 and place a new state 7 in the matrix Its only valid element is this nil symbol for which the next state is EXIT The nil symbol causes a refinemement in the behaviour of 4 19 hos State 2 State 3 State 4 KHAR Nil matches any symbol but no new symbol is read as the next symbol Thus the use of nil makes a key difference in the
90. treatment of ENDDATA and the other keywords on which recovery is made Since a match with unit occurs after the scanner has read one of these this symbol is still the current symbol and may be used to satisfy the syntax of the outer graph from which 98 was called In this state we expect the ENDDATA symbol and the corresponding next state is EXIT Eut if the current symbol did not match we will not Loose much if we go out of this matrix and leave the error recovery to take place in the matrix from uhich we were sent here To do this we just add an exit code under the column of the nil symbol so that if there is no match with ENDDATA exit will occur This state will never produce error so the error recovery part of that is the same as state 0 that is STOP If any error happened in this state we search for BYTE and go to 4 ENDDATA and go to state 4 or for one of the follow symbols as the error action part of state 2 and go out of this matrix The same as state 3 4 20 State 5 Very similar to state 3 but we replace BYTE by REF State 6 We take the same action as state 5 State 7 There would be no error in this state as its only element is the nil symbol To put these error actions in the table we place the appropriate language keywords and SL verbs and symbols in the Linearized form of the matrix as shown briefly in chapter 3 Put at this stage to show these error actions in the matrix w
91. uage by engineers to generate feedback to the designer Since subjective factors are highly important the Language design must be quick to implement be able to respond to user feedback while at the same time providing good error recovery and reporting from the outset for 1 1 otherwise the language would be unusable and no feedback obtained Good languages such as PASCAL were designed to run on large machines CWIRTHa and the majority of implementations of PASCAL still are for large machines The smallest known implementation at present is one for the PDP 11 which runs in a 16k word machine It is thought that this compiler is capable of compiling itself in a 56k byte machine PUG This may be contrasted with the CORAL 65 compiler for the INTEL 8080 which runs in 8k byte neither would satisfy the need to make a high level language available on a typical development system for a small electronics company investigating microprocessors A reasonable starter kit might be expected to A say amp k of RAM but upto 32k byte of ROM If a translator system were made available which as tower work space yet used relatively Little code relying on tables held in ROM or pageable into RAM as required a range of languages could be made available economically PHILOSOPHY This thesis presents the results of re applving table driven methods to the construction of a complete translator scheme in which the actions of each pass are enco
92. ulated by the programmer is as close to the actual machine as prudence and security q of design permit We thus present the KHAR system first in terms of such a language the first of the ANL series AML 1 AML 1 is in fact a context free language since its specification calls for no semantic checking Thus it is ideal for presenting the basic features of KHAR before considering a language of the first type PL 0 which has been described in CHIRTHb The cost of implementing a new language is high since the state of the art is to use the error recovery technique proposed by Amman CAMNANJ This is used in both the Vrije and Belfast compilers for PASCAL Study of the presentation of the PL Q compiler in CWIRTHDI 1 3 reveals the amount of effort required to code the compiler for this minimal language on an interpreted abstract machine designed for the language An alternative approach to the use of a conventional compiler is to use a macroprocessor as discussed in LCTANENBAUMaJ One undesirable feature is that the syntax of the Language may have to be altered to make the task of writing the NL 1 macros possible CBROWNa CBROWNbJ an influence which appears in the design of AML 1 CJENKINSI The addition of any type checking and any attempt at error recovery complicates the task enormously As a result it becomes comparable to that of writing a conventional compiler Table driven compiler techniques are not a recent or new
93. user can find out the exact way in which a sensitivity is handled CHAPTER 2 APPROACH The objective of this work is to construct a flexible and portable translator system for a variety of high Level programming languages to be implemented on small computers We also intend to generate this system in such a way to be able to use it for real time applications and to be highly configurable both in terms of its compile time environment and in terms of the object machine for which it compiles any particular language Regarding the efficiency of the intended translator system we are more concerned about the space requirement of any compiler produced rather than its run time demand Some of the programs in this system are executed only once for each language implemented Their task is to create a few files before any source program can be run These are permanent files as long as no change is made to the language The basic approach to this work is that of Kernighan and PlaugerCKERNIGHAN a system is constructed of a sequence of sub processes each of which consumes as input the output of its predecessor if any 2 1 PRINCIPLES OF THE APPROACH OF KHAR In our approach we aim to break the compiling process into as many separate processes as possible Ideally every separable task is to be carried out by a process with one input and one output stream We aim to reduce the maximum memory requirement in any pass of th
94. ute is followed This does not result in a match as the first expected symbol is BYTES so the next expected state is accessed The Loop is repeated and results in a match as this symbol is BYTE The next state is not exit so the next entry is used to indicate state 3 level is set to zero showing that a director symbol of matrix 98 has been found and the next symbol work 1 is read The loop is repeated resulting in a match with IDENT and the accessing of state 4 The next symbol is now ENDDATA The loop is repeated with no match since BYTE is expected The next symbol in state 3 is therefore accessed which is ENDDATA so that si the next loop a match occurs But the next state is EXIT so exit becomes TRUE and the next symbol MAIN is read The IF NOT end of program encountered on leaving the inner WHILE loop has its THEN clause obeyed so the point of departure is recovered from the stack which is then popped The next state entry associated with this entry point to 98 is examined and is 12 Access is made to the first expected symbol in this state DATA Since the entry point to 99 is on the stack and end of program is FALSE the outer WHILE loop is repeated the inner WHILE is entered and a match made with MAIN in state 12 The state 23 is accessed and the next symbol read CODE On looping the expected symbol is found to be a matrix number 4 14 96 the departure point to 98 is stacked level increased by one and the
95. valid elements of the current state and an error occurs As soon as an error occurs scanning of the input text is no longer controlled by the syntax graph alone The error actions accessed by placing verbs in the error action entry in the table direct the Loana to take the error action chosen by the language designer The errors are caused by unexpected missing or wrongly spelt symbols A good compiler should find all errors in a source program and correct as many of them as possible to reduce the number of submissions ofa job before it is finally completed for other errors which the compiler can not correct it should be able to determine how to continue the analysis when these errors occur For this process the term Error Recovery is used In this system we have one general action and some special actions which we use as the error action part of states in transition 4 8 matrices The error actions can be general for all the different states of all the matrices or different error actions can be added to each state In the latter case usually the recovery is quicker than the first case Each state terminates with an error action part which also serves as the end of state marker That is we check the source symbol with valid elements from the start of the state and if it does not match we check it with the next valid element in the table and carry on until we reach the error action part Then we know the symbol h
96. w is the contary that syntax encompasses precisely those features of a language that can be defined by a CFG and checked by a context free parser In KHAR we deal with context sensitivities by including in KHAR a Limited set of semantic and code emitting verbs which can be placed where required in the linearized form of the syntax and then using semantic graphs which are syntax graphs augmented with semantic actions LCORDVI to define the behaviour of KHAR so as to produce a pass of the compiler which can deal with a particular context sensitivity The advantages of this approach seem to us to be that KHAR need use only one stack of integers to handle semantics as contrasted to the algorithm presented in CWATTb which requires the stack to handle contexts expressed as sets and further that specific sensitivities are described graphically and individually to the user of the language Again as we shall see this approach produces a much 2 9 simpler semantic mechanism than those of CCORDY This graphical approach contrasts with the formalisms of two level grammars and extended affix grammars but of course is unlikely to bear the burden of the proof of correctness which is the distinguishino property of the latter WATTa However the presentation in CEOCHMANI of a compiler writer shows the advantages of the separated graphical presentation which results in KHAR Further we observe here that designer and imple
97. what code should be associated with any symbol the following steps are to be considered 1 a check is made to see if the first character is alphabetic Two possibilities may occur F 3 14 a the first character is alphabetic in which case we form a tentative idea that the symbol may be in any of the first four groups in CNF b The first character is not alphabetic in that case the symbol falls in one of the other two groups i e group with fixed code 5 and 6 If case a happens all four groups may be searched one after the other if required until such times that the symbol is matched otherwise the symbol is treated as a user s specified identifier Although a similar procedure would be undertaken in case b above by considering the length in characters of the source symbol first and one of the two groups can be skipped right in the beginning jhile the matching of symbols is continued the relevant code information from case b is continuously updated so that by the time the search is completed the appropriate code is also determined In this way the digital coding of the symbols is accomplished automatically and can very easily be altered Any alteration in the base code will be accounted for automatically in determining the subsequent codes INTERNAL FORM OF THE SOURCE PROGRAM cont As we mentioned in previous sections standard names and special symbols will be coded as 3 digit integers
98. y be accessed CHECK just checks the argument if it is not zero the first set of actions is carried out otherwise the second This action removes items from the stack POP by itself removes one item POP 0 removes all items and POP N Ee removes n items PUSH This action places one item on the stack PUSH RG puts the contents of RG on the stack PUSH CSS the current source symbol PUSH LABEL the current value of LABEL PUSH LEVEL the value of LEVEL and PUSH alone the value of the following symbol an integer value which is either an integer as vritten or the index to an encoded item SET This action alters the value of the working register RG SET RG pops a value off the stack into RG SET CSS places the current sorce symbol into RG SET LABEL and SET LEVEL the values of LABEL and LEVEL while SET n adds n to RG and SET n subtracts n from RG SET alone as for PUSH places the value of the following symbol in RG SEMANTIC CHECKING IN PL O Semantic checking in PLO is done in three passes In the first pass we concentrate on CONST part in the second pass on VAR part and in the third pass on procedure calls The Linearized SL encoding is given in the supporting material LISTING OF THE KHAR SYSTEM PASS 1 syntactic processing This pass checks the syntax of PL O and outputs expressions in 5 5 postfix notation We do not show the syntax diagrams seperately as they are re
99. y needs a dummy action but it is of no use in state 1 The correct action is given in chapter 5 VALID ELEMENTS OF A TRANSITION MATRIX The valid elements appearing on the top of columns in transition matrices fall in one of the following groups 1 language keywords these elements appear exactly as they are and would be coded automatically by a program using the Language keyword file 2 Aoneeratnals or transition matrices these are already coded to matrix numbers 99 to 50 as mentioned before 3 identifier i 4 constant string 5 integer 6 nil symbol 7 anything symbol Po For groups 3 to 7 above we put the fixed code obtained from the table of fixed codes chapter 3 into the matrix If we have fixed code 26 anvthing svmbol in a transition matrix this is placed in the rightmost of the used entries in a row so that it is encountered after any specific symbols expected in that state CHECKING TECHNIQUE USED IN KHAR For each language we have one or more matrices of these one is the main matrix matrix 99 The matrices are used by KHAR to check the syntax of any source program written in the corresponding programming language For each matrix we have one entry point the first state of that matrix but there may be an exit from any point in any state of that matrix Associated with any valid element terminal or nonterminal within any state there is a pointer to another state of the same matrix
100. yntax directed parsing techniques GRIES has facilitated the construction of efficient deterministic parsers from such syntax definitions He remarks further that CFGs are deficient in two respects Firstly they are incapable of defining context sensitive syntax features Secondly they provide no explicit means of Linking semantics to syntax One approach to this problem has frequently been adopted in translators that is a set of semantic routines is provided and names of semantic routines are inserted in the RHS of production rules an approach conventionally associated with top down parsing The KHAR system uses this approach KHAR itself being a table driven recogniser of a CFG which may have verbs placed at appropriate points of the grammar Our work relies heavily on the use of the syntax graph as in CWIRTHb and the semantic araph as introduced in CCORDY However we have minimised the number of verbs actions required and simplified the internal structure of the semantic mechanism of KHAR by dealing with semantics context sensitivities one aspect at a time This simplification arises from the multi pass philosophy 1 6 adopted throughout the work The meaning and Anpiications of context sensitivities are defined in terms of a semantic graph with the la actions added at the appropriate points This has the advantage that the designer is constrained to describe these sensitivities one at a time and thus the

- Theses

Contents

Download Pdf Manuals

Related Search

Related Contents